20240115星期一:
# 使用how参数合并
# 通过how参数可以确定 DataFrame 中要包含哪些键,如果在左表、右表都不存的键,那么合并后该键对应的值为 NaN。为了便于大家学习,我们将 how 参数和与其等价的 SQL 语句做了总结:
# Merge方法 等效 SQL 描述
# left LEFT OUTER JOIN 使用左侧对象的key
# right RIGHT OUTER JOIN 使用右侧对象的key
# outer FULL OUTER JOIN 使用左右两侧所有key的并集
# inner INNER JOIN 使用左右两侧key的交集
# 1) left join
import pandas as pd
left = pd.DataFrame({
'id':[1,2,3,4],
'Name': ['Smith', 'Maiki', 'Hunter', 'Hilen'],
'subject_id':['sub1','sub2','sub4','sub6']})
right = pd.DataFrame({
'id':[1,2,3,4],
'Name': ['Bill', 'Lucy', 'Jack', 'Mike'],
'subject_id':['sub2','sub4','sub3','sub6']})
#以left侧的subject_id为键
print(pd.merge(left,right,on='subject_id',how="left"))
'''
id_x Name_x subject_id id_y Name_y
0 1 Smith sub1 NaN NaN
1 2 Maiki sub2 1.0 Bill
2 3 Hunter sub4 2.0 Lucy
3 4 Hilen sub6 4.0 Mike
'''
# 2) right join
import pandas as pd
left = pd.DataFrame({
'id':[1,2,3,4],
'Name': ['Smith', 'Maiki', 'Hunter', 'Hilen'],
'subject_id':['sub1','sub2','sub4','sub6']})
right = pd.DataFrame({
'id':[1,2,3,4],
'Name': ['Bill', 'Lucy', 'Jack', 'Mike'],
'subject_id':['sub2','sub4','sub3','sub6']})
#以right侧的subject_id为键
print(pd.merge(left,right,on='subject_id',how="right"))
'''
id_x Name_x subject_id id_y Name_y
0 2.0 Maiki sub2 1 Bill
1 3.0 Hunter sub4 2 Lucy
2 4.0 Hilen sub6 4 Mike
3 NaN NaN sub3 3 Jack
'''
# 3) outer join(并集)
import pandas as pd
left = pd.DataFrame({
'id':[1,2,3,4],
'Name': ['Smith', 'Maiki', 'Hunter', 'Hilen'],
'subject_id':['sub1','sub2','sub4','sub6']})
right = pd.DataFrame({
'id':[1,2,3,4],
'Name': ['Bill', 'Lucy', 'Jack', 'Mike'],
'subject_id':['sub2','sub4','sub3','sub6']})
#求出两个subject_id的并集,并作为键
print(pd.merge(left,right,on='subject_id',how="outer"))
'''
id_x Name_x subject_id id_y Name_y
0 1.0 Smith sub1 NaN NaN
1 2.0 Maiki sub2 1.0 Bill
2 3.0 Hunter sub4 2.0 Lucy
3 4.0 Hilen sub6 4.0 Mike
4 NaN NaN sub3 3.0 Jack
'''
# 4) inner join(交集)
import pandas as pd
left = pd.DataFrame({
'id':[1,2,3,4],
'Name': ['Smith', 'Maiki', 'Hunter', 'Hilen'],
'subject_id':['sub1','sub2','sub4','sub6']})
right = pd.DataFrame({
'id':[1,2,3,4],
'Name': ['Bill', 'Lucy', 'Jack', 'Mike'],
'subject_id':['sub2','sub4','sub3','sub6']})
#求出两个subject_id的交集,并将结果作为键
print(pd.merge(left,right,on='subject_id',how="inner"))
'''
id_x Name_x subject_id id_y Name_y
0 2 Maiki sub2 1 Bill
1 3 Hunter sub4 2 Lucy
2 4 Hilen sub6 4 Mike
'''