玖叶教程网

前端编程开发入门

20pandas.merge: 合并数据

pd.merge( )

Definition : 
merge(left: 'DataFrame | Series', right: 'DataFrame | Series', 
how: 'str'='inner', 
on: 'IndexLabel | None'=None, 
left_on: 'IndexLabel | None'=None, 
right_on: 'IndexLabel | None'=None, 
left_index: 'bool'=False, 
right_index: 'bool'=False, 
sort: 'bool'=False, 
suffixes: 'Suffixes'=('_x', '_y'), 
copy: 'bool'=True, 
indicator: 'bool'=False, 
validate: 'str | None'=None) -> 'DataFrame'

参数含义

left:待链接的左侧数据集。

right:待链接的右侧数据集。

how:左右数据集的连接方式。可选‘left’、‘right’、‘outer’、‘inner’,'cross',默认为inner。

on:左右两个待链接数据集有共同列名,且按该列链接两个数据集合时使用该参数。

left_on:链接两个数据集时,左数据集对应连接关键字(可为列表)。

right_on:链接两个数据集时,右数据集对应连接关键字(可为列表)。

left_index:若为True,则按左数据集的索引连接两个数据集。

right_index:若为True,则按右数据集的索引连接两个数据集。

sort:对结果数据集进行排序。

suffixes:为左右数据集中重复列名定义后缀。默认加('_x','_y')。

Merge method

Merge

method

SQL Join Name

Description

left

LEFT OUTER JOIN

Use keys from left frame only

right

RIGHT OUTER JOIN

Use keys from right frame only

outer

FULL OUTER JOIN

Use union of keys from both frames

inner

INNER JOIN

Use intersection of keys from both frames

Examples

import pandas as pd
left = pd.DataFrame(
    {
        "key": ["K0", "K1", "K2", "K3"],
        "A": ["A0", "A1", "A2", "A3"],
        "B": ["B0", "B1", "B2", "B3"],
    }
)
right = pd.DataFrame(
    {
        "key": ["K0", "K1", "K2", "K3"],
        "C": ["C0", "C1", "C2", "C3"],
        "D": ["D0", "D1", "D2", "D3"],
    }
)
result = pd.merge(left, right, on="key" ,how= 'inner')
left = pd.DataFrame(
    {
        "key1": ["K0", "K0", "K1", "K2"],
        "key2": ["K0", "K1", "K0", "K1"],
        "A": ["A0", "A1", "A2", "A3"],
        "B": ["B0", "B1", "B2", "B3"],
    }
)
right = pd.DataFrame(
    {
        "key1": ["K0", "K1", "K1", "K2"],
        "key2": ["K0", "K0", "K0", "K0"],
        "C": ["C0", "C1", "C2", "C3"],
        "D": ["D0", "D1", "D2", "D3"],
    }
)
result = pd.merge(left, right, how="left",  on=["key1", "key2"])
result = pd.merge(left, right, how="right", on=["key1", "key2"])
result = pd.merge(left, right, how="outer", on=["key1", "key2"])
result = pd.merge(left, right, how="inner", on=["key1", "key2"])

DataFrame与Series链接

df = pd.DataFrame({"Let": ["A", "B", "C"], "Num": [1, 2, 3]})
ser = pd.Series(
    ["a", "b", "c", "d", "e", "f"],
    index=pd.MultiIndex.from_arrays(
        [["A", "B", "C"] * 2, [1, 2, 3, 4, 5, 6]], names=["Let", "Num"]
    ),
)
pd.merge(df, ser.reset_index(), on=["Let", "Num"])

检查重复关键字

left = pd.DataFrame({"A": [1, 2], "B": [2, 2]})
right = pd.DataFrame({"A": [4, 5, 6], "B": [2, 2, 2]})
result = pd.merge(left, right, on="B", how="outer")
left = pd.DataFrame({"A": [1, 2], "B": [1, 2]})
right = pd.DataFrame({"A": [4, 5, 6], "B": [2, 2, 2]})
result = pd.merge(left, right, on="B", how="outer", validate="one_to_one")
pd.merge(left, right, on="B", how="outer", validate="one_to_many")

合并指示器

df1 = pd.DataFrame({"col1": [0, 1], "col_left": ["a", "b"]})
df2 = pd.DataFrame({"col1": [1, 2, 2], "col_right": [2, 2, 2]})
pd.merge(df1, df2, on="col1", how="outer", indicator=True)
pd.merge(df1, df2, on="col1", how="outer", indicator="indicator_column")

总结

根据不同的使用场景,设置不同的参数值,达到预期的数据集链接目的。此处仅仅是个人的学习笔记,还有很多灵活的用法没有列举出来,感兴趣的可以深入操练。同时,强烈建议您要多读pandas官方文档,很多用法均来自官方文档,毕竟“您要学会打猎”。

关于数据集链接,我之前的文章也有介绍,建议您系统性的浏览:

有关其他主题文章,也可以关注我的微信公众号“曾老师会计与统计”查看。

发表评论:

控制面板
您好,欢迎到访网站!
  查看权限
网站分类
最新留言