Pandas 连接

Pandas 连接的操作实例

Pandas具有与SQL等关系数据库非常相似的功能齐全的高性能内存中连接操作。
Pandas提供单个功能merge作为DataFrame对象之间所有标准数据库联接操作的入口点

pd.merge(left,right,how='inner',on=None,left_on=None,right_on=None,left_index=False,right_index=False,sort=True)

在这里,我们使用了以下参数:

left − 一个DataFrame对象。right − 另一个DataFrame对象。on − 列(名)加入上。必须在左右DataFrame对象中都找到。left_on − 左侧DataFrame中的列用作键。可以是列名,也可以是长度等于DataFrame长度的数组。right_on − 右侧DataFrame中的列用作键。可以是列名,也可以是长度等于DataFrame长度的数组。left_index − 如果为True,则使用左侧DataFrame的索引(行标签)作为其连接键。如果DataFrame具有MultiIndex(分层),则级别数必须与右侧DataFrame中的连接键数匹配。right_index − 相同的使用作为left_index为正确的数据帧。how − “左”,“右”,“外”,“内”之一。默认为内部。每种方法已在下面描述。sort − 排序的结果数据框中加入字典顺序按键。默认情况下为True,在许多情况下,设置为False将大大提高性能。

现在让我们创建两个不同的DataFrame并对其执行合并操作。

#importthepandaslibraryimportpandasaspdleft=pd.DataFrame({'id':[1,2,3,4,5],'Name':['Alex','Amy','Allen','Alice','Ayoung'],'subject_id':['sub1','sub2','sub4','sub6','sub5']})right=pd.DataFrame({'id':[1,2,3,4,5],'Name':['Billy','Brian','Bran','Bryce','Betty'],'subject_id':['sub2','sub4','sub3','sub6','sub5']}))print(leftprint(right)

运行结果如下:

Nameidsubject_id0Alex1sub11Amy2sub22Allen3sub43Alice4sub64Ayoung5sub5Nameidsubject_id0Billy1sub21Brian2sub42Bran3sub33Bryce4sub64Betty5sub5

在一个键上合并两个数据框

importpandasaspdleft=pd.DataFrame({'id':[1,2,3,4,5],'Name':['Alex','Amy','Allen','Alice','Ayoung'],'subject_id':['sub1','sub2','sub4','sub6','sub5']})right=pd.DataFrame({'id':[1,2,3,4,5],'Name':['Billy','Brian','Bran','Bryce','Betty'],'subject_id':['sub2','sub4','sub3','sub6','sub5']})print(pd.merge(left,right,on='id'))

运行结果如下:

Name_xidsubject_id_xName_ysubject_id_y0Alex1sub1Billysub21Amy2sub2Briansub42Allen3sub4Bransub33Alice4sub6Brycesub64Ayoung5sub5Bettysub5

在多个键上合并两个数据框

importpandasaspdleft=pd.DataFrame({'id':[1,2,3,4,5],'Name':['Alex','Amy','Allen','Alice','Ayoung'],'subject_id':['sub1','sub2','sub4','sub6','sub5']})right=pd.DataFrame({'id':[1,2,3,4,5],'Name':['Billy','Brian','Bran','Bryce','Betty'],'subject_id':['sub2','sub4','sub3','sub6','sub5']})print(pd.merge(left,right,on=['id','subject_id']))

运行结果如下:

Name_xidsubject_idName_y0Alice4sub6Bryce1Ayoung5sub5Betty

合并使用“how”参数

合并的how参数指定如何确定要在结果表中包括哪些键。如果左侧或右侧表中均未出现组合键,则联接表中的值为NA。

这里的一个总结如何选择和他们的SQL等价的名字:

合并方法SQL等效描述
leftLEFT OUTER JOIN使用左侧对象的key
rightRIGHT OUTER JOIN使用正确对象的key
outerFULL OUTER JOIN使用联合key
innerINNER JOIN使用key的交集

左连接

importpandasaspdleft=pd.DataFrame({'id':[1,2,3,4,5],'Name':['Alex','Amy','Allen','Alice','Ayoung'],'subject_id':['sub1','sub2','sub4','sub6','sub5']})right=pd.DataFrame({'id':[1,2,3,4,5],'Name':['Billy','Brian','Bran','Bryce','Betty'],'subject_id':['sub2','sub4','sub3','sub6','sub5']})print(pd.merge(left,right,on='subject_id',how='left'))

运行结果如下:

Name_xid_xsubject_idName_yid_y0Alex1sub1NaNNaN1Amy2sub2Billy1.02Allen3sub4Brian2.03Alice4sub6Bryce4.04Ayoung5sub5Betty5.0

右连接

importpandasaspdleft=pd.DataFrame({'id':[1,2,3,4,5],'Name':['Alex','Amy','Allen','Alice','Ayoung'],'subject_id':['sub1','sub2','sub4','sub6','sub5']})right=pd.DataFrame({'id':[1,2,3,4,5],'Name':['Billy','Brian','Bran','Bryce','Betty'],'subject_id':['sub2','sub4','sub3','sub6','sub5']})print(pd.merge(left,right,on='subject_id',how='right'))

运行结果如下:

Name_xid_xsubject_idName_yid_y0Amy2.0sub2Billy11Allen3.0sub4Brian22Alice4.0sub6Bryce43Ayoung5.0sub5Betty54NaNNaNsub3Bran3

外连接

importpandasaspdleft=pd.DataFrame({'id':[1,2,3,4,5],'Name':['Alex','Amy','Allen','Alice','Ayoung'],'subject_id':['sub1','sub2','sub4','sub6','sub5']})right=pd.DataFrame({'id':[1,2,3,4,5],'Name':['Billy','Brian','Bran','Bryce','Betty'],'subject_id':['sub2','sub4','sub3','sub6','sub5']})print(pd.merge(left,right,how='outer',on='subject_id'))

运行结果如下:

Name_xid_xsubject_idName_yid_y0Alex1.0sub1NaNNaN1Amy2.0sub2Billy1.02Allen3.0sub4Brian2.03Alice4.0sub6Bryce4.04Ayoung5.0sub5Betty5.05NaNNaNsub3Bran3.0

内连接

连接将在索引上执行。联接操作接受调用它的对象。因此,a.join(b)不等于b.join(a)。

importpandasaspdleft=pd.DataFrame({'id':[1,2,3,4,5],'Name':['Alex','Amy','Allen','Alice','Ayoung'],'subject_id':['sub1','sub2','sub4','sub6','sub5']})right=pd.DataFrame({'id':[1,2,3,4,5],'Name':['Billy','Brian','Bran','Bryce','Betty'],'subject_id':['sub2','sub4','sub3','sub6','sub5']})print(pd.merge(left,right,on='subject_id',how='inner'))

运行结果如下:

Name_xid_xsubject_idName_yid_y0Amy2sub2Billy11Allen3sub4Brian22Alice4sub6Bryce43Ayoung5sub5Betty5
编辑于2024-05-20 14:47