A summary of the pandas merge and concat functions

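pandas offers several ways to combine data, chiefly the concat, merge, and join functions.

concat stacks objects along the row or column axis by aligning on the index; on the other axis it can only take the intersection or the union of the labels.
merge combines two DataFrames on shared columns or on the index, and supports inner, left, right, and outer joins.
join works much like merge (it joins on the index by default), so it is not covered separately here.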
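
Since join is only mentioned in passing, here is a minimal sketch of how it relates to merge. The frames `left` and `right` and their contents are hypothetical, chosen only for this illustration.

```python
import pandas as pd

# Hypothetical sample frames, used only for this illustration
left = pd.DataFrame({"x": [1, 2, 3]}, index=["k1", "k2", "k3"])
right = pd.DataFrame({"y": [20, 30, 40]}, index=["k2", "k3", "k4"])

# DataFrame.join aligns on the index and performs a left join by default
joined = left.join(right)

# The same operation spelled out with merge
merged = pd.merge(left, right, how="left", left_index=True, right_index=True)

print(joined)
print(joined.equals(merged))
```

The row labelled k1 has no partner in `right`, so its y value is NaN; k4 is dropped because the join keeps only keys from the left frame.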

import pandas as pd
from pandas import Series,DataFrame
# Helper: build a DataFrame whose cells are the column name followed by the row label
def make_df(cols,inds):
    data = {c:[c+str(i) for i in inds] for c in cols}
    return DataFrame(data,index=inds)

df1 = make_df(list("abc"),[1,2,4])
df1

	a	b	c
1	a1	b1	c1
2	a2	b2	c2
4	a4	b4	c4

df2 = make_df(list("abcd"),[2,4,6])
df2

	a	b	c	d
2	a2	b2	c2	d2
4	a4	b4	c4	d4
6	a6	b6	c6	d6

df11=df1.set_index('a')
df22=df2.set_index('a')

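1. The concat function

axis: defaults to 0, which concatenates row-wise; 1 concatenates column-wise
ignore_index: defaults to False, which keeps the original labels along the concatenation axis; True discards them and builds a new 0..n-1 index
join: how the other axis is handled, either inner or outer (the default)
sort: True sorts the non-concatenation axis when it is not already aligned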
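
The effect of sort is easiest to see when the two frames list their columns in different orders. The sketch below uses two hypothetical one-row frames, `top` and `bottom`, that are not part of the original examples.

```python
import pandas as pd

# Hypothetical frames whose columns are deliberately out of alphabetical order
top = pd.DataFrame({"b": [1], "a": [2]}, index=[0])
bottom = pd.DataFrame({"c": [3], "a": [4]}, index=[1])

# sort only affects the non-concatenation axis (here: the columns)
unsorted_cols = pd.concat([top, bottom], sort=False)
sorted_cols = pd.concat([top, bottom], sort=True)

print(list(unsorted_cols.columns))
print(list(sorted_cols.columns))  # columns in alphabetical order
print(list(sorted_cols.index))    # row order is unchanged
```

Note that sort never reorders the rows being stacked; it only decides whether the union of column labels is sorted.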

(1) Simple row-wise and column-wise concatenation on the index

# Row-wise concatenation
pd.concat([df1,df2],sort=False)

	a	b	c	d
1	a1	b1	c1	NaN
2	a2	b2	c2	NaN
4	a4	b4	c4	NaN
2	a2	b2	c2	d2
4	a4	b4	c4	d4
6	a6	b6	c6	d6

# Column-wise concatenation
pd.concat([df1,df2],axis=1)

	a	b	c	a	b	c	d
1	a1	b1	c1	NaN	NaN	NaN	NaN
2	a2	b2	c2	a2	b2	c2	d2
4	a4	b4	c4	a4	b4	c4	d4
6	NaN	NaN	NaN	a6	b6	c6	d6

(2) Concatenation that discards the original index

# Row-wise concatenation, replacing the original row index with a new one
pd.concat([df1,df2],sort=False,ignore_index=True)

	a	b	c	d
0	a1	b1	c1	NaN
1	a2	b2	c2	NaN
2	a4	b4	c4	NaN
3	a2	b2	c2	d2
4	a4	b4	c4	d4
5	a6	b6	c6	d6

# Column-wise concatenation, replacing the original column labels with new ones
pd.concat([df1,df2],axis=1,ignore_index=True)

	0	1	2	3	4	5	6
1	a1	b1	c1	NaN	NaN	NaN	NaN
2	a2	b2	c2	a2	b2	c2	d2
4	a4	b4	c4	a4	b4	c4	d4
6	NaN	NaN	NaN	a6	b6	c6	d6
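
One practical reason to pass ignore_index=True is that a plain row-wise concat keeps duplicate labels. The following is a small sketch of that point, reusing the make_df helper and the df1/df2 definitions from earlier in this post; the variable names `stacked` and `reindexed` are introduced here for illustration.

```python
import pandas as pd

def make_df(cols, inds):
    # Same helper as above: each cell is the column name plus the row label
    data = {c: [c + str(i) for i in inds] for c in cols}
    return pd.DataFrame(data, index=inds)

df1 = make_df(list("abc"), [1, 2, 4])
df2 = make_df(list("abcd"), [2, 4, 6])

# Without ignore_index, labels 2 and 4 each appear twice in the result
stacked = pd.concat([df1, df2], sort=False)
print(stacked.index.is_unique)
print(stacked.loc[2])  # selects both rows labelled 2

# With ignore_index=True the result gets a fresh, unique 0..n-1 index
reindexed = pd.concat([df1, df2], sort=False, ignore_index=True)
print(list(reindexed.index))
```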

(3) Concatenation with an explicit join type

The join type is either inner or outer.

# Intersection, inner join
pd.concat([df1,df2],sort=False,join='inner')

	a	b	c
1	a1	b1	c1
2	a2	b2	c2
4	a4	b4	c4
2	a2	b2	c2
4	a4	b4	c4
6	a6	b6	c6

# Union, outer join
pd.concat([df1,df2],sort=False,join='outer')

	a	b	c	d
1	a1	b1	c1	NaN
2	a2	b2	c2	NaN
4	a4	b4	c4	NaN
2	a2	b2	c2	d2
4	a4	b4	c4	d4
6	a6	b6	c6	d6
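
The join argument always applies to the axis that is not being concatenated. As a rough sketch, the same df1 and df2 can be combined column-wise to see the intersection and union taken over the row index instead of the columns; the result names below are introduced only for this example.

```python
import pandas as pd

def make_df(cols, inds):
    # Same helper as above: each cell is the column name plus the row label
    data = {c: [c + str(i) for i in inds] for c in cols}
    return pd.DataFrame(data, index=inds)

df1 = make_df(list("abc"), [1, 2, 4])
df2 = make_df(list("abcd"), [2, 4, 6])

# Row-wise: join acts on the columns, so inner keeps only a, b, c
rows_inner = pd.concat([df1, df2], join="inner")

# Column-wise: join acts on the row index
cols_inner = pd.concat([df1, df2], axis=1, join="inner")  # shared labels only
cols_outer = pd.concat([df1, df2], axis=1, join="outer")  # all labels

print(list(rows_inner.columns))
print(list(cols_inner.index))
print(sorted(cols_outer.index))
```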

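2. The merge function

how: the type of join. left: keep every key from the left DataFrame; right: keep every key from the right DataFrame; outer: keep the union of keys from both; inner: keep only the intersection of keys. Defaults to 'inner'.
on: the column name(s) to join on, which must exist in both DataFrames
left_on/right_on: the column name(s) to join on in the left/right DataFrame.
left_index/right_index: whether to use the index as the join key; True means yes. Can be combined with left_on/right_on.
sort: whether to sort the result by the join keys; defaults to False.
suffixes: when both DataFrames share a column name that is not a join key, suffixes sets the suffix added to each side's copy (default ('_x', '_y')); usually given as a tuple or list.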
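
To make the four how options concrete, the sketch below merges df1 and df2 on column a with each option and records which key values survive. The dictionary name `keys_by_how` is introduced only for this example.

```python
import pandas as pd

def make_df(cols, inds):
    # Same helper as above: each cell is the column name plus the row label
    data = {c: [c + str(i) for i in inds] for c in cols}
    return pd.DataFrame(data, index=inds)

df1 = make_df(list("abc"), [1, 2, 4])   # keys a1, a2, a4
df2 = make_df(list("abcd"), [2, 4, 6])  # keys a2, a4, a6

keys_by_how = {}
for how in ["inner", "left", "right", "outer"]:
    result = pd.merge(df1, df2, how=how, on="a")
    keys_by_how[how] = list(result["a"])
    print(how, keys_by_how[how])
```

inner keeps only the keys found on both sides, left and right keep every key from one side, and outer keeps every key from either side, with NaN filling the columns that have no match.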

(1) Merging on identically named columns

df3 = pd.merge(df1,df2,how='inner',on='a')        # merge on a single column
df4 = pd.merge(df1,df2,how='inner',on=['a','b'])  # merge on multiple columns
df5 = pd.merge(df1,df2,how='left',on='a',suffixes=['_1','_2']) # left join with explicit suffixes
df5

	a	b_1	c_1	b_2	c_2	d
0	a1	b1	c1	NaN	NaN	NaN
1	a2	b2	c2	b2	c2	d2
2	a4	b4	c4	b4	c4	d4

(2) Merging on differently named columns, on a column and an index, or on two indexes

df6 = pd.merge(df1,df2,how='inner',left_on='a',right_on='b')             # differently named columns
df7 = pd.merge(df1,df22,how='inner',left_on='a',right_index=True)        # a column and an index
df8 = pd.merge(df1,df2,how='inner',left_index=True,right_index=True)    # both sides use the index
df8

	a_x	b_x	c_x	a_y	b_y	c_y	d
2	a2	b2	c2	a2	b2	c2	d2
4	a4	b4	c4	a4	b4	c4	d4
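
Only df8 is displayed above, so here is a short sketch for inspecting df6 and df7 as well, rebuilt from the same definitions. With this sample data, column a of df1 never equals column b of df2, so df6 is expected to be empty.

```python
import pandas as pd

def make_df(cols, inds):
    # Same helper as above: each cell is the column name plus the row label
    data = {c: [c + str(i) for i in inds] for c in cols}
    return pd.DataFrame(data, index=inds)

df1 = make_df(list("abc"), [1, 2, 4])
df2 = make_df(list("abcd"), [2, 4, 6])
df22 = df2.set_index("a")  # column a becomes the index of df22

# Differently named key columns: 'a1' etc. never match 'b2' etc.
df6 = pd.merge(df1, df2, how="inner", left_on="a", right_on="b")

# Column a of df1 against the index of df22
df7 = pd.merge(df1, df22, how="inner", left_on="a", right_index=True)

print(df6.empty)
print(list(df7.columns))
print(df7)
```

In df7 the overlapping non-key columns b and c get the default suffixes _x and _y, while d comes through unchanged.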
