I have the following data in a CSV file.
dataCenter,customer,companyID,UID,uba
dc1,customer1,companyID1,uid1,"uba1,uba2,uba3,uba4"
dc2,customer2,companyID1,uid2,"ubaA"
dc3,customer3,companyID3,uid3,"uba1,uba4"
dc4,customer4,companyID4,uid4,"uba1,uba2,uba5,uba6,uba10"

Now I want to convert the data to the format below, spreading the multiple values in the 'uba' column across new columns.
dataCenter,customer,companyID,UID,action1,action2,action3,action4,action5,...,
dc1,customer1,companyID1,uid1,uba1,uba2,uba3,uba4
dc2,customer2,companyID1,uid2,uba
dc3,customer3,companyID3,uid3,uba1,uba4
dc4,customer4,companyID4,uid4,uba1,uba2,uba5,uba6,uba10,uba11,uba12,uba13

I tried the following, but it does not work.
a = a.explode('uba')
a = pd.concat([a,pd.DataFrame(a.pop('uba').tolist(),index=a.index)],axis=1)

I do not want to use str.split(',', expand=True), because its performance is really poor when the data is large.
Are there any other good options?
Posted on 2022-01-14 06:58:14
If you do not want to use str.split(',', expand=True) and the column has no missing values, you can use a list comprehension:
a = pd.concat([a,pd.DataFrame([x.split(',') for x in a.pop('uba')],index=a.index).add_prefix('action')],axis=1)
print (a)
  dataCenter   customer   companyID   UID action0 action1 action2 action3  \
0        dc1  customer1  companyID1  uid1    uba1    uba2    uba3    uba4
1        dc2  customer2  companyID1  uid2    ubaA    None    None    None
2        dc3  customer3  companyID3  uid3    uba1    uba4    None    None
3        dc4  customer4  companyID4  uid4    uba1    uba2    uba5    uba6

  action4
0    None
1    None
2    None
3   uba10

Edit: To keep only the first N values, use:
N = 2
a = pd.concat([a,pd.DataFrame([x.split(',', N)[:N] for x in a.pop('uba')],index=a.index).add_prefix('action')],axis=1)
print (a)
  dataCenter   customer   companyID   UID action0 action1
0        dc1  customer1  companyID1  uid1    uba1    uba2
1        dc2  customer2  companyID1  uid2    ubaA    None
2        dc3  customer3  companyID3  uid3    uba1    uba4
3        dc4  customer4  companyID4  uid4    uba1    uba2

https://stackoverflow.com/questions/70706988
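
The list comprehension above assumes every 'uba' entry is a string, and it numbers the new columns from action0. The following is a minimal sketch, not part of the original answer, of one way to tolerate missing values and to number the columns from action1 as in the question's desired header. The helper name split_to_columns, its parameters, and the extra dc5 row with a missing value are assumptions made for illustration.

```python
import pandas as pd

# Sample data from the question, plus one hypothetical row (dc5) whose
# 'uba' value is missing, added only to exercise the missing-value path.
df = pd.DataFrame({
    "dataCenter": ["dc1", "dc2", "dc3", "dc4", "dc5"],
    "customer": ["customer1", "customer2", "customer3", "customer4", "customer5"],
    "companyID": ["companyID1", "companyID1", "companyID3", "companyID4", "companyID5"],
    "UID": ["uid1", "uid2", "uid3", "uid4", "uid5"],
    "uba": ["uba1,uba2,uba3,uba4", "ubaA", "uba1,uba4",
            "uba1,uba2,uba5,uba6,uba10", None],
})


def split_to_columns(frame, column="uba", prefix="action", sep=","):
    """Return a new frame with `column` split into prefix1, prefix2, ... columns."""
    # Non-string entries (None/NaN) become an empty list, so that row
    # simply receives no values in the new columns.
    parts = [x.split(sep) if isinstance(x, str) else [] for x in frame[column]]
    wide = pd.DataFrame(parts, index=frame.index)
    # Number the new columns from 1 instead of pandas' default 0.
    wide.columns = [f"{prefix}{i + 1}" for i in range(wide.shape[1])]
    # drop() returns a copy, so the caller's frame is left unchanged.
    return pd.concat([frame.drop(columns=column), wide], axis=1)


result = split_to_columns(df)
print(result)
```

Unlike a.pop('uba'), this sketch leaves the input frame untouched and returns a new one, which makes it easier to rerun while experimenting.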
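
Whether str.split(',', expand=True) is actually slower than the list comprehension depends on the pandas version and on the data, so it is worth measuring on your own machine. The sketch below is one hedged way to do that: the synthetic data, the row count, and the function names are assumptions for illustration, and no particular timing result is implied.

```python
import timeit

import pandas as pd

# Synthetic data: the row count and the value pattern are assumptions.
n_rows = 20_000
values = (["uba1,uba2,uba3,uba4", "ubaA", "uba1,uba4"] * (n_rows // 3 + 1))[:n_rows]
s = pd.Series(values)


def with_str_split(series):
    # Vectorised string accessor; pads shorter rows with missing values.
    return series.str.split(",", expand=True)


def with_list_comprehension(series):
    # Plain Python split per element, then one DataFrame construction.
    return pd.DataFrame([x.split(",") for x in series], index=series.index)


# Check that both approaches produce the same cell values
# (missing cells are replaced by "" so they compare as equal).
a = with_str_split(s)
b = with_list_comprehension(s)
same = bool((a.fillna("").to_numpy() == b.fillna("").to_numpy()).all())

# Time a few repetitions of each approach.
t_split = timeit.timeit(lambda: with_str_split(s), number=3)
t_list = timeit.timeit(lambda: with_list_comprehension(s), number=3)

print(f"same values: {same}")
print(f"str.split(expand=True): {t_split:.3f}s, list comprehension: {t_list:.3f}s")
```

Increasing n_rows, or swapping in a sample of the real 'uba' column, gives a more representative comparison.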