I have a table that looks like this:
Bank  Our Credit Rating  External Credit Rating  Deviation
A     11                 12                      1
D     10                 8                       2
A     4                  4                       0
B     6                  7                       1
C     12                 11                      1
A     9                  10                      1
I want to extract all banks whose sum of deviations is >= 50. I have done this with the code shown below.
Output:
[IN]
import numpy as np
import pandas as pd
workbook = pd.read_csv("Credit_Rating_comparison.csv")
df33 = workbook.groupby('Bank').aggregate({"Deviation": np.sum})
df44 = df33[df33['Deviation'] >= 50]
[OUT]
Bank Deviation
B 68.0
A 72.0
and so on for the relevant banks (that is, the sum of all deviations per bank, for banks where that sum is at least 50).
However, I am unable to access the first column, which holds the names of all the banks in df44.
[IN]: df44.columns
[OUT]: Index(['Deviation'], dtype='object')
[IN]: df44.iloc[:,0]
[OUT]
Bank
B 68.0
A 72.0
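# Using df44.iloc[:,0] does not show the column name Deviation either, and
# returns the deviation values along with the bank names.
Basically, I only need a list of the bank names (without the deviation sums) so that I can use that list in the operations below.
Once I have all the bank names, I need to find the frequency distribution of the Deviation column.
The code below gives the frequency bin for every row. I want to extract only the rows whose bank name is in df44['Bank']. Any help would be appreciated.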
[IN]:
bins = [0, 1,2,3,4,5]
workbook['Deviation Bins'] = pd.cut(workbook['Deviation'], bins,
include_lowest =True)
workbook
[OUT]:
Bank  Our Credit Rating  External Credit Rating  Deviation  Deviation Bins
A     11                 12                      1          (-inf.,1]
D     10                 8                       2          (1,2]
A     4                  4                       0          (-inf.,1]
B     6                  7                       1          (-inf.,1]
C     12                 11                      1          (-inf.,1]
A     9                  10                      1          (-inf.,1]
Answered on 2019-09-28 17:27:35
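When you apply .aggregate() to a groupby, the group keys become the index of the returned DataFrame rather than a column. That is why df44.columns lists only 'Deviation'. What you can do is copy the index into a new column, for example:
df33['Bank'] = df33.index
Then you can filter for the groups of interest:
df44=df33[df33['Deviation']>=50]
For the second part, you need to use .isin():
workbook[workbook['Bank'].isin(df44['Bank'])]
https://stackoverflow.com/questions/58148118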
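As a rough end-to-end sketch of the steps in this answer: the snippet below builds a small DataFrame in place of the CSV, reads the bank names directly off the groupby index, filters the original rows with .isin(), and then counts rows per deviation bin. The sample values, the bin edges, and the names totals, bank_names, selected, and freq are assumptions made for illustration; they are not taken from the question's data.

```python
import pandas as pd

# Hypothetical sample data (not the asker's CSV); values chosen so that
# some banks reach the threshold of 50.
workbook = pd.DataFrame({
    "Bank": ["A", "D", "A", "B", "C", "A", "B"],
    "Deviation": [30, 2, 25, 40, 1, 5, 28],
})

# Sum deviations per bank; the bank names become the index of the result.
totals = workbook.groupby("Bank")["Deviation"].sum()

# Keep banks whose total is at least 50 and read their names off the index.
bank_names = totals[totals >= 50].index.tolist()

# Keep only the rows of the original table that belong to those banks.
selected = workbook[workbook["Bank"].isin(bank_names)].copy()

# Bin the deviations (illustrative bin edges) and count rows per bin.
bins = [0, 10, 20, 30, 40]
selected["Deviation Bins"] = pd.cut(selected["Deviation"], bins, include_lowest=True)
freq = selected["Deviation Bins"].value_counts(sort=False)

print(bank_names)
print(freq)
```

Reading the names from totals.index avoids adding a separate 'Bank' column. If a regular column is preferred, calling reset_index() on the aggregated frame should give the same information as a column.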