Suppose one has a table summarizing the busy lives of a few people on this planet.
import pandas as pd
import numpy as np
from datetime import datetime as dt
t=pd.Timestamp
lu = pd.DataFrame({ 'name' : ['Bill','Elon','Larry','Jeff','Marissa'],
'feels' : ['charitable','Alcoa envy','Elon envy','like the number 7','sassy'],
'last ate' : [t('20151209'),t('20151201'),t('20151208'),t('20151208'),t('20151209')],
'boxers' : [True,True,True,False,True]})

Suppose one also knows where these people live and when they did certain things.
af = pd.DataFrame({ 'name' : ['Bill','Elon','Larry','Elon','Jeff','Larry','Larry'],
'address' : ['in my computer','moon','internet','mars','cardboard box','autonomous car','every where'],
'sq_ft' : [2,2135,69,84535, 1.32, 54,168],
'forks' : [7,1,2,1,0,np.nan,1]})
rand_dates=[t('20141202'),t('20130804'),t('20120508'),t('20150411'),
t('20141209'),t('20091023'),t('20130921'),t('20110102'),
t('20130728'),t('20141119'),t('20151024'),t('20130824')]
df = pd.DataFrame({ 'name' : ['Elon','Bill','Larry','Elon','Jeff','Larry','Larry','Bill','Larry','Elon','Marissa','Jeff'],
'activity' : ['slept','tripped','spoke','swam','spooked','liked','whistled','up dog','smiled','donated','grant men paternity leave','fondled'],
'date' : rand_dates})

One can rank these people by the number of addresses they live at, as follows:
af.name.value_counts()
Larry 3
Elon 2
Jeff 1
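Bill 1

Need 1: Using the ranking above, how does one create a new "ranking" DataFrame that also carries the information from the lookup table lu? Simply put, how does one produce Exhibit A?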
# Exhibit A
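   boxers              feels    last ate   name  addresses
0    True          Elon envy  2015-12-08  Larry          3
1    True         Alcoa envy  2015-12-01   Elon          2
2   False  like the number 7  2015-12-08   Jeff          1
3    True         charitable  2015-12-09   Bill          1

Need 2: Observe the output of the groupby operation below. How does one determine the time delta between the oldest and the newest date for each member, and then rank the members by that time delta? Simply put, how does one get from the groupby to Exhibit D?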
df.groupby(['name','date']).size()
name     date
Bill     2011-01-02    1
         2013-08-04    1
Elon     2014-11-19    1
         2014-12-02    1
         2015-04-11    1
Jeff     2013-08-24    1
         2014-12-09    1
Larry    2009-10-23    1
         2012-05-08    1
         2013-07-28    1
         2013-09-21    1
Marissa  2015-10-24    1

#Exhibit B - Calculate time deltas
name time_delta
Bill Timedelta('945 days 00:00:00')
Elon Timedelta('143 days 00:00:00')
Jeff Timedelta('472 days 00:00:00')
Larry Timedelta('1429 days 00:00:00')
Marissa Timedelta('0 days 00:00:00')
#Exhibit C - Rank time deltas (this is easy)
name time_delta
Larry Timedelta('1429 days 00:00:00')
Bill Timedelta('945 days 00:00:00')
Jeff Timedelta('472 days 00:00:00')
Elon Timedelta('143 days 00:00:00')
Marissa Timedelta('0 days 00:00:00')
#Exhibit D - Add to and re-rank the table built in Exhibit A according to time_delta
   boxers              feels    last ate     name  addresses          time_delta
0    True          Elon envy  2015-12-08    Larry          3  1429 days 00:00:00
1    True         charitable  2015-12-09     Bill          1   945 days 00:00:00
2   False  like the number 7  2015-12-08     Jeff          1   472 days 00:00:00
3    True         Alcoa envy  2015-12-01     Elon          2   143 days 00:00:00
4    True              sassy  2015-12-09  Marissa        NaN     0 days 00:00:00

Prior Research: Here is a post about using groupby and transform to get the maximum value, and another post about finding and selecting the most frequent data. Both are informative, but they do not work on a Series (the result of value_counts()), and that trips me up. I actually have the first part working, but the code is flawed and probably inefficient.
For easy code sharing, see this IPython notebook, which lays everything out. Otherwise, see the Python 2.7 code here.
Posted on 2015-12-10 08:38:13
I think you can use join and sort_values. Aggregation is described in the docs.
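As a rough illustration of how those two calls fit together, here is a minimal sketch on made-up data. The frame `people` and the series `visits` are hypothetical and are not the frames from the question; the sketch only assumes that the counts are renamed explicitly, so the joined column name does not depend on the pandas version.

```python
import pandas as pd

# Hypothetical toy data, not the frames from the question.
people = pd.DataFrame({'name': ['Ann', 'Bob', 'Cy'],
                       'mood': ['calm', 'busy', 'tired']})
visits = pd.Series(['Bob', 'Ann', 'Bob', 'Bob'], name='name')

# value_counts() returns a Series indexed by the counted values;
# renaming it fixes the column name that the join will produce.
counts = visits.value_counts().rename('visits')

# join() aligns on the index, so names missing from counts get NaN.
ranked = (people.set_index('name')
                .join(counts)
                .sort_values('visits', ascending=False))
print(ranked)
```

Rows without a count (here `Cy`) sort to the end by default, which is why the real code below can call dropna() afterwards to get rid of them.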
#join value counts to lu dataframe, naming the counts column 'addresses', then sort
Exhibit_A = lu.set_index('name').join(af.name.value_counts().rename('addresses')).sort_values('addresses', ascending=False)
#drop rows with NaN, reset index
print(Exhibit_A.dropna().reset_index())
    name  boxers              feels    last ate  addresses
0  Larry    True          Elon envy  2015-12-08          3
1   Elon    True         Alcoa envy  2015-12-01          2
2   Bill    True         charitable  2015-12-09          1
3   Jeff   False  like the number 7  2015-12-08          1

#aggregate to min and max date
g = df.groupby('name')['date'].agg(['max', 'min'])
#selecting 'date' first keeps the result columns flat: 'max' and 'min'
g['time_delta'] = g['max'] - g['min']
#drop columns
g = g.drop(['max', 'min'], axis=1)
#join to Exhibit_A, sort, reset index
Exhibit_D = Exhibit_A.join(g).sort_values('time_delta', ascending=False).reset_index()
#reorder columns
Exhibit_D = Exhibit_D[['boxers', 'feels', 'last ate', 'name', 'addresses' , 'time_delta' ]]
print(Exhibit_D)
   boxers              feels    last ate     name  addresses  time_delta
0    True          Elon envy  2015-12-08    Larry          3   1429 days
1    True         charitable  2015-12-09     Bill          1    945 days
2   False  like the number 7  2015-12-08     Jeff          1    472 days
3    True         Alcoa envy  2015-12-01     Elon          2    143 days
4    True              sassy  2015-12-09  Marissa        NaN      0 days

https://stackoverflow.com/questions/34191746
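The time-delta step can also be sketched without intermediate max and min columns, by subtracting the per-group minimum from the per-group maximum directly. This is only one possible variant, not the answer's own method; it reuses the names and dates of `df` from the question, and the variable names `dates` and `time_delta` are chosen here for illustration.

```python
import pandas as pd

t = pd.Timestamp
# Same names and dates as df in the question (activity column omitted).
df = pd.DataFrame({
    'name': ['Elon', 'Bill', 'Larry', 'Elon', 'Jeff', 'Larry',
             'Larry', 'Bill', 'Larry', 'Elon', 'Marissa', 'Jeff'],
    'date': [t('20141202'), t('20130804'), t('20120508'), t('20150411'),
             t('20141209'), t('20091023'), t('20130921'), t('20110102'),
             t('20130728'), t('20141119'), t('20151024'), t('20130824')]})

# Newest minus oldest date per name; both results share the 'name' index,
# so the subtraction aligns row by row.
dates = df.groupby('name')['date']
time_delta = (dates.max() - dates.min()).rename('time_delta')

# Largest span first, as in Exhibit C.
time_delta = time_delta.sort_values(ascending=False)
print(time_delta)
```

Because the result is a named Series indexed by name, it can be joined onto a frame indexed by name in the same way as the value counts.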