
How to slice, rank, and wrangle like a pandas boss

Stack Overflow user
Asked on 2015-12-10 00:38:11
Answers: 1 · Views: 167 · Followers: 0 · Votes: 2

Suppose one has a table summarizing the busy lives of a few people on this planet.

import pandas as pd
import numpy as np
from datetime import datetime as dt
t = pd.Timestamp

# lookup table: one row per person
lu = pd.DataFrame({'name': ['Bill','Elon','Larry','Jeff','Marissa'],
                   'feels': ['charitable','Alcoa envy','Elon envy','like the number 7','sassy'],
                   'last ate': [t('20151209'),t('20151201'),t('20151208'),t('20151208'),t('20151209')],
                   'boxers': [True,True,True,False,True]})

Say one also knows where these people live and when they did certain things...

af = pd.DataFrame({'name': ['Bill','Elon','Larry','Elon','Jeff','Larry','Larry'],
                   'address': ['in my computer','moon','internet','mars','cardboard box','autonomous car','every where'],
                   'sq_ft': [2,2135,69,84535,1.32,54,168],
                   'forks': [7,1,2,1,0,np.nan,1]})

rand_dates = [t('20141202'),t('20130804'),t('20120508'),t('20150411'),
              t('20141209'),t('20091023'),t('20130921'),t('20110102'),
              t('20130728'),t('20141119'),t('20151024'),t('20130824')]

df = pd.DataFrame({'name': ['Elon','Bill','Larry','Elon','Jeff','Larry','Larry','Bill','Larry','Elon','Marissa','Jeff'],
                   'activity': ['slept','tripped','spoke','swam','spooked','liked','whistled','up dog','smiled','donated','grant men paternity leave','fondled'],
                   'date': rand_dates})

One can rank these people by the number of addresses they have lived at, as follows:

af.name.value_counts()

Larry    3
Elon     2
Jeff     1
Bill     1

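Need 1: Using the ranking above, how does one create a new "ranked" DataFrame composed of the information in the lookup table lu? Simply put, how does one produce Exhibit A?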

# Exhibit A
  boxers              feels   last ate     name  addresses
0   True          Elon envy 2015-12-08    Larry          3
1   True         Alcoa envy 2015-12-01     Elon          2
2  False  like the number 7 2015-12-08     Jeff          1
3   True         charitable 2015-12-09     Bill          1

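Need 2: Observe the output of the groupby operation below. How does one determine the time delta between each member's oldest and newest date, and then rank the members by that time delta? Simply put, how does one get from the groupby to Exhibit D?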

df.groupby(['name','date']).size()

name     date      
Bill     2011-01-02    1
         2013-08-04    1
Elon     2014-11-19    1
         2014-12-02    1
         2015-04-11    1
Jeff     2013-08-24    1
         2014-12-09    1
Larry    2009-10-23    1
         2012-05-08    1
         2013-07-28    1
         2013-09-21    1
Marissa  2015-10-24    1
#Exhibit B - Calculate time deltas
name     time_delta
Bill     Timedelta('945 days 00:00:00')
Elon     Timedelta('143 days 00:00:00')
Jeff     Timedelta('472 days 00:00:00')
Larry    Timedelta('1429 days 00:00:00')
Marissa  Timedelta('0 days 00:00:00')

#Exhibit C - Rank time deltas (this is easy)
name     time_delta
Larry    Timedelta('1429 days 00:00:00')
Bill     Timedelta('945 days 00:00:00')
Jeff     Timedelta('472 days 00:00:00')
Elon     Timedelta('143 days 00:00:00')
Marissa  Timedelta('0 days 00:00:00')

#Exhibit D - Add to and re-rank the table built in Exhibit A according to time_delta
  boxers              feels   last ate     name  addresses          time_delta
0   True          Elon envy 2015-12-08    Larry          3  1429 days 00:00:00
1   True         charitable 2015-12-09     Bill          1   945 days 00:00:00
2  False  like the number 7 2015-12-08     Jeff          1   472 days 00:00:00
3   True         Alcoa envy 2015-12-01     Elon          2   143 days 00:00:00
4   True              sassy 2015-12-09  Marissa        NaN     0 days 00:00:00

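Prior research: here is a post about using groupby and transform to get max values. Another post is about finding and selecting the most frequent data. Both are informative, but they do not work on a Series (the result of value_counts()), and that is what trips me up. I actually have the first part working, but the code is flawed and probably inefficient.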
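On the point about value_counts() returning a Series: a Series can be turned into a two-column DataFrame, after which the usual DataFrame techniques apply. The following is only a minimal sketch, assuming a names-only stand-in for af; the column name addresses is chosen here for illustration.

```python
import pandas as pd

# Names-only stand-in for the `af` table from the question (an assumption
# made to keep the example short).
af = pd.DataFrame({'name': ['Bill', 'Elon', 'Larry', 'Elon', 'Jeff', 'Larry', 'Larry']})

# value_counts() returns a Series indexed by the distinct names,
# sorted from most to least frequent.
counts = af['name'].value_counts()

# Name the index and the values explicitly, then move the index into a
# regular column. Naming both avoids relying on default names, which
# differ between pandas versions.
counts_df = counts.rename_axis('name').rename('addresses').reset_index()

print(counts_df)
```

The result has the columns name and addresses, so it can be merged or joined like any other DataFrame.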

For easy code sharing, see this IPython notebook, which lays everything out. Otherwise, see the Python 2.7 code here.

1 Answer

Stack Overflow user

Accepted answer

Posted on 2015-12-10 08:38:13

I think you can use join, sort_values, and aggregate (agg); see the documentation.

#join value counts to the lu dataframe as 'addresses', then sort
#(naming the Series explicitly avoids depending on the default name, which varies across pandas versions)
Exhibit_A = lu.set_index('name').join(af.name.value_counts().rename('addresses')).sort_values('addresses', ascending=False)
#drop rows with NaN, reset index
print(Exhibit_A.dropna().reset_index())

    name boxers              feels   last ate  addresses
0  Larry   True          Elon envy 2015-12-08          3
1   Elon   True         Alcoa envy 2015-12-01          2
2   Bill   True         charitable 2015-12-09          1
3   Jeff  False  like the number 7 2015-12-08          1
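As an alternative to the join, the counts can also be looked up per row with Series.map. This is a hedged sketch rather than the answer's method; it uses trimmed-down stand-ins for lu and af, and the variable name ranked is made up for illustration.

```python
import pandas as pd

# Trimmed-down stand-ins for the question's `lu` and `af` tables.
lu = pd.DataFrame({'name': ['Bill', 'Elon', 'Larry', 'Jeff', 'Marissa'],
                   'boxers': [True, True, True, False, True]})
af = pd.DataFrame({'name': ['Bill', 'Elon', 'Larry', 'Elon', 'Jeff', 'Larry', 'Larry']})

# Look up each person's address count by name; names that never appear
# in `af` (here Marissa) get NaN.
ranked = lu.assign(addresses=lu['name'].map(af['name'].value_counts()))

# Rank by address count; NaN rows sort last by default.
ranked = ranked.sort_values('addresses', ascending=False).reset_index(drop=True)

print(ranked)
```

Calling dropna() on the result would remove the Marissa row, matching the four-row Exhibit A; leaving it in keeps her available for the later join.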
#aggregate to max and min date per name
#(string aggregator names give flat 'max'/'min' columns, so no MultiIndex reset is needed)
g = df.groupby('name')['date'].agg(['max', 'min'])

#time delta between newest and oldest date
g['time_delta'] = g['max'] - g['min']

#drop helper columns
g = g.drop(['max', 'min'], axis=1)

#join to Exhibit_A, sort, reset index
Exhibit_D = Exhibit_A.join(g).sort_values('time_delta', ascending=False).reset_index()
#reorder columns
Exhibit_D = Exhibit_D[['boxers', 'feels', 'last ate', 'name', 'addresses', 'time_delta']]
print(Exhibit_D)

  boxers              feels   last ate     name  addresses  time_delta
0   True          Elon envy 2015-12-08    Larry          3   1429 days
1   True         charitable 2015-12-09     Bill          1    945 days
2  False  like the number 7 2015-12-08     Jeff          1    472 days
3   True         Alcoa envy 2015-12-01     Elon          2    143 days
4   True              sassy 2015-12-09  Marissa        NaN      0 days
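The time-delta step (Exhibits B and C) can also be written by subtracting the per-group minimum from the per-group maximum directly. A minimal sketch, assuming only the name and date columns of df matter here; the format argument to pd.to_datetime is added for explicitness.

```python
import pandas as pd

# Stand-in for the question's `df`, keeping only the columns used here.
df = pd.DataFrame({
    'name': ['Elon', 'Bill', 'Larry', 'Elon', 'Jeff', 'Larry',
             'Larry', 'Bill', 'Larry', 'Elon', 'Marissa', 'Jeff'],
    'date': pd.to_datetime(['20141202', '20130804', '20120508', '20150411',
                            '20141209', '20091023', '20130921', '20110102',
                            '20130728', '20141119', '20151024', '20130824'],
                           format='%Y%m%d'),
})

# Group the dates by person once, then reuse the grouping.
dates = df.groupby('name')['date']

# Newest minus oldest date per person (Exhibit B), ranked from the
# longest to the shortest span (Exhibit C).
time_delta = (dates.max() - dates.min()).rename('time_delta').sort_values(ascending=False)

print(time_delta)
```

Because the result is a Series indexed by name, it can be joined onto a name-indexed table in the same way as g above.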
Votes: 1

Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/34191746
