实际上,我需要绘制所有的变化只发生在2012年10月,所以我是计数30行,以便我可以使用他们在xlim绘图。
import pandas as pd
from pandas import Series,DataFrame
import numpy as np
poll_df=pd.read_csv('http://elections.huffingtonpost.com/pollster/2012-general-election-romney-vs-obama.csv')
row_in=0
xlimit=[]
poll_df=poll_df[poll_df['Start Date'].str[:7] == '2012-10']
for date in poll_df['Start Date']:
if date[0:7] == '2012-10':
xlimit.append(row_in)
row_in += 1
else:
row_in+=1
print(min(xlimit))
print(max(xlimit))但是,我不明白为什么尽管执行了操作,xlimit仍然是空的。
发布于 2018-07-04 22:01:16
通过下载该URL,我可以用np.genfromtxt加载它
In [232]: data = np.genfromtxt('../Downloads/2012-general-election-romney-vs-oba
...: ma.csv',dtype=None,delimiter=',',names=True,invalid_raise=False,encodi
...: ng=None)
/usr/local/bin/ipython3:1: ConversionWarning: Some errors were detected !
Line #77 (got 13 columns instead of 17)
Line #238 (got 13 columns instead of 17)
Line #460 (got 18 columns instead of 17)
Line #488 (got 18 columns instead of 17)
Line #493 (got 13 columns instead of 17)
Line #507 (got 18 columns instead of 17)
Line #515 (got 18 columns instead of 17)
Line #538 (got 18 columns instead of 17)
Line #550 (got 18 columns instead of 17)
#!/usr/bin/python3在处理较短/较长的行时,它不像pandas那样宽容。
In [233]: data.shape
Out[233]: (577,)
In [234]: data.dtype
Out[234]: dtype([('Pollster', '<U56'), ('Start_Date', '<U10'), ('End_Date', '<U10'), ('Entry_DateTime_ET', '<U20'), ('Number_of_Observations', '<i8'), ('Population', '<U26'), ('Mode', '<U15'), ('Obama', '<f8'), ('Romney', '<f8'), ('Undecided', '<f8'), ('Other', '<f8'), ('Pollster_URL', '<U113'), ('Source_URL', '<U189'), ('Partisan', '<U11'), ('Affiliation', '<U5'), ('Question_Text', '?'), ('Question_Iteration', '<i8')])start_date字段如下所示:
在235:数据‘’Start_Date‘Out235:数组(’2012-11-04‘,'2012-11-03','2012-11-03','2012-11-03','2012-11-03','2012-11-03','2012-11-03','2012-11-01','2012-11-02','2012-11-02',’2012-11-02‘,’dtype=‘
我可以用where搜索它。我使用astype将字段限制为7个字符。
In [236]: np.where(data['Start_Date'].astype('U7')=='2012-10')[0]
Out[236]:
array([18, 19, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35,
36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52,
53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69,
70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86,
87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99])我可以使用usecols绕过可变的行长度-假设“坏”行在后一个字段中只是不同。
In [237]: data = np.genfromtxt('../Downloads/2012-general-election-romney-vs-oba
...: ma.csv',dtype=None,delimiter=',',names=True,invalid_raise=False,encodi
...: ng=None,usecols=range(10))
In [238]: data.shape
Out[238]: (586,)
In [239]: np.where(data['Start_Date'].astype('U7')=='2012-10')[0]
Out[239]:
array([ 18, 19, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31,
32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44,
45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57,
58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70,
71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83,
84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96,
97, 98, 99, 100])我可以通过像您这样的迭代搜索获得相同的列表:
In [244]: alist = []
In [245]: for i,date in enumerate(data['Start_Date']):
...: if date[:7] == '2012-10':
...: alist.append(i)
...:
In [246]: len(alist)
Out[246]: 82
In [247]: np.array(alist)
Out[247]:
array([ 18, 19, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31,
32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44,
45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57,
58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70,
71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83,
84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96,
97, 98, 99, 100])https://stackoverflow.com/questions/51178785
复制相似问题