Let's say I have two lists.
Person 1:
2012-08 person 1 23
2012-09 person 1 63
2012-10 person 1 99
2012-11 person 1 62
and
Person 2:
2012-08 person 2 45
2012-09 person 2 69
2012-10 person 2 12
2012-11 person 2 53
What would you suggest if I want tabular data with the following layout?
Date     Person 1   Person 2
-----    ---------  ---------
2012-08  23         45
2012-09  63         69
2012-10  99         12
2012-11  62         53
Update
Here are the lists:
List1 = [(u'201206', u'Customer_1', 0.19048299999999993), (u'201207', u'Customer_1', 15.409000999998593), (u'201208', u'Customer_1', 71.1695730000299), (u'201209', u'Customer_1', 135.73918600011424), (u'201210', u'Customer_1', 235.26299999991522), (u'201211', u'Customer_1', 271.768984999485), (u'201212', u'Customer_1', 355.90968299883934), (u'201301', u'Customer_1', 508.39194049821526), (u'201302', u'Customer_1', 631.136656500077), (u'201303', u'Customer_1', 901.9127695088399), (u'201304', u'Customer_1', 951.9143960094264)]
List2 = [(None, None, None), (None, None, None), (None, None, None), (None, None, None), (None, None, None), (None, None, None), (None, None, None), (u'201301', u'Customer_2', 3.7276289999999657), (u'201302', u'Customer_2', 25.39122749999623), (u'201303', u'Customer_2', 186.77777299985306), (u'201304', u'Customer_2', 387.97834699805617)]
Answered 2013-05-26 20:38:51
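Use itertools.izip() to combine the two input sequences while you process them:
import csv
import itertools
# file1 and file2 are assumed to be already-open file objects
reader1 = csv.reader(file1)
reader2 = csv.reader(file2)
for row1, row2 in itertools.izip(reader1, reader2):
    # process row1 and row2 together
    pass
This also works for lists. izip() makes merging long lists very efficient: it is the iterator version of the zip() function, which in Python 2 materializes the entire combined list in memory.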
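As a minimal sketch of the positional pairing described above, the following uses a shortened version of the lists from the question's update, with values rounded for illustration. It is written for Python 3, where the built-in zip() is already lazy and plays the role of itertools.izip(). The helper name merge_rows is hypothetical, and the sketch assumes both lists are aligned by position, with (None, None, None) standing in for missing months.

```python
def merge_rows(rows1, rows2):
    """Yield (date, value1, value2) for two position-aligned row lists."""
    for row1, row2 in zip(rows1, rows2):
        # Take the date from whichever side is not a placeholder row.
        date = row1[0] if row1[0] is not None else row2[0]
        yield date, row1[2], row2[2]

# Shortened, rounded sample data modeled on List1 and List2 (illustrative only).
list1 = [("201212", "Customer_1", 355.9),
         ("201301", "Customer_1", 508.4),
         ("201302", "Customer_1", 631.1)]
list2 = [(None, None, None),
         ("201301", "Customer_2", 3.7),
         ("201302", "Customer_2", 25.4)]

table = list(merge_rows(list1, list2))
for date, value1, value2 in table:
    print(date, value1, value2)
```

Pairing by position only gives sensible rows if the two lists really are aligned. If they are not, keying the rows by date is the safer route.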
If the functions that create the input lists can be reworked into generators, use:
import csv
import itertools

def function_for_list1(inputfilename):
    with open(inputfilename, 'rb') as f:
        reader = csv.reader(f)
        for row in reader:
            # process row
            yield row

def function_for_list2(inputfilename):
    with open(inputfilename, 'rb') as f:
        reader = csv.reader(f)
        for row in reader:
            # process row
            yield row

for row1, row2 in itertools.izip(function_for_list1(somename), function_for_list2(someothername)):
    # process row1 and row2 together
    pass
This arrangement lets you process gigabytes of information while keeping in memory only what is needed to process a small set of rows.
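A self-contained sketch of the generator arrangement, assuming Python 3 (plain zip() in place of izip()) and using io.StringIO objects as stand-ins for the real input files. The function name read_rows, the comma-separated layout, and the sample text are all hypothetical.

```python
import csv
import io

def read_rows(fileobj):
    """Yield parsed CSV rows one at a time instead of building a list."""
    for row in csv.reader(fileobj):
        # Assumes each row is date,person,value; convert the value to int.
        yield row[0], row[1], int(row[2])

# In-memory stand-ins for two large CSV files (illustrative data).
file1 = io.StringIO("2012-08,person 1,23\n2012-09,person 1,63\n")
file2 = io.StringIO("2012-08,person 2,45\n2012-09,person 2,69\n")

merged = []
for row1, row2 in zip(read_rows(file1), read_rows(file2)):
    # Only one row from each input is held in memory at this point.
    merged.append((row1[0], row1[2], row2[2]))

print(merged)
```

The rows are collected into merged only so the result can be inspected. With genuinely large inputs you would write each combined row out as it is produced.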
Answered 2013-05-26 20:48:50
l1 = [['2012-08', 'person 1', 23], ['2012-09', 'person 1', 63],
      ['2012-10', 'person 1', 99], ['2012-11', 'person 1', 62]]
l2 = [['2012-08', 'person 2', 45], ['2012-09', 'person 2', 69],
      ['2012-10', 'person 2', 12], ['2012-11', 'person 2', 53]]
# Map each date to its value so the two lists can be matched by date
h1 = {x: z for x, y, z in l1}
h2 = {x: z for x, y, z in l2}
print("{:<10}{:<10}{:<10}".format("Date", "Person 1", "Person 2"))
print("{:<10}{:<10}{:<10}".format('-' * 5, '-' * 8, '-' * 8))
for d in sorted(h1):
    # .get() leaves the cell blank if a date is missing from the second list
    print("{:<10}{:<10}{:<10}".format(d, h1[d], h2.get(d, "")))
Output
Date      Person 1  Person 2
-----     --------  --------
2012-08   23        45
2012-09   63        69
2012-10   99        12
2012-11   62        53
Answered 2013-05-26 20:48:54
If Python is not a requirement and the two CSV files are generated in an ordinary bash script, you can combine join and awk (or even cut).
Example:
Suppose this file is named one:
2012-08 person1 23
2012-09 person1 63
2012-10 person1 99
2012-11 person1 62
and this file is named two:
2012-08 person2 45
2012-09 person2 69
2012-10 person2 12
2012-11 person2 53
Then the command
join one two | awk '{print $1 " " $3 " " $5}'
produces:
2012-08 23 45
2012-09 63 69
2012-10 99 12
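2012-11 62 53
Putting a CSV header on the output, or choosing a different delimiter, is not difficult.
One caveat: both files must be sorted on the join column for this to work. But you probably know that already, since you said the two CSV files are large, so you presumably don't want to read them into memory all at once. Simple Unix tools are well suited to this kind of job, IMHO.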
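For comparison, the sorted-merge idea that join relies on can also be sketched in Python. This is only an illustration under stated assumptions: the function name merge_join is hypothetical, both inputs must already be sorted on the date column with no repeated dates, and dates that appear in only one input are dropped, which matches join's default behavior.

```python
def merge_join(rows1, rows2):
    """Yield (key, value1, value2) for keys present in both sorted inputs."""
    it1, it2 = iter(rows1), iter(rows2)
    row1, row2 = next(it1, None), next(it2, None)
    while row1 is not None and row2 is not None:
        if row1[0] == row2[0]:
            # Keys match: emit one combined row and advance both inputs.
            yield row1[0], row1[2], row2[2]
            row1, row2 = next(it1, None), next(it2, None)
        elif row1[0] < row2[0]:
            # Key exists only in the first input: skip it.
            row1 = next(it1, None)
        else:
            # Key exists only in the second input: skip it.
            row2 = next(it2, None)

# Illustrative data; note the two inputs do not cover the same dates.
one = [("2012-08", "person1", 23), ("2012-09", "person1", 63), ("2012-10", "person1", 99)]
two = [("2012-09", "person2", 69), ("2012-10", "person2", 12), ("2012-11", "person2", 53)]

joined = list(merge_join(one, two))
for date, value1, value2 in joined:
    print(date, value1, value2)
```

Because it only ever looks at the current row of each input, the same function works on generators that read the files lazily.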
https://stackoverflow.com/questions/16763433