Below is a small snippet of the CSV file I am trying to work with. Each line in the CSV is a string:
"address,bathrooms,bedrooms,built,lot,saledate,sale price,squarefeet"
"1116 Fountain St, Ann Arbor, MI Real Estate",2,4,1949,0.62 ac,20140905,469900,"1,910"
"3277 Chamberlain Cir, Ann Arbor, MI Real Estate",3,3,2002,0.32 ac,20140905,315000,"1,401"
"2889 Walnut Ridge Dr, Ann Arbor, MI Real Estate",4,4,2005,0.50 ac,20140904,790000,"3,972"
"1336 Nottington Ct, Ann Arbor, MI Real Estate",3,3,2002,,20140904,332350,"1,521"
"344 Sedgewood Ln # 14, Ann Arbor, MI Real Estate",,,,"6,534",20140904,345000,
"545 Allison Dr, Ann Arbor, MI Real Estate",2,2,,0.29 ac,20140904,159900,"1,400"
I want to turn each line into a list, split up like this:
"1116 Fountain St, Ann Arbor, MI Real Estate",2,4,1949,0.62,20140905,469900,1910
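I want the first item (the address) to be a string and the rest to be ints and floats. I highlighted 0.62 because I want to replace 0.62 ac with 0.62. I tried splitting each line, but line.split(',') does not work: the address itself contains two commas, so it gets split as well. Is there an easier way?
Any suggestions would be greatly appreciated.
Thanks.
Posted on 2014-10-02 03:28:30
First, use the csv module. It handles quoted fields for you, so a field that contains embedded commas is not split apart.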
import csv

# newline='' lets the csv module handle line endings itself (Python 3)
with open('input.csv', newline='') as f:
    reader = csv.reader(f)
    next(reader)  # throw away the header
    for row in reader:
        print(row)

This produces:
['1116 Fountain St, Ann Arbor, MI Real Estate', '2', '4', '1949', '0.62 ac', '20140905', '469900', '1,910']
['3277 Chamberlain Cir, Ann Arbor, MI Real Estate', '3', '3', '2002', '0.32 ac', '20140905', '315000', '1,401']
['2889 Walnut Ridge Dr, Ann Arbor, MI Real Estate', '4', '4', '2005', '0.50 ac', '20140904', '790000', '3,972']
['1336 Nottington Ct, Ann Arbor, MI Real Estate', '3', '3', '2002', '', '20140904', '332350', '1,521']
['344 Sedgewood Ln # 14, Ann Arbor, MI Real Estate', '', '', '', '6,534', '20140904', '345000', '']
['545 Allison Dr, Ann Arbor, MI Real Estate', '2', '2', '', '0.29 ac', '20140904', '159900', '1,400']
So you can see that the CSV reader handles the fields correctly. Next, you need to convert the fields to ints and floats as appropriate.
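The conversion step could be sketched roughly as follows. This is only one possible approach, not something taken from the original answer: the helper names convert_field and convert_row are hypothetical, and mapping empty fields to None is an assumption you may want to change. The sketch strips the thousands separator from values like "1,910" and the " ac" suffix from values like "0.62 ac", then tries int before falling back to float. It reads from an in-memory sample instead of input.csv so that it is self-contained.

```python
import csv
import io


def convert_field(value):
    """Hypothetical helper: convert one CSV field to int, float, or None."""
    # Drop thousands separators ("1,910") and the acre suffix ("0.62 ac").
    cleaned = value.replace(',', '').replace(' ac', '').strip()
    if not cleaned:
        return None  # assumption: empty fields become None
    try:
        return int(cleaned)
    except ValueError:
        # Raises ValueError if the field is not numeric at all.
        return float(cleaned)


def convert_row(row):
    """Keep the address (first field) as a string; convert the rest."""
    return [row[0]] + [convert_field(value) for value in row[1:]]


# Two rows from the question, held in memory in place of input.csv.
sample = io.StringIO(
    'address,bathrooms,bedrooms,built,lot,saledate,sale price,squarefeet\n'
    '"1116 Fountain St, Ann Arbor, MI Real Estate",2,4,1949,0.62 ac,20140905,469900,"1,910"\n'
    '"344 Sedgewood Ln # 14, Ann Arbor, MI Real Estate",,,,"6,534",20140904,345000,\n'
)

reader = csv.reader(sample)
next(reader)  # skip the header
rows = [convert_row(row) for row in reader]
for row in rows:
    print(row)
```

Note that the lot column mixes values such as 0.62 ac with a bare 6,534, which presumably uses a different unit. The sketch converts both to plain numbers without reconciling the units, so that column may need extra handling.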
https://stackoverflow.com/questions/26153945