Greetings from Tokyo.
Let me explain what I am trying to achieve with Python 2.7:
I have a file with one JSON document on each line. Here is an excerpt:
1 {"res":0, "res_message":"OK", "debug_info":{"id-info":"9089"}, "visits":[{"id":"237000080507750613","siteId":1551642,"startTime":1483217576324,"endTime":1483217696000,"clientIPs":["69.61.12.70"],"country":["United States"],"countryCode":["US"],"clientType":"Vulnerability Scanner","clientApplication":"Grabber","clientApplicationId":780,"httpVersion":"1.1","clientApplicationVersion":"null","userAgent":"Mozilla/5.0 CommonCrawler Node 3AEHGF7VNEKJUWOPKJJIJ7ODKPM4XXVZQUTHNWS5B2O5AEAGHIG4HVC42LLEUSO.CQYXO3ZFD.GB5RZ5EG2SRWW335PUSOSIVLZUXPCTJUGV2MDJGQJDJPE5UH.cdn0.common.crawl.zone","os":"","osVersion":"","supportsCookies":false,"supportsJavaScript":false,"hits":1,"pageViews":0,"entryReferer":"","servedVia":["Ashburn,VA"],"securitySummary":{"api.threats.bot_access_control":1},"actions":[{"postData":"","requestResult":"api.request_result.req_blocked_security","isSecured":false,"responseTime":0,"thinkTime":0,"incidentId":"237000080507750613-304992946328764549","threats":[{"securityRule":"api.threats.bot_access_control","alertLocation":"api.alert_location.alert_location_path","attackCodes":["200.0"],"securityRuleAction":"api.rule_action_type.rule_action_block"}]}]}, ...
2 {"res":0, "res_message":"OK", "debug_info":{"id-info":"9089"}, "visits":[{"id":"520000110618442601","siteId":1551642,"startTime":1482666233524,"endTime":1482666353000,"clientIPs":["93.175.201.18"],"country":["Ukraine"],"countryCode":["UA"],"clientType":"Spam Bot","clientApplication":"DTS Agent","clientApplicationId":99,"httpVersion":"1.1","clientApplicationVersion":"null","userAgent":"Mozilla/4.0 (compatible; MSIE 5.0; Windows NT; DigExt; DTS Agent","os":"","osVersion":"","supportsCookies":false,"supportsJavaScript":false,"hits":1,"pageViews":0,"entryReferer":"","servedVia":["Warsaw, Poland"],"securitySummary":{"api.threats.bot_access_control":1},"actions":[{"postData":"","requestResult":"api.request_result.req_blocked_security","isSecured":false,"responseTime":2,"thinkTime":1,"incidentId":"520000110618442601-1233371267206742195","threats":[{"securityRule":"api.threats.bot_access_control","alertLocation":"api.alert_location.alert_location_path","attackCodes":["200.0"],"securityRuleAction":"api.rule_action_type.rule_action_block"}]}]}, ...
3 {"res":0, "res_message":"OK", "debug_info":{"id-info":"9089"}, "visits":[{"id":"520000110602830007","siteId":1551642,"startTime":1482429957001,"endTime":1482430077000,"clientIPs":["93.175.201.18"],"country":["Ukraine"],"countryCode":["UA"],"clientType":"Spam Bot","clientApplication":"DTS Agent","clientApplicationId":99,"httpVersion":"1.1","clientApplicationVersion":"null","userAgent":"Mozilla/4.0 (compatible; MSIE 5.0; Windows NT; DigExt; DTS Agent","os":"","osVersion":"","supportsCookies":false,"supportsJavaScript":false,"hits":1,"pageViews":0,"entryReferer":"","servedVia":["Warsaw, Poland"],"securitySummary":{"api.threats.bot_access_control":1},"actions":[{"postData":"","requestResult":"api.request_result.req_blocked_security","isSecured":false,"responseTime":4,"thinkTime":4,"incidentId":"520000110602830007-3073954101470953658","threats":[{"securityRule":"api.threats.bot_access_control","alertLocation":"api.alert_location.alert_location_path","attackCodes":["200.0"],"securityRuleAction":"api.rule_action_type.rule_action_block"}]}]}, ...
I tried processing the whole file with json.loads(), but without success.
Here is my code:
import json

g = open('monthlyLogShort.txt', 'w')
with open("page.txt") as f:
    data = f.read()
    parse = json.loads(data)  # <- load the JSON dict
    field_list = parse["visits"]
    for fields in field_list:  # <- extract the following fields
        # rwdname is not defined in this snippet
        print >> g, "visit_id=", (fields["id"]), ",", "src_country=", (fields["country"]), ",", "event_timestamp=", (fields["startTime"]), ",", "src_ip=", (fields["clientIPs"]), ",", "dest_name=", rwdname, ",", "dest_id=", (fields["siteId"]), ",", "signature=", (fields["securitySummary"])
g.close()
As you can imagine, this code only lets me parse a single line. What is the best (Pythonic) way to process the whole file?
Thanks for reading my post.
Posted on 2017-01-31 04:57:19
Since the number of lines is always the same, I came up with this solution:
import json

g = open('monthlyLogShort.txt', 'w')
with open('page.txt', 'r') as f:
    data = f.readlines()
    countp = 0
    page = 0
    while countp < 10:
        parse = json.loads(data[page])  # load the JSON dict for this line
        field_list = parse["visits"]
        for fields in field_list:  # extract the following fields
            # dname is not defined in this snippet
            print >> g, "visit_id=", (fields["id"]), ",", "src_country=", (fields["country"]), ",", "event_timestamp=", (fields["startTime"]), ",", "src_ip=", (fields["clientIPs"]), ",", "dest_name=", dname, ",", "dest_id=", (fields["siteId"]), ",", "signature=", (fields['securitySummary'])
        countp = countp + 1
        page = page + 1
    else:
        # runs once the while loop has finished
        g.close()
Works like a charm.
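For what it's worth, the loop above depends on the file having exactly 10 lines. Here is a rough sketch that iterates over however many lines the file has and also tolerates a leading line number like the one shown in the sample in the question. The helper names (format_visit, iter_visits, convert) and the simplified "key=value, key=value" output layout are my own assumptions, not part of the original code, and dest_name stands in for the dname variable that the snippet does not define. It is written to run on both Python 2.7 and Python 3, although list values print with a u prefix on 2.7.

```python
import json


def format_visit(visit, dest_name):
    """Build one 'key=value, key=value' record from a visit dict.

    The layout is a simplified assumption, not the exact output of
    the print statement in the answer above.
    """
    pairs = [
        ("visit_id", visit["id"]),
        ("src_country", visit["country"]),
        ("event_timestamp", visit["startTime"]),
        ("src_ip", visit["clientIPs"]),
        ("dest_name", dest_name),
        ("dest_id", visit["siteId"]),
        ("signature", visit["securitySummary"]),
    ]
    return ", ".join("%s=%s" % (key, value) for key, value in pairs)


def iter_visits(lines):
    """Yield every visit dict from an iterable of one-JSON-per-line text."""
    for line in lines:
        # Skip anything before the first "{" (e.g. a leading line number).
        start = line.find("{")
        if start == -1:
            continue  # blank line, or no JSON object on this line
        page = json.loads(line[start:])
        for visit in page.get("visits", []):
            yield visit


def convert(src_path, dst_path, dest_name):
    """Read src_path line by line and write one record per visit."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for visit in iter_visits(src):
            dst.write(format_visit(visit, dest_name) + "\n")
```

Something like convert('page.txt', 'monthlyLogShort.txt', dname) would then replace the counter-based loop. Both files are opened in a with block, so they get closed even if one line fails to parse.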
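Posted on 2017-01-30 16:06:39
The file as a whole is not valid JSON, but it can be parsed line by line:
import json

with open("page.txt") as f:
    for line in f:
        # Drop the leading line number, then parse the rest of the line as JSON
        obj = json.loads(line.split(" ", 1)[1])
        print(obj["visits"])
https://stackoverflow.com/questions/41940228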