I am trying to extract the text between the compliance-solution tags in a nessus file, but because the text spans multiple lines, I cannot extract some of it.
Here is a sample of the compliance-solution text:
<cm:compliance-solution>Adjust the number of logs to prevent data loss. The default value of 6 may be insufficient for a production environment.
1. Open SQL Server Management Studio.
2. Open Object Explorer and connect to the target instance.
3. Navigate to the Management tab in Object Explorer and expand. Right click on the SQL Server Logs file and select Configure
4. Check the Limit the number of error log files before they are recycled
5. Set the Maximum number of error log files to greater than or equal to 12</cm:compliance-solution>
#!/usr/bin/python
import re

f = open('C:\\<file_path>\\test_file.nessus', 'r')
xml_content = f.readlines()
for line in xml_content:
    m = re.compile('<cm:compliance-solution>(.*?)</cm:compliance-solution>').search(line)
    w = re.compile('<cm:compliance-check-name>(.*?)</cm:compliance-check-name>').search(line)
    x = re.compile('<cm:compliance-result>(.*?)</cm:compliance-result>').search(line)
    y = re.compile('<cm:compliance-reference>(.*?)</cm:compliance-reference>').search(line)
    q = re.compile('<cm:compliance-info>(.*?)</cm:compliance-info>').search(line)
    if x is not None:
        print(x.group(1).split("'"))
    if m is not None:
        print(m.group(1).split("'"))
    if w is not None:
        print(w.group(1))
    if y is not None:
        print(y.group(1))
    if q:
        print(q.group(1))
f.close()

Posted on 2020-10-26 04:04:31
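Your regex does not match because you are iterating over the text line by line, and the opening tag <cm:compliance-solution> is not on the same line as the closing tag </cm:compliance-solution>.
One option is to read the whole file into a single string, like this: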
import re

# Read the whole file into one string so a match can span several lines.
with open('C:\\<file_path>\\test_file.nessus', 'r') as f:
    xml_content = f.read()

# re.DOTALL lets '.' match newlines as well; without it the multi-line
# solution text still would not match. search() returns only the first match.
m = re.compile('<cm:compliance-solution>(.*?)</cm:compliance-solution>', re.DOTALL).search(xml_content)
w = re.compile('<cm:compliance-check-name>(.*?)</cm:compliance-check-name>', re.DOTALL).search(xml_content)
x = re.compile('<cm:compliance-result>(.*?)</cm:compliance-result>', re.DOTALL).search(xml_content)
y = re.compile('<cm:compliance-reference>(.*?)</cm:compliance-reference>', re.DOTALL).search(xml_content)
q = re.compile('<cm:compliance-info>(.*?)</cm:compliance-info>', re.DOTALL).search(xml_content)
if x is not None:
    print(x.group(1).split("'"))
if m is not None:
    print(m.group(1).split("'"))
if w is not None:
    print(w.group(1))
if y is not None:
    print(y.group(1))
if q:
    print(q.group(1))

If the XML data is more complex, I suggest looking at the XML parser libraries available in Python.
ElementTree: https://docs.python.org/3/library/xml.etree.elementtree.html
defusedxml: https://pypi.org/project/defusedxml/
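As a rough illustration of the parser route, here is a minimal sketch using the standard library's xml.etree.ElementTree. The embedded sample, the namespace URI bound to the cm: prefix, and the helper names local_name and compliance_fields are all assumptions made for the example, not taken from a real report; defusedxml could be swapped in when the file comes from an untrusted source.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature of one report item; the namespace URI is an assumption.
SAMPLE = """<ReportItem xmlns:cm="http://www.nessus.org/cm">
<cm:compliance-check-name>Error log count</cm:compliance-check-name>
<cm:compliance-result>FAILED</cm:compliance-result>
<cm:compliance-solution>Adjust the number of logs to prevent data loss.
1. Open SQL Server Management Studio.
2. Open Object Explorer and connect to the target instance.</cm:compliance-solution>
</ReportItem>"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on qualified tags."""
    return tag.rsplit('}', 1)[-1]


def compliance_fields(root):
    """Map each compliance-* element's local name to its full (multi-line) text."""
    fields = {}
    for elem in root.iter():
        name = local_name(elem.tag)
        if name.startswith('compliance-'):
            fields[name] = elem.text or ''
    return fields


root = ET.fromstring(SAMPLE)  # for a file on disk: ET.parse(path).getroot()
fields = compliance_fields(root)
print(fields['compliance-solution'])
```

Because the parser hands back the element's whole text, line breaks inside the tag need no special handling. Note that this sketch keeps one value per field name, so with several report items you would call compliance_fields once per item rather than on the document root.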
Posted on 2020-10-26 16:27:40
Since a nessus file is XML, you should use an XML parser to extract the data. So if what you want is the text of <cm:compliance-solution>, try something very simple like:
from lxml import etree

doc = etree.parse(r'C:\<file_path>\test_file.nessus')
# local-name() ignores the cm: namespace prefix; [0] takes the first matching text node.
print(doc.xpath('//*[local-name()="compliance-solution"]/text()')[0])

Output:
Adjust the number of logs to prevent data loss. The default value of 6 may be insufficient for a production environment.
1. Open SQL Server Management Studio.
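Posted on 2020-10-27 12:48:03
The dot (.) matches any single character except a newline. You can use [\s\S] to match any character, newlines included, so the following regular expression matches the data you want. Also, as fritz said, you are reading the data line by line rather than as a whole.
m = re.compile(r'<cm:compliance-solution>([\S\s]*?)</cm:compliance-solution>').search(xml_content)
For parsing XML, however, I suggest using an XML parsing library. Here is an example for your reference.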
from simplified_scrapy import SimplifiedDoc, utils

xml_content = utils.getFileContent('C:\\<file_path>\\test_file.nessus')
doc = SimplifiedDoc(xml_content)
m = doc.select('cm:compliance-solution>html()')
print(m)

More examples are available here: https://github.com/yiyedata/simplified-scrapy-demo/blob/master/doc_examples
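To make the point about the dot and newlines concrete, the short sketch below runs the same pattern three ways on a shortened, made-up sample: with the default dot, with re.DOTALL, and with the [\s\S] character class. The sample text and the variable names are illustrative only; findall is used instead of search so every tag pair is returned, not just the first.

```python
import re

# Made-up sample modelled on the question's excerpt (shortened).
TEXT = """<cm:compliance-solution>Adjust the number of logs.
1. Open SQL Server Management Studio.</cm:compliance-solution>
<cm:compliance-solution>Single-line fix.</cm:compliance-solution>"""

# Template for the tag pair; the capture group is filled in below.
TAG = r'<cm:compliance-solution>{}</cm:compliance-solution>'

# Default: '.' stops at a newline, so the multi-line element is skipped.
plain = re.findall(TAG.format(r'(.*?)'), TEXT)

# re.DOTALL makes '.' match newlines too.
dotall = re.findall(TAG.format(r'(.*?)'), TEXT, re.DOTALL)

# [\s\S] matches any character regardless of flags.
charclass = re.findall(TAG.format(r'([\s\S]*?)'), TEXT)

print(plain)
print(dotall == charclass)
```

Only the single-line element is found with the default dot, while the other two variants return both elements and agree with each other.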
https://stackoverflow.com/questions/64531268