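I have a regular expression pattern that only partially captures what I want. The input can look like any of the following:
"caller command"
"caller command specifier"
"caller command 'two-worded specifier'"
"caller 'two-worded command' specifier"
"caller 'two-worded command' 'two-worded specifier'"
My current code matches these into named groups and uses the yes/no pattern shown in the documentation for Python's re library.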

import re

messages = [
    "your.majesty hello",
    "proclamation honor Dom",
    "your.majesty query 'Weekly Coding Challenge'",
    "your.majesty 'build test' submissions",
    "your.majesty 'build test' 'Weekly Coding Challenge'",
]
call = "(?P<call>.*?)"
command = "(?P<command>'(.*?)'|(.*?))"
specifier = "(?P<specifier>'(.*?.)'|(.*?))"
# raw f-strings keep \s as a regex escape instead of an invalid string escape
duo = rf"{call}\s{command}"
trio = rf"({call}\s{command}\s{specifier})"
regex_duo = re.compile(duo, flags=re.DOTALL)
regex_trio = re.compile(trio)
for msg in messages:
    # try the three-part pattern first, then fall back to the two-part one
    match = regex_trio.match(msg)
    if match is None:
        match = regex_duo.match(msg)
    print(match)

Its output is:
<re.Match object; span=(0, 13), match='your.majesty '>
<re.Match object; span=(0, 19), match='proclamation honor '>
<re.Match object; span=(0, 44), match="your.majesty query 'Weekly Coding Challenge'">
<re.Match object; span=(0, 26), match="your.majesty 'build test' ">
<re.Match object; span=(0, 51), match="your.majesty 'build test' 'Weekly Coding Challeng>
whereas what I want is:
<re.Match object; span=(0, ...), match='your.majesty hello'>
<re.Match object; span=(0, ...), match='proclamation honor Dom'>
<re.Match object; span=(0, ...), match="your.majesty query 'Weekly Coding Challenge'">
<re.Match object; span=(0, ...), match="your.majesty 'build test' submissions">
<re.Match object; span=(0, ...), match="your.majesty 'build test' 'Weekly Coding Challenge'">

Posted on 2020-10-01 01:40:17
Solution 1: csv.reader (reusing the wheel)
Simply convert the problem into a format that csv.reader can read, using io.StringIO.
Code
from io import StringIO
import csv

messages = [
    "your.majesty hello",
    "proclamation honor Dom",
    "your.majesty query 'Weekly Coding Challenge'",
    "your.majesty 'build test' submissions",
    "your.majesty 'build test' 'Weekly Coding Challenge'"
]

# Avoid creating a StringIO object multiple times:
# for s in messages:
#     reader = csv.reader(StringIO(s), delimiter=" ", quotechar="'")
# Load everything at once instead
ss = "\n".join(messages)
reader = csv.reader(StringIO(ss), delimiter=" ", quotechar="'")
for row in reader:  # type(row) is list
    caller = row[0]
    command = row[1]
    specifier = row[2] if len(row) == 3 else ""
    # check
    print(f"caller = {caller}, command = {command}, specifier = {specifier}")
    # do something with the parsed components here

Output
caller = your.majesty, command = hello, specifier =
caller = proclamation, command = honor, specifier = Dom
caller = your.majesty, command = query, specifier = Weekly Coding Challenge
caller = your.majesty, command = build test, specifier = submissions
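caller = your.majesty, command = build test, specifier = Weekly Coding Challenge

This solution does not produce re.Match objects; it parses the three components directly. Having them as plain strings rather than match groups should make subsequent processing easier.
The advantage: existing CSV loaders are already known to handle quoting and space-delimited formats correctly. So rather than reinventing the wheel, reuse it. The code is also easier to maintain this way.
Using pandas.read_csv
Note: pandas.read_csv() can also be used to produce a pandas.DataFrame directly. The same arguments apply, except that the column names must be assigned manually through names. A possibly missing column (the last one) is handled appropriately and shows up as NaN.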
import pandas as pd
pd.read_csv(StringIO(ss), delimiter=" ", quotechar="'", names=["caller", "command", "specifier"])
Out[38]:
         caller     command                specifier
0  your.majesty       hello                      NaN
1  proclamation       honor                      Dom
2  your.majesty       query  Weekly Coding Challenge
3  your.majesty  build test              submissions
4  your.majesty  build test  Weekly Coding Challenge

Solution 2: Improved regex (more general)
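As for the regular expression: yes, it can be improved considerably as well. I think this is worth elaborating on, because many parsing tasks (probably most of them) cannot be handled by existing libraries.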
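To see why the original pattern stops short: a lazy `.*?` matches as little as possible, and when nothing follows it in the pattern, that means the empty string. The sketch below reproduces this on a reduced two-group pattern (an illustration, not the asker's exact code) and shows that requiring the whole string to match with `fullmatch` forces the lazy group to expand.

```python
import re

# Reduced version of the original idea: lazy groups separated by whitespace
lazy = re.compile(r"(?P<call>.*?)\s(?P<command>.*?)")

msg = "your.majesty hello"

# match() only needs a prefix, so the trailing lazy group stays empty
m = lazy.match(msg)
print(repr(m.group()))            # 'your.majesty '
print(repr(m.group("command")))   # ''

# fullmatch() must consume the entire string, so the lazy group expands
fm = lazy.fullmatch(msg)
print(repr(fm.group("command")))  # 'hello'
```

Anchoring removes the empty match, but a lazy `.*?` can still swallow spaces and quotes, so it does not by itself separate quoted from unquoted tokens.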
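Digest
Prefer specific patterns such as \S+ or [^']+ over the catch-all .*.
The ? quantifier marks an element as optional.
Code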
import re

regex_uni = re.compile(r"""
    (?P<call>\S+)
    \                     # a space character
    (?P<command>          # group 2:
        (?:               # 1st option (non-capturing group):
            '             # begins with SQ
            [^']+         # followed by one or more consecutive non-SQ chars
            '             # ends with SQ
        )
        |                 # or
        \S+               # 2nd option: consecutive non-space chars (assuming no SQ)
    )
    \ ?                   # optional space character
    (?P<specifier>        # group 3:
        (?:'[^']+')|\S+   # same as group 2
    )?                    # but the existence is optional
    """, re.VERBOSE
)
for msg in messages:
    match = regex_uni.match(msg)
    if match is not None:
        print(f"* input = {match.group()}")
        print(f" call = {match.group('call')}")
        print(f" command = {match.group('command')}")
        print(f" specifier = {match.group('specifier')}")

Output
* input = your.majesty hello
call = your.majesty
command = hello
specifier = None
* input = proclamation honor Dom
call = proclamation
command = honor
specifier = Dom
* input = your.majesty query 'Weekly Coding Challenge'
call = your.majesty
command = query
specifier = 'Weekly Coding Challenge'
* input = your.majesty 'build test' submissions
call = your.majesty
command = 'build test'
specifier = submissions
* input = your.majesty 'build test' 'Weekly Coding Challenge'
call = your.majesty
command = 'build test'
specifier = 'Weekly Coding Challenge'

https://stackoverflow.com/questions/64130504
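One detail of the Solution 2 output: quoted groups keep their surrounding single quotes, and a missing specifier is `None`. A minimal sketch of normalizing both follows. It uses a compact variant of the pattern in which the optional space and the specifier are grouped together, which is a small change from the answer's version. The helper name `parse` and the use of `fullmatch` to reject trailing text are assumptions made for illustration.

```python
import re

# Compact variant of the Solution 2 pattern (space + specifier optional as a unit)
pattern = re.compile(
    r"(?P<call>\S+) (?P<command>'[^']+'|\S+)(?: (?P<specifier>'[^']+'|\S+))?"
)

def parse(msg):
    """Hypothetical helper: return (call, command, specifier) or None.

    Surrounding single quotes are stripped; a missing specifier becomes "".
    """
    m = pattern.fullmatch(msg)
    if m is None:
        return None
    groups = m.group("call", "command", "specifier")
    return tuple((g or "").strip("'") for g in groups)

for msg in ["your.majesty hello", "your.majesty 'build test' submissions"]:
    print(parse(msg))
```

With this, the regex route yields the same plain strings as the csv.reader route, at the cost of maintaining the pattern by hand.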