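I have a text document that must be split wherever a \n or a . occurs. With split() I can split on a single delimiter, but how can I split on both \n and .? Note that in the expected output the dots inside the phone number 870.773.3401 are not treated as split points.
Code: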
text = 'Christmas Perot 2021 TSO\nSkip to Main Content HOME CONCERTS EVENTS ABOUT STAFF EDUCATION SUPPORT US More Use tab to navigate through the menu items. BUY TICKETS SUNDAY, DECEMBER 12, 2021 I PEROT THEATRE I 4:00 PM\nPOPS I Christmas at The Perot\nCLICK HERE to purchase tickets, or contact the Texarkana Symphony Orchestra at 870.773.3401\nA Texarkana Tradition Join the TSO, the Texarkana Jazz Orchestra, and the TSO Chamber Singers, for this holiday concert for the whole family.\nDon’t miss seeing the winner of TSO’s 11th Annual Celebrity Conductor Competition\nBack to Events 2019 Texarkana Symphony Orchestra'
sentences = text.split('\n')
print(sentences)
Output:
['Christmas Perot 2021 TSO',
'Skip to Main Content HOME CONCERTS EVENTS ABOUT STAFF EDUCATION SUPPORT US More Use tab to navigate through the menu items. BUY TICKETS SUNDAY, DECEMBER 12, 2021 I PEROT THEATRE I 4:00 PM',
'POPS I Christmas at The Perot',
'CLICK HERE to purchase tickets, or contact the Texarkana Symphony Orchestra at 870.773.3401',
'A Texarkana Tradition Join the TSO, the Texarkana Jazz Orchestra, and the TSO Chamber Singers, for this holiday concert for the whole family.',
'Don’t miss seeing the winner of TSO’s 11th Annual Celebrity Conductor Competition',
'Back to Events 2019 Texarkana Symphony Orchestra']
Expected output:
['Christmas Perot 2021 TSO',
'Skip to Main Content HOME CONCERTS EVENTS ABOUT STAFF EDUCATION SUPPORT US More Use tab to navigate through the menu items.',
'BUY TICKETS SUNDAY, DECEMBER 12, 2021 I PEROT THEATRE I 4:00 PM',
'POPS I Christmas at The Perot',
'CLICK HERE to purchase tickets, or contact the Texarkana Symphony Orchestra at 870.773.3401',
'A Texarkana Tradition Join the TSO, the Texarkana Jazz Orchestra, and the TSO Chamber Singers, for this holiday concert for the whole family.',
'Don’t miss seeing the winner of TSO’s 11th Annual Celebrity Conductor Competition',
'Back to Events 2019 Texarkana Symphony Orchestra']
Posted on 2021-10-19 11:14:39
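A minimal sketch (not from the original thread) that produces the expected output above in one step with Python's standard re module. It assumes a sentence-ending period is always followed by whitespace. That holds for this sample, but abbreviations such as "Dr. Smith" would also be split.

```python
import re

text = 'Christmas Perot 2021 TSO\nSkip to Main Content HOME CONCERTS EVENTS ABOUT STAFF EDUCATION SUPPORT US More Use tab to navigate through the menu items. BUY TICKETS SUNDAY, DECEMBER 12, 2021 I PEROT THEATRE I 4:00 PM\nPOPS I Christmas at The Perot\nCLICK HERE to purchase tickets, or contact the Texarkana Symphony Orchestra at 870.773.3401\nA Texarkana Tradition Join the TSO, the Texarkana Jazz Orchestra, and the TSO Chamber Singers, for this holiday concert for the whole family.\nDon’t miss seeing the winner of TSO’s 11th Annual Celebrity Conductor Competition\nBack to Events 2019 Texarkana Symphony Orchestra'

# Split on a newline, or on whitespace that directly follows a period.
# The lookbehind (?<=\.) keeps the period attached to the sentence it ends.
# Assumption: the dots in "870.773.3401" are not followed by whitespace,
# so the phone number is left intact.
sentences = re.split(r'\n|(?<=\.)\s+', text)
print(sentences)
```

Because only the whitespace after the period is consumed, "menu items." keeps its period and "BUY TICKETS..." starts without a leading space.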
One way to do it:
text = 'Christmas Perot 2021 TSO\nSkip to Main Content HOME CONCERTS EVENTS ABOUT STAFF EDUCATION SUPPORT US More Use tab to navigate through the menu items. BUY TICKETS SUNDAY, DECEMBER 12, 2021 I PEROT THEATRE I 4:00 PM\nPOPS I Christmas at The Perot\nCLICK HERE to purchase tickets, or contact the Texarkana Symphony Orchestra at 870.773.3401\nA Texarkana Tradition Join the TSO, the Texarkana Jazz Orchestra, and the TSO Chamber Singers, for this holiday concert for the whole family.\nDon’t miss seeing the winner of TSO’s 11th Annual Celebrity Conductor Competition\nBack to Events 2019 Texarkana Symphony Orchestra'
semiSentences = text.replace('.','\n').split('\n')
sentences=[]
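for s in semiSentences:
    # A purely alphanumeric piece (e.g. "773" or "3401" from "870.773.3401")
    # was split off a number, so glue it back onto the previous piece.
    if s.isalnum() and sentences:
        sentences[-1] = sentences[-1] + '.' + s
    else:
        sentences.append(s)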
print(sentences)
Output:
['Christmas Perot 2021 TSO', 'Skip to Main Content HOME CONCERTS EVENTS ABOUT STAFF EDUCATION SUPPORT US More Use tab to navigate through the menu items', ' BUY TICKETS SUNDAY, DECEMBER 12, 2021 I PEROT THEATRE I 4:00 PM', 'POPS I Christmas at The Perot', 'CLICK HERE to purchase tickets, or contact the Texarkana Symphony Orchestra at 870.773.3401', 'A Texarkana Tradition Join the TSO, the Texarkana Jazz Orchestra, and the TSO Chamber Singers, for this holiday concert for the whole family', '', 'Don’t miss seeing the winner of TSO’s 11th Annual Celebrity Conductor Competition', 'Back to Events 2019 Texarkana Symphony Orchestra']
https://stackoverflow.com/questions/69629130
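That output still differs from the expected output in three ways: the sentence-ending periods are dropped, ' BUY TICKETS...' keeps a leading space, and an empty string '' appears after "whole family". A smaller variation on the same replace-then-split idea avoids all three. It is a sketch that assumes a sentence-ending period is always followed by a space, so the dots in 870.773.3401 are left alone:

```python
text = 'Christmas Perot 2021 TSO\nSkip to Main Content HOME CONCERTS EVENTS ABOUT STAFF EDUCATION SUPPORT US More Use tab to navigate through the menu items. BUY TICKETS SUNDAY, DECEMBER 12, 2021 I PEROT THEATRE I 4:00 PM\nPOPS I Christmas at The Perot\nCLICK HERE to purchase tickets, or contact the Texarkana Symphony Orchestra at 870.773.3401\nA Texarkana Tradition Join the TSO, the Texarkana Jazz Orchestra, and the TSO Chamber Singers, for this holiday concert for the whole family.\nDon’t miss seeing the winner of TSO’s 11th Annual Celebrity Conductor Competition\nBack to Events 2019 Texarkana Symphony Orchestra'

# Assumption: only ". " (period + space) marks a sentence break.
# Replacing it with ".\n" keeps the period and drops the space, so a single
# split on "\n" handles both delimiters. "family.\n" is untouched, which
# avoids the empty string produced by replacing every "." with "\n".
sentences = text.replace('. ', '.\n').split('\n')
print(sentences)
```

Like any period-based rule, this also splits after abbreviations such as "e.g. " when they appear mid-sentence.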