I read a txt file into R as follows:
Election_Parties <- readr::read_lines("Election_Parties.txt")
Suppose the file contains the following text:
P23-Andalusian Social Democratic Party (Partido Social-Demócrata Andaluz [PSDA])
P24-Andalusian Socialist Movement (Movimiento Socialista Andaluz [MSA])
P235-Andalusian Socialist Party-Andalucian Party (Partido Socialista Andalucista-Partido
Andalucista [PSA-PA])
P26-Andalusist Party (Partido Andalucista [PA])
P217-Andecha Astur (Andecha Astur [AA])
I want all the information about each party on a single line, no matter how long it gets. So:
P235-Andalusian Socialist Party-Andalucian Party (Partido Socialista Andalucista-Partido
Andalucista [PSA-PA])
should become:
P235-Andalusian Socialist Party-Andalucian Party (Partido Socialista Andalucista-Partido Andalucista [PSA-PA])
I think I should first paste all the text together:
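Election_Parties <- paste(Election_Parties, collapse=" ")
and then split it wherever the combination P**- or P***- occurs. How should I write this last part?
Edit:
The actual data I want to apply this to looks like this: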
BOLIVIA
P17-Nationalist Revolutionary Movement-Free Bolivia Movement (Movimiento
Nacionalista Revolucionario [MNR])
P19-Liberty and Justice (Libertad y Justicia [LJ])
P20-Tupak Katari Revolutionary Movement (Movimiento Revolucionario Tupak Katari [MRTK])
COLOMBIA
P1-Democratic Aliance M-19 (Alianza Democratica M-19 [AD-M19])
P2-National Popular Alliance (Alianza Nacional Popular [ANAPO])
P3-Indigenous Authorities of Colombia (Autoridades Indígenas
de Colombia)
Expected output:
BOLIVIA
P17-Nationalist Revolutionary Movement-Free Bolivia Movement (Movimiento Nacionalista Revolucionario [MNR])
P19-Liberty and Justice (Libertad y Justicia [LJ])
P20-Tupak Katari Revolutionary Movement (Movimiento Revolucionario Tupak Katari [MRTK])
COLOMBIA
P1-Democratic Aliance M-19 (Alianza Democratica M-19 [AD-M19])
P2-National Popular Alliance (Alianza Nacional Popular [ANAPO])
P3-Indigenous Authorities of Colombia (Autoridades Indígenas de Colombia)
Posted on 2019-11-22 11:26:47
You can use
strsplit(paste(Election_Parties, collapse=" "), "\\s+(?=P\\d+-)", perl=TRUE)[[1]]
See the online R demo.
Output:
[1] "P23-Andalusian Social Democratic Party (Partido Social-Demócrata Andaluz [PSDA])"
[2] "P24-Andalusian Socialist Movement (Movimiento Socialista Andaluz [MSA])"
[3] "P235-Andalusian Socialist Party-Andalucian Party (Partido Socialista Andalucista-Partido Andalucista [PSA-PA])"
[4] "P26-Andalusist Party (Partido Andalucista [PA])"
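[5] "P217-Andecha Astur (Andecha Astur [AA])"
The \s+(?=P\d+-) pattern matches one or more whitespace characters that are immediately followed by P, one or more digits, and -. The P<numbers>- part is not consumed, because it sits inside a positive lookahead, which is a zero-width assertion. Because of the lookahead, the perl=TRUE argument is required so that the regex is handled by the PCRE engine.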
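To see why the lookahead matters, and how the data from the question's edit (with country headers such as BOLIVIA) might be handled, here is a minimal sketch. The first part compares a consuming pattern with the lookahead pattern on a few lines from the question. The second part swaps in a different, line-based technique instead of strsplit: each line is flagged as the start of a record if it begins with P<digits>- or looks like a header, and continuation lines are pasted onto the record before them. The variable names (parties, raw, starts_record and so on) are made up for illustration, and the header test assumes that headers consist only of ASCII capital letters and spaces, which may not hold for the full file.

```r
# Part 1: consuming pattern vs. lookahead pattern -----------------------
# Sample lines copied from the question (assumed to be already read in,
# for example with readr::read_lines()).
parties <- c(
  "P24-Andalusian Socialist Movement (Movimiento Socialista Andaluz [MSA])",
  "P235-Andalusian Socialist Party-Andalucian Party (Partido Socialista Andalucista-Partido",
  "Andalucista [PSA-PA])",
  "P26-Andalusist Party (Partido Andalucista [PA])"
)
joined <- paste(parties, collapse = " ")

# Consuming pattern: "P<digits>-" is part of the match, so strsplit()
# removes it from every piece after the first.
consumed <- strsplit(joined, "\\s+P\\d+-", perl = TRUE)[[1]]

# Lookahead pattern: only the whitespace is matched, so the prefix is kept.
kept <- strsplit(joined, "\\s+(?=P\\d+-)", perl = TRUE)[[1]]

# Part 2: line-based grouping for data with country headers -------------
raw <- c(
  "BOLIVIA",
  "P17-Nationalist Revolutionary Movement-Free Bolivia Movement (Movimiento",
  "Nacionalista Revolucionario [MNR])",
  "P19-Liberty and Justice (Libertad y Justicia [LJ])",
  "COLOMBIA",
  "P1-Democratic Aliance M-19 (Alianza Democratica M-19 [AD-M19])"
)

# A line starts a new record if it begins with "P<digits>-" or if it is
# a header (assumption: only ASCII capital letters and spaces).
starts_record <- grepl("^P\\d+-", raw, perl = TRUE) |
  grepl("^[A-Z ]+$", raw, perl = TRUE)

# cumsum() gives every record its own id; continuation lines inherit the
# id of the record that precedes them.
record_id <- cumsum(starts_record)

# Paste the lines of each record back together with a single space.
merged <- unname(vapply(split(raw, record_id), paste, character(1), collapse = " "))

print(kept)
print(merged)
```

The line-based version never collapses the whole file into one string, so a header line cannot be glued to the end of the preceding party. If the real headers contain accented letters or punctuation, the header regex would need adjusting.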
https://stackoverflow.com/questions/58993208