I read a .txt file into R as follows:
Election_Parties <- readr::read_lines("Election_Parties.txt")
Suppose the file contains the following text:
BOLIVIA
P17-Nationalist Revolutionary Movement-Free Bolivia Movement (Movimiento Nacionalista Revolucionario
P19-Liberty and Justice (Libertad y Justicia [LJ])
P20-Tupak Katari Revolutionary Movement (Movimiento Revolucionario Tupak Katari [MRTK])

COLOMBIA
P1-Democratic Aliance M-19 (Alianza Democratica M-19 [AD-M19])
P2-National Popular Alliance (Alianza Nacional Popular [ANAPO])
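P3-Indigenous Authorities of Colombia (Autoridades Indígenas de Colombia)
That is, a new country begins after every blank line. I want to convert this text file into a data frame in which the country names form one vector (column) and the party names form another.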
Desired output:
Bolivia P17-Nationalist Revolutionary Movement-Free Bolivia Movement (Movimiento Nacionalista
Bolivia P19-Liberty and Justice (Libertad y Justicia [LJ])
Bolivia P20-Tupak Katari Revolutionary Movement (Movimiento Revolucionario Tupak Katari [MRTK])
Colombia P1-Democratic Aliance M-19 (Alianza Democratica M-19 [AD-M19])
Colombia P2-National Popular Alliance (Alianza Nacional Popular [ANAPO])
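Colombia P3-Indigenous Authorities of Colombia (Autoridades Indígenas de Colombia)
If possible, I would like the solution to be based on the country headings.
Edit: I just realized that every new country starts with P1, so the solution could also be based on that.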
Posted on 2019-11-22 12:21:23
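If the separator is always "", then once the text is in a vector you can use the blank lines as boundaries: taking the cumulative sum of TXT == "" increases the counter by one at every blank line, which assigns each line to a group.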
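TXT <- readr::read_lines("Election_Parties.txt")
# add a separator in front of the first country
TXT <- c("", TXT)
# every blank line starts a new group
idx <- cumsum(TXT == "")
# if the separator lines are not exactly "" (odd newlines), use
# idx <- cumsum(!grepl("^[A-Z]", TXT)) instead
You can see that Bolivia falls into group 1 and Colombia into group 2: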
tibble::tibble(TXT,idx)
# A tibble: 10 x 2
   TXT                                                                       idx
   <chr>                                                                   <int>
 1 ""                                                                          1
 2 BOLIVIA                                                                     1
 3 "P17-Nationalist Revolutionary Movement-Free Bolivia Movement (Movimie…     1
 4 P19-Liberty and Justice (Libertad y Justicia [LJ])                          1
 5 P20-Tupak Katari Revolutionary Movement (Movimiento Revolucionario Tup…     1
 6 ""                                                                          2
 7 COLOMBIA                                                                    2
 8 P1-Democratic Aliance M-19 (Alianza Democratica M-19 [AD-M19])              2
 9 P2-National Popular Alliance (Alianza Nacional Popular [ANAPO])             2
10 P3-Indigenous Authorities of Colombia (Autoridades Indígenas de Colomb…     2
Then we simply apply a function to each group and build a data frame.
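# within each group, x[1] is the blank separator, x[2] the country name,
# and the remaining elements are that country's parties
func <- function(x) {
  data.frame(Country = x[2], Parties = x[3:length(x)])
}
# apply func to every group and stack the resulting data frames
do.call(rbind, by(TXT, idx, func))
https://stackoverflow.com/questions/58993738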
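Since the question also asks for something based on the country headings, here is a minimal base-R sketch of that variant. It is not part of the answer above. It assumes that every party line starts with "P", digits and a hyphen, that any other non-blank line is a country heading, and that the file begins with a heading. The inline vector txt is a shortened, hypothetical stand-in for the result of readr::read_lines("Election_Parties.txt").

```r
# Hypothetical sample standing in for readr::read_lines("Election_Parties.txt")
txt <- c(
  "BOLIVIA",
  "P19-Liberty and Justice (Libertad y Justicia [LJ])",
  "P20-Tupak Katari Revolutionary Movement (Movimiento Revolucionario Tupak Katari [MRTK])",
  "",
  "COLOMBIA",
  "P1-Democratic Aliance M-19 (Alianza Democratica M-19 [AD-M19])",
  "P2-National Popular Alliance (Alianza Nacional Popular [ANAPO])"
)

# Drop blank (or whitespace-only) lines; they are not needed in this variant
txt <- txt[nzchar(trimws(txt))]

# Assumption: party lines look like "P<digits>-..."; everything else is a heading
is_party <- grepl("^P[0-9]+-", txt)

# Each heading bumps the counter, so party lines inherit the latest heading
group <- cumsum(!is_party)
country <- txt[!is_party][group]

parties <- data.frame(
  Country = country[is_party],
  Parties = txt[is_party],
  stringsAsFactors = FALSE
)
print(parties)
```

This version does not rely on the blank lines at all, so it should tolerate missing or irregular separators, but it will misclassify a country whose name happens to match the party pattern.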