我有一个dataframe,其中的一列包含冒号和井号分隔的字符串。
data$col1
col1
1: 3#Tier_III_Uncertain EVS=[1, 0, 0, 1, 0, 0, 0, 0, 0, -1, 1, 1]
2: 3#Tier_III_Uncertain EVS=[0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0]
3: 4#Tier_III_Uncertain EVS=[0, 0, 0, 1, 0, 0, 0, 0, 2, 0, 1, 0]
4: 2#Tier_IV_benign EVS=[0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0]
5: 3#Tier_III_Uncertain EVS=[0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0]
6: 5#Tier_III_Uncertain EVS=[0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 1, 1]我想提取字符串的元素并将其拆分为不同的列。
col1 col2 col3 EVS1 ... EVS12
3#Tier_III_Uncertain EVS=[1, 0, 0, 1, 0, 0, 0, 0, 0, -1, 1, 1] 3 Tier_III_Uncertain 1 1
3#Tier_III_Uncertain EVS=[0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0] 3 Tier_III_Uncertain 0 0
4#Tier_III_Uncertain EVS=[0, 0, 0, 1, 0, 0, 0, 0, 2, 0, 1, 0] 4 Tier_III_Uncertain 0 0
2#Tier_IV_benign EVS=[0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0] 2 Tier_IV_benign 0 0
3#Tier_III_Uncertain EVS=[0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0] 3 Tier_III_Uncertain 0 0
5#Tier_III_Uncertain EVS=[0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 1, 1] 5 Tier_III_Uncertain 0 1发布于 2021-07-20 00:59:01
read.table(text=gsub("[^A-Za-z_0-9-]", " ", data$col1),
col.names = c(paste0('col', 2:4), paste0('EVS', 1:12)))[-3]
col2 col3 EVS1 EVS2 EVS3 EVS4 EVS5 EVS6 EVS7 EVS8 EVS9 EVS10 EVS11 EVS12
1 3 Tier_III_Uncertain 1 0 0 1 0 0 0 0 0 -1 1 1
2 3 Tier_III_Uncertain 0 0 0 1 0 0 0 0 0 1 1 0
3 4 Tier_III_Uncertain 0 0 0 1 0 0 0 0 2 0 1 0
4 2 Tier_IV_benign 0 0 0 1 0 0 0 0 0 0 1 0
5 3 Tier_III_Uncertain 0 0 0 1 0 0 0 0 1 0 1 0
6 5 Tier_III_Uncertain 0 0 1 1 0 0 0 0 1 0 1 1发布于 2021-07-20 01:11:40
假设在结尾处的注释中可重复显示DT,则用空格替换非单词字符和EVS=。然后使用fread读取并设置名称。最后,cbind DT到它。
DT2 <- fread(text = gsub("EVS=|\\W", " ", DT$col1))
names(DT2) <- c("col2", "col3", paste0("EVS", 1:(ncol(DT2)-2)))
cbind(DT, DT2)备注
library(data.table)
L <- "3#Tier_III_Uncertain EVS=[1, 0, 0, 1, 0, 0, 0, 0, 0, -1, 1, 1]
3#Tier_III_Uncertain EVS=[0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0]
4#Tier_III_Uncertain EVS=[0, 0, 0, 1, 0, 0, 0, 0, 2, 0, 1, 0]
2#Tier_IV_benign EVS=[0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0]
3#Tier_III_Uncertain EVS=[0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0]
5#Tier_III_Uncertain EVS=[0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 1, 1]"
DT <- data.table(col1 = trimws(readLines(textConnection(L))))https://stackoverflow.com/questions/68444211
复制相似问题