I have a data frame as follows:
source <- data.frame("name" = c('name1', 'name2', 'name3', 'name4'),
"section" = c('section1', 'section2', 'section3', 'section4'),
"values" = c("Type of information:experimental study\nReliability:1 (reliable without restriction)\n\n\nTest guideline, Qualifier:according to\n\n\nGLP compliance:yes\n\n\nEffect concentrations, Effect conc.:0.01 mg/L\n\nEffect concentrations, Effect conc.:0.01 mg/L\n\n\n",
"Type of information:experimental study\nReliability:2 (reliable with restrictions)\n\n\nTest guideline, Qualifier:according to\n\nTest guideline, Qualifier:according to\n\n\nGLP compliance:yes\n\n\nEffect concentrations, Effect conc.:0.002 mg/L\n\nEffect concentrations, Effect conc.:0.003 mg/L\n\nEffect concentrations, Effect conc.:0.002 mg/L\n\nEffect concentrations, Effect conc.:0.005 mg/L\n\n\n",
"Type of information:experimental study\nReliability:2 (reliable with restrictions)\n\n\nTest guideline, Qualifier:according to\n\nTest guideline, Qualifier:according to\n\nTest guideline, Qualifier:according to\n\n\nGLP compliance:yes Good laboratory practice compliance statement of July 11, 2014\n\n\nEffect concentrations, Dose descriptor:NOEC\nEffect concentrations, Effect conc.:9 µg/L\n\nEffect concentrations, Dose descriptor:NOEC\nEffect concentrations, Effect conc.:9 µg/L\n\nEffect concentrations, Dose descriptor:NOEC\nEffect concentrations, Effect conc.:9 µg/L\n\nEffect concentrations, Dose descriptor:NOEC\nEffect concentrations, Effect conc.:9 µg/L\n\nEffect concentrations, Dose descriptor:NOEC\nEffect concentrations, Effect conc.:9 µg/L\n\nEffect concentrations, Dose descriptor:NOEC\nEffect concentrations, Effect conc.:9 µg/L\n\nEffect concentrations, Dose descriptor:NOEC\nEffect concentrations, Effect conc.:9 µg/L\n\nEffect concentrations, Dose descriptor:NOEC\nEffect concentrations, Effect conc.:9 µg/L\n\nEffect concentrations, Dose descriptor:NOEC\nEffect concentrations, Effect conc.:9 µg/L\n\nEffect concentrations, Dose descriptor:NOEC\nEffect concentrations, Effect conc.:9 µg/L\n\nEffect concentrations, Dose descriptor:NOEC\nEffect concentrations, Effect conc.:9 µg/L\n\nEffect concentrations, Dose descriptor:NOEC\nEffect concentrations, Effect conc.:9 µg/L\n\nEffect concentrations, Dose descriptor:NOEC\nEffect concentrations, Effect conc.:9 µg/L\n\n\n",
"Type of information:experimental study\nReliability:2 (reliable with restrictions)\n\n\nTest guideline, Qualifier:according to\n\n\nGLP compliance:yes\n\n\nEffect concentrations, Effect conc.:ca. 0.007 mg/L\n\n\n"
))
My desired result is:
source_1 <- data.frame("name" = c('name1', 'name2', 'name3', 'name4'),
"section" = c('section1', 'section2', 'section3', 'section4'),
"key1" = c('value1'),
"key2" = c('value2'),
"key3" = c(NA, NA, 'value3', NA),
"key4" = c(NA, 'value4', NA, 'value4'),
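"key5" = c(NA, NA, 'value5', 'value5'))
I was able to get a list of keys and a list of values, but I don't know how to turn the keys into column names and assign the values to them. I would really appreciate your help.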
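Posted on 2020-12-03 04:19:45
Here is one way to do this using the tidyr library:
We first get the data into long format by splitting on the newline character ('\n'), then split each line on the colon (:) into two columns, and finally reshape the data into wide format.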
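To make the three steps concrete, here is a minimal sketch that runs them one at a time on a toy input. The `toy` data frame and the intermediate names `long` and `wide` are made up for illustration and are not part of the original question or answer; the `sep = ":"` argument is spelled out here as an assumption about the input format.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy input: each row holds several "key:value" lines in one string.
toy <- data.frame(
  name   = c("name1", "name2"),
  values = c("key1:value1\nkey2:value2", "key1:value1\nkey3:value3"),
  stringsAsFactors = FALSE
)

# Step 1: one row per "key:value" line (long format).
long <- separate_rows(toy, values, sep = "\n")

# Step 2: split each line at the colon into a key column and a value column.
long <- separate(long, values, into = c("key", "value"), sep = ":")

# Step 3: spread the keys into columns; a key that a row lacks becomes NA.
wide <- pivot_wider(long, names_from = key, values_from = value)

print(wide)
```

Each `name` ends up as a single row, and the new columns appear in the order in which the keys are first seen. The pipeline that follows does the same thing in one chain on `source`.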
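library(dplyr)
library(tidyr)
source %>%
  # long format: one row per line of `values`
  separate_rows(values, sep = '\n') %>%
  # split each line into key and value (default separator: any run of non-alphanumeric characters)
  separate(values, c('key', 'value')) %>%
  # wide format: one column per key
  pivot_wider(names_from = key, values_from = value)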
#   name  section  key1   key2   key4   key3   key5
#   <chr> <chr>    <chr>  <chr>  <chr>  <chr>  <chr>
# 1 name1 section1 value1 value2 NA     NA     NA
# 2 name2 section2 value1 value2 value4 NA     NA
# 3 name3 section3 value1 value2 NA     value3 value5
# 4 name4 section4 value1 value2 value4 value3 value5
Your original dataset needs some cleaning first: the strings contain runs of blank lines, and the same key can occur several times within one row.
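source %>%
  # split on runs of newlines so that blank lines do not become separate rows
  separate_rows(values, sep = '\n+') %>%
  # drop any empty strings left over by the split
  filter(values != '') %>%
  separate(values, c('key', 'value'), sep = ':') %>%
  group_by(name, section, key) %>%
  # collapse repeated keys into one comma-separated string of unique values
  summarise(value = toString(unique(value))) %>%
  pivot_wider(names_from = key, values_from = value)
https://stackoverflow.com/questions/65119827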