
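Is there a way to split keys and values out of a data frame column in R and append each key as a column name, with its value, for every row of the data frame?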
Asked by a Stack Overflow user on 2020-12-03 04:14:42

I have a data frame like the following:

source <- data.frame("name" = c('name1', 'name2', 'name3', 'name4'),
                 "section" = c('section1', 'section2', 'section3', 'section4'),
                 "values" = c("Type of information:experimental study\nReliability:1 (reliable without restriction)\n\n\nTest guideline, Qualifier:according to\n\n\nGLP compliance:yes\n\n\nEffect concentrations, Effect conc.:0.01 mg/L\n\nEffect concentrations, Effect conc.:0.01 mg/L\n\n\n",
                              "Type of information:experimental study\nReliability:2 (reliable with restrictions)\n\n\nTest guideline, Qualifier:according to\n\nTest guideline, Qualifier:according to\n\n\nGLP compliance:yes\n\n\nEffect concentrations, Effect conc.:0.002 mg/L\n\nEffect concentrations, Effect conc.:0.003 mg/L\n\nEffect concentrations, Effect conc.:0.002 mg/L\n\nEffect concentrations, Effect conc.:0.005 mg/L\n\n\n",
                              "Type of information:experimental study\nReliability:2 (reliable with restrictions)\n\n\nTest guideline, Qualifier:according to\n\nTest guideline, Qualifier:according to\n\nTest guideline, Qualifier:according to\n\n\nGLP compliance:yes Good laboratory practice compliance statement of July 11, 2014\n\n\nEffect concentrations, Dose descriptor:NOEC\nEffect concentrations, Effect conc.:9 µg/L\n\nEffect concentrations, Dose descriptor:NOEC\nEffect concentrations, Effect conc.:9 µg/L\n\nEffect concentrations, Dose descriptor:NOEC\nEffect concentrations, Effect conc.:9 µg/L\n\nEffect concentrations, Dose descriptor:NOEC\nEffect concentrations, Effect conc.:9 µg/L\n\nEffect concentrations, Dose descriptor:NOEC\nEffect concentrations, Effect conc.:9 µg/L\n\nEffect concentrations, Dose descriptor:NOEC\nEffect concentrations, Effect conc.:9 µg/L\n\nEffect concentrations, Dose descriptor:NOEC\nEffect concentrations, Effect conc.:9 µg/L\n\nEffect concentrations, Dose descriptor:NOEC\nEffect concentrations, Effect conc.:9 µg/L\n\nEffect concentrations, Dose descriptor:NOEC\nEffect concentrations, Effect conc.:9 µg/L\n\nEffect concentrations, Dose descriptor:NOEC\nEffect concentrations, Effect conc.:9 µg/L\n\nEffect concentrations, Dose descriptor:NOEC\nEffect concentrations, Effect conc.:9 µg/L\n\nEffect concentrations, Dose descriptor:NOEC\nEffect concentrations, Effect conc.:9 µg/L\n\nEffect concentrations, Dose descriptor:NOEC\nEffect concentrations, Effect conc.:9 µg/L\n\n\n",
                              "Type of information:experimental study\nReliability:2 (reliable with restrictions)\n\n\nTest guideline, Qualifier:according to\n\n\nGLP compliance:yes\n\n\nEffect concentrations, Effect conc.:ca. 0.007 mg/L\n\n\n"
                              ))

My ideal result is:

source_1 <- data.frame("name" = c('name1', 'name2', 'name3', 'name4'),
                     "section" = c('section1', 'section2', 'section3', 'section4'),
                     "key1" = c('value1'),
                     "key2" = c('value2'),
                     "key3" = c(NA, NA, 'value3', NA),
                     "key4" = c(NA, 'value4', NA, 'value4'),
                     "key5" = c(NA, NA, 'value5', 'value5'))

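I am able to get a list of keys and a list of values, but I don't know how to turn the keys into column names and assign the values to them. I would be very grateful for your help.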

1 Answer

Accepted answer, posted by a Stack Overflow user on 2020-12-03 04:19:45

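Here is one way to do this using the tidyr library:

We first bring the data into long format by splitting on the newline character ('\n'), then split each entry at the colon (:) into two columns, and finally reshape the data into wide format.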

library(dplyr)
library(tidyr)

source %>%
  separate_rows(values, sep = '\n') %>%
  separate(values, c('key', 'value')) %>%
  pivot_wider(names_from = key, values_from = value)

#  name  section  key1   key2   key4   key3   key5  
#  <chr> <chr>    <chr>  <chr>  <chr>  <chr>  <chr> 
#1 name1 section1 value1 value2 NA     NA     NA    
#2 name2 section2 value1 value2 value4 NA     NA    
#3 name3 section3 value1 value2 NA     value3 value5
#4 name4 section4 value1 value2 value4 value3 value5
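
To see the three steps in isolation, here is a minimal sketch on a small, hypothetical data frame (`toy` and its contents are made up for illustration and are not the asker's data). It assumes every line of `values` holds exactly one `key:value` pair, which is the shape the pipeline above expects.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: one "key:value" pair per line in `values`
toy <- data.frame(
  name = c("name1", "name2"),
  section = c("section1", "section2"),
  values = c("key1:value1\nkey2:value2", "key1:value1\nkey4:value4"),
  stringsAsFactors = FALSE
)

# Steps 1 and 2: one row per pair, then split each pair at the colon
long <- toy %>%
  separate_rows(values, sep = "\n") %>%
  separate(values, c("key", "value"), sep = ":")

# Step 3: keys become column names; keys missing for a row become NA
wide <- long %>%
  pivot_wider(names_from = key, values_from = value)

print(long)
print(wide)
```

Inspecting `long` before pivoting is a convenient way to check that each key and value landed in the right column.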

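Your original dataset needs some data cleaning first: runs of consecutive newlines leave empty entries, and some keys repeat within the same row, so the repeated values are collapsed into a single string before pivoting.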

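source %>%
  # split on runs of newlines so blank lines do not create extra rows
  separate_rows(values, sep = '\n+') %>%
  # drop the empty entry left by trailing newlines
  filter(values != '') %>%
  separate(values, c('key', 'value'), sep = ':') %>%
  # collapse repeated keys within a row into one comma-separated string
  group_by(name, section, key) %>%
  summarise(value = toString(unique(value)), .groups = 'drop') %>%
  pivot_wider(names_from = key, values_from = value)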
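
As a rough illustration of why the grouping step matters, the sketch below uses a hypothetical data frame (`toy`, not the asker's data) in which `key1` appears twice for `name1`. It first lists the repeated keys with `count()`, then collapses them the same way as the pipeline above. The `.groups = "drop"` argument assumes dplyr 1.0.0 or later.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: key1 occurs twice for name1, with blank lines in between
toy <- data.frame(
  name = c("name1", "name2"),
  section = c("section1", "section2"),
  values = c("key1:a\n\nkey1:b\n\n\nkey2:c\n", "key1:a\n\n"),
  stringsAsFactors = FALSE
)

long <- toy %>%
  separate_rows(values, sep = "\n+") %>%
  filter(values != "") %>%
  separate(values, c("key", "value"), sep = ":")

# Keys that occur more than once within one original row
dupes <- long %>%
  count(name, section, key) %>%
  filter(n > 1)

# Collapse the repeats into one string per key, then reshape to wide format
wide <- long %>%
  group_by(name, section, key) %>%
  summarise(value = toString(unique(value)), .groups = "drop") %>%
  pivot_wider(names_from = key, values_from = value)

print(dupes)
print(wide)
```

Without the collapsing step, `pivot_wider()` cannot place two values in one cell and falls back to list-columns with a warning, so checking `dupes` first shows which keys are affected.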
Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.
Original link:

https://stackoverflow.com/questions/65119827
