Given data like this:
df1 <- data.frame(stock = c("Google, Yahoo", "Google", "Yahoo, Google", "Amazon, Google", "Google, Amazon"), investor = c("Nathalie","George","Nathalie, George", "Melanie, George","Melanie"))
It is possible to get the frequency of each stock combination with:
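table(sapply(strsplit(as.character(df1$stock), ", "), function(x) toString(sort(x))))
How can I add a filter to these per-stock frequencies, based on a third column, so that the result shows each investor's preferred choices? Here is an example of the expected output: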
data.frame(investor = c("Nathalie", "George", "George", "George", "Melanie", "Malanie"), stock = c("Google, Yahoo", "Google", "Google, Yahoo", "Amazon, Google", "Amazon, Google", "Amazon"), frq = c(2,1,1,1,1,1))
  investor          stock frq
1 Nathalie  Google, Yahoo   2
2   George         Google   1
3   George  Google, Yahoo   1
4   George Amazon, Google   1
5  Melanie Amazon, Google   1
6  Malanie         Amazon   1
With one more column added:
df1 <- data.frame(stock = c("Google, Yahoo", "Google", "Yahoo, Google", "Amazon, Google", "Google, Amazon"), investor = c("Nathalie","George","Nathalie, George", "Melanie, George","Melanie"), year = c("2017", "2018", "2017", "2018", "2017"))

Posted on 2019-07-15 17:22:11
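After sorting the values as in the earlier step, instead of calling table directly, assign the result back to the stock column. We can then use tidyverse functions to split the 'investor' column into one row per investor and create a count column with add_count.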
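library(tidyverse)

# Normalize each stock combination by sorting its names alphabetically
df1$stock <- sapply(strsplit(as.character(df1$stock), ", "),
                    function(x) toString(sort(x)))

df1 %>%
  mutate_if(is.factor, as.character) %>%  # convert factor columns to character
  separate_rows(investor) %>%             # one row per individual investor
  add_count(stock, investor, year)        # append the group count as column n

https://stackoverflow.com/questions/57042943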
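
Note that add_count keeps every input row and only appends the count. If one row per investor, stock and year combination is wanted, closer to the expected output in the question, a possible variant is to use count instead. The sketch below is not part of the original answer; it assumes dplyr 0.8.0 or later (for the name argument), passes an explicit sep to separate_rows, and uses the result name res and the column name frq purely for illustration.

```r
library(dplyr)
library(tidyr)

# Sample data from the question, with the extra year column
df1 <- data.frame(
  stock = c("Google, Yahoo", "Google", "Yahoo, Google",
            "Amazon, Google", "Google, Amazon"),
  investor = c("Nathalie", "George", "Nathalie, George",
               "Melanie, George", "Melanie"),
  year = c("2017", "2018", "2017", "2018", "2017"),
  stringsAsFactors = FALSE
)

# Sort the names inside each combination so that "Yahoo, Google"
# and "Google, Yahoo" are treated as the same value
df1$stock <- sapply(strsplit(df1$stock, ", "),
                    function(x) toString(sort(x)))

res <- df1 %>%
  separate_rows(investor, sep = ", ") %>%     # one row per investor
  count(investor, stock, year, name = "frq")  # one row per group

print(res)
```

With the sample data this yields six groups, and only the Nathalie / "Google, Yahoo" / 2017 group has a count of 2.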