Need help calculating rolling sums by group in R

Stack Overflow user
Asked 2019-04-10 03:42:31
1 answer, 43 views, 0 followers, score 1

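I have a data table with 4 columns: ID, Cat, Date and Val. I want to add 2 new columns to this table containing the 3-month and 12-month rolling sums of Val, grouped by ID and Cat. Ideally I'd like a data.table solution, because my dataset is very large. Also, a given ID/Cat group may have no data for some months, and those months should simply be skipped in the calculation.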
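Read against the expected output further down, both new columns are calendar-month window sums. One way to state this, using a month index $m(d)$ and group label $g$ that are notation introduced here (not from the question), for row $i$ with date $d_i$ and window length $k \in \{3, 12\}$:

```latex
m(d) = 12\,\mathrm{year}(d) + \mathrm{month}(d)

\mathrm{cumsum}_k(i) = \sum_{\substack{j:\ g_j = g_i \\ m(d_i) - k < m(d_j) \le m(d_i)}} \mathrm{Val}_j,
\qquad g_j = (\mathrm{ID}_j, \mathrm{Cat}_j)
```

A month with no row for that ID/Cat group contributes nothing to the sum. For the PMM/MM row dated 2018-12-31 and k = 3 this picks out October to December 2018: -11 + 84 + 74 = 147, the first value of the expected cumsum3mo column.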

The following code creates a sample data set to work with.

sample_data <- as.data.frame(matrix(nrow = 34, ncol = 4))

colnames(sample_data) <- c("ID", "Cat", "Date", "Val")

sample_data$ID <- c("PMM", "PMM", "PMM", "PMM", "PMM", 
                    "PMM", "PMM", "PMM", "PMM", "PMM", 
                    "PMM", "PMM", "PMM", "LCG", "LCG", 
                    "LCG", "LCG", "LCG", "LCG", "LCG", 
                    "LCG", "LCG", "LCG", "LCG", "LCG", 
                    "LCG", "LCG", "LCG", "LCG", "LCG", 
                    "LCG", "LCG", "LCG", "LCG")

sample_data$Cat <- c("MM", "MM", "MM", "MM", "MM",
                     "MM", "MM", "MM", "MM", "MM",
                     "MM", "MM", "MM", "OE", "OE",
                     "OE", "OE", "OE", "OE", "OE",
                     "OE", "OE", "OE", "OE", "OE",
                     "OE", "OE", "OE", "OE", "OE",
                     "OE", "OE", "OE", "OE")

sample_data$Date <- c("2018-12-31", "2018-11-30", "2018-10-31", "2018-09-30", "2018-08-31",
                      "2018-07-31", "2018-06-30", "2018-05-31", "2018-04-30", "2018-03-31",
                      "2018-02-28", "2018-01-31", "2017-12-31", "2018-12-31", "2018-11-30",
                      "2018-10-31", "2018-09-30", "2018-08-31", "2018-07-31", "2018-06-30",
                      "2018-05-31", "2018-04-30", "2018-03-31", "2018-02-28", "2018-01-31",
                      "2017-12-31", "2017-11-30", "2017-10-31", "2017-09-30", "2017-08-31",
                      "2017-07-31", "2017-06-30", "2017-05-31", "2017-04-30")

sample_data$Val <- c(-11, 84, 74, 80, -9, 
                     -40, -76, -47, -50, -50, 
                     97, 42, 44, 53, 1, 
                     13, 65, 52, -5, 75, 
                     -41, -6, 8, -79, 53, 
                     22, -100, -57, -89, 28, 
                     37, -24, 17, -53)

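I've searched through various Stack Overflow solutions but couldn't get any of them to work. Most of them group on a single column rather than several, and compute a single rolling sum rather than multiple ones.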

This is what I started with, but I haven't managed to implement it correctly.

sample_data <- sample_data[, cumsum3mo := sample_data[.(ID, Cat, Date, (Date - day(Date)+1) %m-% months(2) - 1), 
                       on = .(ID = V1, Cat = V2, Date <= V3, Date > V4), 
                       sum(Val), by = .EACHI]][]
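For reference, a minimal sketch of how this non-equi join attempt could be completed. It assumes the lubridate package is installed (the attempt's `%m-%` and `months()` come from lubridate) and keeps the attempt's window bounds, which already describe the right 3-month window. The changes are: converting `sample_data` to a data.table and `Date` to the Date class first, and extracting the summed column from the join result instead of assigning the whole joined table. The join columns get explicit names (`id`, `cat`, `end`, `start`, chosen here for readability) instead of the default V1 to V4, and the sample data is rebuilt compactly with the same values so the block runs on its own. Because each column is computed in one join rather than an R-level loop per row, this may scale better to a very large table, but that has not been benchmarked here.

```r
library(data.table)
library(lubridate)  # assumed installed: provides day(), %m-% and months()

# Compact rebuild of the question's sample_data (same IDs, Cats, month-end
# dates and values), with Date stored as character as in the question.
sample_data <- data.frame(
  ID   = rep(c("PMM", "LCG"), times = c(13, 21)),
  Cat  = rep(c("MM", "OE"), times = c(13, 21)),
  Date = format(c(seq(as.Date("2019-01-01"), by = "-1 month", length.out = 13),
                  seq(as.Date("2019-01-01"), by = "-1 month", length.out = 21)) - 1),
  Val  = c(-11, 84, 74, 80, -9, -40, -76, -47, -50, -50, 97, 42, 44,
           53, 1, 13, 65, 52, -5, 75, -41, -6, 8, -79, 53, 22, -100,
           -57, -89, 28, 37, -24, 17, -53),
  stringsAsFactors = FALSE
)

setDT(sample_data)                    # := only works on a data.table
sample_data[, Date := as.Date(Date)]  # character dates cannot be used in date arithmetic

# For every row, join the table back onto itself and sum Val over rows of the
# same ID/Cat whose Date lies in (start, end]. (Date - day(Date) + 1) is the
# first day of the current month; going back 2 months and subtracting one day
# gives the last day of the month just before the 3-month window.
sample_data[, cumsum3mo := sample_data[
  .(id = ID, cat = Cat, end = Date,
    start = (Date - day(Date) + 1) %m-% months(2) - 1),
  on = .(ID = id, Cat = cat, Date <= end, Date > start),
  .(s = sum(Val)), by = .EACHI]$s]    # keep only the sum column, one value per row

# Same join for the 12-month window: go back 11 months from the first of the month.
sample_data[, cumsum12mo := sample_data[
  .(id = ID, cat = Cat, end = Date,
    start = (Date - day(Date) + 1) %m-% months(11) - 1),
  on = .(ID = id, Cat = cat, Date <= end, Date > start),
  .(s = sum(Val)), by = .EACHI]$s]
```

With `by = .EACHI` the join returns one row per row of the lookup table, in the original row order, which is why the extracted `s` column can be assigned straight back with `:=`.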

This is the expected output:

sample_data <- as.data.frame(matrix(nrow = 34, ncol = 6))

colnames(sample_data) <- c("ID", "Cat", "Date", "Val", "cumsum3mo", "cumsum12mo")

sample_data$ID <- c("PMM", "PMM", "PMM", "PMM", "PMM", 
                    "PMM", "PMM", "PMM", "PMM", "PMM", 
                    "PMM", "PMM", "PMM", "LCG", "LCG", 
                    "LCG", "LCG", "LCG", "LCG", "LCG", 
                    "LCG", "LCG", "LCG", "LCG", "LCG", 
                    "LCG", "LCG", "LCG", "LCG", "LCG", 
                    "LCG", "LCG", "LCG", "LCG")

sample_data$Cat <- c("MM", "MM", "MM", "MM", "MM",
                     "MM", "MM", "MM", "MM", "MM",
                     "MM", "MM", "MM", "OE", "OE",
                     "OE", "OE", "OE", "OE", "OE",
                     "OE", "OE", "OE", "OE", "OE",
                     "OE", "OE", "OE", "OE", "OE",
                     "OE", "OE", "OE", "OE")

sample_data$Date <- c("2018-12-31", "2018-11-30", "2018-10-31", "2018-09-30", "2018-08-31", 
                      "2018-07-31", "2018-06-30", "2018-05-31", "2018-04-30", "2018-03-31", 
                      "2018-02-28", "2018-01-31", "2017-12-31", "2018-12-31", "2018-11-30", 
                      "2018-10-31", "2018-09-30", "2018-08-31", "2018-07-31", "2018-06-30", 
                      "2018-05-31", "2018-04-30", "2018-03-31", "2018-02-28", "2018-01-31", 
                      "2017-12-31", "2017-11-30", "2017-10-31", "2017-09-30", "2017-08-31", 
                      "2017-07-31", "2017-06-30", "2017-05-31", "2017-04-30")

sample_data$Val <- c(-11, 84, 74, 80, -9, 
                     -40, -76, -47, -50, -50, 
                     97, 42, 44, 53, 1, 
                     13, 65, 52, -5, 75, 
                     -41, -6, 8, -79, 53, 
                     22, -100, -57, -89, 28, 
                     37, -24, 17, -53)

sample_data$cumsum3mo <- c(147, 238, 145, 31, -125,
                           -163, -173, -147, -3, 89,
                           183, 86, 44, 67, 79,
                           130, 112, 122, 29, 28,
                           -39, -77, -18, -4, -25,
                           -135, -246, -118, -24, 41,
                           30, -60, -36, -53)

sample_data$cumsum12mo <- c(94, 149, 65, -9, -89,
                            -80, -40, 36, 83, 133,
                            183, 86, 44, 189, 158,
                            57, -13, -167, -191, -149,
                            -248, -190, -237, -245, -166,
                            -219, -241, -141, -84, 5,
                            -23, -60, -36, -53)

1 Answer

Stack Overflow user

Accepted answer

Posted 2019-04-10 08:27:55

Assuming your data is dated at month-end, this should get you started:

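library(data.table)
setDT(sample_data)
sample_data[, Date := as.Date(Date, format = "%Y-%m-%d")]
# For each month-end date d in the ID/Cat group, sum Val over the group's rows dated
# from the first day of the window (d + 1 stepped back 3 or 12 months) up to d.
sample_data[, c("cumsum3mo", "cumsum12mo") := .(
        sapply(Date, function(d) sum(Val[between(Date, seq(d + 1L, by = "-3 month", length.out = 2L)[2L], d)])),
        sapply(Date, function(d) sum(Val[between(Date, seq(d + 1L, by = "-12 months", length.out = 2L)[2L], d)]))
    ),
    by = .(ID, Cat)]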
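The window bounds above rely on the month-end assumption: for a month-end date `d`, `d + 1` is the first day of the next month, and stepping back 3 (or 12) months from there lands on the first day of the window, so the inclusive range up to `d` covers exactly 3 (or 12) calendar months. A small base-R check of that arithmetic on the PMM/MM rows follows; `window_start` and `in_window` are hypothetical helper names, and plain `>=`/`<=` comparisons stand in for `data.table::between()`, which is inclusive on both ends by default.

```r
# Hypothetical helper reproducing the answer's lower bound:
# seq(d + 1, by = "-k month", length.out = 2)[2] is the first day of the month
# that lies k months before the month following d.
window_start <- function(d, k) {
  seq(d + 1L, by = paste0("-", k, " month"), length.out = 2L)[2L]
}

d <- as.Date("2018-12-31")
window_start(d, 3)    # 2018-10-01, so the window is Oct to Dec 2018
window_start(d, 12)   # 2018-01-01, so the window is Jan to Dec 2018

# PMM/MM month-end dates (2018-12-31 back to 2017-12-31) and values from the question.
dates <- seq(as.Date("2019-01-01"), by = "-1 month", length.out = 13) - 1
vals  <- c(-11, 84, 74, 80, -9, -40, -76, -47, -50, -50, 97, 42, 44)

# Inclusive range check, matching between(Date, start, d) in the answer.
in_window <- function(d, k) dates >= window_start(d, k) & dates <= d

sum(vals[in_window(d, 3)])    # 147
sum(vals[in_window(d, 12)])   # 94
```

Both sums match the first row of the expected cumsum3mo and cumsum12mo columns. With dates that are not at month-end, `d + 1` would not be the first of a month and the window would no longer line up with calendar months.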
Score: 0
Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud Xiaowei's IT-domain engine.
Original link:

https://stackoverflow.com/questions/55600184
