I have the following code, written with data.table:
library(data.table)
dat <- structure(list(barcodes = c("scA22_CAACAGCAACAG", "scA22_CAACAGCAACAG",
"scA22_CAACAGCAACAG", "scA22_CAACAGCAACAG", "scA22_CAACAGCAACAG",
"scA22_CAACAGCAACAG", "scA22_CAACAGCAACAG", "scA22_TTTTTTTTTTTT"
), gene_name = c("A930037H05Rik", "A930037H05Rik", "A930037H05Rik",
"A930037H05Rik", "Lgals8", "Lgals8", "Lgals8", "Lgals8"), tsse = c(0.152777777777778,
0.152777777777778, 0.152777777777778, 0.00192307692307692, 0.055,
0.0485294117647059, 0.033, 0.0294642857142857)), na.action = structure(integer(0), .Names = character(0)), row.names = c(NA,
8L), class = "data.frame")
setDT(dat)
dat
It produces this output:
barcodes gene_name tsse
1: scA22_CAACAGCAACAG A930037H05Rik 0.152777778
2: scA22_CAACAGCAACAG A930037H05Rik 0.152777778
3: scA22_CAACAGCAACAG A930037H05Rik 0.152777778
4: scA22_CAACAGCAACAG A930037H05Rik 0.001923077
5: scA22_CAACAGCAACAG Lgals8 0.055000000
6: scA22_CAACAGCAACAG Lgals8 0.048529412
7: scA22_CAACAGCAACAG Lgals8 0.033000000
8: scA22_TTTTTTTTTTTT Lgals8 0.029464286
What I want to do is group by c("barcodes", "gene_name") and then, within each group, select the row with the highest tsse.
The desired result is:
barcodes gene_name tsse
1: scA22_CAACAGCAACAG A930037H05Rik 0.152777778
2: scA22_CAACAGCAACAG Lgals8 0.055000000
3: scA22_TTTTTTTTTTTT Lgals8 0.029464286
How can I do this with data.table? My real data has about 300 million rows, so I need data.table's speed.
Answered 2020-04-09 10:41:59
We can use which.max:
library(data.table)
setDT(dat)[, .SD[which.max(tsse)], .(barcodes, gene_name)]
# barcodes gene_name tsse
#1: scA22_CAACAGCAACAG A930037H05Rik 0.1528
#2: scA22_CAACAGCAACAG Lgals8 0.0550
#3: scA22_TTTTTTTTTTTT Lgals8 0.0295
Answered 2020-04-09 11:32:10
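A commonly cited variant of the which.max approach collects row numbers with .I and subsets the table once, instead of building .SD for every group. The sketch below uses a small made-up table (`toy`, with hypothetical values that are not the asker's data); whether it is actually faster on 300 million rows would need to be measured.

```r
library(data.table)

# Small illustrative table (hypothetical values, not the asker's data)
toy <- data.table(
  barcodes  = c("bc1", "bc1", "bc1", "bc2"),
  gene_name = c("g1", "g1", "g2", "g2"),
  tsse      = c(0.10, 0.30, 0.05, 0.02)
)

# .I holds row numbers in the full table; which.max picks one per group
idx <- toy[, .I[which.max(tsse)], by = .(barcodes, gene_name)]$V1

# Subset the original table once with the collected row numbers
res <- toy[idx]
print(res)
```

Because the final step is a single subset by row number, all columns of the original table are carried along without being copied group by group.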
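Alternatively, set a key (which sorts by barcodes, gene_name and tsse in ascending order) and pick the last row of each group: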
setkey(dat, barcodes, gene_name, tsse)
dat[, tail(.SD, 1), .(barcodes, gene_name)]
#> barcodes gene_name tsse
#> 1: scA22_CAACAGCAACAG A930037H05Rik 0.15277778
#> 2: scA22_CAACAGCAACAG Lgals8 0.05500000
#> 3: scA22_TTTTTTTTTTTT Lgals8 0.02946429
Answered 2020-04-09 11:38:15
Here is another option:
setorder(dat, barcodes, gene_name, -tsse)
dat[c(TRUE, diff(rleid(barcodes, gene_name))>0L)]
I would be interested to see the timings on the actual dataset.
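One rough way to compare the three approaches from this thread is to time them on simulated data. The sketch below is only that: the table `sim`, its size `n`, and the numbers of distinct barcodes and genes are assumptions chosen so it runs quickly, and timings on the real 300-million-row data may rank differently.

```r
library(data.table)

set.seed(1)
n <- 1e5  # assumption: kept small for illustration; scale up to taste
sim <- data.table(
  barcodes  = sample(sprintf("bc%03d", 1:100), n, replace = TRUE),
  gene_name = sample(sprintf("gene%02d", 1:50), n, replace = TRUE),
  tsse      = runif(n)
)

# Approach 1: which.max within each group
t1 <- system.time(
  r1 <- sim[, .SD[which.max(tsse)], by = .(barcodes, gene_name)]
)

# Approach 2: key the table (ascending), then take the last row per group
d2 <- copy(sim)
t2 <- system.time({
  setkey(d2, barcodes, gene_name, tsse)
  r2 <- d2[, tail(.SD, 1), by = .(barcodes, gene_name)]
})

# Approach 3: sort with tsse descending, keep the first row of each run
d3 <- copy(sim)
t3 <- system.time({
  setorder(d3, barcodes, gene_name, -tsse)
  r3 <- d3[c(TRUE, diff(rleid(barcodes, gene_name)) > 0L)]
})

# Elapsed seconds for each approach
print(c(which_max   = t1[["elapsed"]],
        key_tail    = t2[["elapsed"]],
        order_rleid = t3[["elapsed"]]))

# Sanity check: after sorting by group, all three give the same maxima
setkey(r1, barcodes, gene_name)
setkey(r2, barcodes, gene_name)
setkey(r3, barcodes, gene_name)
stopifnot(isTRUE(all.equal(r1$tsse, r2$tsse)),
          isTRUE(all.equal(r1$tsse, r3$tsse)))
```

The copies (`d2`, `d3`) are there because setkey and setorder reorder a table by reference; without them the later approaches would start from already-sorted data and the comparison would be skewed.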
https://stackoverflow.com/questions/61113058