I have several data frames, each containing a list of gene names with no header. Each file looks roughly like this:
Table 1
SCA-6_Chr1v1_00001
SCA-6_Chr1v1_00002
SCA-6_Chr1v1_00003
SCA-6_Chr1v1_00004
SCA-6_Chr1v1_00005
SCA-6_Chr1v1_00006
SCA-6_Chr1v1_00009
SCA-6_Chr1v1_00010
SCA-6_Chr1v1_00014
SCA-6_Chr1v1_00015
SCA-6_Chr1v1_00017
Table 2
SCA-6_Chr1v1_00001
SCA-6_Chr1v1_00002
SCA-6_Chr1v1_00003
SCA-6_Chr1v1_00007
SCA-6_Chr1v1_20005
SCA-6_Chr1v1_00006
SCA-6_Chr1v1_00009
SCA-6_Chr1v1_00200
SCA-6_Chr1v1_00014
SCA-6_Chr1v1_10075
SCA-6_Chr1v1_00100
Each of these data frames is written to a separate .txt file, and I have read them all into a list like this:
temp = list.files(pattern = "*.txt")
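myfiles = lapply(temp, FUN=read.table, header=FALSE)
Using the myfiles list, I want to compare all of the data frames against one another and, for each file, find the values that occur only in that file and in none of the other list items. I then want to return them in a list, where each data frame in the new list contains only the values not found in any of the others (I assume I can use an lapply function for this). I tried running the following code, but it did not remove the shared values: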
unique.genes = lapply(1:length(myfiles), function(n) setdiff(myfiles[[n]], unlist(myfiles[-n])))
Any help would be greatly appreciated.
Posted on 2022-07-28 19:29:54
Here is one approach. First, provide reproducible data:
set.seed(42)
myfiles <- replicate(2, sample(LETTERS, 25, replace=TRUE), simplify=FALSE)
myfiles
# [[1]]
# [1] "Q" "E" "A" "Y" "J" "D" "R" "Z" "Q" "O" "X" "G" "D" "Y" "E" "N" "T" "Z" "R" "O" "C" "I" "Y" "D" "E"
#
# [[2]]
# [1] "M" "E" "T" "B" "H" "C" "Z" "A" "J" "X" "K" "O" "V" "Z" "H" "D" "D" "V" "R" "M" "E" "D" "B" "X" "R"
Now find the unique values:
result <- lapply(myfiles, unique)
result
# [[1]]
# [1] "Q" "E" "A" "Y" "J" "D" "R" "Z" "O" "X" "G" "N" "T" "C" "I"
#
# [[2]]
# [1] "M" "E" "T" "B" "H" "C" "Z" "A" "J" "X" "K" "O" "V" "D" "R"
Or, to make comparison easier, this sorts them:
result2 <- lapply(myfiles, function(x) sort(unique(x)))
https://stackoverflow.com/questions/73158071
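Note that unique() only removes duplicates within each vector; it does not drop values that also appear in other list elements. For the question's actual goal (values found in one file and in no other), the following is a minimal sketch rather than a definitive fix. It assumes each element of myfiles is a one-column data frame, as read.table(header = FALSE) would return. The original attempt probably failed because setdiff() was handed the whole data frame instead of the column of gene names. The toy gene IDs and the helper name genes are made up for illustration.

```r
# Toy stand-ins for the files read with read.table(header = FALSE).
# The gene IDs are illustrative only, not the asker's real data.
myfiles <- list(
  data.frame(V1 = c("g1", "g2", "g3", "g5")),
  data.frame(V1 = c("g1", "g2", "g4", "g6")),
  data.frame(V1 = c("g2", "g5", "g7"))
)

# Extract the first column of each data frame as a plain character vector
# (as.character also covers older R versions that read strings as factors).
genes <- lapply(myfiles, function(df) as.character(df[[1]]))

# For each file, keep only the values that occur in none of the other files.
unique.genes <- lapply(seq_along(genes), function(n) {
  setdiff(genes[[n]], unlist(genes[-n]))
})

unique.genes
# [[1]]
# [1] "g3"
#
# [[2]]
# [1] "g4" "g6"
#
# [[3]]
# [1] "g7"
```

Because setdiff() returns each value at most once, duplicates within a single file are also collapsed. If the file names should be kept, the result can be labelled with names(unique.genes) <- temp, assuming temp holds the names in the same order as myfiles.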