I have a file that contains a header row and data.
zcat majorfile.gz | head -n 3 | cut -d ' ' -f1-10
marker alleleA alleleB FINCH_WB_633_splitMerged FINCH_WB_633_splitMerged FINCH_WB_633_splitMerged FINCH_WB_C049985_splitMerged FINCH_WB_C049985_splitMerged FINCH_WB_C049985_splitMerged FINCH_WB_C071898_splitMerged
LR761571.1_34273 G C 0.9955 0.0045 0 0.9996 0.0004 0 1
LR761571.1_34285 G A 0.9934 0.0066 0 0.9999 0.0001 0 0.9435
I want to subset this file by column name:
cat header.subset.txt | head
marker
alleleA
alleleB
FINCH_WB_633_splitMerged
FINCH_WB_ES1B002_splitMerged
FINCH_WB_JH1417_splitMerged
FINCH_WB_JH1452_splitMerged
FINCH_WB_JH1495_splitMerged
FINCH_WB_JP000_splitMerged
FINCH_WB_JP004_splitMerged
I have multiple "header.subset.txt" files, so I loop over them:
for file1 in header.subset.txt
do
awk 'NR==FNR{a[$1]++;next} {if(FNR==1){for(i=1;i<=NF;i++){if(a[$i]){printf $i" ";b[i]=$i}}}else{printf "\n";for(j=1;j<=NF;j++){if(b[j]) {printf $j" "}}}}END {printf "\n"}' \
$file1 \
majorfile.gz > majorfile_sub.gz
done
The awk command works for files with tab-delimited fields, but not for space-delimited ones (as in this example).
In this example, it would give:
marker alleleA alleleB FINCH_WB_633_splitMerged FINCH_WB_633_splitMerged FINCH_WB_633_splitMerged
LR761571.1_34273 G C 0.9955 0.0045 0
LR761571.1_34285 G A 0.9934 0.0066 0
Edit: here is the awk code as formatted by gawk -o-, to make it easier to read (though it obviously still lacks meaningful variable names):
NR == FNR {
    a[$1]++
    next
}
{
    if (FNR == 1) {
        for (i = 1; i <= NF; i++) {
            if (a[$i]) {
                printf $i " "
                b[i] = $i
            }
        }
    } else {
        printf "\n"
        for (j = 1; j <= NF; j++) {
            if (b[j]) {
                printf $j " "
            }
        }
    }
}
END {
    printf "\n"
}
Posted on 2022-11-15 20:47:08
A variation on the OP's current code:
awk '
#BEGIN { FS=OFS="\t" }   # uncomment if input/output fields are tab delimited
FNR==NR { headers[$1]; next }
        { sep=""
          for (i=1; i<=NF; i++) {
              if (FNR==1 && ($i in headers)) {
                  fldids[i]
              }
              if (i in fldids) {
                  printf "%s%s",sep,$i
                  sep=OFS   # if not set elsewhere (eg, in a BEGIN{}block) then default OFS == <space>
              }
          }
          print ""
        }
' header.subset.txt <(zcat majorfile.gz)
This produces:
marker alleleA alleleB FINCH_WB_633_splitMerged FINCH_WB_633_splitMerged FINCH_WB_633_splitMerged
LR761571.1_34273 G C 0.9955 0.0045 0
LR761571.1_34285 G A 0.9934 0.0066 0
https://stackoverflow.com/questions/74451358
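Since the question loops over several header lists, the same awk logic can be wrapped in a loop. The following is only a sketch under stated assumptions: the sample data, the `header.subset*.txt` naming pattern, and the `majorfile_<list>.gz` output names are invented for illustration, and `gzip -dc` is used as a portable stand-in for `zcat`. Unlike the loop in the question, it decompresses the input before awk reads it, recompresses each result, and writes one output file per header list instead of overwriting a single file.

```shell
set -eu

# Hypothetical scratch directory and sample data standing in for the real files.
workdir=$(mktemp -d)
cd "$workdir"

printf '%s\n' \
    'marker alleleA alleleB S1 S1 S1 S2 S2 S2' \
    'm1 G C 0.9 0.1 0 0.8 0.2 0' \
    'm2 G A 0.7 0.3 0 0.6 0.4 0' | gzip > majorfile.gz

# Two hypothetical header lists, one column name per line.
printf '%s\n' marker alleleA alleleB S1 > header.subset1.txt
printf '%s\n' marker S2 > header.subset2.txt

for headers in header.subset*.txt; do
    # Derive a distinct output name per header list (assumed naming scheme).
    out="majorfile_${headers%.txt}.gz"

    gzip -dc majorfile.gz |
    awk '
        # First input (the header list): remember the wanted column names.
        FNR==NR { wanted[$1]; next }
        # Header row of the data: record the field numbers whose names match.
        FNR==1  { for (i=1; i<=NF; i++) if ($i in wanted) keep[i] }
        # Every data row, header included: print only the recorded fields.
        { sep=""
          for (i=1; i<=NF; i++)
              if (i in keep) { printf "%s%s", sep, $i; sep=OFS }
          print ""
        }
    ' "$headers" - | gzip > "$out"
done
```

Here `-` tells awk to read the decompressed data from standard input as its second file, which avoids the bash-only `<( ... )` process substitution. If the real files are tab-delimited, a `BEGIN { FS=OFS="\t" }` block can be added as in the answer above.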