I have a text file containing text like this:
Somename of someone 1234 7894
Even some more name 2345 5343
Even more of the same 6572 6456
I am a customer 1324 7894
I am another customer 5612 3657
Also I am a customer and I am number Three 9631 7411
And I am number four and not the latest one in list 8529 9369
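And here I am 4567 9876
I need to create a CSV file from it, but the problem is that the name spans 12 columns, so I need to merge all of the first 12 columns into 1 column, so that the CSV file looks like this:
Somename of someone,123456,789456
This command gives me a file with the first 12 columns:
cut -d ' ' -f1-11 test | sed "s/[[:space:]]/\\ /g" | sed "s/\t/\\ /g" > test1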
Posted on 2019-05-13 21:48:20
With GNU sed, which supports \s/\S for whitespace/non-whitespace, and -E to enable EREs:
$ sed -E 's/\s+(\S+)\s+(\S+)$/,\1,\2/' file
Somename of someone,1234,7894
Even some more name,2345,5343
Even more of the same,6572,6456
I am a customer,1324,7894
I am another customer,5612,3657
Also I am a customer and I am number Three,9631,7411
And I am number four and not the latest one in list,8529,9369
And here I am,4567,9876
and the functional equivalent for any POSIX sed:
$ sed 's/[[:space:]]\{1,\}\([^[:space:]]\{1,\}\)[[:space:]]\{1,\}\([^[:space:]]\{1,\}\)$/,\1,\2/' file
Somename of someone,1234,7894
Even some more name,2345,5343
Even more of the same,6572,6456
I am a customer,1324,7894
I am another customer,5612,3657
Also I am a customer and I am number Three,9631,7411
And I am number four and not the latest one in list,8529,9369
And here I am,4567,9876
or with any awk:
$ awk -v OFS=',' '{x=$(NF-1) OFS $NF; sub(/([[:space:]]+[^[:space:]]+){2}$/,""); print $0, x}' file
Somename of someone,1234,7894
Even some more name,2345,5343
Even more of the same,6572,6456
I am a customer,1324,7894
I am another customer,5612,3657
Also I am a customer and I am number Three,9631,7411
And I am number four and not the latest one in list,8529,9369
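And here I am,4567,9876
Posted on 2019-05-13 20:35:36
If the separate columns that make up the name all belong to the same CSV column and should therefore stay as they are, why not just process the last two columns?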
$ sed 's/\t* *\([0-9]\+\)\t* *\([0-9]\+\)$/,\1,\2/' input_file
Somename of someone,123456,789456
Even some more name,234567,534312
Even more of the same,657212,645613
Posted on 2019-05-13 21:34:29
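If you don't mind switching to GNU AWK, you can do this:
gawk 'BEGIN {FIELDWIDTHS = "54 5 5"; OFS = ","} {print $1, $2, $3}' FILE
Further explanation:
You effectively have 3 columns of fixed-width data, hence FIELDWIDTHS = "54 5 5".
OFS = "," sets the output field separator to a comma.
Note that FIELDWIDTHS is a GNU AWK feature.
If you don't mind keeping the whitespace in the CSV, you're done.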
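The widths 54 5 5 only hold if the two numeric columns start at the same character position on every line, which is worth confirming on the real file. As a rough check, the sketch below prints the distinct start positions of the last two fields. The name field_starts is made up for illustration, and the demo input is generated with printf on the assumption of a 54/5/5 layout with left-aligned values; it is not the original file.

```shell
# field_starts is a hypothetical helper name; it reads lines on stdin and
# prints each distinct pair "start-of-second-to-last-field start-of-last-field".
field_starts() {
  awk '{
    line = $0
    sub(/[ \t]+$/, "", line)                  # ignore trailing padding
    match(line, /[^ \t]+[ \t]+[^ \t]+$/)      # leftmost match = second-to-last field onward
    a = RSTART
    match(line, /[^ \t]+$/)                   # leftmost match = last field
    b = RSTART
    print a, b
  }' | sort -u
}

# Demo on two made-up lines padded to an assumed 54/5/5 layout.
printf '%-54s%-5s%-5s\n' 'I am a customer' 1324 7894 'And here I am' 4567 9876 | field_starts
```

If the real file yields exactly one line of output, the columns are aligned; a result of 55 60 would be consistent with FIELDWIDTHS = "54 5 5". More than one line means the layout is not fixed-width and a whitespace-based approach is the safer choice.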
Alternatively, if you also need to remove the whitespace:
# test.gawk
BEGIN {
    FIELDWIDTHS = "54 5 5"
    OFS = ","
}
{
    for (f = 1; f <= NF; f++) {
        sub(/ +$/, "", $f)  # Strip trailing spaces from each field.
    }
    print
}
Test:
▶ gawk -f test.gawk FILE
Somename of someone,1234,7894
Even some more name,2345,5343
Even more of the same,6572,6456
I am a customer,1324,7894
I am another customer,5612,3657
Also I am a customer and I am number Three,9631,7411
And I am number four and not the latest one in list,8529,9369
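And here I am,4567,9876
(Note that in the second version, as Ed Morton suggested in the comments, I can use a bare print at the end, because modifying the fields causes $0 to be rebuilt with the field separators replaced by OFS.)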
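For an awk without FIELDWIDTHS, the same fixed-width split can be approximated with substr. This is only a sketch under the same assumption as above, namely that the data really is laid out in columns of 54, 5 and 5 characters; fixed_to_csv and rtrim are names invented for this example, and the demo line is generated with printf rather than taken from the original file.

```shell
# fixed_to_csv is a hypothetical helper name; it reads fixed-width lines on
# stdin and writes comma-separated lines on stdout.
fixed_to_csv() {
  awk -v OFS=',' '
    # Remove trailing spaces/tabs from a copy of s and return it.
    function rtrim(s) { sub(/[ \t]+$/, "", s); return s }
    {
      # Columns 1-54, 55-59 and 60-64, mirroring the assumed widths 54 5 5.
      print rtrim(substr($0, 1, 54)), rtrim(substr($0, 55, 5)), rtrim(substr($0, 60, 5))
    }
  '
}

# Demo on one made-up line padded to the assumed layout.
printf '%-54s%-5s%-5s\n' 'I am a customer' 1324 7894 | fixed_to_csv
```

The demo prints I am a customer,1324,7894. Unlike the FIELDWIDTHS version, this does not rebuild $0; it prints the three trimmed substrings directly, so OFS only takes effect through the commas in the print statement.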
https://stackoverflow.com/questions/56112158