I have two CSV files.
alexa_products.csv
name, sku, urle, product, data
amazon, amazon.com, current, mobile, seller
vinnes, vinnes.com, current, cellular, Aircel_Indore
Data.csv
name, sku, urle, product, data
linkedin.com, linkeidn, current, local, blah
airtel.com, airtel, current, sim, Airtel
amazon.com, amazon, face, network, buyier
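vinnes.com, vinnes, look, hands, ddde
Now I have to match the names from alexa_products.csv against data.csv and, where there is a match, print the data from specific columns of both CSV files to another CSV file. How can I do this?
Expected output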
amazon.com, amazon, face, network, buyier, current, mobile, seller
vinnes.com, vinnes, look, hands, ddde, current, cellular, Aircel_Indore
Posted on 2014-01-29 10:47:25
You could try something like this:
sed "1d;s/ //g" alexa_products.csv | sort -t, -k1,1 > a
sed "1d;s/ //g" data.csv | sort -t, -k2,2 > b
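join -t, -1 1 -2 2 a b > newfile.csv
Yes, I know this isn't Perl ;-)
The "sed" command removes the header row (line 1) and all spaces from alexa_products.csv. The rest of the file is then sorted with "sort" and saved as file "a".
Likewise, the header and spaces are removed from data.csv, and the result is sorted and stored in file "b".
Then "join" takes field 1 of file "a", matches it against field 2 of file "b", and prints the lines where they match. Note that "join" expects each file to be sorted on its join field.
You can read the manual for any of these commands with "man" followed by the command name. Press the space bar for the next page and "q" to quit.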
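As a rough end-to-end sketch of the pipeline described above, the script below recreates the sample files from the question in a scratch directory and runs the three commands. Two things are additions and not part of the original answer: `LC_ALL=C`, assumed here so that `sort` and `join` agree on ordering, and the `-o` field list, which is one way to get the column order shown in the expected output.

```shell
#!/bin/sh
# Use one fixed collation so sort and join agree on ordering (assumption).
export LC_ALL=C

# Work in a scratch directory so no existing files are overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Sample inputs copied from the question.
cat > alexa_products.csv <<'EOF'
name, sku, urle, product, data
amazon, amazon.com, current, mobile, seller
vinnes, vinnes.com, current, cellular, Aircel_Indore
EOF

cat > data.csv <<'EOF'
name, sku, urle, product, data
linkedin.com, linkeidn, current, local, blah
airtel.com, airtel, current, sim, Airtel
amazon.com, amazon, face, network, buyier
vinnes.com, vinnes, look, hands, ddde
EOF

# Drop the header and all spaces, then sort each file on its join field.
sed '1d;s/ //g' alexa_products.csv | sort -t, -k1,1 > a
sed '1d;s/ //g' data.csv | sort -t, -k2,2 > b

# -o lists the output fields as FILE.FIELD: all five columns of b,
# followed by columns 3 to 5 of a.
join -t, -1 1 -2 2 -o 2.1,2.2,2.3,2.4,2.5,1.3,1.4,1.5 a b > newfile.csv

cat newfile.csv
```

With the sample data this should print the two matching rows, amazon.com and vinnes.com, in the column order of the expected output. The spaces after the commas are gone because the "sed" step strips them.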
Posted on 2014-01-29 10:30:23
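Since you did not mention which columns you are interested in, I am simply assuming: whenever the second column of the second file matches the first column of the first file, this command prints all columns of the second file, followed by columns 3 to 5 of the first file.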
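awk -F', *' -v OFS=', ' '
FNR == 1 { next }                # skip the header line of each file
FNR == NR { a[$1] = $0; next }   # first file: store each row under its name
$2 in a {                        # second file: sku matches a stored name
    split(a[$2], b, /, */)       # split the stored row into its columns
    print $0, b[3], b[4], b[5]
}' alexa_products.csv data.csv
Posted on 2014-01-29 11:32:28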
Here is some Perl to get you started, just to give you what you asked for!
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
my %alexa;
my ($name,$sku,$urle,$product,$data);
# Parse first file
my $line=1;
open(my $fh,"<","alexa_products.csv")|| die "ERROR: Could not open alexa_products.csv";
while (<$fh>)
{
    next if $line++==1; # Ignore header
    chomp; # Remove LF
    s/ //g; # Remove spaces
    ($name,$sku,$urle,$product,$data) = split(','); # Split line on commas
    $alexa{$name}{'sku'}=$sku;
    $alexa{$name}{'urle'}=$urle;
    $alexa{$name}{'product'}=$product;
    $alexa{$name}{'data'}=$data;
}
close($fh);
# Next line for debugging, comment out if necessary
print Dumper \%alexa;
# Now read data file
$line=1;
open($fh,"<","Data.csv")|| die "ERROR: Could not open Data.csv";
while(<$fh>)
{
    next if $line++==1; # Ignore header line
    chomp; # Remove LF
    s/ //g; # Remove spaces
    my ($name,$sku,$urle,$product,$data) = split(','); # Split line on commas
    if(defined $alexa{$sku}){
        print "$alexa{$sku}{'sku'},$alexa{$sku}{'data'},$alexa{$sku}{'product'}\n"; # You may want different fields
    }
}
https://stackoverflow.com/questions/21428156