I have a list of URLs in a file named urls.list:
https://target.com/?first=one
https://target.com/something/?first=one
http://target.com/dir/?first=summer
https://fake.com/?first=spring
https://example.com/about/?third=three
https://example.com/?third=three

I want to make them unique based on their domain (like https://target.com), meaning each domain together with its protocol is printed once and any later URL with the same scheme+domain is skipped. So the result would be:
https://target.com/?first=one
http://target.com/dir/?first=summer
https://fake.com/?first=spring
https://example.com/about/?third=three

This is what I tried:
cat urls.list | cut -d"/" -f1-3 | awk '!a[$0]++' >> host_unique.del
for urls in $(cat urls.list); do
    for hosts in $(cat host_unique.del); do
        if [[ $hosts == *"$urls"* ]]; then
            echo "$hosts"
        fi
    done
done

Posted on 2021-05-24 05:30:02
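For what it's worth, the loop idea in the question can be made to work by keeping the first full URL for each unique scheme://host, rather than echoing the hosts themselves. A hedged sketch reusing the question's filenames (urls.list, host_unique.del):

```shell
#!/usr/bin/env bash
# Build the unique scheme://host list, then keep the first full URL per host.
cut -d'/' -f1-3 urls.list | awk '!a[$0]++' > host_unique.del
while IFS= read -r host; do
    # -F treats the host as a fixed string; -m1 stops at the first matching URL
    grep -m1 -F "$host/" urls.list
done < host_unique.del
```

Appending "/" to the host in the grep pattern keeps http://target.com from matching inside https://target.com URLs.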
This awk might do what you want:
awk -F'/' '!seen[$1,$3]++' urls.list

A bash alternative is very slow on large data sets/files, but here it is anyway.
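To see why the awk -F'/' one-liner works: with / as the field separator, $1 is the scheme ("https:") and $3 is the host, so the seen[$1,$3] key is exactly protocol+domain. Checked against the sample list, inlined here as a here-doc for a self-contained run:

```shell
# Keep only the first URL per scheme+host: $1 is "https:", $3 is the host.
awk -F'/' '!seen[$1,$3]++' <<'EOF'
https://target.com/?first=one
https://target.com/something/?first=one
http://target.com/dir/?first=summer
https://fake.com/?first=spring
https://example.com/about/?third=three
https://example.com/?third=three
EOF
# prints the four expected first-seen URLs
```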
It uses mapfile (aka readarray, a bash 4+ feature) and an associative array, plus a few other bash features.
#!/usr/bin/env bash
declare -A uniq
mapfile -t urls < urls.list
for uniq_url in "${urls[@]}"; do
    IFS='/' read -ra url <<< "$uniq_url"
    if ((!uniq["${url[0]}","${url[2]}"]++)); then
        printf '%s\n' "$uniq_url"
    fi
done

Posted on 2021-05-24 05:29:18
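The ((!uniq[key]++)) test in that script is the arithmetic "first seen" idiom: an unset associative-array entry evaluates to 0, so the negation is true only on a key's first occurrence, and the post-increment then marks it as seen. A minimal sketch of the idiom on its own (with hypothetical keys):

```shell
#!/usr/bin/env bash
# First-seen filter via an associative array counter (bash 4+).
declare -A seen
for key in a b a c b; do
    if ((!seen[$key]++)); then    # true only the first time a key appears
        printf '%s\n' "$key"
    fi
done
# prints a, b, c -- each key once, in first-seen order
```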
With the samples you have shown, please try the following.
awk 'match($0,/https?:\/\/[^/]*/){val=substr($0,RSTART,RLENGTH)} !arr[val]++' Input_file

Explanation: adding a detailed explanation for the above.
awk '                               ##Starting awk program from here.
match($0,/https?:\/\/[^/]*/){       ##Using match to match http or https followed by ://
  val=substr($0,RSTART,RLENGTH)     ##Creating val which has the matched string value here.
}
!arr[val]++                         ##Checking condition: if val is not present in arr then print the current line.
' Input_file                        ##Mentioning Input_file name here.

https://stackoverflow.com/questions/67667026
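In that program, match() records where the scheme://host prefix begins and how long it is in the built-in variables RSTART and RLENGTH, and substr() cuts out exactly that slice as the dedup key. A one-line illustration of the extraction step:

```shell
# Extract the scheme://host prefix that match() located via RSTART/RLENGTH.
echo 'https://target.com/something/?first=one' |
awk 'match($0,/https?:\/\/[^/]*/){print substr($0,RSTART,RLENGTH)}'
# prints: https://target.com
```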