Filtering specific comments on a website

Stack Overflow user
Asked 2018-08-16 13:47:41
2 answers, 58 views, 0 followers, score 0
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import urllib2
#import re
from BeautifulSoup import BeautifulSoup

headers = {'User-Agent': 'Mozilla/5.0'}

req = urllib2.Request('https://www.sikayetvar.com/onedio', 
None,headers)
resp  = urllib2.urlopen(req)
html = resp.read()
soup = BeautifulSoup(html)

complaints = soup.findAll('p', attrs = {'class' : 'complaint-summary'})


for complaint in complaints:
   if complaint.text.find("genç") != -1:
      print complaint.text

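I want to filter the complaints on a website that contain a specific word, but I can't search for words that contain non-ASCII characters. I'm using Python 2.7 and BeautifulSoup. Any idea why this happens?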

2 Answers

Stack Overflow user

Accepted answer

Answered 2018-08-16 14:14:10

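If your text is inside the p tag, you should change your if statement to the one below. In Python 2, the literal "genç" in a UTF-8 source file is a byte string (str), while complaint.text is a unicode object, so the literal has to be decoded to unicode before searching:
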
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import urllib2
from BeautifulSoup import BeautifulSoup

headers = {'User-Agent': 'Mozilla/5.0'}

req = urllib2.Request('https://www.sikayetvar.com/onedio', 
None,headers)
resp  = urllib2.urlopen(req)
html = resp.read()
soup = BeautifulSoup(html)

complaints = soup.findAll('p', attrs = {'class' : 'complaint-summary'})

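for complaint in complaints:
    # "genç" is a UTF-8 byte string in Python 2; decode it to unicode so it
    # can be compared with complaint.text, which is already unicode
    if b"genç".decode("utf-8") in complaint.text:
        print(complaint.text)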
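
Why the decode matters can be reproduced in Python 3, where mixing bytes and text is an explicit error rather than an implicit ASCII coercion (under Python 2.7, mixing a non-ASCII byte string with unicode typically raises UnicodeDecodeError). A minimal sketch, assuming Python 3; the sample sentence and the helpers naive_contains and decoded_contains are hypothetical:

```python
# Python 3 sketch of the bytes-vs-text mismatch (assumption: Python 3, not 2.7).
SAMPLE_TEXT = "Çocuk ve gençler için sakıncalı"  # hypothetical decoded text (str)
TERM_BYTES = "genç".encode("utf-8")               # encoded bytes, like a Py2 "genç" literal


def naive_contains(term, text):
    """Substring test with no type conversion; returns the exception if one is raised."""
    try:
        return term in text
    except TypeError as exc:  # Python 3 refuses to mix bytes and str
        return exc


def decoded_contains(term, text, encoding="utf-8"):
    """Decode a bytes search term to str first, then do the substring test."""
    if isinstance(term, bytes):
        term = term.decode(encoding)
    return term in text


print(repr(naive_contains(TERM_BYTES, SAMPLE_TEXT)))
print(decoded_contains(TERM_BYTES, SAMPLE_TEXT))  # True
```

The first call returns the TypeError instead of a result; the second decodes the search term to str and finds "genç" inside "gençler".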
Score: 0

Stack Overflow user

Answered 2018-08-16 17:13:33

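Don't use Python 2; official support for it ends on 1 January 2020. In Python 3, string literals are Unicode by default, so the search term can be compared with the page text directly:
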
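import requests
from bs4 import BeautifulSoup

# Send a browser-like User-Agent header, as in the question
response = requests.get('https://www.sikayetvar.com/onedio',
                        headers={'User-Agent': 'Mozilla/5.0'})

# 'lxml' needs the lxml package installed; 'html.parser' is built in
soup = BeautifulSoup(response.content, 'lxml')

# CSS selector: every <p> element with the class complaint-summary
complaints = soup.select('p.complaint-summary')
for complaint in complaints:
    # Python 3 string literals are already Unicode, so no decoding is needed
    if "genç" in complaint.text:
        print(complaint.text.strip())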
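
To test the filtering logic offline, without requesting the live page, the same selection can be sketched with the standard library's html.parser in place of BeautifulSoup. This is a swapped-in technique, not what the answer uses; the class ComplaintSummaryParser, the helper filter_summaries and the sample markup are hypothetical, and the real page's markup may differ:

```python
from html.parser import HTMLParser


class ComplaintSummaryParser(HTMLParser):
    """Collect the text of every <p> whose class list includes 'complaint-summary'."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default: entities arrive decoded
        self.summaries = []
        self._inside = False
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None
        classes = (dict(attrs).get("class") or "").split()
        if tag == "p" and "complaint-summary" in classes:
            self._inside = True
            self._chunks = []

    def handle_data(self, data):
        # Keep text only while inside a matching <p>, including nested inline tags
        if self._inside:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "p" and self._inside:
            self.summaries.append("".join(self._chunks).strip())
            self._inside = False


def filter_summaries(html, word):
    """Return the complaint summaries in `html` that contain `word` as a substring."""
    parser = ComplaintSummaryParser()
    parser.feed(html)
    parser.close()
    return [s for s in parser.summaries if word in s]


# Hypothetical sample markup standing in for the real page
SAMPLE = """
<div>
  <p class="complaint-summary">Çocuk ve <b>gençler</b> için sakıncalı.</p>
  <p class="complaint-summary other">Kargo gecikti.</p>
  <p class="intro">genç okuyucular</p>
</div>
"""

for summary in filter_summaries(SAMPLE, "genç"):
    print(summary)
```

Only the first paragraph is printed: the second has the right class but no match, and the third contains the word but lacks the class. Like soup.select('p.complaint-summary'), the class check also accepts elements that carry additional classes.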

The output is:

Ne yazık ki bir sosyal sitede ahlak dışı içerikli haberler durulmuyor. Çocuk ve gençler için sakıncalı olduğunu düşünüyorum. Fotoğraflarda saçma başlıkları görebilirsiniz. Başlıklardan anlaşılacağı üzere cinsel…
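
The output matched "gençler" because in is a substring test. Two related details can matter when searching non-ASCII text: a character such as "ç" may appear in scraped HTML either precomposed (U+00E7) or as "c" followed by a combining cedilla (U+0327), and a whole-word filter needs word boundaries. A minimal sketch, assuming Python 3 and only the standard-library unicodedata and re modules; contains_word and the sample string are hypothetical:

```python
import re
import unicodedata


def contains_word(text, word, whole_word=False):
    """Return True if `word` occurs in `text` after NFC normalization.

    NFC turns 'c' + U+0327 COMBINING CEDILLA into the single code point 'ç',
    so both spellings compare equal. With whole_word=True, the match must
    not be part of a longer word.
    """
    text = unicodedata.normalize("NFC", text)
    word = unicodedata.normalize("NFC", word)
    if not whole_word:
        return word in text
    # For str patterns, \b is Unicode-aware in Python 3, so 'ç' is a word character
    return re.search(r"\b" + re.escape(word) + r"\b", text) is not None


decomposed = "Çocuk ve genc\u0327ler için"  # 'ç' written as c + combining cedilla
print("genç" in decomposed)                   # False: raw code points differ
print(contains_word(decomposed, "genç"))      # True: substring of "gençler"
print(contains_word(decomposed, "genç", whole_word=True))  # False: only "gençler" occurs
```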
Score: 0
Page content provided by Stack Overflow; translation support by Tencent Cloud Xiaowei's IT-domain engine.
Original link:

https://stackoverflow.com/questions/51870338
