Q: bs4 Python web scraping

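Stack Overflow user

Asked on 2020-09-06 18:13:09

Answers: 3 | Views: 71 | Followers: 0 | Votes: 0

I only want to access the text from this particular div. The structure looks like this: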

<div class="edgtf-pli-text"><h4 class="edgtf-pli-title entry-title" itemprop="name">
Crash Landing on You</h4></div>

The code is:

import requests
from bs4 import BeautifulSoup
page = requests.get('https://kdramaclicks.com/kdrama/romantic-comedy/')
soup = BeautifulSoup(page.content,'html.parser')
names = soup.find_all('div',class_='edgtf-pli-text')
print(names)

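How can I change the code so that it prints only the text, i.e. "Crash Landing on You"?

I am new to scraping, so please help me out a bit. Also, if there is a good API for scraping wiki tables, please recommend one.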

python

web-scraping

beautifulsoup

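3 Answers

Stack Overflow user

Answered on 2020-09-06 18:18:02

Use the get_text() method to extract the text inside a tag. Passing strip=True also strips leading and trailing whitespace from each piece of text.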

for name in names:
    print(name.get_text(strip=True))

Output:

Crash Landing on You
Meow, The Secret Boy
Seven First Kisses
What’s Wrong with Secretary Kim
Touch Your Heart
The Secret Life of My Secretary
Strong Girl Bong-soon
Suspicious Partner
Secret Garden
She Was Pretty
Shopping King Louis
Oh My Venus
My Love from the Star
My First First Love
Legend of the Blue Sea
The Big Hit
Her Private Life
Beating Again
Emergency Couple
Clean with Passion for Now
Be Melodramatic
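
To try this without hitting the site, here is a minimal sketch that parses only the HTML fragment quoted in the question. The `html` string and the `titles` list are names introduced for illustration; the real page contains many such divs.

```python
from bs4 import BeautifulSoup

# HTML fragment copied from the question (a single item, not the full page).
html = """<div class="edgtf-pli-text"><h4 class="edgtf-pli-title entry-title" itemprop="name">
Crash Landing on You</h4></div>"""

soup = BeautifulSoup(html, "html.parser")
names = soup.find_all("div", class_="edgtf-pli-text")

# find_all returns Tag objects, so print(names) shows the markup.
# get_text(strip=True) returns only the text with whitespace trimmed.
titles = [name.get_text(strip=True) for name in names]
print(titles)
```

Printing `names` directly shows the tags because the list holds Tag objects, which is why the original code printed markup instead of titles.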

Votes: 1

Stack Overflow user

Answered on 2020-09-06 22:44:07

import requests
from bs4 import BeautifulSoup


def main(url):
    r = requests.get(url)
    soup = BeautifulSoup(r.content, 'html.parser')
    target = [item.get_text(strip=True) for item in soup.select(
        "h4.edgtf-pli-title.entry-title")]
    print(target)


main("https://kdramaclicks.com/kdrama/romantic-comedy/")

Output:

['Crash Landing on You', 'Meow, The Secret Boy', 'Seven First Kisses', 'What’sWrong with Secretary Kim', 'Touch Your Heart', 'The Secret Life of My Secretary', 'Strong Girl Bong-soon', 'Suspicious Partner', 'Secret Garden', 'She Was Pretty', 'Shopping King Louis', 'Oh My Venus', 'My Love from the Star', 'My FirstFirst Love', 'Legend of the Blue Sea', 'The Big Hit', 'Her Private Life', 'Beating Again', 'Emergency Couple', 'Clean with Passion for Now', 'Be Melodramatic']
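
The output above contains glued words such as 'What’sWrong' and 'My FirstFirst Love'. One possible cause, assuming the title is split into several text nodes (the page's actual markup is not shown here, so the `<br/>` below is purely hypothetical), is that get_text(strip=True) strips each text node separately and joins them with an empty separator. A sketch of that behavior and of passing a separator:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: a <br/> splits the title into two text nodes.
html = '<h4 class="edgtf-pli-title entry-title">\nMy First <br/>First Love</h4>'
soup = BeautifulSoup(html, "html.parser")
h4 = soup.select_one("h4.edgtf-pli-title.entry-title")

# Each text node is stripped, then joined with "" (the default separator).
glued = h4.get_text(strip=True)

# Passing a separator keeps a space between the stripped pieces.
spaced = h4.get_text(" ", strip=True)

print(repr(glued))
print(repr(spaced))
```

If the real page behaves like this, calling get_text(" ", strip=True) inside the list comprehension would keep the words apart.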

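Votes: 1

Stack Overflow user

Answered on 2020-09-06 18:19:27

You can use the .text attribute of a BeautifulSoup tag and then call .strip() on it, which removes the leading "\n" (newline) in front of each K-drama title.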

import requests
from bs4 import BeautifulSoup


page = requests.get('https://kdramaclicks.com/kdrama/romantic-comedy/')
soup = BeautifulSoup(page.content,'html.parser')
names = soup.find_all('div',class_='edgtf-pli-text')
for name in names:
    print(name.text.strip())
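
To see where that leading newline comes from, here is a small sketch that inspects the raw text of the fragment quoted in the question. The variable names `raw` and `clean` are introduced for illustration only.

```python
from bs4 import BeautifulSoup

# HTML fragment copied from the question; note the line break after the <h4> tag.
html = """<div class="edgtf-pli-text"><h4 class="edgtf-pli-title entry-title" itemprop="name">
Crash Landing on You</h4></div>"""

div = BeautifulSoup(html, "html.parser").find("div", class_="edgtf-pli-text")

# .text is the same as get_text() with no arguments: nothing is stripped.
raw = div.text
# str.strip() removes whitespace from both ends of the combined string.
clean = raw.strip()

print(repr(raw))
print(repr(clean))
```

Because .strip() works on the combined string, it only trims the two ends and leaves any whitespace inside the title untouched.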

Votes: 0

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/63763022
