
Web scraping a website with HTMLSession and bs4

Stack Overflow user
Asked 2022-08-21 19:54:56
1 answer, 59 views, 0 followers, 1 vote

I am trying to scrape the "This name is already registered" message from https://app.ens.domains/name/2354.eth/register. As a first step, I tried to scrape the whole page:

Code language: python
from requests_html import HTMLSession
from bs4 import BeautifulSoup

url = 'https://app.ens.domains/name/2354.eth/register'

this_session = HTMLSession()
response = this_session.get(url)
response.html.render()

print(response.html.raw_html)
print("---------------------------------------------------------------")

soup = BeautifulSoup(response.html.raw_html, "html.parser")
names = soup.find_all("div")

But the output does not contain any information I could use to narrow down the search:

Code language: text
b'<!DOCTYPE html><html lang="en"><head><meta charset="utf-8"><meta name="viewport" content="width=device-width,initial-scale=1,shrink-to-fit=no"><meta name="theme-color" content="#000000"><link rel="manifest" href="/manifest.json"><link rel="apple-touch-icon" sizes="152x152" href="/apple-touch-icon.png"><link rel="icon" type="image/png" sizes="32x32" href="/favicon-32x32.png"><link rel="icon" type="image/png" sizes="16x16" href="/favicon-16x16.png"><link rel="manifest" href="/site.webmanifest"><link rel="mask-icon" href="/safari-pinned-tab.svg" color="#5bbad5"><meta name="msapplication-TileColor" content="#2b5797"><meta name="theme-color" content="#ffffff"><link href="https://fonts.googleapis.com/css?family=Overpass:100,200,300,400,600,700,800,900|Overpass+Mono:300,400" rel="stylesheet"><link rel="search" type="application/opensearchdescription+xml" title="ENS App" href="/opensearch.xml"><title>ENS App</title><script async="" src="https://www.google-analytics.com/analytics.js"></script><script src="https://www.googleoptimize.com/optimize.js?id=OPT-KTCR9V9"></script><script defer="defer" src="/static/js/main.64b76e39-3ba85.js"></script><style data-emotion="css"></style><link rel="prefetch" as="script" href="/static/js/TestRegistrar.987088ac-3ba85.chunk.js"><link rel="prefetch" as="script" href="/static/js/729.362306d1-3ba85.chunk.js"><link rel="prefetch" as="script" href="/static/js/Home.ea8fd0b4-3ba85.chunk.js"><link rel="prefetch" as="script" href="/static/js/171.a237641d-3ba85.chunk.js"><link rel="prefetch" as="script" href="/static/js/804.45936964-3ba85.chunk.js"><link rel="prefetch" as="script" href="/static/js/926.ce58711c-3ba85.chunk.js"><link rel="prefetch" as="script" href="/static/js/973.eb3a3ec8-3ba85.chunk.js"><link rel="prefetch" as="script" href="/static/js/333.55449608-3ba85.chunk.js"><link rel="prefetch" as="script" href="/static/js/SearchResults.8083ff69-3ba85.chunk.js"><link rel="prefetch" as="script" href="/static/js/787.796394e1-3ba85.chunk.js"><link rel="prefetch" as="script" href="/static/js/944.b11bdaf9-3ba85.chunk.js"><link rel="prefetch" as="script" href="/static/js/146.efe6852e-3ba85.chunk.js"><link rel="prefetch" as="script" href="/static/js/10.82d34550-3ba85.chunk.js"><link rel="prefetch" as="script" href="/static/js/877.ef102d75-3ba85.chunk.js"><link rel="prefetch" as="script" href="/static/js/409.51df15df-3ba85.chunk.js"><link rel="prefetch" as="script" href="/static/js/535.b54d1b5b-3ba85.chunk.js"><link rel="prefetch" as="script" href="/static/js/554.72170ba5-3ba85.chunk.js"><link rel="prefetch" as="script" href="/static/js/SingleName.9d36bc40-3ba85.chunk.js"><link rel="prefetch" as="script" href="/static/js/825.31eb5e1d-3ba85.chunk.js"><link rel="prefetch" as="script" href="/static/js/Favourites.5a13335b-3ba85.chunk.js"><link rel="prefetch" as="script" href="/static/js/Faq.4e16b951-3ba85.chunk.js"><link rel="prefetch" as="script" href="/static/js/Address.b6964377-3ba85.chunk.js"><link rel="prefetch" as="script" href="/static/js/Renew.2f93065c-3ba85.chunk.js"><style data-emotion="css"></style><script charset="utf-8" data-webpack="ens-app:chunk-787" src="/static/js/787.796394e1-3ba85.chunk.js"></script><script charset="utf-8" data-webpack="ens-app:chunk-171" src="/static/js/171.a237641d-3ba85.chunk.js"></script><script charset="utf-8" data-webpack="ens-app:chunk-944" src="/static/js/944.b11bdaf9-3ba85.chunk.js"></script><script charset="utf-8" data-webpack="ens-app:chunk-146" src="/static/js/146.efe6852e-3ba85.chunk.js"></script><script charset="utf-8" data-webpack="ens-app:chunk-10" src="/static/js/10.82d34550-3ba85.chunk.js"></script><script charset="utf-8" data-webpack="ens-app:chunk-877" src="/static/js/877.ef102d75-3ba85.chunk.js"></script><script charset="utf-8" data-webpack="ens-app:chunk-409" src="/static/js/409.51df15df-3ba85.chunk.js"></script><script charset="utf-8" data-webpack="ens-app:chunk-729" src="/static/js/729.362306d1-3ba85.chunk.js"></script><script charset="utf-8" data-webpack="ens-app:chunk-926" src="/static/js/926.ce58711c-3ba85.chunk.js"></script><script charset="utf-8" data-webpack="ens-app:chunk-535" src="/static/js/535.b54d1b5b-3ba85.chunk.js"></script><script charset="utf-8" data-webpack="ens-app:chunk-554" src="/static/js/554.72170ba5-3ba85.chunk.js"></script><script charset="utf-8" data-webpack="ens-app:chunk-333" src="/static/js/333.55449608-3ba85.chunk.js"></script><link rel="stylesheet" type="text/css" href="/static/css/SingleName.6f4b1931.chunk.css"><script charset="utf-8" data-webpack="ens-app:chunk-61" src="/static/js/SingleName.9d36bc40-3ba85.chunk.js"></script></head><body><noscript>You need to enable JavaScript to run this app.</noscript><script src="https://browser.sentry-cdn.com/6.13.2/bundle.min.js" integrity="sha384-fcgCrdIqrZ6d6fA8EfCAfdjgN9wXDp0EOkueSo3bKyI3WM4tQCE0pOA/kJoqHYoI" crossorigin="anonymous"></script><script>Sentry.init({dsn:"https://7b24dc49f7014d56a422c24e18212ef3@o1010257.ingest.sentry.io/5974691"})</script><div id="root"><div class="css-lfsk6e e1506hml0" style="display: none !important;"><main class="css-pwvrwp e1iaa33a0"></main></div><div class="css-42k21n e1bb2sm1"><div class="lds-css css-197eqvq e1bb2sm0"><div class="lds-dual-ring"><div></div></div></div></div></div><div id="modal-root"></div><script defer="" src="https://static.cloudflareinsights.com/beacon.min.js/v652eace1692a40cfa3763df669d7439c1639079717194" integrity="sha512-Gi7xpJR8tSkrpF7aordPZQlW2DLtzUlZcumS8dMQjwDHEnw9I7ZLyiOj/6tZStRBGtGgN6ceN6cMH8z7etPGlw==" data-cf-beacon="{&quot;rayId&quot;:&quot;73e5d7a6cf3144f2&quot;,&quot;token&quot;:&quot;323ef669b85e40d4ba13d48f5ff255dd&quot;,&quot;version&quot;:&quot;2022.8.0&quot;,&quot;si&quot;:100}" crossorigin="anonymous"></script>\n</body></html>'

Is my approach completely wrong, or am I overlooking some detail?


1 Answer

Stack Overflow user

Answered 2022-08-21 20:12:25

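Your approach is fine. The problem is that the page loads slowly, so you need to wait longer after the initial render by passing `sleep` to `render()` (see the documentation). 10 seconds should be enough.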

Code language: python
response.html.render(sleep=10)
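
A fixed 10-second wait is a guess that may be too short on a slow connection. As a rough sketch, one could re-render with increasing waits until the loading spinner from the question's output is gone. This assumes the `lds-dual-ring` markup disappears once the app has finished loading, which the original post does not confirm. The helper names `still_loading` and `render_until_loaded` are hypothetical; `sleep` and `timeout` are parameters of `render()` in requests_html.

```python
def still_loading(raw_html: bytes, marker: bytes = b"lds-dual-ring") -> bool:
    """Return True while the spinner markup is still present in the page.

    Assumption: the app removes the `lds-dual-ring` element after loading.
    """
    return marker in raw_html


def render_until_loaded(response, sleeps=(5, 10, 20), timeout=30):
    """Re-render with progressively longer waits.

    `response` is expected to be a requests_html response, i.e. an object
    exposing `.html.render()` and `.html.raw_html`.
    Returns the sleep value that worked, or None if the spinner never left.
    """
    for seconds in sleeps:
        # Each call renders the page again and then waits `seconds`.
        response.html.render(sleep=seconds, timeout=timeout)
        if not still_loading(response.html.raw_html):
            return seconds
    return None
```

With the objects from the question, `render_until_loaded(response)` would take the place of the single `response.html.render(sleep=10)` call; a return value of `None` suggests that waiting longer is not the fix.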

After that, the div you need should be easy to find:

Code language: python
div = soup.select_one("div.css-1qv42d6")
print(div.next)
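
Class names such as `css-1qv42d6` look auto-generated (the page carries `data-emotion="css"` style tags), so they may change when the site is rebuilt. A possible alternative is to search the rendered HTML by text instead of by class. The sketch below is not part of the original answer: `find_message` is a hypothetical helper, and the default pattern assumes the message contains the words "already registered", as quoted in the question.

```python
import re

from bs4 import BeautifulSoup


def find_message(html, pattern=r"already registered"):
    """Return the first text node matching `pattern`, or None if absent.

    `html` may be str or bytes (for example `response.html.raw_html`).
    The default pattern is an assumption based on the question's wording.
    """
    soup = BeautifulSoup(html, "html.parser")
    # `string=` matches text nodes rather than tags.
    node = soup.find(string=re.compile(pattern, re.IGNORECASE))
    return node.strip() if node is not None else None
```

Returning `None` when nothing matches also avoids the `AttributeError` that `div.next` raises if `select_one` finds no element.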

So the complete code looks like this:

Code language: python
from requests_html import HTMLSession
from bs4 import BeautifulSoup

url = 'https://app.ens.domains/name/2354.eth/register'

this_session = HTMLSession()
response = this_session.get(url)
response.html.render(sleep=10)

print(response.html.raw_html)
print("---------------------------------------------------------------")

soup = BeautifulSoup(response.html.raw_html, "html.parser")
div = soup.select_one("div.css-1qv42d6")
print(div.next)
Votes: 1
Original content provided by Stack Overflow; translation support by the Tencent Cloud Xiaowei IT-domain engine.
Original link:

https://stackoverflow.com/questions/73437792
