
Question: requests.get can't find the table

Stack Overflow user

Asked on 2022-09-10 00:04:26

Answers: 2, Views: 43, Followers: 0, Votes: 1

Here is my code:

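import pandas as pd
import requests
from bs4 import BeautifulSoup, Comment

url_joueurs = ('https://www.basketball-reference.com/leagues/NBA_2022_per_game.html')
result = requests.get(url_joueurs).text
data = BeautifulSoup(result, 'html.parser')

# Collect every HTML comment on the page
comments = data.find_all(string=lambda text: isinstance(text, Comment))

tables = []
for each in comments:
    if 'table' in str(each):
        try:
            # Parse the commented-out markup, keeping only the table with id="totals_stats"
            tables.append(pd.read_html(str(each), attrs={'id': 'totals_stats'})[0])
            break
        except ValueError:  # read_html raises ValueError when no matching table is found
            continue
Stats_joueurs = tables
print(Stats_joueurs)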

The problem is that it returns an empty list (the DataFrame is supposed to end up inside that list).

Do you know what the problem is?

Thank you.

Tags: web-scraping, python

2 Answers

Stack Overflow user

Posted on 2022-09-10 05:46:59

This can be solved with pandas in three lines of code:

import pandas as pd

df = pd.read_html('https://www.basketball-reference.com/leagues/NBA_2022_per_game.html')[0]
print(df)

Result:

    Rk  Player  Pos     Age     Tm  G   GS  MP  FG  FGA     ...     FT%     ORB     DRB     TRB     AST     STL     BLK     TOV     PF  PTS
0   1   Precious Achiuwa    C   22  TOR     73  28  23.6    3.6     8.3     ...     .595    2.0     4.5     6.5     1.1     0.5     0.6     1.2     2.1     9.1
1   2   Steven Adams    C   28  MEM     76  75  26.3    2.8     5.1     ...     .543    4.6     5.4     10.0    3.4     0.9     0.8     1.5     2.0     6.9
2   3   Bam Adebayo     C   24  MIA     56  56  32.6    7.3     13.0    ...     .753    2.4     7.6     10.1    3.4     1.4     0.8     2.6     3.1     19.1
3   4   Santi Aldama    PF  21  MEM     32  0   11.3    1.7     4.1     ...     .625    1.0     1.7     2.7     0.7     0.2     0.3     0.5     1.1     4.1
4   5   LaMarcus Aldridge   C   36  BRK     47  12  22.3    5.4     9.7     ...     .873    1.6     3.9     5.5     0.9     0.3     1.0     0.9     1.7     12.9
...     ...     ...     ...     ...     ...     ...     ...     ...     ...     ...     ...     ...     ...     ...     ...     ...     ...     ...     ...     ...     ...
837     601     Thaddeus Young  PF  33  TOR     26  0   18.3    2.6     5.5     ...     .481    1.5     2.9     4.4     1.7     1.2     0.4     0.8     1.7     6.3
838     602     Trae Young  PG  23  ATL     76  76  34.9    9.4     20.3    ...     .904    0.7     3.1     3.7     9.7     0.9     0.1     4.0     1.7     28.4
839     603     Omer Yurtseven  C   23  MIA     56  12  12.6    2.3     4.4     ...     .623    1.5     3.7     5.3     0.9     0.3     0.4     0.7     1.5     5.3
840     604     Cody Zeller     C   29  POR     27  0   13.1    1.9     3.3     ...     .776    1.9     2.8     4.6     0.8     0.3     0.2     0.7     2.1     5.2
841     605     Ivica Zubac     C   24  LAC     76  76  24.4    4.1     6.5     ...     .727    2.9     5.6     8.5     1.6     0.5     1.0     1.5     2.7     10.3

842 rows × 30 columns
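Indexing with `[0]` relies on the per-game table being the first table `read_html` finds. If you would rather select it explicitly, `pd.read_html` accepts an `attrs` dict that filters tables by their HTML attributes. The sketch below runs offline on a small, hypothetical HTML string (the extra `some_other_table` is invented for illustration) and uses the `per_game_stats` id given in the other answer; it assumes pandas plus a parser backend such as lxml is installed. Against the live page, the equivalent call would be `pd.read_html(url, attrs={'id': 'per_game_stats'})[0]`.

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-in for the downloaded page: an unrelated table first,
# then the per-game table identified by its id attribute.
html = """
<table id="some_other_table">
  <tr><th>X</th></tr>
  <tr><td>1</td></tr>
</table>
<table id="per_game_stats">
  <thead><tr><th>Rk</th><th>Player</th><th>PTS</th></tr></thead>
  <tbody>
    <tr><td>1</td><td>Precious Achiuwa</td><td>9.1</td></tr>
    <tr><td>2</td><td>Steven Adams</td><td>6.9</td></tr>
  </tbody>
</table>
"""

# attrs keeps only tables whose attributes match, so list position no longer matters.
tables = pd.read_html(StringIO(html), attrs={"id": "per_game_stats"})
df = tables[0]
print(len(tables))       # 1
print(list(df.columns))  # ['Rk', 'Player', 'PTS']
```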

Votes: 0

Stack Overflow user

Posted on 2022-09-10 08:22:36

Barry gave you code that gets the data, but he didn't explain what was wrong with your code. There are two problems:

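1. Although these reference.com sites do hide some tables inside HTML comments, this particular page does not. The <table> tag you want is in the static HTML, yet your code only looks for <table> tags inside the comments.
2. Even if you searched for a <table> tag with the attribute id="totals_stats" (the attrs you pass to pd.read_html), you would not find one: this page has no table with that id. The table here has id="per_game_stats".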
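Both points can be checked without pandas. The sketch below uses only Python's standard-library `html.parser` to list table ids separately for live markup and for HTML comments. The HTML string is a hypothetical, heavily trimmed stand-in for the page (not its real markup), `hypothetical_hidden_table` is a made-up id showing what a commented-out table would look like, and `TableIdFinder` is a name invented for this example. Feeding it the real `result` text from `requests.get(...)` should show where each table actually lives.

```python
from html.parser import HTMLParser


class TableIdFinder(HTMLParser):
    """Collect the id of every <table>, split by where it appears:
    in the live markup or inside an HTML comment."""

    def __init__(self):
        super().__init__()
        self.static_ids = []
        self.commented_ids = []

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            table_id = dict(attrs).get("id")
            if table_id:
                self.static_ids.append(table_id)

    def handle_comment(self, data):
        # Comment text is not parsed as markup, so re-parse it with a fresh finder.
        inner = TableIdFinder()
        inner.feed(data)
        inner.close()
        self.commented_ids.extend(inner.static_ids + inner.commented_ids)


# Hypothetical, trimmed page: the per-game table is live markup,
# and a made-up table sits inside a comment.
html = """
<div><table id="per_game_stats"><tr><th>Rk</th></tr></table></div>
<!-- <div><table id="hypothetical_hidden_table"><tr><td>1</td></tr></table></div> -->
"""

finder = TableIdFinder()
finder.feed(html)
finder.close()
print("static:", finder.static_ids)          # static: ['per_game_stats']
print("in comments:", finder.commented_ids)  # in comments: ['hypothetical_hidden_table']
print("totals_stats" in finder.static_ids + finder.commented_ids)  # False
```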


As mentioned, let pandas parse the <table> tags for you. Then add one simple line to drop the repeated header rows:

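import pandas as pd

url_joueurs = ('https://www.basketball-reference.com/leagues/NBA_2022_per_game.html')
# read_html returns a list of DataFrames; the per-game table is the first one
df = pd.read_html(url_joueurs)[0]
# The site repeats the header row inside the table body; drop those rows
df = df[df['Rk'].ne('Rk')]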
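To see what that filter does, here is a minimal sketch on a hand-built DataFrame. The rows are hypothetical: a few names and point values copied from the output in the first answer, plus one repeated header row. It assumes the columns come back as strings because of those repeated headers; the final `pd.to_numeric` step is an optional extra, not part of the original answer.

```python
import pandas as pd

# Hypothetical miniature of the parsed table, including one repeated header row.
df = pd.DataFrame(
    {
        "Rk": ["1", "2", "Rk", "3"],
        "Player": ["Precious Achiuwa", "Steven Adams", "Player", "Bam Adebayo"],
        "PTS": ["9.1", "6.9", "PTS", "19.1"],
    }
)

# Same filter as in the answer: keep only rows whose Rk cell is not the header text.
df = df[df["Rk"].ne("Rk")]

# Optional: with the header rows gone, the string columns can be converted to numbers.
# assign() returns a new DataFrame, so this does not modify a slice in place.
df = df.assign(Rk=pd.to_numeric(df["Rk"]), PTS=pd.to_numeric(df["PTS"]))
df = df.reset_index(drop=True)
print(df)
```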

Votes: 0

Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud Xiaowei's IT-domain engine.

Original link:

https://stackoverflow.com/questions/73668443
