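We previously introduced earthkit-regrid, which bundles several interpolation methods. Today we look at its sibling, earthkit-data, which is used to fetch and read data.
earthkit-data is an open-source Python library whose development is led by ECMWF (European Centre for Medium-Range Weather Forecasts). It focuses on data access and processing for meteorology and climate science.
Its defining feature is that it is format-agnostic: the same API handles GRIB, NetCDF, BUFR and other formats, without requiring attention to the low-level details.
Core design ideas include:
loading data through the from_source() method
This is the tutorial list from the official site; it shows that most meteorological data formats are already supported.
What follows only tests reading GFS GRIB data; please test other data types yourself.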
!pip install earthkit-data -i https://pypi.mirrors.ustc.edu.cn/simple/
Looking in indexes: https://pypi.mirrors.ustc.edu.cn/simple/
Requirement already satisfied: earthkit-data in /opt/conda/lib/python3.11/site-packages (0.20.0)
Requirement already satisfied: cfgrib>=0.9.10.1 in /opt/conda/lib/python3.11/site-packages (from earthkit-data) (0.9.14.1)
Requirement already satisfied: dask in /opt/conda/lib/python3.11/site-packages (from earthkit-data) (2024.8.1)
Requirement already satisfied: deprecation in /opt/conda/lib/python3.11/site-packages (from earthkit-data) (2.1.0)
Requirement already satisfied: earthkit-utils<0.99,>=0.2 in /opt/conda/lib/python3.11/site-packages (from earthkit-data) (0.3.0)
Requirement already satisfied: eccodes>=1.7 in /opt/conda/lib/python3.11/site-packages (from earthkit-data) (2.47.0)
Requirement already satisfied: entrypoints in /opt/conda/lib/python3.11/site-packages (from earthkit-data) (0.4)
Requirement already satisfied: filelock in /opt/conda/lib/python3.11/site-packages (from earthkit-data) (3.29.1)
Requirement already satisfied: jinja2 in /opt/conda/lib/python3.11/site-packages (from earthkit-data) (3.1.4)
Requirement already satisfied: jsonschema in /opt/conda/lib/python3.11/site-packages (from earthkit-data) (4.23.0)
Requirement already satisfied: lru-dict in /opt/conda/lib/python3.11/site-packages (from earthkit-data) (1.4.1)
Requirement already satisfied: markdown in /opt/conda/lib/python3.11/site-packages (from earthkit-data) (3.6)
Requirement already satisfied: multiurl>=0.3.3 in /opt/conda/lib/python3.11/site-packages (from earthkit-data) (0.3.7)
Requirement already satisfied: netcdf4 in /opt/conda/lib/python3.11/site-packages (from earthkit-data) (1.6.3)
Requirement already satisfied: pandas in /opt/conda/lib/python3.11/site-packages (from earthkit-data) (2.2.3)
Requirement already satisfied: pdbufr>=0.11 in /opt/conda/lib/python3.11/site-packages (from earthkit-data) (0.14.2)
Requirement already satisfied: pyyaml in /opt/conda/lib/python3.11/site-packages (from earthkit-data) (6.0.2)
Requirement already satisfied: tqdm>=4.63 in /opt/conda/lib/python3.11/site-packages (from earthkit-data) (4.67.0)
Requirement already satisfied: xarray>=0.19 in /opt/conda/lib/python3.11/site-packages (from earthkit-data) (2024.3.0)
Requirement already satisfied: attrs>=19.2 in /opt/conda/lib/python3.11/site-packages (from cfgrib>=0.9.10.1->earthkit-data) (24.3.0)
Requirement already satisfied: click in /opt/conda/lib/python3.11/site-packages (from cfgrib>=0.9.10.1->earthkit-data) (8.1.7)
Requirement already satisfied: numpy in /opt/conda/lib/python3.11/site-packages (from cfgrib>=0.9.10.1->earthkit-data) (1.26.4)
Requirement already satisfied: array-api-compat in /opt/conda/lib/python3.11/site-packages (from earthkit-utils<0.99,>=0.2->earthkit-data) (1.14.0)
Requirement already satisfied: pint in /opt/conda/lib/python3.11/site-packages (from earthkit-utils<0.99,>=0.2->earthkit-data) (0.24.4)
Requirement already satisfied: cffi in /opt/conda/lib/python3.11/site-packages (from eccodes>=1.7->earthkit-data) (1.17.1)
Requirement already satisfied: findlibs in /opt/conda/lib/python3.11/site-packages (from eccodes>=1.7->earthkit-data) (0.0.5)
Requirement already satisfied: eccodeslib in /opt/conda/lib/python3.11/site-packages (from eccodes>=1.7->earthkit-data) (2.47.1.20)
Requirement already satisfied: requests in /opt/conda/lib/python3.11/site-packages (from multiurl>=0.3.3->earthkit-data) (2.32.3)
Requirement already satisfied: pytz in /opt/conda/lib/python3.11/site-packages (from multiurl>=0.3.3->earthkit-data) (2024.1)
Requirement already satisfied: python-dateutil in /opt/conda/lib/python3.11/site-packages (from multiurl>=0.3.3->earthkit-data) (2.9.0.post0)
Requirement already satisfied: packaging>=22 in /opt/conda/lib/python3.11/site-packages (from xarray>=0.19->earthkit-data) (24.1)
Requirement already satisfied: tzdata>=2022.7 in /opt/conda/lib/python3.11/site-packages (from pandas->earthkit-data) (2024.2)
Requirement already satisfied: cloudpickle>=3.0.0 in /opt/conda/lib/python3.11/site-packages (from dask->earthkit-data) (3.1.0)
Requirement already satisfied: fsspec>=2021.09.0 in /opt/conda/lib/python3.11/site-packages (from dask->earthkit-data) (2025.2.0)
Requirement already satisfied: partd>=1.4.0 in /opt/conda/lib/python3.11/site-packages (from dask->earthkit-data) (1.4.2)
Requirement already satisfied: toolz>=0.10.0 in /opt/conda/lib/python3.11/site-packages (from dask->earthkit-data) (1.0.0)
Requirement already satisfied: importlib-metadata>=4.13.0 in /opt/conda/lib/python3.11/site-packages (from dask->earthkit-data) (8.5.0)
Requirement already satisfied: MarkupSafe>=2.0 in /opt/conda/lib/python3.11/site-packages (from jinja2->earthkit-data) (3.0.2)
Requirement already satisfied: jsonschema-specifications>=2023.03.6 in /opt/conda/lib/python3.11/site-packages (from jsonschema->earthkit-data) (2024.10.1)
Requirement already satisfied: referencing>=0.28.4 in /opt/conda/lib/python3.11/site-packages (from jsonschema->earthkit-data) (0.35.1)
Requirement already satisfied: rpds-py>=0.7.1 in /opt/conda/lib/python3.11/site-packages (from jsonschema->earthkit-data) (0.22.3)
Requirement already satisfied: cftime in /opt/conda/lib/python3.11/site-packages (from netcdf4->earthkit-data) (1.6.4)
Requirement already satisfied: zipp>=3.20 in /opt/conda/lib/python3.11/site-packages (from importlib-metadata>=4.13.0->dask->earthkit-data) (3.21.0)
Requirement already satisfied: locket in /opt/conda/lib/python3.11/site-packages (from partd>=1.4.0->dask->earthkit-data) (1.0.0)
Requirement already satisfied: six>=1.5 in /opt/conda/lib/python3.11/site-packages (from python-dateutil->multiurl>=0.3.3->earthkit-data) (1.17.0)
Requirement already satisfied: pycparser in /opt/conda/lib/python3.11/site-packages (from cffi->eccodes>=1.7->earthkit-data) (2.22)
Requirement already satisfied: eckitlib==2.0.7.20 in /opt/conda/lib/python3.11/site-packages (from eccodeslib->eccodes>=1.7->earthkit-data) (2.0.7.20)
Requirement already satisfied: platformdirs>=2.1.0 in /opt/conda/lib/python3.11/site-packages (from pint->earthkit-utils<0.99,>=0.2->earthkit-data) (4.3.6)
Requirement already satisfied: typing_extensions>=4.0.0 in /opt/conda/lib/python3.11/site-packages (from pint->earthkit-utils<0.99,>=0.2->earthkit-data) (4.12.2)
Requirement already satisfied: flexcache>=0.3 in /opt/conda/lib/python3.11/site-packages (from pint->earthkit-utils<0.99,>=0.2->earthkit-data) (0.3)
Requirement already satisfied: flexparser>=0.4 in /opt/conda/lib/python3.11/site-packages (from pint->earthkit-utils<0.99,>=0.2->earthkit-data) (0.4)
Requirement already satisfied: charset-normalizer<4,>=2 in /opt/conda/lib/python3.11/site-packages (from requests->multiurl>=0.3.3->earthkit-data) (3.4.0)
Requirement already satisfied: idna<4,>=2.5 in /opt/conda/lib/python3.11/site-packages (from requests->multiurl>=0.3.3->earthkit-data) (3.10)
Requirement already satisfied: urllib3<3,>=1.21.1 in /opt/conda/lib/python3.11/site-packages (from requests->multiurl>=0.3.3->earthkit-data) (2.2.3)
Requirement already satisfied: certifi>=2017.4.17 in /opt/conda/lib/python3.11/site-packages (from requests->multiurl>=0.3.3->earthkit-data) (2024.12.14)
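Requires Python ≥ 3.10.
The basic installation contains only the core functionality; choose one of the following depending on your needs:
# Basic installation
# !pip install earthkit-data
# Install all optional dependencies (recommended; supports GRIB/NetCDF/BUFR/FDB and all other formats)
# !pip install earthkit-data[all]
# Or install support for specific formats as needed
# !pip install earthkit-data[grib] # GRIB format
# !pip install earthkit-data[netcdf] # NetCDF format
# !pip install earthkit-data[bufr] # BUFR format
# !pip install earthkit-data[fdb] # FDB (Fields DataBase) support
# !pip install earthkit-data[polytope] # Polytope API support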
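After installing, it can help to confirm which versions actually ended up in the environment, since the pip log is long. The following is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical and the package list is only an illustration:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Illustrative package list; adjust it to the extras you installed.
for name in ("earthkit-data", "eccodes", "cfgrib", "xarray"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```

Returning None instead of raising keeps the loop readable when an optional dependency is missing.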
earthkit.data.from_source(<type>, <args>) is the central entry point of the whole library. It loads data from many kinds of sources in a uniform way.
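Commonly supported values of type include:
| Type | Description | Example use |
|---|---|---|
| sample | Built-in sample data | Quick trials, testing |
| file | Local file | Reading local GRIB/NetCDF |
| url | Remote URL | Downloading online data; tar.gz archives are unpacked automatically |
| stream | Data stream | Processing standard input or a network stream |
| fdb | ECMWF FDB | Reading from the Fields DataBase |
| mars | MARS service | Remote access through ECMWF MARS |
| cds | CDS API | Retrieving from the Copernicus Climate Data Store |
| polytope | Polytope | Accessing data through the Polytope service |
| dummy-source | Dummy data source | Generating fake data for tests |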
import earthkit.data as ekd
# Load a local GFS GRIB2 file
ds = ekd.from_source("file", "/home/mw/input/GFS1824/gfs_4_20230902_0000_021.grb2")
print("Type:", type(ds))
print("Number of fields:", len(ds))
Type: <class 'earthkit.data.readers.grib.file.GRIBReader'>
Number of fields: 743
/opt/conda/lib/python3.11/site-packages/gribapi/__init__.py:23: UserWarning: ecCodes 2.42.0 or higher is recommended. You are running version 2.29.0
  warnings.warn(
# Load from a URL (tar.gz archives are unpacked automatically)
url = "https://get.ecmwf.int/repository/test-data/earthkit-data/examples/test_gribs.tar.gz"
ds_url = ekd.from_source("url", url)
print("Type after loading from URL:", type(ds_url))
print("Number of fields:", len(ds_url))
test_gribs.tar.gz: 0%| | 0.00/463k [00:00<?, ?B/s]
0%| | 0/2 [00:00<?, ?it/s]
Type after loading from URL: <class 'earthkit.data.readers.grib.index.GribMultiFieldList'>
Number of fields: 6
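The two loading calls above differ only in the source type passed to from_source(). As a rough sketch of how a script could pick that type automatically, the hypothetical helpers `guess_source_type` and `load` below (not part of earthkit-data) treat anything with an http/https scheme as "url" and everything else as "file":

```python
from urllib.parse import urlparse


def guess_source_type(location):
    """Return "url" for http/https locations and "file" for anything else."""
    scheme = urlparse(str(location)).scheme.lower()
    return "url" if scheme in ("http", "https") else "file"


def load(location):
    """Load `location` with earthkit-data, choosing the source type automatically."""
    # Imported lazily so guess_source_type() also works without earthkit-data.
    import earthkit.data as ekd

    return ekd.from_source(guess_source_type(location), location)


print(guess_source_type("/home/mw/input/GFS1824/gfs_4_20230902_0000_021.grb2"))
print(guess_source_type(
    "https://get.ecmwf.int/repository/test-data/earthkit-data/examples/test_gribs.tar.gz"
))
```

The other source types in the table (mars, cds, fdb and so on) take request-style arguments rather than a single location, so they are deliberately left out of this sketch.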
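In earthkit-data, GRIB files and suitably structured NetCDF files are represented as a FieldList. Each element is a Field: a horizontal slice holding one meteorological variable at one time and one level.
The following methods give a quick overview of the data as a whole: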
# Quick tabular summary
ds.ls()

[image: table output of ds.ls()]
# Detailed description grouped by param
ds.describe()

[image: table output of ds.describe()]
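When the summary tables are too long to scan, a plain count of fields per parameter can be easier to read. This is a minimal sketch: `count_values` and `fields_per_param` are hypothetical helper names, and it assumes that calling metadata("param") on a FieldList returns one value per field (worth checking against the installed version):

```python
from collections import Counter


def count_values(values):
    """Count how often each value occurs, most frequent first."""
    return Counter(values).most_common()


def fields_per_param(ds):
    """Count the fields per parameter short name in an earthkit-data FieldList."""
    # Assumption: FieldList.metadata("param") yields one value per field.
    return count_values(ds.metadata("param"))


# Stand-in list so the sketch runs without a GRIB file:
print(count_values(["t", "u", "t", "2t", "t", "u"]))
# -> [('t', 3), ('u', 2), ('2t', 1)]
```

With the GFS file loaded earlier, `fields_per_param(ds)` would be the call to try.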
FieldList provides a sel() method similar to the one in xarray, which filters fields by metadata such as parameter name, level, or forecast step.
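# Filter by parameter name (param is a commonly used GRIB key)
t2m = ds.sel(param="2t")
print("Number of fields after filtering:", len(t2m))
t2m.ls()
Number of fields after filtering: 1

[image: table output of t2m.ls()]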
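# Filter on several conditions at once
subset = ds.sel(param="cin", levelist=9000)
print("Number of fields after multi-condition filtering:", len(subset))
subset.ls()
Number of fields after multi-condition filtering: 1

[image: table output of subset.ls()]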
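In scripts, the filter values often come from optional arguments, and some of them may be unset. A small sketch of one way to handle that: the hypothetical helpers `build_selection` and `select` drop the unset filters and pass the rest to sel() as keyword arguments, exactly as in the calls above:

```python
def build_selection(**filters):
    """Drop filters whose value is None so they do not restrict the selection."""
    return {key: value for key, value in filters.items() if value is not None}


def select(ds, param=None, levelist=None, step=None):
    """Apply only the filters that were actually given to FieldList.sel()."""
    selection = build_selection(param=param, levelist=levelist, step=step)
    # With no filters at all, return the FieldList unchanged.
    return ds.sel(**selection) if selection else ds


print(build_selection(param="cin", levelist=9000, step=None))
# -> {'param': 'cin', 'levelist': 9000}
```

For example, `select(ds, param="cin", levelist=9000)` is intended to be equivalent to the multi-condition call shown earlier.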
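# Sort by metadata keys
ordered = ds.order_by("param", "levelist")
print("First 5 fields after sorting:")
ordered.ls(n=5)  # n limits the listing to the first 5 fields
First 5 fields after sorting:

[image: table output of ordered.ls()]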
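# Slicing: take the first 3 fields
first3 = ds[:3]
print("Number of fields after slicing:", len(first3))
first3.ls()
Number of fields after slicing: 3

[image: table output of first3.ls()]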
A single field can be accessed by index, to inspect its detailed metadata or to export its data.
# Take the first field
f = ds[0]
print("Type:", type(f))
print("Parameter:", f.metadata("param"))
print("Level:", f.metadata("levelist"))
print("Forecast step:", f.metadata("step"))
Type: <class 'earthkit.data.readers.grib.codes.GribField'>
Parameter: prmsl
Level: 0
Forecast step: 21
# Dump the full metadata (commonly used namespaces are mars and default)
f.dump(namespace="mars")
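The per-field metadata lookups above are handy for building plot titles or log lines. As a small sketch, the hypothetical helpers `field_label` and `label_for` below combine the same three keys into one string; the "h" suffix assumes the step is expressed in hours, which should be verified for your data:

```python
def field_label(param, level, step):
    """Format three identifying metadata values of a field as one short string."""
    # Assumption: `step` is a forecast lead time in hours.
    return f"{param} @ level {level}, step +{step}h"


def label_for(field):
    """Build a label for an earthkit-data Field from its metadata."""
    return field_label(
        field.metadata("param"),
        field.metadata("levelist"),
        field.metadata("step"),
    )


# Values taken from the output of the first field shown above:
print(field_label("prmsl", 0, 21))
# -> prmsl @ level 0, step +21h
```

With a loaded FieldList, `label_for(ds[0])` would be the corresponding call.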
earthkit-data provides several common export interfaces, so the data can be handed over to tools such as xarray, pandas and numpy.
# Convert to an xarray Dataset (the most common choice for meteorological analysis)
xa = subset.to_xarray()
print(xa)
<xarray.Dataset> Size: 2MB
Dimensions:    (latitude: 361, longitude: 720)
Coordinates:
  * latitude   (latitude) float64 3kB 90.0 89.5 89.0 88.5 ... -89.0 -89.5 -90.0
  * longitude  (longitude) float64 6kB 0.0 0.5 1.0 1.5 ... 358.5 359.0 359.5
Data variables:
    cin        (latitude, longitude) float64 2MB ...
Attributes:
    param:        cin
    paramId:      228001
    levtype:      unknown
    date:         20230902
    time:         0
    levelist:     9000
    Conventions:  CF-1.8
    institution:  ECMWF
xa['cin'].plot()
<matplotlib.collections.QuadMesh at 0x7f2bd921c290>

[image: plot of the cin field]
Converting multi-variable data to xarray still has some problems; hopefully a later release will improve this.
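Until multi-variable conversion works reliably, one possible workaround is to convert one parameter at a time, reusing the sel() and to_xarray() calls shown earlier. This is only a sketch under assumptions: `unique_in_order` and `to_xarray_per_param` are hypothetical helpers, metadata("param") on a FieldList is assumed to return one value per field, and a parameter stored on several level types may still fail to convert:

```python
def unique_in_order(values):
    """Return the distinct values, keeping the order of first appearance."""
    return list(dict.fromkeys(values))


def to_xarray_per_param(ds):
    """Convert each parameter of a FieldList to its own xarray Dataset."""
    datasets = {}
    # Assumption: FieldList.metadata("param") yields one value per field.
    for param in unique_in_order(ds.metadata("param")):
        datasets[param] = ds.sel(param=param).to_xarray()
    return datasets


# Stand-in list so the sketch runs without a GRIB file:
print(unique_in_order(["t", "u", "t", "2t", "u"]))
# -> ['t', 'u', '2t']
```

Keeping the results in a dict keyed by parameter avoids having to merge variables whose dimensions do not match.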
earthkit-data is only one member of the larger earthkit family. The full earthkit ecosystem also includes:
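| Package | Purpose |
|---|---|
| earthkit-data | Data access and format handling (the topic of this notebook) |
| earthkit-maps | Map projections and geographic information handling |
| earthkit-plots | Visualization of meteorological data |
| earthkit-regrid | Grid interpolation and resampling |
| earthkit-transforms | Data transformations and statistical computation |
| earthkit-climate | Climatological analysis and computation |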
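Together these packages form a complete toolchain for meteorological science workflows: data acquisition → processing → analysis → visualization.
Overall, the tool feels fairly industrial, and it suits end-to-end projects or processing very large volumes of data.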
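- from_source() is the single loading entry point and hides the differences between underlying formats
- FieldList / Field are the core abstractions, providing uniform metadata access and data export interfaces
- sel() / order_by() / slicing make data filtering very intuitive
- to_xarray() / to_numpy() / to_pandas() connect the data to the existing scientific computing ecosystem

At the time of writing, earthkit-data is still at the Release Candidate stage and has not published a stable 1.0 release. The API may change before the final release, so for production use it is advisable to follow the official Migration Guide.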