Can Europe train a frontier AI model on the compute it owns?
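Summary
The report (EuroMesh) argues that, rather than waiting for new 1 GW-class datacenters (which take a mean of 7.6 years to connect to the grid), Europe could run low-communication (DiLoCo-style) federated training on public compute it already has, such as EuroHPC, and bring delivery of a frontier-class model forward to 2028. The core logic is that the time gained by skipping the grid-connection queue outweighs the loss in training efficiency. The report also names the practical obstacles: heterogeneous compute, the difficulty of political coordination, and the fact that very-large-scale distributed training has not yet been validated above about 10 billion parameters.
Why read this
Large-scale compute is constrained by grid-connection lead times (a mean of 7.6 years). You can use the report's quantitative feasibility analysis of DiLoCo-style federated training as an engineering reference when deciding how to build large models on distributed or heterogeneous compute.
Original text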
EuroMesh
A sourced model and short report on a single question:
Can Europe stand up a sovereign frontier-class AI model now, by federating the public compute it already owns, while the gigawatt datacenters it is planning take years to connect to the grid?
The answer the model gives is yes, as a stopgap. Europe already operates tens of exaflops of public AI compute across the EuroHPC supercomputers and the national AI Factories. A 1 GW campus, by contrast, waits a mean of 7.6 years for grid power. Federated with low-communication (DiLoCo-style) training, the compute Europe already has can deliver a frontier-class model around 2028, against around 2033 for a new gigawatt campus.
Read this first
The report is paper/compute-at-home.pdf (built from paper/compute-at-home.md). It is a short, sourced read aimed at a general audience. Title: "Do We Need OpenAI or Anthropic? Europe Has Tens of Exaflops at Home."
What is in the repo
euromesh/
├── README.md
├── requirements.txt
├── paper/
│   ├── compute-at-home.md / .pdf   the report
│   ├── grid_queue_dataset.md       sourced 1 GW vs 40 MW grid-connection lead times
│   ├── eurohpc_substrate.md        sourced EU public-compute inventory + "is it enough" math
│   ├── build_pdf.sh, _report.typ   PDF build (pandoc + typst)
│   └── figures/                    generated charts (PNG + SVG)
└── model/
    ├── MODEL_SPEC.md               the model specification (equations, params, invariants)
    ├── RESULTS.md                  full results, scenarios, sensitivity, caveats
    ├── run.py                      regenerates every CSV and figure
    ├── src/                        the three-layer model (efficiency, ramp, regions)
    ├── params/                     hardware.yaml, training.yaml, regions.csv + SOURCES
    ├── results/                    generated CSVs (do not hand-edit)
    └── tests/                      pytest suite (52 tests) + invariant self-checks
The model in one paragraph
The model has three layers. Layer 1 is the per-FLOP efficiency of low-communication training (how much the DiLoCo penalty costs). Layer 2 is time-to-availability (when sites energize and how fast cumulative compute accrues). Layer 3 is a per-region scorecard on time, cost, carbon, and feasibility. The headline result is set almost entirely by Layer 2 and reduces to one inequality: the federation wins if its sites are online before a gigawatt campus is. The training efficiency penalty is second-order, as the sensitivity tornado confirms.
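The Layer 2 inequality can be sketched in a few lines of Python. This is a minimal illustration, not the code in model/src/: the Option class, both functions, and the assumption that delivery is simply the online date plus training time divided by efficiency are all hypothetical. The numeric inputs are placeholders chosen to land near the headline dates and are not the values in model/params/.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    """One way of getting a model trained (hypothetical structure)."""
    name: str
    online_year: float        # year the compute is energized and usable
    ideal_train_years: float  # training time at 100% per-FLOP efficiency
    efficiency: float         # 1.0 means no low-communication penalty

    def delivery_year(self) -> float:
        # A per-FLOP penalty stretches training time by 1 / efficiency.
        return self.online_year + self.ideal_train_years / self.efficiency


def federation_wins(fed: Option, campus: Option) -> bool:
    """True if the federated run delivers before the campus run."""
    return fed.delivery_year() < campus.delivery_year()


def break_even_efficiency(fed: Option, campus: Option) -> float:
    """Efficiency below which the federation stops winning."""
    head_start = campus.delivery_year() - fed.online_year
    if head_start <= 0:
        raise ValueError("campus delivers before the federation is online")
    return fed.ideal_train_years / head_start


# Illustrative inputs only; NOT taken from model/params.
fed = Option("federation", online_year=2027.0,
             ideal_train_years=1.0, efficiency=0.8)
campus = Option("1 GW campus", online_year=2032.5,
                ideal_train_years=0.5, efficiency=1.0)

print(fed.delivery_year(), campus.delivery_year())
print(federation_wins(fed, campus))
print(break_even_efficiency(fed, campus))
```

With these placeholder inputs the federation would have to fall below roughly 17% per-FLOP efficiency before the campus caught up. That is the sense in which the online-date gap dominates and the efficiency penalty is second-order.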
Run it
python3 -m venv .venv
.venv/bin/pip install -r requirements.txt
.venv/bin/python -m model.run # regenerates all CSVs in model/results and figures in paper/figures
.venv/bin/python -m pytest model/tests/ # 52 passed
bash paper/build_pdf.sh # rebuilds paper/compute-at-home.pdf (needs pandoc + typst)
The run is reproducible from a clean tree: deleting every output and re-running exits 0 and regenerates everything.
Data and sources
Grid-connection lead times: paper/grid_queue_dataset.md, seven regions, per-region primary sources, anchored by the AWS "up to seven years" statement and the IEA 2-to-10-year range, with limitations stated.
EU public compute: paper/eurohpc_substrate.md, the EuroHPC flagships and the 19 AI Factories, accelerator counts and the training-time math.
Model parameters: model/params/SOURCES.md and model/params/SOURCES_hardware_training.md, with confidence tags.
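For readers who want a feel for the training-time math, here is a back-of-envelope sketch. It uses the common rule of thumb that training a dense transformer costs about 6 x parameters x tokens FLOPs. Whether paper/eurohpc_substrate.md uses this exact formula is an assumption. Every function name and input value below is hypothetical, including the model size, token count, addressable fraction, and utilization.

```python
SECONDS_PER_DAY = 86_400


def training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate training cost via the 6 * N * D rule of thumb."""
    return 6.0 * n_params * n_tokens


def training_days(n_params: float, n_tokens: float, peak_flops: float,
                  addressable_fraction: float, mfu: float,
                  low_comm_efficiency: float) -> float:
    """Wall-clock days for one run on a shared, federated fleet.

    peak_flops            aggregate peak FLOP/s of the fleet
    addressable_fraction  share of the fleet granted to this run
    mfu                   model FLOPs utilization within each site
    low_comm_efficiency   per-FLOP efficiency of DiLoCo-style training
    """
    for name, value in (("addressable_fraction", addressable_fraction),
                        ("mfu", mfu),
                        ("low_comm_efficiency", low_comm_efficiency)):
        if not 0.0 < value <= 1.0:
            raise ValueError(f"{name} must be in (0, 1], got {value}")
    effective_flops = (peak_flops * addressable_fraction
                       * mfu * low_comm_efficiency)
    return training_flops(n_params, n_tokens) / effective_flops / SECONDS_PER_DAY


# Hypothetical inputs for illustration; not figures from the report.
days = training_days(
    n_params=100e9,             # 100B-parameter dense model
    n_tokens=2e12,              # 2T training tokens
    peak_flops=10e18,           # 10 exaflops of aggregate peak
    addressable_fraction=0.25,  # a quarter of the fleet is allocated
    mfu=0.35,
    low_comm_efficiency=0.8,
)
print(f"{days:.1f} days")
```

The addressable fraction is a separate argument because the caveats below treat it as a political decision rather than a hardware fact. Changing that one input shows how strongly the result depends on allocation.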
Honest caveats
The point of this repo is clarity, not novelty. The thesis rests on grid-queue lead times, which are sourced central estimates rather than observed figures (no European operator has yet energized a 1 GW point load). The compute is owned but not yet usable for one coordinated run: the EuroHPC machines are shared, batch-scheduled, and heterogeneous, so the addressable fraction is a political decision rather than a hardware fact. Frontier-scale distributed training is unproven above about 10B parameters today, so the target is a credible frontier-class model rather than a guaranteed 405B. All of this is in model/RESULTS.md and the report's caveats section. Figures and dated events are as of June 2026. This is an independent model and analysis, not peer-reviewed.