Conversation
Phase 1 of the matgl LAMMPS-Kokkos plugin. Adds matgl.ext.lammps.LAMMPSMatGLModel, a TorchScript-friendly nn.Module wrapping a TensorNet (PyG, extensive, no-Warp) Potential. Forward takes Cartesian positions, edge_index, integer image shifts, cell, atomic numbers, and a local-or-ghost mask, and returns total local energy, per-atom energies, forces, and virials via autograd of the strain trick. Mirrors the MACE-LAMMPS contract so a future pair_matgl[/kokkos] can load the artifact via torch::jit::load. Includes: - src/matgl/ext/_lammps.py + public matgl.ext.lammps re-export. - mgl create-lammps-model CLI subcommand. - 4 parity tests (eager vs Potential.forward, ghost-mask additivity, virial vs stress*volume, TorchScript save/load round-trip). Upstream PyG layers gained explicit type annotations and a few minor TorchScript-friendliness tweaks (tuple instead of dict in TensorEmbedding.message, edge_index[1] indexing, local pi in cosine_cutoff). No runtime behaviour changed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 2 of the LAMMPS-Kokkos plugin. Adds the C++ side that loads the
TorchScript artifact produced in Phase 1.
- lammps/src/ML-MATGL/pair_matgl.{cpp,h}: CPU/serial pair style. Requests
a full neighbor list with ghost atoms, builds (edge_index, unit_shifts,
positions, cell, atomic_numbers, local_or_ghost) tensors, calls the
scripted forward, and accumulates forces + the 3x3 virial back into
LAMMPS arrays.
- lammps/cmake/ML-MATGL.cmake: drop-in CMake snippet users `include()`
from a stock LAMMPS source tree. Toggles PKG_ML-MATGL, finds libtorch
via CMAKE_PREFIX_PATH, sets C++17.
- lammps/README.md: build instructions, input syntax, known limitations.
- lammps/tests/in.matgl_si: sample single-point deck.
- lammps/tests/python_reference.py: gold reference helper for parity
checks against the Python ASE calculator.
Also extends .codespellrc to allow the bare element symbols "Nd" and "Te"
that appear in the inline periodic table inside pair_matgl.cpp.
Phase 3 (pair_matgl/kokkos GPU variant) is not in this commit.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 3 of the LAMMPS-Kokkos plugin.
- lammps/src/KOKKOS/pair_matgl_kokkos.{cpp,h}: templated Kokkos variant
registered as `pair_style matgl/kk`. Walks the device-side neighbor
list with two parallel_for / parallel_scan kernels to count and fill
edge_index/unit_shifts on-device, wraps the resulting Kokkos Views as
torch tensors via from_blob, runs the scripted forward inside libtorch,
and scatters forces back into LAMMPS' Kokkos f-view. Coordinated GPU
pick from `lmp -k on g 1`. Single-GPU only — multi-rank Kokkos with
libtorch is unreliable (MACE issues #1294, #322); the CMake snippet
warns explicitly.
- lammps/cmake/ML-MATGL-KOKKOS.cmake: drop-in snippet, layered on top
of the base ML-MATGL.cmake.
- .github/workflows/lammps-build.yml: builds the CPU pair style inside
the public `lammps/lammps-build:ubuntu_latest` Docker image, drops a
CXX11-ABI libtorch tarball next to the source tree, configures LAMMPS
with PKG_ML-MATGL=ON, exports a tiny in-tree TensorNet via
LAMMPSMatGLModel, runs in.matgl_si, and diffs PotEng against a Python
reference written by the same export step.
- README + changes.md: document the Kokkos build flow, the CI workflow,
and the single-GPU constraint.
CUDA CI (actual GPU runs) is deferred to a self-hosted runner — the
plain ubuntu-latest GitHub runner has no GPU.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…mage The lammps/lammps:ubuntu_latest Docker image is pinned to an old LAMMPS release whose Pair / neighbor-list APIs predate the ones pair_matgl relies on. Switch to a vanilla ubuntu-latest runner that: - apt-installs build deps (gcc-12, cmake, ninja, fftw, libpng/jpeg). - Caches the libtorch download and the LAMMPS source clone separately so warm runs only pay for the package compile. - Builds against both ``stable_29Aug2024_update3`` and ``develop`` via a matrix so we get an early warning when LAMMPS' APIs drift. - Uses Ninja and -j 2 for a faster compile. - Uploads logs + model artifacts on failure. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Switch from a vanilla ubuntu-latest runner with apt-installed deps to the materialyzeai/lammps:latest image, which is expected to ship LAMMPS source, libtorch, and the build toolchain pre-staged. The workflow probes a few common paths for the LAMMPS source and libtorch inside the image, falls back to downloading/cloning if either is missing, and threads the resolved paths through to the configure / run / diff steps. This keeps the from-scratch path working as a safety net while the image bakes in the heavy dependencies. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The materialyzeai/lammps image is a runtime image and doesn't ship curl, unzip, or git, which the rest of the workflow depends on (actions/checkout shells out to git; libtorch is downloaded with curl + unzip; the uv installer is curl-piped). Add a first step that detects the package manager and installs the minimal tooling. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The materialyzeai/lammps image ships the `lmp` binary but none of the build toolchain, so the previous run failed at the cmake step with "Ninja not found" / "CXX compiler not set". Extend the bootstrap step to install gcc/g++, cmake, ninja-build, pkg-config, FFTW/PNG/JPEG dev headers, and a Python dev/venv for uv. Covers apt, dnf, and apk package managers. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…onfig The CPU libtorch tarball ships a Caffe2 cmake config that references MKL_INCLUDE_DIR in its INTERFACE_INCLUDE_DIRECTORIES even though it doesn't link against MKL. Without MKL on the system the generator step fails with "non-existent path MKL_INCLUDE_DIR-NOTFOUND". Pass an existing harmless path to satisfy the imported target. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…egisters The previous run built pair_matgl.cpp without errors but LAMMPS failed with `Unrecognized pair style 'matgl'`: LAMMPS' style-header generator only scans top-level src/*.h plus *enabled* package subdirs, and ML-MATGL isn't in LAMMPS' standard package list, so our PairStyle() macro never got into style_pair.h. Copy the .cpp/.h straight into LAMMPS' src/ for CI (the production build instructions still use src/ML-MATGL/), and switch to a minimal inline cmake fragment that just adds Torch to the link line. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`fx`, `fy`, `fz` are not thermo_style keywords (they're dump-style keywords). The CI deck failed at LAMMPS' thermo parser; replace the per-atom force columns with stress tensor components and emit the forces via a `dump custom` instead. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Just a note that the failing tests are due to the new uploaded TensorNet pre-trained models. It has nothing to do iwth the LAMMPS stuff. |
Restructures the LAMMPS export wrapper from a single TensorNet-specific flow into a kernel-composition pattern: the outer ``LAMMPSMatGLModel`` holds the strain / autograd machinery (single source of truth) and an architecture-specific ``_TensorNetKernel`` or ``_M3GNetKernel`` for the per-atom feature compute. The M3GNet kernel needed several pure-tensor replacements before it could survive ``torch.jit.script`` + ``save``: - ``create_line_graph_torch`` / ``_compute_3body_indices_torch`` — vectorized port of M3GNet's numpy/Python-loop 3-body indexer; assumes edges are sorted by source atom (already the case for both ``find_points_in_spheres`` and our LAMMPS neighbor walk). - ``_y_l0_torch`` — Legendre-polynomial recurrence for Y_l^0(cos θ). - ``_m3gnet_three_body_basis_torch`` — combines ``spherical_bessel_smooth`` (already pure tensor) with the Y_l^0 basis using the ``use_phi=False`` recipe from ``combine_sbf_shf``. - The kernel skips ``model.bond_expansion`` and ``model.basis_expansion`` entirely; those wrap ``SphericalBesselFunction``, whose sympy-lambdified Python ``funcs`` list doesn't survive ``script.save``. Other small TorchScript-friendliness tweaks: - ``MLP`` properties marked ``@torch.jit.unused`` (they use ``reversed(ModuleList)`` which TorchScript rejects). - ``EmbeddingBlock.forward`` typed; ``state_attr`` properly handled as Optional. - ``M3GNetGraphConv.state_update_`` narrows ``state_update_func: Optional[Module]`` via assert; ``self.dropout`` ``is not None`` checks made explicit. - ``spherical_bessel_smooth`` no longer relies on closed-over module globals (``pi``, ``sqrt``). - Iteration over the M3GNet block lists uses ``zip(...)`` instead of integer-indexing both ``ModuleList``s (TorchScript requires literal indices). Currently supported M3GNet config: extensive head, SphericalBessel rbf with ``use_smooth=True``, ``use_phi=False`` — the standard PES setup. Tests parametrize ``test_lammps_export.py`` over both TensorNet and M3GNet (8 tests, all green); the eager and scripted paths match ``Potential.forward`` to fp-precision on a 4-atom Mo-S supercell. QET remains deferred (needs charge-head export + LAMMPS-side ``atom_style charge`` wiring). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…MatPES-PBE-2025.2 weights. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #770 +/- ##
==========================================
- Coverage 95.16% 94.72% -0.44%
==========================================
Files 81 83 +2
Lines 7388 7647 +259
==========================================
+ Hits 7031 7244 +213
- Misses 357 403 +46 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
# Conflicts: # changes.md
shyuep
left a comment
There was a problem hiding this comment.
Automated PR review generated by Claude (Opus 4.7) on behalf of @shyuep. Based purely on the diff and current GitHub Actions state — no code was executed locally.
This is a substantial and well-scoped piece of work — three logically separate phases (Python TorchScript export, CPU pair_matgl, Kokkos pair_matgl/kk) all delivered together with a real CI gate. Mirroring the MACE-Kokkos pattern is the right design choice.
CI status: all functional checks green — pair_matgl (CPU) build + in.matgl_si parity diff against the Python reference is the most important signal here, and that's passing in 5m52s. lint, test_pyg, test_dgl, coverage, and pre-commit.ci are all green. The only red marks are codecov/patch and codecov/project, expected for a large addition where ~all of lammps/ (C++) and the Kokkos kernels are not covered by Python tests. Not a blocker.
Strengths:
- The 4 parity tests in
tests/ext/test_lammps_export.py(eager vsPotential.forward, ghost-mask additivity, virial vsstress · volume, TorchScript round-trip) hit exactly the right invariants. These are the things that silently break when a wrapper drifts. - Honest scoping in the description re: single-GPU Kokkos limitation, with concrete upstream MACE issue references (#1294, #322). Good engineering hygiene.
materialyzeai/lammps:latestDocker image + cached libtorch / LAMMPS source keeps CI runtime sane.1e-3 eVPotEng tolerance against the Python reference is a tight, meaningful gate.
Things to verify before announcing the feature (already in your test plan but flagging for the record):
- Single-GPU
pair_matgl/kkparity check on real CUDA hardware — the Kokkos variant is unexercised in CI. Consider standing up a self-hosted CUDA runner or at minimum a documented manual smoke procedure committed underlammps/tests/so it's reproducible. - End-to-end run with a real foundation potential (
materialyze/TensorNet-MatPES-r2SCAN) before announcing — the in-tree tiny TensorNet exercises the plumbing but doesn't catch numerical-stability quirks at production scale. - Multi-rank guard: since multi-rank Kokkos+libtorch is unreliable, a clear
error->all(...)ifcomm->nprocs > 1inpair_matgl/kksetup would protect users from surprising hangs. (Worth checking whether this is already inpair_matgl_kokkos.cpp.)
Minor / non-blocking:
src/matgl/ext/_lammps.pyis 498 lines — fine for now, but if the export-wrapper logic grows it might be worth splitting theLAMMPSMatGLModelclass from the helper functions (graph construction, ghost-mask handling).- Pre-existing
_pymatgen_pyg.pymypy errors are noted as out-of-scope; consider opening a follow-up issue to track those so they don't get forgotten. - The PyG layer tweaks in
_basis.py,_core.py,_embedding*.py,_graph_convolution_pyg.py,cutoff.pyfor TorchScript-friendliness are pure additions per the description and the 178-test PyG suite stays green — good. Worth one targeted test that explicitly TorchScripts the resultingPotentialend-to-end (not just the wrapper) so future PyG-layer changes can't silently break scriptability.
Overall: ready to merge once the items in your own test-plan checklist (single-GPU Kokkos parity, real foundation-potential end-to-end run) are checked off. The Phase 1+2 surface is in good shape.
Signed-off-by: Shyue Ping Ong <shyuep@users.noreply.github.com>
Signed-off-by: Shyue Ping Ong <shyuep@users.noreply.github.com>
|
🤖 Automated PR review (generated by Claude on behalf of @shyuep)
Generated by Claude Code |
shyuep
left a comment
There was a problem hiding this comment.
Automated PR review generated by Claude (Opus 4.7) on behalf of @shyuep. Follow-up note — no code was executed locally.
Since the prior automated review at 6b4edfb, the only new commits on this branch are four Merge branch 'main' into lammps merges (5df9481, 3faa2c8, c32dcac, af0f6ed) — no new functional changes. The substantive review still stands. Two CI signals worth flagging from the latest run:
- New
lintfailure (real, from a ruff bump): ruff0.15.12now flagsRET504 Unnecessary assignment to 'atomic' before 'return' statementin _lammps.py:255. One-line fix — drop the intermediateatomicvariable and return the expression directly. Thepre-commit.cifailure is the same rule via the autofix path. test_pyg (ubuntu-latest, 3.13, ops)is not a real failure — the runner hitremote: Internal Server Error(HTTP 500) ongit fetch ... pull/770/mergethree times in a row before the job aborted. Pure GitHub infra flake; re-run after the lint fix lands.
codecov/patch and codecov/project remain red for the same reason called out previously (C++/Kokkos kernels uncovered by Python tests) — non-blocking.
Otherwise unchanged: ready to merge once the items in your own test-plan checklist (single-GPU Kokkos parity, real foundation-potential end-to-end run) are checked off.
…kernel The pretrained TensorNet-MatPES-r2SCAN checkpoint uses rbf_type='SphericalBessel' with smooth=True. SphericalBesselFunction.forward dispatches to _call_smooth_sbf, which is @torch.jit.ignore (it iterates a list of sympy-lambdified callables), so torch.jit.script(...).save(...) errors with "Could not export Python function call '_call_smooth_sbf'". The previous _TensorNetKernel blindly stored model.bond_expansion, and the existing test only exercised rbf_type='Gaussian', missing this path. Add _SmoothSBFExpansion, a tiny nn.Module wrapping the pure-tensor spherical_bessel_smooth (already used by _M3GNetKernel for the same reason; output matches BondExpansion(SphericalBessel, smooth=True) to ~1e-7). _TensorNetKernel.__init__ swaps in this adapter when rbf_type is SphericalBessel, raises a clear NotImplementedError for the non-smooth case, and passes through Gaussian/ExpNorm bond_expansions unchanged. Test parametrization gains a tensornet_sb fixture exercising the SBF-smooth path; the export CLI now produces a loadable artifact for TensorNet-PES-MatPES-r2SCAN-2025.2. Also inline a stray `atomic = ...; return atomic` in _M3GNetKernel.forward that the new test_lammps_export run surfaced under ruff (RET504). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
nn.Module's __getattr__ types attribute access as Tensor | Module; the else branch assigning ``be`` directly to ``self.bond_expansion: nn.Module`` tripped --strict mypy. Cast at the point of access (same pattern used for potential.data_mean elsewhere in this file). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
🤖 Automated PR review (generated by Claude on behalf of @shyuep) — re-review after the latest commits. Two new functional commits since the prior review (
Generated by Claude Code — automated review, no code executed locally. Generated by Claude Code |
|
🤖 Automated PR review (generated by Claude on behalf of @shyuep) — re-review after the latest commits ( Reviewed the delta since the prior review from the GitHub diff + CI only — no code executed locally.
LGTM. |
Summary
Adds a LAMMPS plugin that runs MatGL TensorNet (PyG) potentials at production scale, mirroring the MACE-Kokkos pattern. Three phases:
matgl.ext.lammps.LAMMPSMatGLModelwraps aPotentialso it takes Cartesian positions, edge_index, integer image shifts, cell, atomic numbers, and a ghost mask, and returns total local energy / per-atom energies / forces / virials. Newmgl create-lammps-modelCLI subcommand. 4 parity tests (eager vsPotential.forward, ghost-mask additivity, virial vs stress·volume, TorchScript save/load round-trip).pair_matglCPU pair style underlammps/src/ML-MATGL/. Requests a full neighbor list with ghost atoms, builds (positions / edge_index / unit_shifts / cell / atomic_numbers / local_or_ghost) tensors, calls the scriptedforward, accumulates forces + 3×3 virial back into LAMMPS arrays.pair_matgl/kkunderlammps/src/KOKKOS/. Twoparallel_for/parallel_scankernels build edges on-device; Kokkos Views are wrapped as torch tensors viatorch::from_blob(zero-copy on matching device); device picked fromlmp -k on g 1. Single-GPU only — multi-rank Kokkos with libtorch is unreliable (MACE issues #1294, Bump boto3 from 1.34.160 to 1.34.161 #322).CI
A new
lammps-buildGitHub Actions workflow runs on every push that toucheslammps/, the export wrapper, or the workflow itself. It uses thematerialyzeai/lammps:latestDocker image, installs the build toolchain, downloads CXX11-ABI libtorch (cached), clones LAMMPS source (cached), exports a tiny in-tree TensorNet viaLAMMPSMatGLModel, builds LAMMPS with thepair_matglsource dropped intosrc/, runs thein.matgl_sideck, and diffsPotEngagainst the Python reference within1e-3 eV.The Kokkos variant is not exercised in CI — GitHub-hosted runners have no GPU. Hardware-accelerated CI on a self-hosted CUDA runner is the next step.
Upstream PyG layer tweaks
A few files in
src/matgl/layers/_*_pyg.pyandsrc/matgl/utils/cutoff.pygot minor TorchScript-friendliness changes (explicit type annotations, returning a tuple instead of aDict[str, Tensor], locally-definedpi). Pure additions, no runtime behavior changed; the full PyG test suite (178 tests) still passes.Test plan
uv run pytest tests/ext/test_lammps_export.py— 4 tests, all green.uv run pytest tests/— 178 passed, 31 backend-skipped.uv run mypy -p matgl— only the 2 pre-existing_pymatgen_pyg.pyerrors remain.uv run ruff check src tests lammps— clean.lammps-buildGitHub Actions workflow — green (run 25344207159, 5m52s end-to-end including LAMMPS source build).pair_matgl/kkparity check — needs CUDA hardware; deferred to follow-up.materialyze/TensorNet-MatPES-r2SCAN) — recommended before announcing the feature.🤖 Generated with Claude Code