Machine learning for computational materials discovery — benchmarking ML models on crystal stability prediction, thermodynamic property regression, and high-throughput materials screening. Covers graph neural network interatomic potentials, compositional feature engineering, and discovery-rate evaluation frameworks. Built on the Matbench and Matbench Discovery benchmark suites from the Materials Project.
Domain: ML for Materials Discovery
Application of machine learning — particularly graph neural networks and gradient boosting on compositional/structural descriptors — to predict materials properties (formation energy, band gap, elastic moduli, thermodynamic stability) and accelerate computational screening of novel inorganic crystals. Benchmarked against DFT ground truth on standardized datasets from the Materials Project.
Period: 2020–present (ML era of materials informatics)
Population: Inorganic crystalline materials (oxides, sulfides, intermetallics, etc.); benchmark datasets derived from the Materials Project and WBM database (256,963 materials)
Level: material
Constructs
formation_energy_per_atom
Formation Energy per Atom
The energy released or required to form a crystal from its constituent elements in their standard reference states, normalized by the number of atoms. Measured in eV/atom via DFT calculations. The primary regression target in materials property prediction benchmarks.
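The definition above can be sketched directly. All energies below are made-up illustrative numbers, not real DFT values; the function name and the MgO example are hypothetical.

```python
# Sketch: formation energy per atom from total energies (illustrative
# values only). E_f = (E_total - sum_i n_i * mu_i) / N, where mu_i is the
# reference energy per atom of element i in its standard state.

def formation_energy_per_atom(e_total, composition, elemental_refs):
    """composition: {element: atom count}; elemental_refs: eV/atom of
    each element in its standard reference state."""
    n_atoms = sum(composition.values())
    e_refs = sum(n * elemental_refs[el] for el, n in composition.items())
    return (e_total - e_refs) / n_atoms

# Hypothetical MgO example with made-up energies:
e_f = formation_energy_per_atom(
    e_total=-12.0,                      # eV for one MgO formula unit
    composition={"Mg": 1, "O": 1},
    elemental_refs={"Mg": -1.5, "O": -4.5},
)
# (-12.0 - (-6.0)) / 2 = -3.0 eV/atom (negative -> energy released)
```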
energy_above_convex_hull
Energy Above Convex Hull
The thermodynamic distance of a material from the convex hull of stable phases in compositional space, measured in eV/atom. Materials with e_above_hull = 0 are thermodynamically stable; positive values indicate metastability. The key stability criterion in high-throughput screening.
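For a binary A–B system the construction reduces to a lower convex envelope over composition. A minimal sketch with hypothetical phases and formation energies (a brute-force pairwise interpolation, not the general n-dimensional hull algorithm used in practice):

```python
# Sketch: energy above hull in a binary A-B system (illustrative energies).
# The hull energy at composition x is the lower convex envelope of the
# formation energies of competing phases; e_above_hull = E_f(x) - E_hull(x).

def lower_hull_energy(x, phases):
    """phases: list of (x_B fraction, E_f in eV/atom) including the
    elemental endpoints (0, 0.0) and (1, 0.0). Returns the hull energy
    at x as the minimum over linear combinations of phase pairs."""
    best = float("inf")
    for xa, ea in phases:
        for xb, eb in phases:
            if xa < xb and xa <= x <= xb:
                t = (x - xa) / (xb - xa)
                best = min(best, (1 - t) * ea + t * eb)
    return best

phases = [(0.0, 0.0), (0.5, -1.0), (1.0, 0.0)]   # A, stable AB, B
candidate = (0.25, -0.3)                          # hypothetical A3B phase
e_hull = lower_hull_energy(candidate[0], phases)  # -0.5 eV/atom at x=0.25
e_above_hull = candidate[1] - e_hull              # 0.2 eV/atom -> metastable
```

Production codes (e.g. pymatgen's phase-diagram module) solve the same problem in arbitrary compositional dimension.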
thermodynamic_stability
Thermodynamic Stability
Binary classification of whether a crystal is thermodynamically stable (on the convex hull) or not. In Matbench Discovery, 15.3% of WBM test structures are stable. The primary classification target for discovery benchmarks.
discovery_acceleration_factor
Discovery Acceleration Factor (DAF)
The ratio of a model's precision among its top-k screening candidates to the precision of random selection (the fraction of stable materials in the dataset). Quantifies how many more stable materials a model identifies per DFT calculation than untargeted screening: a DAF of 6 means 6x more discoveries per DFT calculation than random selection. The primary efficiency metric in Matbench Discovery.
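The metric is simple to state in code. A toy sketch (hypothetical scores and labels; function name is illustrative):

```python
# Sketch of the DAF definition: precision of the model's top-k picks
# divided by the dataset's stable-material prevalence (random baseline).

def discovery_acceleration_factor(scores, is_stable, k):
    """scores: model stability scores (higher = more likely stable);
    is_stable: ground-truth booleans; k: screening budget."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = sum(is_stable[i] for i in order[:k])
    precision_at_k = hits / k
    prevalence = sum(is_stable) / len(is_stable)
    return precision_at_k / prevalence

# Toy example: 10 candidates, 2 stable (prevalence 0.2); the model puts
# one stable material in its top 2 -> precision 0.5 -> DAF 2.5.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0]
stable = [True, False, False, True, False, False, False, False, False, False]
daf = discovery_acceleration_factor(scores, stable, k=2)
```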
band_gap
Band Gap
The energy difference between the valence band maximum and conduction band minimum in a crystalline material, measured in eV via DFT (PBE functional). Determines whether a material is metallic (0 eV), semiconducting, or insulating. A key target in Matbench regression tasks.
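The metal/semiconductor/insulator distinction is a simple threshold on the gap. A sketch (the 3 eV insulator cutoff is a conventional choice, not defined by the benchmark):

```python
# Sketch: coarse electronic classification from a PBE band gap.
# The insulator threshold is a conventional, assumed value.

def classify_gap(gap_ev, insulator_threshold=3.0):
    if gap_ev <= 0.0:
        return "metal"
    return "semiconductor" if gap_ev < insulator_threshold else "insulator"

labels = [classify_gap(g) for g in (0.0, 1.1, 5.4)]
# ['metal', 'semiconductor', 'insulator']
```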
gnn_interatomic_potential
Graph Neural Network Interatomic Potential (GNN-IP)
A machine-learned force field that maps crystal graph inputs to total energies, atomic forces, and stresses using message-passing neural networks. Trained on DFT relaxation trajectories (e.g., MPtrj, ~1.6M structures), enabling geometry optimization at near-DFT accuracy but orders of magnitude faster. Examples: M3GNet, CHGNet, MACE-MP, SevenNet.
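The core crystal-graph-to-energy mapping can be sketched in a few lines. This is an untrained toy with random weights, illustrative only: real GNN-IPs such as CHGNet or MACE add rotational equivariance, cutoff-radius graph construction, and force/stress heads via automatic differentiation.

```python
import numpy as np

# Minimal message-passing sketch: crystal graph -> total energy.
# Random weights; structure of the computation, not a usable potential.

rng = np.random.default_rng(0)
W_msg = rng.normal(size=(8, 8)) * 0.1   # message weights
W_upd = rng.normal(size=(8, 8)) * 0.1   # node-update weights
w_out = rng.normal(size=8) * 0.1        # per-atom energy readout

def total_energy(node_feats, edges):
    """node_feats: (n_atoms, 8) element embeddings; edges: (i, j) neighbor
    pairs within a cutoff radius. Returns a scalar total energy."""
    h = node_feats
    for _ in range(3):                        # 3 message-passing steps
        msgs = np.zeros_like(h)
        for i, j in edges:
            msgs[i] += np.tanh(h[j] @ W_msg)  # aggregate neighbor messages
        h = np.tanh(h @ W_upd + msgs)         # update node states
    return float(np.sum(h @ w_out))           # sum per-atom energies

# Toy 3-atom fragment with a triangle of (directed) bonds:
feats = rng.normal(size=(3, 8))
edges = [(0, 1), (1, 0), (1, 2), (2, 1), (0, 2), (2, 0)]
e = total_energy(feats, edges)
```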
mean_absolute_error_materials
Mean Absolute Error (MAE) for Property Prediction
Primary regression metric in Matbench: average absolute difference between predicted and DFT-computed material properties (eV/atom for energies, eV for band gaps, GPa for moduli). Lower is better; state-of-the-art models achieve ~0.02–0.05 eV/atom for formation energy.
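As a quick illustration of the metric with made-up predictions in the state-of-the-art error range:

```python
# Sketch: MAE between predicted and DFT formation energies (eV/atom).
# Values are invented for illustration.

def mae(pred, true):
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(pred)

err = mae([-3.01, -2.48, -0.95], [-3.00, -2.50, -1.00])
# (0.01 + 0.02 + 0.05) / 3 ~ 0.027 eV/atom
```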
Findings
GNN interatomic potentials (MACE-MP, CHGNet, SevenNet) achieve Discovery Acceleration Factors of 5–6x on the WBM test set, compared to ~1x for random baseline and ~2x for simpler one-shot GNN predictors like MEGNet.
Direction: positive
Confidence: strong
Method: benchmark evaluation on WBM holdout set (N=10,000 unique prototypes)
Only 15.3% of WBM test structures are thermodynamically stable (energy above the convex hull ≤ 0 eV/atom), establishing the random discovery baseline for computing DAF.
Direction: null
Confidence: strong
Method: DFT convex hull analysis of WBM dataset (256,963 materials)
Models trained on geometry-relaxed structures significantly outperform those using unrelaxed (initial) structures for stability prediction, demonstrating that structural relaxation quality is a key bottleneck.
Direction: positive
Confidence: strong
Method: ablation comparison of relaxed vs. unrelaxed inputs across 45 model submissions
Graph neural network models (coGN, coNGN, MEGNet) systematically outperform composition-only models on structure-dependent properties like elastic moduli and phonon frequencies, while performing comparably on formation energy where composition is highly predictive.
Direction: positive
Confidence: strong
Method: cross-validated MAE comparison across 14 Matbench tasks, 28 algorithms
Gradient boosted trees with Magpie compositional features achieve competitive performance on formation energy prediction (MAE ~0.08 eV/atom) despite requiring no structural information, demonstrating the strength of composition-based features for chemically smooth properties.
Direction: positive
Confidence: moderate
Method: Matbench cross-validation, gradient boosting with Magpie featurization
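Magpie-style features are composition-weighted statistics of elemental properties. A minimal sketch (the two properties shown use real periodic-table values for atomic number and Pauling electronegativity, but Magpie proper uses ~22 properties and more statistics; the function name is illustrative):

```python
# Sketch of Magpie-style compositional featurization: weighted statistics
# of elemental properties, fed to a gradient boosted tree regressor.

ELEM_PROPS = {           # {element: (atomic number, electronegativity)}
    "Fe": (26, 1.83),
    "O":  (8, 3.44),
}

def featurize(composition):
    """composition: {element: atomic fraction}. Returns the weighted mean
    and range of each elemental property as a feature vector."""
    feats = []
    n_props = len(next(iter(ELEM_PROPS.values())))
    for p in range(n_props):
        vals = [ELEM_PROPS[el][p] for el in composition]
        fracs = [composition[el] for el in composition]
        feats.append(sum(f * v for f, v in zip(fracs, vals)))  # mean
        feats.append(max(vals) - min(vals))                    # range
    return feats

x = featurize({"Fe": 0.4, "O": 0.6})   # e.g. Fe2O3
# [mean Z, Z range, mean electronegativity, electronegativity range]
```

In practice the matminer library provides this featurization, and the resulting vectors feed standard gradient boosting implementations.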
CHGNet, trained on 1.5M MPtrj DFT trajectory frames with magnetic-moment supervision, achieves a force MAE of ~0.06 eV/Å and predicts the energies of DFT-relaxed structures to within ~0.03 eV/atom for the majority of Materials Project entries.
Direction: positive
Confidence: strong
Method: held-out test set evaluation on Materials Project data; phonon benchmark