Project: Training an AI to play Axis & Allies from scratch using AlphaZero-style self-play
Architecture:
Custom Rust game engine (PyO3 bindings to Python) running the full A&A ruleset: purchases, combat moves, naval transports, amphibious assaults, strategic bombing, tech research, the works
Graph Neural Network (GNN) treating the 143-territory board as a graph with attention-based message passing
MCTS (Monte Carlo Tree Search) for move selection during self-play
Factorized action space across 5 heads: action_type, from_territory, to_territory, unit_type, quantity
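To make the factorized action space concrete, here's a minimal sketch (not the project's actual code): a shared board embedding feeds five independent heads, and the probability of a full action is the product of the per-head probabilities. All dimensions (EMB=32, 8 action types, 12 unit types, 10 quantity buckets) are illustrative assumptions except the 143 territories.

```python
import numpy as np

rng = np.random.default_rng(0)
EMB = 32  # illustrative embedding size
HEAD_SIZES = {
    "action_type": 8,       # illustrative: purchase / combat_move / place / ...
    "from_territory": 143,  # one logit per territory
    "to_territory": 143,
    "unit_type": 12,        # illustrative unit count
    "quantity": 10,         # illustrative quantity buckets
}

# One linear head per action component, all reading the same embedding.
weights = {name: rng.normal(0, 0.1, (EMB, n)) for name, n in HEAD_SIZES.items()}

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def policy(embedding):
    """Return one probability distribution per head from a shared embedding."""
    return {name: softmax(embedding @ W) for name, W in weights.items()}

def action_prob(dists, action):
    """P(full action) = product over heads of P(that head's choice)."""
    p = 1.0
    for name, idx in action.items():
        p *= dists[name][idx]
    return p

dists = policy(rng.normal(0, 1, EMB))
```

The factorization keeps the output space tractable: five small heads instead of one head over every (type, from, to, unit, quantity) combination.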
Training pipeline:
Phase 1 - Supervised learning: Trained on 5,000 games of ProAI (rule-based AI) replays (~18M samples) to bootstrap the network
Phase 2 - Self-play: 195 generations, 1,460 games, the model playing against itself and learning from its own experience
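The self-play phase follows the standard AlphaZero shape. Here's a skeleton of that loop with stubbed-out game play and training; every name here (play_game, ReplayBuffer, the games-per-generation count) is illustrative, not the project's actual API.

```python
import random

class ReplayBuffer:
    """Bounded buffer of (state, policy_target, value_target) samples."""
    def __init__(self, capacity=100_000):
        self.data = []
        self.capacity = capacity

    def add(self, samples):
        self.data.extend(samples)
        self.data = self.data[-self.capacity:]  # drop oldest when full

def play_game(generation):
    """Stub for one self-play game: MCTS picks moves; each position is
    stored with the MCTS visit distribution as the policy target and the
    final game outcome (from the mover's perspective) as the value target."""
    outcome = random.choice([1, -1])  # Axis win / Allied win
    return [("state", "mcts_visit_dist", outcome) for _ in range(200)]

buffer = ReplayBuffer()
for generation in range(3):      # the project ran 195 generations
    for _ in range(5):           # games per generation (illustrative)
        buffer.add(play_game(generation))
    # train_step(network, buffer)  # gradient steps on sampled minibatches
```

The key property is that the network never sees human labels in this phase; its own MCTS-improved move distributions and game outcomes are the training signal.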
Key milestones:
The AI learned amphibious assaults on its own through self-play, averaging 12 per game across 17,500+ total amphibious attacks. This is a complex multi-step operation (load units onto transports, move the fleet, unload into combat) that the network figured out through MCTS search
Supervised loss dropped from 3.80 to 0.99 during initial training
100% Axis win rate vs the rule-based ProAI with MCTS search
Current challenge - the Allied problem:
Axis play is strong, but Allied play is weak (only 89 Allied wins out of 1,460 self-play games)
This mirrors the real game's asymmetry: the Allies need a coordinated multi-front strategy across 3 nations
The GNN-small (574K params) doesn't have enough capacity to learn this
Currently running: Upgrading from GNN-small (574K params) to GNN-large (8.6M params). Supervised training is at chunk 90/100 on 5,000 ProAI games. Holdout accuracy: action_type 100%, unit_type 88.2%, quantity 96.8%. Should finish tonight, then we restart self-play with the bigger brain and see if 15x more parameters can crack the Allied strategy problem.
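For reference, per-head holdout accuracy numbers like "unit_type 88.2%" are just argmax-of-logits vs. label, computed per head. A tiny sketch (synthetic data, not the project's evaluation code), with 11.8% of labels deliberately corrupted to reproduce that figure:

```python
import numpy as np

def head_accuracy(logits, labels):
    """Fraction of holdout samples where the head's top logit matches the label."""
    return float((logits.argmax(axis=1) == labels).mean())

rng = np.random.default_rng(1)
logits = rng.normal(size=(1000, 12))     # e.g. a unit_type head with 12 classes
labels = logits.argmax(axis=1).copy()
labels[:118] = (labels[:118] + 1) % 12   # corrupt 11.8% to mimic 88.2% accuracy
acc = head_accuracy(logits, labels)      # 0.882
```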
A small update on where the project stands. Keep in mind I'm a vibe coder! So while I did my best, there may still be problems I don't know about. Also, Anthropic got back to me:
Hi Christopher,
Thanks for reaching out.
Training a narrow, task-specific CNN for game AI does not conflict with our Usage Policy, which is primarily aimed at preventing competitive model distillation.
This authorization covers the use case as described. Individuals and hobbyists are welcome to use the Claude API. However, please note that use of the API is subject to our Commercial Terms of Service, regardless of whether you are an individual or representing a company.
Best,
Anthropic Privacy Team
I'll be having Opus 4.6 take a crack at some games to get some better data! Here's some napkin math for the Opus games: Good question. Let me estimate based on what a game looks like through the API:
Per decision point:
Input: ~8-15K tokens (strategy prompt + board state + legal moves list)
Output: ~200-500 tokens (move selection + brief reasoning)
Per game (~150 strategic decisions per side after --auto-tactical):
Input: ~1.5M tokens
Output: ~50K tokens
Opus 4.6 API pricing ($15/M input, $75/M output):
Games   Input cost   Output cost   Total
1       ~$22         ~$3.75        ~$26
10      ~$220        ~$37          ~$260
50      ~$1,100      ~$188         ~$1,300
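That table can be reproduced from the per-game token estimates and the stated prices. A sketch of the arithmetic (token counts and prices are the ones quoted above, nothing pulled from a billing API):

```python
IN_TOKENS_PER_GAME = 1_500_000   # ~1.5M input tokens per game
OUT_TOKENS_PER_GAME = 50_000     # ~50K output tokens per game

def game_cost(games, in_price_per_m=15.0, out_price_per_m=75.0):
    """Return (input_cost, output_cost, total) in dollars at Opus-tier pricing."""
    in_cost = games * IN_TOKENS_PER_GAME / 1e6 * in_price_per_m
    out_cost = games * OUT_TOKENS_PER_GAME / 1e6 * out_price_per_m
    return in_cost, out_cost, in_cost + out_cost

# one Opus game: $22.50 input + $3.75 output = $26.25, i.e. the ~$26 row above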
But consider: If the llm_vs_llm.py accumulates conversation history (likely), later decisions get much more expensive as context grows. Resetting context each turn keeps costs down.
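The accumulating-history effect is worth quantifying: if every prior exchange is re-sent, decision k pays for all k earlier turns, so total input tokens grow quadratically with decisions per game, while a reset-each-turn setup stays linear. A back-of-envelope comparison (per-decision token counts are illustrative assumptions):

```python
PROMPT = 10_000   # ~tokens per fresh decision (strategy + board + legal moves)
REPLY = 350       # ~tokens per model reply
DECISIONS = 150   # strategic decisions per side, per the estimate above

# Accumulating: decision k re-sends all k earlier (prompt + reply) exchanges.
accumulating = sum(PROMPT + k * (PROMPT + REPLY) for k in range(DECISIONS))

# Resetting: every decision sends only the fresh prompt.
resetting = DECISIONS * PROMPT   # 1.5M tokens, matching the per-game estimate
```

Under these assumptions the accumulating version is roughly 75x more input tokens per game, which is why resetting context each turn matters so much for cost.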
Cheaper alternatives that might still be valuable:
Sonnet 4.6 ($3/$15 per M): Same games for ~$50-60 for 10 games, 5x cheaper
Haiku 4.5 ($0.80/$4 per M): ~$15 for 10 games, but quality drops
Is it worth it? For this project specifically, even 10 high-quality LLM games as Allies could be extremely valuable: the model is starving for good Allied play examples. ProAI plays Allies poorly (10% win rate), so LLM games would be a completely different distribution of Allied strategy. You could mix those into the training buffer as "golden data."
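One simple way to mix scarce "golden data" into a much larger self-play buffer is weighted sampling, so a few hundred LLM-game samples aren't drowned out by tens of thousands of self-play samples. A sketch (the 4x weight and all sizes are illustrative, not tuned values from the project):

```python
import random

random.seed(0)
selfplay = [("selfplay", i) for i in range(10_000)]
golden = [("golden", i) for i in range(300)]   # e.g. samples from ~10 LLM games

def sample_batch(batch_size=256, golden_weight=4.0):
    """Draw a training batch where golden samples are golden_weight times
    more likely to be picked than self-play samples."""
    pool = selfplay + golden
    weights = [1.0] * len(selfplay) + [golden_weight] * len(golden)
    return random.choices(pool, weights=weights, k=batch_size)

batch = sample_batch()
```

With a 4x weight, golden samples land in batches at roughly four times their raw share of the pool; how much to upweight (versus simply duplicating the golden games) would need experimentation.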
My recommendation: start with 5-10 games on Sonnet (~$30-60) to test quality, then do a batch on Opus if the moves look smart. Would you want me to check on the llm_vs_llm.py setup to see how it's structured for cost?