Game Engine Rules (AI Training)
-
@rogercooper No, nothing against the TripleA AI yet. I have to do some different work for that to happen. I had to ask what the first principles of Axis & Allies were. Here is where it stands right now, posted at 9:09 pm EST:

What ProAI Does (The Heuristic)
It's a plan-caching agent — computes a full phase plan on first call, then drains it move-by-move. Key logic:
Purchasing: Assesses threat vs. defense around factories. Three stances: Panic (80% infantry), Stabilized (mixed), Dominating (30% naval). The USA gets a hardcoded Kill Germany First (KGF) override with 25% transports. (A minimal sketch of the stance logic follows this list.)
Attack planning: A 3-pass system — first grab overwhelming wins, then standard wins with a "can we hold it?" simulation, then risky VC/capital grabs.
Defense: A priority heatmap centered on capitals (11x multiplier). Includes a "Moscow Fighter Sink" — if Russia is in trouble, UK/USA fly all aircraft toward Moscow.
Amphibious: Yes, it HAS a shuck pipeline (load → sail → unload), with massive score bonuses for territories near Berlin (+25). But it's greedy — one route at a time.
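For anyone curious what a stance heuristic like that might look like in code, here is a minimal sketch. The names (`Stance`, `choose_purchase_stance`) and the thresholds are my own illustration, not ProAI's actual internals:

```python
from enum import Enum

class Stance(Enum):
    PANIC = "panic"            # heavy infantry, dig in
    STABILIZED = "stabilized"  # mixed ground purchase
    DOMINATING = "dominating"  # spare IPCs go to navy

def choose_purchase_stance(enemy_threat: float, own_defense: float) -> Stance:
    """Pick a purchase stance from the threat/defense ratio around factories.

    The cutoffs below are illustrative guesses, not ProAI's real numbers.
    """
    ratio = enemy_threat / max(own_defense, 1.0)
    if ratio > 1.5:
        return Stance.PANIC        # ~80% of spend on infantry
    if ratio > 0.8:
        return Stance.STABILIZED   # mixed infantry/artillery/armor
    return Stance.DOMINATING       # ~30% of spend on naval units
```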
Funny/Quirky Things It Does
Capital hoarding — Places ALL purchased units at the capital factory (5x multiplier). Russia stacks 15 infantry in Moscow while Leningrad falls.
Moscow Fighter Sink — UK and USA will abandon their own theaters to fly every fighter to Moscow if Russia is in trouble. Imagine Britain pulling fighters from North Africa while Germany walks into London.
Transport eagerness — Loads ALL ground units onto transports immediately, leaving ports completely undefended. "Everyone on the boat! ...who's guarding the harbor?"
Casualty order is context-blind — Always sacrifices cheapest unit first. Will throw away its last artillery (the unit that boosts infantry) even when keeping it would win the battle.
Zero tech research — Never invests in technology. Heavy bombers, jet fighters? Never heard of 'em.
No threat prediction — Reacts purely to current board state. Germany could stack 30 tanks on the Russian border and ProAI won't notice until they attack.
Estimating Amphibious Landings in Current Games
Based on what we know:
ProAI data had amphibious assaults (the shuck pipeline exists), but our diagnostic last session showed the GNN learned almost none of it — only 6 sea→land moves in a 7-round game, with 41 UK naval moves that went nowhere.
The network inherited ProAI's knowledge via supervised learning, but amphibious is a multi-step credit assignment problem (load → sail → land → fight → capture VC = 3-5 actions). The sparse reward signal makes it one of the hardest things to learn.
My estimate for Gen 6:

| Game | Moves | Rounds | Estimated Amphibious |
|------|-------|--------|----------------------|
| 1 | 1500 | R18 | 3-8 (long game, lots of time, but stalling) |
| 2 | 434 | R5 | 0-1 (quick Axis land win, no time for naval) |
| 3 | 1500 | R19 | 2-6 (draw = passive, minimal aggression) |
| 4 | 740 | R8 | 1-3 (medium game) |
| 5 | 1500 | R19 | 2-6 (draw, passive) |
| 6 | 217 | R3 | 0 (too fast) |
| 7 | 1500 | R18 | 3-8 (long game) |
Gen 6 total estimate: ~15-40 amphibious assaults across all 15 games, heavily concentrated in the long stalling games. That's roughly 1-3 per game — far below the ~6-8 per game that competent human play would show. We'll know for sure in about 20-30 minutes when Gen 6 finishes and prints the Naval: line. I could be wrong, and if I am, I'll have to look at what to do next.
We did get a signal from some ProAI play, correct?
Yes — the supervised training on 5,000 ProAI-vs-ProAI games definitely contained amphibious assaults. ProAI has an explicit shuck pipeline, and the USA especially has a hardcoded KGF strategy that buys 25% transports and targets territories near Berlin. So the signal is in the supervised checkpoint (gnn_small_final.pt). The problem is how diluted it is:
A typical ProAI game runs ~8-10 rounds, maybe ~500-1000 total actions
Of those, maybe 20-40 are naval/amphibious related (load + sail + unload)
That's roughly 3-5% of all actions being amphibious
The other 95% is land combat, purchasing, passes, placement
The network saw 17.97 million training samples. Even at 3-5%, that's potentially 500K-900K amphibious action examples. That's not nothing — the network should have learned something. The diagnostic from last session showed it did learn to move ships around (UK=41, Japan=31, USA=29 naval moves) but almost never the critical last step — actually unloading troops onto land. It learned "move ships" but not "land troops."
That's the credit assignment gap: the reward comes from capturing the territory (many steps later), not from the unload itself. Self-play with enough games should eventually close that gap, since the network will see that games where troops land tend to end in wins. I have to fix the game buffer first before tackling that. Will let you know more in about a week or two.
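To make the credit assignment gap concrete, here is a minimal sketch of how a sparse terminal win reward gets discounted back through a load → sail → unload sequence. The function and the toy numbers are illustrative, not from the actual training code:

```python
def discounted_returns(rewards: list[float], gamma: float = 0.99) -> list[float]:
    """Propagate a sparse terminal reward back to every earlier action."""
    returns, running = [], 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    return list(reversed(returns))

# A toy 40-step game: only the final step (capturing the VC) pays off.
# The load/sail/unload moves happened around steps 10-12.
rewards = [0.0] * 39 + [1.0]
returns = discounted_returns(rewards)
print(returns[10])  # ~0.75 — the load move gets only a diluted, indirect signal
```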
-
@kindwind I will reiterate my suggestion to use the Minimap scenario to train the AI. There is no naval component to that game.
Good play will require an AI to do some sort of threat prediction.
-
@rogercooper Got the first signal! There it is — amphibious tracking is live:
Game 1: 1500 moves, R18 — 1 amphib, 4 loads (troops loaded onto ships 4 times, landed once)
Game 2: 434 moves, R5 — 1 amphib, 2 loads (quick game but still had a landing!)
That's the signal we were looking for. Even in a short 5-round game, there's an amphibious assault happening. The golden data anchor is keeping the naval knowledge alive in the training mix. The numbers are still low (human-level play would be 6-8 amphibs per game), but the fact that they're non-zero even in these early games is promising. As training continues and the golden data keeps reinforcing "ships can land troops," we should see this number climb.
Everything is running with all the new features:
Golden lock (30% buffer = 15K ProAI samples; a sampling sketch follows this list)
Buffer persistence (saves every gen)
Crash recovery (auto-save + --resume)
Per-game amphibious display
Anti-stalling fixes from before
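Here's roughly what I mean by the golden lock, as a minimal sketch. The function name and structure are mine for illustration; the real buffer code differs:

```python
import random

def sample_training_batch(selfplay_buffer, golden_buffer,
                          batch_size=256, golden_frac=0.30):
    """Mix self-play experience with a locked fraction of ProAI 'golden' samples.

    Keeping ~30% ProAI data in every batch stops self-play drift from
    washing out rare behaviors like amphibious assaults.
    """
    n_golden = int(batch_size * golden_frac)
    batch = random.sample(golden_buffer, n_golden)
    batch += random.sample(selfplay_buffer, batch_size - n_golden)
    random.shuffle(batch)
    return batch
```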
If it crashes overnight, just run:

```
python -m colossus.selfplay --resume --golden-data data/proai_golden.pkl --generations 50 --sims 50 --games-per-gen 15 --train-batches 200 --eval-every 5 --eval-games 4 --batch-size 256 --lr 1e-4 --buffer-size 50000 --checkpoint-every 5 --temp-rounds 3
```
Game 1: 1500 moves, R18 — 1 amphib, 4 loads (troops loaded onto ships 4 times, landed once)
Game 2: 434 moves, R5 — 1 amphib, 2 loads (quick game but still had a landing!)

So was it an attack, or just a load?
Good catch — let me be precise about what those numbers mean. amphib=1 means 1 move where a unit went from a sea zone → land territory. That's an actual landing (unload onto an enemy/neutral coast).
loads=2 or 4 means units went from land → sea zone. That's loading troops onto transports.
So in Game 2: troops loaded onto ships twice, and one of those transported groups actually landed on a coast. That's a real amphibious operation — load, sail, unload.
But — there's a nuance I should flag. Our detection logic counts any sea→land move as "amphibious," even if it's just unloading at a friendly port during non-combat movement. A true amphibious assault is specifically unloading into a hostile territory during the combat move phase. We're not distinguishing between:
Amphibious assault (landing on enemy territory to fight)
Amphibious transfer (unloading at your own coast)
Right now the code just checks from_territory in sea_zones and to_territory not in sea_zones. It doesn't know if the destination is enemy-held, or whether the move happened during the combat or non-combat phase. So that amphib=1 might be an actual D-Day style assault, or it might just be the UK moving troops from a transport to a friendly coast. Still a sign of naval power projection either way, but worth knowing the signal is noisy.
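A refined check might look something like this — a minimal sketch, where the move/territory fields (`phase`, `owner`) are hypothetical stand-ins for whatever the real engine exposes:

```python
def classify_sea_to_land(move, territories, sea_zones, mover_allies):
    """Split sea→land moves into hostile assaults vs. friendly transfers."""
    if move.from_territory not in sea_zones or move.to_territory in sea_zones:
        return None  # not a sea→land move at all
    dest_owner = territories[move.to_territory].owner
    if move.phase == "combat_move" and dest_owner not in mover_allies:
        return "amphibious_assault"   # D-Day style hostile landing
    return "amphibious_transfer"      # unloading at a friendly coast
```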
Want me to wire in something like that to distinguish hostile landings from friendly transfers, or is the rough signal good enough for now?
What do we need right now? Just the rough signal?
Yeah, the rough signal is fine for now. What we care about at this stage is: are ships moving troops at all? Before the golden data, the answer was basically no — ships sailed around aimlessly. Now we're seeing loads and landings. Whether it's a hostile assault or a friendly transfer, both require the network to understand "put troops on ship → move ship → take troops off ship." That's the multi-step reasoning we needed.
We can refine hostile vs friendly detection later once the numbers are high enough to be worth splitting. Right now 1-2 per game is so low that subdividing it would just be noise.
Let's let this run overnight and check the generation summaries tomorrow. The key metrics to watch:
amphib/game trending up (golden data reinforcing naval play)
draw rate trending down (anti-stalling penalties propagating)
loss decreasing (network actually learning from the mixed buffer)
9:42, 2/28/2026! There will be other problems though; this is pretty hard stuff to crack for a non-coder.
-
@kindwind Interesting, possibly relevant post from today:
https://www.reddit.com/r/AxisAllies/comments/1rhzj4m/can_llms_play_this_game/
-
@ubernaut That's the plan. I am going to do an LLM-native AI. My final deliverable: a person takes their .tsvg game save of where they are at, gets advice, and then tells the LLM to play the other side. I think it's neat; others, not so much. I was also thinking people might like a Markov chain battle calculator in the final deliverable (see the sketch below).
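By Markov chain battle calculator I mean something along these lines — a minimal sketch using classic A&A hit probabilities, done as a Monte Carlo rollout of the battle's round-to-round Markov chain rather than an exact transition-matrix solve, to keep it short. Unit pips are the standard classic values; the function names are mine:

```python
import random

# Classic A&A hit chances (pips out of 6): attacker / defender
ATTACK = {"infantry": 1, "artillery": 2, "tank": 3}
DEFENSE = {"infantry": 2, "artillery": 2, "tank": 3}

def simulate_battle(attackers, defenders, trials=100_000):
    """Estimate attacker win probability by rolling out combat rounds.

    Both sides roll simultaneously each round, casualties come off
    cheapest-first (lists are kept sorted cheapest-first).
    """
    wins = 0
    for _ in range(trials):
        a, d = list(attackers), list(defenders)
        while a and d:
            a_hits = sum(random.randint(1, 6) <= ATTACK[u] for u in a)
            d_hits = sum(random.randint(1, 6) <= DEFENSE[u] for u in d)
            a, d = a[d_hits:], d[a_hits:]
        wins += bool(a)  # attacker wins only if units survive
    return wins / trials

print(simulate_battle(["infantry"] * 4 + ["tank"] * 2, ["infantry"] * 5))
```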
This is pretty hard for a guy who doesn't really code. I am sure my code is a mess, but it works, so there is that.
Just having Claude play Claude, calling my engine and a database, seemed like a good idea (which I think he was doing too), but the cost was about $1.20 a game, and for 100,000 games the math didn't math. I tried using Qwen, but it's not smart enough. Then there was another problem about distillation from Claude.
I am stuck with ML for now, training it that way until I get an OK player, then a good player, then a superhuman player. It might not even work; no one has done this before and made one work. I think I can get there, but it's just a wish right now.
Pretty sure I can get a really good heuristic model if this fails.
-
@kindwind said in Game Engine Rules (AI Training):
This is pretty hard for a guy who doesn't really code. I am sure my code is a mess but it works so there is that.


-
@kindwind interested to see your progress

-
Project: Training an AI to play Axis & Allies from scratch using AlphaZero-style self-play
Architecture:
Custom Rust game engine (PyO3 bindings to Python) running the full A&A ruleset — purchases, combat moves, naval transports, amphibious assaults, strategic bombing, tech research, the works
Graph Neural Network (GNN) treating the 143-territory board as a graph with attention-based message passing
MCTS (Monte Carlo Tree Search) for move selection during self-play
Factorized action space across 5 heads: action_type, from_territory, to_territory, unit_type, quantity (a minimal sketch follows this list)
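For the curious, here is roughly what a factorized policy head looks like — a minimal sketch with made-up layer sizes; the real network's dimensions and names differ:

```python
import torch
import torch.nn as nn

N_TERRITORIES, N_ACTION_TYPES, N_UNIT_TYPES, MAX_QTY = 143, 8, 12, 16

class FactorizedPolicyHead(nn.Module):
    """Five small heads over a shared board embedding instead of one
    giant flat softmax over every (type, from, to, unit, qty) tuple."""
    def __init__(self, embed_dim: int = 256):
        super().__init__()
        self.action_type = nn.Linear(embed_dim, N_ACTION_TYPES)
        self.from_territory = nn.Linear(embed_dim, N_TERRITORIES)
        self.to_territory = nn.Linear(embed_dim, N_TERRITORIES)
        self.unit_type = nn.Linear(embed_dim, N_UNIT_TYPES)
        self.quantity = nn.Linear(embed_dim, MAX_QTY)

    def forward(self, board_embedding: torch.Tensor):
        return {
            "action_type": self.action_type(board_embedding),
            "from_territory": self.from_territory(board_embedding),
            "to_territory": self.to_territory(board_embedding),
            "unit_type": self.unit_type(board_embedding),
            "quantity": self.quantity(board_embedding),
        }
```

The win over a flat action space: 8 × 143 × 143 × 12 × 16 joint actions would need a ~31M-way softmax; factorizing reduces it to five heads totaling a few hundred outputs.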
Training pipeline:
Phase 1 — Supervised learning: Trained on 5,000 games of ProAI (rule-based AI) replays (~18M samples) to bootstrap the network
Phase 2 — Self-play: 195 generations, 1,460 games, the model playing against itself and learning from its own experience
Key milestones:
The AI learned amphibious assaults on its own through self-play — averaging 12 per game across 17,500+ total amphibious attacks. This is a complex multi-step operation (load units onto transports, move fleet, unload in combat) that the network figured out through MCTS search
Supervised loss dropped from 3.80 → 0.99 during initial training
100% Axis win rate vs the rule-based ProAI with MCTS search
Current challenge — the Allied problem:
Axis play is strong, but Allied play is weak (only 89 Allied wins out of 1,460 self-play games)
This mirrors the real game's asymmetry — Allies need coordinated multi-front strategy across 3 nations
The GNN-small (574K params) doesn't have enough capacity to learn this
Currently running: Upgrading from GNN-small (574K params) to GNN-large (8.6M params) — supervised training is at chunk 90/100 on 5,000 ProAI games. Holdout accuracy: action_type 100%, unit_type 88.2%, quantity 96.8%. Should finish tonight, then we restart self-play with the bigger brain and see if 15x more parameters can crack the Allied strategy problem.

Small update on where the project stands. Keep in mind I am a vibe coder! So while I did my best, there may still be problems I don't know about. Also, Anthropic got back to me:

Hi Christopher,
Thanks for reaching out.
Training a narrow, task-specific CNN for game AI does not conflict with our Usage Policy, which is primarily aimed at preventing competitive model distillation.
This authorization covers the use case as described. Individuals and hobbyists are welcome to use the Claude API. However, please note that use of the API is subject to our Commercial Terms of Service, regardless of whether you are an individual or representing a company.
Best,
Anthropic Privacy Team
I will be having Opus 4.6 take a crack at some games to get better data! Here was some napkin math for the Opus games, estimating based on what a game looks like through the API:
Per decision point:
Input: ~8-15K tokens (strategy prompt + board state + legal moves list)
Output: ~200-500 tokens (move selection + brief reasoning)
Per game (~150 strategic decisions per side after --auto-tactical):
Input: ~1.5M tokens
Output: ~50K tokens
Opus 4.6 API pricing ($15/M input, $75/M output):

| Games | Input Cost | Output Cost | Total |
|-------|------------|-------------|---------|
| 1 | ~$22 | ~$3.75 | ~$26 |
| 10 | ~$220 | ~$37 | ~$260 |
| 50 | ~$1,100 | ~$188 | ~$1,300 |
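The per-game arithmetic behind that table, as a quick sketch (the token counts are the estimates above, not measurements):

```python
def game_cost(input_tokens=1_500_000, output_tokens=50_000,
              in_price=15.0, out_price=75.0):
    """API cost per game in dollars, with prices given per million tokens."""
    return (input_tokens / 1e6) * in_price + (output_tokens / 1e6) * out_price

print(game_cost())                          # ~26.25 per game on Opus 4.6
print(game_cost(in_price=3, out_price=15))  # ~5.25 per game on Sonnet 4.6
```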
But consider: if llm_vs_llm.py accumulates conversation history (likely), later decisions get much more expensive as context grows. Resetting context each turn keeps costs down.

Cheaper alternatives that might still be valuable:
Sonnet 4.6 ($3/$15 per M): Same games for ~$50-60 for 10 games — 5x cheaper
Haiku 4.5 ($0.80/$4 per M): ~$15 for 10 games — but quality drops
Is it worth it? For this project specifically, even 10 high-quality LLM games as Allies could be extremely valuable — the model is starving for good Allied play examples. ProAI plays Allies poorly (10% win rate), so LLM games would be a completely different distribution of Allied strategy. You could mix those into the training buffer as "golden data."

My recommendation: start with 5-10 games on Sonnet (~$30-60) to test quality, then do a batch on Opus if the moves look smart. Would you want me to check on the llm_vs_llm.py setup to see how it's structured for cost?
-
@kindwind exciting stuff!