@thedog What the neural network CAN SEE (encoded in the 48-channel tensor):
| Feature | Channel | How Encoded |
| --- | --- | --- |
| Territory PU values | 25 | Production value normalized (0-1, max 10) |
| Victory Cities | 26 | Binary flag |
| Capitals | 28 | Binary flag |
| Factories | 30 | Binary "has factory" flag |
| Factory damage | 29 | Damage level normalized |
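For illustration, here is a rough sketch of how those planes might be filled in. The channel indices come from the table above; the tensor shape, the grid mapping, and the `territory.*` accessors are all hypothetical and not the actual COLOSSUS code:

```python
import numpy as np

# Hypothetical sketch of populating the feature planes listed above.
# Channel indices follow the table; board/territory accessors are invented
# for illustration only.
NUM_CHANNELS, HEIGHT, WIDTH = 48, 64, 64   # assumed tensor shape
MAX_PU = 10                                # normalization cap from the table

def encode_board(board):
    tensor = np.zeros((NUM_CHANNELS, HEIGHT, WIDTH), dtype=np.float32)
    for territory in board.territories:
        r, c = territory.grid_position                                   # map -> grid lookup (assumed)
        tensor[25, r, c] = min(territory.production, MAX_PU) / MAX_PU    # PU value, 0-1
        tensor[26, r, c] = 1.0 if territory.is_victory_city else 0.0     # binary VC flag
        tensor[28, r, c] = 1.0 if territory.is_capital else 0.0          # binary capital flag
        tensor[29, r, c] = territory.factory_damage / max(territory.max_damage, 1)  # damage, 0-1
        tensor[30, r, c] = 1.0 if territory.has_factory else 0.0         # "has factory" flag
    return tensor
```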
So the network sees all this information, but here's the philosophical difference from TripleA's built-in AI:
TripleA's AI: Has explicit targeting heuristics ("prioritize capitals → VCs → factories → high-PU territories")
COLOSSUS: Has NO explicit targeting. It must discover through self-play that these things matter based on:
- Win condition: Victory City count (9/10/12 VCs) or capital capture
- Reward shaping: IPC advantage as a tie-breaker (which reflects total controlled PU); a sketch of this reward follows below
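A minimal sketch of that reward idea, assuming a terminal win/loss signal plus a small IPC tie-breaker. The thresholds, scaling, and `state.*` helpers are assumptions for illustration, not the actual COLOSSUS reward code:

```python
# Terminal reward: win/loss from VC count or capital capture,
# with IPC advantage as a small tie-breaking bonus.
def terminal_reward(state, player, vc_target=9, ipc_scale=0.001):
    if state.victory_city_count(player) >= vc_target or state.enemy_capital_captured(player):
        base = 1.0          # win
    elif state.own_capital_lost(player):
        base = -1.0         # loss
    else:
        base = 0.0          # draw, or game cut off at the move limit
    # IPC advantage (difference in total controlled production), kept small
    # so it can only break ties and never outweigh the win/loss signal.
    ipc_bonus = ipc_scale * (state.ipc_total(player) - state.ipc_total(state.opponent(player)))
    return base + max(-0.5, min(0.5, ipc_bonus))
```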
The theory is that after enough training games, the network should learn:
"Capturing Moscow gives me all their IPCs" → value capitals
"Controlling more VCs wins games" → value VCs
"High-PU territories increase my IPC advantage" → value territory production
"Factories let me place units there" → value factories (implicitly)
The question is: Has training progressed far enough for this learning to emerge? Or does it need explicit reward shaping for intermediate strategic objectives?
So it's AlphaGo-style. It should work better if it used Low Luck instead of pure dice.
It may not work out, however!
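For context, Low Luck combat works roughly like this (a sketch in Python; TripleA's actual implementation is in Java): total attack power is converted into guaranteed hits, with only the remainder left to a single die roll, which removes most of the dice variance and should make self-play value estimates less noisy.

```python
import random

def low_luck_hits(unit_powers, die_sides=6):
    # Sum the attack power of all units, take power // die_sides guaranteed hits,
    # then roll one die for the remainder.
    total_power = sum(unit_powers)
    guaranteed = total_power // die_sides
    remainder = total_power % die_sides
    extra = 1 if remainder and random.randint(1, die_sides) <= remainder else 0
    return guaranteed + extra

# Example: three infantry attacking at 1 plus a tank at 3 -> power 6 -> exactly 1 hit.
# print(low_luck_hits([1, 1, 1, 3]))
```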