    ML and Cloud deployment

    Development
    • Kindwind
      Kindwind last edited by

Anyone with ML experience, can you look over this? I want to make sure I get this right!

# COLOSSUS Hyperparameter Tuning Guide

      Current Defaults

| Parameter | Day Mode | Night Mode | Notes |
| --- | --- | --- | --- |
| Workers | 8 | 8 | 1 per physical core (Ryzen 7 7800X3D) |
| MCTS Simulations | 100 | 200 | Searches per move |
| Batch Size | 256 | 512 | Training batch size |
| Learning Rate | 0.001 | 0.001 | Adam optimizer |
| Temperature (early) | 1.0 | 1.0 | Exploration in first 30 moves |
| Temperature (late) | 0.1 | 0.1 | Exploitation after 30 moves |
| Max Moves | 300 | 300 | Game length cap |
| Buffer Capacity | 50K | 50K | Training sample buffer |

      Mercy Rule (TUV + IPC Victory Condition)

      The mercy rule ends games early when one side has a decisive advantage, providing clear win/loss training signals instead of relying solely on IPC tiebreakers.

      Formula

      Score = Total_Unit_Value + (IPC_Income × income_weight)
      

      Parameters

| Parameter | Default | Description |
| --- | --- | --- |
| min_moves | 50 | Minimum moves before mercy can trigger |
| score_ratio_threshold | 1.05 | Required improvement from baseline (5%) |
| income_weight | 3.0 | Multiplier for IPC income in score |

      Baseline Adjustment

      The starting position favors Allies economically:

      • Axis starting score: 824 (TUV 629 + IPC 65×3)
      • Allies starting score: 977 (TUV 689 + IPC 96×3)
      • Baseline ratio: 0.84 (Axis/Allies)

      The mercy rule compares against this baseline, so:

      • Axis wins if: current_ratio / 0.84 >= 1.05 → ratio ≥ 0.882
      • Allies wins if: 0.84 / current_ratio >= 1.05 → ratio ≤ 0.800
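
For reference, here is a minimal sketch of the check described above in Python. The state fields (axis_tuv, axis_income, and so on) are illustrative assumptions, not the actual rewards.py API; only the constants come from this guide.

BASELINE_RATIO = 0.84     # Axis/Allies score ratio at game start
THRESHOLD = 1.05          # score_ratio_threshold
INCOME_WEIGHT = 3.0       # income_weight
MIN_MOVES = 50            # min_moves

def mercy_check(state):
    """Return 'axis', 'allies', or None if no mercy win yet."""
    if state.move_count < MIN_MOVES:
        return None

    axis_score = state.axis_tuv + state.axis_income * INCOME_WEIGHT
    allies_score = state.allies_tuv + state.allies_income * INCOME_WEIGHT
    ratio = axis_score / allies_score          # ~0.84 at the starting position

    if ratio / BASELINE_RATIO >= THRESHOLD:    # ratio >= 0.882 -> Axis win
        return 'axis'
    if BASELINE_RATIO / ratio >= THRESHOLD:    # ratio <= 0.800 -> Allies win
        return 'allies'
    return None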

      Why 1.05x Threshold?

      Analysis of trained network self-play showed games swing ratios by ±4% from baseline:

      • Trained games stay within 0.80-0.87 ratio range
      • 1.2x threshold (old) required ratio ≥ 1.008 for Axis win → never triggered
      • 1.05x threshold allows decisive advantages to end games early

      Results with Random Play

| Threshold | Axis Wins | Allies Wins | Draws | Notes |
| --- | --- | --- | --- | --- |
| 1.2x | 26% | 16% | 58% | Too conservative for trained play |
| 1.1x | 28% | 30% | 2% | Good balance |
| 1.05x (default) | 95% | 5% | 0% | Random play favors Axis |

      Note: Random play heavily favors Axis. With trained networks playing balanced games, expect closer to 50/50 Axis/Allies split.

      Tuning Guidelines

| If you see... | Try adjusting... |
| --- | --- |
| Still 100% draws | Lower threshold to 1.03x |
| Too many early mercy wins | Increase min_moves to 100 |
| Unbalanced win rates | Check baseline calculation |
| AI exploiting mercy rule | Raise threshold to 1.1x or 1.2x |

      Curriculum Learning (Future)

      As training progresses, tighten the mercy rule:

      1. Phase 1 (early): threshold=1.2x (easy wins, frequent signal)
      2. Phase 2 (mid): threshold=1.5x (harder to trigger)
      3. Phase 3 (late): threshold=2.0x or disable (require actual VC capture)
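
A schedule like this could be wired in as a simple lookup on games played; the game-count boundaries below are placeholders to illustrate the idea, not values from this guide.

# Sketch of a curriculum schedule for the mercy threshold.
# The game-count boundaries are placeholders - tune them to your own runs.
MERCY_CURRICULUM = [
    (0,      1.2),   # Phase 1 (early): easy wins, frequent signal
    (10_000, 1.5),   # Phase 2 (mid): harder to trigger
    (30_000, 2.0),   # Phase 3 (late): 2.0x, or disable entirely
]

def mercy_threshold(games_played):
    threshold = MERCY_CURRICULUM[0][1]
    for start, value in MERCY_CURRICULUM:
        if games_played >= start:
            threshold = value
    return threshold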

      When to Tune What

      Phase 1: Getting It Working (You Are Here)

      Don't tune yet. Use defaults until you confirm:

      • Model beats random (>60% win rate)
      • Loss is decreasing
      • No degenerate strategies

      Phase 2: Initial Optimization

      Once basics work, try:

| If you see... | Try adjusting... |
| --- | --- |
| Loss stuck high (>5.0) | Lower learning rate (0.0003) |
| Loss drops then spikes | Lower learning rate, add warmup |
| Axis wins 90%+ | Check game balance, maybe remove German bid |
| Draws 50%+ | Increase max_moves, check mercy rule |
| Very slow training | Reduce simulations (50), keep workers at 8 |

      Phase 3: Serious Training

      After 10K+ games, consider:

| Parameter | When to increase | When to decrease |
| --- | --- | --- |
| Simulations | Model plateaued, need deeper search | Training too slow |
| Batch size | Stable training, want faster | Loss unstable |
| Learning rate | Training too slow | Loss unstable/spiking |
| Temperature | Too deterministic, missing good moves | Too random, not converging |

      Specific Recommendations

      Learning Rate

      0.001  - Default, good starting point
      0.0003 - If loss is unstable
      0.0001 - Fine-tuning after initial training
      

      MCTS Simulations

      50   - Fast iteration, early experiments
      100  - Default day mode (balanced)
      200  - Night mode (better quality)
      400+ - Only if you have time and see benefit
      

      Batch Size

      128 - If running out of GPU memory
      256 - Default (good for 17GB VRAM)
      512 - Night mode, faster training
      

      Workers (Self-Play)

      4  - Light usage (gaming, browsing)
      6  - Medium usage (some background tasks)
      8  - Optimal for Ryzen 7 7800X3D (1 per physical core)
      

      Important: 8 workers = 1 per physical core. More workers cause CPU contention and slower training due to context switching overhead. Testing showed 8 workers outperforms 10-14 workers on 8-core CPUs.


      Memory Management

      Buffer Capacity

| Setting | Memory Usage | Notes |
| --- | --- | --- |
| 500K (old default) | ~210 GB | WILL CRASH - impossible |
| 100K | ~44 GB | Too large for 32GB RAM |
| 50K (current default) | ~22 GB | Safe for 32GB RAM |
| 25K | ~11 GB | Conservative, for smaller systems |

      Each training sample is ~440KB due to the 104,729 action space.

      Formula: memory_gb ≈ buffer_capacity × 0.00044
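
The same formula can be turned into a quick helper for picking a safe capacity on a given machine (a sketch; the 10 GB headroom figure is an assumption, not a measured value):

GB_PER_SAMPLE = 0.00044   # ~440 KB per sample (104,729-action space)

def buffer_memory_gb(buffer_capacity):
    return buffer_capacity * GB_PER_SAMPLE

def max_safe_capacity(ram_gb, headroom_gb=10):
    # leave headroom for the OS, workers, and the network itself
    return int((ram_gb - headroom_gb) / GB_PER_SAMPLE)

print(buffer_memory_gb(50_000))   # ~22 GB
print(max_safe_capacity(32))      # 50000 on a 32 GB system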

      Memory Leak Signs

      • Memory usage climbing steadily over hours
      • System slowdown after 2-3 hours
      • Windows "low virtual memory" warnings
      • BSOD with CRITICAL_PROCESS_DIED

      Solutions

      1. Set --buffer-capacity 50000 (or lower)
      2. Monitor with Task Manager during training
      3. Restart training if memory exceeds 28GB

      Warning Signs

      Model Not Learning

      • Loss stays >10 after 1000 steps → Check data pipeline
      • Loss oscillates wildly → Reduce learning rate
      • 0% or 100% win rate → Game mechanics bug

      Degenerate Strategies

• 50% passes → Model learned "doing nothing is safe"
• All games draw → Peace treaty equilibrium (check mercy rule is enabled)
      • Games <50 moves → Suicide attacks or mercy threshold too low
      • 100% Axis or Allies wins → Mercy rule baseline may be miscalibrated
      • Alternating Axis/Allies wins → Peace Treaty 2.0 (see below)

      Peace Treaty 2.0 (Mercy Rule Gaming)

      The mercy rule solves the original "peace treaty" where neither side attacks and all games draw. But there's a subtler exploit:

      The Problem: In self-play, the same network plays both sides. It could learn to trade mercy wins - "I'll tank my TUV to let you win this game, you do the same next game." The network doesn't distinguish between Axis and Allies identities across games.

      Detection Signs:

| Pattern | What It Means |
| --- | --- |
| Win rate oscillates: 80% Axis → 80% Allies → 80% Axis | Network cycling between strategies |
| Mercy triggers at exactly min_moves (50) | Intentionally fast losses |
| One side's TUV drops to near-zero quickly | Deliberate unit sacrifice |
| Suspiciously balanced 50/50 Axis/Allies wins | Too perfect to be real learning |
| Low game variance (all games look similar) | Memorized "trade" pattern |

      How to Detect Programmatically:

# In analyze_progress.py or the training loop
import numpy as np

# Check for alternating win streaks (0 = Axis win, 1 = Allies win)
recent_winners = [game.winner for game in last_100_games]
axis_streaks = count_streaks(recent_winners, 0)    # lengths of consecutive Axis-win runs
allies_streaks = count_streaks(recent_winners, 1)  # lengths of consecutive Allies-win runs

avg_streak_length = np.mean(axis_streaks + allies_streaks)
win_rate_variance = np.var([np.mean(recent_winners[i:i + 20]) for i in range(0, 100, 20)])

# Suspicious if we see many short alternating streaks
if avg_streak_length < 3 and win_rate_variance < 0.1:
    print("WARNING: Possible Peace Treaty 2.0 detected")

# Check mercy trigger timing
mercy_moves = [game.end_move for game in mercy_games]
if np.std(mercy_moves) < 10:  # all mercy wins at roughly the same move count
    print("WARNING: Suspiciously consistent mercy timing")
      

      Prevention Strategies:

| Strategy | Implementation | Tradeoff |
| --- | --- | --- |
| Asymmetric rewards | Axis win = +1.0, Allies win = +0.9 | Breaks symmetry, may bias learning |
| Minimum game length | Raise min_moves from 50 → 100 | Slower training, but harder to game |
| TUV floor check | No mercy if loser TUV < 50% of start | Prevents deliberate tanking |
| Streak detection | Pause training if alternating pattern detected | Reactive, not preventive |
| Diverse opponents | Play against older checkpoints (not just self) | Best solution, more complex |
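
For the "diverse opponents" row, the usual approach is to sample the opponent from a pool of older checkpoints instead of always playing the newest weights against themselves. A minimal sketch (the checkpoint layout and loading details are assumptions, not the actual COLOSSUS pipeline):

import glob
import random
import torch

def sample_opponent(checkpoint_dir="checkpoints", p_latest=0.5):
    """Use the newest checkpoint for ~half the games, a random older one otherwise."""
    paths = sorted(glob.glob(f"{checkpoint_dir}/*.pt"))
    if not paths:
        return None
    if random.random() < p_latest or len(paths) == 1:
        path = paths[-1]                       # latest checkpoint
    else:
        path = random.choice(paths[:-1])       # any older checkpoint
    return torch.load(path, map_location="cpu")

Because the exploit depends on the same network playing both sides, even occasionally facing an older opponent removes the incentive to trade wins.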

      Recommended Fix (TUV Floor):

Add to the mercy rule: don't award a mercy win if the "losing" side's army has evaporated below half of its starting TUV. A collapse that fast is the signature of deliberate tanking rather than a genuine decisive advantage.

# In rewards.py MercyRule.check()
loser_tuv_ratio = loser_tuv / loser_starting_tuv
if loser_tuv_ratio > 0.5:  # Loser still holds >50% of its starting army
    # Score advantage was reached without a suspicious collapse - allow mercy
    return (winner, reason)
else:
    # Loser's army evaporated suspiciously fast
    # Could be intentional tanking - don't trigger mercy
    return None
      

      When to Worry:

      • Early training (first 1000 games): Don't worry, randomness dominates
      • Mid training (1000-10000 games): Watch for patterns emerging
      • Late training (10000+ games): If 50/50 split persists with low variance, investigate

      Overfitting

      • Loss decreases but eval win rate drops → Reduce training, add regularization
      • Training loss << validation loss → More diverse self-play

      The "Suicide Loop"

      The AI learns that losing quickly is less painful than dragging out a loss.

      Detection: Games ending in <30 moves, one side's units disappear immediately.

      Fix: Add small per-move survival bonus to reward function, or raise min_moves for mercy.
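
A per-move survival bonus can be sketched as simple reward shaping; the names and bonus size below are illustrative assumptions, not the actual rewards.py code:

SURVIVAL_BONUS = 0.002   # per move survived; kept << 1.0 so it never outweighs winning

def shaped_reward(final_result, moves_survived, max_moves=300):
    # final_result: +1 win, -1 loss, 0 draw
    # Losing on move 200 now scores better than losing on move 30,
    # so "lose as fast as possible" is no longer the least painful policy.
    return final_result + SURVIVAL_BONUS * min(moves_survived, max_moves)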

      The "Hyperparameter Mismatch"

      Learning rate too high → AI forgets everything every few minutes.

      Detection: Loss oscillates wildly, win rate swings between 0% and 100%.

      Fix: Lower learning rate to 0.0003 or 0.0001.


      Command Examples

      # Conservative (stable but slow)
      python scripts/train.py --lr 0.0003 --simulations 50 --workers 8
      
      # Aggressive (faster but riskier)
      python scripts/train.py --lr 0.003 --batch-size 512 --workers 8
      
      # Debug mode (fast iteration)
      python scripts/train.py --duration 600 --simulations 25 --workers 4
      
      # Full night run
      python scripts/train.py --night --simulations 200 --workers 8
      
      # Memory-safe long run
      python scripts/train.py --night --buffer-capacity 50000 --workers 8
      

      Monitoring Checklist

      Every training run, check:

      1. Loss trending down (not stuck or spiking)
      2. Win rates not extreme (20-80% range is healthy)
      3. Games completing (not all hitting max moves)
      4. Mercy rule triggering (expect 30-50% with trained network)
      5. Mercy timing varies (not all at exactly min_moves)
      6. No alternating Axis/Allies win streaks (Peace Treaty 2.0)
      7. Passes reasonable (<30% of moves)
      8. Game lengths have healthy variance (not all identical)
      9. Memory usage stable (<28GB for 32GB system)
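
Most of these checks can be scripted against the game records from a run. A rough sketch (the record fields winner, end_move, and ended_by_mercy are assumptions about your logging format, and the variance cutoffs are ballpark values):

import numpy as np

def run_checks(games):
    winners = [g.winner for g in games]       # 0 = Axis, 1 = Allies, None = draw
    lengths = [g.end_move for g in games]
    mercy_moves = [g.end_move for g in games if g.ended_by_mercy]

    axis_rate = winners.count(0) / len(games)
    draw_rate = winners.count(None) / len(games)

    if not 0.2 <= axis_rate <= 0.8:
        print("WARNING: extreme win rate - check balance / mercy baseline")
    if draw_rate > 0.5:
        print("WARNING: too many draws - check mercy rule and max_moves")
    if np.std(lengths) < 10:
        print("WARNING: game lengths nearly identical - possible degenerate play")
    if mercy_moves and np.std(mercy_moves) < 10:
        print("WARNING: mercy always triggers at the same move - Peace Treaty 2.0?")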

      After training:

      python scripts/analyze_progress.py
      python scripts/watch_game.py --checkpoint checkpoints/latest.pt --speed fast
      python scripts/evaluate.py --checkpoint checkpoints/latest.pt --games 20
      

      Neural Network Encoding Limitations

      Current Cargo Encoding (Channels 48-53)

      The network sees aggregate cargo counts per territory, normalized to max 4:

      • Channels 48-49: Our transport cargo (infantry / other)
      • Channels 50-51: Enemy transport cargo (infantry / other)
      • Channels 52-53: Carrier fighters (ours / enemy)

      Known Limitations

| Limitation | Strategic Impact | Priority |
| --- | --- | --- |
| No per-transport cargo visibility | Can't plan which transport to unload first | Medium |
| No transport capacity remaining | Can't see if transport is full (2 units) or has space | High |
| No carrier capacity remaining | Can't see if carrier has 0/1/2 fighters | Medium |
| Aggregate counts only | Multiple transports in same sea zone appear as one blob | Medium |
| "Other" cargo lumps unit types | Artillery, armor, AA guns indistinguishable in cargo | Low |
| Normalization cap of 4 | Large fleets (5+ transports) lose precision | Low |

      Future Encoding Improvements

      If training shows transport/carrier coordination issues:

      1. Per-unit cargo channels: Separate channel for each transport's cargo state
      2. Capacity remaining channels: Binary flags for "has space" vs "full"
      3. Distinct unit encoding: Track which specific units are loaded where
      4. Separate "other" cargo types: Artillery vs armor vs AA gun distinction

      AA Gun Transport Rules

      Summary (verified and tested 2026-01):

| Action | Combat Move | Non-Combat Move |
| --- | --- | --- |
| Load AA gun on transport | ❌ Not allowed | ✅ Allowed |
| Unload AA gun from transport | ❌ Not allowed | ✅ Allowed |
| Amphibious assault with AA gun | ✅ Allowed (as cargo) | N/A |

      Details:

      • AA guns CAN load on transports during NCM ✅
      • AA guns CANNOT load on transports during Combat Move ❌
      • AA guns CAN unload during NCM ✅
      • AA guns CAN participate in amphibious assault (unloaded as non-combatant) ✅
      • AA guns CANNOT move independently during Combat Move (no combat movement for AA)

      This is enforced by the game engine in gen_transport_loads_combat_dedup which excludes AA guns, while gen_transport_loads_dedup (NCM) includes them.

      Test coverage: See tests/integration_tests.rs and unit tests in src/moves/generation_fixed.rs


      Titans Memory Integration (FUTURE)

      Status: PLANNING - Implement after 5,000+ games baseline established

      Paper: "Titans: Learning to Memorize at Test Time" (Behrouz, Zhong, Mirrokni - Google Research, December 2024)
      Implementation: lucidrains/titans-pytorch (MIT License, 1.5k+ stars)

      Overview

      Titans is a "surprise-based neural memory" architecture that enables test-time learning. This is a "brain transplant" rather than a full rebuild: we keep the MCTS chassis but replace the static neural network with one that adapts during gameplay.

      In vanilla AlphaZero, the neural network is frozen during gameplay. It learned patterns from millions of self-play games but cannot adapt to:

      1. Opponent-specific tendencies - Is this player aggressive? Defensive? Risk-tolerant?
      2. Strategic surprises - Unusual openings, unconventional purchases
      3. Game-specific adaptations - Adjusting mid-game when something unexpected happens

      Why Titans for COLOSSUS?

| Feature | Standard AlphaZero | Titans-Enhanced |
| --- | --- | --- |
| Network weights during game | Static | Dynamic (memory module updates) |
| Opponent modeling | None | Implicit (learns from surprises) |
| Adaptation speed | Zero | Real-time (after each opponent move) |
| Memory of game history | CNN sees last N states | Neural long-term memory |
| "Surprise" awareness | None | Quantified (gradient of prediction error) |

      A&A-Specific Benefits:

      1. Purchase Phase Adaptation: Opponent buys 6 bombers → High surprise → Memory updates → Value network shifts toward anti-air strategies
      2. Risk Tolerance Modeling: Opponent attacks with 30% win probability → Memory encodes "opponent is risk-seeking" → MCTS values "bait" moves higher
      3. Strategic Flexibility: Russia stacks Ukraine instead of expected Caucasus defense → AI adjusts strategic evaluation for remainder of game
      4. Breaking Peace Treaty Pattern: Surprising aggressive moves become memorable, encouraging counter-play

      Architecture

      Current COLOSSUS Architecture

      ┌─────────────────────────────────────────────────────────┐
      │                    MCTS Engine                          │
      │   ┌─────────────────────────────────────────────────┐   │
      │   │  For each simulation:                           │   │
      │   │    1. Select (UCB)                              │   │
      │   │    2. Expand                                    │   │
      │   │    3. Evaluate → Query Neural Network (STATIC)  │   │
      │   │    4. Backpropagate                             │   │
      │   └─────────────────────────────────────────────────┘   │
      │                         ↓                               │
      │              ┌─────────────────┐                        │
      │              │  Policy Head    │ → Move probabilities   │
      │              │  Value Head     │ → Win probability      │
      │              │  (ResNet/CNN)   │                        │
      │              │  [FROZEN]       │                        │
      │              └─────────────────┘                        │
      └─────────────────────────────────────────────────────────┘
      

      Titans-Enhanced Architecture

      ┌─────────────────────────────────────────────────────────┐
      │                    MCTS Engine                          │
      │   ┌─────────────────────────────────────────────────┐   │
      │   │  For each simulation:                           │   │
      │   │    1. Select (UCB)                              │   │
      │   │    2. Expand                                    │   │
      │   │    3. Evaluate → Query Titans Network           │   │
      │   │    4. Backpropagate                             │   │
      │   │  [Memory LOCKED during thinking]                │   │
      │   └─────────────────────────────────────────────────┘   │
      │                         ↓                               │
      │   ┌─────────────────────────────────────────────────┐   │
      │   │           TITANS ARCHITECTURE                   │   │
      │   │  ┌─────────────┐  ┌─────────────┐  ┌─────────┐  │   │
      │   │  │Short-Term   │  │Long-Term    │  │Persist  │  │   │
      │   │  │Memory       │  │Memory       │  │Memory   │  │   │
      │   │  │(Attention)  │  │(Neural MLP) │  │(Fixed)  │  │   │
      │   │  │[Window=128] │  │[UPDATES!]   │  │[Task]   │  │   │
      │   │  └─────────────┘  └─────────────┘  └─────────┘  │   │
      │   │           ↓              ↓              ↓        │   │
      │   │         ┌────────────────────────────────┐      │   │
      │   │         │        Policy + Value Heads    │      │   │
      │   │         │        + Surprise Metric       │      │   │
      │   │         └────────────────────────────────┘      │   │
      │   └─────────────────────────────────────────────────┘   │
      │                         ↓                               │
      │   ┌─────────────────────────────────────────────────┐   │
      │   │  AFTER OPPONENT MOVES:                          │   │
      │   │    1. Calculate prediction vs actual            │   │
      │   │    2. Compute surprise (gradient loss)          │   │
      │   │    3. Backprop to Long-Term Memory ONLY         │   │
      │   │    4. Memory weights updated for next turn      │   │
      │   └─────────────────────────────────────────────────┘   │
      └─────────────────────────────────────────────────────────┘
      

      Hyperparameters

      Memory Module Configuration

| Parameter | Default | Range | Notes |
| --- | --- | --- | --- |
| enabled | false | - | Enable after base training complete |
| type | MemoryMLP | MemoryMLP, FactorizedMemoryMLP | 2-layer MLP recommended |
| dim | 384 | 256-512 | Match board embedding size |
| num_layers | 2 | 1-4 | More = more expressive, slower |
| chunk_size | 64 | 32-128 | History window per update |

      Surprise Mechanism

| Parameter | Default | Range | Notes |
| --- | --- | --- | --- |
| learning_rate | 0.01 | 0.001-0.1 | Memory update step size |
| min_threshold | 0.1 | 0.0-0.5 | Skip tiny surprises |
| max_gradient | 1.0 | 0.5-2.0 | Clip extreme surprises |

      Integration Flags

| Parameter | Default | Notes |
| --- | --- | --- |
| lock_during_mcts | true | Don't learn from simulations |
| update_on_opponent_move | true | Core mechanism |
| reset_between_games | true | Avoid opponent overfitting |

      Full YAML Configuration

      titans:
        enabled: false                    # Enable after base training complete
      
        memory_module:
          type: "MemoryMLP"               # MemoryMLP, MemoryAttention, etc.
          dim: 384                        # Match board embedding dimension
          num_layers: 2                   # MLP depth (2 recommended by lucidrains)
          chunk_size: 64                  # History window for processing
      
        surprise:
          learning_rate: 0.01             # Memory update step size
          min_threshold: 0.1              # Don't update if surprise below this
          max_gradient: 1.0               # Clip large surprises
      
        integration:
          lock_during_mcts: true          # Don't learn from simulations
          update_on_opponent_move: true   # Core surprise mechanism
          update_on_own_move: false       # Usually not needed
      
        history:
          max_length: 300                 # Max game states to track
          include_purchases: true         # Track purchase decisions
          include_combat_results: true    # Track battle outcomes
      

      Implementation

      Installation

      pip install titans-pytorch
      

      Core Imports

      from titans_pytorch import NeuralMemory, MemoryAsContextTransformer
      
      # Memory models available:
      from titans_pytorch import (
          MemoryMLP,              # Simple 1-4 layer MLP (paper default)
          MemoryAttention,        # Attention-based memory
          FactorizedMemoryMLP,    # Efficient factorized version
          MemorySwiGluMLP,        # SwiGLU activation variant
          GatedResidualMemoryMLP  # With residual connections
      )
      

      Board State Encoding (Keep Current)

      # Current encoding (unchanged)
      board_tensor = encode_board(game_state)  # Shape: [1, 54, 12, 12]
      
      # Flatten for Titans
      board_flat = board_tensor.view(1, -1)  # Shape: [1, 7776]
      
      # Project to Titans dimension
      embedding = self.projection(board_flat)  # Shape: [1, 384]
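
# NOTE (assumption, not actual COLOSSUS code): self.projection is not defined
# in this guide; a minimal version is a learned linear layer mapping the
# flattened board to the Titans embedding size from the config.
import torch.nn as nn

projection = nn.Linear(54 * 12 * 12, 384)    # 7776 -> 384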
      

      History Sequence

import torch

class GameHistory:
    def __init__(self, embedding_dim=384, max_length=300):
        self.states = []
        self.dim = embedding_dim
        self.max_length = max_length

    def add_state(self, board_embedding):
        self.states.append(board_embedding)
        if len(self.states) > self.max_length:
            self.states.pop(0)  # keep only the most recent max_length states

    def get_sequence(self):
        if not self.states:
            return torch.zeros(1, 1, self.dim)
        return torch.stack(self.states, dim=1)  # [1, T, dim]
      

      The Surprise Calculation (Key Innovation)

      def calculate_surprise(network, board_before_opponent, actual_opponent_move):
          """
          Core Titans mechanism: How surprised was the AI by opponent's move?
      
          High surprise → Large gradient → Memory updates significantly
          Low surprise → Small gradient → Memory mostly unchanged
          """
          with torch.enable_grad():
              # Get prediction BEFORE opponent moved
              policy_pred, value_pred, _ = network(board_before_opponent)
      
              # What probability did we assign to their actual move?
              move_idx = encode_move(actual_opponent_move)
              predicted_prob = policy_pred[0, move_idx]
      
              # Surprise = negative log probability (cross-entropy style)
              # If we predicted 90% → low surprise
              # If we predicted 0.1% → high surprise
              surprise_loss = -torch.log(predicted_prob + 1e-8)
      
              return surprise_loss
      
      def update_memory(network, surprise_loss, learning_rate=0.01):
          """Update ONLY the memory module weights, not the full network."""
          network.memory_module.zero_grad()
          surprise_loss.backward()
      
          with torch.no_grad():
              for param in network.memory_module.parameters():
                  if param.grad is not None:
                      param.data -= learning_rate * param.grad
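
# Optional extension (an assumption, not existing COLOSSUS code): honor the
# min_threshold and max_gradient settings from the surprise config by skipping
# tiny surprises and clipping the memory gradients before the update.
import torch

def update_memory_clipped(network, surprise_loss,
                          learning_rate=0.01, min_threshold=0.1, max_gradient=1.0):
    if surprise_loss.item() < min_threshold:
        return  # surprise too small - leave memory unchanged

    network.memory_module.zero_grad()
    surprise_loss.backward()
    torch.nn.utils.clip_grad_norm_(network.memory_module.parameters(), max_gradient)

    with torch.no_grad():
        for param in network.memory_module.parameters():
            if param.grad is not None:
                param.data -= learning_rate * param.grad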
      

      Game Loop Integration

      class TitansEnhancedMCTS:
          def __init__(self, network):
              self.network = network
              self.history = GameHistory()
      
          def play_turn(self, game_state):
              # === THINK PHASE ===
              # Lock memory during MCTS (don't learn from imagination)
              self.network.memory_module.eval()
      
              # Standard MCTS search
              best_move = self.mcts_search(game_state, simulations=200)
      
              # Store state BEFORE our move
              self.board_before_move = encode_board(game_state)
      
              return best_move
      
          def observe_opponent_move(self, opponent_move, new_state):
              # === SURPRISE PHASE ===
              # Calculate how unexpected opponent's move was
              surprise = calculate_surprise(
                  self.network,
                  self.board_before_move,
                  opponent_move
              )
      
              # Update memory based on surprise
              self.network.memory_module.train()
              update_memory(self.network, surprise)
      
              # Add to history for context
              self.history.add_state(encode_board(new_state))
      
              # Log for analysis
              print(f"Opponent move surprise: {surprise.item():.4f}")
      

      Three Titans Variants

      The paper presents three ways to incorporate memory. For COLOSSUS, we recommend MAC (Memory as Context):

      1. Memory as Context (MAC) - RECOMMENDED

      History → [Neural Memory] → context
      Current → [Attention] → query
      (context, query) → [Combine] → Policy/Value
      

      Why for A&A: Game history matters. What territories changed hands, what was purchased - this context informs current decisions.

      2. Memory as Layer (MAL)

      Input → [Memory Layer] → [Attention Layer] → ... → Output
      

      Better for: Very long sequences (2M+ tokens). Overkill for A&A games.

      3. Memory as Gate (MAG)

      Input → [Memory Branch] ─┐
            → [Attention]    ──┼→ [Gated Combine] → Output
      

      Better for: When you need fine-grained control over memory influence.


      Pre-Training Requirements

      CRITICAL: Titans surprise-based learning only works if the AI already knows what's "normal". You must:

      1. Train base model first (current COLOSSUS training)

        • 10,000+ self-play games minimum
        • Network learns rules, basic strategy
        • This is your "persistent memory" foundation
      2. Then enable surprise updates

        • Network can now detect deviations from learned patterns
        • Memory module adapts to specific opponents
        • Value shifts reflect game-specific surprises

      Implementation Phases

      Phase 1: Validation (Current Priority)

      • Continue current training to 5,000+ games
      • Validate learning is happening (loss decreasing)
      • Resolve peace treaty pattern
      • Establish baseline performance metrics

      Phase 2: Titans Infrastructure (After Baseline)

      • Install titans-pytorch:
        pip install titans-pytorch
      • Create TitansNetwork wrapper class
      • Implement GameHistory tracking
      • Add surprise calculation utilities
      • Unit tests for memory updates

      Phase 3: Integration (Careful)

      • Modify MCTS to use Titans network
      • Implement memory locking during search
      • Add post-opponent-move surprise calculation
      • Test on single games first
      • Monitor memory weight changes

      Phase 4: Training (New Paradigm)

      • Train base model (frozen memory) - 10,000 games
      • Enable memory updates during inference only
      • Test against vanilla COLOSSUS
      • Measure adaptation effectiveness

      Phase 5: Optimization

      • Tune memory learning rate (0.001 - 0.1 range)
      • Experiment with memory architectures (MLP layers)
      • Adjust chunk_size for A&A game length
      • Profile memory/compute overhead

      Expected Outcomes

| Metric | Without Titans | With Titans (Expected) |
| --- | --- | --- |
| Adaptation to unusual openings | None | Within 3-5 turns |
| Opponent tendency modeling | None | Implicit after 10+ moves |
| Response to strategic surprises | Fixed policy | Dynamic adjustment |
| "Stuck in local minima" games | Common | Reduced (surprise breaks patterns) |

      A&A-Specific Scenarios

      1. Germany buys navy instead of land units

        • Current: AI follows trained policy regardless
        • Titans: High surprise → Memory updates → UK/US naval strategy shifts
      2. Japan ignores India, attacks Australia

        • Current: AI continues India-focused defense
        • Titans: Surprise registered → Pacific defense prioritized
      3. Russia trades Ukraine aggressively

        • Current: Standard Eastern Front evaluation
        • Titans: Risk-seeking behavior encoded → AI sets traps

      Known Challenges

      1. PyTorch Functional Transforms: The titans-pytorch library uses torch.func.grad which has compatibility issues with some setups. May need:

        torch._C._jit_set_profiling_mode(False)
        torch._C._jit_set_profiling_executor(False)
        
      2. Memory Overhead: Neural memory adds parameters. Monitor GPU memory during MCTS (many forward passes). Memory state size grows with game length.

      3. Overfitting to Opponent: Risk that AI adapts TOO much to one opponent's style, becomes exploitable. Mitigation: decay memory updates over time, reset memory between games, train on diverse self-play opponents.


      Decision Point

| Condition | Action |
| --- | --- |
| Current training < 5,000 games | Wait. Build baseline first. |
| Peace treaty pattern persists after 10k games | Consider Titans to break equilibrium |
| Want opponent-adaptive AI for human play | Titans is the answer |
| Cloud training budget available | Train base → add Titans layer |

      Resources

      Papers:

      • Titans: Learning to Memorize at Test Time - Core paper
      • MIRAS - Theoretical framework
      • Test-Time Training Done Right - Related approach

      Code:

      • lucidrains/titans-pytorch - MIT licensed implementation

      COLOSSUS Cloud Deployment Plan

      Budget: $100
      Goal: 20,000-50,000 games
      Timeline: Week 3 (after PC validation)


      Pre-Cloud Checklist

      Complete these BEFORE spending money:

| Task | Status | Notes |
| --- | --- | --- |
| 5,000 games on PC | ⬜ | ~2 weeks at current rate |
| Watch games with watch_game.py | ⬜ | Verify AI is learning, not broken |
| Win rate not 100% draws | ⬜ | Some Axis/Allied wins appearing |
| Test in WSL | ⬜ | Catch Linux bugs free |
| Checkpoint upload working | ⬜ | Don't lose work if instance dies |
| Git repo ready | ⬜ | Push code to GitHub/GitLab |

      DO NOT proceed to cloud until all boxes checked!


      Phase 1: WSL Testing (Free)

      Test on Linux before paying for cloud:

      # 1. Enable WSL (Windows Terminal as admin)
      wsl --install
      
      # 2. Open Ubuntu
      wsl
      
      # 3. Install dependencies
      sudo apt update
      sudo apt install -y build-essential python3-pip curl
      
      # 4. Install Rust
      curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y
      source $HOME/.cargo/env
      
      # 5. Copy project (from Windows path)
      cp -r /mnt/c/colossus ~/colossus
      cd ~/colossus
      
      # 6. Install Python deps
      pip3 install torch numpy maturin psutil
      
      # 7. Build Rust extension
      maturin develop --release
      
      # 8. Run quick test
      python3 scripts/train.py --workers 4 --hours 0.5
      
      # 9. Run full test suite
      cargo test --all
      

      If this works, cloud will work.


      Phase 2: Checkpoint Cloud Sync

      Add automatic checkpoint upload so you don't lose progress:

      Option A: Google Drive (Recommended - You already use it)

      Install rclone and configure:

      # Install rclone
      curl https://rclone.org/install.sh | sudo bash
      
      # Configure Google Drive
      rclone config
      # Follow prompts to add "gdrive" remote
      
      # Test upload
      rclone copy checkpoints/latest.pt gdrive:colossus/checkpoints/
      

      Add to training script (auto-upload every checkpoint):

      # In async_pipeline.py after saving checkpoint:
      import subprocess
      subprocess.run([
          "rclone", "copy", 
          "checkpoints/", 
          "gdrive:colossus/checkpoints/",
          "--quiet"
      ])
      

      Option B: Simple SCP (manual but reliable)

      After training stops:

      # From your Windows machine:
      scp -r user@cloud-ip:~/colossus/checkpoints ./cloud_checkpoints/
      

      Phase 3: Cloud Provider Setup

      Recommended: Vast.ai

      Best price for your budget.

      1. Create account: https://vast.ai
      2. Add $100 credits
      3. Find instance:
        • GPU: RTX 4090 or A100
        • CPU: 32+ cores
        • RAM: 64GB+
        • Storage: 50GB+
        • Price: $0.30-0.80/hr

      Instance Selection

| GPU | $/hr | Cores | For $100 | Best For |
| --- | --- | --- | --- | --- |
| RTX 4090 | $0.30-0.50 | 32 | 200-300 hrs | Best value |
| A100 40GB | $0.80-1.20 | 64 | 80-125 hrs | Max speed |
| RTX 3090 | $0.20-0.35 | 16-32 | 280-500 hrs | Budget |

      Recommendation: RTX 4090 with 32+ CPU cores at ~$0.40/hr = 250 hours = ~10 days


      Phase 4: Cloud Training

      One-Time Setup

      # SSH into instance
      ssh -i your_key root@instance_ip
      
      # Run setup script
      curl -sSL https://raw.githubusercontent.com/YOUR_USERNAME/colossus/main/scripts/cloud_setup.sh | bash
      
      # OR manual:
      git clone https://github.com/YOUR_USERNAME/colossus.git
      cd colossus
      pip install torch numpy maturin
      maturin develop --release
      

      Upload Your Checkpoint (Continue Training)

      # From your Windows machine, upload current checkpoint:
      scp C:\colossus\checkpoints\latest.pt root@instance_ip:~/colossus/checkpoints/
      

      Start Training

      # Use screen (stays running after disconnect)
      screen -S training
      
      # Start with more workers (cloud has more CPU cores)
      cd ~/colossus
      python scripts/train.py \
          --workers 24 \
          --simulations 100 \
          --hours 240 \
          --resume checkpoints/latest.pt
      
      # Detach: Ctrl+A then D
      # Reconnect: screen -r training
      

      Monitor

      # New SSH session
      screen -r training  # Watch live
      
      # Or check logs
      tail -f checkpoints/training.log
      

      Phase 5: Download Results

      When done or budget running low:

      # From Windows, download checkpoint:
      scp root@instance_ip:~/colossus/checkpoints/latest.pt C:\colossus\checkpoints\cloud_latest.pt
      
      # Download all checkpoints:
      scp -r root@instance_ip:~/colossus/checkpoints/ C:\colossus\cloud_checkpoints/
      

      Budget Tracking

| Item | Hours | Cost |
| --- | --- | --- |
| Budget | - | $100 |
| Instance ($0.40/hr) | 250 | -$100 |
| Remaining | 0 | $0 |

      Expected Results for $100

| Instance Type | Hours | Workers | Games/hr | Total Games |
| --- | --- | --- | --- | --- |
| RTX 4090 (32 core) | 250 | 24 | ~150 | ~37,500 |
| A100 (64 core) | 100 | 48 | ~250 | ~25,000 |

      Cloud Training Config

      Update scripts/train.py for cloud:

      # Cloud-optimized settings
      CLOUD_CONFIG = {
          'workers': 24,           # 32-core machine
          'simulations': 100,
          'batch_size': 512,       # Bigger GPU
          'hours': 240,            # 10 days max
          'checkpoint_interval': 600,  # Every 10 min
      }
      

      Or create train_cloud.sh:

      #!/bin/bash
      python scripts/train.py \
          --workers 24 \
          --simulations 100 \
          --batch-size 512 \
          --hours 240 \
          --resume checkpoints/latest.pt \
          2>&1 | tee training.log
      

      Exit Criteria

      Stop training when:

| Condition | Action |
| --- | --- |
| Budget exhausted | Download checkpoint, stop instance |
| 50,000 games reached | You have enough for evaluation |
| AI beats random 80%+ | Success! Time to evaluate |
| Loss stops decreasing | May need hyperparameter tuning |
| Still 100% draws at 20K games | Something's wrong, stop and debug |

      Troubleshooting

      Instance Dies / Gets Preempted

      1. Checkpoints auto-save every 10 min
      2. Use rclone to sync to Google Drive
      3. Restart on new instance, resume from latest.pt

      Out of GPU Memory

      # Reduce batch size
      python scripts/train.py --batch-size 256 ...
      

      Training Too Slow

      # More workers (up to CPU cores - 2)
      python scripts/train.py --workers 48 ...
      
      # Fewer simulations (faster but lower quality)
      python scripts/train.py --simulations 50 ...
      

      Summary Checklist

      Before Cloud:

      • 5,000 games on PC
      • Watched games, AI is learning
      • WSL test passed
      • Git repo pushed
      • Checkpoint sync tested

      On Cloud:

      • Instance launched
      • Setup script ran
      • Uploaded local checkpoint
      • Training started in screen
      • rclone syncing checkpoints

      After Cloud:

      • Downloaded final checkpoint
      • Stopped instance (stop billing!)
      • Tested checkpoint locally
      • Watch AI play

      Last updated: 2026-01

      • RogerCooper
        RogerCooper @Kindwind last edited by

@kindwind Rather than a mercy rule, I would suggest a fixed turn limit, where the game is adjudicated based upon VCs. For the boardgame ports, a 10-turn limit would be sufficient.

Keep in mind it is possible for a game to reach a stalemate.

        • Kindwind
          Kindwind @RogerCooper last edited by

@rogercooper Right now I am trying to pin down a good signal from the tree search. I finally stopped getting draws. The Axis were winning around 25%, but the game engine was wrong. I think I have the game engine pinned down now. I will run some training tonight to see if I can get a signal. If I have to, I will add a turn limit. I'll have to see how it plays out.

          • RogerCooper
            RogerCooper @Kindwind last edited by

@kindwind It would seem that training against random games would be very inefficient compared to training against the AI that comes with TripleA. TripleA is not Go; random play should be very bad.
