Our final strategy is a deterministic, rule-based Dragapult ex agent. It focuses on three priorities: establish Dragapult ex reliably, make decisions around the prize race rather than raw damage, and use disruption only when it changes the current game.
The strongest evidence came from Simulation results: Team TomTom reached a best observed rank of #73 with a score of 1076.4, and the prepared final snapshot showed rank #86 of 3243 with a score of 1068.8. We treat those results as empirical validation of the strategy, not as the strategy itself.
The key lesson was that local tests and live leaderboard behavior can diverge sharply. The final policy is deliberately conservative: it keeps the core Dragapult plan stable, learns from repeated failure modes, and avoids narrow patches that only solve one matchup.
Final agent
Primary archetype: Dragapult ex with Dreepy, Drakloak, Budew, Fezandipiti ex, Latias ex, Meowth ex, Rare Candy, Crushing Hammer, Boss's Orders, Crispin, Lillie's Determination, and Team Rocket's Watchtower.
For each legal decision, the agent:
Parses the board, hand, discard, available attackers, and visible resources.
Estimates remaining resources where possible.
Builds a prize-race plan around Dragapult ex and Phantom Dive.
Scores legal actions using setup, evolution, attachment, supporter, retreat, Boss target, bench-damage, and disruption modules.
Avoids dangerous low-value actions, including damage into immunity/counter targets and unnecessary draw/search at low deck count.
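The per-decision steps above can be sketched as a filter-then-score loop. This is a minimal illustration, not the agent's actual code: the `Action` and `GameState` fields, the three scorer functions, the numeric weights, and `LOW_DECK_THRESHOLD` are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# Every name and number below is an illustrative assumption.
LOW_DECK_THRESHOLD = 8


@dataclass(frozen=True)
class Action:
    kind: str                         # "evolve", "attack", "supporter", ...
    draws_or_searches: bool = False   # would take cards out of the deck
    hits_immune_target: bool = False  # damage into an immunity/counter target


@dataclass(frozen=True)
class GameState:
    deck_count: int
    dragapult_in_play: bool


Scorer = Callable[[GameState, Action], float]


def setup_score(state: GameState, action: Action) -> float:
    # Reward evolving while Dragapult ex is not yet established.
    return 5.0 if action.kind == "evolve" and not state.dragapult_in_play else 0.0


def prize_race_score(state: GameState, action: Action) -> float:
    # Attacks are worth more once the main attacker is online.
    if action.kind != "attack":
        return 0.0
    return 3.0 if state.dragapult_in_play else 1.0


def support_score(state: GameState, action: Action) -> float:
    return 2.0 if action.kind == "supporter" else 0.0


def is_dangerous(state: GameState, action: Action) -> bool:
    # Safety guard: no damage into immune targets, no draw/search at low deck count.
    if action.hits_immune_target:
        return True
    return action.draws_or_searches and state.deck_count <= LOW_DECK_THRESHOLD


def choose_action(state: GameState, legal_actions: Sequence[Action],
                  scorers: Sequence[Scorer]) -> Action:
    # Drop dangerous actions first; fall back to all legal actions if none are safe.
    safe = [a for a in legal_actions if not is_dangerous(state, a)]
    candidates = safe or list(legal_actions)
    # max() keeps the first best candidate, so ties resolve deterministically.
    return max(candidates, key=lambda a: sum(s(state, a) for s in scorers))
```

Summing independent module scores keeps each heuristic small and testable, and the guard runs before scoring so a high-scoring but risky action cannot win on points alone.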
Why Dragapult was selected
We tested many directions: Dragapult resource variants, Lucario, Iono/Bellibolt, Crustle wall, Abomasnow/Kyogre, same-deck mirror tuning, target-energy variants, no-Hammer/Boss variants, Hop/Dunsparce, Alakazam/Dunsparce, and early search/router ideas.
Several were locally promising, but the live environment punished brittle changes. The final Dragapult policy was strongest because it was stable in the right places:
It prioritizes reaching Dragapult ex before greedier support lines.
It values multi-prize Phantom Dive turns over generic damage.
It maps Boss targets and bench damage by matchup instead of using one global target rule.
It treats Lucario, Alakazam/Dunsparce, Crustle, Iono/Bellibolt, Abomasnow/Kyogre, mirrors, and low-HP swarm boards as different threat classes.
It uses Crushing Hammer, Boss, Unfair Stamp, and Watchtower only when they support the current prize plan or stop a loaded attacker line.
It rejects broad overfit patches even when they improve one smoke test.
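One way to express matchup-specific Boss targeting is a lookup keyed by threat class rather than a single global rule. The sketch below is hypothetical: the threat-class keys, the priority order within each list, and the function name `pick_boss_target` are assumptions for illustration, and only the card names come from the matchups named in this write-up.

```python
from typing import Optional, Sequence

# Hypothetical priority lists: earlier names are preferred Boss targets.
BOSS_TARGET_PRIORITY = {
    "lucario": ["Mega Lucario", "Riolu"],
    "crustle": ["Crustle", "Dwebble"],
    "alakazam_dunsparce": ["Alakazam", "Dunsparce"],
}


def pick_boss_target(threat_class: str,
                     opponent_bench: Sequence[str]) -> Optional[str]:
    """Return the highest-priority benched target for this threat class.

    Returns None when the threat class is unmapped or no mapped target is
    on the bench, which signals that Boss should not be played for targeting.
    """
    for name in BOSS_TARGET_PRIORITY.get(threat_class, []):
        if name in opponent_bench:
            return name
    return None
```

For example, `pick_boss_target("lucario", ["Dunsparce", "Riolu"])` selects `"Riolu"`, while an unmapped threat class yields `None` instead of falling back to a generic target.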
Architecture breakdown
| Layer | What it does | Why it matters |
| --- | --- | --- |
| Performance anchor | Final Dragapult ex policy, best observed rank #73 / score 1076.4; prepared snapshot score 1068.8 | Establishes the selected policy as Simulation-backed evidence. |
| | | Prevents over-passive support turns when the opponent is racing. |
| Disruption timing | Scores Boss, Crushing Hammer, Unfair Stamp, and Watchtower by prize plan/comeback/threat state | Keeps disruption conditional instead of blind. |
| Safety guards | Avoids poor damage targets and restricts draw/search at low deck count | Reduces wasted counters and deck-out risk. |
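Conditional disruption can be sketched as a score that stays at zero unless the matching game-state flag is set. The flag names, the pairing of each card with a flag, and the weights are assumptions chosen for illustration; the source only states that these cards are scored by prize plan, comeback, and threat state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DisruptionContext:
    # Hypothetical summary flags derived from the parsed board.
    supports_prize_plan: bool       # the play converts into prizes on the current plan
    opponent_attacker_loaded: bool  # a loaded attacker line is ready to swing
    in_comeback_state: bool         # we are behind and need to reset the race


def disruption_score(card: str, ctx: DisruptionContext) -> float:
    """Return 0.0 unless the card's assumed trigger condition holds."""
    if card == "Boss's Orders":
        return 4.0 if ctx.supports_prize_plan else 0.0
    if card == "Crushing Hammer":
        return 2.0 if ctx.opponent_attacker_loaded else 0.0
    if card == "Unfair Stamp":
        return 3.0 if ctx.in_comeback_state else 0.0
    if card == "Team Rocket's Watchtower":
        relevant = ctx.supports_prize_plan or ctx.opponent_attacker_loaded
        return 1.5 if relevant else 0.0
    return 0.0  # not a disruption card
```

Because an untriggered card scores exactly zero, it never outranks a setup or attack action by accident, which is the "conditional instead of blind" behaviour the table describes.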
Development process
This was an iterative strategy-engine project, not a neural-network training run. The loop was:
Build a candidate deck/agent.
Validate package safety and deck legality.
Run local gauntlets against archived strong agents, public-style agents, and current candidates.
Submit only selected candidates for live evaluation.
Treat live underperformance as negative evidence.
Convert repeated failures into narrow heuristic changes.
Re-test; reject changes that improve one matchup while damaging broader robustness.
The most important discipline was restraint. We did not treat every local improvement as a final improvement. We asked whether each change survived broader anchors, or whether it merely exploited yesterday's local test.
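The "did it survive broader anchors?" question can be made mechanical with a simple acceptance gate. This is a sketch under stated assumptions: the function name, the per-anchor win-rate dictionaries, and both thresholds are hypothetical, and the example numbers are invented rather than taken from our gauntlets.

```python
from typing import Mapping


def accept_candidate(baseline: Mapping[str, float],
                     candidate: Mapping[str, float],
                     max_regression: float = 0.02,
                     min_mean_gain: float = 0.0) -> bool:
    """Accept only if no shared anchor regresses badly and the mean improves.

    Both mappings go from anchor name to local win rate in [0, 1].
    """
    shared = sorted(baseline.keys() & candidate.keys())
    if not shared:
        return False  # nothing comparable, so no evidence to accept on
    deltas = [candidate[name] - baseline[name] for name in shared]
    worst = min(deltas)
    mean_gain = sum(deltas) / len(deltas)
    return worst >= -max_regression and mean_gain > min_mean_gain


# Invented numbers: a patch that wins one matchup but loses ground elsewhere.
baseline = {"lucario": 0.50, "crustle": 0.55, "mirror": 0.50}
narrow_patch = {"lucario": 0.70, "crustle": 0.45, "mirror": 0.50}
broad_patch = {"lucario": 0.53, "crustle": 0.55, "mirror": 0.51}
```

Here `accept_candidate(baseline, narrow_patch)` is `False` despite the large gain against one anchor, while `accept_candidate(baseline, broad_patch)` is `True`. A gate like this covers only the local side of the loop; live underperformance still counted as separate negative evidence.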
Local-vs-live meta gap
One of the biggest findings was that local performance and live performance could differ sharply. Local tests caught deck mistakes and tactical regressions, but they were not a full substitute for the external opponent pool.
Examples:
same-deck mirror tuning looked plausible locally but scored 710.9 live;
target-energy logic scored 723.7;
no-Hammer-Boss scored 842.6;
a resource-focused Dragapult variant was a stronger earlier candidate at around 959.0, but its added complexity did not beat the final guarded policy;
the final Dragapult policy remained the best prepared direction with 1068.8 in the final snapshot and a best observed #73 / 1076.4.
This changed the evaluation question from “what wins locally?” to “what is robust against a changing external meta?”
Post-selection cached leaderboard checks
After selecting the final direction, cached official results continued to support the same conclusion. The leaderboard pool grew substantially, yet Team TomTom remained in the same upper band: the later cached snapshot showed rank #89 of 3339 with a score of 1065.3, close to the prepared final snapshot and below the best observed #73 / 1076.4.
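To make "the same upper band" concrete, the two snapshots with a known pool size can be converted to a rank percentile. The helper name `rank_percentile` is an assumption; the rank and pool numbers are the ones reported in this write-up.

```python
def rank_percentile(rank: int, pool_size: int) -> float:
    """Return rank as a percentage of the pool (lower is better)."""
    if not 1 <= rank <= pool_size:
        raise ValueError("rank must be between 1 and pool_size")
    return 100.0 * rank / pool_size


prepared = rank_percentile(86, 3243)  # prepared final snapshot
later = rank_percentile(89, 3339)     # later cached snapshot
print(f"prepared: top {prepared:.2f}%, later: top {later:.2f}%")
```

Both snapshots land at roughly the top 2.7% of the pool, so the three-place drop in raw rank matches the growth of the field rather than a change in relative standing.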
Aggregating the cached official episodes also showed why the strategy should be described as robust rather than solved. Across 535 cached TomTom-related official episodes, the aggregate record was 278-250-7 (wins-losses-draws), with a positive total rating delta and a win rate just above 51%. That is consistent with a policy that has a real edge across a broad pool, but it also shows that the remaining gap to the very top is not one obvious tactical bug. The harder remaining problem is variance plus matchup-specific pressure from strong opponents, not merely making Dragapult more aggressive in every game.
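The arithmetic behind the aggregate record can be checked directly. The rough interval at the end is an added illustration, not part of our original analysis: it uses a normal approximation and assumes episodes are independent, which is unlikely to hold exactly when the same opponents recur.

```python
import math

wins, losses, draws = 278, 250, 7
episodes = wins + losses + draws  # 535 cached episodes

win_rate = wins / episodes                  # draws counted as non-wins
decisive_win_rate = wins / (wins + losses)  # draws excluded

# Rough 95% interval under a normal approximation (independence assumed).
std_error = math.sqrt(win_rate * (1.0 - win_rate) / episodes)
ci_low = win_rate - 1.96 * std_error
ci_high = win_rate + 1.96 * std_error

print(f"win rate {win_rate:.1%}, excluding draws {decisive_win_rate:.1%}")
print(f"approximate 95% interval: {ci_low:.1%} to {ci_high:.1%}")
```

The win rate comes out just under 52% with draws counted as non-wins. An interval a few points wide on either side is a reminder of how much sampling noise a 535-episode sample carries, which fits the point that variance is part of the remaining problem.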
The main post-selection lessons were:
The final Dragapult policy held up after the field expanded, which is a better robustness signal than a single peak rank.
The edge was distributed across many opponents rather than concentrated in one easy matchup class.
Several cached matchups still showed repeated losses, reinforcing the need for matchup-specific target and disruption timing rather than global damage or setup boosts.
Current top leaderboard scores moved higher than our best snapshot, so the honest claim is “strong, robust, and well-validated,” not “fully solved.”
Evidence trajectory
| Phase | Strategic idea | Result | Lesson |
| --- | --- | --- | --- |
| Early alternate archetypes | Non-Dragapult directions and public-meta counters | Mostly low or unstable | Useful as negative evidence and matchup probes. |
| Resource-focused Dragapult | Better visible-resource tracking and deck-count discipline | Around 959.0 | Resource tracking mattered, but extra complexity increased robustness risk. |
| | | | Best balance of live evidence, local robustness, and implementation safety. |
Alternatives considered
| Candidate family | Why it was attractive | What we learned | Final decision |
| --- | --- | --- | --- |
| Lucario/Riolu policy | Strong independent archetype and useful pressure into some Dragapult weaknesses | Helped reveal which losses were prize-race pressure versus setup weakness | Kept as secondary evidence, not the final agent. |
| Dragapult resource/search variants | Improved resource inference, search routing, disruption-card counts, and support tempo | Some variants were respectable, but complexity did not improve the final tradeoff | Fold useful discipline into the simpler guarded policy. |
| Mirror / target-energy / disruption ablations | Tested whether focused changes could outperform the broader policy | Narrow patches underperformed live even when locally plausible | Rejected as overfit or incomplete. |
| Alternate archetype tests | Explored public-meta threats and non-Dragapult approaches | Valuable as opponents, anchors, and threat families | Used to shape matchup-specific target maps. |
| Lucario + Dragapult portfolio analysis | Estimated whether different archetypes covered different meta pockets | Helpful for understanding meta risk, but not a single-agent claim | Reported only as strategy evidence. |
Failure-to-patch matrix
| Failure mode | Observed symptom | Strategy response | Decision |
| --- | --- | --- | --- |
| Late Dragapult / thin setup | Drakloak or Dragapult arrived too late or not at all | Prioritize evolution line and board stability over slow support/stall when opponent is racing | Included in final setup discipline. |
| Lucario pressure | Mega Lucario raced Dragapult and punished weak prize mapping | Add target logic for Riolu/Mega Lucario and loaded attackers; avoid broad overfit boosts | Broad anti-Lucario patch rejected after underperforming baseline. |
| Crustle wall pressure | Wall effects disrupted Phantom Dive lines | Score Crustle/Dwebble and wall pieces as strategic targets | Included as target-map evidence, not a risky final patch. |
| Alakazam/Dunsparce pressure | Control/stall threatened both Dragapult and Lucario | Test deck-count and target discipline before adoption | Helped one smoke test but hurt broader reliability, so rejected. |
| Overfit counter patches | Candidate improved one local matchup but worsened others | Prefer conservative robustness and negative evidence | Central principle of final policy. |
| Single-package constraint | Portfolio analysis was stable but cannot be claimed as one agent | Use portfolio only to reason about meta coverage and risk | Included as context, not overclaimed. |
Secondary Lucario evidence
The Lucario/Riolu policy was not the final top scorer, but it was strategically useful. It covered some Dragapult weaknesses and helped isolate threat classes. Its live score of 967.6 made it a strong complement and a useful meta probe.
Offline portfolio analysis comparing Lucario and the final Dragapult policy was stable across 432 games, with a record of 245-187-0 (56.7%). This is not claimed as a single-agent improvement; it is evidence for how we reasoned about meta coverage and submission risk.
Reproducibility
The local evidence pack keeps exact package names, hashes, source snippets, and detailed tables for auditability. The public narrative intentionally stays readable and avoids front-loading raw internal file labels.
The reproducibility basis is:
final selected policy: deterministic Dragapult ex rule-based agent;
source: the agent implementation and matching deck list in the evidence pack;
validation: Python compilation, deck checks, and Docker/local-engine gauntlets;
empirical evidence: live score/rank snapshots and submission history;
local evidence: anchor, expanded-pit, meta-proxy, ablation, and failure-analysis tables.
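Two of the checks in this list, package hashing and Python compilation, can be sketched with the standard library alone. The function names are assumptions and no real evidence-pack paths are shown; the compile check only parses the source without executing it, so it catches syntax errors but not runtime faults.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compiles(path: Path) -> bool:
    """Return True if the file parses as Python source (nothing is executed)."""
    try:
        compile(path.read_text(encoding="utf-8"), str(path), "exec")
        return True
    except SyntaxError:
        return False
```

Recording `sha256_of(...)` next to each package name lets a reader confirm that the audited file is byte-identical to the one that was submitted.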
Conclusion
The final strategy is a robust, meta-aware Dragapult ex rule-based agent. Its strength came from a disciplined process: observe live results, compare them against local tests, identify where the local proxy was wrong, convert repeated losses into narrow threat classes, and reject changes that only solved one visible problem.
Reliable board development, prize-race planning, conservative matchup-specific heuristics, and humility about local-vs-live mismatch beat brittle complexity. Failed submissions and rejected patches were not wasted attempts; they were the evidence that shaped the final robust policy.