Clean version. This preview focuses on the strategy, agent architecture, validation evidence, and failure-to-patch process.

Robust Meta-Aware Dragapult ex Agent

TL;DR

Our final strategy is a deterministic, rule-based Dragapult ex agent. It focuses on three priorities: establish Dragapult ex reliably, make decisions around the prize race rather than raw damage, and use disruption only when it changes the current game.

The strongest evidence came from Simulation results: Team TomTom reached a best observed rank of #73 with score 1076.4, and the snapshot taken at final preparation showed #86 / 3243 with score 1068.8. We use those results as empirical validation of the strategy, not as the strategy itself.

The key lesson was that local tests and live leaderboard behavior can diverge sharply. The final policy is deliberately conservative: it keeps the core Dragapult plan stable, learns from repeated failure modes, and avoids narrow patches that only solve one matchup.

Final agent

Primary archetype: Dragapult ex with Dreepy, Drakloak, Budew, Fezandipiti ex, Latias ex, Meowth ex, Rare Candy, Crushing Hammer, Boss's Orders, Crispin, Lillie's Determination, and Team Rocket's Watchtower.

For each legal decision, the agent:

Parses the board, hand, discard, available attackers, and visible resources.
Estimates remaining resources where possible.
Builds a prize-race plan around Dragapult ex and Phantom Dive.
Scores legal actions using setup, evolution, attachment, supporter, retreat, Boss target, bench-damage, and disruption modules.
Avoids dangerous low-value actions, including damage into immunity/counter targets and unnecessary draw/search at low deck count.

Why Dragapult was selected

We tested many directions: Dragapult resource variants, Lucario, Iono/Bellibolt, Crustle wall, Abomasnow/Kyogre, same-deck mirror tuning, target-energy variants, no-Hammer/Boss variants, Hop/Dunsparce, Alakazam/Dunsparce, and early search/router ideas.

Several were locally promising, but the live environment punished brittle changes. The final Dragapult policy was strongest because it was stable in the right places:

It prioritizes reaching Dragapult ex before greedier support lines.
It values multi-prize Phantom Dive turns over generic damage.
It maps Boss targets and bench damage by matchup instead of using one global target rule.
It treats Lucario, Alakazam/Dunsparce, Crustle, Iono/Bellibolt, Abomasnow/Kyogre, mirrors, and low-HP swarm boards as different threat classes.
It uses Crushing Hammer, Boss, Unfair Stamp, and Watchtower only when they support the current prize plan or stop a loaded attacker line.
It rejects broad overfit patches even when they improve one smoke test.
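
Matchup-specific targeting, as described in the list above, might look like the sketch below. The weights, the `THREAT_BONUS` table, and the `Candidate` fields are illustrative assumptions, not values taken from the agent.

```python
from dataclasses import dataclass
from typing import Dict, Sequence

# Hypothetical per-matchup bonuses for known threat pieces.
THREAT_BONUS: Dict[str, Dict[str, float]] = {
    "lucario": {"Riolu": 2.0, "Mega Lucario": 3.0},
    "crustle": {"Dwebble": 2.0, "Crustle": 3.0},
}


@dataclass(frozen=True)
class Candidate:
    name: str
    prizes: int   # prize cards taken on knockout
    energy: int   # attached energy
    hp_left: int  # remaining HP


def target_score(c: Candidate, matchup: str) -> float:
    # Generic value: prizes and attached energy help, remaining HP hurts.
    score = 2.0 * c.prizes + 1.0 * c.energy - c.hp_left / 100.0
    # Matchup-specific value: only applied when the matchup is recognized.
    score += THREAT_BONUS.get(matchup, {}).get(c.name, 0.0)
    return score


def pick_boss_target(cands: Sequence[Candidate], matchup: str) -> Candidate:
    # Ties resolve to the first candidate, keeping the choice deterministic.
    return max(cands, key=lambda c: target_score(c, matchup))
```

With two otherwise identical Bench targets, the threat bonus is the only thing that separates them, so an unrecognized matchup falls back to the generic score instead of a single global target rule.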

Architecture breakdown

Layer	What it does	Why it matters
Performance anchor	Final Dragapult ex policy, best observed rank #73 / score 1076.4; prepared snapshot score 1068.8	Provides Simulation-backed evidence for the selected policy.
Board/resource parser	Tracks active/bench state, hand/discard counts, visible resources, prize uncertainty, turn logs, and prior KO/item-lock signals	Prevents blind plays and recognizes setup/comeback states.
Prize planner	Evaluates active targets plus Phantom Dive bench-counter combinations	Makes the policy prize-race first rather than damage-first.
Target map	Weights prize value, energy, tools, stage, HP, and known threat pieces	Encodes matchup knowledge for key meta families.
Setup engine	Values Dreepy, Drakloak, Dragapult ex, Rare Candy, Poffin, and search lines when evolution is available	Directly addresses late or missing Dragapult setup.
Support guardrails	Controls Budew, Meowth ex, Latias ex, Fezandipiti ex, and draw/search usage	Prevents over-passive support turns when the opponent is racing.
Disruption timing	Scores Boss, Crushing Hammer, Unfair Stamp, and Watchtower by prize plan/comeback/threat state	Keeps disruption conditional instead of blind.
Safety guards	Avoids poor damage targets and restricts draw/search at low deck count	Reduces wasted counters and deck-out risk.
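
As an illustration of the prize planner row, the sketch below brute-forces which Bench knockouts a fixed budget of damage counters can buy. It assumes Phantom Dive places 6 damage counters (10 damage each) on the opponent's Bench; that budget, the `BenchTarget` type, and the function names are assumptions made for this example, and the real planner also weighs the active target.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Sequence, Tuple


@dataclass(frozen=True)
class BenchTarget:
    name: str
    hp_left: int  # remaining HP, assumed > 0
    prizes: int   # prize cards taken if knocked out


def counters_to_ko(target: BenchTarget) -> int:
    # Each damage counter is 10 damage; round up to finish the knockout.
    return (target.hp_left + 9) // 10


def best_bench_kos(
    bench: Sequence[BenchTarget], counters: int = 6
) -> Tuple[int, Tuple[str, ...]]:
    # Try every subset of Bench targets that the counter budget can knock out.
    # Prefer more prizes first, then fewer counters spent.
    best_key = (0, 0)
    best_names: Tuple[str, ...] = ()
    for size in range(1, len(bench) + 1):
        for combo in combinations(bench, size):
            cost = sum(counters_to_ko(t) for t in combo)
            if cost > counters:
                continue
            key = (sum(t.prizes for t in combo), -cost)
            if key > best_key:
                best_key = key
                best_names = tuple(t.name for t in combo)
    return best_key[0], best_names
```

A Bench holds only a handful of Pokémon, so exhaustive enumeration is cheap here, and ranking by prizes rather than by total damage is what makes the plan prize-race first.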

Development process

This was an iterative strategy-engine project, not a neural-network training run. The loop was:

Build a candidate deck/agent.
Validate package safety and deck legality.
Run local gauntlets against archived strong agents, public-style agents, and current candidates.
Submit only selected candidates for live evaluation.
Treat live underperformance as negative evidence.
Convert repeated failures into narrow heuristic changes.
Re-test; reject changes that improve one matchup while damaging broader robustness.
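
The last step of the loop can be expressed as a simple acceptance gate. This is a sketch under stated assumptions: the per-matchup win-rate dictionaries, the `max_regression` tolerance, and the function name are hypothetical, and the gate covers only the local re-test, not the live evaluation that ultimately decided each candidate.

```python
from typing import Dict


def accept_patch(
    baseline: Dict[str, float],
    candidate: Dict[str, float],
    max_regression: float = 0.03,  # hypothetical tolerance per matchup
) -> bool:
    # Compare only matchups measured for both versions.
    shared = baseline.keys() & candidate.keys()
    if not shared:
        return False
    # Largest win-rate drop in any single matchup.
    worst_drop = max(baseline[m] - candidate[m] for m in shared)
    # Average change across all shared matchups.
    mean_gain = sum(candidate[m] - baseline[m] for m in shared) / len(shared)
    # Accept only if the patch helps on average and breaks nothing badly.
    return mean_gain > 0.0 and worst_drop <= max_regression
```

Under this rule, a patch that gains heavily against one archetype while losing ten points against another is rejected even if its average looks positive.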

The most important discipline was restraint. We did not treat every local improvement as a final improvement. We asked whether each change survived broader anchors, or whether it merely exploited yesterday's local test.

Local-vs-live meta gap

One of the biggest findings was that local performance and live performance could differ sharply. Local tests caught deck mistakes and tactical regressions, but they were not a full substitute for the external opponent pool.

Examples:

same-deck mirror tuning looked plausible locally but scored 710.9 live;
target-energy logic scored 723.7;
no-Hammer-Boss scored 842.6;
resource-focused Dragapult was a stronger earlier candidate at around 959.0, but its added complexity did not beat the final guarded policy;
the final Dragapult policy remained the strongest direction, with 1068.8 in the final snapshot and a best observed #73 / 1076.4.

This changed the evaluation question from “what wins locally?” to “what is robust against a changing external meta?”

Post-selection cached leaderboard checks

After selecting the final direction, cached official results continued to support the same conclusion. The leaderboard pool grew, yet Team TomTom remained in the same upper band: the later cached snapshot showed #89 / 3339 with score 1065.3, close to the prepared final snapshot (1068.8) and not far below the best observed #73 / 1076.4.

Cached official episode aggregation also showed why the strategy should be described as robust rather than solved. Across 535 cached TomTom-related official episodes, the aggregate record was 278-250-7 with a positive total rating delta and a win rate of about 52% (278 of 535). That is consistent with a policy that has a real edge across a broad pool, but it also shows that the remaining gap to the very top is not one obvious tactical bug. The harder remaining problem is variance plus matchup-specific pressure from strong opponents, not merely making Dragapult more aggressive in every game.
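
The win rate follows directly from the 278-250-7 record quoted above. The short sketch below shows the arithmetic under two conventions; the function name is ours, and whether draws count toward the denominator is a reporting choice.

```python
from typing import Tuple


def win_rates(wins: int, losses: int, draws: int) -> Tuple[float, float]:
    # First value: wins over all episodes (draws count as non-wins).
    # Second value: wins over decisive episodes only (draws excluded).
    total = wins + losses + draws
    decisive = wins + losses
    return wins / total, wins / decisive


overall, decisive_only = win_rates(278, 250, 7)
print(f"{overall:.1%} of all episodes, {decisive_only:.1%} of decisive episodes")
```

Either way the figure sits a little above 50%, which fits the description of a small but real edge over a broad pool.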

The main post-selection lessons were:

The final Dragapult policy held up after the field expanded, which is a better robustness signal than a single peak rank.
The edge was distributed across many opponents rather than concentrated in one easy matchup class.
Several cached matchups still showed repeated losses, reinforcing the need for matchup-specific target and disruption timing rather than global damage or setup boosts.
Current top leaderboard scores moved higher than our best snapshot, so the honest claim is “strong, robust, and well-validated,” not “fully solved.”

Evidence trajectory

Phase	Strategic idea	Result	Lesson
Early alternate archetypes	Non-Dragapult directions and public-meta counters	Mostly low or unstable	Useful as negative evidence and matchup probes.
Resource-focused Dragapult	Better visible-resource tracking and deck-count discipline	Around 959.0	Resource tracking mattered, but extra complexity increased robustness risk.
Focused Dragapult ablations	Mirror tuning, target-energy logic, disruption-card ablations	710.9, 723.7, 842.6	Narrow local ideas did not generalize well enough.
Secondary Lucario/Riolu policy	Independent Lucario-based strategy engine	967.6	Strong complement and meta probe, but not the top final strategy.
Final Dragapult policy	Conservative setup, prize planning, matchup-aware targeting	1068.8 snapshot; best observed #73 / 1076.4	Best balance of live evidence, local robustness, and implementation safety.

Alternatives considered

Candidate family	Why it was attractive	What we learned	Final decision
Lucario/Riolu policy	Strong independent archetype and useful pressure into some Dragapult weaknesses	Helped reveal which losses were prize-race pressure versus setup weakness	Kept as secondary evidence, not the final agent.
Dragapult resource/search variants	Improved resource inference, search routing, disruption-card counts, and support tempo	Some variants were respectable, but complexity did not improve the final tradeoff	Folded the useful discipline into the simpler guarded policy.
Mirror / target-energy / disruption ablations	Tested whether focused changes could outperform the broader policy	Narrow patches underperformed live even when locally plausible	Rejected as overfit or incomplete.
Alternate archetype tests	Explored public-meta threats and non-Dragapult approaches	Valuable as opponents, anchors, and threat families	Used to shape matchup-specific target maps.
Lucario + Dragapult portfolio analysis	Estimated whether different archetypes covered different meta pockets	Helpful for understanding meta risk, but not a single-agent claim	Reported only as strategy evidence.

Failure-to-patch matrix

Failure mode	Observed symptom	Strategy response	Decision
Late Dragapult / thin setup	Drakloak or Dragapult arrived too late or not at all	Prioritize evolution line and board stability over slow support/stall when opponent is racing	Included in final setup discipline.
Lucario pressure	Mega Lucario raced Dragapult and punished weak prize mapping	Add target logic for Riolu/Mega Lucario and loaded attackers; avoid broad overfit boosts	Broad anti-Lucario patch rejected after underperforming baseline.
Crustle wall pressure	Wall effects disrupted Phantom Dive lines	Score Crustle/Dwebble and wall pieces as strategic targets	Included as target-map evidence, not a risky final patch.
Alakazam/Dunsparce pressure	Control/stall threatened both Dragapult and Lucario	Test deck-count and target discipline before adoption	Helped one smoke test but hurt broader reliability, so rejected.
Overfit counter patches	Candidate improved one local matchup but worsened others	Prefer conservative robustness and negative evidence	Central principle of final policy.
Single-package constraint	Portfolio analysis was stable but cannot be claimed as one agent	Use portfolio only to reason about meta coverage and risk	Included as context, not overclaimed.

Secondary Lucario evidence

The Lucario/Riolu policy was not the final top scorer, but it was strategically useful. It covered some Dragapult weaknesses and helped isolate threat classes. Its live score of 967.6 made it a strong complement and a useful meta probe.

Offline portfolio analysis comparing Lucario and final Dragapult was stable across 432 games at 245-187-0 / 56.7%. This is not claimed as a single-agent improvement; it is evidence for how we reasoned about meta coverage and submission risk.
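
To put the 245-187-0 record in context, the sketch below recomputes the win rate and adds a rough normal-approximation 95% interval. The interval is our illustration, not part of the original analysis, and it assumes the 432 games are independent, which gauntlet games against a fixed opponent set may not be.

```python
from math import sqrt
from typing import Tuple


def win_rate_interval(wins: int, games: int, z: float = 1.96) -> Tuple[float, float, float]:
    # Point estimate plus a normal-approximation interval (z = 1.96 for ~95%).
    p = wins / games
    half_width = z * sqrt(p * (1.0 - p) / games)
    return p, p - half_width, p + half_width


rate, low, high = win_rate_interval(245, 432)
print(f"win rate {rate:.1%}, approx. 95% interval {low:.1%} to {high:.1%}")
```

Under that independence assumption the lower bound stays above 50%, which supports reading the result as a real offline edge rather than noise.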

Reproducibility

The local evidence pack keeps exact package names, hashes, source snippets, and detailed tables for auditability. The public narrative intentionally stays readable and avoids front-loading raw internal file labels.

The reproducibility basis is:

final selected policy: deterministic Dragapult ex rule-based agent;
source: the agent implementation and matching deck list in the evidence pack;
validation: Python compilation, deck checks, and Docker/local-engine gauntlets;
empirical evidence: live score/rank snapshots and submission history;
local evidence: anchor, expanded-pit, meta-proxy, ablation, and failure-analysis tables.

Conclusion

The final strategy is a robust, meta-aware Dragapult ex rule-based agent. Its strength came from a disciplined process: observe live results, compare them against local tests, identify where the local proxy was wrong, convert repeated losses into narrow threat classes, and reject changes that only solved one visible problem.

Reliable board development, prize-race planning, conservative matchup-specific heuristics, and humility about local-vs-live mismatch beat brittle complexity. Failed submissions and rejected patches were not wasted attempts; they were the evidence that shaped the final robust policy.

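For readers who don't know the Pokémon TCG: what does this AI strategy do?

In one sentence

This is a strategy for making an AI that plays the Pokémon Trading Card Game think about how to close out the game in front of it, rather than simply "play strong cards."

The final choice was a rule-based AI built around a card called Dragapult ex. Rule-based means that the AI does not decide at random each time. Instead, it carries many decision rules of the form "in this situation, prioritize this."

The AI does not aim only at dealing big damage to the opponent. Instead, it is built to:

Set up its main card safely.
Work out which opposing Pokémon to knock out to get closer to winning.
Avoid wasting disruption cards.
Classify how it lost in games that went badly, and improve from that.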

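What players actually compete over in the Pokémon TCG

Roughly speaking, the Pokémon TCG is a game in which you knock out your opponent's Pokémon with your own and try to meet the win condition first.

What matters is not the total damage you deal. It is which Pokémon you knock out, and in what order.

For example, dealing heavy damage to the Pokémon in front of you does not always lead directly to a win. Conversely, knocking out a weaker-looking Pokémon on the Bench can sharply shorten the path to victory on later turns.

The main text calls this path to victory the prize race. For a general reader, it is enough to think of it as a race to work out which pieces to take in order to win fastest.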

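Why we built around Dragapult ex

Dragapult ex is a card that can pressure not only the opponent in front of it but also the opponents waiting on the Bench.

That trait suits an AI well. If the AI judges correctly, it can consider all of the following at once:

The opponent to knock out right now.
The opponent to prepare to knock out next.
The opponent that becomes dangerous if left alone.

However, Dragapult ex does not arrive in a strong state from the start. It has to be prepared during the game, evolved, and brought to the point where it can attack.

For that reason, the AI's first major job was to set up its main attacker without rushing, but also without being too slow.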

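What the AI looks at every turn

Every turn, the AI looks at the following.

What it looks at	Meaning for a general reader
Its own board	Is the main attacker ready, and is there enough backup?
The opponent's board	Which opposing Pokémon brings a win closer if knocked out?
Hand and usable cards	What actions are possible right now?
Remaining deck	Is it still safe to draw more cards?
The opponent's dangerous cards	Which opposing Pokémon leads to a loss if left alone?
Distance to winning	Is this a turn to attack or a turn to prepare?

The important point is that the AI is not choosing "the flashiest action."

For example, even with a strong disruption card in hand, it holds the card if using it now would not change the game. Conversely, if delaying the opponent's main attacker by a single turn makes the path to victory much more solid, it uses the card then.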

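What was difficult

The hardest part was that plays that were strong in our own tests were not necessarily strong in the live environment.

In local tests, a change can look good, for example by making one particular opponent easier to beat. In the outside environment, however, other opponents were more common or played in ways we had not anticipated, and the change sometimes made the agent weaker overall.

Development therefore followed these principles:

Be suspicious of changes that beat only one opponent.
Classify lost games by why they were lost.
Prefer small, safe improvements over large, complex changes.
Do not adopt a change with poor live results, even if it is strong locally.

In this sense, the AI is not one that "produced a brilliant answer in one shot." It is an AI whose failures were studied and distilled into decision rules that are hard to break.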

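How strong it actually became

The final policy showed solid strength in Simulation results. Team TomTom reached a best observed rank of #73 with score 1076.4. A later cached snapshot still showed #89 / 3339 with score 1065.3, so the result held up after more teams joined.

In the cached official episode aggregate, the record across 535 TomTom-related games was 278 wins, 250 losses, and 7 draws. The win rate is not overwhelming, but it is the kind of result that accumulates a small edge across a wide pool of opponents.

That matters in a game like the Pokémon TCG, where luck and matchups have real influence. The goal was not an AI that always wins decisively, but one that rarely collapses against a wide range of opponents.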

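The improvements we did not adopt also mattered

During development we tried many other ideas.

These included using a different main card, strengthening counters against specific opponents, and changing how disruption cards were used.

Most of them were not adopted in the end. The reason is that they helped in some cases but were not stable overall.

What mattered in this policy was not adding every change that looked strong.

In game AI, every added feature makes the agent look smarter. In practice, when there are too many conditions, the agent can make strange decisions in situations nobody anticipated. The final version therefore prioritized the basics:

Set up the main attacker.
Choose the target that brings a win closer.
Use disruption only when it matters.
Avoid dangerous, wasteful actions.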

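Features of this strategy

For a general reader, the AI has four defining features.

Stable setup

It emphasizes the steps needed to get the main card into play, so it is less likely to lose its way early in the game.

Working backward from the win

It considers what to knock out to get closer to winning, not the damage available right now.

No careless disruption

Cards that interfere with the opponent are rated highly only when they change the course of the game.

No excessive counter-measures

It prefers a policy that holds up against a wide range of opponents over changes that work against only one.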

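An analogy

This AI is not the kind that "moves the strongest-looking piece" in shogi or chess.

It is closer to a player who:

Builds its own position first.
Watches for the opponent's next threat.
Considers which piece to take to get closest to winning.
Does not try to win in one stroke through overambitious calculation.
Remembers the positions it lost from and reduces how often it collapses the same way.

Rather than relying on one flashy move, it aims to raise its win rate over the whole game by making small decisions consistently well every turn.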

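Summary

Even without knowing the detailed rules of the Pokémon TCG, this strategy can be understood as follows.

It is a hard-to-break game AI that sets up its main attacker reliably, chooses targets by working backward from the path to victory, and uses disruption only when needed.

Judged by peak rank alone, it has not fully overtaken the top of the leaderboard. Its strength is that it stayed in the upper band as the participant pool grew and accumulated a small edge against a wide range of opponents.

The final lesson was not to pile on complex tricks, but to bring the win path, the setup, and the per-opponent threat assessment together in a form that is simple and hard to break.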