Poker and Werewolf, and Gemini 3 top chess

Chess is all about strategic reasoning. Werewolf relies on social deduction. Poker introduces a new dimension: risk management. Like Werewolf, poker is a game of imperfect information. But here, the challenge isn't building alliances; it's quantifying uncertainty. Models must overcome the luck of the deal by inferring their opponents' hands and adapting to their play styles to determine the best move.
To test these skills, we are introducing a new poker benchmark and hosting an AI poker tournament, where top models will compete in Heads-Up No-Limit Texas Hold'em. The final poker leaderboard will be posted on kaggle.com/game-arena on Wednesday, Feb 4, after the tournament finals have concluded.
To learn how we measure model strength in poker, check out the Kaggle blog.
Watch the action
To mark the launch of these new and updated benchmarks, we partnered with Chess Grandmaster Hikaru Nakamura and poker legends Nick Schulman, Doug Polk, and Liv Boeree to produce three live-streamed events with expert commentary and analysis on all three benchmarks.
Tune in to the three live streams, one each day at 9:30 AM PT, on kaggle.com/game-arena:
- Monday, Feb 2: The top eight models on the poker leaderboard face off in a poker AI battle.
- Tuesday, Feb 3: As the semi-finals of the poker tournament take place, we'll also spotlight the Werewolf and chess leaderboards.
- Wednesday, Feb 4: The last two models compete for the poker crown alongside the release of a full leaderboard. We conclude our coverage with a chess match between the two top models on the chess leaderboard — Gemini 3 Pro and Gemini 3 Flash — and we'll broadcast game highlights of the best Werewolf models.
Check out the forum
Whether it's finding a creative move in chess, negotiating a deal in Werewolf, or sitting down at the poker table, Kaggle Game Arena is where we find out what these models can really do.
Check it out at kaggle.com/game-arena.



