Google DeepMind Introduces SIMA 2, a Gemini-Powered Generalist Agent for Complex 3D Virtual Worlds

Google DeepMind has released SIMA 2 to test how far a generalist embodied agent can go inside complex 3D game worlds. The new version of SIMA (Scalable Instructable Multiworld Agent) upgrades the original instruction follower into a Gemini-driven system that reasons about goals, explains its plans, and improves with experience across many different environments.
From SIMA 1 to SIMA 2
The first SIMA, released in 2024, learned more than 600 language-following skills such as 'turn left', 'climb the ladder', and 'open the map'. It controlled commercial games only from rendered pixels and a virtual keyboard and mouse, without access to game internals. On complex tasks, DeepMind reported a SIMA 1 success rate of about 31 percent, while human players reached about 71 percent on the same benchmark.
SIMA 2 retains the same interface but replaces the core policy with a Gemini model. According to a TechCrunch article, the system uses Gemini 2.5 Flash Lite as its reasoning engine. This changes SIMA from a direct mapping between pixels and actions into an agent that forms an internal plan, reasons in language, and then executes the required sequence of actions in the game. DeepMind describes this as moving from an instruction follower to an interactive gaming companion that collaborates with the player.

Architecture, Gemini in the control loop
The SIMA 2 architecture places Gemini at the core of the agent. The model receives visual observations and user instructions, infers a high-level goal, and generates actions that are sent through the virtual keyboard and mouse interface. Training uses a mix of human demonstration videos with language labels and labels produced by Gemini itself. This supervision lets the agent align its internal reasoning with both human intent and model-generated descriptions of behavior.
Thanks to this training scheme, SIMA 2 can explain what it intends to do and list the steps it will take. In practice, this means the agent can answer questions about its current objective, justify its decisions, and expose an interpretable chain of reasoning about the environment.
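The control loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: DeepMind has not published SIMA 2's code, so every name here (`Observation`, `Step`, `toy_reasoner`, `run_agent`) is hypothetical, and the rule-based `toy_reasoner` merely stands in for the Gemini core.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Observation:
    frame: bytes       # rendered pixels only, no access to game internals
    instruction: str   # the user's natural-language request


@dataclass
class Step:
    plan: str           # language reasoning the agent can show to the user
    actions: List[str]  # virtual keyboard and mouse events


def toy_reasoner(obs: Observation) -> Step:
    """Hypothetical stand-in for the Gemini core: infer a goal, emit a plan and actions."""
    if "ladder" in obs.instruction.lower():
        return Step(
            plan="Goal: climb. Walk to the ladder, then jump onto it.",
            actions=["key:W", "key:W", "key:SPACE"],
        )
    return Step(plan="Goal unclear. Look around first.", actions=["mouse:pan_left"])


def run_agent(
    reasoner: Callable[[Observation], Step],
    observations: List[Observation],
    send_input: Callable[[str], None],
) -> List[str]:
    """Observe, reason in language, then act through the input interface."""
    plans = []
    for obs in observations:
        step = reasoner(obs)
        plans.append(step.plan)      # the plan stays available for explanation
        for action in step.actions:
            send_input(action)       # the only channel into the game
    return plans


sent: List[str] = []
plans = run_agent(toy_reasoner, [Observation(b"", "Climb the ladder")], sent.append)
print(plans[0])
print(sent)
```

Keeping the plan as a separate output from the actions is what lets an agent of this shape answer questions about its intent without changing how it acts.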
Generalization and performance
The task completion chart shows SIMA 1 at about 31% and SIMA 2 at about 62% on the main evaluation suite, with humans in the 70% range. Integrating Gemini doubles the performance of the original agent on complex tasks. The important point is not the exact number, it is the shape: the new agent closes most of the measured gap between SIMA 1 and human players on long, language-specified tasks in the training games.
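As a quick check on "closes most of the gap", the reported figures can be plugged into a small calculation. This sketch assumes the approximate numbers quoted in this article (31%, 62%, and the 71% human figure from the SIMA 1 benchmark); the helper name `gap_closed` is made up for illustration.

```python
def gap_closed(baseline: float, new: float, reference: float) -> float:
    """Fraction of the baseline-to-reference gap that the new score covers."""
    return (new - baseline) / (reference - baseline)


# Approximate task completion rates quoted in the article, in percent.
sima1, sima2, human = 31.0, 62.0, 71.0

relative_gain = sima2 / sima1                 # 2.0, i.e. the score doubles
fraction = gap_closed(sima1, sima2, human)    # 31 / 40

print(f"relative gain: {relative_gain:.1f}x")
print(f"gap to humans closed: {fraction:.1%}")  # prints "gap to humans closed: 77.5%"
```

With these rounded inputs, SIMA 2 covers roughly three quarters of the distance between SIMA 1 and human players; the exact share depends on which human baseline is used.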
On held-out games such as ASKA and MineDojo, which were never seen during training, the DeepMind team shows the same pattern. SIMA 2 performs significantly better than SIMA 1 in these environments, which indicates a real gain in zero-shot generalization rather than overfitting to a fixed set of games. The agent also transfers abstract concepts, for example it can reuse its understanding of 'mining' in one title when asked to 'harvest' in another.
Multimodal commands
SIMA 2 extends the command channel beyond plain text. DeepMind's demonstrations show the agent following spoken commands, reacting to sketches drawn on the screen, and carrying out tasks from prompts that use only emojis. In one example, a user asks SIMA 2 to go to 'a house that is the color of a ripe tomato'. The Gemini core reasons that ripe tomatoes are red, then selects and walks to the red house.
Gemini also enables instruction following in multiple natural languages and supports mixed prompts where language and visual cues are combined. For physical AI and robotics developers, this is a concrete multimodal stack: a shared representation links text, audio, images, and in-game actions, and the agent uses this representation to ground abstract symbols in concrete control sequences.
Self-improvement at scale
One of the main research contributions in SIMA 2 is the explicit self-improvement loop. After an initial phase that uses human gameplay as a baseline, the team moves the agent into new games and lets it learn only from its own experience. A separate Gemini model generates new tasks for the agent in each world, and a reward model scores each attempt.
These trajectories are stored in a bank of self-generated data. Later generations of SIMA 2 use this data during training, which allows the agent to succeed on tasks where earlier generations failed, without any new human demonstrations. This is a concrete example of a multitask, model-in-the-loop data engine, where a language model specifies the goals and provides feedback, and the agent converts that feedback into new, competent policies.
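The loop of task proposal, attempt, scoring, and retraining can be illustrated with a toy simulation. This is a sketch under strong assumptions, not DeepMind's training code: the task list, the lookup-table "policy", the exact-match `score` function, and the imitation-style `train` step are all invented here to show how the pieces fit together.

```python
import random

# Toy world: each task has one correct input, which is hidden from the agent.
TASKS = ["chop tree", "mine ore", "open map", "climb ladder"]
ACTIONS = ["key:E", "mouse:click", "key:M", "key:W"]
CORRECT = dict(zip(TASKS, ACTIONS))


def propose_tasks(rng, k=4):
    """Stand-in for the task-proposing Gemini model."""
    return rng.sample(TASKS, k)


def score(task, action):
    """Stand-in for the reward model: 1.0 on success, otherwise 0.0."""
    return 1.0 if CORRECT[task] == action else 0.0


def attempt(policy, task, rng):
    """Use what the policy already knows, otherwise explore at random."""
    return policy.get(task) or rng.choice(ACTIONS)


def train(bank):
    """Next generation's policy: imitate the successful attempts in the bank."""
    return {task: action for task, action, reward in bank if reward == 1.0}


def self_improve(generations=5, seed=0):
    rng = random.Random(seed)
    bank, policy, rates = [], {}, []
    for _ in range(generations):
        scored = []
        for task in propose_tasks(rng):
            action = attempt(policy, task, rng)
            scored.append((task, action, score(task, action)))
        bank.extend(scored)                      # self-generated experience
        rates.append(sum(r for _, _, r in scored) / len(scored))
        policy = train(bank)                     # no human demonstrations used
    return rates, policy


rates, policy = self_improve()
print(rates)
```

In this toy setup the success rate can never drop from one generation to the next, because every success is stored and imitated; the real system has no such guarantee and relies on a learned reward model rather than an exact-match check.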
Genie 3 worlds
To push generalization further, DeepMind combines SIMA 2 with Genie 3, a world model that generates interactive 3D environments from a single image or text prompt. In these generated worlds, the agent must orient itself, parse instructions, and act toward goals even though the geometry and assets differ from those in all of the training games.
The reported behavior is that SIMA 2 can navigate these Genie 3 scenes, identify objects such as benches and trees, and perform the requested actions coherently. This matters for researchers because it shows that a single agent can operate across both commercial titles and generated environments while using the same reasoning core and control interface.
Key takeaways
- Gemini-centered architecture: SIMA 2 uses Gemini, reported as Gemini 2.5 Flash Lite, as its core reasoning and planning module, wrapped in a visuomotor control stack that acts from pixels through a virtual keyboard and mouse across many commercial games.
- Measured performance jump over SIMA 1: On DeepMind's main task suite, SIMA 2 roughly doubles SIMA 1's 31 percent success rate in training games, while also delivering higher success rates in held-out environments such as ASKA and MineDojo.
- Multimodal instruction following: The agent can follow long, complex instructions and supports multimodal prompts, including speech, sketches, and emojis, by grounding language and symbols in a shared representation of game visuals and actions.
- Self-improvement with model-generated tasks and rewards: SIMA 2 uses a Gemini-based teacher to generate tasks and a learned reward model to score attempts, building a growing experience bank that lets later generations of the agent improve without additional human demonstrations.
- Stress testing with Genie 3 and implications for robotics: Coupling SIMA 2 with Genie 3, which generates interactive 3D environments from images or text, shows that the agent can transfer its skills to newly generated worlds, a step toward general-purpose embodied agents and, ultimately, more capable real-world robots.
SIMA 2 is a meaningful systems milestone rather than a simple benchmark win. By placing a Gemini model at the core of the agent, SIMA 2 shows how an embodied Gemini stack can serve as a practical precursor to general-purpose robotic agents.
Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of artificial intelligence for social good. His most recent endeavor is the launch of an artificial intelligence media platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable by a wide audience. The platform has more than two million monthly views, which shows its popularity among readers.