Google’s Gemini panicked when it played Pokémon


AI companies are battling to dominate the industry, but sometimes they are also battling in Pokémon gyms.

As Google and Anthropic each study how their latest AI models navigate early Pokémon games, the results can be as amusing as they are enlightening. This time, Google DeepMind has written in a report that Gemini 2.5 Pro resorts to panic when its Pokémon are close to death. This can cause the AI’s performance to undergo a “qualitatively observable degradation in the model’s reasoning capacity,” according to the report.

AI benchmarking, or the process of comparing the performance of different AI models, is a dubious art that often provides little context for the real capabilities of a given model. But some researchers think that studying how AI models play video games could be useful (or, at the very least, a little funny).

In recent months, two developers unaffiliated with Google and Anthropic have set up respective Twitch streams called “Gemini Plays Pokémon” and “Claude Plays Pokémon,” where anyone can watch in real time as an AI tries to navigate a children’s video game from more than 25 years ago.

Each stream displays the AI’s “reasoning” process (that is, a natural-language translation of how the AI evaluates a problem and arrives at a response), giving us insight into how these models work.

Although the progress of these AI models is impressive, they are still not very good at playing Pokémon. It takes Gemini hundreds of hours to reason through a game that a child could finish in a small fraction of that time.

What is interesting about watching an AI navigate a Pokémon game is not so much its completion time as how it behaves along the way.

“During the game, Gemini 2.5 Pro gets into various situations that make the model simulate ‘panic,’” the report says.

This state of “panic” can degrade the model’s performance, as the AI may suddenly stop using certain tools at its disposal for a stretch of gameplay. Although AI does not think or feel emotion, its actions mimic the way a human might make poor, hasty decisions under stress, which is a fascinating yet unsettling response.

“This behavior has occurred in enough separate instances that the members of the Twitch chat have actively noticed when it is occurring,” the report says.

Claude has also exhibited curious behavior on its journey through Kanto. In one instance, the AI picked up on the pattern that when all of its Pokémon run out of health, the player character will “white out” and return to a Pokémon Center.

When Claude got stuck in the Mt. Moon cave, it wrongly hypothesized that if it intentionally made all of its Pokémon faint, it would be transported across the cave to the Pokémon Center in the next town.

However, that is not how the game works. When all of your Pokémon faint, you return to the Pokémon Center you used most recently, rather than the geographically nearest one. Viewers watched in horror as the AI essentially tried to kill itself in the game.

Despite its shortcomings, there are some ways in which the AI can outperform human players. As of the release of Gemini 2.5 Pro, the AI is able to solve puzzles with impressive accuracy.

With some human assistance, the AI created agentic tools (prompted instances of Gemini 2.5 Pro geared toward specific tasks) to solve the game’s boulder puzzles and find efficient routes to reach a destination.

“With only a prompt describing boulder physics and a description of how to verify a valid path, Gemini 2.5 Pro is able to one-shot some of these complex boulder puzzles, which are required to progress through Victory Road,” the report says.
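
To make the idea of “verifying a valid path” concrete, here is a minimal Python sketch of what such a check could look like. It is purely illustrative: the grid encoding, the simplified push rule, and the function name `verify_path` are assumptions made for this example, not the prompt or tool described in the report, and the real game’s boulder mechanics are more involved.

```python
# Hypothetical sketch: the grid legend and push rule are assumptions for
# illustration, not the tool or prompt described in the report.
DIRECTIONS = {"U": (-1, 0), "D": (1, 0), "L": (0, -1), "R": (0, 1)}


def verify_path(grid, moves):
    """Return True if `moves` legally walks the player from 'P' to 'G'.

    Legend: '#' wall, '.' floor, 'B' boulder, 'P' player start, 'G' goal.
    """
    rows = [list(row) for row in grid]
    boulders = set()
    player = goal = None
    for r, row in enumerate(rows):
        for c, cell in enumerate(row):
            if cell == "B":
                boulders.add((r, c))
            elif cell == "P":
                player = (r, c)
            elif cell == "G":
                goal = (r, c)
    if player is None or goal is None:
        return False

    def is_open(pos):
        # A tile is open if it is inside the grid and not a wall.
        r, c = pos
        in_bounds = 0 <= r < len(rows) and 0 <= c < len(rows[r])
        return in_bounds and rows[r][c] != "#"

    for move in moves:
        if move not in DIRECTIONS:
            return False
        dr, dc = DIRECTIONS[move]
        target = (player[0] + dr, player[1] + dc)
        if not is_open(target):
            return False
        if target in boulders:
            # Assumed rule: a boulder moves one tile when pushed, and only
            # if the tile behind it is open and holds no other boulder.
            beyond = (target[0] + dr, target[1] + dc)
            if not is_open(beyond) or beyond in boulders:
                return False
            boulders.remove(target)
            boulders.add(beyond)
        player = target
    return player == goal


room = [
    "######",
    "#PB..#",
    "#..G.#",
    "######",
]
print(verify_path(room, "RRD"))   # pushes the boulder twice, then steps onto the goal
print(verify_path(room, "RRRD"))  # third push would shove the boulder into a wall
```

A checker like this only confirms whether a proposed route is legal. Finding the route is the harder part, which is the step the report credits to the model.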

Since Gemini 2.5 Pro did much of the work in creating these tools itself, Google theorizes that the current model may be capable of creating these tools without human intervention. Who knows, perhaps Gemini will go on to create a “Don’t Panic” module.
