Google DeepMind's Genie 3 can dynamically alter the state of its simulated worlds


In early December, Google DeepMind released Genie 2. The Genie family of AI systems are what are known as world models. They generate images as the user, either a human or, more likely, an automated AI agent, moves through the world the software is simulating. The resulting video of the model in action can look like a video game, but DeepMind has always positioned Genie 2 as a way to train other AI systems to be better at what they are designed to accomplish. With its new Genie 3 model, which the lab announced on Tuesday, DeepMind believes it has made an even better system for training AI agents.

At first glance, the jump between Genie 2 and 3 is not as dramatic as the one the model made last year. With Genie 2, DeepMind's system became able to generate 3D worlds, and it could accurately reconstruct part of the environment even after the user or an AI agent had left it to explore other parts of the generated scene. Environmental consistency was often a weakness of previous world models. For example, Decart's Oasis system had trouble remembering the layout of the Minecraft levels it generated.

By comparison, the improvements offered by Genie 3 seem more modest. But in a press briefing Google held ahead of today's official announcement, Shlomi Fruchter, research director at DeepMind, and Jack Parker-Holder, research scientist at DeepMind, argued that they represent important stepping stones on the road to artificial general intelligence.

So what exactly does Genie 3 do? To start, it outputs footage at 720p, instead of 360p like its predecessor. It can also sustain a “consistent” simulation for longer. Genie 2 had a theoretical limit of up to 60 seconds, but in practice the model often started to hallucinate much earlier. By contrast, DeepMind says Genie 3 can run for several minutes before it starts producing artifacts.

DeepMind also points to capabilities that are new in the model. Genie 2 was interactive insofar as the user or an AI agent could input movement commands and the model would respond after taking a few moments to generate the next frame. Genie 3 does this work in real time. In addition, it is possible to tweak the simulation with text prompts that instruct Genie to alter the state of the world it is generating. In a demo DeepMind showed, the model was told to insert a herd of deer into a scene of a person skiing down a mountain. The deer did not move in the most realistic way, but this is the killer feature of Genie 3, says DeepMind.

As mentioned above, the lab primarily sees the model as a tool for training and evaluating AI agents. DeepMind says Genie 3 could be used to teach AI systems to tackle “what if” scenarios that are not covered by their pre-training. “There are a lot of things that have to happen before a model can be deployed in the real world, but we do see it as a way to train models more efficiently and increase their reliability,” said Fruchter, pointing to, for example, a scenario where Genie 3 could be used to teach a self-driving car how to avoid a pedestrian who walks in front of it.

A GIF demonstrating the interactivity of Genie 3.

(Google DeepMind)

Despite the improvements DeepMind has made to Genie, the lab acknowledges that there is a lot of work left to do. For example, the model cannot generate real-world locations with perfect accuracy, and it struggles with text rendering. In addition, for Genie to be truly useful, DeepMind believes the model needs to be able to sustain a simulated world for hours, not minutes. Still, the lab believes Genie is ready to make a real impact.

“We're already at the point where you wouldn't use [Genie] as your sole training environment, but you can certainly find things you wouldn't want agents to do, because if they act unsafely in some settings, even if those settings aren't perfect, it's still good to know,” said Parker-Holder. “You can already see where this is going. It will become more and more useful as the models improve.”

For the time being, Genie 3 is not available to the general public. However, DeepMind says it is working to make the model available to additional testers.
