Google AI agent completes tasks independently

by times news cr

Gemini 2.0

Google’s new AI agent completes tasks independently


12.12.2024 | Reading time: 2 min.

Gemini 2.0: The new version of Google's AI can generate images and audio output itself. (Source: Google)

With Gemini 2.0, Google is introducing a new generation of AI agents that can carry out tasks independently and process multimodal data.

Google has introduced a new version of its artificial intelligence, Gemini. In the future, the system will be able to carry out certain tasks independently as a digital assistant. As the technology group announced in a blog post, the AI can, for example, search online shops for components for hobby projects and place them in the shop’s shopping cart on its own. However, the final purchase decision remains with the user.

The new version builds on the previous model Gemini 1.5 and significantly expands its capabilities. The system can now not only process text, images and audio data, but also generate images and audio output itself. In addition, Gemini 2.0 can independently access Google products such as the search function and execute program code.

A core part of the development is “Project Mariner”. This research prototype allows the AI to navigate websites like a human. “It can click, tap and scroll just like you as a user,” explained Google manager Tulsee Doshi. The system is designed so that it does not carry out certain sensitive actions without asking. For example, the user’s express consent must be obtained before a purchase is completed.

Google CEO Sundar Pichai called the development a “new era of agents.” While the first generation, Gemini 1.0, was about organizing and understanding information, version 2.0 is meant to be much more useful. The AI can now think several steps ahead and carry out tasks on behalf of users, always under their control.

In addition to the browser assistant, Google is developing other applications based on Gemini 2.0. In “Project Astra,” the company is working on smart glasses that, like Meta’s model, can display additional information about buildings or works of art. There are also innovations for developers: the Gemini 2.0 Flash system variant can now also run locally on computers and certain smartphone models.

The new version will first be tested by selected developers and testers before it is made available to the wider public. The multimodal edition is expected to be available to all developers from January. At the same time, Google plans to integrate Gemini 2.0 into other company products.
