The day after OpenAI's spring launch event, Google answered head-on with its new I/O developer conference.
The event, which began at 1 a.m. Beijing time on May 15, smelled of gunpowder from the start. Google chose to "announce everything" at the conference: it updated or launched more than ten products in a row, including the AI assistant Astra, the text-to-image model Imagen 3, the text-to-video model Veo (positioned against Sora), and the highly anticipated flagship model Gemini.

While OpenAI skipped a search engine and instead launched its latest flagship model, GPT-4o, Google, which has long dominated search, not only reworked its plans for AI search but also launched an AI image-recognition assistant at the same time.
Gemini's new voice conversation feature, Live, is positioned directly against OpenAI's GPT-4o. It also lets users ask real-time questions about their surroundings through their phones, and follow up at any point during a continuous conversation.
In addition, Google's browser, Chrome, will add Gemini Nano. The latter is a lightweight version of the Gemini series, designed mainly for on-device deployment.
Google also said that another small model, Gemma 2.0, will launch this summer, and that the Gemma family includes the open-source model PaliGemma, which can be used to label photos and add captions to images. The Gemma models use the same technology stack as the Gemini models but at a smaller scale, making them suitable for deployment in resource-constrained settings.
To a large extent, the AI race is also a smartphone race. Sameer Samat, Google's Vice President of Product Management, said explicitly that Google will further optimize the Android operating system through Gemini. The optimization will arrive first on Google's own Pixel phones.
Gemini was clearly the star of the launch event, especially its multimodal and long-context capabilities.
Over the past few months, Google has launched Gemini 1.5 Pro, which supports long context, in preview, and has made a series of improvements in translation, coding, and reasoning. The context length of Gemini 1.5 Pro has now been raised from 1 million tokens (the basic unit of text processing) to 2 million tokens, doubling in three months, a sign that the company is eager to flex its muscles in public.
Gemini has now been out for a year, and this multimodal model can already reason across text, images, video, code, and more. According to Google, 2 billion users and more than 1.5 million developers are using Gemini models. They use the models to debug code, gain new insights, and build the next generation of AI applications.
To further showcase the model's capabilities, Google gave more detailed introductions for different scenarios such as Search, Photos, and Android.
In search, for example, Gemini has brought a comprehensive AI overhaul. Users can ask new, longer, and more complex questions, and can even search with photos. Google plans to roll out "AI Overviews" in search in the United States starting this week, with other countries to follow.
Google demonstrated the "Ask Photos" feature on stage. When users are paying at a parking lot but have forgotten their license plate number, they would usually search their phone photos by keyword and scroll through a large number of pictures to find the plate. Now, simply by asking Photos, the system can identify the car that appears most often, work out which vehicle is the user's, and display the license plate number.
As another example, you can ask Photos when your child learned to swim, or even have it show you how their swimming has progressed.
Gemini is not only a chatbot but also a personal assistant that can help users handle complex tasks and actions. Gemini 1.5 Pro has also been brought into Google Workspace, Google's cloud-based productivity suite. Google says Gemini can carry out all the steps a task requires. Taking a product return as an example, the AI can search emails for the receipt, find the corresponding order number, automatically fill out the return form, and arrange the shipment.
Large models are a contest of computing power, and training the most advanced models requires enormous amounts of it. Over the past six years, the industry's demand for machine learning compute has grown a millionfold, which works out to roughly tenfold every year. As a major player in the AI era, Google has also pushed ahead on infrastructure.
That night, Google also launched its sixth-generation TPU (an application-specific integrated circuit designed by Google to accelerate machine learning training and inference workloads), named "Trillium", and called it its most performant and most efficient TPU to date. Compared with the previous-generation TPU v5e, compute performance per chip is 4.7 times higher, and Google plans to make it available to customers by the end of this year.
Gemini was trained and served entirely on Google's in-house fourth- and fifth-generation TPUs, and other leading AI companies, including Anthropic, have also trained their models on TPUs.
But as Google emphasizes the AI benefits across its products, users will have to hand over more of their personal and private data. Google promises not to use user data on its platform to train Gemini or other AI models.
Google CEO Pichai said that "AI" was mentioned 121 times at the event that day, which is enough to show how important AI is to Google. But beyond stressing that importance, the counterattack on OpenAI that outsiders had anticipated did not deliver any bigger surprises.

Author: admin

