Alphabet-owned Google has launched Gemini, its most capable generative artificial intelligence tool yet.
With the launch, the California-based company is aiming to grab a huge chunk of the generative AI market that was first disrupted by Microsoft-backed OpenAI last year with the launch of ChatGPT.
Here is a look at the main features of Gemini and the other options available in the market.
Gemini is the first AI model to outperform human experts on MMLU (Massive Multitask Language Understanding), one of the most widely used benchmarks for testing the knowledge and problem-solving abilities of AI models.
It can comprehend diverse tasks and generate code based on different inputs – an innovation poised to revolutionise problem-solving capabilities. It can independently navigate and merge diverse information types such as text, code, audio, images and video, thereby operating across varied data formats.
Its ability to extract insights from hundreds of thousands of documents through reading, filtering and understanding information will help deliver new breakthroughs at digital speeds in many fields from science to finance, Google said.
Founded in 2010, Google DeepMind played a crucial role in developing Gemini.
So far, it has brought together new ideas in machine learning, neuroscience, engineering, maths, simulation and computing infrastructure, along with new ways of organising scientific endeavours.
Gemini is the result of DeepMind’s efforts to produce AI that “feels less like a smart piece of software and more like something useful and intuitive”, said Demis Hassabis, chief executive and co-founder of Google DeepMind.
The first version, Gemini 1.0, comes optimised in three sizes – Nano, Pro and Ultra – to ensure it can run on everything from large data centres to small mobile devices.
Google said Gemini is the most capable and general AI model it has ever built.
Until now, the standard approach to creating multimodal models has involved training separate components for different modalities and then stitching them together. Such models can be good at certain tasks, such as describing images, but struggle with more conceptual and complex reasoning.