A Detailed Explanation of the Gemini 1.5 Pro LLM
Discover Gemini 1.5 from Google, featuring groundbreaking long-context understanding and an efficient Mixture-of-Experts architecture. With a context window of up to 1 million tokens, Gemini 1.5 Pro excels at complex reasoning and multimodal tasks. Experience enhanced AI performance and build innovative applications with AI Studio and Vertex AI.


This is an exciting time for AI. New advances in the field have the potential to make AI more helpful for billions of people over the coming years. Since introducing Gemini 1.0, Google has been extensively testing, refining, and enhancing its capabilities.

Gemini 1.5 delivers dramatically enhanced performance. It represents a step change in approach, building on research and engineering innovations across nearly every part of foundation model development and infrastructure. This includes a new Mixture-of-Experts (MoE) architecture that makes Gemini 1.5 more efficient to train and serve.

The first Gemini 1.5 model released for early testing is Gemini 1.5 Pro. It is a mid-size multimodal model, optimized for scaling across a wide range of tasks, and it performs at a similar level to 1.0 Ultra, Google's largest model to date. It also introduces a breakthrough experimental capability in long-context understanding.

Gemini 1.5 Pro comes with a standard 128,000-token context window. But starting today, a limited group of developers and enterprise customers can try it with a context window of up to 1 million tokens via Vertex AI in private preview. As the full 1 million-token context window rolls out, active work continues on optimizations to improve latency, reduce computational requirements, and enhance the user experience. This breakthrough capability is now available for testing, and more details on future availability are shared below.

These continued advances in next-generation models will open up new possibilities for people, developers, and enterprises to create, discover, and build using AI.
Highly Efficient Architecture
Gemini 1.5 builds on Google's leading research into Transformer and MoE architectures. While a traditional Transformer functions as one large neural network, an MoE model is divided into smaller "expert" neural networks. Depending on the type of input it receives, an MoE model learns to selectively activate only the most relevant expert pathways in its network. This specialization massively enhances the model's efficiency. Google has been an early adopter and pioneer of the MoE technique for deep learning through research such as Sparsely-Gated MoE, GShard-Transformer, Switch Transformer, M4, and more. The latest innovations in model architecture allow Gemini 1.5 to learn complex tasks more quickly and maintain quality while being more efficient to train and serve. These efficiencies are helping teams iterate, train, and deliver more advanced versions of Gemini faster than ever before, with further optimizations in progress.
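The selective-routing idea behind MoE can be sketched in a few lines: a small gating network scores every expert for an incoming token, and only the top-k highest-scoring experts actually execute. The sketch below is a toy illustration with made-up sizes (8 experts, top-2 routing, a 4-dimensional hidden state), not Gemini's actual architecture or configuration:

```python
import math
import random

random.seed(0)

# Hypothetical toy sizes for illustration only.
NUM_EXPERTS = 8   # small "expert" sub-networks
TOP_K = 2         # only the k most relevant experts run per token
DIM = 4           # toy hidden dimension

# Each expert is a random linear map (DIM x DIM weight matrix).
experts = [[[random.gauss(0, 1) for _ in range(DIM)] for _ in range(DIM)]
           for _ in range(NUM_EXPERTS)]
# The gate is one linear layer that scores each expert for a given token.
gate = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_EXPERTS)]

def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def moe_layer(token):
    """Route a token through only its TOP_K highest-scoring experts."""
    scores = matvec(gate, token)                 # one score per expert
    top = sorted(range(NUM_EXPERTS), key=lambda i: scores[i])[-TOP_K:]
    # Softmax over the selected scores only (standard top-k gating).
    mx = max(scores[i] for i in top)
    exps = [math.exp(scores[i] - mx) for i in top]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of the chosen experts' outputs; other experts never run.
    out = [0.0] * DIM
    for w, i in zip(weights, top):
        for d, y in enumerate(matvec(experts[i], token)):
            out[d] += w * y
    return out, top

output, active = moe_layer([0.5, -1.0, 0.3, 2.0])
print(f"active experts: {sorted(active)} of {NUM_EXPERTS}")
```

The efficiency gain comes from the loop at the bottom: per token, compute scales with TOP_K rather than NUM_EXPERTS, so total parameters can grow without a proportional increase in per-token cost.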
Greater Context, More Helpful Capabilities
An AI model’s “context window” is made up of tokens, which are the building blocks used for processing information. Tokens can be entire parts or subsections of words, images, videos, audio, or code. The bigger a model’s context window, the more information it can take in and process in a given prompt — making its output more consistent, relevant, and useful. Through a series of machine learning innovations, 1.5 Pro’s context window capacity has been increased far beyond the original 32,000 tokens of Gemini 1.0. It can now run up to 1 million tokens in production. This means 1.5 Pro can process vast amounts of information in one go — including 1 hour of video, 11 hours of audio, codebases with over 30,000 lines of code, or over 700,000 words. In research settings, the model has also been successfully tested with up to 10 million tokens.
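As a back-of-envelope illustration, the figures above imply rough per-unit token rates (about 1.4 tokens per word, roughly 33 per line of code, and so on). The rates in this sketch are derived only from those figures and are loose assumptions, not official tokenizer counts:

```python
# Back-of-envelope token rates implied by the figures above:
# 1M tokens ~ 700,000 words ~ 30,000 code lines ~ 11 h audio ~ 1 h video.
# These are rough assumptions, not official tokenizer counts.
CONTEXT_WINDOW = 1_000_000

TOKENS_PER = {
    "word": 1_000_000 / 700_000,              # ~1.4 tokens per word
    "code_line": 1_000_000 / 30_000,          # ~33 tokens per code line
    "audio_second": 1_000_000 / (11 * 3600),  # ~25 tokens per second
    "video_second": 1_000_000 / 3600,         # ~278 tokens per second
}

def estimate_tokens(**counts):
    """Estimate tokens for a mixed prompt, e.g. word=..., audio_second=..."""
    return sum(TOKENS_PER[unit] * n for unit, n in counts.items())

# Does a 500-page book (~125,000 words) plus a 20-minute recording fit?
needed = estimate_tokens(word=125_000, audio_second=20 * 60)
print(f"~{needed:,.0f} tokens; fits in 1M window: {needed <= CONTEXT_WINDOW}")
```

For real applications, the actual token counting APIs of the chosen SDK should be used rather than estimates like these.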
Complex Reasoning About Vast Amounts of Information
1.5 Pro can seamlessly analyze, classify, and summarize large amounts of content within a given prompt. For example, when given the 402-page transcripts from Apollo 11’s mission to the moon, it can reason about conversations, events, and details found across the document.
Better Understanding and Reasoning Across Modalities
1.5 Pro can perform highly-sophisticated understanding and reasoning tasks for different modalities, including video. For instance, when given a 44-minute silent Buster Keaton movie, the model can accurately analyze various plot points and events, and even reason about small details in the movie that could easily be missed.
Relevant Problem-Solving with Longer Blocks of Code
1.5 Pro can perform more relevant problem-solving tasks across longer blocks of code. When given a prompt with more than 100,000 lines of code, it can better reason across examples, suggest helpful modifications, and explain how different parts of the code work.
Enhanced Performance
When tested on a comprehensive panel of text, code, image, audio, and video evaluations, 1.5 Pro outperforms 1.0 Pro on 87% of the benchmarks used for developing Google’s large language models (LLMs). And when compared to 1.0 Ultra on the same benchmarks, it performs at a broadly similar level.
Gemini 1.5 Pro maintains high levels of performance even as its context window increases. In the Needle In A Haystack (NIAH) evaluation, where a small piece of text containing a particular fact or statement is deliberately placed within a long block of text, 1.5 Pro found the embedded text 99% of the time, in blocks of data as long as 1 million tokens.

Gemini 1.5 Pro also shows impressive “in-context learning” skills, meaning it can learn a new skill from information given in a long prompt, without needing additional fine-tuning. This was tested on the Machine Translation from One Book (MTOB) benchmark, which measures how well the model learns from information it has never seen before. When given a grammar manual for Kalamang, a language with fewer than 200 speakers worldwide, the model learns to translate English to Kalamang at a level similar to a person learning from the same content.

As 1.5 Pro’s long context window is the first of its kind among large-scale models, new evaluations and benchmarks for testing its novel capabilities are continuously being developed.
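The NIAH setup is simple to sketch: a "needle" fact is buried at varying depths inside filler text, the model is asked to retrieve it, and recall is averaged over depths and context lengths. In this minimal harness the model call is replaced by a trivial stand-in (`ask_model` is a hypothetical placeholder, not a real API) so the scaffolding itself can run end to end:

```python
NEEDLE = "The best thing to do in San Francisco is eat a sandwich in Dolores Park."
QUESTION = "What is the best thing to do in San Francisco?"
FILLER = "The quick brown fox jumps over the lazy dog. "

def build_haystack(total_chars, depth):
    """Bury NEEDLE at a fractional depth (0.0 = start, 1.0 = end) of filler."""
    filler = (FILLER * (total_chars // len(FILLER) + 1))[:total_chars]
    cut = int(len(filler) * depth)
    return filler[:cut] + NEEDLE + filler[cut:]

def ask_model(context, question):
    # Placeholder for a real model call; a trivial substring retriever
    # stands in so the harness can be exercised without an API key.
    return NEEDLE if NEEDLE in context else "I don't know."

def niah_score(total_chars, depths):
    """Fraction of insertion depths at which the needle is recovered."""
    hits = 0
    for depth in depths:
        context = build_haystack(total_chars, depth)
        answer = ask_model(context, QUESTION)
        hits += "Dolores Park" in answer   # simple fact-recovery check
    return hits / len(depths)

score = niah_score(total_chars=50_000, depths=[i / 10 for i in range(11)])
print(f"recall: {score:.0%}")
```

A real run would replace `ask_model` with an actual model call and sweep `total_chars` up toward the full context length, producing the depth-by-length recall grid reported for the 99% figure.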
Build with Gemini
Gemini 1.5 is available for early testing with a limited group of developers and enterprise customers. Developers can access the Gemini 1.5 Pro API via AI Studio and Vertex AI to start building with the new model.
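For developers with API access, a call can be sketched with nothing but the standard library. The endpoint shape, model identifier, and response structure below are assumptions based on the public generativelanguage REST API and should be verified against the current documentation before use:

```python
import json
import os
import urllib.request

# Assumed endpoint and model name; confirm against current Gemini API docs.
ENDPOINT = "https://generativelanguage.googleapis.com/v1beta/models"
MODEL = "gemini-1.5-pro"

def build_request(prompt, api_key):
    """Assemble a generateContent request for a plain-text prompt."""
    url = f"{ENDPOINT}/{MODEL}:generateContent?key={api_key}"
    body = {"contents": [{"parts": [{"text": prompt}]}]}
    return url, json.dumps(body).encode("utf-8")

def generate(prompt, api_key):
    url, data = build_request(prompt, api_key)
    req = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        reply = json.load(resp)
    # Assumed response shape: first candidate's first text part.
    return reply["candidates"][0]["content"]["parts"][0]["text"]

# Only performs a network call when a key is provided via the environment.
if os.environ.get("GEMINI_API_KEY"):
    print(generate("Summarize the Apollo 11 transcript in one sentence.",
                   os.environ["GEMINI_API_KEY"]))
```

In practice, the official Google AI Python SDK or the Vertex AI client libraries are the recommended way to build against the model; this raw-REST sketch only shows the shape of a request.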