Google Gemini: Everything you need to know about its new generative AI platform
Google is trying to make waves with Gemini, its flagship suite of generative AI models, apps, and services. But what exactly is Gemini? How do you use it? And how does it compare to other generative AI tools like OpenAI’s ChatGPT, Meta’s Llama, and Microsoft’s Copilot?
To make it easier to keep up with the latest developments in Gemini, we’ve put together this helpful guide, which will continue to be updated as new Gemini models, features, and news about Google’s Gemini initiatives are announced.
What is Gemini?
Gemini is Google’s long-promised family of next-generation generative AI models. Developed by Google’s AI research lab DeepMind and Google Research, it comes in four types:
- Gemini Ultra, the flagship and most capable model
- Gemini Pro, a smaller model that powers the Gemini apps
- Gemini Flash, a faster, "slimmed-down" version of the Pro
- Gemini Nano, which comes in two small versions: Nano-1 and the slightly more powerful Nano-2, the latter designed to operate offline
All Gemini models are trained to be natively multimodal — in other words, able to process and analyze more than just text. Google says they are pre-trained and fine-tuned on a variety of public, proprietary, and licensed audio, images, and videos, a set of code libraries, and text in different languages.
This makes Gemini different from models like Google’s own LaMDA, which was trained only on text data. LaMDA can’t understand or generate anything other than text (essays, email drafts, and so on), but the same isn’t true of the Gemini models.
We should note that there is real ambiguity about the ethics and legality of training models on public data, in some cases without the knowledge or consent of the data owner. Google has an AI indemnity policy designed to protect certain Google Cloud customers from lawsuits, but the policy has exclusions. Proceed with caution, especially if you plan to use Gemini for commercial purposes.
What is the difference between the Gemini app and the Gemini model?
Gemini is separate and distinct from the Gemini app for web and mobile (formerly Bard).
Gemini apps are clients that connect to various Gemini models and overlay a chatbot-like interface on top. Think of them as a front end for Google’s generative AI, similar to ChatGPT and Anthropic’s Claude series of apps.
Gemini on the web is available at gemini.google.com. On Android, the Gemini app replaces the existing Google Assistant app. On iOS, the Google and Google Search apps serve as Gemini clients for that platform.
On Android, it's also recently been possible to bring up the Gemini overlay on top of any app to ask questions about what's on the screen (a YouTube video, for example). Just long-press the power button on a supported smartphone or say "Hey, Google"; you'll see the overlay pop up.
The Gemini app can take images, voice commands, and text — including files like PDFs and, soon, videos, either uploaded or imported from Google Drive — and can generate images. As you’d expect, if you’re logged into the same Google account on both your mobile and web use, your conversations with the Gemini app on mobile will carry over to the web, and vice versa.
What is Gemini Advanced?
The Gemini app isn't the only way to get the Gemini model to help you get things done. Slowly but surely, Gemini-infused features are trickling down to popular Google apps and services like Gmail and Google Docs.
To take advantage of most of these features, you need the Google One AI Premium plan. Technically part of Google One, the AI Premium plan costs $20 per month and enables the use of Gemini in Google Workspace apps like Docs, Slides, Sheets, and Meet. It also unlocks what Google calls Gemini Advanced, which brings the company's more sophisticated Gemini models to the Gemini apps.
Gemini Advanced users also get perks here and there, like early access to new features, the ability to run and edit Python code directly in Gemini, and a larger "context window." Gemini Advanced can remember and reason about 750,000 words (or 1,500 pages of documents) in a single session. By comparison, the regular Gemini app can only handle 24,000 words (or 48 pages).
Another exclusive feature of Gemini Advanced is trip planning in Google Search, which creates a customized travel itinerary based on prompts. Taking into account factors such as flight times (from emails in the user's Gmail inbox), dining preferences and local attractions (from Google Search and Maps data), as well as the distance between these attractions, Gemini will generate an itinerary that automatically updates to reflect any changes.
Gemini’s use of Google services is also available to enterprise customers through two plans: Gemini Business (an add-on to Google Workspace) and Gemini Enterprise. Gemini Business starts at $20 per user per month, while Gemini Enterprise — which adds meeting notes and translated subtitles as well as document classification and annotation — is priced at $30 per user per month and up. (Both plans require a one-year agreement.)
Gemini in Gmail, Docs, Chrome, Dev Tools, and More
In Gmail, Gemini lives in a sidebar that composes emails and summarizes message threads. You'll find the same sidebar in Docs, where it helps you write and refine content and come up with new ideas. In Slides, Gemini generates slides and custom images. And in Google Sheets, Gemini tracks and organizes data, creating tables and formulas.
Gemini’s reach also extends to Drive, where it can summarize files and provide quick information about projects. Meanwhile, in Meet, Gemini can translate subtitles into other languages.
Gemini recently arrived as an AI writing tool in Google's Chrome browser. You can use it to compose brand new content or rewrite existing text; Google says it takes into account the webpage you're on to make suggestions.
Elsewhere, you’ll find Gemini in Google’s database products, cloud security tools, app development platforms (including Firebase and Project IDX), and apps like Google Photos (where Gemini handles natural language search queries), YouTube (where it helps come up with video ideas), and the NotebookLM note-taking assistant.
Code Assist (formerly Duet AI for Developers), Google’s suite of AI-powered assistance tools for code completion and generation, is offloading the heavy lifting to Gemini. The same is true for Google’s Gemini-powered security products, such as Gemini for Threat Intelligence, which analyzes large amounts of potentially malicious code and lets users use natural language to search for signs of ongoing threats or compromises.
Gemini Extensions and Gems
Announced at Google I/O 2024, Gems are custom chatbots, powered by Gemini models, that Gemini Advanced subscribers can create. Gems can be generated from natural language descriptions — for example, “You’re my running coach. Give me a daily running plan” — and can be shared with others or kept private.
Gems are available on desktop and mobile in 150 countries and most languages. Eventually, they will be able to take advantage of broader integrations with Google services, including Google Calendar, Tasks, Keep, and YouTube Music, to complete custom tasks.
Speaking of integrations, the Gemini app for web and mobile can tap into Google services through what Google calls “Gemini extensions.” Gemini currently integrates with Google Drive, Gmail, and YouTube to respond to queries like “Can you summarize my last three emails?” Later this year, Gemini will be able to do more with Google Calendar, Keep, Tasks, YouTube Music, and Utilities (which are Android-only apps that control features on your device, like timers and alarms, media controls, flashlight, volume, Wi-Fi, Bluetooth, and more).
Gemini Live Deep Voice Chat
A new experience called Gemini Live, available only to Gemini Advanced subscribers, allows users to have “in-depth” voice chats with Gemini. It’s available in the Gemini app on mobile and on Pixel Buds Pro 2, and can be accessed even when the phone is locked.
With Gemini Live enabled, you can interrupt the chatbot as it speaks (in one of several new voices) to ask clarifying questions, and it will adapt to your speech patterns in real time. Later this year, Gemini will be able to see and respond to your surroundings via photos or videos captured by your smartphone camera.
Live is also designed to be a virtual coach of sorts, helping you rehearse for various activities, brainstorm ideas, and more. For example, Live can suggest which skills to highlight in an upcoming job or internship interview, and it can also provide advice on public speaking.
You can read our review of Gemini Live here. Spoiler alert: we think the feature has a ways to go before it becomes super useful — but admittedly, it’s still early days.
Image generation via Imagen 3
Gemini users can generate artwork and images using Google’s built-in Imagen 3 model.
Google says Imagen 3 more accurately understands the text prompts it translates into images and produces more “creative and detailed” results than its predecessor, Imagen 2. It also produces fewer artifacts and visual errors (at least according to Google), and it is currently the best Imagen model at rendering text.
Back in February, Google was forced to suspend Gemini’s ability to generate images of people after users complained about historical inaccuracies, but in August the company reintroduced the ability to generate images of people for some users, specifically those in English who were signed up for one of Google’s paid Gemini plans (such as Gemini Advanced), as part of a pilot program.
Gemini for teens
In June, Google launched the Gemini experience for teens, allowing students to sign up through their Google Workspace for Education school accounts.
Gemini for teens has “additional policies and safeguards,” including a tailored onboarding process and an “AI literacy guide” that (in Google’s words) “helps teens use AI responsibly.” Otherwise, it’s pretty much the same as the standard Gemini experience, even including a “double-check” feature that searches the web to see whether Gemini’s answers are accurate.
Gemini in smart home devices
A growing number of Google-made devices use Gemini to power their functionality, from the Google TV streaming media player to the Pixel 9 and 9 Pro to the latest Nest learning thermostat.
On the Google TV streaming player, Gemini curates recommendations from your subscriptions based on your preferences, and summarizes reviews and even entire seasons of TV shows.
Gemini will soon enhance the conversational and analytical capabilities of the Google Assistant on the latest Nest thermostats (as well as Nest speakers, cameras, and smart displays).
Later this year, subscribers to Google’s Nest Aware program will get early access to new Gemini-powered features, such as AI descriptions of what Nest cameras are capturing, natural language video search, and recommended automations. Nest cameras will be able to understand what’s happening in a live video feed (e.g., when the dog is digging in the garden), while the companion Google Home app will surface video based on descriptions and create device automations (e.g., “Did the kids leave their bikes in the driveway?”, “Ask my Nest thermostat to turn up the heat every Tuesday when I get home from work”).
Also later this year, Google Assistant will get some upgrades on Nest-branded and other smart home devices to make conversations feel more natural. An improved voice is coming soon, along with the ability to ask follow-up questions and "[more] easy back-and-forth communication."
What can the Gemini model do?
Because Gemini models are multimodal, they are able to perform a range of multimodal tasks, from transcribing speech to adding captions to images and videos in real time. Many of these capabilities have already reached product stage (as described in the previous section), and Google promises more results in the near future.
Of course, it’s hard to take the company’s word for it. Google’s initial launch of Bard was severely underwhelming, and more recently the company sparked controversy with a video that claimed to show off Gemini’s capabilities but turned out to be more aspirational than an accurate demonstration.
Additionally, Google doesn’t offer solutions to some of the potential problems with today’s generative AI technology, such as its encoded biases and tendency to fabricate content (i.e., hallucinate). Neither do its competitors, but this is something to keep in mind when considering using or paying for Gemini.
Assuming for the purposes of this article that Google’s recent announcements are accurate, here’s what the different tiers of Gemini can do today, and what they’ll be able to do once they reach their full potential:
What can you do with Gemini Ultra?
Google says that thanks to its multimodal capabilities, Gemini Ultra can be used to assist with tasks such as physics homework, solving problems step by step on a worksheet, and pointing out possible errors in completed answers.
Google says Ultra can also be applied to tasks such as identifying scientific papers that are relevant to a question. For example, the model can extract information from multiple papers and update a graph in one of the papers with more up-to-date data by generating the necessary formulas to recreate the graph.
Gemini Ultra technically supports image generation. However, this capability hasn’t yet made its way into the production version of the model, perhaps because the mechanism is more complex than the way apps such as ChatGPT generate images. Rather than feeding prompts to a separate image generator (like DALL-E 3 in ChatGPT’s case), Gemini outputs images “natively,” without an intermediate step.
Ultra is available as an API through Vertex AI (Google’s fully managed AI development platform) and AI Studio (Google’s web-based tools for app and platform developers).
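For developers, access on both platforms goes through the same generative API. As a rough illustration (not Google's official sample), here's how a call might look with the google-generativeai Python SDK; the model name and the `GEMINI_API_KEY` environment variable are assumptions, so check the current docs before relying on the exact names.

```python
# Minimal sketch of calling a Gemini model through the AI Studio API.
# Assumes the google-generativeai SDK (pip install google-generativeai)
# and an API key in the GEMINI_API_KEY environment variable.
import os

def ask_gemini(prompt: str, model_name: str = "gemini-1.5-pro") -> str:
    """Send a single text prompt to a Gemini model and return the reply text."""
    import google.generativeai as genai  # deferred so the sketch loads without the SDK
    genai.configure(api_key=os.environ["GEMINI_API_KEY"])
    model = genai.GenerativeModel(model_name)
    response = model.generate_content(prompt)
    return response.text

if __name__ == "__main__" and "GEMINI_API_KEY" in os.environ:
    print(ask_gemini("Summarize the Gemini model family in one sentence."))
```

Vertex AI exposes the same models through its own SDK and adds enterprise features such as grounding and agent building on top.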
What can you do with Gemini Pro?
Google says Gemini Pro improves on LaMDA in reasoning, planning, and comprehension. The latest version, Gemini 1.5 Pro, powers the Gemini apps for Gemini Advanced subscribers and even surpasses Ultra’s performance in some areas.
Gemini 1.5 Pro improves on its predecessor, Gemini 1.0 Pro, in several ways, perhaps most notably in the amount of data it can process. Gemini 1.5 Pro can take in up to 1.4 million words, two hours of video, or 22 hours of audio and (more or less) reason about or answer questions regarding that data.
Gemini 1.5 Pro became generally available in June on Vertex AI and AI Studio, along with a feature called code execution, which aims to reduce errors in model-generated code by iteratively refining it over several steps. (Gemini Flash also supports code execution.)
In Vertex AI, developers can tailor Gemini Pro to specific contexts and use cases through fine-tuning or "grounding." For example, Pro (and other Gemini models) can be instructed to use data from third-party providers such as Moody's, Thomson Reuters, ZoomInfo, and MSCI, or to pull information from enterprise datasets or Google Search instead of relying on its broader knowledge base. Gemini Pro can also connect to external third-party APIs to perform specific actions, such as automating back-office workflows.
AI Studio provides templates for creating structured chat prompts with Pro. Developers can constrain the model's output, provide examples to indicate tone and style, and adjust Pro's safety settings.
The Vertex AI Agent Builder allows people to build Gemini-powered “agents” in Vertex AI. For example, a company could create an agent that analyzes past marketing campaigns to understand a brand’s style, and then use that knowledge to help generate new ideas that fit that style.
Gemini Flash is built for less demanding workloads.
For less demanding applications, there is Gemini Flash. The latest version is 1.5 Flash, which powers the Gemini apps for users who don't subscribe to Gemini Advanced.
Flash is a smaller, more efficient offshoot of Gemini Pro, built for narrow, high-frequency generative AI workloads. Like Pro, Flash is multimodal, meaning it can analyze audio, video, and images as well as text (though it can only generate text). Google says Flash is particularly well suited to tasks such as summarization, chat applications, image and video captioning, and extracting data from long documents and tables.
Developers using Flash and Pro can choose to take advantage of context caching, which enables them to store large amounts of information (e.g., a knowledge base or a database of research papers) in a cache that Gemini models can access quickly and relatively cheaply. However, context caching is billed on top of other Gemini model usage charges.
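As a sketch of how context caching can be used: you upload the large context once, then create a model handle bound to that cache so subsequent prompts don't resend it. The code below assumes the google-generativeai SDK's `CachedContent` interface; the function and variable names are illustrative, and SDK details may have changed since this was written.

```python
# Sketch: cache a large document once, then query it repeatedly.
# Assumes the google-generativeai SDK and a GEMINI_API_KEY environment variable.
import datetime
import os

def build_cached_model(big_document: str,
                       model_name: str = "models/gemini-1.5-flash-001"):
    """Store a large document in a context cache and return a model that reuses it."""
    import google.generativeai as genai  # deferred so the sketch loads without the SDK
    from google.generativeai import caching
    genai.configure(api_key=os.environ["GEMINI_API_KEY"])
    cache = caching.CachedContent.create(
        model=model_name,
        contents=[big_document],             # the expensive-to-resend context
        ttl=datetime.timedelta(minutes=30),  # cache storage is billed separately
    )
    return genai.GenerativeModel.from_cached_content(cached_content=cache)

# Usage (requires a valid API key):
#   model = build_cached_model(open("research_corpus.txt").read())
#   answer = model.generate_content("Which papers discuss context caching?")
```

The trade-off is exactly the one the paragraph above describes: per-request input costs drop because the cached tokens aren't resent, but you pay for the cache's storage time.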
Gemini Nano can run on your phone.
Gemini Nano is a much smaller version of the Gemini Pro and Ultra models, efficient enough to run directly on (some) devices rather than sending the task off to a server somewhere. So far, Nano powers a few features on the Pixel 8 Pro, Pixel 8, Pixel 9 Pro, Pixel 9, and Samsung Galaxy S24, including the summarization feature in Recorder and the smart reply feature in Gboard.
The Recorder app allows users to tap a button to record and transcribe audio, with Gemini-powered summaries of recorded conversations, interviews, presentations, and other clips. The summaries are available even when users don’t have a signal or Wi-Fi connection, and for privacy, no data leaves the phone in the process.
The Nano is also in Gboard, Google's keyboard replacement, where it powers a feature called Smart Reply, which helps suggest what you might want to say next when chatting in messaging apps like WhatsApp.
In the Google Messages app on supported devices, Nano powers Magic Compose, which can compose messages in styles such as "Excited," "Formal," and "Lyrical."
Google says future versions of Android will use Nano to alert users to potential scams during phone calls. The new weather app on Pixel phones uses the Gemini Nano to generate tailored weather forecasts. And Google's accessibility service TalkBack uses Nano to create audio descriptions of objects for low-vision and blind users.
How much does the Gemini model cost?
Gemini 1.0 Pro (the first version of Gemini Pro), 1.5 Pro, and Flash — which can be used to build apps and services via Google's Gemini API — all have free options. But the free options have usage restrictions and exclude certain features, such as context caching and batching.
Otherwise, the Gemini models are pay-as-you-go. Here is the base pricing as of September 2024, not including add-ons like context caching:
- Gemini 1.0 Pro: 50 cents per 1 million input tokens, $1.50 per 1 million output tokens
- Gemini 1.5 Pro: $3.50 per 1 million input tokens (for prompts up to 128K tokens) or $7 per 1 million input tokens (for prompts over 128K tokens); $10.50 per 1 million output tokens (for prompts up to 128K tokens) or $21.00 per 1 million output tokens (for prompts over 128K tokens)
- Gemini 1.5 Flash: 7.5 cents per 1 million input tokens (for prompts up to 128K tokens), 15 cents per 1 million input tokens (for prompts over 128K tokens); 30 cents per 1 million output tokens (for prompts up to 128K tokens), 60 cents per 1 million output tokens (for prompts over 128K tokens)
Tokens are subdivisions of raw data, like the syllables "fan," "tas," and "tic" in the word "fantastic"; 1 million tokens is roughly equivalent to 700,000 words. Input refers to the tokens fed into the model, while output refers to the tokens generated by the model.
Pricing for the Ultra has not yet been announced, and the Nano is still in the early access stage.
Is Gemini coming to iPhone?
It might.
Apple said it is in talks to use Gemini and other third-party models for multiple features in its Apple Intelligence suite. Following the 2024 Worldwide Developers Conference (WWDC) keynote, Apple Senior Vice President Craig Federighi confirmed plans to work with models including Gemini, but he did not reveal any other details.
This post was originally published on February 16, 2024 and has since been updated to include new information about Gemini and Google's related plans.