Get started with the Gemini API in web apps

This tutorial demonstrates how to access the Gemini API directly from your web app using the Google AI JavaScript SDK. You can use this SDK if you don't want to work directly with REST APIs or server-side code (like Node.js) for accessing Gemini models in your web app.

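In this tutorial, you'll learn how to do the following:

  • Set up your project, including your API key
  • Generate text from text-only input
  • Generate text from text-and-image input (multimodal)
  • Build multi-turn conversations (chat)
  • Use streaming for faster interactions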

In addition, this tutorial contains sections about advanced use cases (like counting tokens) as well as options for controlling content generation.

Prerequisites

This tutorial assumes that you're familiar with using JavaScript to develop web apps. This guide is framework-independent.

To complete this tutorial, make sure that your development environment meets the following requirements:

  • (Optional) Node.js
  • Modern web browser

Set up your project

Before calling the Gemini API, you need to set up your project, which includes obtaining an API key and initializing the model.

Set up your API key

To use the Gemini API, you'll need an API key. If you don't already have one, create a key in Google AI Studio.

Secure your API key

It's strongly recommended that you do not check an API key into your version control system. Instead, you should pass your API key to your app right before initializing the model.

All the snippets in this tutorial assume that you're accessing your API key as a global constant.
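
One way to keep the key out of source control is to inject it at load time and fail fast when it's missing. The following is a minimal sketch, not part of the SDK: `readApiKey` and the `APP_CONFIG` object are hypothetical names, and how the config object gets populated is left to your page or build step.

```javascript
// Hypothetical helper (not part of the SDK): reads the key from a config
// object that your page or build step populates at runtime.
function readApiKey(config) {
  const key = config && config.API_KEY;
  // Reject a missing key, an empty key, or the "..." placeholder.
  if (typeof key !== "string" || key.trim() === "" || key.trim() === "...") {
    throw new Error("API key is not configured");
  }
  return key.trim();
}

// Usage (APP_CONFIG is an assumed name for your injected settings):
// const API_KEY = readApiKey(globalThis.APP_CONFIG);
```

Failing early like this surfaces a configuration problem before the first API call rather than as an authentication error later.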

Initialize the Generative Model

Before you can make any API calls, you need to import and initialize the Generative Model.

<html>
  <body>
    <!-- ... Your HTML and CSS -->

    <script type="importmap">
      {
        "imports": {
          "@google/generative-ai": "https://esm.run/@google/generative-ai"
        }
      }
    </script>
    <script type="module">
      import { GoogleGenerativeAI } from "@google/generative-ai";

      // Fetch your API_KEY
      const API_KEY = "...";

      // Access your API key (see "Set up your API key" above)
      const genAI = new GoogleGenerativeAI(API_KEY);

      // ...

      const model = genAI.getGenerativeModel({ model: "MODEL_NAME"});

      // ...
    </script>
  </body>
</html>

When specifying a model, note the following:

  • Use a model that's specific to your use case (for example, gemini-pro-vision is for multimodal input). Within this guide, the instructions for each implementation list the recommended model for each use case.

Implement common use cases

Now that your project is set up, you can explore using the Gemini API to implement different use cases:

Generate text from text-only input

When the prompt input includes only text, use the gemini-pro model with the generateContent method to generate text output:

import { GoogleGenerativeAI } from "@google/generative-ai";

// Access your API key (see "Set up your API key" above)
const genAI = new GoogleGenerativeAI(API_KEY);

async function run() {
  // For text-only input, use the gemini-pro model
  const model = genAI.getGenerativeModel({ model: "gemini-pro"});

  const prompt = "Write a story about a magic backpack.";

  const result = await model.generateContent(prompt);
  const response = await result.response;
  const text = response.text();
  console.log(text);
}

run();
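
The example above lets any failure propagate out of run(). In an app, you may want to catch errors and report them to the user instead. The following sketch wraps the same calls in a hypothetical `generateText` helper; which errors actually occur (network failures, an invalid key, a blocked response) depends on your setup and is an assumption here.

```javascript
// Minimal sketch: wraps the calls shown above with basic error handling.
// `model` is any object exposing generateContent(), such as the one
// returned by genAI.getGenerativeModel().
async function generateText(model, prompt) {
  try {
    const result = await model.generateContent(prompt);
    const response = await result.response;
    return { ok: true, text: response.text() };
  } catch (error) {
    // Any error thrown by the request or by reading the response lands here.
    const message = error instanceof Error ? error.message : String(error);
    return { ok: false, error: message };
  }
}

// Usage:
// const outcome = await generateText(model, "Write a story about a magic backpack.");
// console.log(outcome.ok ? outcome.text : `Request failed: ${outcome.error}`);
```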

Generate text from text-and-image input (multimodal)

Gemini provides a multimodal model (gemini-pro-vision), so you can input both text and images. Make sure to review the image requirements for input.

When the prompt input includes both text and images, use the gemini-pro-vision model with the generateContent method to generate text output:

import { GoogleGenerativeAI } from "@google/generative-ai";

// Access your API key (see "Set up your API key" above)
const genAI = new GoogleGenerativeAI(API_KEY);

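// Converts a File object to a GoogleGenerativeAI.Part object.
async function fileToGenerativePart(file) {
  const base64EncodedDataPromise = new Promise((resolve, reject) => {
    const reader = new FileReader();
    // reader.result is a data URL; keep only the base64 payload after the comma.
    reader.onload = () => resolve(reader.result.split(',')[1]);
    // Reject on a read failure instead of resolving with a null result.
    reader.onerror = () => reject(reader.error);
    reader.readAsDataURL(file);
  });
  return {
    inlineData: { data: await base64EncodedDataPromise, mimeType: file.type },
  };
}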

async function run() {
  // For text-and-images input (multimodal), use the gemini-pro-vision model
  const model = genAI.getGenerativeModel({ model: "gemini-pro-vision" });

  const prompt = "What's different between these pictures?";

  const fileInputEl = document.querySelector("input[type=file]");
  const imageParts = await Promise.all(
    [...fileInputEl.files].map(fileToGenerativePart)
  );

  const result = await model.generateContent([prompt, ...imageParts]);
  const response = await result.response;
  const text = response.text();
  console.log(text);
}

run();
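
If your image is already available as a base64 data URL (for example, from a canvas), you can build the same part object without a FileReader. The following sketch uses a hypothetical `dataUrlToGenerativePart` helper and assumes the URL has the simple form `data:<mimeType>;base64,<data>` with no extra parameters.

```javascript
// Hypothetical helper: splits a base64 data URL into the inlineData shape
// returned by fileToGenerativePart above.
function dataUrlToGenerativePart(dataUrl) {
  // Expected form: data:<mimeType>;base64,<data>
  const match = /^data:([^;,]+);base64,(.*)$/.exec(dataUrl);
  if (!match) {
    throw new Error("Expected a base64-encoded data URL");
  }
  const [, mimeType, data] = match;
  return { inlineData: { data, mimeType } };
}

const part = dataUrlToGenerativePart("data:image/png;base64,iVBORw0KGgo=");
console.log(part.inlineData.mimeType); // image/png
```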

Build multi-turn conversations (chat)

Using Gemini, you can build freeform conversations across multiple turns. The SDK simplifies the process by managing the state of the conversation, so unlike with generateContent, you don't have to store the conversation history yourself.

To build a multi-turn conversation (like chat), use the gemini-pro model, and initialize the chat by calling startChat(). Then use sendMessage() to send a new user message, which will also append the message and the response to the chat history.

Each piece of content in a conversation is associated with one of two roles:

  • user: the role which provides the prompts. This value is the default for sendMessage calls, and the function will throw an exception if a different role is passed.

  • model: the role which provides the responses. This role can be used when calling startChat() with existing history.

import { GoogleGenerativeAI } from "@google/generative-ai";

// Access your API key (see "Set up your API key" above)
const genAI = new GoogleGenerativeAI(API_KEY);

async function run() {
  // For text-only input, use the gemini-pro model
  const model = genAI.getGenerativeModel({ model: "gemini-pro"});

  const chat = model.startChat({
    history: [
      {
        role: "user",
        parts: [{ text: "Hello, I have 2 dogs in my house." }],
      },
      {
        role: "model",
        parts: [{ text: "Great to meet you. What would you like to know?" }],
      },
    ],
    generationConfig: {
      maxOutputTokens: 100,
    },
  });

  const msg = "How many paws are in my house?";

  const result = await chat.sendMessage(msg);
  const response = await result.response;
  const text = response.text();
  console.log(text);
}

run();
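
If you restore a conversation from your own storage, you need to convert it into the history format shown above. The following sketch uses a hypothetical `buildHistory` helper that takes `[role, text]` pairs and rejects any role other than the two described earlier.

```javascript
// Hypothetical helper: converts [role, text] pairs into the history format
// passed to startChat(), rejecting roles other than "user" and "model".
function buildHistory(turns) {
  return turns.map(([role, text]) => {
    if (role !== "user" && role !== "model") {
      throw new Error(`Unsupported role: ${role}`);
    }
    return { role, parts: [{ text }] };
  });
}

const history = buildHistory([
  ["user", "Hello, I have 2 dogs in my house."],
  ["model", "Great to meet you. What would you like to know?"],
]);

// Usage:
// const chat = model.startChat({ history });
```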

Use streaming for faster interactions

By default, the model returns a response after completing the entire generation process. You can achieve faster interactions by not waiting for the entire result and instead using streaming to handle partial results.

The following example shows how to implement streaming with the generateContentStream method to generate text from a text-and-image input prompt.

// ...

const result = await model.generateContentStream([prompt, ...imageParts]);

let text = '';
for await (const chunk of result.stream) {
  const chunkText = chunk.text();
  console.log(chunkText);
  text += chunkText;
}

// ...

You can use a similar approach for text-only input and chat use cases.

// Use streaming with text-only input
const result = await model.generateContentStream(prompt);

See the chat example above for how to instantiate a chat.

// Use streaming with multi-turn conversations (like chat)
const result = await chat.sendMessageStream(msg);
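
Because all three streaming calls are consumed the same way, the loop can be factored into one helper. The following sketch uses a hypothetical `collectStream` function; it assumes only that the stream is an async iterable of chunks exposing text(), as in the loop above. The `outputEl` element in the usage comment is also an assumed name.

```javascript
// Minimal sketch: consumes an async iterable of chunks that expose text(),
// such as result.stream, and calls onChunk for each piece as it arrives.
async function collectStream(stream, onChunk = () => {}) {
  let text = "";
  for await (const chunk of stream) {
    const chunkText = chunk.text();
    onChunk(chunkText);
    text += chunkText;
  }
  // The concatenation of every chunk, in arrival order.
  return text;
}

// Usage (outputEl is an assumed DOM element in your page):
// const result = await chat.sendMessageStream(msg);
// const fullText = await collectStream(result.stream, (t) => outputEl.append(t));
```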

Implement advanced use cases

The common use cases described in the previous section of this tutorial help you become comfortable with using the Gemini API. This section describes some use cases that might be considered more advanced.

Count tokens

When using long prompts, it might be useful to count tokens before sending any content to the model. The following examples show how to use countTokens() for various use cases:

// For text-only input
const { totalTokens: textTokens } = await model.countTokens(prompt);

// For text-and-image input (multimodal)
const { totalTokens: multimodalTokens } = await model.countTokens([prompt, ...imageParts]);

// For multi-turn conversations (like chat)
const history = await chat.getHistory();
const msgContent = { role: "user", parts: [{ text: msg }] };
const contents = [...history, msgContent];
const { totalTokens: chatTokens } = await model.countTokens({ contents });
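
A common use of the token count is to refuse oversized prompts before sending them. The following sketch shows one way to do that with a hypothetical `generateWithinBudget` helper; `maxInputTokens` is a limit you choose for your app, not an SDK constant.

```javascript
// Minimal sketch: sends the prompt only when it fits a caller-chosen budget.
// `maxInputTokens` is an assumed application limit, not an SDK constant.
async function generateWithinBudget(model, prompt, maxInputTokens) {
  const { totalTokens } = await model.countTokens(prompt);
  if (totalTokens > maxInputTokens) {
    throw new Error(
      `Prompt uses ${totalTokens} tokens; budget is ${maxInputTokens}`
    );
  }
  const result = await model.generateContent(prompt);
  const response = await result.response;
  return response.text();
}

// Usage:
// const text = await generateWithinBudget(model, prompt, 1000);
```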

Options to control content generation

You can control content generation by configuring model parameters and by using safety settings.

Configure model parameters

Every prompt you send to the model includes parameter values that control how the model generates a response, and different parameter values can produce different results. The configuration is maintained for the lifetime of your model instance. Learn more about model parameters.

const generationConfig = {
  stopSequences: ["red"],
  maxOutputTokens: 200,
  temperature: 0.9,
  topP: 0.1,
  topK: 16,
};

const model = genAI.getGenerativeModel({ model: "MODEL_NAME", generationConfig });
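
Because the configuration stays fixed for a model instance, one approach to using different settings for different tasks is to derive several configs from shared defaults and create one model instance per config. The following sketch assumes hypothetical names (`DEFAULT_GENERATION_CONFIG`, `withOverrides`); the default values simply mirror the example above.

```javascript
// Hypothetical app defaults; the values mirror the example above.
const DEFAULT_GENERATION_CONFIG = {
  maxOutputTokens: 200,
  temperature: 0.9,
  topP: 0.1,
  topK: 16,
};

// Returns a new config object without mutating the defaults.
// Overrides set to undefined are skipped so the default is kept.
function withOverrides(overrides = {}) {
  const config = { ...DEFAULT_GENERATION_CONFIG };
  for (const [key, value] of Object.entries(overrides)) {
    if (value !== undefined) config[key] = value;
  }
  return config;
}

const preciseConfig = withOverrides({ temperature: 0.2, topK: undefined });

// Usage: one model instance per configuration.
// const preciseModel = genAI.getGenerativeModel({
//   model: "MODEL_NAME",
//   generationConfig: preciseConfig,
// });
```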

Use safety settings

You can use safety settings to adjust the likelihood of getting responses that may be considered harmful. By default, safety settings block content with a medium or high probability of being unsafe, across all dimensions. Learn more about safety settings.

Here's how to set one safety setting:

import { HarmBlockThreshold, HarmCategory } from "@google/generative-ai";

// ...

const safetySettings = [
  {
    category: HarmCategory.HARM_CATEGORY_HARASSMENT,
    threshold: HarmBlockThreshold.BLOCK_ONLY_HIGH,
  },
];

const model = genAI.getGenerativeModel({ model: "MODEL_NAME", safetySettings });

You can also set more than one safety setting:

const safetySettings = [
  {
    category: HarmCategory.HARM_CATEGORY_HARASSMENT,
    threshold: HarmBlockThreshold.BLOCK_ONLY_HIGH,
  },
  {
    category: HarmCategory.HARM_CATEGORY_HATE_SPEECH,
    threshold: HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
  },
];
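
When you set many categories, it can be easier to build the array from a single map. The following sketch uses a hypothetical `buildSafetySettings` helper. The plain strings assume that the SDK's HarmCategory and HarmBlockThreshold enum values match their names; in app code, prefer the imported enums shown above.

```javascript
// Hypothetical helper: builds a safetySettings array from a
// { category: threshold } map, one entry per key in insertion order.
function buildSafetySettings(thresholdByCategory) {
  return Object.entries(thresholdByCategory).map(([category, threshold]) => ({
    category,
    threshold,
  }));
}

// Assumption: these strings equal the enum values of the same name.
const safetySettings = buildSafetySettings({
  HARM_CATEGORY_HARASSMENT: "BLOCK_ONLY_HIGH",
  HARM_CATEGORY_HATE_SPEECH: "BLOCK_MEDIUM_AND_ABOVE",
});

// Usage:
// const model = genAI.getGenerativeModel({ model: "MODEL_NAME", safetySettings });
```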

What's next

  • Prompt design is the process of creating prompts that elicit the desired response from language models. Writing well-structured prompts is an essential part of ensuring accurate, high-quality responses from a language model. Learn about best practices for prompt writing.

  • Gemini offers several model variations to meet the needs of different use cases, such as input types and complexity, implementations for chat or other dialog language tasks, and size constraints. Learn about the available Gemini models.

  • Gemini offers options for requesting rate limit increases. The rate limit for Gemini Pro models is 60 requests per minute (RPM).