Model tuning guide

Prompt design strategies such as few-shot prompting may not always produce the results you need. When instructions aren't sufficient and you have a set of examples that demonstrate the outputs you want, use model tuning to improve a model's performance on specific tasks or to help the model adhere to specific output requirements.

This page provides guidance on tuning the text model behind the PaLM API text service.

How model tuning works

The goal of model tuning is to further improve the performance of the model for your specific task. Model tuning works by providing the model with a training dataset containing many examples of the task. For niche tasks, you can get significant improvements in model performance by tuning the model on a modest number of examples.

Structure your training data as pairs of input and output examples. In MakerSuite, you can also tune models directly from CSV or spreadsheet files. The goal is to teach the model to mimic the desired behavior or task by giving it many examples that illustrate that behavior or task.
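The following is a minimal sketch of turning a CSV file into input/output pairs. It assumes a two-column file with a header row of input,output, which is a hypothetical layout. The text_input and output keys follow the field names used in the tuning quickstarts as we understand them, so verify them against the API reference before you rely on them.

```python
import csv
import io
from typing import Dict, List, TextIO


def load_examples(csv_file: TextIO) -> List[Dict[str, str]]:
    """Read a two-column CSV (input, output) into a list of example pairs.

    Assumes a header row "input,output" (hypothetical convention). The
    "text_input"/"output" keys are assumed to match the tuning API fields.
    """
    examples = []
    for row in csv.DictReader(csv_file):
        text_input = (row.get("input") or "").strip()
        output = (row.get("output") or "").strip()
        # Skip rows missing either side: half a pair can't demonstrate
        # the desired behavior.
        if text_input and output:
            examples.append({"text_input": text_input, "output": output})
    return examples


if __name__ == "__main__":
    data = io.StringIO("input,output\n1,2\n3,4\n,5\n")
    print(load_examples(data))
```

Rows with an empty input or output are dropped rather than passed through, because an incomplete pair adds noise to the tuning data.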

When you run a tuning job, the model learns additional parameters that help it encode the information needed to perform the desired task or learn the desired behavior. These parameters can then be used at inference time. The output of the tuning job is a new model, which is effectively a combination of the newly learned parameters and the original model.

Supported models

The following foundation model supports model tuning:

  • text-bison-001

Workflow for model tuning

The model tuning workflow is as follows:

  1. Prepare your dataset.
  2. Import the dataset if you're using MakerSuite.
  3. Start a tuning job.

After model tuning completes, the name of your tuned model is displayed. You can also select it in MakerSuite as the model to use when creating a new prompt.

Prepare your dataset

Before you can start tuning, you need a dataset to tune the model with. For best performance, the examples in the dataset should be high quality, diverse, and representative of real inputs and outputs.


The examples included in your dataset should match your expected production traffic. If your dataset contains specific formatting, keywords, instructions, or information, the production data should be formatted in the same way and contain the same instructions.

For example, if the examples in your dataset include a "question:" and a "context:", production traffic should also be formatted to include a "question:" and a "context:", in the same order as they appear in the dataset examples. If you exclude the context, the model can't recognize the pattern, even if the exact question appeared in an example in the dataset.

Adding a prompt or preamble to each example in your dataset can also help improve the performance of the tuned model. Note that if you include a prompt or preamble in your dataset, you should also include it in the prompt to the tuned model at inference time.
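One way to keep training data and production traffic consistent is to route both through a single formatting function. Here is a minimal sketch. The preamble text is hypothetical, and the "question:"/"context:" labels mirror the example above. Because the same function builds training inputs and inference prompts, the field order and preamble can't drift apart.

```python
from typing import Dict, Optional

# Hypothetical preamble for illustration; use the instruction your dataset uses.
PREAMBLE = "Answer the question using only the given context."


def format_prompt(question: str, context: str,
                  preamble: Optional[str] = PREAMBLE) -> str:
    """Lay out a prompt in a fixed order: preamble, then question:, then context:."""
    lines = [preamble] if preamble else []
    lines.append(f"question: {question}")
    lines.append(f"context: {context}")
    return "\n".join(lines)


def make_training_example(question: str, context: str, answer: str) -> Dict[str, str]:
    """Build one input/output pair with the same layout as production prompts."""
    return {"text_input": format_prompt(question, context), "output": answer}


if __name__ == "__main__":
    # Training data and inference traffic go through the same formatter.
    example = make_training_example("What color is the sky?", "The sky is blue.", "Blue.")
    prompt = format_prompt("What color is grass?", "Grass is green.")
    print(example["text_input"])
    print(prompt)
```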

Size recommendation

You can tune a model with a relatively small dataset. You should start to see better results with 100 or more examples. The following table shows the recommended dataset size for tuning the text model for various common tasks:

Task               Recommended number of examples
Classification     100+
Summarization      100-500+
Document search    100+
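A short pre-flight check can catch undersized datasets before you start a job. This is a minimal sketch. The task keys are illustrative labels that mirror the table. Because the summarization entry is listed as "100-500+", 100 is treated as its minimum.

```python
# Lower bounds taken from the recommended dataset size table.
# Summarization is listed as "100-500+", so 100 is used as the minimum.
RECOMMENDED_MIN_EXAMPLES = {
    "classification": 100,
    "summarization": 100,
    "document search": 100,
}


def meets_size_recommendation(task: str, num_examples: int) -> bool:
    """Return True if num_examples reaches the recommended minimum for task."""
    key = task.strip().lower()
    if key not in RECOMMENDED_MIN_EXAMPLES:
        raise ValueError(f"No size recommendation listed for task {task!r}")
    return num_examples >= RECOMMENDED_MIN_EXAMPLES[key]


if __name__ == "__main__":
    print(meets_size_recommendation("Classification", 120))  # True
    print(meets_size_recommendation("Summarization", 80))    # False
```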

Upload tuning dataset

You can pass data inline using the API, or through files uploaded in MakerSuite. Files uploaded to MakerSuite are deleted once tuning is complete.

In MakerSuite, use the Import button to import data from a file, or choose a data prompt with examples to import as your tuning dataset.

Client library

To use the client library, provide the data file in the createTunedModel call. The file size limit is 4 MB. See the tuning quickstart with Python to get started.

REST API

To call the REST API using curl, provide training examples in JSON format to the training_data argument. See the tuning quickstart with curl to get started.
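The sketch below assembles a request body that curl could send, and measures its serialized size. This is a minimal sketch with several assumptions. The nested field names (tuning_task, hyperparameters, training_data, examples) reflect our understanding of the REST tuning quickstart and should be confirmed against the API reference. The "models/text-bison-001" name assumes the usual models/ prefix. The 4 MB figure above is stated for the client library's data file, so applying it to an inline REST body is also an assumption.

```python
import json
from typing import Dict, List

# Assumes "4 MB" means 4 MiB; the stated limit is for the client library's
# data file, so applying it to an inline REST body is itself an assumption.
MAX_BYTES = 4 * 1024 * 1024


def build_tuning_request(examples: List[Dict[str, str]], display_name: str) -> Dict:
    """Assemble a JSON-ready request body for a tuning job.

    Field names are assumptions based on the REST tuning quickstart.
    Hyperparameters use the default values listed in this guide.
    """
    return {
        "display_name": display_name,
        "base_model": "models/text-bison-001",
        "tuning_task": {
            "hyperparameters": {
                "epoch_count": 10,
                "batch_size": 16,
                "learning_rate": 0.02,
            },
            "training_data": {"examples": {"examples": examples}},
        },
    }


def request_size_bytes(body: Dict) -> int:
    """Size of the body once serialized as UTF-8 JSON."""
    return len(json.dumps(body).encode("utf-8"))


if __name__ == "__main__":
    pairs = [{"text_input": str(i), "output": str(i + 1)} for i in range(1, 101)]
    body = build_tuning_request(pairs, "number generator")
    size = request_size_bytes(body)
    if size > MAX_BYTES:
        raise ValueError(f"Request is {size} bytes, over the {MAX_BYTES}-byte limit")
    print(f"Request body is {size} bytes")
```

To send the body, you could write it to a file with json.dump and pass it to curl with --data @filename.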

Create a model tuning job

When creating a tuning job, you can specify the following hyperparameters:

  • Epochs - The number of full training passes over the entire training set. In one epoch, each example is processed once.
  • Batch size - The number of examples used in one training iteration.
  • Learning rate - A floating-point number that tells the algorithm how strongly to adjust the model parameters on each iteration. For example, a learning rate of 0.3 would adjust weights and biases three times more powerfully than a learning rate of 0.1. High and low learning rates each have their own trade-offs, so adjust the learning rate based on your use case.
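The arithmetic behind these three hyperparameters can be illustrated in a few lines. This is a minimal sketch. It assumes a final partial batch still counts as an iteration. It also uses plain gradient descent to show how the learning rate scales each update. This guide doesn't say which optimizer the tuning service uses, so treat the update rule as an illustration only.

```python
import math


def iterations_per_epoch(num_examples: int, batch_size: int) -> int:
    """Batches needed to see every example once (last batch may be partial)."""
    return math.ceil(num_examples / batch_size)


def total_iterations(num_examples: int, batch_size: int, epochs: int) -> int:
    """Training iterations across all epochs."""
    return epochs * iterations_per_epoch(num_examples, batch_size)


def gradient_step(weight: float, gradient: float, learning_rate: float) -> float:
    """One plain gradient-descent update: move against the gradient, scaled by the rate."""
    return weight - learning_rate * gradient


if __name__ == "__main__":
    # 100 examples at batch size 16: 7 iterations per epoch, 70 over 10 epochs.
    print(total_iterations(100, 16, 10))  # 70

    # Same weight and gradient, two learning rates: 0.3 moves the weight
    # three times as far as 0.1 (up to floating-point rounding).
    w, g = 1.0, 0.5
    small = abs(gradient_step(w, g, 0.1) - w)
    large = abs(gradient_step(w, g, 0.3) - w)
    print(large / small)
```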

The following table shows the recommended configurations for tuning a foundation model:

Hyperparameter   Default value   Recommended adjustments
Epochs           10              If the loss starts to plateau before 10 epochs, use a smaller value.
                                 If the loss is converging and doesn't seem to plateau, use a higher value.
Batch size       16
Learning rate    2e-2            Use a smaller value for smaller datasets.

The loss curve shows how much the model's predictions deviate from the ideal predictions in the training examples after each epoch. Ideally, you want to stop training at the lowest point in the curve, right before it plateaus. For example, the graph below shows the loss curve plateauing at about epochs 4-6, which means you can set the Epochs parameter to 4 and still get the same performance.

[Figure: Loss curve]
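If you have the per-epoch loss values, you can estimate where the curve flattens programmatically. This is a minimal sketch. The improvement threshold is an arbitrary value chosen for illustration, and the function isn't part of the PaLM API. A single noisy epoch can trigger it early, so check the result against the plotted curve.

```python
from typing import Sequence


def plateau_epoch(losses: Sequence[float], min_improvement: float = 0.01) -> int:
    """Return the 1-based epoch after which the loss stops improving meaningfully.

    losses[i] is the loss after epoch i + 1. The curve counts as plateaued at
    the first epoch whose improvement to the next epoch is below
    min_improvement (an illustrative threshold, not an API setting).
    """
    if not losses:
        raise ValueError("losses must not be empty")
    for i in range(len(losses) - 1):
        if losses[i] - losses[i + 1] < min_improvement:
            return i + 1
    # Still improving at the last epoch: no plateau yet, so more epochs may help.
    return len(losses)
```

If the function returns the final epoch, the loss was still falling when training ended. Per the table above, that suggests trying more epochs. An earlier result suggests a smaller Epochs value.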

Check tuning job status

You can check the status of your tuning job in the MakerSuite UI under the My Library tab, or by using the metadata property of the tuned model in the PaLM API.
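When you check status from code, a polling loop with a timeout keeps a script from waiting forever. This is a minimal sketch. The get_state callback is hypothetical, and in practice it would read the tuned model's status through the API or client library. The state names ACTIVE and FAILED are assumed from the PaLM API TunedModel resource, so confirm them against the API reference.

```python
import time
from typing import Callable

# State names assumed from the PaLM API TunedModel resource; verify before use.
TERMINAL_STATES = {"ACTIVE", "FAILED"}


def wait_for_tuning(get_state: Callable[[], str],
                    poll_seconds: float = 30.0,
                    timeout_seconds: float = 3600.0,
                    sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll a (hypothetical) status callback until a terminal state is reached.

    Returns the terminal state, or raises TimeoutError once timeout_seconds
    of waiting have elapsed without reaching one.
    """
    waited = 0.0
    while True:
        state = get_state()
        if state in TERMINAL_STATES:
            return state
        if waited >= timeout_seconds:
            raise TimeoutError(f"Tuning still in state {state!r} after {waited} s")
        sleep(poll_seconds)
        waited += poll_seconds
```

The sleep function is a parameter so that tests can pass a no-op instead of actually waiting.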

Troubleshoot errors

This section includes tips on how to resolve errors you may encounter while creating your tuned model.

Authentication

Tuning using the API and client library requires user authentication. An API key alone is not sufficient. If you see a 'PermissionDenied: 403 Request had insufficient authentication scopes' error, you need to set up user authentication.

To configure test OAuth credentials for Python, refer to this tutorial.

Canceled models

You can cancel a model tuning job any time before the job is finished. However, the inference performance of a canceled model is unpredictable, particularly if the tuning job is canceled early in training. If you canceled because you wanted to stop training at an earlier epoch, create a new tuning job and set the epoch count to a lower value.

What's next