REST Resource: corpora.documents.chunks

Resource: Chunk

A Chunk is a subpart of a Document that is treated as an independent unit for the purposes of vector representation and storage. A Corpus can have a maximum of 1 million Chunks.

JSON representation
{
  "name": string,
  "data": {
    object (ChunkData)
  },
  "customMetadata": [
    {
      object (CustomMetadata)
    }
  ],
  "createTime": string,
  "updateTime": string,
  "state": enum (State)
}
Fields
name

string

Immutable. Identifier. The Chunk resource name. The ID (name excluding the "corpora/*/documents/*/chunks/" prefix) can contain up to 40 characters that are lowercase alphanumeric or dashes (-). The ID cannot start or end with a dash. If the name is empty on create, a random 12-character unique ID will be generated. Example: corpora/{corpus_id}/documents/{document_id}/chunks/123a456b789c

data

object (ChunkData)

Required. The content for the Chunk, such as the text string. The maximum number of tokens per chunk is 2043.

customMetadata[]

object (CustomMetadata)

Optional. User provided custom metadata stored as key-value pairs. The maximum number of CustomMetadata per chunk is 20.

createTime

string (Timestamp format)

Output only. The time at which the Chunk was created.

A timestamp in RFC3339 UTC "Zulu" format, with nanosecond resolution and up to nine fractional digits. Examples: "2014-10-02T15:01:23Z" and "2014-10-02T15:01:23.045123456Z".

updateTime

string (Timestamp format)

Output only. The time at which the Chunk was last updated.

A timestamp in RFC3339 UTC "Zulu" format, with nanosecond resolution and up to nine fractional digits. Examples: "2014-10-02T15:01:23Z" and "2014-10-02T15:01:23.045123456Z".

state

enum (State)

Output only. Current state of the Chunk.
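The naming rules in the name field can be checked on the client before a request is sent. The following is a minimal sketch, not part of the API: the helper names is_valid_chunk_id and chunk_id_from_name are hypothetical, and the regular expressions are one reading of the rules stated above (up to 40 characters, lowercase alphanumeric or dashes, no leading or trailing dash).

```python
import re

# One reading of the ID rules from the `name` field description:
# 1 to 40 characters, lowercase letters, digits, or dashes,
# and the first and last characters must not be dashes.
_CHUNK_ID_RE = re.compile(r"[a-z0-9](?:[a-z0-9-]{0,38}[a-z0-9])?")

# Resource name pattern: corpora/*/documents/*/chunks/*
_CHUNK_NAME_RE = re.compile(
    r"corpora/(?P<corpus>[^/]+)/documents/(?P<document>[^/]+)/chunks/(?P<chunk>[^/]+)"
)


def is_valid_chunk_id(chunk_id: str) -> bool:
    """Return True if chunk_id satisfies the documented ID rules."""
    return _CHUNK_ID_RE.fullmatch(chunk_id) is not None


def chunk_id_from_name(name: str) -> str:
    """Extract and validate the ID part of a full Chunk resource name."""
    match = _CHUNK_NAME_RE.fullmatch(name)
    if match is None:
        raise ValueError(f"not a Chunk resource name: {name!r}")
    chunk_id = match.group("chunk")
    if not is_valid_chunk_id(chunk_id):
        raise ValueError(f"invalid Chunk ID: {chunk_id!r}")
    return chunk_id
```

The sketch validates only the Chunk ID. Rules for the corpus and document segments are not described in this section, so those segments are accepted as-is.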

ChunkData

Extracted data that represents the Chunk content.

JSON representation
{

  // Union field data can be only one of the following:
  "stringValue": string
  // End of list of possible types for union field data.
}
Fields

Union field data.

data can be only one of the following:

stringValue

string

The Chunk content as a string. The maximum number of tokens per chunk is 2043.
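Putting the Chunk and ChunkData shapes together, a request body for creating a Chunk might be assembled as sketched below. The function name build_chunk_payload is hypothetical. The sketch sets only the writable fields (name, data, customMetadata), omits the output-only fields, and enforces the limit of 20 CustomMetadata entries. It does not check the 2043-token limit, because this section does not say how tokens are counted, and it treats each CustomMetadata entry as an opaque dictionary because that object is not defined here.

```python
from typing import Any, Dict, List, Optional

# Per-chunk limit stated in the customMetadata field description.
MAX_CUSTOM_METADATA = 20


def build_chunk_payload(
    text: str,
    custom_metadata: Optional[List[Dict[str, Any]]] = None,
    name: Optional[str] = None,
) -> Dict[str, Any]:
    """Build a JSON-serializable Chunk body from writable fields only."""
    if not text:
        raise ValueError("data is required: the chunk text must not be empty")

    metadata = list(custom_metadata or [])
    if len(metadata) > MAX_CUSTOM_METADATA:
        raise ValueError(
            f"at most {MAX_CUSTOM_METADATA} CustomMetadata entries are allowed, "
            f"got {len(metadata)}"
        )

    # ChunkData is a union; stringValue is the only member listed in this section.
    payload: Dict[str, Any] = {"data": {"stringValue": text}}
    if metadata:
        payload["customMetadata"] = metadata
    if name:
        # When name is omitted, the service generates a random ID on create.
        payload["name"] = name
    # createTime, updateTime, and state are output only, so they are never set here.
    return payload
```

For example, build_chunk_payload("Some passage of text") returns a body containing only the data field, leaving the name to be generated on create.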

State

States for the lifecycle of a Chunk.

Enums
STATE_UNSPECIFIED

The default value. This value is used if the state is omitted.

STATE_PENDING_PROCESSING

Chunk is being processed (embedding and vector storage).

STATE_ACTIVE

Chunk is processed and available for querying.

STATE_FAILED

Chunk failed processing.
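The output-only fields of a returned Chunk (state, createTime, updateTime) can be interpreted on the client as sketched below. The names ChunkState, parse_timestamp, and is_queryable are hypothetical. One assumption to note: Python's datetime stores microseconds only, so this sketch truncates fractional seconds beyond six digits rather than preserving the full nanosecond resolution.

```python
import re
from datetime import datetime, timezone
from enum import Enum
from typing import Any, Dict


class ChunkState(Enum):
    """Mirror of the State enum values listed above."""
    STATE_UNSPECIFIED = "STATE_UNSPECIFIED"
    STATE_PENDING_PROCESSING = "STATE_PENDING_PROCESSING"
    STATE_ACTIVE = "STATE_ACTIVE"
    STATE_FAILED = "STATE_FAILED"


# RFC 3339 UTC "Zulu" timestamp with zero to nine fractional digits.
_TIMESTAMP_RE = re.compile(
    r"(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(?:\.(\d{1,9}))?Z"
)


def parse_timestamp(value: str) -> datetime:
    """Parse a Timestamp string into an aware UTC datetime (microsecond precision)."""
    match = _TIMESTAMP_RE.fullmatch(value)
    if match is None:
        raise ValueError(f"not an RFC 3339 Zulu timestamp: {value!r}")
    base, fraction = match.groups()
    parsed = datetime.strptime(base, "%Y-%m-%dT%H:%M:%S").replace(tzinfo=timezone.utc)
    if fraction:
        # Pad to nanoseconds, then keep the first six digits (microseconds).
        parsed = parsed.replace(microsecond=int(fraction.ljust(9, "0")[:6]))
    return parsed


def is_queryable(chunk: Dict[str, Any]) -> bool:
    """Return True only when the Chunk is in STATE_ACTIVE."""
    # STATE_UNSPECIFIED is the documented value when state is omitted.
    state = ChunkState(chunk.get("state", "STATE_UNSPECIFIED"))
    return state is ChunkState.STATE_ACTIVE
```

A caller that has just created a Chunk could check is_queryable on the returned resource before issuing queries, since a Chunk in STATE_PENDING_PROCESSING is still being embedded and stored.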

Methods

batchCreate

Creates Chunks in a batch.

batchDelete

Deletes Chunks in a batch.

batchUpdate

Updates Chunks in a batch.

create

Creates a Chunk.

delete

Deletes a Chunk.

get

Gets information about a specific Chunk.

list

Lists all Chunks in a Document.

patch

Updates a Chunk.