Create a VoyageAI inference endpoint | Elasticsearch API documentation

Path parameters

task_type string Required
The type of the inference task that the model will perform.
Values are text_embedding or rerank.
voyageai_inference_id string Required
The unique identifier of the inference endpoint.

application/json

Body

chunking_settings object
The chunking configuration object.
- max_chunk_size number
  The maximum size of a chunk, in words. The value cannot be higher than 300, and cannot be lower than 20 (for the sentence strategy) or 10 (for the word strategy).
- overlap number
  The number of overlapping words between chunks. Applies only to the word chunking strategy. The value cannot be higher than half of max_chunk_size.
- sentence_overlap number
  The number of overlapping sentences between chunks. Applies only to the sentence chunking strategy. The value can be either 1 or 0.
- strategy string
  The chunking strategy: sentence or word.
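
The constraints on chunking_settings can be checked on the client before sending a request. The following is a minimal sketch in Python, not part of the Elasticsearch API: the function name `validate_chunking_settings` is hypothetical, and the sketch assumes the strategy is always given explicitly, since the text above does not state a default.

```python
def validate_chunking_settings(settings: dict) -> list:
    """Return a list of problems; an empty list means every check passed."""
    strategy = settings.get("strategy")
    if strategy not in ("sentence", "word"):
        # Assumption of this sketch: the strategy must be stated explicitly.
        return [f"strategy must be 'sentence' or 'word', got {strategy!r}"]

    problems = []
    size = settings.get("max_chunk_size")
    min_size = 20 if strategy == "sentence" else 10
    if size is not None and not (min_size <= size <= 300):
        problems.append(
            f"max_chunk_size must be between {min_size} and 300 "
            f"for the {strategy} strategy"
        )

    overlap = settings.get("overlap")
    if overlap is not None:
        if strategy != "word":
            problems.append("overlap applies only to the word strategy")
        elif size is not None and overlap > size / 2:
            problems.append("overlap cannot exceed half of max_chunk_size")

    sentence_overlap = settings.get("sentence_overlap")
    if sentence_overlap is not None:
        if strategy != "sentence":
            problems.append("sentence_overlap applies only to the sentence strategy")
        elif sentence_overlap not in (0, 1):
            problems.append("sentence_overlap must be 0 or 1")

    return problems
```

The function collects all problems instead of raising on the first one, so a caller can report every issue at once. The service itself remains the authority on what it accepts.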
service string Required
Value is voyageai.
service_settings object Required
- dimensions number
  The number of dimensions for resulting output embeddings. This setting maps to output_dimension in the VoyageAI documentation. Only for the text_embedding task type.
- model_id string Required
  The name of the model to use for the inference task. Refer to the VoyageAI documentation for the list of available text embedding and rerank models.
- rate_limit object
  This setting helps to minimize the number of rate limit errors returned from the service.
  requests_per_minute number
  The number of requests allowed per minute. By default, the number of requests allowed per minute is set by each service as follows:
  alibabacloud-ai-search service: 1000
  anthropic service: 50
  azureaistudio service: 240
  azureopenai service and task type text_embedding: 1440
  azureopenai service and task type completion: 120
  cohere service: 10000
  elastic service and task type chat_completion: 240
  googleaistudio service: 360
  googlevertexai service: 30000
  hugging_face service: 3000
  jinaai service: 2000
  mistral service: 240
  openai service and task type text_embedding: 3000
  openai service and task type completion: 500
  voyageai service: 2000
  watsonxai service: 120
- embedding_type string
  The data type for the embeddings to be returned. This setting maps to output_dtype in the VoyageAI documentation. Permitted values: float, int8, bit. int8 is a synonym of byte in the VoyageAI documentation. bit is a synonym of binary in the VoyageAI documentation. Only for the text_embedding task type.
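
As an illustration of how the service_settings attributes fit together, the sketch below assembles the object in Python and rejects the combinations the descriptions above rule out. The helper name `build_service_settings` and its keyword arguments are hypothetical; only the attribute names and permitted values come from this page.

```python
ALLOWED_EMBEDDING_TYPES = ("float", "int8", "bit")


def build_service_settings(task_type, model_id, dimensions=None,
                           embedding_type=None, requests_per_minute=None):
    """Assemble a service_settings dict, leaving out anything not supplied."""
    if task_type not in ("text_embedding", "rerank"):
        raise ValueError(f"unsupported task type: {task_type!r}")

    settings = {"model_id": model_id}

    # dimensions and embedding_type are documented for text_embedding only.
    embedding_only = {"dimensions": dimensions, "embedding_type": embedding_type}
    for name, value in embedding_only.items():
        if value is None:
            continue
        if task_type != "text_embedding":
            raise ValueError(f"{name} applies only to the text_embedding task type")
        settings[name] = value

    if embedding_type is not None and embedding_type not in ALLOWED_EMBEDDING_TYPES:
        raise ValueError(f"embedding_type must be one of {ALLOWED_EMBEDDING_TYPES}")

    if requests_per_minute is not None:
        # Overrides the service default listed above (2000 for voyageai).
        settings["rate_limit"] = {"requests_per_minute": requests_per_minute}

    return settings
```

For example, `build_service_settings("rerank", "rerank-2")` yields only the model_id, while passing `dimensions=512` together with the rerank task type raises a `ValueError`.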
task_settings object
- input_type string
  Type of the input text. Permitted values: ingest (maps to document in the VoyageAI documentation), search (maps to query in the VoyageAI documentation). Only for the text_embedding task type.
- return_documents boolean
  Whether to return the source documents in the response. Only for the rerank task type.
- top_k number
  The number of most relevant documents to return. If not specified, the reranking results of all documents will be returned. Only for the rerank task type.
- truncation boolean
  Whether to truncate the input texts to fit within the context length.
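
The task_settings attributes differ by task type, which is easy to get wrong when the same code configures both kinds of endpoint. The following Python sketch flags settings that do not belong to the chosen task type and records the input_type mapping described above. The names `TASK_SPECIFIC`, `INPUT_TYPE_TO_VOYAGEAI`, and `misplaced_task_settings` are hypothetical; truncation carries no task type restriction in the text, so the sketch accepts it for both.

```python
# Elasticsearch input_type value -> term used in the VoyageAI documentation.
INPUT_TYPE_TO_VOYAGEAI = {"ingest": "document", "search": "query"}

# Settings documented for a single task type only.
TASK_SPECIFIC = {
    "input_type": "text_embedding",
    "return_documents": "rerank",
    "top_k": "rerank",
}


def misplaced_task_settings(task_type, task_settings):
    """Return the sorted names of settings that do not apply to task_type."""
    return sorted(
        name
        for name in task_settings
        # Settings without a restriction (such as truncation) always pass.
        if TASK_SPECIFIC.get(name, task_type) != task_type
    )
```

Note that the sketch does not reject unknown setting names; it only checks the task type restrictions stated on this page.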

Run `PUT _inference/text_embedding/voyageai-embeddings` to create an inference endpoint that performs a `text_embedding` task. The embeddings created by requests to this endpoint will have 512 dimensions.

{
    "service": "voyageai",
    "service_settings": {
        "model_id": "voyage-3-large",
        "dimensions": 512
    }
}
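
To show how the path parameters and the body combine into one request, here is a minimal Python sketch that builds the PUT request with the standard library. It only constructs the request and does not send it. The function name `build_create_request` and the base URL `http://localhost:9200` are assumptions for illustration; a real cluster will typically also require authentication, which is not covered here.

```python
import json
import urllib.request


def build_create_request(base_url, task_type, inference_id, body):
    """Build (but do not send) the PUT request that creates the endpoint."""
    url = f"{base_url.rstrip('/')}/_inference/{task_type}/{inference_id}"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


request = build_create_request(
    "http://localhost:9200",  # assumed local cluster address
    "text_embedding",
    "voyageai-embeddings",
    {
        "service": "voyageai",
        "service_settings": {"model_id": "voyage-3-large", "dimensions": 512},
    },
)
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would send it. That step is left out so the sketch runs without a cluster.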

Run `PUT _inference/rerank/voyageai-rerank` to create an inference endpoint that performs a `rerank` task.

{
    "service": "voyageai",
    "service_settings": {
        "model_id": "rerank-2"
    }
}
