Create an inference endpoint | Elasticsearch API documentation

Path parameters

inference_idstring Required
The inference Id

application/json

BodyRequired

chunking_settingsobject
Chunking configuration object
Hide chunking_settings attributes Show chunking_settings attributes object
- max_chunk_sizenumber
  The maximum size of a chunk in words. This value cannot be higher than 300 or lower than 20 (for sentence strategy) or 10 (for word strategy).
- overlapnumber
  The number of overlapping words for chunks. It is applicable only to a word chunking strategy. This value cannot be higher than half the max_chunk_size value.
- sentence_overlapnumber
  The number of overlapping sentences for chunks. It is applicable only for a sentence chunking strategy. It can be either 1 or 0.
- strategystring
  The chunking strategy: sentence or word.
servicestring Required
The service type
service_settingsobject Required
task_settingsobject

Responses

200 application/json
Hide response attributes Show response attributes object
Represents an inference endpoint as returned by the GET API
- chunking_settingsobject
  Chunking configuration object
  Hide chunking_settings attributes Show chunking_settings attributes object
  max_chunk_sizenumber
  The maximum size of a chunk in words. This value cannot be higher than 300 or lower than 20 (for sentence strategy) or 10 (for word strategy).
  overlapnumber
  The number of overlapping words for chunks. It is applicable only to a word chunking strategy. This value cannot be higher than half the max_chunk_size value.
  sentence_overlapnumber
  The number of overlapping sentences for chunks. It is applicable only for a sentence chunking strategy. It can be either 1 or 0.
  strategystring
  The chunking strategy: sentence or word.
- servicestring Required
  The service type
- service_settingsobject Required
- task_settingsobject
- inference_idstring Required
  The inference Id
- task_typestring Required
  Values are sparse_embedding, text_embedding, rerank, completion, or chat_completion.

PUT /_inference/{inference_id}

PUT _inference/rerank/my-rerank-model
{
 "service": "cohere",
 "service_settings": {
   "model_id": "rerank-english-v3.0",
   "api_key": "{{COHERE_API_KEY}}"
 }
}

resp = client.inference.put(
    task_type="rerank",
    inference_id="my-rerank-model",
    inference_config={
        "service": "cohere",
        "service_settings": {
            "model_id": "rerank-english-v3.0",
            "api_key": "{{COHERE_API_KEY}}"
        }
    },
)

const response = await client.inference.put({
  task_type: "rerank",
  inference_id: "my-rerank-model",
  inference_config: {
    service: "cohere",
    service_settings: {
      model_id: "rerank-english-v3.0",
      api_key: "{{COHERE_API_KEY}}",
    },
  },
});

response = client.inference.put(
  task_type: "rerank",
  inference_id: "my-rerank-model",
  body: {
    "service": "cohere",
    "service_settings": {
      "model_id": "rerank-english-v3.0",
      "api_key": "{{COHERE_API_KEY}}"
    }
  }
)

$resp = $client->inference()->put([
    "task_type" => "rerank",
    "inference_id" => "my-rerank-model",
    "body" => [
        "service" => "cohere",
        "service_settings" => [
            "model_id" => "rerank-english-v3.0",
            "api_key" => "{{COHERE_API_KEY}}",
        ],
    ],
]);

curl -X PUT -H "Authorization: ApiKey $ELASTIC_API_KEY" -H "Content-Type: application/json" -d '{"service":"cohere","service_settings":{"model_id":"rerank-english-v3.0","api_key":"{{COHERE_API_KEY}}"}}' "$ELASTICSEARCH_URL/_inference/rerank/my-rerank-model"

Request example

An example body for a `PUT _inference/rerank/my-rerank-model` request.

{
 "service": "cohere",
 "service_settings": {
   "model_id": "rerank-english-v3.0",
   "api_key": "{{COHERE_API_KEY}}"
 }
}