Path parameters
- `task_type` (string, Required): The type of the inference task that the model will perform. Value is `sparse_embedding`.
- `elser_inference_id` (string, Required): The unique identifier of the inference endpoint.

Body
- `chunking_settings` (object)
- `service` (string, Required): Value is `elser`.
- `service_settings` (object, Required)
PUT /_inference/{task_type}/{elser_inference_id}
PUT _inference/sparse_embedding/my-elser-model
{
"service": "elser",
"service_settings": {
"num_allocations": 1,
"num_threads": 1
}
}
curl \
  --request PUT 'http://api.example.com/_inference/{task_type}/{elser_inference_id}' \
  --header "Authorization: $API_KEY" \
  --header "Content-Type: application/json" \
  --data '{
    "service": "elser",
    "service_settings": {
      "num_allocations": 1,
      "num_threads": 1
    }
  }'
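The same request can be assembled in Python. This is a minimal sketch; the host, API key, and endpoint name below are placeholder assumptions, not values defined by this API.

```python
import json

# Assumed local Elasticsearch host and placeholder inference endpoint name.
host = "http://localhost:9200"
inference_id = "my-elser-model"

# Path follows PUT /_inference/{task_type}/{elser_inference_id}.
url = f"{host}/_inference/sparse_embedding/{inference_id}"
headers = {
    "Authorization": "ApiKey <your-api-key>",  # placeholder credential
    "Content-Type": "application/json",
}
body = {
    "service": "elser",
    "service_settings": {
        "num_allocations": 1,
        "num_threads": 1,
    },
}
payload = json.dumps(body)
# Send with your HTTP client of choice, e.g.:
# requests.put(url, headers=headers, data=payload)
```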
Request examples
A sparse embedding task
Run `PUT _inference/sparse_embedding/my-elser-model` to create an inference endpoint that performs a `sparse_embedding` task. The request automatically downloads the ELSER model if it is not already present and then deploys it.
{
"service": "elser",
"service_settings": {
"num_allocations": 1,
"num_threads": 1
}
}
Run `PUT _inference/sparse_embedding/my-elser-model` to create an inference endpoint that performs a `sparse_embedding` task with adaptive allocations. When adaptive allocations are enabled, the number of allocations of the model is set automatically based on the current load.
{
"service": "elser",
"service_settings": {
"adaptive_allocations": {
"enabled": true,
"min_number_of_allocations": 3,
"max_number_of_allocations": 10
},
"num_threads": 1
}
}
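Because `min_number_of_allocations` must not exceed `max_number_of_allocations`, it can help to sanity-check an `adaptive_allocations` block before sending it. The helper below is a hypothetical client-side check, not part of the Elasticsearch API; it mirrors the example body above.

```python
def validate_adaptive_allocations(service_settings: dict) -> bool:
    """Client-side sanity check for an adaptive_allocations block (hypothetical helper)."""
    aa = service_settings.get("adaptive_allocations", {})
    if not aa.get("enabled", False):
        return False
    lo = aa.get("min_number_of_allocations")
    hi = aa.get("max_number_of_allocations")
    # Both bounds must be present, positive, and correctly ordered.
    return lo is not None and hi is not None and 0 < lo <= hi

# The service_settings from the adaptive-allocations example above.
service_settings = {
    "adaptive_allocations": {
        "enabled": True,
        "min_number_of_allocations": 3,
        "max_number_of_allocations": 10,
    },
    "num_threads": 1,
}
```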
Response examples (200)
A successful response when creating an ELSER inference endpoint.
{
"inference_id": "my-elser-model",
"task_type": "sparse_embedding",
"service": "elser",
"service_settings": {
"num_allocations": 1,
"num_threads": 1
},
"task_settings": {}
}