Perform chat completion inference | Elasticsearch API documentation

Path parameters

inference_id string Required
The inference Id

Query parameters

timeout string
Specifies the amount of time to wait for the inference request to complete.
Values are -1 or 0.

application/json

Body Required

messages array[object] Required
A list of objects representing the conversation. Requests should generally only add new messages from the user (role user). The other message roles (assistant, system, or tool) should generally only be copied from the response to a previous completion request, such that the messages array is built up throughout a conversation.
An object representing part of the conversation.
- content string | array[object]
  One of: a string, or an array of objects with the attributes below.
  An object-style representation of a single portion of a conversation.
  - text string Required
    The text content.
  - type string Required
    The type of content.
- role string Required
  The role of the message author. Valid values are user, assistant, system, and tool.
- tool_call_id string
- tool_calls array[object]
  Only for assistant role messages. The tool calls generated by the model. If it's specified, the content field is optional. Example:
  { "tool_calls": [ { "id": "call_KcAjWtAww20AihPHphUh46Gd", "type": "function", "function": { "name": "get_current_weather", "arguments": "{\"location\":\"Boston, MA\"}" } } ] }
  A tool call generated by the model.
  - id string Required
  - function object Required
    The function that the model called.
    - arguments string Required
      The arguments to call the function with in JSON format.
    - name string Required
      The name of the function to call.
  - type string Required
    The type of the tool call.
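The way the messages array is built up over a conversation can be sketched as follows. This is a minimal illustration under stated assumptions, not part of any Elasticsearch client: the helper names `add_user_message` and `add_assistant_message` are hypothetical, the assistant text is a placeholder, and the assistant message is assumed to have been assembled already from a previous response.

```python
from typing import Any, Dict, List

Message = Dict[str, Any]


def add_user_message(history: List[Message], text: str) -> List[Message]:
    """Return a new history with a user message appended (hypothetical helper)."""
    return history + [{"role": "user", "content": text}]


def add_assistant_message(history: List[Message], message: Message) -> List[Message]:
    """Return a new history with an assistant message copied from a prior response."""
    if message.get("role") != "assistant":
        raise ValueError("expected a message with role 'assistant'")
    # content is optional only when tool_calls is specified
    if "content" not in message and not message.get("tool_calls"):
        raise ValueError("assistant message needs content or tool_calls")
    return history + [dict(message)]


history: List[Message] = []
history = add_user_message(history, "What is Elastic?")
# Placeholder text standing in for the reply copied from the previous response.
history = add_assistant_message(
    history, {"role": "assistant", "content": "(reply copied from the previous response)"}
)
history = add_user_message(history, "Can you give an example?")

# The accumulated list becomes the `messages` field of the next request body.
request_body = {"messages": history}
```

Returning a new list on each step keeps earlier snapshots of the conversation intact, which can help when a request has to be retried.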
model string
The ID of the model to use.
max_completion_tokens number
The upper limit on the number of tokens that can be generated for a completion request.
stop array[string]
A sequence of strings to control when the model should stop generating additional tokens.
temperature number
The sampling temperature to use.
tool_choice string | object
Controls which tool is called by the model.
One of: a string, or an object with the attributes below.
- type string Required
  The type of the tool.
- function object Required
  The tool choice function.
  - name string Required
    The name of the function to call.
tools array[object]
A list of tools that the model can call. Example:
```
{
  "tools": [
      {
          "type": "function",
          "function": {
              "name": "get_price_of_item",
              "description": "Get the current price of an item",
              "parameters": {
                  "type": "object",
                  "properties": {
                      "item": {
                          "id": "12345"
                      },
                      "unit": {
                          "type": "currency"
                      }
                  }
              }
          }
      }
  ]
}
```
- type string Required
  The type of tool.
- function object Required
  The completion tool function definition.
  - description string
    A description of what the function does. This is used by the model to choose when and how to call the function.
  - name string Required
    The name of the function.
  - parameters object
    The parameters the function accepts. This should be formatted as a JSON object.
  - strict boolean
    Whether to enable schema adherence when generating the function call.
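As a rough sketch of how the `tools` and `tool_choice` fields fit together, the snippet below builds one tool entry and a matching tool choice. The helper names `make_function_tool` and `force_tool` are hypothetical, and the JSON Schema layout of `parameters` is an assumption about what the upstream model provider expects; the reference above only requires a JSON object.

```python
import json
from typing import Any, Dict


def make_function_tool(name: str, description: str, parameters: Dict[str, Any]) -> Dict[str, Any]:
    """Build one entry of the `tools` array (hypothetical helper)."""
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": parameters,
        },
    }


def force_tool(name: str) -> Dict[str, Any]:
    """Build a `tool_choice` object that names a specific function."""
    return {"type": "function", "function": {"name": name}}


# Assumption: the parameters object follows JSON Schema conventions.
price_tool = make_function_tool(
    "get_current_price",
    "Get the current price of an item",
    {
        "type": "object",
        "properties": {"item": {"type": "string"}},
        "required": ["item"],
    },
)

request_body = {
    "messages": [{"role": "user", "content": "What's the price of a scarf?"}],
    "tools": [price_tool],
    # Reuse the tool's own name so tool_choice cannot drift out of sync with tools.
    "tool_choice": force_tool(price_tool["function"]["name"]),
}

print(json.dumps(request_body, indent=2))
```

Deriving the `tool_choice` name from the tool entry avoids a mismatch between the two fields when a function is renamed.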
top_p number
Nucleus sampling, an alternative to sampling with temperature.

Responses

200 application/json

POST /_inference/chat_completion/{inference_id}/_stream

POST _inference/chat_completion/openai-completion/_stream
{
  "model": "gpt-4o",
  "messages": [
      {
          "role": "user",
          "content": "What is Elastic?"
      }
  ]
}

resp = client.inference.chat_completion_unified(
    inference_id="openai-completion",
    chat_completion_request={
        "model": "gpt-4o",
        "messages": [
            {
                "role": "user",
                "content": "What is Elastic?"
            }
        ]
    },
)

const response = await client.inference.chatCompletionUnified({
  inference_id: "openai-completion",
  chat_completion_request: {
    model: "gpt-4o",
    messages: [
      {
        role: "user",
        content: "What is Elastic?",
      },
    ],
  },
});

response = client.inference.chat_completion_unified(
  inference_id: "openai-completion",
  body: {
    "model": "gpt-4o",
    "messages": [
      {
        "role": "user",
        "content": "What is Elastic?"
      }
    ]
  }
)

$resp = $client->inference()->chatCompletionUnified([
    "inference_id" => "openai-completion",
    "body" => [
        "model" => "gpt-4o",
        "messages" => array(
            [
                "role" => "user",
                "content" => "What is Elastic?",
            ],
        ),
    ],
]);

curl -X POST -H "Authorization: ApiKey $ELASTIC_API_KEY" -H "Content-Type: application/json" -d '{"model":"gpt-4o","messages":[{"role":"user","content":"What is Elastic?"}]}' "$ELASTICSEARCH_URL/_inference/chat_completion/openai-completion/_stream"

Request examples

Run `POST _inference/chat_completion/openai-completion/_stream` to perform a chat completion on the example question.

{
  "model": "gpt-4o",
  "messages": [
      {
          "role": "user",
          "content": "What is Elastic?"
      }
  ]
}

Run `POST _inference/chat_completion/openai-completion/_stream` to perform a chat completion using an Assistant message with `tool_calls`.

{
  "messages": [
      {
          "role": "assistant",
          "content": "Let's find out what the weather is",
          "tool_calls": [ 
              {
                  "id": "call_KcAjWtAww20AihPHphUh46Gd",
                  "type": "function",
                  "function": {
                      "name": "get_current_weather",
                      "arguments": "{\"location\":\"Boston, MA\"}"
                  }
              }
          ]
      },
      { 
          "role": "tool",
          "content": "The weather is cold",
          "tool_call_id": "call_KcAjWtAww20AihPHphUh46Gd"
      }
  ]
}
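The pairing of an assistant `tool_calls` entry with its `tool` reply can be sketched as below. This is an illustration only: `get_current_weather`, the `TOOLS` registry, and `run_tool_calls` are hypothetical local names, and the function body is a stub rather than a real weather lookup.

```python
import json
from typing import Any, Callable, Dict, List


def get_current_weather(location: str) -> str:
    # Hypothetical stub; a real implementation would query a weather service.
    return f"The weather in {location} is cold"


# Hypothetical registry mapping function names to local callables.
TOOLS: Dict[str, Callable[..., str]] = {"get_current_weather": get_current_weather}


def run_tool_calls(assistant_message: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Run each tool call and return the matching `tool` role messages."""
    replies = []
    for call in assistant_message.get("tool_calls", []):
        function = call["function"]
        # `arguments` is a JSON-encoded string, so decode it before calling.
        arguments = json.loads(function["arguments"])
        result = TOOLS[function["name"]](**arguments)
        replies.append(
            {"role": "tool", "content": result, "tool_call_id": call["id"]}
        )
    return replies


assistant_message = {
    "role": "assistant",
    "content": "Let's find out what the weather is",
    "tool_calls": [
        {
            "id": "call_KcAjWtAww20AihPHphUh46Gd",
            "type": "function",
            "function": {
                "name": "get_current_weather",
                "arguments": "{\"location\":\"Boston, MA\"}",
            },
        }
    ],
}

# The assistant message is followed by one tool message per tool call.
messages = [assistant_message, *run_tool_calls(assistant_message)]
```

Each reply carries the `id` of the call it answers in `tool_call_id`, mirroring the request example above.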

Run `POST _inference/chat_completion/openai-completion/_stream` to perform a chat completion using a User message with `tools` and `tool_choice`.

{
  "messages": [
      {
          "role": "user",
          "content": [
              {
                  "type": "text",
                  "text": "What's the price of a scarf?"
              }
          ]
      }
  ],
  "tools": [
      {
          "type": "function",
          "function": {
              "name": "get_current_price",
              "description": "Get the current price of an item",
              "parameters": {
                  "type": "object",
                  "properties": {
                      "item": {
                          "id": "123"
                      }
                  }
              }
          }
      }
  ],
  "tool_choice": {
      "type": "function",
      "function": {
          "name": "get_current_price"
      }
  }
}

Response examples (200)

A successful response when performing a chat completion task using a User message with `tools` and `tool_choice`.

event: message
data: {"chat_completion":{"id":"chatcmpl-Ae0TWsy2VPnSfBbv5UztnSdYUMFP3","choices":[{"delta":{"content":"","role":"assistant"},"index":0}],"model":"gpt-4o-2024-08-06","object":"chat.completion.chunk"}}

event: message
data: {"chat_completion":{"id":"chatcmpl-Ae0TWsy2VPnSfBbv5UztnSdYUMFP3","choices":[{"delta":{"content":"Elastic"},"index":0}],"model":"gpt-4o-2024-08-06","object":"chat.completion.chunk"}}

event: message
data: {"chat_completion":{"id":"chatcmpl-Ae0TWsy2VPnSfBbv5UztnSdYUMFP3","choices":[{"delta":{"content":" is"},"index":0}],"model":"gpt-4o-2024-08-06","object":"chat.completion.chunk"}}

(...)

event: message
data: {"chat_completion":{"id":"chatcmpl-Ae0TWsy2VPnSfBbv5UztnSdYUMFP3","choices":[],"model":"gpt-4o-2024-08-06","object":"chat.completion.chunk","usage":{"completion_tokens":28,"prompt_tokens":16,"total_tokens":44}}} 

event: message
data: [DONE]
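A stream shaped like the one above could be reassembled into a single reply roughly as follows. This is a minimal sketch, assuming the lines have already been read from the HTTP response as text; `collect_stream` is a hypothetical helper, and the sample events are shortened copies of the response example with only the fields the function reads.

```python
import json
from typing import Any, Dict, Iterable, Optional, Tuple


def collect_stream(lines: Iterable[str]) -> Tuple[str, Optional[Dict[str, Any]]]:
    """Concatenate delta content from streamed events and capture token usage."""
    text_parts = []
    usage = None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip "event: message" lines and blank separators
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)["chat_completion"]
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                text_parts.append(content)
        # The usage object appears on the final chunk, which has no choices.
        usage = chunk.get("usage", usage)
    return "".join(text_parts), usage


sample = [
    'event: message',
    'data: {"chat_completion":{"choices":[{"delta":{"content":"","role":"assistant"},"index":0}]}}',
    '',
    'event: message',
    'data: {"chat_completion":{"choices":[{"delta":{"content":"Elastic"},"index":0}]}}',
    '',
    'event: message',
    'data: {"chat_completion":{"choices":[{"delta":{"content":" is"},"index":0}]}}',
    '',
    'event: message',
    'data: {"chat_completion":{"choices":[],"usage":{"completion_tokens":28,"prompt_tokens":16,"total_tokens":44}}}',
    '',
    'event: message',
    'data: [DONE]',
]

text, usage = collect_stream(sample)
print(text)   # Elastic is
print(usage)  # {'completion_tokens': 28, 'prompt_tokens': 16, 'total_tokens': 44}
```

The sketch handles only text deltas; streamed tool calls would need their own accumulation logic.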
