Path parameters

  • inference_idstring Required

    The inference Id

Query parameters

  • timeoutstring

    Specifies the amount of time to wait for the inference request to complete.

    Values are -1 or 0.

application/json

BodyRequired

  • messagesarray[object] Required

    A list of objects representing the conversation. Requests should generally only add new messages from the user (role user). The other message roles (assistant, system, or tool) should generally only be copied from the response to a previous completion request, such that the messages array is built up throughout a conversation.

    Hide messages attributes Show messages attributes object
    • contentstring | array[object]

      One of:
      Hide attributes Show attributes object
      • textstring Required

        The text content.

      • typestring Required

        The type of content.

    • rolestring Required

      The role of the message author.

    • tool_callsarray[object]

      The tool calls generated by the model.

      Hide tool_calls attributes Show tool_calls attributes object
      • idstring Required
      • functionobject Required
        Hide function attributes Show function attributes object
        • argumentsstring Required

          The arguments to call the function with in JSON format.

        • namestring Required

          The name of the function to call.

      • typestring Required

        The type of the tool call.

  • modelstring

    The ID of the model to use.

  • The upper bound limit for the number of tokens that can be generated for a completion request.

  • stoparray[string]

    A sequence of strings to control when the model should stop generating additional tokens.

  • The sampling temperature to use.

  • tool_choicestring | object

    One of:
  • toolsarray[object]

    A list of tools that the model can call.

    Hide tools attributes Show tools attributes object
    • typestring Required

      The type of tool.

    • functionobject Required
      Hide function attributes Show function attributes object
      • A description of what the function does. This is used by the model to choose when and how to call the function.

      • namestring Required

        The name of the function.

      • The parameters the functional accepts. This should be formatted as a JSON object.

      • strictboolean

        Whether to enable schema adherence when generating the function call.

  • top_pnumber

    Nucleus sampling, an alternative to sampling with temperature.

Responses

  • 200 application/json
POST /_inference/chat_completion/{inference_id}/_stream
curl \
 --request POST 'http://api.example.com/_inference/chat_completion/{inference_id}/_stream' \
 --header "Authorization: $API_KEY" \
 --header "Content-Type: application/json" \
 --data '"{\n  \"model\": \"gpt-4o\",\n  \"messages\": [\n      {\n          \"role\": \"user\",\n          \"content\": \"What is Elastic?\"\n      }\n  ]\n}"'
Run `POST _inference/chat_completion/openai-completion/_stream` to perform a chat completion on the example question with .
{
  "model": "gpt-4o",
  "messages": [
      {
          "role": "user",
          "content": "What is Elastic?"
      }
  ]
}
Run `POST POST _inference/chat_completion/openai-completion/_stream` to perform a chat completion using an Assistant message with `tool_calls`.
{
  "messages": [
      {
          "role": "assistant",
          "content": "Let's find out what the weather is",
          "tool_calls": [ 
              {
                  "id": "call_KcAjWtAww20AihPHphUh46Gd",
                  "type": "function",
                  "function": {
                      "name": "get_current_weather",
                      "arguments": "{\"location\":\"Boston, MA\"}"
                  }
              }
          ]
      },
      { 
          "role": "tool",
          "content": "The weather is cold",
          "tool_call_id": "call_KcAjWtAww20AihPHphUh46Gd"
      }
  ]
}
Run `POST POST _inference/chat_completion/openai-completion/_stream` to perform a chat completion using a User message with `tools` and `tool_choice`.
{
  "messages": [
      {
          "role": "user",
          "content": [
              {
                  "type": "text",
                  "text": "What's the price of a scarf?"
              }
          ]
      }
  ],
  "tools": [
      {
          "type": "function",
          "function": {
              "name": "get_current_price",
              "description": "Get the current price of a item",
              "parameters": {
                  "type": "object",
                  "properties": {
                      "item": {
                          "id": "123"
                      }
                  }
              }
          }
      }
  ],
  "tool_choice": {
      "type": "function",
      "function": {
          "name": "get_current_price"
      }
  }
}
Response examples (200)
A successful response when performing a chat completion task using a User message with `tools` and `tool_choice`.
event: message
data: {"chat_completion":{"id":"chatcmpl-Ae0TWsy2VPnSfBbv5UztnSdYUMFP3","choices":[{"delta":{"content":"","role":"assistant"},"index":0}],"model":"gpt-4o-2024-08-06","object":"chat.completion.chunk"}}

event: message
data: {"chat_completion":{"id":"chatcmpl-Ae0TWsy2VPnSfBbv5UztnSdYUMFP3","choices":[{"delta":{"content":Elastic"},"index":0}],"model":"gpt-4o-2024-08-06","object":"chat.completion.chunk"}}

event: message
data: {"chat_completion":{"id":"chatcmpl-Ae0TWsy2VPnSfBbv5UztnSdYUMFP3","choices":[{"delta":{"content":" is"},"index":0}],"model":"gpt-4o-2024-08-06","object":"chat.completion.chunk"}}

(...)

event: message
data: {"chat_completion":{"id":"chatcmpl-Ae0TWsy2VPnSfBbv5UztnSdYUMFP3","choices":[],"model":"gpt-4o-2024-08-06","object":"chat.completion.chunk","usage":{"completion_tokens":28,"prompt_tokens":16,"total_tokens":44}}} 

event: message
data: [DONE]