Path parameters

  • task_typestring Required

    The type of the inference task that the model will perform.

    Values are completion, rerank, space_embedding, or text_embedding.

  • The unique identifier of the inference endpoint.

application/json

Body

  • Hide chunking_settings attributes Show chunking_settings attributes object
    • The maximum size of a chunk in words. This value cannot be higher than 300 or lower than 20 (for sentence strategy) or 10 (for word strategy).

    • overlapnumber

      The number of overlapping words for chunks. It is applicable only to a word chunking strategy. This value cannot be higher than half the max_chunk_size value.

    • The number of overlapping sentences for chunks. It is applicable only for a sentence chunking strategy. It can be either 1 or 0.

    • strategystring

      The chunking strategy: sentence or word.

  • servicestring Required

    Value is alibabacloud-ai-search.

  • service_settingsobject Required
    Hide service_settings attributes Show service_settings attributes object
    • api_keystring Required

      A valid API key for the AlibabaCloud AI Search API.

    • hoststring Required

      The name of the host address used for the inference task. You can find the host address in the API keys section of the documentation.

      External documentation
    • Hide rate_limit attribute Show rate_limit attribute object
    • service_idstring Required

      The name of the model service to use for the inference task. The following service IDs are available for the completion task:

      • ops-qwen-turbo
      • qwen-turbo
      • qwen-plus
      • qwen-max ÷ qwen-max-longcontext

      The following service ID is available for the rerank task:

      • ops-bge-reranker-larger

      The following service ID is available for the sparse_embedding task:

      • ops-text-sparse-embedding-001

      The following service IDs are available for the text_embedding task:

      ops-text-embedding-001 ops-text-embedding-zh-001 ops-text-embedding-en-001 ops-text-embedding-002

    • workspacestring Required

      The name of the workspace used for the inference task.

  • Hide task_settings attributes Show task_settings attributes object
    • For a sparse_embedding or text_embedding task, specify the type of input passed to the model. Valid values are:

      • ingest for storing document embeddings in a vector database.
      • search for storing embeddings of search queries run against a vector database to find relevant documents.
    • For a sparse_embedding task, it affects whether the token name will be returned in the response. It defaults to false, which means only the token ID will be returned in the response.

Responses

  • 200 application/json
    Hide response attributes Show response attributes object
    • Hide chunking_settings attributes Show chunking_settings attributes object
      • The maximum size of a chunk in words. This value cannot be higher than 300 or lower than 20 (for sentence strategy) or 10 (for word strategy).

      • overlapnumber

        The number of overlapping words for chunks. It is applicable only to a word chunking strategy. This value cannot be higher than half the max_chunk_size value.

      • The number of overlapping sentences for chunks. It is applicable only for a sentence chunking strategy. It can be either 1 or 0.

      • strategystring

        The chunking strategy: sentence or word.

    • servicestring Required

      The service type

    • service_settingsobject Required
    • inference_idstring Required

      The inference Id

    • task_typestring Required

      Values are text_embedding, rerank, completion, or sparse_embedding.

PUT /_inference/{task_type}/{alibabacloud_inference_id}
PUT _inference/completion/alibabacloud_ai_search_completion
{
    "service": "alibabacloud-ai-search",
    "service_settings": {
        "host" : "default-j01.platform-cn-shanghai.opensearch.aliyuncs.com",
        "api_key": "AlibabaCloud-API-Key",
        "service_id": "ops-qwen-turbo",
        "workspace" : "default"
    }
}
curl \
 --request PUT 'http://api.example.com/_inference/{task_type}/{alibabacloud_inference_id}' \
 --header "Authorization: $API_KEY" \
 --header "Content-Type: application/json" \
 --data '"{\n    \"service\": \"alibabacloud-ai-search\",\n    \"service_settings\": {\n        \"host\" : \"default-j01.platform-cn-shanghai.opensearch.aliyuncs.com\",\n        \"api_key\": \"AlibabaCloud-API-Key\",\n        \"service_id\": \"ops-qwen-turbo\",\n        \"workspace\" : \"default\"\n    }\n}"'
Run `PUT _inference/completion/alibabacloud_ai_search_completion` to create an inference endpoint that performs a completion task.
{
    "service": "alibabacloud-ai-search",
    "service_settings": {
        "host" : "default-j01.platform-cn-shanghai.opensearch.aliyuncs.com",
        "api_key": "AlibabaCloud-API-Key",
        "service_id": "ops-qwen-turbo",
        "workspace" : "default"
    }
}
Run `PUT _inference/rerank/alibabacloud_ai_search_rerank` to create an inference endpoint that performs a rerank task.
{
    "service": "alibabacloud-ai-search",
    "service_settings": {
        "api_key": "AlibabaCloud-API-Key",
        "service_id": "ops-bge-reranker-larger",
        "host": "default-j01.platform-cn-shanghai.opensearch.aliyuncs.com",
        "workspace": "default"
    }
}
Run `PUT _inference/sparse_embedding/alibabacloud_ai_search_sparse` to create an inference endpoint that performs perform a sparse embedding task.
{
    "service": "alibabacloud-ai-search",
    "service_settings": {
        "api_key": "AlibabaCloud-API-Key",
        "service_id": "ops-text-sparse-embedding-001",
        "host": "default-j01.platform-cn-shanghai.opensearch.aliyuncs.com",
        "workspace": "default"
    }
}
Run `PUT _inference/text_embedding/alibabacloud_ai_search_embeddings` to create an inference endpoint that performs a text embedding task.
{
    "service": "alibabacloud-ai-search",
    "service_settings": {
        "api_key": "AlibabaCloud-API-Key",
        "service_id": "ops-text-embedding-001",
        "host": "default-j01.platform-cn-shanghai.opensearch.aliyuncs.com",
        "workspace": "default"
    }
}