Note
Models for organizations and repositories is in public preview and subject to change.
Models is a workspace that lowers the barrier to enterprise-grade AI adoption. It helps you move beyond isolated experimentation by embedding AI development directly into familiar workflows. Models provides tools to test large language models (LLMs), refine prompts, evaluate outputs, and make informed decisions based on structured metrics. To get started, see Optimizing your AI-powered app with Models.
Models offers a set of features to support prompt iteration, evaluation, and integration for AI development.
- Prompt development: Start AI development directly in a structured editor that supports system instructions, test inputs, and variable configuration.
- Model comparison: Test multiple models side by side with identical prompts and inputs to experiment with different outputs.
- Evaluators: Use scoring metrics such as similarity, relevance, and groundedness to analyze outputs and track performance.
- Prompt configurations: Save prompt, model, and parameter settings as .prompt.yml files in your repository. This enables review, collaboration, and reproducibility.
- Production integration: Use your saved configuration to build AI features or connect through SDKs and the Models REST API.
There are a few ways you can start using Models, depending on your role and needs.
To use the Models API, see Experimenting with AI models using the API.
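If you prefer to call models programmatically, the request follows an OpenAI-style chat completion. The sketch below uses Python with the requests library; the endpoint URL, model identifier, and payload shape are assumptions based on that style, so confirm them against the API reference before relying on them.

```python
import os
import requests

# Assumed endpoint for the Models inference API; verify against the API reference.
ENDPOINT = "https://models.github.ai/inference/chat/completions"

response = requests.post(
    ENDPOINT,
    headers={
        # A GitHub token with access to Models is assumed to be in the environment.
        "Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}",
        "Content-Type": "application/json",
    },
    json={
        "model": "openai/gpt-4o-mini",  # illustrative model identifier
        "messages": [
            {"role": "system", "content": "You are a concise assistant."},
            {"role": "user", "content": "Summarize what Models does in one sentence."},
        ],
        "temperature": 0.5,
    },
    timeout=30,
)
response.raise_for_status()
# Assumes an OpenAI-style response body with a choices array.
print(response.json()["choices"][0]["message"]["content"])
```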
To use Models, create a new repository or open an existing one. In the repository settings, click Models in the sidebar and enable the feature.
To use Models in your organization, an enterprise owner must first enable the feature. Organization owners can then configure which models are allowed.
See Managing your team's model usage.
Manage your prompt configurations stored in the repository. Each prompt is saved as a .prompt.yml file, which defines the model, parameters, and test inputs. From here, you can create, edit, and organize prompts to support experimentation or production use.
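As a rough illustration, a minimal .prompt.yml file might look like the sketch below. The field names and model identifier are assumptions for this example; check the prompt file reference for the exact schema.

```yaml
name: Summarizer
description: Summarizes input text in one sentence
# Model identifier is illustrative.
model: openai/gpt-4o-mini
modelParameters:
  temperature: 0.5
messages:
  - role: system
    content: You are a concise summarizer.
  - role: user
    content: "Summarize the following text: {{input}}"
```

Because the file lives in the repository, changes to a prompt go through the same pull request review as any other change.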
Use the Comparisons view to evaluate the outputs of multiple prompt configurations in a consistent, test-driven workflow. Run tests across rows of input data and view evaluator scores for each configuration, such as similarity, relevance, and groundedness. This view is ideal for refining prompts, validating changes, and avoiding regressions.
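To drive a comparison run, the same prompt file can carry rows of test data and evaluators. The fragment below is a hypothetical sketch of that section; the field names and evaluator structure are assumptions rather than the definitive schema.

```yaml
testData:
  - input: "Models lets you test prompts against multiple models."
    expected: "A workspace for testing prompts across models."
  - input: "Evaluators score model output for quality."
    expected: "Evaluators grade output quality."
evaluators:
  # A simple string check; similarity-style evaluators can also be configured here.
  - name: Output is a single sentence
    string:
      endsWith: "."
```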
Use the Playground to quickly explore models and test prompt ideas in real time. The Playground is ideal for early experimentation, helping you understand a model’s behavior, capabilities, and response style. You can interactively select models, adjust parameters, and compare responses side by side.
To ask questions and share feedback, see this Models discussion post.
To learn how others are using Models, visit the Community discussions for Models.