LLM
Welcome to the LLM documentation. This section provides detailed information about the intelligent AI gateway for language models.
Overview
LLM is a gateway that routes AI requests to language models according to their capabilities, cost, and performance. It lets you use the most suitable model for each task without being locked into a single provider.
Intelligent AI Gateway
The LLM gateway routes each request to the most appropriate model, based on what the task requires and what each model supports. This allows you to:
- Access multiple AI models through a single, unified interface
- Automatically select the best model for each specific task
- Optimize for cost, performance, or capabilities based on your requirements
- Avoid vendor lock-in by abstracting away model-specific implementation details
Model Selection
The gateway can select models based on various criteria:
- Capabilities: Choose models that support specific features like code generation, reasoning, or vision
- Cost: Optimize for the lowest cost model that meets your requirements
- Performance: Select models based on latency, throughput, or other performance metrics
- Quality: Use models that provide the highest quality results for your specific use case
This intelligent routing happens behind the scenes, allowing you to focus on your application logic rather than model selection and integration.
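To make the capability and cost criteria concrete, here is a minimal client-side sketch of "cheapest model that has every required capability". It is only an illustration: the gateway's actual routing algorithm is not documented on this page. It assumes model records shaped like the `GET /models/:id` response (with `capabilities` and `pricing`), treats pricing as USD per token (an assumption), and uses a hypothetical `example/small-model` entry for comparison.

```python
from typing import Optional


def pick_cheapest(models: list[dict], required: set[str]) -> Optional[dict]:
    """Return the cheapest model whose capabilities include everything in `required`.

    Hypothetical illustration of capability + cost routing; not the gateway's real logic.
    """
    candidates = [
        m for m in models
        if required <= set(m.get("capabilities", [])) and "pricing" in m
    ]
    if not candidates:
        return None
    # Rank by combined input + output price per token (assumed to be USD per token).
    return min(candidates, key=lambda m: m["pricing"]["input"] + m["pricing"]["output"])


models = [
    {"id": "openai/gpt-4", "capabilities": ["chat", "code", "reasoning"],
     "pricing": {"input": 0.00003, "output": 0.00006}},
    {"id": "example/small-model", "capabilities": ["chat"],  # hypothetical model
     "pricing": {"input": 0.000001, "output": 0.000002}},
]

print(pick_cheapest(models, {"chat"})["id"])          # example/small-model
print(pick_cheapest(models, {"chat", "code"})["id"])  # openai/gpt-4
```

Summing input and output prices is a deliberately crude ranking; a real router would weight them by the expected prompt and completion lengths.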
Endpoints
Generate Text
POST /generate
Generates text based on the provided prompt.
Request Body
{
  "prompt": "Write a short story about a robot learning to paint.",
  "model": "openai/gpt-4",
  "maxTokens": 500,
  "temperature": 0.7,
  "topP": 0.9,
  "frequencyPenalty": 0,
  "presencePenalty": 0
}
Response
{
  "id": "generation_123",
  "text": "In a small studio apartment overlooking the city, Robot Unit 7 stood motionless before a blank canvas...",
  "model": "openai/gpt-4",
  "promptTokens": 10,
  "completionTokens": 487,
  "totalTokens": 497,
  "createdAt": "2023-01-01T00:00:00Z"
}
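The request above can be sent from any HTTP client. The following sketch uses only the Python standard library and separates building the request from sending it. The base URL `https://llm.example.com` and the bearer-token `Authorization` header are hypothetical placeholders, since this page does not specify the gateway's host or authentication scheme. The optional sampling fields (`topP`, `frequencyPenalty`, `presencePenalty`) are left out for brevity.

```python
import json
import urllib.request

BASE_URL = "https://llm.example.com"  # hypothetical gateway URL
API_KEY = "YOUR_API_KEY"              # hypothetical; the auth scheme is an assumption


def build_generate_request(prompt: str, model: str = "openai/gpt-4",
                           max_tokens: int = 500,
                           temperature: float = 0.7) -> urllib.request.Request:
    """Build (but do not send) a POST /generate request with a JSON body."""
    body = {
        "prompt": prompt,
        "model": model,
        "maxTokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        f"{BASE_URL}/generate",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",  # assumed bearer-token auth
        },
        method="POST",
    )


def generate(prompt: str, **kwargs) -> dict:
    """Send the request and decode the JSON response (see the Response example above)."""
    with urllib.request.urlopen(build_generate_request(prompt, **kwargs)) as resp:
        return json.load(resp)
```

Calling `generate("Write a short story about a robot learning to paint.")["text"]` would return the generated text, and `totalTokens` in the response is the sum of `promptTokens` and `completionTokens` (10 + 487 = 497 in the example).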
Chat Completion
POST /chat
Generates a response to a conversation.
Request Body
{
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant."
    },
    {
      "role": "user",
      "content": "What is the capital of France?"
    }
  ],
  "model": "anthropic/claude-3-opus",
  "maxTokens": 100,
  "temperature": 0.5,
  "topP": 0.9,
  "frequencyPenalty": 0,
  "presencePenalty": 0
}
Response
{
  "id": "chat_123",
  "message": {
    "role": "assistant",
    "content": "The capital of France is Paris. It's known as the 'City of Light' and is famous for landmarks such as the Eiffel Tower, the Louvre Museum, and Notre-Dame Cathedral."
  },
  "model": "anthropic/claude-3-opus",
  "promptTokens": 25,
  "completionTokens": 35,
  "totalTokens": 60,
  "createdAt": "2023-01-01T00:00:00Z"
}
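For a multi-turn conversation, a common pattern is to append the returned `message` to the history and send the whole list again with the next user turn. The sketch below assumes that `/chat` is stateless and expects the full history in every request; this page does not state that explicitly. The follow-up question is illustrative.

```python
import copy


def append_reply(messages: list[dict], response: dict, next_user_message: str) -> list[dict]:
    """Return a new history: old messages + assistant reply + next user turn.

    The input list is not modified, so earlier histories stay reusable.
    """
    history = copy.deepcopy(messages)
    history.append(dict(response["message"]))  # {"role": "assistant", "content": ...}
    history.append({"role": "user", "content": next_user_message})
    return history


messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the capital of France?"},
]
response = {"id": "chat_123",
            "message": {"role": "assistant", "content": "The capital of France is Paris."}}

next_messages = append_reply(messages, response, "What is its population?")
print([m["role"] for m in next_messages])  # ['system', 'user', 'assistant', 'user']
```

`next_messages` then becomes the `messages` field of the next `POST /chat` request.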
Embeddings
POST /embeddings
Generates embeddings for the provided text.
Request Body
{
  "text": "The quick brown fox jumps over the lazy dog.",
  "model": "openai/text-embedding-3-large"
}
Response
{
  "id": "embedding_123",
  "embedding": [0.1, 0.2, 0.3, ...],
  "model": "openai/text-embedding-3-large",
  "tokens": 10,
  "createdAt": "2023-01-01T00:00:00Z"
}
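Embedding vectors are usually compared with cosine similarity. Values near 1 indicate similar texts, and values near 0 indicate unrelated ones. Below is a minimal pure-Python version for two `embedding` arrays taken from this endpoint's responses. Both vectors must come from the same model so that their dimensions match (3072 for `openai/text-embedding-3-large`, according to the model listing below).

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0
```

For large collections, a vector library or database is usually a better choice than a Python loop, but the formula is the same.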
List Models
GET /models
Returns a list of available language models.
Response
{
  "data": [
    {
      "id": "openai/gpt-4",
      "provider": "openai",
      "name": "gpt-4",
      "capabilities": ["chat", "code", "reasoning"],
      "maxTokens": 8192
    },
    {
      "id": "anthropic/claude-3-opus",
      "provider": "anthropic",
      "name": "claude-3-opus",
      "capabilities": ["chat", "reasoning", "vision"],
      "maxTokens": 100000
    },
    {
      "id": "openai/text-embedding-3-large",
      "provider": "openai",
      "name": "text-embedding-3-large",
      "capabilities": ["embeddings"],
      "dimensions": 3072
    }
  ]
}
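The `capabilities` array makes it straightforward to filter the listing on the client side, for example to find every model that accepts images. In the sketch below, `fetch_models()` uses a hypothetical base URL and omits authentication. The filter is shown against an abbreviated copy of the sample response above.

```python
import json
import urllib.request

BASE_URL = "https://llm.example.com"  # hypothetical gateway URL


def fetch_models() -> dict:
    """GET /models and decode the JSON body (authentication omitted)."""
    with urllib.request.urlopen(f"{BASE_URL}/models") as resp:
        return json.load(resp)


def models_with_capability(listing: dict, capability: str) -> list[str]:
    """Return the ids of models that advertise `capability`, in listing order."""
    return [m["id"] for m in listing["data"] if capability in m.get("capabilities", [])]


# Abbreviated copy of the sample response above.
listing = {"data": [
    {"id": "openai/gpt-4", "capabilities": ["chat", "code", "reasoning"], "maxTokens": 8192},
    {"id": "anthropic/claude-3-opus", "capabilities": ["chat", "reasoning", "vision"],
     "maxTokens": 100000},
    {"id": "openai/text-embedding-3-large", "capabilities": ["embeddings"], "dimensions": 3072},
]}

print(models_with_capability(listing, "vision"))  # ['anthropic/claude-3-opus']
```

Note that entries differ in shape: embedding models report `dimensions` rather than `maxTokens`, so code reading these fields should use `.get()`.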
Get Model
GET /models/:id
Returns details for a specific model.
Response
{
  "id": "openai/gpt-4",
  "provider": "openai",
  "name": "gpt-4",
  "capabilities": ["chat", "code", "reasoning"],
  "maxTokens": 8192,
  "pricing": {
    "input": 0.00003,
    "output": 0.00006
  },
  "createdAt": "2023-01-01T00:00:00Z"
}
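Two practical details follow from this response. First, model ids contain a `/`, so this sketch percent-encodes the id to keep it in a single path segment. That is an assumption: the gateway may accept the raw slash instead. Second, `pricing` can be combined with the token counts that other endpoints return to estimate the cost of a request. The page does not state units, so the code assumes USD per token; 0.00003 and 0.00006 are consistent with a price of $0.03 and $0.06 per 1,000 tokens. The base URL is hypothetical.

```python
import urllib.parse

BASE_URL = "https://llm.example.com"  # hypothetical gateway URL


def model_url(model_id: str) -> str:
    """URL for GET /models/:id, percent-encoding "/" in the id (assumed necessary)."""
    return f"{BASE_URL}/models/{urllib.parse.quote(model_id, safe='')}"


def estimate_cost(pricing: dict, prompt_tokens: int, completion_tokens: int) -> float:
    """Estimated request cost, assuming `pricing` is in USD per token."""
    return prompt_tokens * pricing["input"] + completion_tokens * pricing["output"]


pricing = {"input": 0.00003, "output": 0.00006}
print(model_url("openai/gpt-4"))                  # https://llm.example.com/models/openai%2Fgpt-4
# Token counts from the /generate example response: 10 prompt, 487 completion.
print(round(estimate_cost(pricing, 10, 487), 5))  # 0.02952
```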
Function Calling
POST /function-calling
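Asks a language model to choose one of the supplied functions and generate arguments for it. The request only describes each function (name, description, and JSON Schema parameters), so the response contains the chosen function name and arguments; your application is responsible for executing the call.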
Request Body
{
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant."
    },
    {
      "role": "user",
      "content": "What's the weather like in New York?"
    }
  ],
  "functions": [
    {
      "name": "getWeather",
      "description": "Get the current weather in a given location",
      "parameters": {
        "type": "object",
        "properties": {
          "location": {
            "type": "string",
            "description": "The city and state, e.g. San Francisco, CA"
          },
          "unit": {
            "type": "string",
            "enum": ["celsius", "fahrenheit"],
            "description": "The temperature unit to use"
          }
        },
        "required": ["location"]
      }
    }
  ],
  "model": "openai/gpt-4",
  "temperature": 0.5
}
Response
{
  "id": "function_call_123",
  "functionCall": {
    "name": "getWeather",
    "arguments": {
      "location": "New York, NY",
      "unit": "fahrenheit"
    }
  },
  "model": "openai/gpt-4",
  "promptTokens": 50,
  "completionTokens": 20,
  "totalTokens": 70,
  "createdAt": "2023-01-01T00:00:00Z"
}
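Because the model only proposes a call, the client maps `functionCall.name` to local code and passes in `arguments`. The sketch below uses a small dispatch table for that step. `get_weather` is a hypothetical stub standing in for a real weather lookup, and how the result is sent back to the model is not covered on this page, so it is left out.

```python
import json
from typing import Any, Callable


def get_weather(location: str, unit: str = "fahrenheit") -> dict:
    """Hypothetical stub: replace with a call to a real weather service."""
    return {"location": location, "unit": unit, "temperature": None}


# Maps function names declared in the request's "functions" list to local handlers.
HANDLERS: dict[str, Callable[..., Any]] = {"getWeather": get_weather}


def dispatch(response: dict) -> Any:
    """Run the local handler named in response["functionCall"]."""
    call = response["functionCall"]
    handler = HANDLERS.get(call["name"])
    if handler is None:
        raise KeyError(f"no handler registered for {call['name']!r}")
    args = call["arguments"]
    if isinstance(args, str):  # defensive: in case arguments arrive JSON-encoded
        args = json.loads(args)
    return handler(**args)


response = {"id": "function_call_123",
            "functionCall": {"name": "getWeather",
                             "arguments": {"location": "New York, NY", "unit": "fahrenheit"}}}
print(dispatch(response)["location"])  # New York, NY
```

Model-generated arguments should be validated against the declared JSON Schema before calling anything with side effects.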
Error Handling
The API uses standard HTTP status codes to indicate the success or failure of requests. For more information about error codes and how to handle errors, see the Error Handling documentation.
Rate Limiting
API requests are subject to rate limiting to ensure fair usage and system stability. For more information about rate limits and how to handle rate-limiting responses, see the Rate Limiting documentation.
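A typical client-side response to rate limiting is to retry with exponential backoff. The sketch below assumes that rate-limited requests are rejected with the standard HTTP 429 Too Many Requests status, optionally with a `Retry-After` header in seconds; this page does not confirm either. Retrying transient 5xx errors is a design choice of this sketch, not documented gateway behavior. Random jitter spreads out retries from many clients.

```python
import json
import random
import time
import urllib.error
import urllib.request
from typing import Optional

# Status codes this sketch treats as retryable (an assumption, not gateway policy).
RETRYABLE = {429, 500, 502, 503, 504}


def backoff_delay(attempt: int, retry_after: Optional[str],
                  base: float = 1.0, cap: float = 30.0) -> float:
    """Seconds to wait before the next attempt (attempt counts from 0)."""
    if retry_after is not None:
        try:
            return min(float(retry_after), cap)  # honor a numeric Retry-After header
        except ValueError:
            pass  # Retry-After may also be an HTTP date; fall back to backoff
    # Exponential backoff capped at `cap`, scaled by random jitter in [0.5, 1.0].
    return min(cap, base * 2 ** attempt) * random.uniform(0.5, 1.0)


def send_with_retry(req: urllib.request.Request, max_attempts: int = 5) -> dict:
    """Send `req`, retrying retryable HTTP errors; other errors are raised immediately."""
    for attempt in range(max_attempts):
        try:
            with urllib.request.urlopen(req) as resp:
                return json.load(resp)
        except urllib.error.HTTPError as err:
            if err.code not in RETRYABLE or attempt == max_attempts - 1:
                raise
            time.sleep(backoff_delay(attempt, err.headers.get("Retry-After")))
    raise ValueError("max_attempts must be at least 1")
```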
Webhooks
You can configure webhooks to receive notifications about LLM events. For more information about webhooks, see the Webhooks documentation.
SDKs
We provide SDKs for popular programming languages to make it easier to integrate with the LLM API: