Chat Completions API

The Chat Completions API is the primary interface for interacting with language models. Send a list of messages and receive an AI-generated response. The endpoint is compatible with OpenAI's API format.

Endpoint

POST https://arvae.ai/api/v1/chat/completions

✅ OpenAI Compatible

This endpoint is fully compatible with OpenAI's Chat Completions API. You can use existing OpenAI SDKs by simply changing the base URL.

from openai import OpenAI

client = OpenAI(base_url="https://arvae.ai/api/v1", api_key="your-key")

Authentication

Include your API key in the Authorization header as a Bearer token:

Authorization Header
curl -X POST https://arvae.ai/api/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{"model": "openai/gpt-3.5-turbo", "messages": [...]}'

Security: Never expose your API key in client-side code. Always make requests from your backend or use environment variables.
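
For example, here is a minimal sketch of loading the key from an environment variable instead of hardcoding it. The variable name ARVAE_API_KEY is an illustrative choice, not something the API requires:

api_key_from_env.py
import os

from openai import OpenAI

# Read the key from the environment rather than embedding it in source.
# ARVAE_API_KEY is a hypothetical variable name; use your own convention.
client = OpenAI(
    api_key=os.environ["ARVAE_API_KEY"],
    base_url="https://arvae.ai/api/v1",
)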

Request Format

Example Request
{
  "model": "openai/gpt-3.5-turbo",
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant."
    },
    {
      "role": "user",
      "content": "What is the capital of France?"
    }
  ],
  "max_tokens": 150,
  "temperature": 0.7,
  "stream": false
}

Parameters

Required Parameters

model (string)

The AI model to use for completion. See available models.

Examples: "openai/gpt-3.5-turbo", "anthropic/claude-3.5-sonnet", "hanooman/hanooman-everest"

messages (array)

Array of message objects that form the conversation. Each message has a role ("system", "user", or "assistant") and content.

[{"role": "user", "content": "Hello!"}, {"role": "assistant", "content": "Hi there!"}]

Optional Parameters

max_tokens (integer)

Maximum number of tokens to generate. The default varies by model (typically 1024-4096).

Range: 1 to the model's maximum context length

temperature (number)

Controls randomness. Higher values make output more creative; lower values make it more focused and deterministic.

Range: 0.0 to 2.0, Default: 1.0

top_p (number)

Nucleus sampling parameter. An alternative to temperature for controlling randomness.

Range: 0.0 to 1.0, Default: 1.0

stream (boolean)

Whether to stream the response as it is generated (server-sent events) or return it all at once.

Default: false

stop (string | array)

Sequences at which the model should stop generating tokens. Can be a string or an array of strings.

Example: ["\n", "."]

presence_penalty (number)

Penalizes tokens that have already appeared, encouraging the model to talk about new topics.

Range: -2.0 to 2.0, Default: 0.0
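
To illustrate how these options combine, here is a hedged sketch of one request that uses several of them together. The model name and parameter values are placeholders, not recommendations:

optional_params.py
from openai import OpenAI

client = OpenAI(api_key="YOUR_API_KEY", base_url="https://arvae.ai/api/v1")

# A short, focused answer: low temperature plus a stop sequence.
response = client.chat.completions.create(
    model="openai/gpt-3.5-turbo",  # placeholder model name
    messages=[{"role": "user", "content": "Name one EU capital."}],
    max_tokens=20,           # cap the completion length
    temperature=0.2,         # low randomness for a factual query
    top_p=1.0,
    stop=["\n"],             # stop at the first newline
    presence_penalty=0.0,
)
print(response.choices[0].message.content)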

Response Format

Example Response
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1699896916,
  "model": "openai/gpt-3.5-turbo",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The capital of France is Paris."
      },
      "logprobs": null,
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 19,
    "completion_tokens": 8,
    "total_tokens": 27
  }
}

Response Fields

choices[].message.content

The AI-generated response text

choices[].finish_reason

Why the model stopped: "stop" (natural end), "length" (hit token limit), "content_filter" (filtered)

usage.total_tokens

Total tokens used (prompt + completion). Used for billing.

model

The model that was actually used (may differ from request if model routing occurred)
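
As a quick reference, here is a sketch of reading these fields with the OpenAI Python SDK, assuming response holds the return value of client.chat.completions.create as in the code examples in the next section:

read_response.py
# `response` is assumed to come from client.chat.completions.create(...).
choice = response.choices[0]

print(choice.message.content)        # the generated text
print(choice.finish_reason)          # "stop", "length", or "content_filter"
print(response.model)                # the model actually used
print(response.usage.total_tokens)   # prompt + completion tokens (billing)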

Code Examples

Python (OpenAI SDK)

chat_completion.py
import openai

# Initialize the OpenAI client
client = openai.OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://arvae.ai/api/v1"
)

# Define the messages to send
messages = [
    {
        "role": "user",
        "content": "Hello, can you help me with a coding question?"
    }
]

# Call the API
response = client.chat.completions.create(
    model="<Model Name>",
    messages=messages,
    temperature=0.7,
    max_tokens=1000,
    top_p=1.0,
    stream=False
)

# Print the response
print(response.choices[0].message.content)

Node.js (OpenAI SDK)

chatCompletion.js
import OpenAI from 'openai';

// Initialize the OpenAI client
const openai = new OpenAI({
  apiKey: 'YOUR_API_KEY',
  baseURL: 'https://arvae.ai/api/v1'
});

// Define the messages to send
const messages = [
  {
    role: 'user',
    content: 'Hello, can you help me with a coding question?'
  }
];

async function main() {
  try {
    // Call the API
    const response = await openai.chat.completions.create({
      model: '<Model Name>',
      messages,
      temperature: 0.7,
      max_tokens: 1000,
      top_p: 1.0,
      stream: false
    });

    // Log the response
    console.log(response.choices[0].message.content);
  } catch (error) {
    console.error('Error:', error);
  }
}

main();

cURL

Terminal
curl https://arvae.ai/api/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "<Model Name>",
    "messages": [
      {
        "role": "user",
        "content": "Hello, who are you?"
      }
    ],
    "temperature": 0.7,
    "max_tokens": 1024,
    "top_p": 1.0,
    "stream": false
  }'

Streaming Responses

Set stream: true to receive the response as it's generated. This provides a better user experience for longer responses.

Streaming Example
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://arvae.ai/api/v1"
)

stream = client.chat.completions.create(
    model="openai/gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Write a short story"}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="")

Note: When streaming, you receive multiple events. Each contains a delta with the new content. The final event has finish_reason set and no content.
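
Building on the example above, here is a sketch that accumulates the streamed text and records the final finish_reason. It assumes the stream object from the previous snippet:

streaming_accumulate.py
full_text = []
finish_reason = None

for chunk in stream:
    choice = chunk.choices[0]
    if choice.delta.content is not None:
        full_text.append(choice.delta.content)
    if choice.finish_reason is not None:
        # Set only on the final event of the stream.
        finish_reason = choice.finish_reason

print("".join(full_text))
print("finish_reason:", finish_reason)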

Error Handling

Common HTTP Status Codes

400 - Bad Request (invalid parameters)
401 - Unauthorized (invalid API key)
402 - Payment Required (insufficient credits)
429 - Too Many Requests (rate limited)
500 - Internal Server Error

Error Response Format

{
  "error": {
    "message": "Invalid API key provided",
    "type": "invalid_request_error",
    "code": "invalid_api_key"
  }
}
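
With the OpenAI Python SDK, these status codes surface as typed exceptions. Here is a sketch under the assumption that Arvae's errors map onto the SDK's standard exception classes:

error_handling.py
import openai
from openai import OpenAI

client = OpenAI(api_key="YOUR_API_KEY", base_url="https://arvae.ai/api/v1")

try:
    response = client.chat.completions.create(
        model="openai/gpt-3.5-turbo",  # placeholder model name
        messages=[{"role": "user", "content": "Hello!"}],
    )
    print(response.choices[0].message.content)
except openai.AuthenticationError:
    # 401: the API key is missing or invalid
    print("Invalid API key")
except openai.RateLimitError:
    # 429: back off and retry later
    print("Rate limited; retry with backoff")
except openai.APIStatusError as e:
    # Any other non-2xx response (400, 402, 500, ...)
    print(f"API error {e.status_code}: {e.message}")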

Best Practices

✅ Do

  • Use system messages to set context and behavior
  • Handle errors gracefully with try-catch blocks
  • Monitor token usage to control costs
  • Use streaming for long responses
  • Set appropriate max_tokens limits
  • Include conversation history for context (see the sketch after this list)
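
For instance, here is a minimal sketch of carrying history across turns by appending each reply before the next request. The ask helper is purely illustrative:

conversation_history.py
messages = [{"role": "system", "content": "You are a helpful assistant."}]

def ask(client, model, user_input):
    # Append the user turn, call the API, then append the assistant reply
    # so the next request carries the full conversation.
    messages.append({"role": "user", "content": user_input})
    response = client.chat.completions.create(model=model, messages=messages)
    reply = response.choices[0].message.content
    messages.append({"role": "assistant", "content": reply})
    return reply

# Usage (client as initialized in the earlier examples):
# print(ask(client, "openai/gpt-3.5-turbo", "What is the capital of France?"))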

❌ Don't

  • Expose API keys in client-side code
  • Send extremely long messages without need
  • Ignore rate limits and error responses
  • Use high temperature for factual queries
  • Forget to validate user inputs
  • Hardcode model names without fallbacks

Next Steps

🚀 Try the Playground

Test different parameters and models interactively before implementing in your code.

Open Playground →

📖 Learn About Models

Understand different AI models and choose the right one for your use case.

Model Guide →

🔧 Advanced Features

Learn about streaming, function calling, and other advanced API features.

Advanced Guides →

💡 See Examples

Browse practical examples and complete applications built with the API.

View Examples →