Chat Completions API
The Chat Completions API is the primary interface for interacting with language models. Send a list of messages and get an AI-generated response. Compatible with OpenAI's API format.
Endpoint
POST https://arvae.ai/api/v1/chat/completions
✅ OpenAI Compatible
This endpoint is fully compatible with OpenAI's Chat Completions API. You can use existing OpenAI SDKs by simply changing the base URL.
client = OpenAI(base_url="https://arvae.ai/api/v1", api_key="your-key")
Authentication
Include your API key in the Authorization header as a Bearer token:
curl -X POST https://arvae.ai/api/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{"model": "openai/gpt-3.5-turbo", "messages": [...]}'
Security: Never expose your API key in client-side code. Always make requests from your backend or use environment variables.
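For example, you can keep the key out of your source entirely by reading it from an environment variable. A minimal sketch (the ARVAE_API_KEY variable name is illustrative, not required by the API):

import os
from openai import OpenAI

# Read the API key from the environment instead of hardcoding it.
# ARVAE_API_KEY is an illustrative name; use whatever your deployment defines.
client = OpenAI(
    api_key=os.environ["ARVAE_API_KEY"],
    base_url="https://arvae.ai/api/v1"
)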
Request Format
{
  "model": "openai/gpt-3.5-turbo",
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant."
    },
    {
      "role": "user",
      "content": "What is the capital of France?"
    }
  ],
  "max_tokens": 150,
  "temperature": 0.7,
  "stream": false
}
Parameters
Required Parameters
model (string)
The AI model to use for completion. See available models.
Examples: "openai/gpt-3.5-turbo", "anthropic/claude-3.5-sonnet", "hanooman/hanooman-everest"
messages (array)
Array of message objects that form the conversation. Each message has a role ("system", "user", or "assistant") and content.
Example: [{"role": "user", "content": "Hello!"}, {"role": "assistant", "content": "Hi there!"}]
Optional Parameters
max_tokens (integer)
Maximum number of tokens to generate. Default varies by model (typically 1024-4096).
temperature (number)
Controls randomness. Higher values make output more creative; lower values make it more focused and deterministic.
top_p (number)
Nucleus sampling parameter. An alternative to temperature for controlling randomness.
stream (boolean)
Whether to stream the response as it's generated (server-sent events) or return it all at once.
stop (string | array)
Sequences where the model should stop generating tokens. Can be a single string or an array of strings.
Example: ["\n", "."]
presence_penalty (number)
Penalizes tokens that have already appeared, encouraging the model to talk about new topics.
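As an illustration, a request combining several of these optional parameters might look like the following sketch (the parameter values are examples, not recommendations):

from openai import OpenAI

client = OpenAI(api_key="YOUR_API_KEY", base_url="https://arvae.ai/api/v1")

response = client.chat.completions.create(
    model="openai/gpt-3.5-turbo",
    messages=[{"role": "user", "content": "List three fruits."}],
    max_tokens=100,           # cap the completion length
    temperature=0.2,          # low randomness for a focused answer
    stop=["\n\n"],            # stop at the first blank line
    presence_penalty=0.5      # nudge the model toward new topics
)
print(response.choices[0].message.content)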
Response Format
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1699896916,
  "model": "openai/gpt-3.5-turbo",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The capital of France is Paris."
      },
      "logprobs": null,
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 19,
    "completion_tokens": 8,
    "total_tokens": 27
  }
}
Response Fields
choices[].message.content
The AI-generated response text
choices[].finish_reason
Why the model stopped: "stop" (natural end), "length" (hit token limit), "content_filter" (filtered)
usage.total_tokens
Total tokens used (prompt + completion). Used for billing.
model
The model that was actually used (may differ from request if model routing occurred)
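For example, after a completion call you can inspect these fields to track cost and detect truncation. A minimal sketch (assumes a response object returned by client.chat.completions.create, as in the examples below):

# Inspect response metadata after a completion call.
print(response.model)                   # model actually used
print(response.usage.total_tokens)      # billed tokens (prompt + completion)

if response.choices[0].finish_reason == "length":
    # The reply was cut off by max_tokens; consider raising the limit.
    print("Warning: response was truncated")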
Code Examples
Python (OpenAI SDK)
import openai

# Initialize the OpenAI client
client = openai.OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://arvae.ai/api/v1"
)

# Define the messages to send
messages = [
    {
        "role": "user",
        "content": "Hello, can you help me with a coding question?"
    }
]

# Call the API
response = client.chat.completions.create(
    model="<Model Name>",
    messages=messages,
    temperature=0.7,
    max_tokens=1000,
    top_p=1.0,
    stream=False
)

# Print the response
print(response.choices[0].message.content)
Node.js (OpenAI SDK)
import OpenAI from 'openai';

// Initialize the OpenAI client
const openai = new OpenAI({
  apiKey: 'YOUR_API_KEY',
  baseURL: 'https://arvae.ai/api/v1'
});

// Define the messages to send
const messages = [
  {
    role: 'user',
    content: 'Hello, can you help me with a coding question?'
  }
];

async function main() {
  try {
    // Call the API
    const response = await openai.chat.completions.create({
      model: '<Model Name>',
      messages,
      temperature: 0.7,
      max_tokens: 1000,
      top_p: 1.0,
      stream: false
    });

    // Log the response
    console.log(response.choices[0].message.content);
  } catch (error) {
    console.error('Error:', error);
  }
}

main();
cURL
curl https://arvae.ai/api/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "<Model Name>",
    "messages": [
      {
        "role": "user",
        "content": "Hello, who are you?"
      }
    ],
    "temperature": 0.7,
    "max_tokens": 1024,
    "top_p": 1.0,
    "stream": false
  }'
Streaming Responses
Set stream: true to receive the response as it's generated. This provides a better user experience for longer responses.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://arvae.ai/api/v1"
)

stream = client.chat.completions.create(
    model="openai/gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Write a short story"}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="")
Note: When streaming, you receive multiple events, each containing a delta with the new content. The final event has finish_reason set and no content.
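Building on the example above, a minimal sketch of accumulating the streamed deltas into a full reply and recording why generation stopped:

full_text = []
finish_reason = None

for chunk in stream:
    choice = chunk.choices[0]
    if choice.delta.content is not None:
        full_text.append(choice.delta.content)
    if choice.finish_reason is not None:
        finish_reason = choice.finish_reason  # e.g. "stop" or "length"

print("".join(full_text))
print(f"finish_reason: {finish_reason}")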
Error Handling
Common HTTP Status Codes
400 - Bad Request (invalid parameters)
401 - Unauthorized (invalid API key)
402 - Payment Required (insufficient credits)
429 - Too Many Requests (rate limited)
500 - Internal Server Error
Error Response Format
{
  "error": {
    "message": "Invalid API key provided",
    "type": "invalid_request_error",
    "code": "invalid_api_key"
  }
}
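When using the OpenAI Python SDK (v1+), these status codes surface as typed exceptions, so you can branch on them directly. A minimal sketch:

import openai
from openai import OpenAI

client = OpenAI(api_key="YOUR_API_KEY", base_url="https://arvae.ai/api/v1")

try:
    response = client.chat.completions.create(
        model="openai/gpt-3.5-turbo",
        messages=[{"role": "user", "content": "Hello!"}]
    )
    print(response.choices[0].message.content)
except openai.AuthenticationError as e:   # 401: check your API key
    print(f"Authentication failed: {e}")
except openai.RateLimitError as e:        # 429: back off and retry
    print(f"Rate limited: {e}")
except openai.APIStatusError as e:        # other 4xx/5xx (400, 402, 500, ...)
    print(f"API error {e.status_code}: {e.message}")
except openai.APIConnectionError as e:    # network-level failure
    print(f"Connection error: {e}")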
Best Practices
✅ Do
- Use system messages to set context and behavior
- Handle errors gracefully with try-catch blocks
- Monitor token usage to control costs
- Use streaming for long responses
- Set appropriate max_tokens limits
- Include conversation history for context (see the sketch after these lists)
❌ Don't
- Expose API keys in client-side code
- Send extremely long messages without need
- Ignore rate limits and error responses
- Use high temperature for factual queries
- Forget to validate user inputs
- Hardcode model names without fallbacks
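Putting several of these practices together, here is a minimal multi-turn sketch: a system message sets behavior, and each exchange is appended to the history so the model keeps context (the model name and the ask helper are illustrative):

from openai import OpenAI

client = OpenAI(api_key="YOUR_API_KEY", base_url="https://arvae.ai/api/v1")

history = [
    {"role": "system", "content": "You are a concise technical assistant."}
]

def ask(question):
    history.append({"role": "user", "content": question})
    response = client.chat.completions.create(
        model="openai/gpt-3.5-turbo",
        messages=history,
        max_tokens=500
    )
    answer = response.choices[0].message.content
    # Append the assistant's reply so the next turn has full context.
    history.append({"role": "assistant", "content": answer})
    return answer

print(ask("What is a mutex?"))
print(ask("How does that differ from a semaphore?"))  # relies on the history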
Next Steps
🚀 Try the Playground
Test different parameters and models interactively before implementing in your code.
Open Playground →
📖 Learn About Models
Understand different AI models and choose the right one for your use case.
Model Guide →
🔧 Advanced Features
Learn about streaming, function calling, and other advanced API features.
Advanced Guides →
💡 See Examples
Browse practical examples and complete applications built with the API.
View Examples →