Streaming Responses

Streaming enables your application to receive the model's response in real time as it's generated, rather than waiting for the complete response.

This guide explains how to implement streaming with the Arvae AI API across different programming languages and frameworks.

Why Use Streaming?

Better User Experience

Users see responses as they're generated, creating a more interactive and engaging experience.

Perceived Performance

Improves the perceived responsiveness of your application, even for longer responses.

Early Cancellation

Allows users to cancel or interrupt generation if they've seen enough or want to refine their prompt.

When to Use Streaming

Streaming is particularly valuable for chat interfaces, content generation tools, and any application where users are waiting for potentially lengthy AI-generated responses.

Enabling Streaming

To enable streaming, set the stream parameter to true in your API request:

{
  "model": "openai/chatgpt-4o-latest",
  "messages": [
    {"role": "user", "content": "Write a short story about a robot learning to paint."}
  ],
  "stream": true
}

Implementation Examples

JavaScript (Fetch API)

// Streaming with Fetch API
async function streamCompletion() {
  const response = await fetch('https://arvae.ai/api/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'Authorization': 'Bearer YOUR_API_KEY'
    },
    body: JSON.stringify({
      model: 'openai/chatgpt-4o-latest',
      messages: [
        {role: 'user', content: 'Write a poem about the ocean.'}
      ],
      stream: true
    })
  });

  // Check if response is ok
  if (!response.ok) {
    const error = await response.json();
    throw new Error(error.error?.message || 'API request failed');
  }

  // Create a reader from the response body stream
  const reader = response.body.getReader();
  const decoder = new TextDecoder('utf-8');

  // Display container
  const outputDiv = document.getElementById('output');
  outputDiv.textContent = '';

  // Process the stream
  let buffer = '';
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    // Decode the stream chunk; { stream: true } keeps multi-byte
    // characters intact when they are split across chunks
    const chunk = decoder.decode(value, { stream: true });
    buffer += chunk;

    // Process complete stream events, keeping any partial line in the buffer
    let lines = buffer.split('\n');
    buffer = lines.pop() || '';

    for (const line of lines) {
      if (line.startsWith('data: ')) {
        const data = line.slice(6);

        // Check for [DONE] message
        if (data === '[DONE]') continue;

        try {
          const parsed = JSON.parse(data);
          const content = parsed.choices[0]?.delta?.content || '';

          if (content) {
            // Append content to the output
            outputDiv.textContent += content;
          }
        } catch (e) {
          console.error('Error parsing stream data:', e);
        }
      }
    }
  }
}

Python (Requests)

import requests
import json

def stream_completion():
    url = "https://arvae.ai/api/v1/chat/completions"
    headers = {
        "Content-Type": "application/json",
        "Authorization": "Bearer YOUR_API_KEY"
    }
    data = {
        "model": "openai/chatgpt-4o-latest",
        "messages": [
            {"role": "user", "content": "Write a poem about the ocean."}
        ],
        "stream": True
    }

    # Make the request with stream=True so the body is read incrementally
    response = requests.post(url, headers=headers, json=data, stream=True)

    if response.status_code != 200:
        raise Exception(f"API request failed: {response.text}")

    # Process the stream
    collected_content = ""

    for line in response.iter_lines():
        if line:
            line = line.decode('utf-8')
            if line.startswith('data: '):
                data = line[6:]  # Remove 'data: ' prefix

                # Check for [DONE] message
                if data == '[DONE]':
                    break

                try:
                    parsed = json.loads(data)
                    content = parsed.get("choices", [{}])[0].get("delta", {}).get("content", "")

                    if content:
                        # Print content as it arrives
                        print(content, end="", flush=True)
                        collected_content += content
                except json.JSONDecodeError:
                    print(f"Failed to parse: {data}")

    return collected_content

# Example usage
result = stream_completion()
print("\nComplete response:", result)

Node.js (OpenAI Library)

You can use the OpenAI Node.js library with Arvae by setting the baseURL when initializing the client:

import OpenAI from 'openai';

const openai = new OpenAI({
  apiKey: 'YOUR_API_KEY',
  baseURL: 'https://arvae.ai/api/v1'
});

async function streamCompletion() {
  const stream = await openai.chat.completions.create({
    model: 'openai/chatgpt-4o-latest',
    messages: [
      {role: 'user', content: 'Write a poem about the ocean.'}
    ],
    stream: true
  });

  // Handle the stream
  for await (const chunk of stream) {
    // Process each chunk
    const content = chunk.choices[0]?.delta?.content || '';
    if (content) {
      // In a real application, you would append this to your UI
      process.stdout.write(content);
    }
  }

  console.log('\nStream complete');
}

Stream Format

Each stream event is a JSON object prefixed with data: and followed by a newline. The final event is data: [DONE].

Example Stream Event

Note: Each line below represents a separate chunk received from the streaming API. These chunks would be sent sequentially in a real stream.
data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1694268190,"model":"openai/chatgpt-4o-latest","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1694268190,"model":"openai/chatgpt-4o-latest","choices":[{"index":0,"delta":{"content":" world"},"finish_reason":null}]}

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1694268190,"model":"openai/chatgpt-4o-latest","choices":[{"index":0,"delta":{"content":"!"},"finish_reason":null}]}

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1694268190,"model":"openai/chatgpt-4o-latest","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]

Notice that each chunk contains only the delta (the new tokens) rather than the entire text so far. Your client code needs to concatenate these deltas to build the complete response.
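As a minimal illustration of delta concatenation, the sketch below parses a few sample `data:` lines (the same shape as the chunks above, trimmed to the relevant fields) and joins the deltas into the full text. The `join_deltas` helper is illustrative, not part of any SDK:

```python
import json

def join_deltas(sse_lines):
    """Concatenate the delta content from a list of SSE 'data:' lines."""
    text = ""
    for line in sse_lines:
        if not line.startswith("data: "):
            continue
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        # Each chunk carries only the new tokens in choices[0].delta.content;
        # the final chunk has an empty delta and finish_reason "stop"
        text += chunk["choices"][0]["delta"].get("content", "")
    return text

sample = [
    'data: {"choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}',
    'data: {"choices":[{"index":0,"delta":{"content":" world"},"finish_reason":null}]}',
    'data: {"choices":[{"index":0,"delta":{"content":"!"},"finish_reason":null}]}',
    'data: {"choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}',
    'data: [DONE]',
]
print(join_deltas(sample))  # → Hello world!
```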

Error Handling in Streams

When using streaming, error handling is particularly important:

  1. Check HTTP status: Before processing the stream, check the HTTP status of the response. A non-200 status indicates an error.
  2. Handle stream parsing errors: Wrap JSON parsing in try-catch blocks to handle malformed stream data.
  3. Network failures: Implement retry logic for network interruptions during streaming.
  4. Content filter interruptions: Be prepared for streams that may end early due to content filtering.
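The retry point above can be sketched as a small wrapper; the `stream_with_retries` helper, its parameters, and the exception types chosen are illustrative assumptions, not part of the Arvae API:

```python
import time

def stream_with_retries(stream_fn, max_retries=3, backoff=1.0):
    """Call a streaming function, retrying on network-style failures.

    stream_fn is any callable that performs the streaming request and
    returns the collected text (for example, the stream_completion()
    function from the Python example above). ConnectionError and
    TimeoutError stand in here for the network interruptions to retry on.
    """
    for attempt in range(max_retries):
        try:
            return stream_fn()
        except (ConnectionError, TimeoutError):
            if attempt == max_retries - 1:
                raise
            # Exponential backoff between attempts
            time.sleep(backoff * (2 ** attempt))

# Example: a fake stream that fails twice before succeeding
calls = {"n": 0}
def flaky_stream():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("dropped mid-stream")
    return "Hello world!"

print(stream_with_retries(flaky_stream, backoff=0.01))  # → Hello world!
```

Note that retrying a stream restarts generation from the beginning; if you have already shown partial output to the user, you may prefer to surface the error instead.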