Streaming Responses
Streaming lets your application receive the model's response in real time, as it is being generated, rather than waiting for the complete response.
This guide explains how to implement streaming with the Arvae AI API across different programming languages and frameworks.
Why Use Streaming?
Better User Experience
Users see responses as they're generated, creating a more interactive and engaging experience.
Perceived Performance
Improves the perceived responsiveness of your application, even for longer responses.
Early Cancellation
Allows users to cancel or interrupt generation if they've seen enough or want to refine their prompt.
When to Use Streaming
Streaming is particularly valuable for chat interfaces, content generation tools, and any application where users are waiting for potentially lengthy AI-generated responses.
Enabling Streaming
To enable streaming, set the stream parameter to true in your API request:
{
  "model": "openai/chatgpt-4o-latest",
  "messages": [
    {"role": "user", "content": "Write a short story about a robot learning to paint."}
  ],
  "stream": true
}
Implementation Examples
JavaScript (Fetch API)
// Streaming with Fetch API
async function streamCompletion() {
  const response = await fetch('https://arvae.ai/api/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'Authorization': 'Bearer YOUR_API_KEY'
    },
    body: JSON.stringify({
      model: 'openai/chatgpt-4o-latest',
      messages: [
        {role: 'user', content: 'Write a poem about the ocean.'}
      ],
      stream: true
    })
  });

  // Check if response is ok
  if (!response.ok) {
    // The error body may not be valid JSON, so fall back to a generic message
    const error = await response.json().catch(() => null);
    throw new Error(error?.error?.message || `API request failed with status ${response.status}`);
  }

  // Create a reader from the response body stream
  const reader = response.body.getReader();
  const decoder = new TextDecoder('utf-8');

  // Display container
  const outputDiv = document.getElementById('output');
  outputDiv.textContent = '';

  // Process the stream
  let buffer = '';
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    // Decode the chunk; { stream: true } keeps multi-byte characters
    // that are split across two chunks intact
    buffer += decoder.decode(value, { stream: true });

    // Split into complete lines and keep any trailing partial line in the buffer
    const lines = buffer.split('\n');
    buffer = lines.pop() || '';

    for (const rawLine of lines) {
      // Drop a trailing carriage return in case lines end with \r\n
      const line = rawLine.trimEnd();
      if (!line.startsWith('data: ')) continue;

      const data = line.slice(6);

      // Check for [DONE] message
      if (data === '[DONE]') continue;

      try {
        const parsed = JSON.parse(data);
        // Optional chaining guards against events without a choices array
        const content = parsed.choices?.[0]?.delta?.content || '';

        if (content) {
          // Append content to the output
          outputDiv.textContent += content;
        }
      } catch (e) {
        console.error('Error parsing stream data:', e);
      }
    }
  }
}
Python (Requests)
import requests
import json

def stream_completion():
    url = "https://arvae.ai/api/v1/chat/completions"
    headers = {
        "Content-Type": "application/json",
        "Authorization": "Bearer YOUR_API_KEY"
    }
    payload = {
        "model": "openai/chatgpt-4o-latest",
        "messages": [
            {"role": "user", "content": "Write a poem about the ocean."}
        ],
        "stream": True
    }

    # Make the request with stream=True so the body is read incrementally
    response = requests.post(url, headers=headers, json=payload, stream=True)

    if response.status_code != 200:
        raise Exception(f"API request failed: {response.text}")

    # Process the stream
    collected_content = ""

    for line in response.iter_lines():
        if not line:
            continue  # Skip blank lines between events

        # iter_lines() yields bytes, so decode before inspecting the line
        line = line.decode('utf-8')
        if not line.startswith('data: '):
            continue
        data = line[6:]  # Remove 'data: ' prefix

        # Check for [DONE] message
        if data == '[DONE]':
            break

        try:
            parsed = json.loads(data)
        except json.JSONDecodeError:
            print(f"Failed to parse: {data}")
            continue

        # Guard against events with a missing or empty choices list
        choices = parsed.get("choices") or [{}]
        content = (choices[0].get("delta") or {}).get("content", "")

        if content:
            # Print content as it comes
            print(content, end="", flush=True)
            collected_content += content

    return collected_content

# Example usage
result = stream_completion()
print("\nComplete response:", result)
Node.js (OpenAI Library)
You can use the OpenAI Node.js library with Arvae by setting the baseURL when initializing the client:
import OpenAI from 'openai';

const openai = new OpenAI({
  apiKey: 'YOUR_API_KEY',
  baseURL: 'https://arvae.ai/api/v1'
});

async function streamCompletion() {
  const stream = await openai.chat.completions.create({
    model: 'openai/chatgpt-4o-latest',
    messages: [
      {role: 'user', content: 'Write a poem about the ocean.'}
    ],
    stream: true
  });

  // Handle the stream
  for await (const chunk of stream) {
    // Process each chunk
    const content = chunk.choices[0]?.delta?.content || '';
    if (content) {
      // In a real application, you would append this to your UI
      process.stdout.write(content);
    }
  }

  console.log('\nStream complete');
}
Stream Format
Each stream event is a JSON object prefixed with data: and followed by a newline. The final event is data: [DONE].
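The format described above can be sketched as a small Python generator that pulls the payload out of each data: line and stops at [DONE]. This is a minimal illustration, not a reference parser: the function name iter_event_payloads and the sample lines are hypothetical, and the payload fields are an assumption based on what the examples in this guide read (choices[0].delta.content).

```python
import json
from typing import Iterable, Iterator

DATA_PREFIX = "data: "
DONE_MARKER = "[DONE]"


def iter_event_payloads(lines: Iterable[str]) -> Iterator[dict]:
    """Yield the parsed JSON object from each data line until [DONE]."""
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.startswith(DATA_PREFIX):
            continue  # skip blank lines and anything that is not a data line
        data = line[len(DATA_PREFIX):]
        if data == DONE_MARKER:
            return  # final event: stop reading
        yield json.loads(data)


# Hypothetical sample lines, for illustration only
sample_lines = [
    'data: {"choices": [{"delta": {"content": "Hello"}}]}',
    "",
    "data: [DONE]",
    'data: {"choices": [{"delta": {"content": "never reached"}}]}',
]
payloads = list(iter_event_payloads(sample_lines))
print(len(payloads))  # 1
```

Because the generator returns as soon as it sees [DONE], anything that arrives after the final event is ignored.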
Example Stream Event
Notice that each chunk contains only the delta (the new tokens) rather than the entire text so far. Your client code needs to concatenate these deltas to build the complete response.
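To illustrate the concatenation step, here is a minimal Python sketch that joins the delta fragments of already-parsed chunks into the full text. The helper name accumulate_deltas and the sample chunks are hypothetical, and the chunk shape is assumed from the fields the examples in this guide read.

```python
from typing import Iterable


def accumulate_deltas(chunks: Iterable[dict]) -> str:
    """Concatenate the delta.content fragments of parsed stream chunks."""
    parts = []
    for chunk in chunks:
        choices = chunk.get("choices") or [{}]
        delta = choices[0].get("delta") or {}
        content = delta.get("content")
        if content:  # some chunks may carry no text at all
            parts.append(content)
    return "".join(parts)


# Hypothetical chunks, for illustration only
sample_chunks = [
    {"choices": [{"delta": {"content": "The "}}]},
    {"choices": [{"delta": {"content": "ocean "}}]},
    {"choices": [{"delta": {}}]},
    {"choices": [{"delta": {"content": "sings."}}]},
]
full_text = accumulate_deltas(sample_chunks)
print(full_text)  # The ocean sings.
```

Collecting the fragments in a list and joining once at the end avoids repeatedly copying a growing string, which can matter for long responses.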
Error Handling in Streams
When using streaming, error handling is particularly important:
- Check HTTP status: Before processing the stream, check the HTTP status of the response. A non-200 status indicates an error.
- Handle stream parsing errors: Wrap JSON parsing in try-catch blocks to handle malformed stream data.
- Network failures: Implement retry logic for network interruptions during streaming.
- Content filter interruptions: Be prepared for streams that may end early due to content filtering.
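The points above can be combined in a short Python sketch. It is one possible approach under stated assumptions, not the API's prescribed method: read_stream, stream_with_retry, StreamEndedEarly, and the open_stream callable are hypothetical names, and the stream is treated as ending early whenever it stops before data: [DONE]. The sketch catches Python's built-in ConnectionError and TimeoutError; with the requests library you would catch requests.exceptions.RequestException instead.

```python
import json
import time
from typing import Callable, Iterable


class StreamEndedEarly(Exception):
    """Raised when the stream stops before the final [DONE] event."""


def read_stream(lines: Iterable[str]) -> str:
    """Collect streamed text, skipping malformed events."""
    parts = []
    for line in lines:
        if not line.startswith("data: "):
            continue
        data = line[len("data: "):]
        if data == "[DONE]":
            return "".join(parts)
        try:
            chunk = json.loads(data)
        except json.JSONDecodeError:
            continue  # malformed event: skip it rather than abort the stream
        choices = chunk.get("choices") or [{}]
        parts.append((choices[0].get("delta") or {}).get("content") or "")
    # No [DONE] seen: hand the partial text to the caller via the exception
    raise StreamEndedEarly("".join(parts))


def stream_with_retry(open_stream: Callable[[], Iterable[str]],
                      max_attempts: int = 3,
                      base_delay: float = 0.5) -> str:
    """Retry the whole request with exponential backoff on network errors."""
    for attempt in range(1, max_attempts + 1):
        try:
            return read_stream(open_stream())
        except (ConnectionError, TimeoutError):
            if attempt == max_attempts:
                raise
            time.sleep(base_delay * 2 ** (attempt - 1))
    raise RuntimeError("max_attempts must be at least 1")


# Hypothetical stand-in for a real request: fails once, then succeeds
attempts = {"count": 0}


def fake_open_stream():
    attempts["count"] += 1
    if attempts["count"] == 1:
        raise ConnectionError("simulated network failure")
    return ['data: {"choices": [{"delta": {"content": "Hi"}}]}',
            "data: not-json",
            "data: [DONE]"]


text = stream_with_retry(fake_open_stream, base_delay=0.0)
print(text)  # Hi
```

Note that a retry restarts generation from the beginning, so any partial text shown to the user from the failed attempt should be cleared or replaced. StreamEndedEarly is deliberately not retried here, since a stream cut short by content filtering would likely end the same way again; its first argument carries the partial text so the caller can decide what to display.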