AI Foundations Training



            
                Claude API in Practice – A Complete Developer Reference
                
                    Lesson 4: Streaming Responses                

            
                
                
                    Lesson Objectives
By the end of this lesson, students should be able to:
Implement a streaming API call using the Anthropic SDK
Handle streaming events in the correct order
Apply progressive rendering for text streaming in a UI
Handle tool use in a streaming response
Lesson Content
Why streaming matters.
Without streaming, a user waits for the entire response to generate before seeing anything. For a 500-token response at typical generation speed, that is a 3-8 second blank wait. With streaming, tokens appear as they are generated – the user sees the first word in under a second, dramatically improving perceived responsiveness.
Streaming matters most for chat interfaces, document generation UIs, and any other user-facing application where the response is read as it arrives.
Streaming with the SDK (Python example pattern).
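```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

with client.messages.stream(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Write a short summary."}],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)  # show each chunk as soon as it arrives
```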
The .text_stream iterator yields text chunks as they arrive; a chunk may contain one or more tokens. In a web application, each chunk is forwarded to the client via Server-Sent Events (SSE) or a WebSocket.
The streaming event model.
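The raw streaming API produces the following events, in this order (verify current event types at docs.anthropic.com):
message_start: Message metadata (ID, model, initial usage)
content_block_start: Start of a content block (text or tool use)
content_block_delta: Incremental update to the open block: a text chunk, or a fragment of tool input JSON
content_block_stop: End of the content block
message_delta: Final stop_reason and cumulative output token count
message_stop: Stream complete
The three content_block events repeat for each content block in the response, and ping events may appear anywhere in the stream.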
The SDK's .text_stream abstracts these events into a simple text iterator. For production implementations needing full event access (to capture usage, handle tool use, or implement custom rendering), use the raw event stream.
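As a rough sketch of what working with the raw event stream might look like, the helper below walks a sequence of events and collects the text, the stop_reason, and the output token count. The function name summarize_events is hypothetical (it is not part of the SDK), and the field names (delta.text, delta.stop_reason, usage.output_tokens) are assumptions based on the event list above; check them against the current docs. The events are built by hand so the example runs without an API key.

```python
from types import SimpleNamespace as NS


def summarize_events(events):
    """Collect text, stop_reason and output token count from raw stream events."""
    text_parts = []
    stop_reason = None
    output_tokens = None
    for event in events:
        if event.type == "content_block_delta" and event.delta.type == "text_delta":
            text_parts.append(event.delta.text)
        elif event.type == "message_delta":
            stop_reason = event.delta.stop_reason
            output_tokens = event.usage.output_tokens
        # All other event types (message_start, ping, ...) are ignored in this sketch.
    return {
        "text": "".join(text_parts),
        "stop_reason": stop_reason,
        "output_tokens": output_tokens,
    }


# Hand-built stand-ins for real events (assumed shapes, for illustration only).
fake_events = [
    NS(type="message_start"),
    NS(type="content_block_start", index=0),
    NS(type="content_block_delta", index=0, delta=NS(type="text_delta", text="Hello")),
    NS(type="content_block_delta", index=0, delta=NS(type="text_delta", text=", world")),
    NS(type="content_block_stop", index=0),
    NS(type="message_delta", delta=NS(stop_reason="end_turn"), usage=NS(output_tokens=4)),
    NS(type="message_stop"),
]

summary = summarize_events(fake_events)
print(summary)
```

With the SDK, a function like this would typically be fed by iterating over the stream object itself inside the with block, rather than over .text_stream.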
Streaming with tool use.
Tool use in a streaming response requires collecting the full tool call before executing it, because tool parameters arrive incrementally as fragments of JSON. The SDK's get_final_message() method accumulates the streamed response, including any tool calls, and returns a complete message object. For most applications, call get_final_message() rather than processing raw tool input deltas yourself.
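For cases where the raw deltas do need to be handled directly, the following is a minimal sketch of the accumulation step. The helper name collect_tool_calls, the tool name get_weather, and the ID toolu_example are all hypothetical; the event shapes (content_block.type == "tool_use", delta.partial_json) are assumptions to verify against the current docs. The key point it illustrates is that the JSON fragments are only parsed once the content block has closed.

```python
import json
from types import SimpleNamespace as NS


def collect_tool_calls(events):
    """Assemble complete tool calls from incremental input_json_delta fragments."""
    pending = {}  # content block index -> partially received tool call
    completed = []
    for event in events:
        if event.type == "content_block_start" and event.content_block.type == "tool_use":
            pending[event.index] = {
                "id": event.content_block.id,
                "name": event.content_block.name,
                "json_parts": [],
            }
        elif event.type == "content_block_delta" and event.delta.type == "input_json_delta":
            pending[event.index]["json_parts"].append(event.delta.partial_json)
        elif event.type == "content_block_stop" and event.index in pending:
            call = pending.pop(event.index)
            raw = "".join(call["json_parts"])
            # Parse only after the block closes; a lone fragment is not valid JSON.
            call_input = json.loads(raw) if raw else {}
            completed.append({"id": call["id"], "name": call["name"], "input": call_input})
    return completed


# Hand-built stand-ins for real events: one text block, then one tool call.
fake_events = [
    NS(type="content_block_start", index=0, content_block=NS(type="text")),
    NS(type="content_block_delta", index=0, delta=NS(type="text_delta", text="Checking.")),
    NS(type="content_block_stop", index=0),
    NS(type="content_block_start", index=1,
       content_block=NS(type="tool_use", id="toolu_example", name="get_weather")),
    NS(type="content_block_delta", index=1,
       delta=NS(type="input_json_delta", partial_json='{"city": "Par')),
    NS(type="content_block_delta", index=1,
       delta=NS(type="input_json_delta", partial_json='is"}')),
    NS(type="content_block_stop", index=1),
]

calls = collect_tool_calls(fake_events)
print(calls)
```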
Progressive rendering patterns.
For web UIs, stream chunks to the client via SSE and append each one to a DOM element as it arrives. Rendering Markdown only after the full message has been received avoids re-rendering flicker; alternatively, use a streaming Markdown renderer that handles incremental updates.
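The server side of this pattern can be sketched as a generator that wraps each text chunk in an SSE frame. This is an illustration only: sse_frames is a hypothetical helper, and both the {"text": ...} payload shape and the closing "done" event name are assumptions that a real client would need to agree on. In practice, the generator would be handed to whatever streaming response type your web framework provides.

```python
import json


def sse_frames(chunks):
    """Yield one Server-Sent Events frame per text chunk, then a final 'done' event."""
    for chunk in chunks:
        # JSON-encode the chunk so newlines inside it cannot break the SSE framing.
        payload = json.dumps({"text": chunk})
        yield f"data: {payload}\n\n"
    # Hypothetical end-of-stream marker so the client knows when to stop listening.
    yield "event: done\ndata: {}\n\n"


# Stand-in for stream.text_stream; the last chunk contains a newline on purpose.
frames = list(sse_frames(["Hello", ", wor", "ld\n"]))
for frame in frames:
    print(repr(frame))
```

JSON-encoding each chunk is a design choice: an SSE frame ends at a blank line, so raw newlines in model output would otherwise split or terminate frames early.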
Practical Example
A documentation generator takes 8-12 seconds to produce a complete response.
Without streaming, users see a loading spinner for the full duration.
With streaming, users see the first line in under one second and can begin reading immediately.
In user testing, streaming mode produces 40% higher satisfaction scores for the same content.
The streaming implementation took two hours – the impact on perceived quality was substantial.
Safety Notes
Responses that render progressively may show users statements that are qualified, corrected, or reversed later in the same response. For sensitive content (medical, legal, financial), progressive rendering may expose preliminary statements before Claude has added the full context or qualifications. Consider whether progressive rendering is appropriate for your specific use case and content domain.
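One possible way to act on this is to make progressive rendering a per-domain switch. The sketch below is a hypothetical helper (deliver is not an SDK function) that either passes chunks through as they arrive or holds them back and emits the complete text once, so that sensitive content is only shown in full.

```python
def deliver(chunks, progressive=True):
    """Yield chunks as they arrive, or hold everything back until the response is complete."""
    if progressive:
        yield from chunks
    else:
        # Buffer the whole response so the user never sees a partial statement.
        yield "".join(chunks)


streamed = list(deliver(["Take ", "with ", "food."]))
buffered = list(deliver(["Take ", "with ", "food."], progressive=False))
print(streamed)
print(buffered)
```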
                
