Claude API in Practice – A Complete Developer Reference

Lesson 4: Streaming Responses

Lesson Objectives

By the end of this lesson, students should be able to:

Lesson Content

Why streaming matters. Without streaming, a user waits for the entire response to generate before seeing anything. For a 500-token response at typical generation speed, that is a 3-8 second blank wait. With streaming, tokens appear as they are generated: the user sees the first word in under a second, which dramatically improves perceived responsiveness. Streaming matters most for chat interfaces, document generation UIs, and any user-facing application where the response is read as it arrives.

Streaming with the SDK (Python example pattern).

```python
import anthropic

# The client reads the ANTHROPIC_API_KEY environment variable by default.
client = anthropic.Anthropic()

with client.messages.stream(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Write a short summary."}],
) as stream:
    # Print each text chunk as soon as it arrives.
    for text in stream.text_stream:
        print(text, end="", flush=True)
```

The `.text_stream` iterator yields text chunks as they arrive. For a web application, each chunk is sent to the client via Server-Sent Events (SSE) or WebSocket.

The streaming event model. The raw streaming API produces events (verify current event types at docs.anthropic.com):

- `message_start`: Message metadata (ID, model, initial usage)
- `content_block_start`: Start of a content block
- `content_block_delta`: Token chunk for text, or tool input delta
- `content_block_stop`: End of a content block
- `message_delta`: Updated `stop_reason` and output token count
- `message_stop`: Stream complete

The SDK's `.text_stream` abstracts these events into a simple text iterator. For production implementations that need full event access (to capture usage, handle tool use, or implement custom rendering), use the raw event stream.

Streaming with tool use. Tool parameters arrive incrementally, so a tool call in a streaming response must be collected in full before it is executed. The SDK handles this with the `.get_final_message()` method, which collects the full streamed response, including tool calls, before returning a complete message object. For most applications, use `.get_final_message()` rather than processing raw tool input deltas.

Progressive rendering patterns. For web UIs, stream tokens to the client via SSE and append them to a DOM element as chunks arrive. To avoid re-rendering flicker, either render Markdown after the full message is received or use a streaming Markdown renderer that handles incremental updates.

Practical Example

A documentation generator takes 8-12 seconds to produce a complete response. Without streaming, users see a loading spinner for the full duration. With streaming, users see the first line in under one second and can begin reading immediately. In user testing, streaming mode produced 40% higher satisfaction scores for the same content. The streaming implementation took two hours, and the impact on perceived quality was substantial.

Safety Notes

Streaming responses that render progressively may display content that is modified or reversed later in the same response. For sensitive content (medical, legal, financial), progressive rendering may expose preliminary statements before Claude has added the full context or qualifications. Consider whether progressive rendering is appropriate for your specific use case and content domain.
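To make the streaming event model and the tool-use discussion in this lesson more concrete, the following is a minimal sketch of what consuming the raw event stream involves: text deltas are appended as they arrive, tool input fragments are buffered per content block and parsed only when that block stops, and `stop_reason` and the output token count are read from `message_delta`. It runs offline against hand-written events modelled as plain dicts. The field names used here (`index`, `text_delta`, `input_json_delta`, `partial_json`) reflect one reading of the public streaming format and should be verified at docs.anthropic.com. The function `accumulate_stream`, the `SAMPLE_EVENTS` list, the `get_weather` tool, and all IDs are hypothetical names introduced only for illustration.

```python
import json
from typing import Any, Iterable


def accumulate_stream(events: Iterable[dict[str, Any]]) -> dict[str, Any]:
    """Fold raw streaming events into text, completed tool calls, and usage.

    Assumption: events are plain dicts shaped like the documented event types.
    """
    text_parts: list[str] = []
    tool_calls: list[dict[str, Any]] = []
    open_blocks: dict[int, dict[str, Any]] = {}  # block index -> block in progress
    result: dict[str, Any] = {"stop_reason": None, "output_tokens": None}

    for event in events:
        kind = event["type"]
        if kind == "content_block_start":
            block = event["content_block"]
            open_blocks[event["index"]] = {
                "type": block["type"],
                "id": block.get("id"),
                "name": block.get("name"),
                "json_parts": [],
            }
        elif kind == "content_block_delta":
            delta = event["delta"]
            if delta["type"] == "text_delta":
                text_parts.append(delta["text"])
            elif delta["type"] == "input_json_delta":
                # Tool input arrives as JSON fragments; buffer, do not parse yet.
                open_blocks[event["index"]]["json_parts"].append(delta["partial_json"])
        elif kind == "content_block_stop":
            block = open_blocks.pop(event["index"], None)
            if block and block["type"] == "tool_use":
                raw = "".join(block["json_parts"])
                tool_calls.append({
                    "id": block["id"],
                    "name": block["name"],
                    "input": json.loads(raw) if raw else {},
                })
        elif kind == "message_delta":
            result["stop_reason"] = event["delta"].get("stop_reason")
            result["output_tokens"] = event.get("usage", {}).get("output_tokens")
        # message_start and message_stop carry nothing this sketch needs.

    result["text"] = "".join(text_parts)
    result["tool_calls"] = tool_calls
    return result


# Hypothetical, hand-written events for illustration only.
SAMPLE_EVENTS = [
    {"type": "message_start", "message": {"id": "msg_example", "usage": {"input_tokens": 12}}},
    {"type": "content_block_start", "index": 0, "content_block": {"type": "text", "text": ""}},
    {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": "Checking "}},
    {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": "the weather."}},
    {"type": "content_block_stop", "index": 0},
    {"type": "content_block_start", "index": 1,
     "content_block": {"type": "tool_use", "id": "toolu_example", "name": "get_weather", "input": {}}},
    {"type": "content_block_delta", "index": 1,
     "delta": {"type": "input_json_delta", "partial_json": '{"city": '}},
    {"type": "content_block_delta", "index": 1,
     "delta": {"type": "input_json_delta", "partial_json": '"Paris"}'}},
    {"type": "content_block_stop", "index": 1},
    {"type": "message_delta", "delta": {"stop_reason": "tool_use"}, "usage": {"output_tokens": 27}},
    {"type": "message_stop"},
]

if __name__ == "__main__":
    summary = accumulate_stream(SAMPLE_EVENTS)
    print(summary["text"])
    print(summary["tool_calls"])
```

The point of the sketch is the ordering constraint: neither JSON fragment of the tool input is valid on its own, so the call can only be executed after `content_block_stop`. With the SDK, events arrive as typed objects rather than dicts, so attribute access would replace the key lookups, and in most applications `.get_final_message()` makes this bookkeeping unnecessary.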