daBongo LMS AI Training Courses

Claude API in Practice – A Complete Developer Reference

Lesson 4: Streaming Responses

Lesson Objectives

By the end of this lesson, students should be able to:

  • Implement a streaming API call using the Anthropic SDK
  • Handle streaming events in the correct order
  • Apply progressive rendering for text streaming in a UI
  • Handle tool use in a streaming response

Lesson Content

Why streaming matters.

Without streaming, a user waits for the entire response to generate before seeing anything. For a 500-token response at typical generation speed, that is a 3-8 second blank wait. With streaming, tokens appear as they are generated – the user sees the first word in under a second, dramatically improving perceived responsiveness.

Streaming matters most for chat interfaces, document generation UIs, and any other user-facing application where the response is read as it arrives.

Streaming with the SDK (Python example pattern).

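```python
import anthropic

# Reads the API key from the ANTHROPIC_API_KEY environment variable.
client = anthropic.Anthropic()

with client.messages.stream(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Write a short summary."}],
) as stream:
    # Print each text chunk as soon as it arrives.
    for text in stream.text_stream:
        print(text, end="", flush=True)
```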

The .text_stream iterator yields text chunks as they arrive; a chunk is not guaranteed to be exactly one token. For a web application, each chunk is typically forwarded to the client via Server-Sent Events (SSE) or a WebSocket.
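As a minimal sketch of the SSE forwarding step, the helper below wraps each text chunk in an SSE frame. The function name to_sse_frames, the {"text": ...} payload shape, and the "done" event are assumptions made for illustration, not part of the Anthropic SDK; a plain list stands in for stream.text_stream so the sketch runs without an API key.

```python
import json
from typing import Iterable, Iterator


def to_sse_frames(chunks: Iterable[str]) -> Iterator[str]:
    """Wrap each text chunk in a Server-Sent Events frame."""
    for chunk in chunks:
        # JSON-encode the payload so a newline inside a chunk cannot
        # break the SSE framing (a blank line terminates each frame).
        payload = json.dumps({"text": chunk})
        yield f"data: {payload}\n\n"
    # Hypothetical end-of-stream marker; the client must agree on it.
    yield "event: done\ndata: {}\n\n"


# Stand-in for stream.text_stream (assumption: any iterable of str works).
FAKE_TEXT_STREAM = ["Hello", ", wor", "ld.\n"]

if __name__ == "__main__":
    for frame in to_sse_frames(FAKE_TEXT_STREAM):
        print(frame, end="")
```

In a real web handler, the generator would be passed to the framework's streaming response type with the content type set to text/event-stream.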

The streaming event model.

The raw streaming API emits events in the following order, with the three content_block events repeating once per content block (verify current event types at docs.anthropic.com):

  • message_start: Message metadata (ID, model, initial usage)
  • content_block_start: Start of a content block
  • content_block_delta: Token chunk for text, or tool input delta
  • content_block_stop: End of content block
  • message_delta: Updated stop_reason and output token count
  • message_stop: Stream complete

The SDK's .text_stream abstracts these events into a simple text iterator. For production implementations needing full event access (to capture usage, handle tool use, or implement custom rendering), use the raw event stream.
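The event sequence above can be sketched as a small handler that accumulates text and captures usage. The names StreamSummary and summarize_events are hypothetical, and the attribute layout of the fake events (for example event.message.usage.input_tokens) is an assumption that mirrors the event descriptions in this lesson; verify it against the current SDK before relying on it. Fake events built with SimpleNamespace stand in for a live stream.

```python
from dataclasses import dataclass
from types import SimpleNamespace as NS
from typing import Any, Iterable, Optional


@dataclass
class StreamSummary:
    text: str = ""
    input_tokens: int = 0
    output_tokens: int = 0
    stop_reason: Optional[str] = None
    finished: bool = False


def summarize_events(events: Iterable[Any]) -> StreamSummary:
    """Fold a sequence of raw streaming events into one summary."""
    summary = StreamSummary()
    for event in events:
        if event.type == "message_start":
            # Initial usage arrives with the message metadata.
            summary.input_tokens = event.message.usage.input_tokens
        elif event.type == "content_block_delta":
            # Only text deltas are collected; other delta types are skipped.
            if event.delta.type == "text_delta":
                summary.text += event.delta.text
        elif event.type == "message_delta":
            summary.stop_reason = event.delta.stop_reason
            summary.output_tokens = event.usage.output_tokens
        elif event.type == "message_stop":
            summary.finished = True
        # Any other event type is ignored.
    return summary


# Hand-built events for illustration only (not captured from the API).
FAKE_EVENTS = [
    NS(type="message_start", message=NS(usage=NS(input_tokens=12))),
    NS(type="content_block_start", index=0, content_block=NS(type="text")),
    NS(type="content_block_delta", index=0,
       delta=NS(type="text_delta", text="Hello")),
    NS(type="content_block_delta", index=0,
       delta=NS(type="text_delta", text=" world")),
    NS(type="content_block_stop", index=0),
    NS(type="message_delta", delta=NS(stop_reason="end_turn"),
       usage=NS(output_tokens=2)),
    NS(type="message_stop"),
]

if __name__ == "__main__":
    print(summarize_events(FAKE_EVENTS))
```

With the SDK, the same function could plausibly be fed the stream object itself, since iterating over it yields events; that usage is an assumption to check against the SDK documentation.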

Streaming with tool use.

Tool parameters arrive incrementally as partial JSON, so a tool call cannot be executed until its content block is complete. The SDK's get_final_message() method handles this: it waits for the stream to finish and returns a complete message object, including any tool calls with fully assembled inputs. For most applications, use get_final_message() rather than processing raw tool input deltas.
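To illustrate why the full tool call must be collected first, here is a rough sketch of assembling tool inputs by hand from raw events. The function collect_tool_calls, the get_weather tool, and the fake event objects are hypothetical; the field names (content_block, partial_json, index) are assumptions based on the event model described above and should be checked against the current API reference.

```python
import json
from types import SimpleNamespace as NS
from typing import Any, Dict, Iterable, List


def collect_tool_calls(events: Iterable[Any]) -> List[Dict[str, Any]]:
    """Assemble complete tool calls from incremental input deltas."""
    pending: Dict[int, Dict[str, Any]] = {}  # keyed by content block index
    completed: List[Dict[str, Any]] = []
    for event in events:
        if (event.type == "content_block_start"
                and event.content_block.type == "tool_use"):
            pending[event.index] = {
                "id": event.content_block.id,
                "name": event.content_block.name,
                "json": "",
            }
        elif (event.type == "content_block_delta"
                and event.delta.type == "input_json_delta"):
            # Fragments are not valid JSON on their own; just concatenate.
            pending[event.index]["json"] += event.delta.partial_json
        elif event.type == "content_block_stop" and event.index in pending:
            call = pending.pop(event.index)
            raw = call.pop("json")
            # Parse only once the block is closed and the JSON is whole.
            call["input"] = json.loads(raw) if raw else {}
            completed.append(call)
    return completed


# Hand-built events for a hypothetical get_weather tool.
FAKE_TOOL_EVENTS = [
    NS(type="content_block_start", index=1,
       content_block=NS(type="tool_use", id="toolu_demo", name="get_weather")),
    NS(type="content_block_delta", index=1,
       delta=NS(type="input_json_delta", partial_json='{"city": "Par')),
    NS(type="content_block_delta", index=1,
       delta=NS(type="input_json_delta", partial_json='is"}')),
    NS(type="content_block_stop", index=1),
]

if __name__ == "__main__":
    print(collect_tool_calls(FAKE_TOOL_EVENTS))
```

Neither fragment in the example parses as JSON by itself, which is the reason execution has to wait for content_block_stop. In most applications get_final_message() makes this bookkeeping unnecessary.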

Progressive rendering patterns.

For web UIs, stream chunks to the client via SSE and append them to a DOM element as they arrive. To avoid re-rendering flicker, either render Markdown only after the full message has been received, or use a streaming Markdown renderer that handles incremental updates.
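One possible middle ground, sketched below, is to buffer chunks and release text only at blank-line boundaries, so that a Markdown renderer is invoked on complete paragraphs rather than on every chunk. This is an illustrative simplification, not a full streaming Markdown renderer: the function name complete_blocks is hypothetical, and the blank-line rule would wrongly split fenced code blocks that contain empty lines.

```python
from typing import Iterable, Iterator


def complete_blocks(chunks: Iterable[str]) -> Iterator[str]:
    """Yield buffered text one blank-line-delimited block at a time."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        # A blank line ("\n\n") is treated as the end of a block.
        while "\n\n" in buffer:
            block, buffer = buffer.split("\n\n", 1)
            if block.strip():
                yield block
    # Flush whatever remains when the stream ends.
    if buffer.strip():
        yield buffer


# Stand-in for streamed chunks; boundaries fall mid-word on purpose.
FAKE_CHUNKS = ["# Ti", "tle\n", "\nFirst ", "para.\n\nSec", "ond"]

if __name__ == "__main__":
    for block in complete_blocks(FAKE_CHUNKS):
        print(repr(block))
```

The trade-off is latency: the user sees text one paragraph at a time instead of one chunk at a time, in exchange for stable rendering.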

Practical Example

A documentation generator takes 8-12 seconds to produce a complete response.

Without streaming, users see a loading spinner for the full duration.

With streaming, users see the first line in under one second and can begin reading immediately.

In user testing, streaming mode produces 40% higher satisfaction scores for the same content.

The streaming implementation took two hours – the impact on perceived quality was substantial.

Safety Notes

Streaming responses that render progressively to users may display content that would be modified or reversed later in the response. For sensitive content (medical, legal, financial), progressive rendering may expose preliminary statements before Claude has added the full context or qualifications. Consider whether progressive rendering is appropriate for your specific use case and content domain.
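One way to act on this note, shown here as a sketch under stated assumptions, is to decide per request whether chunks are passed through as they arrive or held back until the response is complete. The SENSITIVE_DOMAINS set and the deliver function are hypothetical names, and how a request is assigned a domain label is left to the application.

```python
from typing import Iterable, Iterator

# Hypothetical labels, taken from the domains named in this lesson.
SENSITIVE_DOMAINS = {"medical", "legal", "financial"}


def deliver(chunks: Iterable[str], domain: str) -> Iterator[str]:
    """Stream chunks progressively, or buffer them for sensitive domains."""
    if domain in SENSITIVE_DOMAINS:
        # Hold everything back so qualifications arrive with the claims.
        yield "".join(chunks)
    else:
        yield from chunks


if __name__ == "__main__":
    demo = ["This may help, ", "but consult a professional."]
    print(list(deliver(demo, "medical")))
    print(list(deliver(demo, "chat")))
```

Buffering gives up the responsiveness benefit for those requests, so it is a deliberate trade-off rather than a default.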
