Model Context Protocol – Advanced Patterns for Production

Lesson 1: Sampling – Servers That Make LLM Requests

Lesson Objectives

By the end of this lesson, students should be able to:

Request a completion from the host via sampling

Lesson Content

What sampling is. In the standard MCP flow, Claude (the model) calls tools on the server. Sampling reverses this: the server requests a completion from the model via the host client. The server does not have direct API access; it asks the host to make an LLM request on its behalf and return the result. This lets servers use AI capabilities without managing API keys, billing, or model selection, because the host's existing model connection is reused.

When to use sampling. Use sampling in a server when sharing the host's model connection is the right architecture. Do not use it when a direct API call from the server is simpler and clearer, because sampling adds a round-trip through the host.

Implementing sampling (Python pattern).

```python
import mcp.types as types
from mcp.server import Server

app = Server("analysis-server")

@app.call_tool()
async def call_tool(name: str, arguments: dict):
    if name == "analyze_and_summarize":
        # fetch_data is the server's own data-loading helper (not shown here).
        raw_data = fetch_data(arguments["source"])
        # Ask the host to run the completion (MCP method: sampling/createMessage).
        sampling_result = await app.request_context.session.create_message(
            messages=[
                types.SamplingMessage(
                    role="user",
                    content=types.TextContent(
                        type="text", text=f"Summarize this data: {raw_data}"
                    ),
                )
            ],
            max_tokens=200,
        )
        return [types.TextContent(type="text", text=sampling_result.content.text)]
```

Verify current SDK method signatures for sampling at modelcontextprotocol.io/sdk; the pattern may vary between SDK versions.

Human-in-the-loop via sampling. Sampling passes through the host, and hosts can (and often do) surface sampling requests to the user for approval before sending them to the model. This creates a human-in-the-loop mechanism for AI-driven server decisions. Design sampling requests with this in mind: if your server makes sensitive decisions via sampling, the human-approval path is a feature, not an obstacle.

Practical Example

A code analysis MCP server fetches code from a repository, then uses sampling to ask Claude to categorize the code's main patterns before returning the analysis. The server does not have its own API key; it uses the host's model connection via sampling. The host (Claude Code) shows the sampling request to the developer before sending it, and the developer approves. The server receives the categorization and includes it in its tool response. The server needs no new API credentials.

Safety Notes

Sampling requests from a server flow through the host's model connection and are billed to the host's account. For servers that make frequent or large sampling requests, token consumption may be substantial and unexpected. Document sampling usage clearly so that users of the server understand the token cost implications. Hosts can implement sampling rate limits or approval gates to manage this.
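The rate limits and approval gates that a host can apply to sampling might look roughly like the sketch below. It is a minimal illustration in plain Python, not part of any MCP SDK: `SamplingGate`, its fields, and the `approve` callback are hypothetical names, and the budget charges each request its full `max_tokens` as a conservative estimate rather than the tokens actually consumed.

```python
import time
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class SamplingGate:
    """Hypothetical host-side gate: token budget per time window plus approval."""

    max_tokens_per_window: int
    window_seconds: float
    # Callback that asks the user: (server_name, max_tokens) -> approved?
    approve: Callable[[str, int], bool]
    clock: Callable[[], float] = time.monotonic
    _spent: list = field(default_factory=list)  # (timestamp, tokens) pairs

    def _tokens_in_window(self, now: float) -> int:
        # Drop entries older than the window, then total what remains.
        cutoff = now - self.window_seconds
        self._spent = [(t, n) for t, n in self._spent if t > cutoff]
        return sum(n for _, n in self._spent)

    def allow(self, server_name: str, max_tokens: int) -> bool:
        now = self.clock()
        if self._tokens_in_window(now) + max_tokens > self.max_tokens_per_window:
            return False  # over budget: reject before prompting the user
        if not self.approve(server_name, max_tokens):
            return False  # user declined the sampling request
        self._spent.append((now, max_tokens))
        return True
```

A host would call `allow(...)` before forwarding a server's sampling request to the model. Checking the budget before the approval prompt means the user is never asked about a request that would be rejected anyway.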