AI Foundations Training



            
                Model Context Protocol – Advanced Patterns for Production
                
                    Lesson 1: Sampling – Servers That Make LLM Requests                

            
                
                
                    Lesson Objectives
By the end of this lesson, students should be able to:
Explain what MCP sampling is and when to use it
Implement a sampling request from a server to the host
Handle sampling responses in a tool handler
Identify the appropriate use cases for sampling vs. direct API calls
Lesson Content
What sampling is.
In the standard MCP flow, the model (Claude) calls tools on the server. Sampling reverses this: the server requests a completion from the model through the host client. The server needs no direct API access; it asks the host to make an LLM request on its behalf and return the result.
This lets servers use AI capabilities without managing API keys, billing, or model selection, because the request reuses the host's existing model connection.
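To make the reversed direction concrete, the sketch below builds the JSON-RPC message a server would send for a sampling request and reads the text out of a reply. The method name `sampling/createMessage` and the field names follow the MCP specification as understood here, so check them against the current spec. The helper names `build_sampling_request` and `extract_text`, the request id, and the sample reply are illustrative assumptions; a real server would normally let the SDK build these messages.

```python
import json


def build_sampling_request(request_id: int, prompt: str, max_tokens: int) -> dict:
    """Build the JSON-RPC request a server sends to the host (hypothetical helper)."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "sampling/createMessage",
        "params": {
            "messages": [
                {"role": "user", "content": {"type": "text", "text": prompt}}
            ],
            "maxTokens": max_tokens,
        },
    }


def extract_text(response: dict) -> str:
    """Return the completion text from the host's reply, or raise on an error reply."""
    if "error" in response:
        raise RuntimeError(response["error"].get("message", "sampling failed"))
    content = response["result"]["content"]
    if content.get("type") != "text":
        raise ValueError(f"unexpected content type: {content.get('type')}")
    return content["text"]


request = build_sampling_request(1, "Summarize this data: 3 errors, 2 warnings", 200)

# Made-up reply for illustration only; the model name is a placeholder.
sample_reply = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {
        "role": "assistant",
        "content": {"type": "text", "text": "Three errors and two warnings were found."},
        "model": "example-model",
        "stopReason": "endTurn",
    },
}

print(json.dumps(request, indent=2))
print(extract_text(sample_reply))
```

Note that the request travels from server to host, the opposite direction from a tool call, and that the server never names an API key or endpoint.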
When to use sampling.
Use sampling in a server when:
A tool handler needs to process data with AI assistance before returning a result
The server needs to make a decision that benefits from language model reasoning
You want server-side AI processing without adding API access to the server itself
Do not use sampling when a direct API call from the server is simpler and clearer. Sampling adds a round trip through the host, so reserve it for cases where sharing the host's model connection is the right architecture.
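The guidance above can be written down as a rough rule of thumb. The function below is a hypothetical helper, not part of any SDK. It assumes you already know whether the connected host supports sampling and whether the server holds its own API credentials, and it treats an existing key as the sign that a direct call is the simpler path.

```python
from enum import Enum


class CompletionPath(Enum):
    SAMPLING = "sampling"
    DIRECT_API = "direct_api"
    UNAVAILABLE = "unavailable"


def choose_completion_path(host_supports_sampling: bool, server_has_api_key: bool) -> CompletionPath:
    """Pick how a server should obtain a completion (illustrative rule of thumb)."""
    if server_has_api_key:
        # The server already manages credentials, so a direct call avoids the host round trip.
        return CompletionPath.DIRECT_API
    if host_supports_sampling:
        # No credentials on the server: reuse the host's model connection.
        return CompletionPath.SAMPLING
    # Neither route exists; the tool should return a result without AI assistance.
    return CompletionPath.UNAVAILABLE


print(choose_completion_path(host_supports_sampling=True, server_has_api_key=False))
```

A real decision would also weigh whether user approval of each request is desirable, which favors sampling even when the server could call an API directly.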
Implementing sampling (Python pattern).
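```python
import mcp.types as types
from mcp.server import Server

app = Server("analysis-server")


@app.call_tool()
async def call_tool(name: str, arguments: dict):
    if name == "analyze_and_summarize":
        # fetch_data is assumed to be defined elsewhere in the server
        raw_data = fetch_data(arguments["source"])

        # Request a completion from the host via sampling.
        # Low-level Python SDK pattern; confirm it against your installed SDK version.
        sampling_result = await app.request_context.session.create_message(
            messages=[
                types.SamplingMessage(
                    role="user",
                    content=types.TextContent(type="text", text=f"Summarize this data: {raw_data}"),
                )
            ],
            max_tokens=200,
        )

        # The result carries a single content item, not a list
        return [types.TextContent(type="text", text=sampling_result.content.text)]

    raise ValueError(f"Unknown tool: {name}")
```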
Verify current SDK method signatures for sampling at modelcontextprotocol.io/sdk – the pattern may vary between SDK versions.
Human-in-the-loop via sampling.
Sampling passes through the host, and hosts can (and often do) surface sampling requests to the user for approval before sending them to the model. This creates a human-in-the-loop mechanism for AI-driven server decisions. Design sampling requests with this in mind: if your server makes sensitive decisions via sampling, the human-approval path is a feature, not an obstacle.
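A host-side approval gate might look roughly like the sketch below. Everything here is hypothetical: `SamplingRequest`, `SamplingRejected`, `handle_sampling_request`, and the stand-in callbacks are illustrative names, not part of any MCP SDK. A real host would show a dialog and call its actual model client.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SamplingRequest:
    server_name: str
    prompt: str
    max_tokens: int


class SamplingRejected(Exception):
    """Raised when the user declines a sampling request."""


def handle_sampling_request(
    request: SamplingRequest,
    ask_user: Callable[[SamplingRequest], bool],
    call_model: Callable[[str, int], str],
) -> str:
    """Forward the request to the model only after the user approves it."""
    if not ask_user(request):
        raise SamplingRejected(f"user rejected sampling request from {request.server_name}")
    return call_model(request.prompt, request.max_tokens)


# Stand-ins for a real approval dialog and a real model call.
def approve_all(request: SamplingRequest) -> bool:
    return True


def deny_all(request: SamplingRequest) -> bool:
    return False


def fake_model(prompt: str, max_tokens: int) -> str:
    return f"[completion for: {prompt}]"


request = SamplingRequest("code-analysis", "Categorize these patterns", 200)
approved_result = handle_sampling_request(request, approve_all, fake_model)

try:
    handle_sampling_request(request, deny_all, fake_model)
    rejected = False
except SamplingRejected:
    rejected = True

print(approved_result, rejected)
```

Because the model is never called on the rejection path, a server should be written to handle a declined request as a normal outcome rather than a crash.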
Practical Example
A code analysis MCP server fetches code from a repository, then uses sampling to ask Claude to categorize the code's main patterns before returning the analysis.
The server does not have its own API key – it uses the host's model connection via sampling.
The host (Claude Code in this example; confirm that your host supports sampling, since not every MCP client does) shows the sampling request to the developer before sending it.
The developer approves.
The server receives the categorization and includes it in its tool response.
Zero new API credentials for the server.
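The server side of this example can be sketched with the sampling call injected as a plain async function, which keeps the flow testable without a live host. The names `analyze_code`, `Sampler`, and `fake_sampler` are assumptions made for illustration. In a real server the sampler would wrap the SDK's sampling request, and the categorization text would come from the model.

```python
import asyncio
from typing import Awaitable, Callable

# A sampler takes (prompt, max_tokens) and returns the completion text.
Sampler = Callable[[str, int], Awaitable[str]]


async def analyze_code(source: str, sample: Sampler) -> dict:
    """Combine the server's own analysis with a categorization obtained via sampling."""
    line_count = len(source.splitlines())
    prompt = f"Categorize the main patterns in this code:\n\n{source}"
    categorization = await sample(prompt, 200)
    return {"line_count": line_count, "patterns": categorization}


async def fake_sampler(prompt: str, max_tokens: int) -> str:
    # Stand-in for the host round trip (user approval plus model call).
    return "decorator-based routing, async I/O"


code = "@app.route('/')\nasync def index():\n    return 'ok'\n"
analysis = asyncio.run(analyze_code(code, fake_sampler))
print(analysis)
```

The server code holds no credentials at any point. The only thing it knows about the model is the text that comes back.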
Safety Notes
Sampling requests from a server flow through the host's model connection and are billed to the host's account. For servers that make frequent or large sampling requests, the token consumption may be substantial and unexpected. Document sampling usage clearly so server users understand the token cost implications. Hosts can implement sampling rate limits or approval gates to manage this.
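One way a host could cap consumption is a per-server token budget, sketched below. `SamplingBudget` and `try_reserve` are hypothetical names. The sketch counts only the `max_tokens` each request asks for, so it bounds output tokens and ignores prompt size; a production limiter would likely track both and reset over time.

```python
class SamplingBudget:
    """Track the max_tokens each server has requested against a per-server cap."""

    def __init__(self, max_tokens_per_server: int):
        self.max_tokens_per_server = max_tokens_per_server
        self.used: dict[str, int] = {}

    def try_reserve(self, server_name: str, max_tokens: int) -> bool:
        """Reserve tokens for a request; return False if it would exceed the cap."""
        spent = self.used.get(server_name, 0)
        if spent + max_tokens > self.max_tokens_per_server:
            return False
        self.used[server_name] = spent + max_tokens
        return True


budget = SamplingBudget(max_tokens_per_server=500)

# Three requests of 200 tokens each: the third would exceed the 500-token cap.
results = [budget.try_reserve("code-analysis", 200) for _ in range(3)]
print(results, budget.used)
```

A denied reservation leaves the recorded usage unchanged, so a smaller follow-up request from the same server can still succeed.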
                
