How Microsoft Built the Learn MCP Server

Microsoft
Tags: azure, dotnet, openai

How Microsoft built a production MCP server on Azure to give AI agents real-time, grounded access to Microsoft Learn documentation at scale.

Required Knowledge

Model Context Protocol (MCP) - An open standard that lets AI agents discover and call external tools dynamically. Think of it like a USB standard for AI: instead of custom wiring for every integration, any compliant agent can plug into any compliant server and immediately know what it can do.
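To make the "plug in and immediately know what it can do" idea concrete, here is a minimal sketch of the discovery handshake: an MCP client sends a JSON-RPC `tools/list` request, and the server replies with its tool catalog. The tool name and schema below are illustrative, not taken from Microsoft's actual server.

```python
# Hypothetical sketch of MCP dynamic tool discovery. On connect, a client
# sends a JSON-RPC "tools/list" request; the server answers with the tools
# it exposes, each described by a name, description, and input schema.
def handle_tools_list(request: dict) -> dict:
    """Answer a tools/list request with this server's tool catalog."""
    assert request["method"] == "tools/list"
    return {
        "jsonrpc": "2.0",
        "id": request["id"],
        "result": {
            "tools": [
                {
                    # Illustrative name and schema, not the real server's.
                    "name": "docs_search",
                    "description": "Search Microsoft Learn documentation.",
                    "inputSchema": {
                        "type": "object",
                        "properties": {"query": {"type": "string"}},
                        "required": ["query"],
                    },
                }
            ]
        },
    }

request = {"jsonrpc": "2.0", "id": 1, "method": "tools/list"}
response = handle_tools_list(request)
```

Because any compliant client can parse this response, no per-integration wiring is needed: the agent learns the tool's name and parameters at connect time.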

RAG (Retrieval-Augmented Generation) - A technique where an AI model fetches relevant documents before answering, rather than relying solely on its training data. This keeps answers grounded in current, accurate sources instead of hallucinated facts.
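The retrieve-then-generate flow can be sketched in a few lines. This is a toy keyword retriever, not Microsoft's pipeline; the documents and `azd up` snippet are placeholders.

```python
# Toy RAG sketch: retrieve the most relevant document by word overlap,
# then ground the prompt in it before calling the model.
DOCS = [
    "To publish to Azure, run `azd up` from your project root.",
    "Use managed identities instead of connection strings for auth.",
]

def retrieve(question: str) -> str:
    """Pick the doc sharing the most words with the question."""
    words = set(question.lower().split())
    return max(DOCS, key=lambda d: len(words & set(d.lower().split())))

def build_prompt(question: str) -> str:
    """Augment the question with retrieved context so the answer stays grounded."""
    context = retrieve(question)
    return f"Answer using only this source:\n{context}\n\nQuestion: {question}"
```

A real system would swap the keyword overlap for vector search, but the shape is the same: fetch first, then answer from what was fetched.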

Vector Search - A way of finding documents by meaning rather than exact keywords. Text is converted to numerical embeddings, and search finds the closest matches in that space. "How do I deploy to Azure?" finds relevant results even if the docs say "publish to Azure" instead.
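A minimal illustration of "closest matches in that space": the hand-made 3-dimensional embeddings below stand in for what a learned embedding model would produce.

```python
import math

# Toy vector search: documents are mapped to embeddings; search returns
# the doc whose embedding has the highest cosine similarity to the query.
def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

EMBEDDINGS = {
    # Hand-made vectors for illustration; a real model emits hundreds of dims.
    "publish to Azure": [0.9, 0.1, 0.0],
    "configure DNS records": [0.0, 0.2, 0.9],
}

def search(query_vec):
    """Return the doc whose embedding is nearest the query vector."""
    return max(EMBEDDINGS, key=lambda doc: cosine(query_vec, EMBEDDINGS[doc]))

# "How do I deploy to Azure?" embeds near "publish to Azure" even though
# the words differ — that is the point of searching by meaning.
query = [0.8, 0.2, 0.1]
```

The query shares no keyword with "publish to Azure", yet lands on it because the vectors are close.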

Session Affinity - When a distributed system routes requests from the same client to the same server instance. Important for stateful protocols where the server needs to remember prior messages in a conversation.
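One common way to implement this is deterministic hashing of a session identifier, sketched below; the header name and instance list are assumptions for illustration, not Microsoft's deployment details.

```python
import hashlib

# Sketch of session affinity: hash the session ID to a fixed backend so
# every request in the same conversation lands on the same instance.
INSTANCES = ["backend-0", "backend-1", "backend-2"]

def route(session_id: str) -> str:
    """Deterministically map a session to one backend instance."""
    digest = hashlib.sha256(session_id.encode()).digest()
    return INSTANCES[int.from_bytes(digest[:4], "big") % len(INSTANCES)]
```

Same session in, same backend out, so server-side conversation state stays on one instance without a shared session store.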

Streamable HTTP Transport - An HTTP-based connection pattern where the server can push data back to the client incrementally, useful for long-running AI responses that arrive token by token rather than all at once.
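Incremental delivery can be sketched with a generator that emits Server-Sent-Events-style frames as tokens become available, rather than buffering the whole response. The framing below is a simplified illustration, not the MCP transport's exact wire format.

```python
# Streaming sketch: yield each token as an SSE-style data frame the moment
# it is produced, instead of waiting for the full response.
def sse_stream(tokens):
    """Format tokens as incremental data frames, ending with a sentinel."""
    for token in tokens:
        yield f"data: {token}\n\n"
    yield "data: [DONE]\n\n"

frames = list(sse_stream(["Hello", " world"]))
```

The client renders each frame as it arrives, which is why token-by-token AI responses feel responsive even when the full answer takes seconds to generate.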

My Key Takeaways

  • Tool descriptions in MCP servers are effectively UX copy for AI agents. Small wording changes shifted tool activation rates materially, so Microsoft built automated evaluation loops to iterate them based on observed agent behavior.
  • Reusing an existing internal RAG service (the same backend powering "Ask Learn") let the team launch a production-quality MCP server without building a new retrieval pipeline from scratch. Leverage existing infrastructure where you can.
  • Even with a dynamic tool discovery protocol, clients hardcoded schemas in practice. When Microsoft renamed a parameter from "question" to "query," 2–5% of requests broke. Always support both names during a deprecation window.
  • Shipping a public remote MCP server is an infra problem as much as a protocol problem: cross-region deployments, CORS, session affinity, statelessness, and data protection all required explicit solutions.
  • Splitting search and fetch into two separate tools mirrors how humans actually read docs (search → skim → open article), which improved agent groundedness and citation quality versus a single monolithic retrieval call.
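The "support both names during a deprecation window" takeaway can be sketched as a handler that accepts both the old and new parameter, preferring the new one. This is a hypothetical illustration of the pattern, not Microsoft's code.

```python
import warnings

# Deprecation-window sketch: accept both the old "question" parameter and
# the new "query" one, so clients with hardcoded schemas keep working.
def docs_search(arguments: dict) -> str:
    text = arguments.get("query")
    if text is None and "question" in arguments:
        warnings.warn("'question' is deprecated; use 'query'", DeprecationWarning)
        text = arguments["question"]
    if text is None:
        raise ValueError("missing required parameter 'query'")
    return f"searching docs for: {text}"
```

Logging the deprecation warning also gives you data on how many clients still send the old name, so you know when the window can safely close.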