Building a Secure MCP Server on AWS for a Million‑Company B2B Platform – What Changed, How It Stacks Up, and the Business Impact

Cloud Reporter

A deep‑dive into the design of a production‑grade Model Context Protocol (MCP) server on AWS that serves over one million company profiles. The article explains the architectural shifts, compares the AWS‑centric approach with alternative cloud providers, and outlines the operational and financial implications for enterprises adopting LLM‑driven data services.

What Changed

Fullinfo’s B2B intelligence platform moved from a demo‑only GraphQL wrapper to a first‑class MCP server running on AWS. The shift was driven by three concrete needs:

  1. Scale – the backend stores > 1 M company records, so a thin proxy would choke under real‑world traffic.
  2. Safety – LLM clients can generate ambiguous prompts; without strict contracts they could unintentionally mutate production data.
  3. Observability – production teams require audit trails, latency metrics, and clear error semantics before they trust an AI‑driven interface.

The result is a Go‑based MCP service that sits in front of an AWS AppSync GraphQL layer, enforces a default‑deny mutation policy, and exposes nine narrowly scoped tools (six read‑only, three write‑capable). The server authenticates per‑request with OIDC bearer tokens, logs its configuration at startup, and provides a local inspection CLI called MCP Inspector for rapid validation.


Provider Comparison

| Aspect | AWS Implementation (Fullinfo) | Azure Alternative | GCP Alternative |
| --- | --- | --- | --- |
| API Gateway / Transport | Uses AppSync for GraphQL with built‑in OIDC support. | Would rely on Azure API Management + Azure Functions to host a GraphQL endpoint. | Could use Cloud Run with Apollo Server or App Engine for GraphQL. |
| Authentication | OIDC tokens from corporate IdP, validated by AppSync’s @aws_oidc directive. | Azure AD tokens validated by Azure AD B2C or Managed Identities in Functions. | GCP IAM‑based JWTs validated by Cloud Endpoints or IAM‑authenticated Cloud Run. |
| Mutation Guard | Simple CLI flag --allow-mutations toggles a boolean in every write tool. | Would need a feature‑flag service (e.g., Azure App Configuration) or custom policy in Azure Functions. | Could use Firebase Remote Config or a ConfigMap in a GKE‑based deployment. |
| Observability | Structured logs via Go’s slog to stderr, captured by CloudWatch Logs; future per‑request metrics can be sent to CloudWatch Metrics. | Azure Monitor + Log Analytics would collect stdout logs; custom metrics via Application Insights. | Cloud Logging (formerly Stackdriver) would ingest logs; custom metrics via Cloud Monitoring. |
| Rate‑Limiting | In‑process limiter (5 req/min for ai_search, 10 req/h for request_email_discovery). | Azure API Management can enforce quotas, but would add latency; an in‑process limiter similar to the AWS one is possible. | GCP’s API Gateway provides quota enforcement, but again adds a network hop. |
| Cost Profile | Pay‑as‑you‑go for AppSync (queries & data transfer) + EC2/ECS for the MCP binary. | Azure Functions consumption plan + API Management tier; comparable, but API Management adds a fixed monthly cost. | Cloud Run request‑based pricing + Cloud Functions for tooling; generally cheaper at low volume but can spike with high query rates. |
| Migration Considerations | Minimal: the MCP server talks to AppSync via standard GraphQL; existing IAM roles can be reused. | Requires rewriting the GraphQL client to use Azure SDKs and possibly switching to Apollo Server for schema stitching. | Need to replace mcp-go with a generic GraphQL client; authentication flow changes from OIDC to GCP IAM. |

Key takeaway: AWS offers the most seamless integration for an MCP server that already depends on AppSync and OIDC. Azure and GCP can replicate the pattern, but they introduce extra moving parts (API Management, Cloud Endpoints) that increase latency and operational overhead.


Business Impact

1. Faster Time‑to‑Value for LLM‑Driven Workflows

By exposing a structured tool set (search_companies, get_company, add_to_collection, etc.), product teams can embed natural‑language queries directly into internal dashboards or external SaaS portals. The flat response format (CompanySummary) removes the need for downstream parsing logic, reducing integration effort by an estimated 30 % for each new consumer.
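To illustrate what a flat response buys the consumer, here is a small sketch of mapping a nested GraphQL node onto a single‑level CompanySummary. The article names the type but does not show its fields, so every field name below (and the companyNode type) is an assumption made for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// companyNode mirrors a nested GraphQL result. The field names are
// hypothetical; the real AppSync schema is not shown in the article.
type companyNode struct {
	ID      string `json:"id"`
	Name    string `json:"name"`
	Address struct {
		City    string `json:"city"`
		Country string `json:"country"`
	} `json:"address"`
}

// CompanySummary is the flat shape returned to MCP clients (fields assumed).
type CompanySummary struct {
	ID      string `json:"id"`
	Name    string `json:"name"`
	City    string `json:"city"`
	Country string `json:"country"`
}

// flatten copies the nested fields into a single-level struct.
func flatten(n companyNode) CompanySummary {
	return CompanySummary{
		ID:      n.ID,
		Name:    n.Name,
		City:    n.Address.City,
		Country: n.Address.Country,
	}
}

func main() {
	raw := []byte(`{"id":"c1","name":"Acme","address":{"city":"Berlin","country":"DE"}}`)
	var n companyNode
	if err := json.Unmarshal(raw, &n); err != nil {
		panic(err)
	}
	out, err := json.Marshal(flatten(n))
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Because the flattening happens once in the server, each consumer can read top‑level keys without knowing the shape of the underlying GraphQL schema.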

2. Reduced Risk and Compliance Exposure

The default‑deny mutation model ensures that no write operation reaches the backend unless an explicit --allow-mutations flag is set. This aligns with most enterprise compliance frameworks (e.g., SOC 2, ISO 27001) that require least‑privilege access. Auditors can verify that the MCP server logs every mutation attempt, even when blocked, providing a clear trail.

3. Predictable Operating Costs

Because the MCP server delegates all data reads to AppSync, the primary cost drivers are:

  • AppSync query operations (billed per million). With a hard cap of 100 results per call, the average cost per query stays below $0.001.
  • EC2/ECS instance running the Go binary (≈ $30 / month for a t3.medium).

Compared with a monolithic API that would need to scale horizontally to handle LLM traffic, the MCP approach keeps the compute footprint low and the cost model linear.
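A rough worked estimate makes the linear cost model concrete. The sketch below assumes a list price of 4 USD per million AppSync query operations (an assumption to check against current AWS pricing) plus the article's figure of about 30 USD per month for compute. It ignores data transfer, caching, and any free tier.

```go
package main

import "fmt"

const (
	// Assumed list price: USD per one million AppSync query operations.
	appSyncPerMillionUSD = 4.00
	// Fixed compute figure taken from the article (t3.medium).
	computeMonthlyUSD = 30.00
)

// monthlyCost estimates the monthly bill for a given number of queries:
// a fixed compute cost plus a per-query charge that grows linearly.
func monthlyCost(queries float64) float64 {
	return queries/1_000_000*appSyncPerMillionUSD + computeMonthlyUSD
}

func main() {
	for _, q := range []float64{100_000, 1_000_000, 10_000_000} {
		fmt.Printf("%.0f queries/month -> %.2f USD\n", q, monthlyCost(q))
	}
}
```

Under these assumptions a single query costs 0.000004 USD, well under the 0.001 USD ceiling quoted above, and one million queries a month comes to roughly 34 USD in total.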

4. Operational Visibility Enables Continuous Improvement

Structured startup logs (for example, auth=oidc mutations=false tools=8) give ops teams an instant sanity check. Adding per‑request logging (tool name, latency, outcome) will feed into CloudWatch Insights dashboards, allowing data‑driven decisions about:

  • Adjusting rate‑limit thresholds.
  • Promoting a write‑capable tool from a pilot gated by --allow-mutations to production.
  • Identifying noisy queries that repeatedly hit the 100‑record cap and may need additional filters.

5. Migration Path for Existing Customers

Enterprises already on AWS can adopt the MCP server without refactoring their existing GraphQL contracts. The only required change is client‑side tooling: swapping a REST call for an MCP tool invocation. For customers on Azure or GCP, the comparison table above translates into the following migration checklist:

  1. Export the GraphQL schema from AppSync.
  2. Deploy an equivalent GraphQL service (Azure Functions + Apollo, or Cloud Run + gqlgen).
  3. Replace mcp-go with a generic GraphQL client.
  4. Re‑implement the mutation flag using the provider’s feature‑flag service.
  5. Wire logs to the provider’s observability stack.

Practical Recommendations

  • Treat the MCP layer as a product, not a wrapper. Define narrow tool contracts early and lock them down with unit tests and real‑system validation.
  • Enable the mutation flag only in controlled environments (e.g., dev or a staged pilot). Record every blocked attempt in logs.
  • Invest in local inspection tooling like MCP Inspector to catch backend bugs before they reach production – the create_collection Lambda null‑pointer error was discovered this way.
  • Add per‑request telemetry (tool name, duration, error type) to CloudWatch or the equivalent service; this turns a “gate” into a measurable SLA.
  • Monitor query patterns and adjust rate limits or default result caps as the user base grows. The 5 req/min for ai_search proved sufficient for early pilots but may need scaling for broader adoption.

Conclusion

The shift from a demo GraphQL proxy to a secure, production‑grade MCP server on AWS demonstrates that LLM‑driven interfaces can be safe, observable, and cost‑effective when built with classic engineering discipline. The key levers – narrow tool contracts, default‑deny mutations, OIDC‑backed authentication, and layered testing – are cloud‑agnostic, but AWS’s native services (AppSync, CloudWatch, OIDC integration) make the implementation especially tight.

Enterprises looking to expose their own data assets to LLMs should start by designing the MCP server as a first‑class API, using the comparison table to choose the provider that minimizes additional components. Once the contracts are in place, the business can unlock rapid AI‑enabled workflows while keeping risk and cost under control.


About the Author

Shadi Elyafi is a Senior Backend Engineer at Fullinfo, responsible for large‑scale B2B data platforms. His work spans Go, GraphQL, and AWS, with a focus on building resilient, cloud‑native services that power AI‑enabled products.
