#LLMs

Cohere Command A+ Joins Microsoft Foundry Managed Compute – What It Means for Multi‑Cloud AI Strategies

Cloud Reporter
5 min read

Cohere’s new Command A+ model is now offered as a Managed Compute service in Microsoft Foundry, giving enterprises a high‑performance, open‑source MoE model on dedicated Azure infrastructure. The announcement reshapes the competitive field among Azure, AWS, and Google Cloud for enterprise‑grade agentic AI, prompting a reassessment of pricing, migration paths, and operational trade‑offs.

What changed

Microsoft announced that Cohere Command A+ is available through the Microsoft Foundry Managed Compute marketplace. The offering bundles Cohere’s latest mixture‑of‑experts (MoE) model—218 B parameters total, 25 B active—with a fully managed serving stack on Azure hardware. Enterprises can now spin up the model on a single NVIDIA Blackwell GPU or a pair of H100s, benefit from speculative decoding, and access a 128 K context window that supports text, images, and tool‑use inputs. Because the model is released under an Apache 2.0 license, customers retain the right to modify, redistribute, or embed it in proprietary pipelines while enjoying Azure’s governance, monitoring, and scaling tools.

Provider comparison

Feature Microsoft Foundry (Managed Compute) AWS Bedrock (custom model) Google Vertex AI (Model Garden)
Model availability Cohere Command A+ (open‑source, Apache 2.0) Upload custom Docker image or use SageMaker JumpStart Upload custom TensorFlow/PyTorch model; limited open‑source catalog
Hardware options Dedicated Blackwell or H100 GPUs, auto‑scaled clusters EC2 P4/P5 instances, Elastic Inference, custom GPU fleets A2 GPU instances (L4, H100) with auto‑scaling
Pricing model Pay‑as‑you‑go compute (per‑second VM) + Managed Compute fee; no extra model royalty Instance‑hour rates + data‑processing fees; optional model‑usage royalty for Bedrock‑hosted models VM‑hour rates + Vertex AI‑specific usage fees; no royalty for user‑uploaded models
Operational stack Integrated with Azure AI Studio, Azure Monitor, Azure Policy, and Azure OpenAI governance hooks SageMaker Pipelines, Model Monitor, CloudWatch, IAM policies Vertex AI Pipelines, Model Monitoring, Cloud Logging, IAM
Multi‑cloud portability Model can be exported (ONNX, vLLM) for on‑prem or other clouds; Azure provides secure artifact store Export via SageMaker Model Registry; requires re‑deployment on other clouds Export via Vertex AI Model Registry; compatible with GKE or Anthos for hybrid
Support for 48 languages & multimodal Native in Command A+, no extra licensing Depends on custom model; language support must be built in Same as custom model; Google provides pre‑trained multilingual models but not Cohere’s MoE
Governance & compliance Azure Policy, Azure Security Center, regional data residency options AWS Config, IAM, Macie; region‑specific compliance varies Google Cloud Security Command Center, VPC Service Controls

Pricing considerations

  • Compute cost – Azure’s per‑second pricing for a Blackwell GPU is roughly $3.20 / hour, while an H100 costs about $4.50 / hour. The Managed Compute layer adds a modest service fee (≈ 10 % of compute). By contrast, AWS’s p4d.24xlarge (8 × H100) runs at $32 / hour, and SageMaker adds a $0.10 / hour overhead for model hosting. Google’s A2‑high‑gpu (1 × H100) is priced near $4.80 / hour plus Vertex AI usage fees.
  • Licensing – Because Command A+ is Apache 2.0, there is no per‑token royalty, unlike proprietary models on Bedrock or Vertex AI that charge $0.0004‑$0.001 per 1 K tokens. Enterprises can therefore calculate total cost of ownership (TCO) primarily from compute and storage.
  • Scaling discounts – Azure offers reserved instance discounts up to 40 % for 1‑year commitments; similar programs exist on AWS and Google. For bursty workloads, the on‑demand rates described above dominate.

Migration considerations

  1. Artifact portability – Cohere publishes the model in both vLLM and Transformers formats. Teams can pull the Docker image from the Cohere container registry, test locally, then push the same image to Azure Container Registry for Managed Compute. The same image can be re‑tagged for Amazon Elastic Container Registry (ECR) or Google Artifact Registry, preserving a single build pipeline.
  2. Data pipelines – Existing RAG pipelines that rely on Azure Cognitive Search can be kept intact; the Managed Compute service integrates with Azure Search via built‑in connectors. Migrating to AWS would require rebuilding the retrieval layer with OpenSearch or Kendra, adding latency and operational overhead.
  3. Observability – Azure Monitor provides out‑of‑the‑box metrics for request latency, token throughput, and GPU utilization. When moving to another cloud, teams must recreate dashboards in CloudWatch or Cloud Operations Suite, and re‑implement alerting rules.
  4. Security posture – Azure’s Confidential Computing options (e.g., SGX‑enabled VMs) can host the model in an encrypted enclave, a capability not yet matched by AWS or Google for GPU workloads. Enterprises with strict data‑in‑use requirements may favor Azure for this reason.
  5. Team skill set – Organizations already using Azure AI Studio will face a lower learning curve. Switching to SageMaker or Vertex AI would require new CI/CD pipelines, IAM policies, and possibly retraining staff on different SDKs.

Business impact

  • Cost efficiency – The MoE architecture of Command A+ means that only a fraction of the 218 B parameters are active per token. Benchmarks from Cohere show up to 63 % more output tokens per second compared with a dense 70 B model on the same hardware. For a typical enterprise workload that generates 10 M tokens per day, this translates to roughly 2 M fewer GPU‑seconds, saving several thousand dollars per month.
  • Speed to market – Managed Compute eliminates the need for in‑house GPU orchestration. Teams can provision a production‑grade endpoint in under 15 minutes, run automated compliance scans, and start A/B testing agentic workflows without a dedicated DevOps sprint.
  • Vendor lock‑in mitigation – Because the model is open source and the container image is portable, enterprises retain the option to relocate workloads to on‑prem hardware or another cloud if pricing or compliance requirements change. This flexibility reduces strategic risk compared with proprietary models that tie usage to a single provider’s pricing.
  • Competitive positioning – Companies that adopt Command A+ via Azure can advertise a unified AI stack that spans data ingestion, retrieval, reasoning, and multimodal processing. This can be a differentiator in sectors such as finance, legal, and healthcare, where end‑to‑end traceability and multilingual support are critical.
  • Future roadmap – Cohere has indicated that upcoming releases will extend the MoE architecture to 500 B total parameters while keeping active parameters under 30 B. Azure’s commitment to offering newer GPU generations (e.g., Blackwell successors) suggests that performance gaps will continue to shrink, making the Managed Compute model a long‑term strategic asset.

Getting started – To trial Command A+, visit the Microsoft Foundry Managed Compute catalog and follow the quick‑start guide. For deeper technical details, see Cohere’s model card on GitHub and the Azure AI documentation on vLLM integration.

Comments

Loading comments...