AWS Brings Custom Nova Models to SageMaker Inference
#AI

Cloud Reporter
3 min read

AWS has launched general availability of custom Amazon Nova model support in Amazon SageMaker Inference, enabling production-grade deployment of fine-tuned Nova models with enhanced control over instance types, auto-scaling, and inference parameters.

The release marks a significant expansion of AWS's managed inference capabilities for AI workloads and responds to customer demand for running customized Nova models in production with tighter control over infrastructure and inference parameters.

Enhanced Control for Production Workloads

The new feature gives developers granular control over instance types, auto-scaling policies, context length, and concurrency settings, the kind of control production workloads demand. Customers can now deploy Nova models customized through continued pre-training, supervised fine-tuning, or reinforcement fine-tuning for their specific use cases.

Key infrastructure options include EC2 G5 and G6 instances, which offer more cost-efficient GPU options than P5 instances, potentially reducing inference costs. The service supports auto-scaling based on usage patterns measured over five-minute intervals, and inference parameters can be tuned to balance latency, cost, and accuracy for a specific workload.
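
For teams automating that scaling behavior, the sketch below shows one way to attach a target-tracking policy to an endpoint variant using Application Auto Scaling. The endpoint and variant names, capacity limits, and target value are illustrative assumptions, not values from the announcement.

    import boto3

    autoscaling = boto3.client("application-autoscaling")

    # Hypothetical endpoint and variant names for illustration.
    resource_id = "endpoint/my-nova-endpoint/variant/AllTraffic"

    # Register the variant's instance count as a scalable target.
    autoscaling.register_scalable_target(
        ServiceNamespace="sagemaker",
        ResourceId=resource_id,
        ScalableDimension="sagemaker:variant:DesiredInstanceCount",
        MinCapacity=1,
        MaxCapacity=4,
    )

    # Track invocations per instance, a common target-tracking metric
    # for SageMaker real-time endpoints; the target value is illustrative.
    autoscaling.put_scaling_policy(
        PolicyName="nova-invocations-tracking",
        ServiceNamespace="sagemaker",
        ResourceId=resource_id,
        ScalableDimension="sagemaker:variant:DesiredInstanceCount",
        PolicyType="TargetTrackingScaling",
        TargetTrackingScalingPolicyConfiguration={
            "TargetValue": 50.0,
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
            },
            "ScaleInCooldown": 300,
            "ScaleOutCooldown": 300,
        },
    )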

Supported Model Variants and Regions

At launch, the service supports Nova Micro, Nova Lite, and Nova 2 Lite models with reasoning capabilities. Regional availability includes US East (N. Virginia) and US West (Oregon), with additional regions planned according to the AWS Capabilities by Region documentation.

Instance type support varies by model:

  • Nova Micro: g5.12xlarge, g5.24xlarge, g5.48xlarge, g6.12xlarge, g6.24xlarge, g6.48xlarge, and p5.48xlarge
  • Nova Lite: g5.24xlarge, g5.48xlarge, g6.24xlarge, g6.48xlarge, and p5.48xlarge
  • Nova 2 Lite: p5.48xlarge only

Deployment Options and Configuration

Customers can deploy custom Nova models through multiple pathways. The SageMaker Studio interface provides a visual deployment experience where users can select trained Nova models from the Models menu and deploy them with configurable options including instance count, permissions, and networking settings.

For programmatic deployment, the SageMaker AI SDK enables creation of model objects that reference Nova model artifacts stored in S3. The SDK supports advanced configuration parameters including context length, concurrency settings, temperature, and top_p values. A typical deployment workflow involves creating a SageMaker model object, defining an endpoint configuration with production variants, and creating a real-time endpoint.
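
As a rough illustration of that programmatic flow, the sketch below uses boto3 to register a model from S3 artifacts and stand up a real-time endpoint. The container image URI, role ARN, bucket path, and resource names are placeholders; consult the Nova customization documentation for the actual values.

    import boto3

    sm = boto3.client("sagemaker")
    model_name = "my-custom-nova-lite"  # hypothetical name

    # Register a model object pointing at Nova artifacts in S3. The image
    # URI below is a placeholder, not a documented value.
    sm.create_model(
        ModelName=model_name,
        ExecutionRoleArn="arn:aws:iam::123456789012:role/SageMakerExecutionRole",
        PrimaryContainer={
            "Image": "<nova-inference-container-image-uri>",
            "ModelDataUrl": "s3://my-bucket/nova-artifacts/model.tar.gz",
        },
    )

    # Define the production variant: one of the supported instance types
    # and an initial instance count.
    sm.create_endpoint_config(
        EndpointConfigName=f"{model_name}-config",
        ProductionVariants=[
            {
                "VariantName": "AllTraffic",
                "ModelName": model_name,
                "InstanceType": "ml.g5.24xlarge",
                "InitialInstanceCount": 1,
            }
        ],
    )

    # Create the real-time endpoint; provisioning runs asynchronously.
    sm.create_endpoint(
        EndpointName=f"{model_name}-endpoint",
        EndpointConfigName=f"{model_name}-config",
    )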

Inference Capabilities and Testing

Once deployed, endpoints support both synchronous and asynchronous inference patterns. Synchronous endpoints handle real-time requests with streaming and non-streaming modes, while asynchronous endpoints process batch workloads. The service includes a Playground tab in SageMaker Studio for interactive testing with chat interfaces.
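
A minimal synchronous call might look like the following; the endpoint name and the chat-style request schema are assumptions for illustration, so check the model's documented input format before relying on them.

    import json

    import boto3

    runtime = boto3.client("sagemaker-runtime")

    # Send a single non-streaming request and read the full JSON response.
    response = runtime.invoke_endpoint(
        EndpointName="my-custom-nova-lite-endpoint",  # hypothetical name
        ContentType="application/json",
        Body=json.dumps(
            {
                "messages": [{"role": "user", "content": "Summarize this release."}],
                "max_tokens": 256,
                "temperature": 0.7,
            }
        ),
    )
    print(json.loads(response["Body"].read()))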

Streaming requests support a comprehensive set of inference parameters, including max_tokens, temperature, top_p, top_k, logprobs, and reasoning_effort. The streaming implementation returns chunked responses for real-time applications and automatically detects whether a request is streaming or non-streaming.
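
For streaming, a sketch along these lines consumes chunked output via the SageMaker runtime's response-stream API; the payload mirrors the parameters named above, but the exact field names are assumptions.

    import json

    import boto3

    runtime = boto3.client("sagemaker-runtime")

    # Request a streamed completion and print chunks as they arrive.
    response = runtime.invoke_endpoint_with_response_stream(
        EndpointName="my-custom-nova-lite-endpoint",  # hypothetical name
        ContentType="application/json",
        Body=json.dumps(
            {
                "messages": [{"role": "user", "content": "Draft a release note."}],
                "max_tokens": 512,
                "temperature": 0.7,
                "top_p": 0.9,
                "top_k": 50,
            }
        ),
    )

    # Each event carries a PayloadPart whose Bytes field is a chunk of output.
    for event in response["Body"]:
        part = event.get("PayloadPart")
        if part:
            print(part["Bytes"].decode("utf-8"), end="", flush=True)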

Cost Structure and Billing

The service follows a pay-per-use model with per-hour billing and no minimum commitments. Customers pay only for the compute instances they use, making it suitable for both development and production workloads. The pricing structure is detailed on the Amazon SageMaker AI Pricing page.

Integration with Existing Workflows

This release complements AWS's existing serverless customization capabilities introduced at re:Invent 2025, which enable model selection and customization in a few clicks. For organizations with existing custom Nova model artifacts, the new inference support provides a seamless path from training to production deployment.

Getting Started

Developers can begin using the service immediately through the Amazon SageMaker AI console. Comprehensive documentation is available through the Best Practices for SageMaker AI guide, and full code examples are provided in the Customizing Amazon Nova models on Amazon SageMaker AI documentation.

Feedback channels include AWS re:Post for SageMaker and traditional AWS Support contacts, allowing customers to influence future development of the service.

The general availability of custom Nova model support in SageMaker Inference represents AWS's continued investment in providing comprehensive AI model lifecycle management, from training and customization through to scalable production deployment.
