Supermicro's comprehensive tour reveals how the NVIDIA B300 generation transforms AI infrastructure with integrated networking, advanced cooling, and scalable designs for modern AI factories.
Modern AI servers are remarkable feats of engineering. Each successive generation raises the bar not just on raw GPU performance, but on the complexity of the systems required to support those GPUs. Networking, power delivery, liquid cooling, and software management have all evolved dramatically, making it genuinely challenging to keep track of all the changes.
To address that, we visited Supermicro at its headquarters in San Jose, California, for a comprehensive tour of the major systems and infrastructure components that make today's AI factories possible.
The scope of Supermicro's in-house design and manufacturing is striking. The company is not simply integrating third-party components into a chassis. From the server nodes themselves to the cold plates, cooling manifolds, in-row CDUs, rear door heat exchangers, power shelves, and even large outdoor cooling towers, Supermicro is engineering and producing the full stack.
Supermicro is doing this to deliver NVIDIA B300-generation solutions at scale. I asked Supermicro and NVIDIA if we could look at the differences between the NVIDIA B200 and B300 generations alongside the other Data Center Building Block Solutions that Supermicro makes to support its AI Factory efforts. We do not have all of this hardware sitting around ourselves, so we filmed inside Supermicro's factory, and we have to say this is sponsored. My original thought was simply to show some of the key changes side-by-side, which meant we needed help getting all of the components together. We started with more modest goals, but because we were on site and Supermicro gave us access to the facility, we ended up looking at almost everything in the video.
The Two Generations: NVIDIA HGX B300 vs. HGX B200 Air-Cooled Servers
Our first stop covered Supermicro's air-cooled HGX server line, where the generational comparison between NVIDIA B200 and B300 becomes immediately tangible. Supermicro has offered air-cooled HGX 8-GPU servers for multiple generations, and placing a B200-generation system next to a B300-generation system makes the key changes obvious.
On the B200-generation system, a row of eight discrete NVIDIA ConnectX-7 network interface cards is visible along the bottom of the front panel. Each of those cards operates at 400 gigabits per second and is paired with a single GPU, for a total of eight NICs across eight GPUs. The space above those NICs is dedicated solely to air-cooling hardware.
The B300-generation server looks different immediately. Those eight discrete NICs are gone from the front. That is because the NVIDIA HGX B300 8-GPU baseboard integrates NVIDIA ConnectX-8 networking directly onto the board itself. Each ConnectX-8 interface provides 800 gigabits per second, and because the NICs are no longer occupying front-panel slots, those 800Gbps ports are instead exposed as direct external ports across the top of the system.
The result is a cleaner front-panel layout and, more importantly, double the network throughput per GPU compared to the B200 generation.
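To put the generational jump in concrete terms, here is a minimal back-of-the-envelope sketch, in Python, using the per-NIC line rates quoted above for each 8-GPU air-cooled system; deliverable throughput in practice will depend on fabric topology and configuration.

```python
# Back-of-the-envelope aggregate NIC bandwidth per 8-GPU HGX system,
# using the per-NIC line rates discussed above.
GPUS_PER_SYSTEM = 8

def aggregate_tbps(nic_count: int, gbps_per_nic: int) -> float:
    """Total network bandwidth in Tbps for a given NIC layout."""
    return nic_count * gbps_per_nic / 1000

b200 = aggregate_tbps(nic_count=8, gbps_per_nic=400)  # ConnectX-7, one discrete NIC per GPU
b300 = aggregate_tbps(nic_count=8, gbps_per_nic=800)  # ConnectX-8, integrated on the baseboard

print(f"HGX B200: {b200:.1f} Tbps total, {b200 * 1000 / GPUS_PER_SYSTEM:.0f} Gbps per GPU")
print(f"HGX B300: {b300:.1f} Tbps total, {b300 * 1000 / GPUS_PER_SYSTEM:.0f} Gbps per GPU")
# HGX B200: 3.2 Tbps total, 400 Gbps per GPU
# HGX B300: 6.4 Tbps total, 800 Gbps per GPU
```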
Liquid-Cooled HGX B300 and B200: What Changes and What Stays the Same
Moving to the liquid-cooled variants of these servers, the generational story is similar but told differently. At first glance, a liquid-cooled NVIDIA HGX B200 server and a liquid-cooled NVIDIA HGX B300 server appear nearly identical.
Beyond the GPUs themselves, the distinction that matters again is the networking. On the liquid-cooled Supermicro NVIDIA HGX B200 system, slots for discrete NVIDIA ConnectX-7 NICs are visible at the front of the chassis. Eight GPUs, eight NICs, 400Gbps per NIC. That is a proven, well-understood configuration that has served many production deployments well.
On the liquid-cooled Supermicro NVIDIA HGX B300 system, those front-panel NICs are gone, replaced by the integrated NVIDIA ConnectX-8 architecture. One ConnectX-8 per GPU, 800Gbps per GPU, for a total of 6.4Tbps of network throughput across the full 8-GPU system.
This is worth emphasizing because most coverage of the B300 generation focuses on the memory capacity and compute improvements (e.g., from 192GB to 288GB of HBM3E per GPU). The doubling of network bandwidth is at least as significant for distributed training and large-scale inference deployments.
Supermicro designs and manufactures the liquid cooling components for these systems in-house. The warm and cool coolant hoses exiting the rear of each server connect to rack manifolds that Supermicro also produces. The cold plates that make direct contact with GPUs, CPUs, and other thermal loads are also Supermicro designs.
This level of vertical integration in the cooling stack is one of the key factors that enable faster delivery of complete, validated systems.
The ORV3 B300 NVL8 System: Up to 144 GPUs in a 48U Rack
One of the more distinctive systems on the tour was an Open Rack V3 (ORV3) chassis hosting an NVIDIA B300 NVL8 baseboard. This is a two-unit system with a clear separation of function: the lower unit contains the host node with standard CPUs and memory, while the upper unit houses the NVL8 GPU baseboard itself.
The NVL8 baseboard connects eight B300 GPUs via NVLink, giving those GPUs a shared high-bandwidth interconnect within the node. Each GPU carries 288GB of HBM3E memory, and the integrated ConnectX-8 networking delivers 800Gbps of east-west bandwidth per GPU out of the node. All of this runs in a liquid-cooled ORV3 enclosure with blind-mate connectors at the rear for both the cooling loop and power delivery.
Because the 2U ORV3 NVL8 chassis is considerably denser than a standard rackmount form factor, a 48U ORV3 rack can accommodate up to 144 B300 GPUs. Sharing ORV3 infrastructure with the NVL72 also means that investments in power and cooling are compatible across both product lines, which simplifies planning for operators who need to deploy both form factors.
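As a quick sanity check on those rack-level numbers, the sketch below works out the GPU count and aggregate HBM3E capacity. The 18-system layout is an assumption for illustration (18 of the 2U chassis plus space for power shelves, manifolds, and switching), not a published rack configuration.

```python
# Rough rack-level totals for the ORV3 B300 NVL8 configuration described above.
# Assumption (not a published layout): 18 of the 2U NVL8 systems per 48U rack,
# with the remaining space used for power shelves, manifolds, and switching.
SYSTEMS_PER_RACK = 18          # assumed
GPUS_PER_SYSTEM = 8            # NVL8 baseboard
HBM_PER_GPU_GB = 288           # B300 HBM3E capacity per GPU
NIC_GBPS_PER_GPU = 800         # integrated ConnectX-8

gpus = SYSTEMS_PER_RACK * GPUS_PER_SYSTEM
print(f"GPUs per rack:       {gpus}")                                     # 144
print(f"HBM3E per rack:      {gpus * HBM_PER_GPU_GB / 1000:.1f} TB")      # ~41.5 TB
print(f"East-west bandwidth: {gpus * NIC_GBPS_PER_GPU / 1000:.1f} Tbps")  # 115.2 Tbps
```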
PCIe GPU Servers and the Move to ConnectX-8 Switch Architecture
Not every AI workload calls for an HGX system. For mixed workloads that combine inference with tasks such as graphics rendering, VDI, engineering simulation, or other GPU-accelerated applications, PCIe GPU servers remain highly relevant. Supermicro is bringing the same ConnectX-8 networking improvements to this server class.
The new-generation PCIe GPU server uses a ConnectX-8 PCIe switchboard design. Four ConnectX-8 NICs are installed, with each NIC serving two GPUs. Each NIC provides 800Gbps, giving a total of 3.2Tbps across all four NICs for the full 8-GPU system. That is double the throughput of the older design, which used four ConnectX-7 NICs at 400Gbps each for 1.6Tbps total.
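Since two GPUs share each NIC through the PCIe switch, the per-GPU share of network bandwidth is a useful way to compare the two designs; a minimal sketch of that arithmetic follows, assuming an even split (actual allocation through the switch is dynamic).

```python
# Per-GPU network bandwidth when two GPUs share a NIC behind a PCIe switch.
# Assumes an even split; actual traffic allocation through the switch is dynamic.
def per_gpu_gbps(nic_gbps: int, gpus_per_nic: int) -> float:
    return nic_gbps / gpus_per_nic

old = per_gpu_gbps(nic_gbps=400, gpus_per_nic=2)  # ConnectX-7 switchboard
new = per_gpu_gbps(nic_gbps=800, gpus_per_nic=2)  # ConnectX-8 switchboard

print(f"ConnectX-7 design: {old:.0f} Gbps per GPU")  # 200 Gbps
print(f"ConnectX-8 design: {new:.0f} Gbps per GPU")  # 400 Gbps
```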
The ConnectX-8 PCIe switch architecture also introduces PCIe Gen 6 support, which matters for high-throughput GPUs like the NVIDIA RTX Pro 6000 Blackwell Server Edition. Those GPUs can already saturate a PCIe Gen 5 x16 link, and the ConnectX-8 switch gives them a network path commensurate with that slot bandwidth.
Supermicro also continues to offer standard 2U rackmount servers with its own riser and motherboard designs engineered to properly power, cool, and connect multiple GPUs in conventional rack infrastructure.
The Cooling Tower: Scaling Liquid Cooling for AI Factories
One of the most impressive sights was Supermicro's cooling tower infrastructure. These outdoor units can reject up to 2.5 megawatts of heat per tower, making them suitable for large-scale AI deployments. The towers use evaporative cooling technology, which provides excellent power usage effectiveness (PUE) for data centers in appropriate climates.
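For a sense of scale, here is a rough sketch of the facility-water flow needed to carry 2.5MW of heat out to a tower. The 10°C loop temperature rise is an assumed figure for illustration, not a Supermicro specification.

```python
# Approximate facility-water flow required to move 2.5 MW of heat to a cooling tower.
# The loop delta-T is an assumption for illustration, not a vendor specification.
HEAT_LOAD_W = 2.5e6        # per tower, as stated above
DELTA_T_K = 10.0           # assumed supply/return temperature rise
CP_WATER = 4186.0          # specific heat of water, J/(kg*K)
RHO_WATER = 1000.0         # density of water, kg/m^3

mass_flow = HEAT_LOAD_W / (CP_WATER * DELTA_T_K)   # kg/s
vol_flow_lps = mass_flow / RHO_WATER * 1000        # liters per second

print(f"Mass flow:   {mass_flow:.0f} kg/s")                                 # ~60 kg/s
print(f"Volume flow: {vol_flow_lps:.0f} L/s (~{vol_flow_lps * 3.6:.0f} m^3/h)")
```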
Each tower connects to Supermicro's in-row CDUs and cooling manifolds, creating a complete liquid cooling ecosystem. The company manufactures these towers in-house, giving customers a single-vendor solution for their entire cooling infrastructure.
Power Delivery and Infrastructure
The power delivery systems have also evolved significantly. Supermicro's ORV3 racks include high-density power shelves that can deliver up to 120kW per rack. These shelves integrate with the company's monitoring and management systems, providing real-time visibility into power consumption, temperature, and system health.
The power distribution units (PDUs) have been redesigned to handle the higher currents required by modern AI systems. This includes improved cable management, redundant power feeds, and hot-swappable components to minimize downtime during maintenance.
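To illustrate why those higher currents matter, ORV3 distributes rack power over a nominally 48V DC busbar (the 48V figure comes from the Open Rack V3 specification, not from Supermicro's numbers above); a quick sketch of the current implied by a fully loaded 120kW rack follows.

```python
# Current implied by a 120 kW rack on an ORV3-style 48 V DC busbar.
# 48 V is the nominal ORV3 busbar voltage; the actual value varies slightly under load.
RACK_POWER_W = 120_000
BUSBAR_V = 48.0

current_a = RACK_POWER_W / BUSBAR_V
print(f"Busbar current at full load: {current_a:.0f} A")  # 2500 A
```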
Software and Management
Supermicro's management software has been updated to handle the complexity of modern AI infrastructure. The SuperCloud Composer provides a unified interface for monitoring and managing thousands of servers across multiple data centers. This includes GPU health monitoring, power optimization, and automated firmware updates.
The software also integrates with NVIDIA's management tools, providing a complete solution for deploying and maintaining AI workloads. This integration extends to the cooling systems, where the software can adjust cooling parameters based on workload demands to optimize efficiency.
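SuperCloud Composer's own APIs are beyond the scope of this tour, but the kind of per-GPU telemetry such a management plane aggregates can be sampled on any node with standard NVIDIA tooling. The sketch below polls temperature, power draw, and utilization via nvidia-smi as an illustrative stand-in; it is not the SuperCloud Composer interface.

```python
# Illustrative GPU telemetry polling with standard NVIDIA tooling (nvidia-smi).
# This is a stand-in for the kind of data a management plane like SuperCloud
# Composer aggregates; it is not the SuperCloud Composer API.
import csv
import io
import subprocess

def sample_gpu_telemetry():
    out = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=index,temperature.gpu,power.draw,utilization.gpu",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    for idx, temp_c, power_w, util_pct in csv.reader(io.StringIO(out)):
        yield {
            "gpu": int(idx),
            "temp_c": float(temp_c),
            "power_w": float(power_w),
            "util_pct": float(util_pct),
        }

if __name__ == "__main__":
    for reading in sample_gpu_telemetry():
        print(reading)
```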
The Complete AI Factory Solution
What becomes clear from the tour is that Supermicro is not just building servers; it is creating complete AI factory solutions. This includes:
- Server nodes with integrated networking and cooling
- Power distribution and management systems
- Liquid cooling infrastructure from cold plates to cooling towers
- Software for deployment and management
- Manufacturing capabilities to deliver at scale
The company's vertical integration allows it to optimize these components as a complete system rather than as separate pieces. This integration reduces complexity for customers and can improve reliability and performance.
Looking Ahead
The NVIDIA B300 generation represents more than just incremental improvements. The integrated networking, increased memory capacity, and improved power efficiency create new possibilities for AI model sizes and training approaches. Supermicro's comprehensive solution approach means that customers can deploy these systems at scale without having to integrate components from multiple vendors.
As AI models continue to grow in size and complexity, the infrastructure to support them becomes increasingly critical. Companies like Supermicro that can provide complete, integrated solutions will play a crucial role in enabling the next generation of AI applications.
The tour demonstrated that building an AI factory requires expertise across multiple domains: electrical engineering, thermal management, networking, software development, and manufacturing. Supermicro's ability to handle all of these aspects in-house positions it well to support the continued growth of AI infrastructure.
