Six chips in one rack: NVIDIA's blueprint for 2026's AI supercomputers. The name? Rubin, honoring astronomer Vera Florence Cooper Rubin, who changed how we understand the universe. While the name sounds romantic, the numbers are brutally business-focused: 10x lower cost per token for inference and 4x fewer GPUs needed to train mixture-of-experts (MoE) models. Anthropic, OpenAI and xAI have already committed to adopting the platform.
The Architecture of Tomorrow
NVIDIA Rubin isn't just a Blackwell upgrade. It's a different design philosophy, which NVIDIA calls "extreme codesign": all six chips are designed together to function as one system. At the system's heart sits the NVIDIA Vera Rubin NVL72, a rack-scale system combining 72 Rubin GPUs with 36 Vera CPUs. What do these numbers mean in practice? Each GPU delivers 3.6 TB/s of bandwidth, while the entire rack hits 260 TB/s, which NVIDIA, somewhat hyperbolically, claims is more than the entire internet combined. NVLink 6 handles GPU-to-GPU communication, critical for the massive MoE models dominating today's landscape.

The Vera CPU: Built for Agentic AI
The NVIDIA Vera CPU represents something entirely new in the ecosystem. With 88 custom Olympus cores and full Armv9.2 compatibility, it's designed specifically for agentic reasoning: the new generation of AI that thinks in steps and makes decisions. The difference from traditional CPUs: ultrafast NVLink-C2C connectivity lets it communicate directly with GPUs, eliminating bottlenecks. This becomes critical when AI models need to "think" through long token sequences, something that happens constantly in reasoning models.

Why OpenAI and Anthropic Are Betting on Rubin
The list of companies adopting Rubin reads like a who's who of the AI industry. Sam Altman of OpenAI put it simply: "Intelligence scales with compute." If you want to scale frontier models, you need the best infrastructure.

What the CEOs are saying:
• Dario Amodei (Anthropic): "The efficiency gains enable longer memory, better reasoning and more reliable outputs"
• Elon Musk (xAI): "Rubin will be the rocket engine for AI: if you want frontier models at scale, this is what you use"
• Mark Zuckerberg (Meta): "It promises the step-change in performance we need for deployment to billions of people"
Microsoft announced "Fairwater AI superfactories" with Vera Rubin NVL72 systems that will scale to hundreds of thousands of Vera Rubin Superchips. CoreWeave will be among the first to offer Rubin through its Mission Control platform.
Why Now?
The timing isn't coincidental. Agentic AI and reasoning models expose new computational constraints. When a model must "think" through multiple steps, process long token sequences and maintain context across thousands of interactions, traditional systems show their limits.
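To make that constraint concrete, here is a rough, illustrative token-count model of an agentic run. All numbers are assumptions for illustration, not NVIDIA benchmarks; the point is only that each step re-reads the accumulated context, so total work grows far faster than the step count.

```python
# Illustrative only: rough token accounting for a multi-step agent.
# All numbers are assumptions, not NVIDIA figures.

def tokens_processed(context_tokens: int, thinking_tokens: int, steps: int) -> int:
    """Total tokens the model attends over across an agentic run.

    Each step re-reads the accumulated context (earlier steps' output
    stays in context), so cost grows roughly quadratically with steps.
    """
    total = 0
    ctx = context_tokens
    for _ in range(steps):
        total += ctx + thinking_tokens   # prefill + generation for this step
        ctx += thinking_tokens           # generated tokens join the context
    return total

single = tokens_processed(2_000, 1_000, steps=1)    # 3,000 tokens
agentic = tokens_processed(2_000, 1_000, steps=20)  # 250,000 tokens
print(agentic / single)
```

With these placeholder numbers, 20x more steps means roughly 83x more tokens processed, which is exactly the kind of growth that exposes the limits of traditional systems.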
Five Innovations That Change Everything
NVLink 6: The Nervous System
The sixth generation of NVLink brings built-in, in-network compute to accelerate collective operations. What does this mean? Instead of GPUs waiting to exchange data and then process it, processing happens during the transfer, like having computers inside the cables.

Third-Generation Transformer Engine
Hardware-accelerated adaptive compression: the system automatically compresses data where possible without losing accuracy. The result? 50 petaflops of NVFP4 compute for AI inference, a figure that is hard to grasp even for experts.

NVIDIA Confidential Computing
For the first time in a rack-scale platform, NVIDIA brings Confidential Computing that protects data across CPU, GPU and NVLink domains simultaneously. Why does this matter? Because proprietary models are worth billions, and security can't be an afterthought.
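The adaptive-compression idea behind the Transformer Engine can be illustrated with a generic 4-bit block-quantization sketch: store one shared scale per block plus tiny signed integers. The actual NVFP4 format and the Transformer Engine's heuristics are not public; everything below is a simplified illustration, not the real format.

```python
# Generic 4-bit block quantization (illustrative only; not NVFP4).

def quantize_block(values, levels=7):
    """Map floats to signed 4-bit ints (-7..7) with one shared scale."""
    scale = max(abs(v) for v in values) / levels or 1.0
    q = [round(v / scale) for v in values]
    return scale, q

def dequantize_block(scale, q):
    return [scale * v for v in q]

scale, q = quantize_block([0.12, -0.5, 0.33, 0.9])
restored = dequantize_block(scale, q)
# Each value is recovered approximately; the error is bounded by scale / 2.
```

The trade-off is the usual one: a block of four floats shrinks to four 4-bit codes plus one scale, at the cost of bounded rounding error, which is why such compression is applied adaptively, only where accuracy survives it.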
AI-Native Storage: The Silent Revolution
BlueField-4 Storage Processor
ASTRA Advanced Secure Architecture
The Context Problem
Why is context management so crucial? Because agentic AI systems aren't chatbots. They're systems that must remember previous interactions, connect information from different sessions and "learn" from each interaction. This creates exponentially growing demands on memory and storage.
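A minimal sketch shows why this becomes a storage problem. Below is a toy session store with a fixed token budget (all names and numbers are hypothetical, not an NVIDIA API): once the budget is exceeded, something has to give, and real systems summarize or spill old context to fast storage rather than simply dropping it.

```python
# Toy session context store with a token budget (illustrative only).
from collections import deque

class SessionContext:
    def __init__(self, max_tokens: int):
        self.max_tokens = max_tokens
        self.turns = deque()          # (text, token_count) pairs
        self.total = 0

    def add(self, text: str, tokens: int):
        self.turns.append((text, tokens))
        self.total += tokens
        # Evict the oldest turns once over budget; production systems
        # would summarize or spill to fast storage instead of dropping.
        while self.total > self.max_tokens:
            _, t = self.turns.popleft()
            self.total -= t

ctx = SessionContext(max_tokens=100)
for i in range(10):
    ctx.add(f"turn {i}", tokens=30)
print(len(ctx.turns), ctx.total)   # only the most recent turns survive
```

Ten turns of 30 tokens each leave only the last three in a 100-token budget; everything else must live somewhere, which is precisely the niche AI-native storage targets.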
From Lab to Production: The Scale-Up
Two main versions will be available. The NVIDIA Vera Rubin NVL72 is the flagship: a unified system combining 72 Rubin GPUs, 36 Vera CPUs and all supporting chips in one rack. For smaller deployments, the NVIDIA HGX Rubin NVL8 offers eight Rubin GPUs in a server-board format that supports x86 platforms. The NVIDIA DGX SuperPOD serves as the reference architecture for large-scale deployments. It combines either DGX Vera Rubin NVL72 or DGX Rubin NVL8 systems with BlueField-4 DPUs, ConnectX-9 SuperNICs and InfiniBand networking.

"AI factories are shifting toward bare-metal and multi-tenant deployment models. You need strong infrastructure control and isolation without sacrificing performance."
NVIDIA Engineering Team
Spectrum-X Ethernet: The Quiet Upgrade
One element that passed almost unnoticed is NVIDIA Spectrum-X Ethernet Photonics. The new switch systems promise 5x better power efficiency and uptime. Why does this matter? Because data centers consume enormous amounts of energy, and every efficiency improvement translates into millions in savings.

The Practical Dimension: What Actually Changes
What does all this mean for the average developer or enterprise wanting to integrate AI? First, lower costs: the 10x reduction in inference cost per token isn't just marketing, it's the difference between a profitable AI service and one that burns money. Second, better performance on complex reasoning tasks: multi-step problems that take seconds today will run in milliseconds. Third, greater reliability: the second-generation RAS Engine offers real-time health checks and proactive maintenance.

18x Faster Maintenance
The modular, cable-free tray design enables up to 18x faster assembly and service compared to Blackwell.
Rack-Scale Security
First rack-scale platform with NVIDIA Confidential Computing across CPU, GPU and NVLink domains.
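The cost claim above can be framed with simple unit economics. The dollar and throughput figures below are placeholders chosen for illustration, not NVIDIA numbers; the sketch only shows how a 10x throughput gain at the same hourly rack cost translates directly into a 10x lower cost per token.

```python
# Back-of-envelope cost per token (all inputs are placeholder assumptions).

def cost_per_million_tokens(rack_cost_per_hour: float, tokens_per_second: float) -> float:
    tokens_per_hour = tokens_per_second * 3600
    return rack_cost_per_hour / tokens_per_hour * 1_000_000

# Hypothetical: same hourly rack cost, 10x the inference throughput.
baseline = cost_per_million_tokens(rack_cost_per_hour=300.0, tokens_per_second=50_000)
rubin    = cost_per_million_tokens(rack_cost_per_hour=300.0, tokens_per_second=500_000)
print(f"${baseline:.2f} vs ${rubin:.2f} per million tokens")
```

At these placeholder rates the price drops from about $1.67 to about $0.17 per million tokens, the kind of margin shift that separates a profitable AI service from one that burns money.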
