GPU Virtualization: Choosing Between Pass-Through, vGPU, and MIG

Every GPU virtualization conversation eventually hits the same fork in the road: do you give a virtual machine the entire GPU, share it across multiple VMs, or carve it into hardware-isolated slices? The answer depends on your workload, your budget, and how much isolation you actually need. Getting it wrong costs you either performance or money, and often both.

Key Takeaways
  • GPU virtualization takes three distinct forms — pass-through, vGPU, and MIG — each optimized for a different workload profile and cost model
  • Pass-through delivers maximum raw performance but eliminates sharing, wastes utilization between jobs, and blocks live VM migration
  • NVIDIA vGPU enables multi-VM GPU sharing with full driver compatibility, but requires per-instance software licensing on top of hardware cost
  • MIG enforces hardware-level isolation between GPU slices, delivering guaranteed performance predictability without additional licensing fees
  • Choosing the wrong GPU virtualization approach costs you performance, budget, or operational flexibility — often all three
  • VergeOS supports all three GPU virtualization methods natively from a single unified platform with no separate hypervisors or management consoles

GPU Pass-Through: When You Need Everything the Card Has

GPU pass-through dedicates a physical GPU entirely to a single virtual machine. The hypervisor steps aside, and the VM communicates directly with the hardware. There is no sharing, no scheduler overhead, and no other workload competing for memory bandwidth or compute throughput. The VM sees the GPU exactly as a bare-metal server would.

Key Terms
GPU Pass-Through
A virtualization method that assigns an entire physical GPU exclusively to a single VM, bypassing the hypervisor for direct hardware access and maximum throughput.
NVIDIA vGPU (Virtual GPU)
A software stack that partitions a physical GPU into multiple virtual GPU instances, each with a dedicated VRAM allocation and a full NVIDIA driver running inside the guest OS.
MIG (Multi-Instance GPU)
A hardware-level partitioning feature introduced with NVIDIA Ampere that carves a GPU into isolated slices — each with its own compute engines, memory, and memory bandwidth — enforced in silicon, not software.
VRAM (Video RAM)
High-bandwidth on-card memory used to store model weights, textures, and intermediate compute data. VRAM capacity is typically the binding constraint in LLM inference and training workloads.
Hypervisor
The software layer that creates and manages virtual machines by abstracting physical compute, memory, storage, and network resources from the underlying hardware.
VDI (Virtual Desktop Infrastructure)
A delivery model where desktop operating systems run as VMs on centralized servers and stream to end users over a network. GPU-accelerated VDI is a primary use case for NVIDIA vGPU.
Live Migration
The ability to move a running VM between physical hosts without downtime. GPU pass-through prevents live migration in most hypervisors; vGPU and MIG both support it.

You choose pass-through when a single workload demands the full card: large language model training runs, high-fidelity simulation, or rendering pipelines that consume every gigabyte of VRAM. It also makes sense when the software stack requires bare-metal GPU behavior and cannot tolerate any abstraction layer between the application and the hardware.

The negatives are significant. One VM per GPU means utilization collapses the moment that workload stops running. Pass-through also complicates live migration, since most hypervisors cannot migrate a VM with a passed-through device without stopping it first. At scale, pass-through turns expensive GPU hardware into a single-tenant resource with no flexibility.
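
That utilization collapse is easy to put numbers on. A minimal sketch in Python (the fleet size and busy hours are hypothetical, not measurements from the article):

```python
def passthrough_utilization(gpu_count: int, busy_hours_per_gpu: float) -> float:
    """Fraction of available GPU-hours actually used when each GPU is
    dedicated to a single VM, so idle time cannot be absorbed by other VMs."""
    available = gpu_count * 24.0  # GPU-hours per day
    return (gpu_count * busy_hours_per_gpu) / available

# Hypothetical fleet: 8 passed-through GPUs, each busy 6 hours/day.
# The other 18 hours per card are stranded; no other workload can use them.
print(f"{passthrough_utilization(8, 6.0):.0%}")  # 25%
```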

vGPU: When You Need Multiple VMs to Share a Single GPU

NVIDIA vGPU is a full GPU virtualization stack. The physical GPU is partitioned into virtual GPU instances, and each VM gets a dedicated vGPU with its own memory allocation and its own NVIDIA driver running inside the guest. From the guest OS perspective, the vGPU looks and behaves like a discrete GPU. Applications that require a certified NVIDIA driver run without modification.

This is the right model for VDI environments where knowledge workers need GPU-accelerated desktops, for inference endpoints where multiple services share a model loaded into VRAM, and for developer environments where teams need GPU access but no single engineer needs an entire card. vGPU delivers density without sacrificing driver compatibility.

The negatives center on cost and scheduling. NVIDIA vGPU requires a software license on top of the hardware cost, and those licenses are recurring. Workloads competing simultaneously for GPU cycles will see throughput variation. vGPU profiles are also fixed at VM creation time, limiting reallocation without VM reconfiguration.
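
Profile sizing follows simple integer division, with the caveat that time-sliced vGPU generally requires homogeneous profiles per physical card. A sketch with illustrative card and profile sizes (not tied to any specific SKU):

```python
def max_vgpus(card_vram_gb: int, profile_vram_gb: int) -> int:
    """How many vGPU instances of one profile fit on a card.

    Assumes homogeneous profiles per physical GPU, which is how NVIDIA's
    time-sliced vGPU mode generally behaves: once the first vGPU is
    created, the remaining vGPUs on that card use the same profile.
    """
    if profile_vram_gb <= 0 or profile_vram_gb > card_vram_gb:
        raise ValueError("profile must fit on the card")
    return card_vram_gb // profile_vram_gb

# Illustrative numbers: a 24 GB card split into 4 GB profiles.
print(max_vgpus(24, 4))  # 6
```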

MIG: When You Need Hardware-Enforced Isolation

Multi-Instance GPU is a hardware-level feature introduced with NVIDIA’s Ampere architecture and extended in Hopper. Instead of virtualizing GPU access in software, MIG partitions the physical GPU in silicon. Each instance gets its own dedicated compute engines, memory, and memory bandwidth, and the isolation is enforced by the hardware itself — not the driver or the hypervisor.

You choose MIG when workloads need predictable, guaranteed performance and true fault isolation. If one MIG instance encounters an error or runs a noisy workload, it does not affect neighboring instances. This matters in multi-tenant environments and in regulated industries where workload isolation is a compliance requirement, not a preference.

The negatives are inflexibility and fixed geometry. MIG slice sizes are defined by the GPU architecture, and you cannot create arbitrary partition sizes. MIG is available on data center-class cards starting with A100 and H100, which narrows the hardware footprint where it applies.
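
The fixed geometry can be sketched as a feasibility check. The profile table below uses the commonly documented limits for an A100 40GB, but treat it as illustrative and confirm against `nvidia-smi mig -lgip` on your own hardware; this rough check also ignores the placement constraints the real GPU enforces:

```python
# Commonly documented MIG profiles for an A100 40GB (max concurrent
# instances per profile). Illustrative values; exact geometry varies
# by card generation.
A100_40GB_PROFILES = {
    "1g.5gb": 7,
    "2g.10gb": 3,
    "3g.20gb": 2,
    "4g.20gb": 1,
    "7g.40gb": 1,
}

def fits(requested: dict[str, int],
         profiles: dict[str, int] = A100_40GB_PROFILES) -> bool:
    """Rough check that a requested slice mix stays within per-profile
    limits and within the GPU's 7 compute slices."""
    slices = sum(int(name.split("g.")[0]) * n for name, n in requested.items())
    per_profile_ok = all(
        name in profiles and n <= profiles[name] for name, n in requested.items()
    )
    return per_profile_ok and slices <= 7

print(fits({"3g.20gb": 2, "1g.5gb": 1}))  # True: 3+3+1 = 7 slices
print(fits({"4g.20gb": 2}))               # False: only one 4g instance allowed
```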

GPU Virtualization: Choosing the Right Approach

Pass-Through
  • Best for: LLM training, full-GPU simulation, bare-metal-dependent software
  • GPU sharing: None — one VM owns the card
  • Isolation: Complete — dedicated hardware
  • Performance predictability: Highest
  • Live VM migration: Not supported
  • Hardware requirement: Any NVIDIA GPU
  • Additional licensing: None
  • Key limitation: Poor utilization, no sharing, no live migration

vGPU
  • Best for: VDI, multi-user inference, developer GPU access
  • GPU sharing: Yes — time-shared across VMs
  • Isolation: Software-level
  • Performance predictability: Variable under contention
  • Live VM migration: Supported
  • Hardware requirement: vGPU-supported cards
  • Additional licensing: NVIDIA vGPU license required
  • Key limitation: Recurring license cost, throughput variation under load

MIG
  • Best for: Multi-tenant isolation, regulated workloads, guaranteed inference SLAs
  • GPU sharing: Yes — hardware-partitioned slices
  • Isolation: Hardware-enforced in silicon
  • Performance predictability: Guaranteed per slice
  • Live VM migration: Supported
  • Hardware requirement: A100, H100, or newer
  • Additional licensing: None beyond hardware
  • Key limitation: Fixed slice geometry, limited GPU hardware support
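
The decision logic above reduces to a small function. This is a deliberate simplification: real sizing also weighs licensing cost and which cards your fleet actually contains:

```python
def choose_gpu_virtualization(
    needs_full_card: bool,
    needs_live_migration: bool,
    needs_hardware_isolation: bool,
) -> str:
    """Encode the comparison's decision logic as a first-pass filter."""
    if needs_hardware_isolation:
        return "MIG"           # isolation enforced in silicon
    if needs_full_card and not needs_live_migration:
        return "pass-through"  # one VM owns the entire card
    return "vGPU"              # shared access with full driver compatibility

print(choose_gpu_virtualization(True, False, False))   # pass-through
print(choose_gpu_virtualization(False, True, False))   # vGPU
print(choose_gpu_virtualization(False, False, True))   # MIG
```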

How VergeOS Handles All Three

VergeOS supports GPU pass-through, NVIDIA vGPU, and MIG as native capabilities within its unified platform. You do not need separate hypervisors, separate management consoles, or separate licensing stacks for each GPU virtualization method. The same platform that manages your compute, storage, and networking manages your GPU resources.

Administrators configure GPU resources through a single interface and assign them to VMs using the appropriate model for each workload. The operational advantage is consistency. Your team learns one workflow instead of three. GPU utilization appears alongside CPU and memory in the same monitoring dashboard, and workloads move between hosts using the same live migration mechanics that govern the rest of the environment.

GPU virtualization is not a one-size-fits-all decision. But the platform managing those workloads should not make the decision harder than it has to be.

See It in Action — April 2 Webinar

Join us live with NVIDIA’s Jimmy Rotella as we walk through how VergeOS delivers vGPU, pass-through, and MIG in a unified private cloud environment — on your existing hardware.

Register Now

Frequently Asked Questions

What is the difference between NVIDIA vGPU and MIG?

vGPU is a software partitioning model — multiple VMs share GPU time and memory through an NVIDIA scheduling layer, with each guest running a full NVIDIA driver. MIG is a hardware partitioning model — the GPU is physically divided in silicon into isolated instances with their own compute engines and dedicated memory bandwidth. MIG provides stronger isolation and guaranteed performance per slice; vGPU provides broader hardware support and more flexible profile sizing.

Do I need special hardware for GPU virtualization?

GPU pass-through works with any NVIDIA GPU, provided the host platform has IOMMU support (Intel VT-d or AMD-Vi) enabled so the device can be mapped safely into the VM. NVIDIA vGPU requires a supported data center or professional GPU from the vGPU-capable product line. MIG requires Ampere-generation or newer data center cards — specifically A100 or H100 class hardware. Consumer gaming GPUs do not support vGPU or MIG.
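
On Linux hosts, you can confirm the IOMMU requirement by checking whether the kernel has populated any IOMMU groups. A sketch (the `sysfs_root` parameter is an assumption added only to make the check testable; on a real host you would use the default path):

```python
from pathlib import Path

def iommu_enabled(sysfs_root: str = "/sys/kernel") -> bool:
    """The Linux kernel exposes one directory per IOMMU group under
    /sys/kernel/iommu_groups when the IOMMU (Intel VT-d / AMD-Vi) is
    active; an empty or missing directory usually means it is off."""
    groups = Path(sysfs_root) / "iommu_groups"
    return groups.is_dir() and any(groups.iterdir())
```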

Can I use GPU pass-through and still live-migrate VMs?

Not typically. Most hypervisors cannot live-migrate a VM that holds a passed-through PCI device without stopping the VM first. The VM owns the physical device, so the device cannot move transparently between hosts. If live migration is a requirement, vGPU or MIG are the correct approaches.

Does NVIDIA vGPU licensing apply per VM or per GPU?

NVIDIA vGPU licensing is per concurrent user or per virtual GPU instance depending on the license tier — not per physical GPU. Organizations running dense vGPU deployments should factor recurring license costs into the total cost model, as they scale with active instances rather than physical card count.
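
That scaling behavior is worth modeling before committing to a dense deployment. A sketch with a placeholder price (actual tiers and per-instance prices come from your NVIDIA licensing agreement, not from this article):

```python
def annual_vgpu_license_cost(active_instances: int,
                             price_per_instance_year: float) -> float:
    """Recurring vGPU software cost scales with active vGPU instances,
    not with physical card count. The price is a placeholder."""
    return active_instances * price_per_instance_year

# Hypothetical: 2 physical GPUs hosting 12 vGPU instances at $450/instance/year.
# Doubling density on the same two cards doubles the software bill.
print(annual_vgpu_license_cost(12, 450.0))  # 5400.0
```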

How does VergeOS simplify GPU resource management across all three methods?

VergeOS manages GPU pass-through, vGPU, and MIG through the same unified interface that handles compute, storage, and networking. Administrators assign GPU resources to VMs using a single workflow regardless of the virtualization method. GPU utilization metrics surface in the same dashboards as CPU and memory, and live migration for vGPU and MIG workloads uses the same mechanics as non-GPU VMs — no separate management consoles required.

Which GPU virtualization method is best for AI inference?

It depends on scale and isolation requirements. For a small number of high-throughput services that need maximum GPU access, pass-through is appropriate. For multi-tenant inference platforms where multiple teams share GPU hardware, vGPU provides density with full driver compatibility. For regulated environments or SLA-bound inference services requiring predictable latency guarantees, MIG’s hardware-enforced isolation is the correct choice.

George Crump is the Chief Marketing Officer at VergeIO, the leader in Ultraconverged Infrastructure. Prior to VergeIO he was Chief Product Strategist at StorONE. Before assuming roles with innovative technology vendors, George spent almost 14 years as the founder and lead analyst at Storage Switzerland. In his spare time, he continues to write blogs on Storage Switzerland to educate IT professionals on all aspects of data center storage. He is the primary contributor to Storage Switzerland and is a heavily sought-after public speaker. With over 30 years of experience designing storage solutions for data centers across the US, he has seen the birth of such technologies as RAID, NAS, SAN, Virtualization, Cloud, and Enterprise Flash. Before founding Storage Switzerland, he was CTO at one of the nation's largest storage integrators, where he was in charge of technology testing, integration, and product selection.
