Your VMware Exit Will Hurt Unless You Start Now: A CTO’s 12‑Month Playbook

By Diogo Hudson Dias

Your hypervisor is now a board-level risk. After Broadcom’s licensing overhaul, many enterprises discovered their vSphere bill wasn’t just going up — it was doubling, sometimes worse. And this isn’t hypothetical: one US carrier reportedly began moving tens of thousands of VMs off VMware while fighting it out in court. If they can’t stomach the lock‑in, what makes you think you can?

You don’t solve this with a press release or a two-week “lift-and-shift.” Cross‑hypervisor live migration doesn’t exist. Your NSX constructs don’t map 1:1 to anything outside VMware. And the lazy route of “just put it all in EC2” will hand you a new bill that’s 1.5–3x your fully amortized on‑prem cost once you add egress, reserved‑instance commitments, backup, and the inevitable performance overprovisioning.

Here’s the uncomfortable truth: you need a disciplined exit plan, and it starts this quarter. Below is a 12‑month playbook, technology options with real trade‑offs, and the traps I’ve watched teams hit when they try to sprint for the door.

The Decision Framework: Four Viable Paths Off VMware

Pick a lane based on workload mix, skills, latency constraints, and compliance. This is a portfolio question, not a beauty contest.

1) KVM-first HCI (Proxmox VE, Harvester, oVirt derivatives)

Best for: Linux‑heavy estates, moderate scale (dozens to low hundreds of nodes), teams comfortable with Linux ops.

  • Why it works: KVM is the de facto hypervisor under AWS, GCP, and many clouds. Proxmox VE gives you a clean UI/API, clustering, live migration, and native backup. Harvester (KubeVirt + Longhorn) adds a Kubernetes substrate if you want to harmonize VM and container ops.
  • Economics: Proxmox VE is free to use; enterprise repo subscriptions run roughly €110–€935 per CPU socket per year depending on support tier. Even with support, you’ll usually land at 10–30% of your legacy VMware software spend for like‑for‑like capacity.
  • Trade‑offs: You lose some NSX/vSAN ergonomics. You’ll rebuild network overlays (OVN/OVS) and storage (Ceph/ZFS/Longhorn) yourself or with a partner.

2) OpenStack (with KVM, OVN, Ceph)

Best for: Multi‑team private clouds, multi‑tenant needs, strong ops maturity.

  • Why it works: OpenStack gives you IaaS primitives (compute, network, storage) with quotas, projects, and solid APIs. It’s the closest thing to a private EC2.
  • Economics: Software is open source; the cost is operations. Expect to invest in 24/7 SRE coverage and CI/CD for your cloud itself.
  • Trade‑offs: Complexity. If your team has never run a control plane before, don’t learn on your production exit.

3) Commercial HCI alternatives (Nutanix AHV)

Best for: Mixed Windows/Linux estates, teams wanting VMware‑like polish without VMware.

  • Why it works: Mature management plane, Windows guest tooling, integrated storage networking. AHV is KVM under the hood with Nutanix’s control plane.
  • Economics: You’ll pay real money, but for many orgs it still undercuts “new VMware” TCO once you price in Broadcom bundles you don’t need.
  • Trade‑offs: Another vendor relationship, with its own roadmap risk. Less DIY flexibility than Proxmox/OpenStack.

4) Public cloud rehost (EC2/Azure/GCP)

Best for: Small estates or teams already >70% cloud where on‑prem is technical debt.

  • Why it works: Fastest path to get out from under a VMware renewal. Rich ecosystem for DR, backups, and managed services.
  • Economics: Sticker shock is common. Once steady‑state, expect 1.5–3.0x vs. a well‑run on‑prem cluster of the same performance profile, depending on egress and RI discipline.
  • Trade‑offs: Latency, data residency, and cost control. Do not use this path to move chatty, storage‑heavy, low‑margin workloads unless you enjoy surprise bills.

Your 12‑Month VMware Exit Plan

0–30 Days: Freeze, Inventory, Baseline

  • Freeze features: Stop introducing NSX/vSAN‑specific constructs. No new SRM dependencies.
  • Inventory like a lender: Export with RVTools or your CMDB, but validate by sampling actual guest OS. Tag each VM with owner, RTO/RPO, CPU/RAM/IOPS, OS, and dependencies (DNS/AD/NTP, databases, message buses).
  • Cost baseline: Last 12 months of VMware line items, support tickets, backup licenses, power, and space. You need this to defend the business case.
  • Kill pet projects: If nobody can name the owner of a VM in 48 hours, put it on a deprecation path. Dead weight kills migrations.
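
The inventory rules above can be sketched as a small triage pass. This is a minimal sketch, not a finished tool: the column names (`vm`, `owner`, `rto_minutes`, `os`) and the sample rows are hypothetical, so map them to whatever RVTools or your CMDB actually exports.

```python
import csv
import io

# Hypothetical export; real column names depend on your tooling.
SAMPLE_EXPORT = """vm,owner,rto_minutes,os
web-01,payments-team,60,Ubuntu 22.04
db-07,,240,Windows Server 2019
legacy-print,,,Windows Server 2012 R2
"""

REQUIRED_TAGS = ("owner", "rto_minutes", "os")


def triage(export_text):
    """Split VMs into fully tagged ones and ones missing required tags."""
    ready, incomplete = [], {}
    for row in csv.DictReader(io.StringIO(export_text)):
        missing = [tag for tag in REQUIRED_TAGS if not (row.get(tag) or "").strip()]
        if missing:
            incomplete[row["vm"]] = missing
        else:
            ready.append(row["vm"])
    return ready, incomplete


ready, incomplete = triage(SAMPLE_EXPORT)
# VMs with no named owner are candidates for the deprecation path.
ownerless = [vm for vm, missing in incomplete.items() if "owner" in missing]
print("ready:", ready)
print("ownerless:", ownerless)
```

Anything that lands in `ownerless` starts the 48-hour clock; anything in `incomplete` for other reasons goes back to its owner for tagging before it is assigned to a wave.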

30–90 Days: Pick the Target, Build a Pilot

  • Choose your lane. If you have mostly Linux and in‑house ops, Proxmox VE with Ceph is the fastest practical path. If you need multi‑tenant projects and quotas, evaluate OpenStack. If Windows admin comfort and white‑glove UX matter, shortlist Nutanix AHV. If your estate is tiny or already more than 70% cloud, plan a rehost.
  • Stand up a pilot: 3 nodes minimum (NVMe cache, 25/50Gbps NICs, 256–512GB RAM). For Proxmox, add Proxmox Backup Server. For storage, start with Ceph or ZFS mirrors depending on scale.
  • Exercise conversions: Use virt‑v2v for both Linux and Windows VMs; for Windows guests, remove VMware Tools and install virtio drivers as part of the conversion. Expect a maintenance window; cross‑hypervisor live migration isn’t real.
  • Prove the boring stuff: Backups, restores, snapshots, template builds, and RBAC. If you haven’t restored a Windows DC from your new stack, you haven’t tested anything.
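
For the conversion step, it helps to generate the virt‑v2v command rather than hand-type it per VM. The sketch below only builds and prints a command line; it assumes the VM was exported from vSphere as an OVA and that you want a local qcow2 image. The paths are hypothetical, and the right options for your setup should be checked against the virt‑v2v manual for your version.

```python
import shlex


def v2v_command(ova_path, output_dir, disk_format="qcow2"):
    """Build (but do not run) a virt-v2v invocation for an exported OVA."""
    return [
        "virt-v2v",
        "-i", "ova", ova_path,   # input: an OVA exported from vSphere
        "-o", "local",           # output: local disk image plus libvirt XML
        "-os", output_dir,       # output storage directory
        "-of", disk_format,      # output disk format
    ]


# Hypothetical paths, for illustration only.
cmd = v2v_command("/exports/web-01.ova", "/var/tmp/converted")
print(shlex.join(cmd))
```

Keeping the command as a list makes it easy to hand to your own job runner during the maintenance window and to log exactly what was executed for each VM.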

90–180 Days: Networking, Storage, and Wave 1

  • Network plan: Recreate VLANs and overlays with OVN/OVS or your switch fabric. Map NSX firewall rules to host‑based firewalls or upstream firewalls. Put DHCP/DNS under proper HA.
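  • Storage plan: vSAN ≠ Ceph. Start with 3 or 5 MON hosts (an odd count keeps quorum clean) and put OSDs on SSD/NVMe; you only need MDS daemons if you run CephFS. Validate IOPS under your real workloads. Enable TRIM/UNMAP on guests so thin‑provisioned volumes give space back instead of bloating.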
  • Automation: Embrace IaC now. Use OpenTofu/Terraform providers for Proxmox or your chosen platform. Bake golden images with Packer/Cloud‑Init.
  • Wave 1 migrations: Pick three classes: a stateless app, a moderate DB, and a Windows service (print/ADCS/legacy line‑of‑business). Validate performance and operability. Document the runbooks you actually used, not the ones you wrote.
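
Capacity is one place where vSAN habits mislead on the storage plan. Here is a rough sizing sketch, assuming a replicated Ceph pool with three copies and a fill ceiling of 85% before you want to be adding capacity. Both values are assumptions to adjust, and the node and drive counts are illustrative.

```python
def usable_tb(nodes, drives_per_node, drive_tb, replicas=3, fill_ceiling=0.85):
    """Rough usable capacity of a replicated Ceph pool, in TB."""
    raw = nodes * drives_per_node * drive_tb
    return raw / replicas * fill_ceiling


# Illustrative cluster: six nodes with two 3.2 TB NVMe drives each.
print(round(usable_tb(6, 2, 3.2), 1))
```

With these inputs, 38.4 TB of raw flash becomes roughly 10.9 TB you can actually plan against, which is the number to compare with your current datastore usage.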

180–270 Days: Scale, DR, and Governance

  • Scale out: Expand to target capacity. Add a second rack or cluster for DR. Replicate backups offsite. For Proxmox, test cluster quorum failures and fencing.
  • DR patterns: Accept reality: failover will be cold‑to‑warm for most VMs. Script boot order, DNS flips, and database promotion. Time the whole exercise; if your RTO says 60 minutes and you hit 140, adjust the plan or the SLA.
  • Observability: Prometheus + Grafana for hosts and guests; Proxmox or OpenStack exporters; log aggregation with Loki/ELK. Track noisy neighbors and contain them with resource limits or dedicated nodes.
  • Access and audit: SSO for consoles and APIs, short‑lived tokens, MFA requirements. If contractors participate (nearshore or otherwise), lock to device posture and IP allowlists.
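
Timing the DR exercise is easier when the runbook itself records durations. A minimal sketch of that idea follows; the step names are placeholders and the actions are no-ops, so real steps would call your own boot, DNS, and database tooling.

```python
import time


def run_drill(steps, rto_minutes, clock=time.monotonic):
    """Run failover steps in order, time each one, and compare the total to the RTO."""
    timings = []
    for name, action in steps:
        start = clock()
        action()
        timings.append((name, clock() - start))
    total_minutes = sum(seconds for _, seconds in timings) / 60
    return timings, total_minutes, total_minutes <= rto_minutes


# Placeholder steps; replace the lambdas with calls into your real tooling.
steps = [
    ("boot tier-0 VMs", lambda: None),
    ("flip DNS records", lambda: None),
    ("promote database replica", lambda: None),
]
timings, total, within_rto = run_drill(steps, rto_minutes=60)
for name, seconds in timings:
    print(f"{name}: {seconds:.1f}s")
print("within RTO:", within_rto)
```

The per-step timings are the useful part: when a 60-minute RTO turns into 140, they show whether the time went to boot order, DNS propagation, or database promotion.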

270–360 Days: Cutover Waves and Decommission

  • Wave schedule: Treat this like a product launch calendar. Communicate freeze windows to stakeholders 30 days out; dry‑run complex moves in a staging cluster.
  • Performance tuning: Pin CPUs for latency‑sensitive DBs, enable hugepages where it helps, choose virtio‑scsi over legacy disk types, and verify NUMA alignment on big boxes.
  • Security hardening: UEFI Secure Boot on hosts, encrypted boot drives, TPM passthrough where needed, host patch pipelines. If you use AMD EPYC, evaluate SEV‑ES for VM memory encryption on hostile multi‑tenant clusters.
  • Shrink VMware: Turn off decommissioned hosts. Archive configs. Keep a small island only if you must (e.g., vendor‑locked appliances). Enter renewal talks with usage facts, not feelings.
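
One way to draft the wave schedule is to order VMs by their dependencies so that nothing moves before the services it relies on. The sketch below uses Python's standard `graphlib`; the VM names and the dependency map are hypothetical, and a real calendar would also weigh criticality and business seasonality.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each VM lists the VMs it needs migrated first.
depends_on = {
    "dns-01": [],
    "ntp-01": [],
    "ad-01": ["dns-01", "ntp-01"],
    "db-01": ["ad-01"],
    "app-01": ["db-01"],
    "batch-01": ["db-01"],
    "web-01": ["app-01"],
}


def plan_waves(deps):
    """Group VMs into waves; every VM's dependencies sit in an earlier wave."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves = []
    while sorter.is_active():
        wave = sorted(sorter.get_ready())
        waves.append(wave)
        sorter.done(*wave)
    return waves


for number, wave in enumerate(plan_waves(depends_on), start=1):
    print(f"wave {number}: {', '.join(wave)}")
```

A circular dependency surfaces as an error at planning time, which is a far cheaper place to find it than the middle of a cutover night.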

What This Actually Costs

Numbers you can take to finance. These are directional; plug in your real quotes and power rates.

  • Hardware: A modern 1U dual‑socket server with 256–512GB RAM, 2×25Gbps NICs, and a pair of 1.6–3.2TB NVMe drives will land in the US$6k–$12k range depending on vendor and discounts. You’ll want 6–12 nodes to start if you’re replacing a modest vSphere estate.
  • Power: A virtualization node idles around 150–250W and hits 300–500W under load. Ten nodes at 300W average = ~3kW. Over a 30‑day month that’s ~2,160 kWh; at $0.12/kWh, it comes to ~$260/month per 3kW, before cooling and PUE overhead.
  • Software/support: Proxmox VE subscriptions per CPU socket run roughly €110 (Community, the entry tier) to €935 (Premium) per year; many teams pick Standard (~€280/socket) for enterprise repo + support. Ceph is open source; consider paid support only if you lack in‑house skills. AHV and OpenStack distributions vary; you’ll pay for support and convenience.
  • People: This is the real cost. Expect 1–2 FTE of focused effort for 6–12 months to plan, pilot, and migrate a 150–300 VM estate, plus help from app owners for testing. If you don’t have that capacity, a dedicated nearshore SRE pod working with 6–8 hours overlap can compress the calendar without wrecking your team’s roadmap.
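
To make the power and subscription lines easy to re-run with your own quotes, here is a small sketch of the arithmetic. The ten-node, 300W, $0.12/kWh, and per-socket inputs come from the bullets above; the 30-day month and the PUE of 1.5 are assumptions.

```python
HOURS_PER_MONTH = 24 * 30  # assumes a simple 30-day month


def monthly_power_cost(nodes, avg_watts, usd_per_kwh, pue=1.0):
    """Monthly electricity cost in USD; pue > 1.0 adds cooling and facility overhead."""
    kwh = nodes * avg_watts / 1000 * HOURS_PER_MONTH * pue
    return kwh * usd_per_kwh


def yearly_subscription_cost(nodes, sockets_per_node, eur_per_socket):
    """Yearly per-socket subscription cost in EUR."""
    return nodes * sockets_per_node * eur_per_socket


it_load_only = monthly_power_cost(10, 300, 0.12)
with_overhead = monthly_power_cost(10, 300, 0.12, pue=1.5)  # assumed PUE
subscriptions = yearly_subscription_cost(10, 2, 280)         # Standard figure quoted above

print(f"power, IT load only: ${it_load_only:.2f}/month")
print(f"power, with PUE 1.5: ${with_overhead:.2f}/month")
print(f"subscriptions: EUR {subscriptions}/year")
```

Ten nodes at 300W come to about $259 a month before overhead. Across ten dual-socket nodes the subscription line is €5,600 a year, which is why people, not software, dominate the budget.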

The Traps That Blow Up Timelines

  • Trying to live‑migrate cross‑hypervisor. You can’t. Plan for cutover windows with delta syncs, not magic. Use rsync/ZFS send to pre‑stage data where feasible.
  • Underestimating Windows work. Rip out VMware Tools cleanly, pre‑install virtio drivers, and validate activation/licensing. Time sync and domain trust issues will ambush you at 2 a.m.
  • Ignoring NSX translations. Distributed firewall rules and micro‑segments won’t port themselves. Either rebuild them upstream (Palo Alto, Fortinet, etc.) or implement host‑based rules and accept some sprawl.
  • Assuming vSAN performance will copy‑paste. Ceph and Longhorn behave differently under random write loads. Run fio under your real patterns, not marketing defaults.
  • Skipping backup/restore rehearsals. A backup you’ve never restored is a story, not a control. Prove full VM and file‑level restores from your new stack before any production cut.
  • Leaving owners out. If app owners learn about migration from a status email, you’ll fail UAT and roll back. Bring them into war rooms early.
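
The case for pre-staging data shows up clearly in back-of-the-envelope transfer times. This is a rough sketch: the data sizes are illustrative, and the 70% link efficiency is an assumption you should replace with a measured figure from your own network.

```python
def transfer_minutes(data_gb, link_gbps, efficiency=0.7):
    """Minutes to copy data_gb over a link that achieves only a fraction of line rate."""
    gigabits = data_gb * 8
    return gigabits / (link_gbps * efficiency) / 60


full_copy = transfer_minutes(2000, 25)   # pre-stage 2 TB while the VM is still live
final_delta = transfer_minutes(40, 25)   # changed data copied inside the cutover window
print(f"pre-stage: {full_copy:.1f} min, final delta: {final_delta:.1f} min")
```

Only the final delta has to fit inside the maintenance window, so the earlier the bulk copy starts, the shorter the outage you need to negotiate with app owners.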

What About Kubernetes?

Kubernetes is not a hypervisor. If you’re already container‑heavy, seize the moment to kill a chunk of VM sprawl — but don’t turn your VMware exit into a platform rewrite. Two sane patterns:

  • KubeVirt or Harvester for the subset of VMs that must live near your clusters. Works well for Linux services and certain stateful apps. Not a great home for old Windows line‑of‑business apps.
  • Split brain intentionally: Run Proxmox/OpenStack for VMs, Kubernetes for containers. Federate identity and observability. Do not force everything through one API because it’s elegant.

Security and Compliance: Don’t Rebuild Yesterday’s Risks

  • Short‑lived credentials: Use OIDC/SAML into your virtualization API/console with enforced MFA and session TTLs. Kill the long‑lived admin password culture.
  • Secrets and images: Image pipeline attestation. Sign VM templates. Store secrets in a central vault with per‑project access.
  • Data residency and privacy: If you operate in or serve US states tightening data rules, keep telemetry and backups region‑bound. Treat metadata (especially geolocation and IP data) as regulated — the headlines are moving that way.
  • Host hardening: UEFI Secure Boot, TPM, FDE on host OS disks, and a fast cadence for microcode/kernel updates. Don’t wait for a CVE‑storm to practice rolling reboots.

When a Nearshore Pod Actually Helps

This isn’t about throwing bodies at a migration. It’s about keeping your core engineers focused on product while a specialized team handles the undifferentiated heavy lifting: inventory sanity, host builds, network/storage bring‑up, conversion runbooks, and after‑hours cutovers. In Brazil, you’ll get 6–8 hours of overlap with US time zones and senior Linux talent that’s 20–30% cheaper than US onshore — without the 10–12‑hour gaps you get offshore. Use that overlap for the gnarly stuff: driver issues on Windows Server 2012 R2, Ceph placement group tuning, or OVN ACLs that don’t behave like NSX.

Reality Check on Timelines

We’ve seen 150–300 VM estates move in 6–9 months with a dedicated team and strong app owner engagement. Above 1,000 VMs, plan for 9–18 months in waves. If your leadership expects “end‑of‑quarter” because a cloud vendor dangled credits, push back with a migration calendar tied to app criticality and business seasonality. The penalty for rushing is usually paid in rollback nights and reputational damage, not hero points.

Why Start Now

Two reasons. First, your renewal clock is ticking. Every month you delay narrows your options and hands negotiating leverage to the incumbent. Second, the market is moving. T‑Mobile going public about a massive VMware exit is a signal, not an anomaly. Hardware lead times are still lumpy in 2026. You want your POs in before everybody else wakes up to the same Plan B.

Key Takeaways

  • Pick a lane based on workload mix and ops maturity: KVM HCI (Proxmox/Harvester), OpenStack, Nutanix AHV, or a targeted cloud rehost.
  • Cross‑hypervisor live migration isn’t real. Plan for cutover windows with pre‑staged data and tested runbooks.
  • Budget for people more than software. Expect 1–2 focused FTE for 6–12 months to move a 150–300 VM estate.
  • Rebuild network and storage deliberately: OVN/OVS for overlays, Ceph/ZFS/Longhorn for storage. Don’t assume NSX/vSAN behaviors carry over.
  • Test restores before any production cutover. A backup you haven’t restored is a rumor.
  • Use the exit to modernize security: SSO/MFA to consoles, signed images, encrypted hosts, and short‑lived access.
  • Nearshore pods with 6–8 hours overlap can compress timelines without derailing your product roadmap.
  • Start now to keep negotiating leverage and avoid hardware/supplier crunch as others rush for the exits.
