QEMU suspend-to-disk and live-migration issues with UCS-4.2

Known issues when updating to UCS 4.2 with UVMM

UCS-4.2 ships with the newer QEMU version 2.8, which is partly incompatible with QEMU version 1.1 as used up to and including UCS-4.1. Symptoms include

  • guest operating systems freeze and get stuck in a busy-loop using 100% virtual CPU time
  • virtual machine reboots
  • virtual machine does not reboot after shutdown request

This can happen after

  • restoring the virtual machine from the suspend-to-disk state
  • restoring the virtual machine from a snapshot taken from a running system
  • live-migration from a UCS-4.1 (or older) system

This article will outline how to mitigate some of these issues.

Mitigation

Servers running the ‘KVM virtualization server’ app can simply stay on UCS 4.1 if updating the KVM server is currently not possible. All features remain functional when controlling virtual machines via UVMM. The server will have to be updated once the maintenance period for UCS 4.1 ends.

If the UCS servers running the ‘KVM virtualization server’ app have to be updated to UCS 4.2, the issues can be mitigated by

  • shutting down all virtual machines cleanly before performing the update
  • creating and using snapshots only of cleanly shut down virtual machines
  • migrating only shut down virtual machines between hosts using different UCS releases

These steps must be carried out before a virtual machine is migrated or the host system is updated. Otherwise the virtual machine might crash, which can lead to data loss (only the state of the RAM, not the data on disk). After such a runtime crash the guest operating system usually performs a file system check to recover consistency. A snapshot can be restored repeatedly, but a live migration or a resume from suspend-to-disk can only be attempted once, because the runtime state there is available only once.
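The first mitigation step could be scripted roughly as follows. This is a minimal sketch, not part of UCS: the function name shutdown_all_vms, the default timeout of 300 seconds and the 5-second polling interval are assumptions chosen for illustration. It only uses virsh list and virsh shutdown and must be run on the KVM host before the update.

```shell
#!/bin/sh
# Sketch: cleanly shut down all running VMs before updating the host.
# shutdown_all_vms, the timeout and the polling interval are illustrative
# assumptions, not UCS tooling.

shutdown_all_vms() {
    timeout="${1:-300}"   # seconds to wait for the guests to power off

    # Ask every running domain to shut down (ACPI request to the guest).
    vms=$(virsh list --name --state-running)
    printf '%s\n' "$vms" | while IFS= read -r vm; do
        if [ -n "$vm" ]; then
            virsh shutdown "$vm"
        fi
    done

    # Poll until no domain is running any more or the timeout expires.
    waited=0
    while [ -n "$(virsh list --name --state-running)" ]; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "Still running after ${timeout}s:" >&2
            virsh list --name --state-running >&2
            return 1
        fi
        sleep 5
        waited=$((waited + 5))
    done
    return 0
}
```

Calling shutdown_all_vms 600 would wait up to ten minutes. A non-zero return value means at least one guest ignored the shutdown request and needs manual attention before the update is started.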

How can I find virtual machines suspended to disk?

You can run the command virsh list --all --with-managed-save.
As an alternative you can look directly at the file system level: for each suspended VM there exists a file “/var/lib/libvirt/qemu/save/$VM.save”.
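The file system check can be sketched as a small helper. The function name list_saved_vms is hypothetical; the default directory is the path named above, and the optional argument exists only so the helper can be pointed at another directory.

```shell
#!/bin/sh
# Sketch: print the names of all VMs that have a managed save image,
# derived from the *.save files in the save directory.
# list_saved_vms is a hypothetical helper name.

list_saved_vms() {
    save_dir="${1:-/var/lib/libvirt/qemu/save}"
    for f in "$save_dir"/*.save; do
        [ -e "$f" ] || continue     # glob did not match: no saved VMs
        basename "$f" .save         # strip directory and ".save" suffix
    done
}
```

Running list_saved_vms without arguments prints one VM name per line and prints nothing if no VM is suspended to disk.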

How can I remove the managed save state of a virtual machine suspended to disk, if I prefer to start the VM in a crashed state instead of not starting it at all?

virsh managedsave-remove $VM

How can I reset a virtual machine if it is not responding after restoring the running state?

Try virsh reset $VM to trigger a reboot. This will re-use the running QEMU process.
If that does not work, stop it with virsh destroy $VM and start it with virsh start $VM again. This will create a new QEMU process, which reloads all files from disk.
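The destroy-and-start fallback could be wrapped as shown below. This is a sketch under assumptions: cold_restart_vm is a hypothetical name, and the state check relies on virsh domstate reporting "shut off" for an inactive domain. Note that virsh destroy terminates the QEMU process immediately, so any unsaved guest state is lost.

```shell
#!/bin/sh
# Sketch: force a VM onto a fresh QEMU process.
# cold_restart_vm is a hypothetical helper name.

cold_restart_vm() {
    vm="$1"

    # Terminate the QEMU process only if the domain is still active.
    if [ "$(virsh domstate "$vm")" != "shut off" ]; then
        virsh destroy "$vm" || return 1
    fi

    # Start the domain again; this creates a new QEMU process.
    virsh start "$vm"
}
```

For example, cold_restart_vm "$VM" returns non-zero if either the destroy or the start step fails.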
