
VMware ESX date bug fix

August 20th, 2008

I have used VMware ESX 3.5 and Virtual Center 2.5 since implementing them at my current workplace. Like many sysadmins I keep on top of patching: I pull updates through Virtual Center and manually patch my ESX hosts when I see fit. I had recently applied the latest update (3.5 Update 2) to all ESX hosts. As a frequent reader of The Register I was later surprised to learn of a bug that stopped people powering on or VMotioning Virtual Machines on or after 12th August 2008.

After 12th August I could not VMotion. People on the VMware forums were also reporting that they could not power on Virtual Machines that were off, so I was concerned that if an ESX host failed, High Availability (HA) would try, and fail, to boot its Virtual Machines on a different host.

I always try to distribute my Virtual Machines evenly across an HA cluster to minimise disruption to the organisation's services if an ESX host fails, and to compensate for Virtual Machines that use more of a host's resources. It soon became apparent, though, that it would be a nightmare to shut down all Virtual Machines on any one ESX host in order to apply the fix VMware had released (you cannot patch a host until all of its Virtual Machines have been VMotioned off or shut down). It was mentioned on the VMware forums that if the date on an ESX host was set to one before the bug took effect, you could successfully VMotion Virtual Machines and power them on.

I had to test this, as it would cause the least impact to the organisation I work for. My initial concern was that Virtual Machines have an option to synchronize their date with the ESX host on which they reside. If a domain controller Virtual Machine were to synchronize its date with a rolled-back ESX host, the time skew between the domain controller and its clients on the network would cause obvious problems. I verified that time synchronization in VMware Tools is disabled by default on each Virtual Machine:

vmclienttimesync.png
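For anyone checking many Virtual Machines rather than clicking through each one, the same setting can usually be read from the Virtual Machine's .vmx file, where the periodic VMware Tools sync is recorded as `tools.syncTime`. The following is a minimal Python sketch under that assumption. The function name and the sample .vmx text are hypothetical, and treating a missing entry as "disabled" is also an assumption.

```python
import re


def tools_sync_enabled(vmx_text: str) -> bool:
    """Return True if the .vmx text enables periodic VMware Tools time sync."""
    for line in vmx_text.splitlines():
        # .vmx entries have the form: key = "value"
        match = re.match(r'\s*([\w.]+)\s*=\s*"(.*)"\s*$', line)
        if match and match.group(1).lower() == "tools.synctime":
            return match.group(2).strip().lower() == "true"
    # Assumption: no entry at all means the option is off.
    return False


# Hypothetical .vmx fragment for a domain controller guest.
sample_vmx = '''
displayName = "dc01"
tools.syncTime = "FALSE"
'''

print(tools_sync_enabled(sample_vmx))  # False
```

This only covers the periodic sync option; it says nothing about one-off clock corrections at power-on or after a migration.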

I changed the date back to 10th August 2008 and disabled NTP on a couple of my ESX hosts using the VIClient through Virtual Center:

vmhostdate.png

I was then able to VMotion the Virtual Machines from one ESX host onto the other. I corrected the date(s) and now had one ESX host (with no running Virtual Machines) that I could patch using VMware Update Manager. I patched the ESX host and, after it rebooted, changed the date on another unpatched ESX host to 10th August 2008 and moved all of its Virtual Machines onto the newly patched ESX host. Repeating this combination of changing the date on an unpatched ESX host and then VMotioning its Virtual Machines onto a patched ESX host allowed me to patch all ESX hosts in the cluster. One thing I did notice: a few minutes after changing the date to 10th August 2008 on an unpatched ESX host, I would receive the error message "HA agent on (servername) in cluster (clustername) has an error". Once this error appears you cannot initiate a VMotion, but a VMotion that is already in progress is unaffected.
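The order of operations above can be sketched as a small planner. This is an illustrative Python sketch only: it prints the steps rather than talking to Virtual Center, the host and Virtual Machine names are hypothetical, and it assumes each target host has enough capacity to take another host's Virtual Machines. Emptying the least-loaded host first is a choice made for the sketch, not something taken from the procedure above.

```python
from datetime import date

BUG_DATE = date(2008, 8, 12)       # VMotion and power-on fail on or after this date
ROLLBACK_DATE = date(2008, 8, 10)  # date set on unpatched hosts in the post


def rolling_patch_plan(hosts):
    """hosts maps host name -> list of VM names; returns the ordered steps."""
    if ROLLBACK_DATE >= BUG_DATE:
        raise ValueError("rollback date must be before the bug date")
    vms = {name: list(guests) for name, guests in hosts.items()}
    if len(vms) < 2:
        raise ValueError("need at least two hosts to VMotion between")

    # Empty the least-loaded host first (a choice made for this sketch).
    order = sorted(vms, key=lambda name: len(vms[name]))
    first, spare = order[0], order[1]

    # Bootstrap: no patched host exists yet, so both hosts are rolled back.
    steps = [
        f"set date {ROLLBACK_DATE} and stop NTP on {first} and {spare}",
        f"VMotion {len(vms[first])} VM(s) from {first} to {spare}",
        f"restore correct date and NTP on {first} and {spare}",
        f"patch and reboot {first}",
    ]
    vms[spare] += vms[first]
    vms[first] = []
    patched = first

    # Afterwards only the unpatched source host is rolled back; the
    # patched target keeps the correct date.
    for host in order[1:]:
        steps += [
            f"set date {ROLLBACK_DATE} and stop NTP on {host}",
            f"VMotion {len(vms[host])} VM(s) from {host} to {patched}",
            f"restore correct date and NTP on {host}",
            f"patch and reboot {host}",
        ]
        vms[patched] += vms[host]
        vms[host] = []
        patched = host
    return steps


cluster = {"esx1": ["dc01", "mail"], "esx2": ["web"], "esx3": ["db", "file", "print"]}
for number, step in enumerate(rolling_patch_plan(cluster), start=1):
    print(number, step)
```

The plan leaves the Virtual Machines unevenly distributed at the end, so rebalancing the cluster afterwards is a separate step.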

After some testing, it seems that when a Virtual Machine is VMotioned from one ESX host to another it can sometimes synchronise its time with the host it is migrating onto. My advice would be to patch the ESX hosts that have no domain controller Virtual Machines running on them first. When you do VMotion your domain controller Virtual Machines from an unpatched ESX host onto an already patched one, make sure the ESX host you are migrating them onto has the correct date!

As a final test, a colleague took a single ESX host running a single Virtual Machine, both with correct dates. He shut down the Virtual Machine and then changed the date on the host (as though he were going to patch it). After changing the date on the host and rebooting it, he powered on the Virtual Machine. Even though it had time synchronization disabled in VMware Tools, the Virtual Machine automatically changed its date during BIOS POST to match the host on which it resided. So if you have any powered-off Virtual Machines on an ESX host, make sure you change the ESX host date back to the correct date before powering any of them on!
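To put a number on why a rolled-back host date matters for domain controllers, here is a minimal Python sketch of a pre-power-on check. It assumes the 5-minute clock-skew tolerance that Kerberos uses by default in Active Directory; the function name and the example timestamps are hypothetical.

```python
from datetime import datetime, timedelta

# Assumption: 5 minutes, the default Kerberos clock-skew tolerance in Active Directory.
MAX_SKEW = timedelta(minutes=5)


def safe_to_start_guests(host_clock, reference_clock, max_skew=MAX_SKEW):
    """True if a guest inheriting the host clock would stay within tolerance."""
    return abs(host_clock - reference_clock) <= max_skew


reference = datetime(2008, 8, 20, 9, 0)    # hypothetical correct time
rolled_back = datetime(2008, 8, 10, 9, 0)  # host still set to the rollback date

print(safe_to_start_guests(rolled_back, reference))                        # False
print(safe_to_start_guests(reference + timedelta(seconds=30), reference))  # True
```

A host left on 10th August is ten days out, far beyond the tolerance, which is why the host date should be corrected before any Virtual Machine is powered on.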

All of my ESX hosts are now: 3.5.0 Build 110268.

Categories: VMware
  1. x-dragon
    August 22nd, 2008 at 15:58 | #1

    I saw it just in time and refrained from updating...
    Will plan updating our servers after the new update has been put online
