The upgrade was no better or worse than any other I have previously undertaken. In fact, compared to one nightmare upgrade from vCenter 1.0 to 2.0 it was a walk in the park. What follows is a blow-by-blow account of my experiences – it got quite lengthy. (If you want a quick summary – scroll to the end and just read the conclusions.)
Since writing this post – I’ve had a thought. What if I’d fully patched the ESXi host first with vCenter4.0 BEFORE upgrading vCenter4.0 to vCenter 4.0 U1? Would that have stopped the disconnect from taking place? I can’t test that approach myself – but I would LOVE to hear from those who can, so I can perhaps find a workaround to the disconnects that took place in my upgrade process.
Well, it’s that time already – a new update to VMware’s flagship virtualization platform – vSphere4. I probably would not have upgraded from 4.0 to 4.0 Update 1, if it hadn’t been for the almost simultaneous release of View4 and the eagerly awaited PCoIP protocol. One of View4’s pre-requisites is vSphere4 U1. Despite this, I started off with a deploy of View4 on vSphere4.0 to see how hard and fast that “pre-requisite” was. In truth I was a bit nervous (perhaps more than normal) about this Update 1 roll-out.
I was so desperate to try View4 and PCoIP – I thought I would snapshot the affected VMs (my domain controller(s), SQL2005, vCenter4.0 and SRM are all VMs) and jump in with both feet. Firstly, I uploaded the new ESX 4.0 U1 DVD ISO to Carl’s “Ultimate Deployment Appliance”, and I’m pleased to say the new image was copied and mounted to the UDA without an error. What I wasn’t too sure about was how to tell the UDA to use the new OS and Flavor. It seems those fields are read only.
I dropped Carl an email – and asked him… After all I shouldn’t have to create new templates and sub-templates when all I’m doing is mounting a different OS/Flavor. I should be able to change that on the fly. In the short term I imagined Carl would be able to tell me where these references are located so I could just hand modify them. … Anyway, Carl did get back to me – apparently he has a development build that adds a pull-down option for the change of OS/Flavor. In the meantime he told me to edit:
and modify the flavor line to read the new flavor name (in my case esx40U1). True to his word, this worked – and I didn’t even have to restart the UDA web-service. I didn’t test this – because I needed to keep my existing hosts configuration – to check it against the upgrade.
Anyway, I was just about to embark on the upgrade – when a tweet came to my attention about a possible upgrade bug/problem. I decided to hang fire until this was released. It turned out to be quite important:
It turns out that Update Manager – doesn’t know how to handle certain 3rd party agents (hardware agents?) and would cause a PSOD during the upgrade process. I decided to de-install my HP SIM Agents on my ESX “Classic” hosts. Of course I was hoping for better results with an upgrade of my ESXi hosts because they have a built-in SIM which shouldn’t cause a problem.
Whilst I was waiting for the full fall-out over this upgrade bug to come to light – I decided to see if I could get View4.0 running with vSphere4.0 without the Update 1. I found that View4.0 install worked fine with vSphere4.0, so that clarifies the old “support” question. You know when vendors say something is not supported you never really know if that means:
- Don’t even bother it doesn’t work
- Don’t even bother it’s too flakey
- Don’t bother, but some feature you may/or may not use – may or may not work
- Don’t bother, the performance is subpar
- We didn’t QA it against the configuration you are using – so you are on your own
As far as I can tell it’s the last one – you can run View4.0 on vSphere4.0 – but it’s not supported… So, the next question was now that I had my SRM4.0/View4.0 build working on vSphere4.0 – did I have the guts to attempt an upgrade from vSphere4.0 to vSphere4.0 Update1…?
Upgrading vCenter 4.0 to vCenter 4.0 Update 1
I started out with an upgrade of vCenter. Things could have gone badly very quickly. I uploaded the vCenter4.0 U1 .iso to my SAN where I hold my .ISOs but fortunately noticed it was a very small file compared to the vCenter4.0 one (800MB/1.9GB). Quickly I realized that once again IE had let me down, and I must have lost internet connection during the download – creating a corrupted file. So it was time to download again. Once again it proves the adage that running an MD5SUM check against ALL downloaded media is a must. After seeing that, I checked the MD5SUM of my ESX4.0 U1 DVD and found it was good.
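For anyone who wants to script that check rather than eyeball it, here is a minimal sketch in Python. It assumes you have copied the published MD5 value from the download page yourself; the function names are mine, not anything VMware ships.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in 1MB chunks
    so a multi-gigabyte .iso never has to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def download_is_intact(path, published_md5):
    """Compare the local digest with the value published by the vendor.
    published_md5 is whatever string you copied from the download page."""
    return md5_of_file(path) == published_md5.strip().lower()
```

A truncated download like my 800MB file would simply return False here, which is a lot quicker than finding out half-way through an installer.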
My main anxiety about the upgrade to vCenter4.0 Update 1 was my recent switch from SQL Authentication to Windows Authentication in vCenter. As you may, or may not know, it’s only recently that VMware have started to support Windows Authentication to an EXTERNAL SQL host. Up until vCenter4.0 if your SQL host was separate from the vCenter (as is the case in most corporate environments) then they only supported SQL Authentication. I was pretty badly burned by this fact in an upgrade from vCenter 1.0 to 2.0. You see, back in vCenter 1.0 I had “frigged” a Windows Authentication configuration – it was dead easy to do, just monkey about with the Service mmc and grant the “right to login as a service”. In fact I recently did just the same thing in an install of SRM4.0 because it CURRENTLY does NOT support Windows Authentication. Ever so curious, I’m always interested in breaking “support” because I work in a lab environment and I can, but also to learn what’s possible, and more importantly impossible. Anyway, this upgrade from vCenter 1.0 to 2.0 went pear-shaped because I’d forgotten the frigged configuration I’d done back then – I had to frig the SQL configuration back to SQL Authentication. Not difficult, but it did take me nearly a week to find out what was going on. It was my own fault. The moral of the story is that frig configurations can/will come back and bite you sometime/someday. Go on – square that with my previous statement about View4.0 on vSphere4.0. If it’s not supported, it’s not supported!
Prior to starting the upgrade – I thought I would run VUM on the existing vSphere4.0 build to see if there were any patches, and to see if the ESX hosts could be “staged”. Staging is where patches are downloaded from the web, and copied to the ESX host – but not installed. I heard a rumor via a friend at EMC that ESXi hosts need a patch before they can be upgraded from 4.0 to 4.0 U1. How ironic. A patch before the patching process. I found that whilst the ESX ‘Classic’ host staged without an error, the ESXi host returned a problem. In truth I’ve always had problems with VUM and ESXi. In most cases I’ve had to resort to the “Host Update Utility” that gets installed with the vSphere Client as Plan B, if Plan A doesn’t work. In the end the cause of this staging error – was one I had seen countless times before (and I always bloody well forget!) – that you cannot remediate or stage an ESX host with vCenter running as a VM on the same ESX host. You must manually VMotion it somewhere else before starting the stage process. The graphic below shows the error:
I have my DRS Cluster set to be Fully Automated. So yes, you guessed it – after moving the vCenter to my other ESX host (it’s a two node cluster BTW), DRS promptly tried to put it back on the ESX host I was trying to stage. So in the end, I was forced to use a VM rule, setting the VM to “manual”, so I could then keep the VM off the box. I’m a big fan of virtualizing vCenter – and this lack of integration/automation doesn’t add weight to my argument when I am trying to convince folks that a physical vCenter is a BAD IDEA…
Anyway, putting this aside – I attached the vCenter4.0 U1 DVD to my virtual vCenter – and cranked up the auto-run. First thing I noticed was a utility on the splash screen called the “Agent Pre-Upgrade Check”. I thought it was perhaps a smart move to let this baby run through and see if it flagged any issues up – before I jumped in with both feet!
This spins up and asks for your credentials for your vCenter. I had logged into windows using the DB Account configured for Windows Authentication.
After logging you in it will then scan either all the hosts or just the ones you select (standard or custom mode). Then it runs a “pre-check” – at the end of the process this leaves you with a report on whether your ESX hosts pass or fail the pre-check.
I was a bit disappointed that there were no warnings or errors (I’m a masochist), so I contacted folks via Twitter asking for their errors. One of my followers gave me his errors:
I find these reports very interesting – it says the host needs to be pre-patched before installing U1. I’m totally convinced my ESX needed pre-patching too (as you will see later) but I didn’t get the same warning. As usual, tools that tell you if you upgrade everything will-be-all-right sometimes just lie to you!
So… I clicked that I wanted to Install vCenter – and the first message I got was that this would actually perform an upgrade. After nexting my way through the usual suspects (EULA etc) I got to the DB part of the upgrade. The account I’d logged in with (vcdbuser-nyc) had rights to just one DSN, so there was no choice there.
I stopped briefly at this dialog box – remembering the installation I’d done a couple of months ago:
As I AM using Windows Authentication (not sure why various dialog boxes in many vendors’ documentation refer to NT Authentication), I could safely click next. It feels a bit weird just clicking next, if you are a habitual filler-in of dialog boxes (especially when they ask for usernames and passwords). The next message I received was one of those worrying warning boxes, which actually you shouldn’t worry about. It’s just a warning that the current install of VUM4.0 would be incompatible with vCenter4.0 U1, until I upgraded it. In this deployment – my vCenter/VUM are the SAME VM. I wouldn’t recommend that in the ‘real’ world. Normally, I do keep VUM separate and in separate databases (which I have done…)… Anyway, however you run your vCenter/VUM I think you get the same warning message…
Now, despite the fact that earlier in the process you were told the install would be an upgrade – you do get given the opportunity to ‘re-initialize’ (i.e. totally destroy) your existing vCenter DB. Fortunately, the dialog box is VERY clear, and the defaults are CORRECT. If you have been upgrading vCenter for some time you might know this particular part of the upgrade was always a bit notorious for quite opaque phraseology in the dialog box, and the wrong default. There are no excuses now if you choose the wrong option. Although quite why an upgrade needs to give you this functionality is a bit debatable.
After this I was asked to re-confirm my credentials for the Service Account that runs the vCenter Service. I was surprised to see this – as these privileges have already been assigned… I was given the opportunity to change the TCP Port numbers and then a big fat ‘Install’ button!
In short if I hadn’t been writing this blog post at the time of doing the upgrade it would have been a next-next-next (type in a password for the Service Account) and install exercise… If you know the install of vCenter well – you know there’s quite some time staring at status bars. In this time I decided to do some all-important filing. Of course, rather stupidly, I was doing this upgrade – whilst connected to the very vCenter I was upgrading. So mid-way through my Remote Console session hung, and the vSphere Client closed. It was time to crank up the vSphere Client on the host – to keep an eye on the upgrade.
The upgrade went pretty smoothly, except that for no apparent reason the ESXi host became disconnected (the ESX ‘Classic’ host remained connected to the vCenter system). I had to right-click the ESXi host in vCenter and choose “Connect”; this appeared to trigger a re-install (ED. surely upgrade?) of the vCenter Agent. This attempt to re-install the agent then failed, so I was left with an ESX host with running VMs on it – that I couldn’t connect to or upgrade. After this first failure I was presented with the “Add ESX host” wizard, which you normally see when connecting an ESX host to vCenter for the first time. I decided to proceed with this wizard to see what happened.
So in the screen grab above you can see the first Reconnect, which failed, followed by the second Reconnect when I was prompted for root credentials. It was at this stage – that I decided I would crank up the ILO on esx2.corp.com – and see if restarting the management network or management agents would allow me to reconnect to the ESX host. This restart didn’t seem to help. Still I could manage the ESX host directly without vCenter, so I decided to press on and do an upgrade of VMware Update Manager (VUM). In the back of my mind I was thinking I would kill the ESX host – thus forcing an HA event – which would at least get my VMs on to my other ESX host which was functioning. Those affected VMs were my vCenter and my domain controller. Nice! I don’t recall the Upgrade Checker telling me I would have this problem. This was, as you might expect, far from ideal. Fortunately, there were only two VMs on the box, and it was a lab environment. But I sat there thinking nasty unpleasant things about ESXi for some time.
Anyway, I wasn’t panicking – I figured I could do an upgrade of the ESXi host through the Host Update Utility – which has served me well in other upgrades surrounding ESXi…
With the VUM upgrade I got a similar message that the install would in fact upgrade VUM. I had to re-supply my password for vCenter which was painless – and do the same ‘next’ routine when asked about ‘NT Authentication’ as I had done in the vCenter upgrade. I confirmed I did indeed want to upgrade my VUM DB, and that I had taken a backup. Interestingly, this dialog box differs in look and feel from the similar dialog box for the vCenter DB:
Anyway, the VUM upgrade went through without a blip. I thought it was perhaps time to reload the vSphere Client – and get a client upgrade running and probably a VUM plug-in update as well. I was right, the first reload of the client triggered the upgrade of the vSphere Client:
I’ve had bad experiences with this update wizard in the past. Whatever you do (Run or Save) never ever CANCEL a client upgrade once it has started – you will probably find you will never be able to install the client again (or any other VMware product on that Windows instance). I chose to run the installer… There were a couple of somewhat perfunctory messages – like one saying it needed to close the client (which had never fully loaded anyway because of the upgrade message) before it could run the installer – until I finally got the upgrade message and the main installer executed. During this time I made sure I also updated the “Host Update Utility”. I had a feeling I wouldn’t get very far with a disconnected ESXi host in vCenter – and trying to upgrade it with VUM.
After the upgrade of the client – I was asked to do a reboot. That wasn’t a problem. My client is actually a View4 Windows XP desktop. All my kit is in a remote colocation facility. I have a backup TS and Citrix PS box in case it is unavailable. I’ve been toying with the idea of giving my co-location facility “remote hands” access into my vCenter to reboot VMs if they become unavailable.
The first load of the client created this cryptic error:
I figured this unhelpful error message – probably had to do with plug-ins which had perhaps become invalid since the upgrade of vCenter. Alongside the usual plug-ins, such as VUM, I have NetApp’s RCU and Virtual Console – and also EMC Storage Views installed as well. I wondered if these might need an upgrade as well. The dialog box opened a number of times, and I kept on acknowledging it – and fortunately the client did open. I must admit I was a bit worried for a second. Once the client had loaded, my next job was to update the VUM Plug-in from the plug-ins manager. I was hoping my EMC/NetApp plug-ins weren’t the culprits – and that the VUM plug-in upgrade would make this nasty message disappear.
Unfortunately, this wasn’t the case. My initial hunch was correct: the cryptic error messages were to do with these 3rd party plug-ins. Disabling the EMC/NetApp plug-ins made this error message go away. I don’t need/use the plug-ins all the time – so I just disabled them in the plug-in manager. I will be contacting both companies to get updated plug-ins once I have some free time. I did later return to the issue. It seems that the problem plug-in was the EMC Storage View. I’ve emailed my contacts from EMC to request an update. I’m not an official EMC customer and my PowerLinks login doesn’t allow me to download software/licenses willy-nilly, which is fair enough.
My much bigger worry was that my SRM Service had stopped during the upgrade – which is kind of to be expected because of the loss of the vCenter Service during the upgrade. The bigger worry was it wouldn’t start again. To be totally honest, it wasn’t the best of builds. I had set it up using Windows Authentication with a frigged configuration – and it took a while to get it working again. But in the back of my mind – I had a lurking doubt that it would need to be re-installed or re-built. If you remember at the beginning I was tempted not to do the upgrade because I was afraid that might happen. Anyway, for the moment that was the least of my woes. I needed to get the disconnected ESXi host up and running – AND also upgrade the Recovery Site – before I crossed that bridge. It was at this moment that I thought perhaps I should have checked the release notes to see if SRM4.0 was supported on vCenter 4.0 Update 1. Now, that would have been REALLY FUNNY if it wasn’t. There is a compatibility matrix for Update 1 – but it doesn’t mention SRM. Lo and behold, I found on the SRM page that the compatibility matrix for SRM 4.0 has been updated to state that Update 1 is supported. Phew!
So where was I at this stage? vCenter4 and VUM4 had been successfully upgraded – and so had my client. The only problem was my disconnected ESXi host. I’d been avoiding the situation by focusing on the client upgrade piece (because I had to) but now it was staring me in the face. So what to do….? Here was my plan. I would bounce the ESXi host, and hope that it would cause an HA event – transferring my VMs to the ESX host that was up (incidentally, despite it being disconnected – split-brain had not been triggered – which in a way wasn’t a bad thing). I was hoping the reboot would cause the ESXi host to rejoin vCenter – and then I could upgrade it and my other hosts with VUM. If that didn’t work, I would try the ‘Host Update Utility’ – and if that didn’t work – I would remove it from vCenter, do a factory reset – and then upgrade it – probably with the ‘Host Update Utility.’ The bounce would cause my domain controller and vCenter to be transferred – so I closed down my vSphere Client and opened it directly on the ESX hosts, because the first thing I would lose would be my vSphere Client connection to vCenter. Unfortunately, this idea failed at the first hurdle. The reboot of the ESXi host did not trigger HA (as it would normally), and my valuable VMs were ‘stuck’ on an ESXi host. I never did find out why HA didn’t trigger properly. I want to be generous – and blame myself for not validating HA functionality or some daft setting – rather than blame VMware.
However, I could manage the ESXi host directly – so my next step was to bring my domain controller and vCenter up again – and then PRAY that the reboot of the ESXi had once again made it manageable by vCenter… It hadn’t. So I tried (for the umpteenth time) to reconnect to it using vCenter. This time the ESXi host did reconnect to the vCenter.
Upgrading ESX Hosts
Well, by now it was 5pm. I’d been working on the setup of View4 and then the upgrade to Update 1 all day. But I was ready to start the ESX host upgrade. I have my DRS fully-automated but I prefer to enter maintenance mode manually – simply because I’ve seen maintenance mode go to 2% and hang so often – I’d rather make sure I have evacuated the host of VMs first before doing anything else.
For good measure (paranoia?) I did another scan and stage before hitting the remediate button. Before I did that I made a note of my build number:
It was: VMware ESX, 4.0.0, 175625
Why? Well, one of the things that irks me about VMware’s move to U1, U2, U3 and U4 numbering is the clients still give you build numbers, so you can’t quickly tell which update you’re running. So I want to compare the build number with the client to check to see if the upgrade has worked, and compare it to the build numbers on the VMware website.
Anyway, this pre-staging which had worked before on esx1.corp.com – failed when I tried it again with vSphere 4.0 Update 1.
Despite there being no new patches. I wondered if this was my own silly fault for trying to stage what had already been staged. There were no VMs on this host, so I chose to ignore these errors and whack the remediate button.
It was the next morning when I discovered that the remediate had failed. So I tried again and this time it worked. There is no reason for it that I can fathom. My build number had changed to:
VMware ESX, 4.0.0, 208167.
As you can see from the screen grab above my ESXi host (esx2.corp.com) is already in maintenance mode – and waiting to be patched. Yes, that’s right ESX ‘Classic’ and ESXi in the SAME cluster. Not a good idea, and certainly not something I would recommend. BUT, currently I have only 4 ESX hosts in two different vCenter environments – and simply don’t have enough boxes to validate the stuff I do, and also run two different flavors of ESX…
Despite being essentially the same ‘vmkernel’, ESXi does have different build numbers from its bigger and older brother. My ESXi was on build number 193498; after VUM had done its work this build number changed to VMware ESXi, 4.0.0, 208167. I must say – I felt I had to press that remediate button more than once to get it to work. Once again, I wonder if the pre-staging I did on this ESX host was the source of the problem. To tell you the truth I wasn’t watching the ESXi upgrade too closely, as I was engrossed in a DB error I had on my other site’s vCenter. When it came to upgrading my second site I watched this process MUCH more closely (via the ILO etc).
So here’s where I was now: my vCenter environment for “New York” (my protected site for SRM) was successfully upgraded to vSphere 4.0 U1. I still had an outstanding issue with the SRM service failing to start. But more importantly – I had the “New Jersey” location (my recovery site for SRM) to upgrade – and I had everything I have documented here to repeat there. So whilst esx2.corp.com was being upgraded, I began the process there. This would be a second attempt – a chance to see if the errors I had seen repeated themselves – and to see if I could make it a smoother experience because of what I had learned in New York.
Now the process was much the same – but I decided to do something slightly different. Firstly, I put my entire “infrastructure” of VMs on the ESX ‘Classic’ hosts – that was quite tricky because I was very short of RAM on that cluster. My reason for doing this was that should ESXi disconnect I would still have control over my infrastructure. In fact I was so low on memory – I had to triage my VMs and power off VMs which I did not need during the upgrade.
I also stopped the SRM Service in New Jersey – as there was no point in running it whilst vCenter was upgraded. I also disabled the EMC/NetApp plug-ins that had caused my heart to skip a beat on the first login of the vSphere Client. I didn’t pre-stage the ESX hosts with VUM 4.0 but decided to leave them alone – and stage/patch them with VUM 4.0 U1 instead. I was hoping that this might reduce some of the staging and remediation errors I’d seen with New York. Finally, I engaged the brain – and connected to my vCenter VM via the ESX host or RDP – rather than through vCenter itself. It had been rather silly of me to patch the very system I was connected to.
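Since the client only shows build numbers, a tiny lookup saves squinting at the VMware website every time. The sketch below is just that, a sketch: the table contains ONLY the build numbers I saw in my own lab during this upgrade (it is not an official or complete VMware list), and the function names are my own invention.

```python
import re

# Build numbers observed in this post only; NOT an official VMware list.
BUILDS_SEEN = {
    ("ESX", 175625): "4.0 before Update 1",
    ("ESXi", 193498): "4.0 before Update 1",
    ("ESX", 208167): "4.0 Update 1",
    ("ESXi", 208167): "4.0 Update 1",
}

# Matches the string the vSphere Client shows, e.g. "VMware ESX, 4.0.0, 175625"
BANNER = re.compile(r"VMware (ESXi?),\s*(\d+\.\d+\.\d+),\s*(\d+)")


def parse_banner(text):
    """Split the client's version string into (product, version, build)."""
    match = BANNER.search(text)
    if match is None:
        raise ValueError("unrecognised banner: %r" % text)
    product, version, build = match.groups()
    return product, version, int(build)


def describe(text):
    """Translate a banner into an update level using the table above."""
    product, _version, build = parse_banner(text)
    return BUILDS_SEEN.get((product, build), "unknown build %d" % build)
```

Feed it the string from before and after a remediate and you can see at a glance whether the host really moved to Update 1; anything not in the table comes back as unknown rather than a guess.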
Anyway, how did the second upgrade go? Well, I was hoping to see the same experience, or fewer problems. However, that didn’t happen. During the upgrade of vCenter I received a DB permissions error which I had not seen previously.
What is odd about this is the “created by another user” bit. That can’t be right. I specifically did the install using Windows Authentication – using an account called vcdbuser-nj. So my first instinct was this wasn’t a permissions issue. It just can’t be. Can it? The other thing was the reference to the job name “does not exist”. It made me wonder if the original vCenter install hadn’t created this job, and I couldn’t have rights to something that didn’t exist. It was time to dig around in Microsoft SQL 2005 to see if this job actually existed or not. I’m no SQL 2005 guru by any stretch of the imagination – but I have half a brain so I thought I would check it out. The job was there, so I thought I would check its permissions or “ownership”. I quickly discovered the ownership was different on the job that did the ‘Past Day’ roll-up from any of the other jobs. I’m not sure how that happened. There’s no other admin but me, and I don’t change permissions on a system created object like this – I don’t know enough to make arbitrary changes. The two screen grabs below show the differences:
So the logical thing to do was to change the owner and try again with the upgrade. This was successful. What’s more important to know is – given that I didn’t change this setting – does that imply a possible fault in the original vCenter4.0 installation? Is that fault mine or the installer’s? Anyway, according to my sources in VMware there is a KB article that describes this issue – I have been unsuccessful in locating it (although to be honest I didn’t try very hard).
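If you would rather check job ownership before the installer trips over it, something like the following would do. It is a hedged sketch, not what I actually ran: I am assuming you pull (job name, owner) pairs out of SQL yourself, for example with a query along the lines of SELECT name, SUSER_SNAME(owner_sid) FROM msdb.dbo.sysjobs, and the job names and the CORP domain prefix in the usage example are made up for illustration.

```python
def jobs_with_unexpected_owner(jobs, expected_owner):
    """Return the names of jobs NOT owned by expected_owner.

    jobs is a list of (job_name, owner_login) pairs, however you got them.
    Windows logins are compared case-insensitively; a missing owner
    (None) always counts as unexpected.
    """
    expected = expected_owner.lower()
    return [name for name, owner in jobs
            if (owner or "").lower() != expected]


# Hypothetical rows for illustration only; your job names will differ.
sample_jobs = [
    ("Past Day rollup (hypothetical name)", "sa"),
    ("Past Week rollup (hypothetical name)", "CORP\\vcdbuser-nj"),
    ("Past Month rollup (hypothetical name)", "corp\\vcdbuser-nj"),
]
suspects = jobs_with_unexpected_owner(sample_jobs, "CORP\\vcdbuser-nj")
```

In my case the list would have contained exactly one entry, the ‘Past Day’ roll-up job, which is the one whose owner I then changed by hand.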
Anyway, after the upgrade to vCenter 4.0 U1, once again I saw the ESXi host had disconnected. So I’m regarding this as an (un)known issue – as I have seen it happen twice. I found a reboot and reconnect worked, which (re)installed the vCenter Agent. I’m convinced there has to be a better way of doing this – I’m worried how ESXi-only customers might react to the upgrade of vCenter disconnecting their hosts in this way…
The interesting thing about the remediate process this time was this. On first scan/remediate vCenter said there were some 40-odd patches for ESXi. That’s quite high I thought. The remediate went to 33% and then failed due to timeout. The error was VERY similar to the previous errors I had seen. For good measure I tried the remediate again, and this time the number of patches to be applied had gone down to just two. I did another scan and stage – and checked what these two patches were. This time the patches were purely for the U1 release. It seems as if the ESXi 4.0.0 host was patched first, and once that initial pre-patch had been successful(ish) it was then ready to apply the update. A little of “if at first you don’t succeed – try, try and try again!”
Every time I tried to stage or apply this U1 update – it failed. So I decided to reboot the ESXi host (because it looked like it had been patched) to give the system a chance to activate the patches that had already been applied. After the reboot – I carried out the remediate task again – and this time it was successful. It would be nice to hear from folks who can reproduce the same experience.
So where was I now? Well, I now had two vSphere 4.0 U1 environments upgraded… two emails (one to NetApp and the other to EMC) asking if they had updated plug-ins for vSphere4.0 U1 – and two SRM Servers that needed repairing. The repair of the SRM was very easy. Just open up Add/Remove Programs – and click the ‘Change’ button next to VMware Site Recovery Manager – then select the Repair option.
Then I re-supplied the username/password credentials that allow SRM to talk to vCenter. I had to confirm the SHA-1 thumbprint and select which certificate model I was using – that was an easy task of selecting “Use Existing Certificate”. Finally, I had to provide my DB credentials for SRM to connect to the SQL backend. As with vCenter, I had the opportunity to indicate that I just wanted to use my existing database, rather than “re-initializing” it. After the repair I was able to get the SRM plug-in to connect to the SRM Service. I had a couple of errors – but I think these were caused by the Recovery Site SRM Service not yet being repaired.
VMware Tools Update:
You know what, I almost completely forgot about this issue! But I was quickly reminded during the SRM fix that VMware Tools was out of date. In the past, VMware Tools upgrades have completely shafted the VM’s static IP settings. Sometimes this causes a new “Local Area Connection” to be created, and subsequent loss of IP details. So the first thing I did was take a non-essential VM and do an ipconfig /all > ipdetails.txt to the desktop of the VM so I would have the complete record beforehand. Then I did a manual VMware Tools update to it to see what happened. I have a feeling that different guest operating systems will react differently to the VMware Tools update. So let me qualify my experiences by saying the only guest operating systems I run regularly inside my vSphere lab are Windows 2003 (32-Bit) Enterprise R2 and Windows XP Professional SP3. The Windows 2003 builds all use the latest vmxnet3 driver, which was released in vSphere4. I did a successful upgrade of VMware Tools on Windows 2003 and I did not lose my IP settings.
I used different strategies for upgrading VMware Tools on different VMs. I have some VMs which are just play VMs, and I really don’t care what order they come up and go down in. With these I used a VUM Baseline designed for upgrading VMware Tools. It was the same baseline I used in my upgrade from VI3 to vSphere4. There are some other VMs I care a little bit more about but I knew I could tolerate their reboot without being disconnected from my co-location. For these, I used the right-click option to automatically upgrade VMware Tools. This left a small portion of VMs where I wanted to strictly control the reboot process; these included:
- The SQL DBs for the New York and New Jersey Sites
- The vCenters for the New York and New Jersey Sites
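If you capture ipconfig /all before AND after the Tools upgrade, comparing the two files is easy to automate. The following is a rough sketch under my own assumptions: it only looks for lines labelled “IP Address” (the XP/2003 wording) or “IPv4 Address” (the wording on newer Windows), and the addresses in the usage example are invented.

```python
import re

# Matches e.g. "   IP Address. . . . . . . . . . . . : 192.168.3.10"
# Lines starting "Autoconfiguration IP Address" are deliberately NOT matched,
# so a fallback 169.254.x.x address does not count as a real address.
ADDRESS_LINE = re.compile(
    r"^\s*(?:IP|IPv4) Address[ .]*:\s*([0-9.]+)", re.MULTILINE)


def ip_addresses(ipconfig_text):
    """Return the set of addresses found in saved ipconfig output."""
    return set(ADDRESS_LINE.findall(ipconfig_text))


def lost_addresses(before_text, after_text):
    """Addresses present before the upgrade but missing afterwards."""
    return sorted(ip_addresses(before_text) - ip_addresses(after_text))


# Invented sample output for illustration only.
before = ("   IP Address. . . . . . . . . . . . : 192.168.3.10\n"
          "   Subnet Mask . . . . . . . . . . . : 255.255.255.0\n")
after = "   Autoconfiguration IP Address. . . : 169.254.10.20\n"
missing = lost_addresses(before, after)
```

An empty result means the static addresses survived, which is what I saw on my Windows 2003 VMs; anything in the list is an address you will need to re-enter from ipdetails.txt.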
Conclusions
Well, I’m at the end of the upgrade process – and I thought it would be interesting/useful to draw together some caveats from the experience.
- Beware of the 3rd Party Agents problem: http://kb.vmware.com/kb/1016070
- Always connect to virtual vCenter by using the ESX Host – unless, like the idiot I am, you enjoy being disconnected in the middle of an upgrade!!!
- Beware with ESXi: upgrades of vCenter can cause host disconnects
- Prior to upgrades – you might as well stop dependent services such as Site Recovery Manager
- Prior to upgrades – you might as well disable other vendors’ plug-ins
- Prior to remediating an ESX host – confirm the virtual vCenter is not running on the host to be upgraded and that it has been moved to another ESX host
- If staging/remediation fails – try, try and try again.
- Watch out for mangled permissions on the vCenter DB. Specifically, confirm that the Jobs are owned by the account you are doing the upgrade with…