With today’s exciting release of VMware vSphere 6.5, I thought I’d celebrate by upgrading the vCenter Server Appliance (VCSA) in my lab from version 6.0 U2 to version 6.5.  Since I’m doing this more or less blind, without having read any documentation whatsoever, I thought I would write a (hopefully short) post about the problems that arise during the upgrade process, and how to get past them.

Before we get started, I should point out that I’m running all of this from a Windows 10 workstation, and my lab deployment is extremely compact: a VCSA with an integrated PSC, plus a separate VUM instance running on Windows Server 2012 R2.  VCSA 6.5 includes an integrated VUM component, which I’m looking forward to seeing.

The VCSA 6.5 upgrade is a two-stage process.  The first stage involves creating the new VCSA VM and getting it on the network.  The second stage involves migrating data from the old VCSA to the new one.  By the end of this post, a new VCSA 6.5 appliance with an integrated PSC and VUM should have replaced my original VCSA, and the original VCSA will be shut down and ready to delete.  If your topology is more complex than mine (highly probable if you’re performing this upgrade outside of a small lab environment), your upgrade steps and outcomes may vary.

Starting the upgrade

Assuming you’ve downloaded the new VCSA 6.5 ISO image from the VMware website, mount the ISO and run installer.exe, located at \vcsa-ui-installer\win32\ on the mounted ISO.  This will open the snazzy new VCSA 6.5 installer, which no longer requires any browser plugins.  The Upgrade option is what I’m looking for today.

vsphere-6-5-upgrading-your-vcsa-img1

Note: during the upgrade process, it’s a good idea to make sure DRS is not set to Fully Automated, so that the source and destination VCSA VMs are not migrated around the cluster by DRS mid-upgrade.

Hurdle 1 – backing up the VCSA database

The first step of the upgrade wizard advises us to back up all data on the appliance before starting the process.  Luckily, backing up the data on a VCSA is extremely easy, and is documented in KB2091961.  Note that the process described in this KB article is only supported for restoring the VCSA’s integrated configuration database onto the same appliance the backup was taken from.  Trying to restore the database to another VCSA probably won’t work.

vsphere-6-5-upgrading-your-vcsa-img2

As per the documented steps:

  1. Download 2091961_linux_backup_restore.zip from the bottom of the KB article and extract the backup_lin.py script to /tmp on your VCSA using WinSCP.
  2. Log in to your VCSA via SSH.
  3. Run the following command to make the backup_lin.py script executable: chmod 700 /tmp/backup_lin.py.
  4. Run the backup using the following command: python /tmp/backup_lin.py -f /tmp/backupVCDB.bak.  You will be notified when the backup completes successfully.
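For reference, the whole sequence from the SSH session boils down to something like this (assuming the script landed in /tmp and you’re logged in as root):

    # Make the backup script executable
    chmod 700 /tmp/backup_lin.py
    # Dump the embedded vCenter database to a backup file in /tmp
    python /tmp/backup_lin.py -f /tmp/backupVCDB.bak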

vsphere-6-5-upgrading-your-vcsa-img3

Job done.  The backupVCDB.bak file will be stored in the /tmp directory, and because I’m running in a small lab environment mine was only 8MB in size.  Hopefully we won’t have to use it…

Hurdle 2 – the migration assistant

At stage 3 of the migration process, after entering the correct details for my vCenter appliance and my ESXi host, I was greeted with the following error message: Unable to retrieve the migration assistant extension on source vCenter Server.  Make sure the migration assistant is running on the VUM server.

vsphere-6-5-upgrading-your-vcsa-img4

I looked on the mounted ISO and noticed a folder called \migration-assistant containing an .exe.  Looks promising.  Since my VUM server was running within my vSphere cluster, I mounted the ISO via the VMware Remote Console, browsed to that directory, and ran VMware-Migration-Assistant.exe.  This launches a script that asks for the password of the Windows account you’re logged in with.  That Windows account must have permissions in vCenter – I’m not sure exactly which permissions are required, but my account is an administrator in vCenter.

The migration assistant will then run through some prechecks, give you a brief summary of what is about to happen, and tell you to leave the window open until the upgrade has finished.

vsphere-6-5-upgrading-your-vcsa-img5

You will then be able to return to the vCenter Server Appliance Installer and click “next” to continue the process.  Onwards!

Hurdle 3 – deployment size

I’ll immediately admit that this isn’t a hurdle so much as a decision… and a bit of an odd one.  The VCSA installer has always allowed us to choose a deployment size, which simply configures the VCSA with a certain amount of CPU and memory resources to support a specified number of hosts and/or VMs.  Step 6 of the 6.5 installer now also allows us to select a storage size, which according to the description simply allocates more space to the SEAT (stats, events, alarms, tasks) data storage partition.  The reason I call this a bit of an odd decision is that when you set the storage size to "default", the smallest deployment size available is "small":

vsphere-6-5-upgrading-your-vcsa-img6

But when you set the storage size to “large”, it will allow you to specify an even smaller deployment size of “tiny”, albeit one which requires nearly 3x more storage than the Small/Default deployment:

vsphere-6-5-upgrading-your-vcsa-img7

So: a Tiny deployment, which supports fewer hosts and VMs and requires less CPU and RAM, needs nearly 3x more storage than a Small/Default deployment.  As I said, it seems odd.

Hurdle 4 – network configuration

If you get to step 8 of the wizard and notice the network drop-down is empty, it’s probably because non-ephemeral dvportgroups aren’t supported.  This has always been the case, but it’s easy to forget.  Simply create a new dvportgroup on your chosen dvswitch with a port binding method of "Ephemeral – no binding", configure the relevant port settings for your network subnet, then return to the VCSA installer and skip back to step 7 before progressing forward again to step 8.  Once the upgrade is complete, don’t forget to migrate the new VCSA to the correct dvportgroup and delete the one that was just created.

Hurdle 5 – VMware Enhanced Authentication Plugin doesn’t work

After the upgrade had completed and I was greeted with the login page for the vSphere Web Client, I was prompted to install the VMware Enhanced Authentication Plugin.  This plugin enables the "Use Windows session authentication" option for logging in to the web client, and is the updated version of the old vSphere 6.0 Client Integration Plugin.  The problem was that after installing the plugin and restarting my browser, I was still being prompted to install it.  This was occurring in both Chrome and Internet Explorer, so I knew it wasn’t browser-specific.  Restarting my workstation didn’t help.  What did help was uninstalling the Enhanced Authentication Plugin, the VMware Plug-in Service, and the old Client Integration Plugin.  I then reinstalled the Enhanced Authentication Plugin, and upon relaunching the vSphere Web Client I was given the correct prompts to allow the plugin to launch.  Problem solved!

Conclusion

After ploughing through those issues, I’m happy to report that I’ve got a happy and functional VCSA 6.5 appliance.  Hopefully this post helps a few people out there with some niggling “early-adopter” issues when deploying their appliance.  There are plenty of really awesome features in vSphere 6.5, and I can’t wait to expand on a few of these features in upcoming posts.

This is an (extremely) quick post covering the steps required to decommission a Platform Services Controller (PSC) or vCenter Server from the vSphere single sign-on (SSO) domain.  The steps below are for a VCSA; the steps for a Windows vCenter Server are very similar, and are contained in the VMware KB article I used as a reference for this post: KB 2106736.

Decommission a PSC

    1. Ensure no vCenter server instances are using the PSC that is to be decommissioned.  Instructions on how to query which PSC a vCenter instance is pointing to and subsequently repoint it are listed in my post here.
    2. Shut down the PSC.
    3. Connect to another PSC in the same SSO domain, either by SSH or using the console.  Enter the shell.
    4. Run the cmsso-util unregister command (an example follows this list).
    5. Remove the decommissioned PSC from the vSphere inventory.
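The exact syntax for the unregister command in step 4 is documented in KB 2106736; it should look roughly like this, with psc01.lab.local standing in for the FQDN of the PSC being decommissioned:

    # Run from the shell of a remaining PSC in the same SSO domain
    cmsso-util unregister --node-pnid psc01.lab.local --username administrator@vsphere.local --passwd 'SSO_password'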

Once these steps have been completed you can verify via the vSphere Web Client that the PSC has been decommissioned successfully by navigating to Administration > System Configuration > Nodes and ensuring that the decommissioned PSC is not present in the list of nodes.

Decommission a vCenter Server (VCSA)

    1. Query the to-be-decommissioned vCenter server to identify the PSC it’s pointing to.  Instructions on how to query the PSC vCenter is pointing to are listed in my post here.
    2. Connect to the PSC the VCSA is pointing to, either by SSH or using the console.  Enter the shell.
    3. Run the cmsso-util unregister command (an example follows this list).
    4. Power off the VCSA and remove it from the inventory.
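Again, the exact syntax for step 3 is in KB 2106736; it should be along these lines, with vc01.lab.local standing in for the FQDN of the vCenter Server being decommissioned:

    # Run from the shell of the PSC the decommissioned vCenter was pointing to
    cmsso-util unregister --node-pnid vc01.lab.local --username administrator@vsphere.local --passwd 'SSO_password'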

If you have multiple vCenter instances in a single SSO domain and you have just decommissioned one (or more), you may need to log out and log back into the vSphere Web Client before the decommissioned instance(s) disappear from the vSphere inventory tree.

Here’s a quick guide on how to query and change the Platform Services Controller (PSC) being used by vCenter.  Querying for the in-use PSC is possible on vCenter 6.0, but changing the PSC is only possible on 6.0 Update 1 or newer.  Note that I performed these steps on the vCenter Server Appliance (VCSA), and while I have also included some commands for a Windows-based vCenter server, I haven’t tested them myself.

Query the PSC being used by vCenter Server

There are two ways to identify this information: via the appliance console or an SSH session, and via the vSphere Web Client.

Option 1: via the appliance console or SSH session
On a VCSA
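The vmafd-cli tool reports the lookup service URL, which identifies the PSC in use.  On the 6.0 appliance it lives under /usr/lib/vmware-vmafd/bin, so the query looks something like this:

    /usr/lib/vmware-vmafd/bin/vmafd-cli get-ls-location --server-name localhost
    # Returns something like https://psc01.lab.local/lookupservice/sdk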

On a Windows vCenter
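On Windows, the same tool sits under the vCenter Server installation directory.  As noted above I haven’t tested this myself, but it should be along these lines (assuming the default install path):

    "C:\Program Files\VMware\vCenter Server\vmafdd\vmafd-cli.exe" get-ls-location --server-name localhost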

Option 2: via the vSphere Web Client

In the vSphere Web Client, navigate to the vCenter Server’s Advanced Settings (vCenter > Manage > Settings > Advanced Settings) and look for a property called "config.vpxd.sso.admin.uri".  The value of this property is the PSC that vCenter is currently using.

quick-post-query-and-change-the-platform-services-controller-being-used-by-vcenter-server-6-0-imga

Change (repoint) the PSC being used by vCenter Server

This step is a bit more involved, as the procedure depends on whether you are repointing between PSCs within a single SSO site, repointing between SSO sites, or moving from an embedded PSC to an external PSC, and also on whether you are using a VCSA or a Windows-based vCenter Server.  For this reason I’ll link directly to the VMware documentation for each scenario.

Option 1: Repointing within a site (KB 2113917).

    1. Connect to the VCSA console, or via SSH.
    2. Enable the shell (if necessary) and enter it.
    3. Run the vmafd-cli set-dc-name command (see the example after this list).
    4. Restart the vCenter services.
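A rough sketch of steps 3 and 4 on a VCSA, with new-psc.lab.local as a placeholder for the PSC you’re repointing to (the full procedure is in KB 2113917):

    # Point vCenter at the new PSC
    /usr/lib/vmware-vmafd/bin/vmafd-cli set-dc-name --server-name localhost --dc-name new-psc.lab.local
    # Restart all services so vCenter picks up the change
    service-control --stop --all
    service-control --start --all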

Option 2: Repointing between sites (KB 2131191).

Review the KB article for a full set of steps.

Option 3: Repointing from an embedded PSC to an external PSC

See Reconfigure vCenter Server with Embedded Platform Services Controller to vCenter Server with External Platform Services Controller in the vSphere 6.0 documentation center.

As VMware continues to push in the direction of Unix-based appliances for their vSphere management components, those of us without a Unix background (like myself) are having to come to grips with the Unix versions of common administrative tasks.  Increasing the disk size on a vCenter Server Appliance (VCSA) is one such task.  In vCenter 6.0, VMware introduced Logical Volume Management (LVM), which really simplifies the process of increasing the size of a disk and allows it to be done while the appliance is online.  VMware KB 2126276 covers all the steps required to increase the size of a disk, but this guide will cover them in slightly more detail.

Step 1: identify which disk (if any) has a problem with free space.

To do this, I connect to the appliance via SSH or the console, enable and enter the shell, and use the df -h command.
More information on using command-line tools to manage disk space can be found in my post Useful Unix commands for managing disk space on VMware appliances.
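From the default appliance shell, that boils down to something like:

    shell.set --enabled true    # enable the BASH shell if it isn't already
    shell                       # enter the BASH shell
    df -h                       # show usage for each mounted filesystem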

increasing-the-disk-size-on-a-vcenter-server-appliance-in-vsphere-6-0-a

I can see that both /storage/core and /storage/log are 100% used.  I’m guessing that /storage/core is full of vpxd crashdumps, generated because vCenter keeps crashing after being unable to write logs to /storage/log.  Based on this guess, I’ll increase the size of /storage/log and then manually delete the crashdumps on /storage/core and monitor the situation.  I won’t cover the steps involved in deleting the vpxd crashdumps in this post, but it basically involves deleting the core.vpxd.* and *.tgz files in the /storage/core directory.

Step 2: Increase the size of the affected disk using the vSphere Web Client

The table in VMware KB 2126276 tells me that the disk mounted to /storage/log is VMDK5.  The way this is presented is a bit confusing in my opinion, because the disk we’re looking for is listed as hard disk 5 in the web client, but the filename of the disk is vmname_4.vmdk (the numbering of virtual disks is thrown out in this way because hard disk 1 is vmname.vmdk, and hard disk 2 is vmname_1.vmdk).  Where the KB article says "VMDK5", it really just means "the fifth VMDK file".

The reason my /storage/log disk filled up is that I’d increased the logging levels on my vCenter appliance to try to catch an issue that had been occurring.  Because of the increased volume of logs being generated, I’m going to increase the size of this VMDK to 25GB.  I don’t want to go overboard, because the disks are thick provisioned by default.

increasing-the-disk-size-on-a-vcenter-server-appliance-in-vsphere-6-0-b

Step 3: expand the logical drive and confirm that it has grown successfully

Return to the SSH session and expand the logical drive(s) that have been resized.  The following command will expand any disks that have had their vmdk files resized.
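Per KB 2126276, that command is vpxd_servicecfg with its LVM autogrow option:

    # Grow any logical volumes whose backing VMDK files have been enlarged
    vpxd_servicecfg storage lvm autogrow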

If the operation is successful, you should see a message similar to the following.
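The success indicator is simply a zero result code, something along the lines of:

    VC_CFG_RESULT=0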

In my case, I did get that message eventually, but I also got a bunch of errors relating to /storage/core along the way.

I saw those errors because my /storage/core disk was 100% used.  As mentioned, I’m going to free up space on that drive manually, so I’ll ignore them for now.

If I run df -h again, I can see that /storage/log is now 25GB in total size.  Job done!

increasing-the-disk-size-on-a-vcenter-server-appliance-in-vsphere-6-0-c

Note: In the vCenter 5.x appliance, increasing disk sizes was a bit of a pain. The operation had to be performed while vCenter was offline, and involved adding a brand new disk, copying files from the old disk to the new one, and editing mount points.  For anyone who is working with a vCenter 5.x appliance, the steps are in KB 2056764.

If you’ve set up a vCenter 6.0 appliance or a Platform Services Controller and tried to connect via WinSCP, you will have noticed the following error:

Host is not communicating for more than 15 seconds.  Still waiting…

resolving-the-host-is-not-communicating-for-more-than-15-seconds-error-when-connecting-to-a-vsphere-6-0-appliance-with-winscp-a

This error arises because vSphere 6.0 appliances now come with two shells: the appliance shell (which is the default shell for the root user), and the BASH shell.  WinSCP throws the above error when the root user is configured to use the appliance shell.  The error is easily resolved by configuring the root user to use the BASH shell.

To do so, connect to the appliance via SSH or the console and enter the following commands:
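The commands, per KB 2100508, run from the default appliance shell as root, are roughly these:

    shell.set --enabled true    # allow access to the BASH shell
    shell                       # enter the BASH shell
    chsh -s /bin/bash root      # make BASH the default shell for root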

Problem solved!

After you’ve done what you need to do with WinSCP, you can change the default shell back easily enough using the command below; however, I’m not aware of any downside to leaving it set to the BASH shell (and the upside is that you won’t need to manually change the shell every time you want to connect with WinSCP).
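Reverting is a one-liner that points root back at the appliance shell (path as per the 6.0 appliance):

    chsh -s /bin/appliancesh root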

VMware published a KB article (KB 2100508) on the subject in March 2015, but if you’re seeing this error for the first time, chances are you have no idea what the root cause is, so good luck finding the solution through Google.  Hopefully this helps!