I recently had a storage network outage in my lab environment, and after powering back on my vCenter Server Appliance (VCSA) I was rudely greeted with the following information at the console:

fixing-disk-corruption-in-the-vcenter-server-appliance-img01

Ouch!  I’ve never dealt with file system corruption in a VCSA before, and the internet doesn’t seem to contain much information on what to do next.  This post is my effort towards changing that.

The first thing I did was type “journalctl” to view the system logs, as suggested.  That displayed some additional information on the issue.

fixing-disk-corruption-in-the-vcenter-server-appliance-img02

It suggests running fsck manually, without the -a or -p options.  My background is heavily Windows-focused and I’d never used fsck before, but I eventually figured out the required syntax, which is to run fsck against the damaged partition listed in the top line of the previous screenshot, i.e. fsck /dev/disk/by-partuuid/79d76ed0-0297-4e33-a6bd-252099f2c613

Thankfully, I was able to use tab completion to save me from typing (and probably mis-typing) the entire partition UUID myself.  Running that command kicked off the fsck process and prompted me to fix probably 10-20 different errors:

fixing-disk-corruption-in-the-vcenter-server-appliance-img03

After completing all that, I got a success message.  It might be a good idea to run the whole thing twice, just to be sure.

fixing-disk-corruption-in-the-vcenter-server-appliance-img04

After that, I rebooted my vCenter appliance, and thankfully it booted normally!  I hope this post helps someone out there who is facing this issue.
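
To recap, the commands involved were roughly the following (a sketch from memory; the partition UUID shown is the one from my screenshots, and yours will be different):

journalctl
fsck /dev/disk/by-partuuid/79d76ed0-0297-4e33-a6bd-252099f2c613
reboot

Answer the prompts fsck gives you for each error it finds, and reboot once it reports success.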

 

In yesterday’s post, I updated my VCSA 6.0 appliance to version 6.5.  Today, I logged into the Appliance MUI and noticed that my appliance was not able to check for updates using the default web repository.

Before we really start, a quick note on terminology.  The Appliance MUI (Appliance Management UI) is the new name for the old VAMI (vSphere Appliance Management Interface).  The MUI is an HTML5 web interface for configuring basic and low-level settings for the VCSA.  It’s accessible by connecting to your VCSA on port 5480.

So, what’s the deal?  Well, when browsing to the update section of the MUI and checking for updates, I would receive the following error:

how-to-get-vcenter-updates-working-through-the-appliance-mui-using-a-proxy-img01

My proxy config was the first thing I checked.  You can check the proxy in the MUI under Networking > Manage > Proxy settings.  Sure enough, my config was correct for my environment.

how-to-get-vcenter-updates-working-through-the-appliance-mui-using-a-proxy-img02

I also knew that my proxy was working correctly and was contactable by my VCSA, because I was able to use the proxy for the VCSA 6.5’s integrated update manager to communicate with the default web-based patch repositories.

I did a web search and came across an awesome post which contained the solution, written by a vExpert named Mario.  Full credit to him – the solution to this problem is his and not mine.  I encourage you to check out his blog.

So, it turns out that when you configure a proxy via the MUI, it only configures the proxy as an HTTP proxy.  As Mario points out (as does the GUI in the screenshot above, if you’re paying attention), the update repository is an HTTPS address, so the proxy configured via the MUI won’t apply.  Why doesn’t the proxy you configure apply to both HTTP and HTTPS connections, I hear you ask?  I haven’t got a clue.

The solution is surprisingly simple, once you’ve got some instructions and know where to look (which I didn’t).  Simply edit the /etc/sysconfig/proxy file via SSH or WinSCP (which is the option I took), and manually configure the HTTPS proxy – taking care, obviously, to adjust the URL from HTTP to HTTPS (highlighted).

how-to-get-vcenter-updates-working-through-the-appliance-mui-using-a-proxy-img03
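
In other words, the relevant lines in /etc/sysconfig/proxy end up looking roughly like this, assuming a hypothetical proxy at proxy.example.com on port 3128 (substitute your own proxy address and port):

HTTP_PROXY="http://proxy.example.com:3128"
HTTPS_PROXY="https://proxy.example.com:3128"

The HTTPS_PROXY line is the one you add manually, and note the https:// scheme on its URL.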

After saving the changes, return to the updates section of the MUI and try checking for updates again.  Boom!  Thanks, Mario!

how-to-get-vcenter-updates-working-through-the-appliance-mui-using-a-proxy-img04

 

With today’s exciting release of VMware vSphere 6.5, I thought I’d celebrate by upgrading the vCenter Server Appliance (VCSA) in my lab from version 6.0 U2 to version 6.5.  Since I’m doing this more or less blind, without having read any documentation whatsoever, I thought I would write a (hopefully short) post about the problems that arise during the upgrade process, and how to get past them.

Before we get started, I should point out that I’m running all this stuff from a Windows 10 workstation, and my lab VCSA is an extremely compact deployment consisting of a VCSA with an integrated PSC, and a separate VUM instance running on Windows Server 2012 R2.  The VCSA 6.5 contains an integrated VUM component, which I’m looking forward to seeing.

The VCSA 6.5 upgrade is a two-stage process.  The first stage involves creating the new VCSA VM and getting it on the network.  The second stage involves migrating data from the old VCSA to the new one.  By the end of this post, a new VCSA 6.5 appliance with an integrated PSC and VUM should have replaced my original VCSA, and the original VCSA will be shut down and ready to delete.  If your topology is more complex than mine (which is highly probable if you’re performing this upgrade outside of a small lab environment), your upgrade steps and outcomes may vary from mine.

Starting the upgrade

Assuming you’ve downloaded the new VCSA 6.5 ISO image from the VMware website, mount the ISO and run installer.exe, located on the mounted ISO at \vcsa-ui-installer\win32\.  This will open the snazzy new VCSA 6.5 installer, which no longer requires any browser plugins.  The Upgrade option is what I’m looking for today.

vsphere-6-5-upgrading-your-vcsa-img1

Note: during the upgrade process, it’s a good idea to ensure DRS is not set to fully automated.  This will ensure the source or destination VCSA VMs will not be migrated around the cluster by DRS during the upgrade process.

Hurdle 1 – backing up the VCSA database

The first step of the upgrade wizard advises you to back up all data on your appliance before you start the process.  Luckily, backing up the data on a VCSA is extremely easy, and is documented in KB2091961.  Note that the process described in this KB article is only supported for restoring the VCSA’s integrated configuration database to the same appliance the backup was taken from.  Trying to restore the database to another VCSA probably won’t work.

vsphere-6-5-upgrading-your-vcsa-img2

As per the documented steps:

  1. Download 2091961_linux_backup_restore.zip from the bottom of the KB article and extract the backup_lin.py script to /tmp on your VCSA using WinSCP.
  2. Login to your VCSA via SSH.
  3. Run the following command to make the backup_lin.py script executable: chmod 700 /tmp/backup_lin.py.
  4. Run the backup using the following command: python /tmp/backup_lin.py -f /tmp/backupVCDB.bak.  You will be notified when the backup completes successfully.

vsphere-6-5-upgrading-your-vcsa-img3

Job done.  The backupVCDB.bak file will be stored in the /tmp directory, and because I’m running in a small lab environment mine was only 8MB in size.  Hopefully we won’t have to use it…
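
For reference, the whole backup boils down to two commands in the SSH session (assuming backup_lin.py has already been copied to /tmp):

chmod 700 /tmp/backup_lin.py
python /tmp/backup_lin.py -f /tmp/backupVCDB.bak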

Hurdle 2 – the migration assistant

At stage 3 of the migration process, after entering the correct details for my vCenter appliance and my ESXi host, I was greeted with the following error message: Unable to retrieve the migration assistant extension on source vCenter Server.  Make sure the migration assistant is running on the VUM server.

vsphere-6-5-upgrading-your-vcsa-img4

I looked on the mounted ISO and noticed a folder called \migration-assistant which contained an .exe.  Looks promising.  Since my VUM server was running within my vSphere cluster, I mounted the ISO via the VMware Remote Console, browsed to that directory, and ran VMware-Migration-Assistant.exe.  That launches a script which asks you to supply the password for the Windows account you’re logged in with.  That Windows account must have permissions in vCenter – I’m not sure exactly which permissions are required, but my account is an administrator in vCenter.

The migration assistant will then run through some prechecks, give you a brief summary of what is about to happen, and tell you to leave the window open until the upgrade has finished.

vsphere-6-5-upgrading-your-vcsa-img5

You will then be able to return to the vCenter Server Appliance Installer and click “next” to continue the process.  Onwards!

Hurdle 3 – deployment size

I’ll immediately admit that this isn’t a hurdle so much as a decision… and a bit of an odd one.  The VCSA installer has always allowed us to choose a deployment size, which simply configures the VCSA with a certain amount of CPU and memory resources to support a specified number of hosts and/or VMs.  Step 6 of the 6.5 installer now also allows us to select a storage size, which according to the description will simply allocate more space to the SEAT (stats, events, alarms, tasks) data storage partition.  The reason I said this is a bit of an odd decision is that when you set the storage size to “default”, the smallest deployment size available is “small”:

vsphere-6-5-upgrading-your-vcsa-img6

But when you set the storage size to “large”, it will allow you to specify an even smaller deployment size of “tiny”, albeit one which requires nearly 3x more storage than the Small/Default deployment:

vsphere-6-5-upgrading-your-vcsa-img7

So: a Tiny deployment size, which supports fewer hosts and VMs and requires less CPU and RAM, needs nearly 3x more storage than a Small/Default deployment.  As I said, it seems odd.

Hurdle 4 – network configuration

If you get to step 8 of the wizard and you notice the network drop down is empty, it’s probably because non-ephemeral dvportgroups aren’t supported.  This has always been the case, but it’s easy to forget.  Simply create a new dvportgroup on your chosen dvswitch with a port binding method of Ephemeral – no binding, configure the relevant port settings for your network subnet, then return to the VCSA installer and skip back to step 7 before progressing forward again to step 8.  Don’t forget to migrate your new VCSA to the correct dvportgroup and delete the one that was just created.

Hurdle 5 – VMware Enhanced Authentication Plugin doesn’t work

After the upgrade had completed and I was greeted with the login page for the vSphere Web Client, I was prompted to install the VMware Enhanced Authentication Plugin.  This plugin allows you to use the “Use Windows session authentication” option to log in to the web client, and is the updated version of the old vSphere 6.0 Client Integration Plugin.  The problem was that after installing the plugin and restarting my browser, I was still prompted to install the plugin after reopening.  This was occurring in both Chrome and Internet Explorer, so I knew it wasn’t browser-specific.  Restarting my workstation didn’t help.  What did help was uninstalling the Enhanced Authentication Plugin, the VMware Plug-in Service, and the old Client Integration Plugin.  I then reinstalled the Enhanced Authentication Plugin, and upon relaunching the vSphere Web Client I was given the correct prompts to allow the plugin to launch.  Problem solved!

Conclusion

After ploughing through those issues, I’m happy to report that I’ve got a happy and functional VCSA 6.5 appliance.  Hopefully this post helps a few people out there with some niggling “early-adopter” issues when deploying their appliance.  There are plenty of really awesome features in vSphere 6.5, and I can’t wait to expand on a few of these features in upcoming posts.

A vMSC – perhaps more commonly known as a “metro cluster” – is an architecture in which individual vSphere clusters are spread across multiple geographical sites.  Since a vSphere cluster requires shared storage to allow VMs to migrate across hosts, in a vMSC environment this means that storage must be shared or replicated across the geographical sites.  As you might expect, this kind of architecture comes with a number of gotchas and limitations, especially around the configuration of the storage arrays.  For this reason, storage vendors who support vMSC architectures have released best practices documentation specifically for designing vMSCs with their storage products.

If you’re designing a vMSC architecture using HPE 3PAR storage arrays, HPE have released best practices documentation entitled Implementing vSphere Metro Storage Cluster using HPE 3PAR Peer Persistence, which is available via VMware KB2055904, or directly (at the time of writing) at this URL.

One of the requirements specific to vMSC architectures – a requirement this documentation (and KB2055904) fails to state – is the following:

Storage I/O Control (SIOC), and the I/O Metric of Storage DRS, are not supported in vMSC configurations.

Misleadingly, the documentation not only fails to mention this requirement, it effectively suggests violating it.  From page 29, under the heading “Managing VMware Storage DRS Settings”: Storage DRS should be “Manual Mode”. In this mode, Storage DRS will make recommendations when thresholds for space utilization or I/O latency are exceeded.  I/O latency is only considered by Storage DRS when the I/O metric is enabled, which is how this quote conflicts with the requirement.

I was designing a vMSC architecture around 12 months ago when I stumbled across a different VMware article, KB2042596, which was the first document I’d ever seen stating that SIOC and the I/O metric of Storage DRS are not supported in a vMSC environment.  I was quite confused by this, because I’d read through several VMware and vendor-specific documents on the subject, and this was the first time I’d seen the requirement mentioned – in an obscure article I found by accident while looking for something else.  VMware’s original best practice document on the subject, entitled VMware vSphere Metro Storage Cluster Case Study and dating from May 2012 (which is excellent documentation in all other respects), didn’t mention the requirement whatsoever.

In July 2015 I sought clarification on the subject by escalating within VMware through our VMware Technical Account Manager, and within HPE through internal channels, as it seemed the documentation authors had either forgotten or were unaware of this requirement.  VMware said they would update the documentation as appropriate, and I’m now pleased to see that the most recent version of VMware’s vMSC case study (now entitled VMware vSphere Metro Storage Cluster Recommended Practices, which is the same document updated for vSphere 6.0) prominently notes under the Storage DRS section on page 16 that neither SIOC nor the I/O metric of Storage DRS is supported.

Unfortunately, my efforts towards a correction of the 3PAR best practices were not as fruitful, and I’m disappointed to find that as of the January 2016 version of the best practice documentation (linked earlier), the requirement is still not mentioned.  I will make another effort to get this documentation corrected.

Regardless, I hope that this post ensures at least some architects designing vMSC environments with HP 3PAR storage arrays will be aware of the correct and supported settings for Storage I/O Control and Storage DRS, if they weren’t already.

Resources:

VMware KB2055904 – Implementing vSphere Metro Storage Cluster (vMSC) using HP 3PAR Peer Persistence

HPE Technical Whitepaper – Implementing vSphere Metro Storage Cluster using HPE 3PAR Peer Persistence

VMware Technical whitepaper – VMware vSphere Metro Storage Cluster Recommended Practices

VMware KB2042596 – vSphere Storage DRS or vSphere Storage I/O Control support in a vSphere Metro Storage Cluster environment

 

 

This is an (extremely) quick post to cover the steps required to decommission a Platform Services Controller (PSC) or vCenter Server from the vSphere single sign-on (SSO) domain.  The steps below are for a VCSA; the steps for a Windows vCenter are very similar, and are contained in the VMware KB article I used as a reference for writing this post: KB 2106736.

Decommission a PSC

    1. Ensure no vCenter server instances are using the PSC that is to be decommissioned.  Instructions on how to query which PSC a vCenter instance is pointing to and subsequently repoint it are listed in my post here.
    2. Shut down the PSC.
    3. Connect to another PSC in the same SSO domain, either by SSH or using the console.  Enter the shell.
    4. Run the cmsso-util command from KB 2106736 (a sketch is shown below this list).
    5. Remove the decommissioned PSC from the vSphere inventory.
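
    As a sketch of the KB 2106736 command, run from the shell of the remaining PSC (the PSC FQDN and password below are placeholders; substitute your own):

    cmsso-util unregister --node-pnid psc01.lab.local --username administrator@vsphere.local --passwd 'SSO-password'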

    Once these steps have been completed you can verify via the vSphere Web Client that the PSC has been decommissioned successfully by navigating to Administration > System Configuration > Nodes and ensuring that the decommissioned PSC is not present in the list of nodes.

Decommission a vCenter Server (VCSA)

    1. Query the to-be-decommissioned vCenter server to identify the PSC it’s pointing to.  Instructions on how to query the PSC vCenter is pointing to are listed in my post here.
    2. Connect to the PSC the VCSA is pointing to, either by SSH or using the console.  Enter the shell.
    3. Run the cmsso-util command from KB 2106736 (a sketch is shown below this list).
    4. Power off the VCSA and remove it from the inventory.
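
    Again as a sketch of the KB 2106736 command, run on the PSC identified in step 2 (the vCenter FQDN and password are placeholders):

    cmsso-util unregister --node-pnid vcsa01.lab.local --username administrator@vsphere.local --passwd 'SSO-password'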

    If you have multiple vCenter instances in a single SSO domain and you have just decommissioned one (or more), you may need to log out and log back into the vSphere Web Client before the decommissioned instance(s) disappear from the vSphere inventory tree.

Here’s a quick guide on how to query and change the Platform Services Controller (PSC) being used by vCenter.  Querying for the in-use PSC is possible on vCenter 6.0, but changing the PSC is only possible on 6.0 Update 1 or newer.  Note that I performed these steps on the vCenter Server Appliance (VCSA), and while I have also included some commands for a Windows-based vCenter server, I haven’t tested them myself.

Query the PSC being used by vCenter Server

There are two ways to identify this information: via the appliance console or an SSH session, or via the vSphere Web Client.

Option 1: via the appliance console or SSH session
On a VCSA
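
From memory, the command is the vmafd-cli call below, which returns the lookup service URL (and therefore the PSC) that vCenter is using:

/usr/lib/vmware-vmafd/bin/vmafd-cli get-ls-location --server-name localhost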

On a Windows vCenter
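
The equivalent on Windows should be the same vmafd-cli call from the vCenter install directory (untested, as noted above; the path below is the default install location):

"C:\Program Files\VMware\vCenter Server\vmafdd\vmafd-cli" get-ls-location --server-name localhost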

Option 2: via the vSphere Web Client

In the vSphere Web Client, navigate to the vCenter server’s advanced settings (vCenter > Manage > Settings > Advanced Settings), where there is a property called “config.vpxd.sso.admin.uri”.  The value of this property is the PSC that vCenter is currently using.

quick-post-query-and-change-the-platform-services-controller-being-used-by-vcenter-server-6-0-imga

 

Change (repoint) the PSC being used by vCenter Server

This step is a bit more involved, as it depends on whether you are changing/repointing between PSCs within a single SSO site, between SSO sites, or moving from an embedded PSC to an external PSC, and also on whether you are using a VCSA or a Windows-based vCenter Server.  For this reason I’ll link directly to the VMware documentation for each scenario.

Option 1: Repointing within a site (KB 2113917).

    1. Connect to the VCSA console, or via SSH.
    2. Enable the shell (if necessary) and enter it.
    3. Run the vmafd-cli command (a sketch is shown below this list).

    4. Restart the PSC services.
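
    As a sketch, from memory (double-check KB 2113917 for the exact syntax for your build; the PSC FQDN is a placeholder), the commands look something like this:

    /usr/lib/vmware-vmafd/bin/vmafd-cli set-dls-name --server-name localhost --dls-name psc02.lab.local
    service-control --stop --all
    service-control --start --all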

Option 2: Repointing between sites (KB 2131191).

Review the KB article for a full set of steps.

Option 3:  Repointing from an embedded PSC to an External PSC

See Reconfigure vCenter Server with Embedded Platform Services Controller to vCenter Server with External Platform Services Controller in the vSphere 6.0 documentation center.

As VMware continues to push in the direction of Unix-based appliances for their vSphere management components, those without a Unix background (like myself) are having to come to grips with the Unix versions of common administrative tasks.  Increasing the disk size on a vCenter Server Appliance (VCSA) is one such task.  In vCenter 6.0, VMware has introduced Logical Volume Management (LVM) to the appliance, which really simplifies the process of increasing the size of a disk and allows it to be done while the appliance is online.  VMware KB 2126276 covers all the steps required to increase the size of a disk, but this guide will cover them in slightly more detail.

Step 1: identify which disk (if any) has a problem with free space.

To do this, I connect to the appliance via SSH or the console, enable and enter the shell, and use the df -h command.
More information on using command-line tools to work with disk space can be found in my post Useful Unix commands for managing disk space on VMware appliances.

increasing-the-disk-size-on-a-vcenter-server-appliance-in-vsphere-6-0-a

I can see that both /storage/core and /storage/log are 100% used.  I’m guessing that /storage/core is full of vpxd crash dumps, generated because vCenter keeps crashing after being unable to write logs to /storage/log.  Based on this guess, I’ll increase the size of /storage/log, then manually delete the crash dumps on /storage/core and monitor the situation.  I won’t cover the steps involved in deleting the vpxd crash dumps in this post, but it basically involves deleting the core.vpxd.* and *.tgz files in the /storage/core directory.

Step 2: Increase the size of the affected disk using the vSphere Web Client

Looking at the table in VMware KB 2126276, it tells me that the disk mounted to /storage/log is VMDK5.  The way this is presented is a bit confusing in my opinion, because the disk we’re looking for is listed as Hard disk 5 in the web client, but the filename of the disk is vmname_4.vmdk (the numbering of virtual disks is offset in this way because hard disk 1 is vmname.vmdk, and hard disk 2 is vmname_1.vmdk).  Where the KB article says “VMDK5”, it really just means “the fifth VMDK file”.

The reason my /storage/log disk filled up is that I’ve increased the logging levels on my vCenter appliance to try to catch an issue that had been occurring.  Because of the increased volume of logs being generated, I’m going to increase the size of this VMDK to 25GB.  I don’t want to go overboard, because the disks are thick provisioned by default.

increasing-the-disk-size-on-a-vcenter-server-appliance-in-vsphere-6-0-b

Step 3: expand the logical drive and confirm that it has grown successfully

Return to the SSH session and expand the logical drive(s) that have been resized.  The following command will expand any disks that have had their vmdk files resized.
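
This is the command documented in KB 2126276 for the 6.0 appliance (run it from the BASH shell):

vpxd_servicecfg storage lvm autogrow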

If the operation is successful, you should see a message similar to the following.
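
If memory serves, it’s a zero result code along these lines:

VC_CFG_RESULT=0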

In my case, I did get that message eventually, but I also got a bunch of errors relating to the /storage/core partition.

The reason I saw those errors is that my /storage/core disk was 100% used.  As mentioned, I’m going to free up space on that drive manually, so I’ll ignore them for now.

If I run df -h again, I can see that /storage/log is now 25GB in total size.  Job done!

increasing-the-disk-size-on-a-vcenter-server-appliance-in-vsphere-6-0-c

Note: In the vCenter 5.x appliance, increasing disk sizes was a bit of a pain. The operation had to be performed while vCenter was offline, and involved adding a brand new disk, copying files from the old disk to the new one, and editing mount points.  For anyone who is working with a vCenter 5.x appliance, the steps are in KB 2056764.

Coming from a Windows background without much knowledge of Unix commands, I often find myself at a loss when trying to figure out how to do things on VMware’s vSphere appliances.  Managing disk space from the command line on an appliance is something I’ve had to do more than a few times, so I thought I’d create a quick list of the Unix commands I use most often to identify which partitions are filling up, and then which folders and files on that partition are consuming the most space.

When I’m working on a disk space problem, there are a few things I need to do. First, list disk space by partition. Second, identify the biggest consumers on a partition by listing the disk usage of child files and folders. Third, figure out whether any of the directories identified in the previous step are symbolic links, and find the link targets. Lastly, depending on what files are consuming all that space, I may want to delete them.

List disk space per partition

The df command (an abbreviation of disk free) does the trick.  The -h switch will display file sizes in KB, MB and GB.
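
For example:

df -h

Each row of the output shows a partition’s total size, used and available space, use percentage, and mount point.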

Now that we know /storage/core and /storage/log are both 100% full, we need to work out what is consuming the space on those partitions.

List disk usage of child files and folders on a partition

The du command (which is an abbreviation for disk usage) estimates the size of directories and files under a specific path. The best way to use this command is to sort the results by file size, as follows:
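
Something along these lines works well (the path is just an example; sizes are reported in 1KB blocks, largest first):

cd /storage/log
du -s * | sort -rn | head -20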

You can also use the -h switch to present the file sizes in a friendlier format, but the downside of that approach is that you can’t pipe the results to sort: the list will be sorted incorrectly because sort only considers the numbers and not the units, so it doesn’t understand that 10GB is larger than 100MB.

One thing to note is that if a child folder is actually a symbolic link, the file size will be listed as zero. These need to be identified and handled separately.

List symbolic links and discover link targets

To identify symbolic links, use the ls command with the -la switches. The results will be colour coded, and symbolic links will be listed in a light blue colour. The real path of the symbolic link will be listed to the right. In the snipped example below, you can see that /var/log/vmware is actually a symbolic link to /storage/log/vmware.
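
For illustration, a symbolic link shows up in an ls -la listing with an arrow pointing at its target, along these lines (the size and date shown are made up):

ls -la /var/log
lrwxrwxrwx 1 root root 19 Jan 1 00:00 vmware -> /storage/log/vmware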

Delete files or folders

It’s possible to delete files via the command line using the rm command and specifying the file or folder name. To remove a file:
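
rm /storage/core/core.vpxd.1234

(The file name above is just an example.)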

To remove a folder:
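
rm -r /storage/core/old-dumps

(Again, the folder name is just an example; -r removes the folder and everything inside it.)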

But use that with caution, because it could end badly.

A third option is actually my preferred choice, but it doesn’t involve using the command line at all. This option is to use a program like WinSCP to connect to the appliance and delete the files via the GUI. This is a good thing in my opinion, as there’s less risk of accidentally deleting a folder, and it’s much easier to delete multiple files at once.

useful-unix-commands-for-managing-disk-space-on-vmware-appliances-a

If you’ve set up a vCenter 6.0 appliance or a Platform Services Controller and tried to connect via WinSCP, you will have noticed the following error:

Host is not communicating for more than 15 seconds.  Still waiting…

resolving-the-host-is-not-communicating-for-more-than-15-seconds-error-when-connecting-to-a-vsphere-6-0-appliance-with-winscp-a

This error arises because vSphere 6.0 appliances now come with two shells: the appliance shell (which is the default shell for the root user), and the BASH shell.  WinSCP throws the above error when the root user is configured to use the appliance shell.  The error is easily resolved by configuring the root user to use the BASH shell.

To do so, connect to the appliance via SSH or the console and enter the following commands:
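
From the appliance shell, enable and enter BASH, then change the default shell for root (these match the steps in KB 2100508, mentioned below):

shell.set --enabled True
shell
chsh -s /bin/bash root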

Problem solved!

After you’ve done what you need to do with WinSCP, you can change the default shell back easily enough using the command below.  However, I’m not aware of any downside to leaving it set to the BASH shell (and the upside is that you won’t need to manually change the shell every time you want to connect with WinSCP).
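
chsh -s /bin/appliancesh root

(/bin/appliancesh being the appliance shell.)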

VMware published a KB article (KB 2100508) on the subject in March 2015, but if you’re seeing this error for the first time, chances are you have no idea what the root cause is, so good luck finding the solution through Google.  Hopefully this helps!