The list of events and tasks that vCenter maintains for each object in the inventory is extremely useful for forensic analysis in a vSphere environment.  If you need to identify who created or deleted a VM, resized a VMDK, or shut down a VM or host, events and tasks are where you look.

In its default state, however, the events and tasks views in vCenter have some major issues (from a forensic point of view).  First, events and tasks within vCenter can be rolled over: vCenter comes with a default retention period of 30 days for events and tasks.  You can change the default retention period, but that comes with its own issues, and to be honest, keeping a full list of events and tasks for time immemorial isn’t really vCenter’s job.  Second, the list of events or tasks can’t be searched.  It can be filtered, sure, but the filter terms you type only apply to the current page of entries (which displays a maximum of 100 entries).  You can also export a list of events matching a set of criteria to a CSV file, but this exports every matching event from the vCenter database (so you’ll need to filter it further in Excel), and it obviously won’t contain events older than the retention period.

This is where vRealize Log Insight (vRLI) comes in handy.  vRLI is a log collection and analytics tool from VMware that can be configured to collect system logs from your ESXi hosts, vCenter Servers, and PSCs, as well as vCenter events, tasks, and alarms.  Now, even a small vSphere environment can generate millions of log entries in 24 hours, so the next challenge becomes: how do we actually isolate the events specific to one object (a specific VM, for example)?  The answer is extremely simple, but not entirely obvious.

Under Interactive Analytics, add a filter for vc_event_type with an operator of “exists”, then add a second filter for vc_vm_name with an operator of “contains” and type the VM’s name.  Set your time range and click search.  That’s it.

 

 

For more information, take a look at the dashboards provided by the “VMware – vSphere” content pack, which is installed in vRLI by default.  For example, here’s a dashboard widget showing vCenter Server tasks by type over a specified time period.  Clicking the little arrow in the top right corner takes you to the Interactive Analytics view, where you can add or modify the filters for an even more specific search.

vRLI-02

You can use the dashboards for events, tasks, and alarms to help identify additional event or task types that can then be filtered through Interactive Analytics.  All in all, it provides a really powerful way of doing forensic investigation to discover why a certain thing did or didn’t happen.

 

I recently needed to configure a number of NSX Controller nodes to forward their logs to a vRealize Log Insight cluster using syslog.  Unlike NSX Manager (and most other components of a VMware SDDC), NSX controllers don’t provide a graphical way of configuring syslog.  In fact, they don’t even offer a CLI command for syslog configuration.  Instead, you need to use the NSX REST API.

Now, if you take a look at the official documentation for performing this configuration (including the NSX 6.3 documentation centre and the VMware Validated Design for SDDC 4.0 documentation), you’ll find instructions that involve installing REST plug-ins into your Firefox or Chrome browser.  This can be a headache if you are running these commands from a jump host which only has internet access through a tightly controlled proxy.  Luckily, there is a powerful automation tool that comes pre-installed on the vast majority of jump hosts in the world: PowerShell!

I wrote a PowerShell script to configure syslog on my NSX controller nodes using the Invoke-RestMethod cmdlet, which I’ve made public below:
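Here’s a condensed sketch of the approach (the NSX Manager FQDN, controller IDs, and syslog settings are placeholders for your own values; the REST call used is the controller syslog exporter API, POST /api/2.0/vdn/controller/{controller-id}/syslog, from the NSX for vSphere API guide):

# Sketch: configure syslog on each NSX controller via the NSX Manager REST API.
# The NSX Manager FQDN, controller IDs, and syslog settings are placeholders - adjust for your environment.
$NsxManager    = "nsxmanager.lab.local"
$ControllerIds = @("controller-1", "controller-2", "controller-3")
$Credentials   = Get-Credential -Message "NSX Manager credentials"

# Build a Basic authentication header from the PSCredential object.
# The plain-text password only ever exists in memory for the lifetime of the script.
$plainPassword = $Credentials.GetNetworkCredential().Password
$authBytes     = [System.Text.Encoding]::UTF8.GetBytes("$($Credentials.UserName):$plainPassword")
$headers       = @{ Authorization = "Basic " + [System.Convert]::ToBase64String($authBytes) }

# Syslog settings - edit the $body variable to change the server, port, protocol, or logging level.
$body = @"
<controllerSyslogServer>
  <syslogServer>vrli.lab.local</syslogServer>
  <port>514</port>
  <protocol>UDP</protocol>
  <level>INFO</level>
</controllerSyslogServer>
"@

foreach ($id in $ControllerIds) {
    # Note: if the NSX Manager certificate is self-signed, you may need to relax
    # certificate validation in your PowerShell session before this call will succeed.
    $uri = "https://$NsxManager/api/2.0/vdn/controller/$id/syslog"
    Invoke-RestMethod -Uri $uri -Method Post -Headers $headers -Body $body -ContentType "application/xml"
    Write-Host "Syslog configuration applied to $id"
}

# Wipe the credential material once the script has finished.
Remove-Variable -Name plainPassword, Credentials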

I hope that most of the code is fairly self-explanatory, but there are a few things I’ll mention.

Firstly, take note of the actual NSX controller IDs in use in your environment.  Although the example in the script assumes NSX controllers 1, 2, and 3, remember that NSX controller IDs persist forever, even after a controller node has been decommissioned (or failed to deploy in the first place).

Secondly, I spent a bit of time working out how best to handle the credentials required to authenticate to the NSX Manager.  Originally I needed a quick solution, so I simply provided the password as an unsecured string.  That’s not ideal, especially since I intended to make this script publicly available for reuse, and after a bit of investigation it turns out you can extract the password in plain text from a PSCredential object.  This means you don’t need to supply your password in plain text (by saving it in the script or typing it into the console via Read-Host); the script prompts for a PSCredential and extracts the password from it to generate the header used to authenticate to the NSX Manager.  The script also wipes the $Credentials variable after it has finished running.

Thirdly, you can edit the $body variable to change things like the port, protocol, or logging level, as required by your environment.

Lastly, it seems that all NSX controllers identify themselves in their syslog messages as “NSX-Controller”.  This is less than ideal, especially when you might have 10-20 NSX controllers in your environment.  It’s possible to change the hostname of an NSX controller at the command line, but this step isn’t covered in any of the official documentation, which makes me wonder whether it’s the right approach (perhaps it causes some unintended side effects).  I’m looking into this and will report back when I have a solution.

If anyone wants to recommend some changes, feel free.  One thing I’d like to do is make the script handle multiple NSX Managers and self-discover the valid NSX controller IDs, but that’s a job for another day.  Until then, I hope this helps some people out there!
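For what it’s worth, the controller inventory is also exposed by the API, so the self-discovery piece should be possible along these lines (an untested sketch; it reuses the $NsxManager and $headers variables from the script above, and the element names should be verified against the NSX for vSphere API response):

# Untested sketch: list the controller IDs known to NSX Manager.
$response      = Invoke-RestMethod -Uri "https://$NsxManager/api/2.0/vdn/controller" -Method Get -Headers $headers
$ControllerIds = $response.controllers.controller.id   # verify element names against your API response
$ControllerIds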

 

I recently had a storage network outage in my lab environment, and after powering back on my vCenter Server Appliance (VCSA) I was rudely greeted with the following information at the console:

fixing-disk-corruption-in-the-vcenter-server-appliance-img01

Ouch!  I’ve never dealt with file system corruption in a VCSA before, and the internet doesn’t seem to contain much information on what to do next.  This post is my effort towards changing that.

The first thing I did was type “journalctl” to view the system logs, as suggested.  That displayed some additional information on the issue.

fixing-disk-corruption-in-the-vcenter-server-appliance-img02

It suggests running fsck manually, without the -a or -p options.  My background is heavily Windows-focused and I’d never used fsck before, but I eventually figured out the syntax required to fix the problem: run fsck against the damaged partition listed in the top line of the previous screenshot, i.e. fsck /dev/disk/by-partuuid/79d76ed0-0297-4e33-a6bd-252099f2c613

Thankfully, I was able to use tab completion to save me from typing (and probably mis-typing) the entire partition UUID myself.  Running that command kicked off the fsck process and prompted me to fix probably 10-20 different errors:

fixing-disk-corruption-in-the-vcenter-server-appliance-img03

After completing all that, I got a success message.  It might be a good idea to run the whole thing twice, just to be sure.

fixing-disk-corruption-in-the-vcenter-server-appliance-img04

After that, I rebooted my vCenter appliance, and thankfully it booted normally!  I hope this post helps someone out there who is facing this issue.

 

In yesterday’s post, I updated my VCSA 6.0 appliance to version 6.5.  Today, I logged into the Appliance MUI and noticed that my appliance was not able to check for updates using the default web repository.

Before we really start, a quick note on terminology.  The Appliance MUI (Appliance Management User Interface) is the new name for the old VAMI (vSphere Appliance Management Interface).  The MUI is an HTML5 web interface for configuring basic and low-level settings on the VCSA.  It’s accessible by connecting to your VCSA on port 5480.

So, what’s the deal?  Well, when browsing to the update section of the MUI and checking for updates, I would receive the following error:

how-to-get-vcenter-updates-working-through-the-appliance-mui-using-a-proxy-img01

My proxy config was the first thing I checked.  You can check the proxy in the MUI under Networking > Manage > Proxy settings.  Sure enough, my config was correct for my environment.

how-to-get-vcenter-updates-working-through-the-appliance-mui-using-a-proxy-img02

I also knew that my proxy was working correctly and was contactable by my VCSA, because the VCSA 6.5’s integrated Update Manager was already using it to communicate with the default web-based patch repositories.

I did a web search and came across an awesome post which contained the solution, written by a vExpert named Mario.  Full credit to him – the solution to this problem is his and not mine.  I encourage you to check out his blog.

So, it turns out that when you configure a proxy via the MUI, it is only applied as an HTTP proxy.  As Mario points out (as does the GUI in the screenshot above, if you’re paying attention), the update repository is an HTTPS address, so the proxy configured via the MUI won’t apply.  Why doesn’t the proxy you configure apply to both HTTP and HTTPS connections, I hear you ask?  I haven’t got a clue.

The solution is surprisingly simple once you’ve got some instructions and know where to look (which I didn’t).  Simply edit the /etc/sysconfig/proxy file via SSH or WinSCP (the option I took) and manually configure the HTTPS proxy entry, taking care, obviously, to adjust the URL from HTTP to HTTPS (highlighted).

how-to-get-vcenter-updates-working-through-the-appliance-mui-using-a-proxy-img03

After saving the changes, return to the updates section of the MUI and try checking for updates again.  Boom!  Thanks, Mario!

how-to-get-vcenter-updates-working-through-the-appliance-mui-using-a-proxy-img04

 

With today’s exciting release of VMware vSphere 6.5, I thought I’d celebrate by upgrading the vCenter Server Appliance (VCSA) in my lab from version 6.0 U2 to version 6.5.  Since I’m doing this more or less blind, without having read any documentation whatsoever, I thought I’d write a (hopefully short) post about the problems that arise during the upgrade process and how to get past them.

Before we get started, I should point out that I’m running all of this from a Windows 10 workstation, and my lab environment is an extremely compact deployment consisting of a VCSA with an embedded PSC and a separate VUM instance running on Windows Server 2012 R2.  VCSA 6.5 includes an integrated VUM component, which I’m looking forward to seeing.

The VCSA 6.5 upgrade is a two-stage process.  The first stage involves creating the new VCSA VM and getting it onto the network.  The second stage involves migrating the data from the old VCSA to the new one.  By the end of this post, a new VCSA 6.5 appliance with an embedded PSC and VUM should have replaced my original VCSA, and the original VCSA will be shut down and ready to delete.  If your topology is more complex than mine (which is highly probable if you’re performing this upgrade outside of a small lab environment), your upgrade steps and outcomes may vary.

Starting the upgrade

Assuming you’ve downloaded the new VCSA 6.5 ISO image from the VMware website, mount the ISO and run installer.exe, located at \vcsa-ui-installer\win32\ on the mounted ISO.  This opens the snazzy new VCSA 6.5 installer, which no longer requires any browser plugins.  The Upgrade option is what I’m after today.

vsphere-6-5-upgrading-your-vcsa-img1

Note: during the upgrade process, it’s a good idea to ensure DRS is not set to fully automated.  This ensures the source and destination VCSA VMs will not be migrated around the cluster by DRS while the upgrade is in progress.
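If you have PowerCLI handy, dropping DRS to partially automated for the duration of the upgrade is a one-liner (a sketch; the cluster name is a placeholder and an existing Connect-VIServer session is assumed):

# Reduce DRS automation for the duration of the upgrade ("LabCluster" is a placeholder)
Set-Cluster -Cluster "LabCluster" -DrsAutomationLevel PartiallyAutomated -Confirm:$false

# Restore full automation once the upgrade is complete
Set-Cluster -Cluster "LabCluster" -DrsAutomationLevel FullyAutomated -Confirm:$false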

Hurdle 1 – backing up the VCSA database

The first step of the upgrade wizard advises us to back up all data on the appliance before starting the process.  Luckily, backing up the data on a VCSA is extremely easy and is documented in KB2091961.  Note that the process described in this KB article is only supported for restoring the VCSA’s integrated configuration database to the same appliance the backup was taken from; trying to restore the database to another VCSA probably won’t work.

vsphere-6-5-upgrading-your-vcsa-img2

As per the documented steps:

  1. Download 2091961_linux_backup_restore.zip from the bottom of the KB article and extract the backup_lin.py script to /tmp on your VCSA using WinSCP.
  2. Login to your VCSA via SSH.
  3. Run the following command to make the backup_lin.py script executable: chmod 700 /tmp/backup_lin.py.
  4. Run the backup using the following command: python /tmp/backup_lin.py -f /tmp/backupVCDB.bak.  You will be notified when the backup completes successfully.

vsphere-6-5-upgrading-your-vcsa-img3

Job done.  The backupVCDB.bak file will be stored in the /tmp directory, and because I’m running in a small lab environment mine was only 8MB in size.  Hopefully we won’t have to use it…

Hurdle 2 – the migration assistant

At stage 3 of the migration process, after entering the correct details for my vCenter appliance and my ESXi host, I was greeted with the following error message: “Unable to retrieve the migration assistant extension on source vCenter Server.  Make sure the migration assistant is running on the VUM server.”

vsphere-6-5-upgrading-your-vcsa-img4

I looked on the mounted ISO and noticed a folder called \migration-assistant, which contained an .exe.  Looks promising.  Since my VUM server was running within my vSphere cluster, I mounted the ISO via the VMware Remote Console, browsed to that directory, and ran VMware-Migration-Assistant.exe.  This launches a script, which asks you to supply the password for the Windows account you’re logged in with.  That Windows account must have permissions in vCenter; I’m not sure exactly what level is required, but my account is an administrator in vCenter.

The migration assistant will then run through some prechecks, give you a brief summary of what is about to happen, and tell you to leave the window open until the upgrade has finished.

vsphere-6-5-upgrading-your-vcsa-img5

You will then be able to return to the vCenter Server Appliance Installer and click “next” to continue the process.  Onwards!

Hurdle 3 – deployment size

I’ll admit straight away that this isn’t a hurdle so much as a decision… and a bit of an odd one.  The VCSA installer has always allowed us to choose a deployment size, which simply configures the VCSA with a certain amount of CPU and memory resources to support a specified number of hosts and/or VMs.  Step 6 of the 6.5 installer now also allows us to select a storage size, which according to the description simply allocates more space to the SEAT (stats, events, alarms, tasks) data storage partition.  The reason I call this a bit odd is that when you set the storage size to “default”, the smallest available deployment size is “small”:

vsphere-6-5-upgrading-your-vcsa-img6

But when you set the storage size to “large”, it will allow you to specify an even smaller deployment size of “tiny”, albeit one which requires nearly 3x more storage than the Small/Default deployment:

vsphere-6-5-upgrading-your-vcsa-img7

So: a Tiny deployment, which supports fewer hosts and VMs and requires less CPU and RAM, needs nearly 3x more storage than a Small/Default deployment.  As I said, it seems odd.

Hurdle 4 – network configuration

If you get to step 8 of the wizard and notice that the network drop-down is empty, it’s probably because non-ephemeral dvportgroups aren’t supported.  This has always been the case, but it’s easy to forget.  Simply create a new dvportgroup on your chosen dvswitch with a port binding method of “Ephemeral – no binding”, configure the relevant port settings for your network subnet, then return to the VCSA installer and skip back to step 7 before progressing forward again to step 8.  Don’t forget to migrate your new VCSA to the correct dvportgroup afterwards and delete the one you just created.
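If you’d rather create that portgroup with PowerCLI, something like this should do it (a sketch; the switch name, portgroup name, and VLAN ID are placeholders, and an existing Connect-VIServer session is assumed):

# Create an ephemeral-binding dvportgroup for the VCSA deployment
Get-VDSwitch -Name "Lab-DVS" |
    New-VDPortgroup -Name "VCSA-Deploy-Ephemeral" -VlanId 20 -PortBinding Ephemeral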

Hurdle 5 – VMware Enhanced Authentication Plugin doesn’t work

After the upgrade had completed and I was greeted with the login page for the vSphere Web Client, I was prompted to install the VMware Enhanced Authentication Plugin.  This plugin allows you to use the “Use Windows session authentication” option to log in to the web client, and is the updated version of the old vSphere 6.0 Client Integration Plugin.  The problem was that after installing the plugin and restarting my browser, I was still being prompted to install it.  This was occurring in both Chrome and Internet Explorer, so I knew it wasn’t browser-specific, and restarting my workstation didn’t help.  What did help was uninstalling the Enhanced Authentication Plugin, the VMware Plug-in Service, and the old Client Integration Plugin.  I then reinstalled the Enhanced Authentication Plugin, and upon relaunching the vSphere Web Client I was given the correct prompts to allow the plugin to launch.  Problem solved!

Conclusion

After ploughing through those issues, I’m happy to report that I’ve got a healthy and functional VCSA 6.5 appliance.  Hopefully this post helps a few people out there with some niggling “early-adopter” issues when deploying their appliance.  There are plenty of really awesome features in vSphere 6.5, and I can’t wait to expand on a few of them in upcoming posts.

A vSphere Metro Storage Cluster (vMSC) – perhaps more commonly known as a “metro cluster” – is an architecture in which an individual vSphere cluster is stretched across multiple geographical sites.  Since a vSphere cluster requires shared storage to allow VMs to migrate across hosts, in a vMSC environment storage must be shared or replicated across the geographical sites.  As you might expect, this kind of architecture comes with a number of gotchas and limitations, especially around the configuration of the storage arrays.  For this reason, storage vendors who support vMSC architectures have released best practices documentation specifically for designing vMSCs with their storage products.

If you’re designing a vMSC architecture using HPE 3PAR storage arrays, HPE have released best practices documentation entitled Implementing vSphere Metro Storage Cluster using HPE 3PAR Peer Persistence, which is available via VMware KB2055904, or directly (at the time of writing) at this URL.

One of the requirements specific to vMSC architectures – which is not correctly stated by this documentation (or by KB2055904) – is the following:

Storage I/O Control (SIOC), and the I/O Metric of Storage DRS, are not supported in vMSC configurations.

Misleadingly, the documentation not only fails to mention this requirement, it actively suggests violating it by enabling the I/O metric of Storage DRS.  From page 29, under the heading “Managing VMware Storage DRS Settings”:  Storage DRS should be “Manual Mode”. In this mode, Storage DRS will make recommendations when thresholds for space utilization or I/O latency are exceeded.  I/O latency is only considered by Storage DRS if the I/O metric is enabled, which is why this quote violates the requirement.

I was designing a vMSC architecture around 12 months ago when I stumbled across a different VMware article, KB2042596, which was the first document I’d ever seen stating that SIOC and the I/O metric of Storage DRS are not supported in a vMSC environment.  I was quite confused by this, because I’d read through several VMware and vendor-specific documents on the subject and this was the first time I’d seen the requirement mentioned: in an obscure article I found by accident while looking for something else.  VMware’s original best practices document on the subject, entitled VMware vSphere Metro Storage Cluster Case Study from May 2012 (which is excellent documentation in all other respects), didn’t mention the requirement whatsoever.

In July 2015 I sought clarification on the subject by escalating within VMware through our VMware Technical Account Manager, and within HPE through internal channels, as it seemed the documentation authors had either forgotten or were unaware of this requirement.  VMware said they would update the documentation as appropriate, and I’m now pleased to see that the most recent version of VMware’s vMSC case study (now entitled VMware vSphere Metro Storage Cluster Recommended Practices, but essentially the same document updated for vSphere 6.0) prominently notes under the Storage DRS section on page 16 that neither SIOC nor the I/O metric of Storage DRS is supported.

Unfortunately, my efforts to get the 3PAR best practices corrected were not as fruitful, and I’m disappointed to find that, as of the January 2016 version of the best practices documentation (linked earlier), the requirement is still not mentioned.  I will make another effort to get this documentation corrected.

Regardless, I hope this post ensures that at least some architects designing vMSC environments with HPE 3PAR storage arrays will be aware of the correct and supported settings for Storage I/O Control and Storage DRS, if they weren’t already.
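If you want a quick way to audit your own environment, SIOC status is easy to check with PowerCLI (a sketch; an existing Connect-VIServer session is assumed and the datastore name is a placeholder).  The Storage DRS I/O metric itself is most easily checked in the Web Client under each datastore cluster’s Storage DRS settings.

# List any datastores that currently have Storage I/O Control enabled (in a vMSC this should return nothing)
Get-Datastore | Where-Object { $_.StorageIOControlEnabled } |
    Select-Object Name, StorageIOControlEnabled

# Disable SIOC on a specific datastore if required
Get-Datastore -Name "Datastore01" | Set-Datastore -StorageIOControlEnabled $false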

Resources:

VMware KB2055904 – Implementing vSphere Metro Storage Cluster (vMSC) using HP 3PAR Peer Persistence

HPE Technical Whitepaper – Implementing vSphere Metro Storage Cluster using HPE 3PAR Peer Persistence

VMware Technical whitepaper – VMware vSphere Metro Storage Cluster Recommended Practices

VMware KB2042596 – vSphere Storage DRS or vSphere Storage I/O Control support in a vSphere Metro Storage Cluster environment

 

 

This is an (extremely) quick post to cover the steps required to decommission a Platform Services Controller (PSC) or vCenter Server from the vSphere single sign-on (SSO) domain.  The steps below are for a VCSA; the steps for a Windows vCenter Server are very similar and are contained in the VMware KB article I used as a reference for writing this post: KB 2106736.

Decommission a PSC

    1. Ensure no vCenter server instances are using the PSC that is to be decommissioned.  Instructions on how to query which PSC a vCenter instance is pointing to and subsequently repoint it are listed in my post here.
    2. Shut down the PSC.
    3. Connect to another PSC in the same SSO domain, either by SSH or using the console.  Enter the shell.
    4. Run the cmsso-util unregister command against the PSC being decommissioned (per KB 2106736, in the form: cmsso-util unregister --node-pnid <decommissioned_PSC_FQDN> --username administrator@<SSO_domain> --passwd <SSO_password>).
    5. Remove the decommissioned PSC from the vSphere inventory.

Once these steps have been completed, you can verify via the vSphere Web Client that the PSC has been decommissioned successfully by navigating to Administration > System Configuration > Nodes and ensuring that the decommissioned PSC is not present in the list of nodes.

Decommission a vCenter Server (VCSA)

    1. Query the to-be-decommissioned vCenter server to identify the PSC it’s pointing to.  Instructions on how to query the PSC vCenter is pointing to are listed in my post here.
    2. Connect to the PSC the VCSA is pointing to, either by SSH or using the console.  Enter the shell.
    3. Run the cmsso-util unregister command against the vCenter Server being decommissioned (again per KB 2106736, in the form: cmsso-util unregister --node-pnid <decommissioned_vCenter_FQDN> --username administrator@<SSO_domain> --passwd <SSO_password>).
    4. Power off the VCSA and remove it from the inventory.

If you have multiple vCenter instances in a single SSO domain and you have just decommissioned one (or more), you may need to log out and log back into the vSphere Web Client before the decommissioned instance(s) disappear from the vSphere inventory tree.

Here’s a quick guide on how to query and change the Platform Services Controller (PSC) being used by vCenter.  Querying for the in-use PSC is possible on vCenter 6.0, but changing the PSC is only possible on 6.0 Update 1 or newer.  Note that I performed these steps on the vCenter Server Appliance (VCSA), and while I have also included some commands for a Windows-based vCenter server, I haven’t tested them myself.

Query the PSC being used by vCenter Server

There are two ways to identify this information: via the appliance console or an SSH session, and via the vSphere Web Client.

Option 1: via the appliance console or SSH session
On a VCSA, query the PSC currently in use with the vmafd-cli utility: /usr/lib/vmware-vmafd/bin/vmafd-cli get-dc-name --server-name localhost

On a Windows vCenter, the same utility lives under the vCenter Server installation directory (by default C:\Program Files\VMware\vCenter Server\vmafdd\), and the command is otherwise identical: vmafd-cli.exe get-dc-name --server-name localhost

Option 2: via the vSphere Web Client

In the vSphere Web Client, navigate to the vCenter Server’s advanced settings (vCenter > Manage > Settings > Advanced Settings) and look for a property called “config.vpxd.sso.admin.uri”.  The value of this property is the PSC that vCenter is currently using.

quick-post-query-and-change-the-platform-services-controller-being-used-by-vcenter-server-6-0-imga
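If you have PowerCLI connected to the vCenter in question, the same property can be read without clicking through the Web Client (a sketch, assuming an existing Connect-VIServer session):

# Read the config.vpxd.sso.admin.uri advanced setting from the connected vCenter Server
Get-AdvancedSetting -Entity $global:DefaultVIServer -Name "config.vpxd.sso.admin.uri" |
    Select-Object Name, Value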

 

Change (repoint) the PSC being used by vCenter Server

This step is a bit more detailed, as it depends on whether you are repointing between PSCs within a single SSO site, repointing between SSO sites, or moving from an embedded PSC to an external PSC, and also on whether you are using a VCSA or a Windows-based vCenter Server.  For this reason I’ll link directly to the VMware documentation for each scenario.

Option 1: Repointing within a site (KB 2113917).

    1. Connect to the VCSA console, or via SSH.
    2. Enable the shell (if necessary) and enter it.
    3. Run the vmafd-cli repoint command (per KB 2113917, in the form: /usr/lib/vmware-vmafd/bin/vmafd-cli set-dc-name --server-name localhost --dc-name <new_PSC_FQDN>).
    4. Restart the vCenter Server services.

Option 2: Repointing between sites (KB 2131191).

Review the KB article for a full set of steps.

Option 3: Repointing from an embedded PSC to an external PSC

See Reconfigure vCenter Server with Embedded Platform Services Controller to vCenter Server with External Platform Services Controller in the vSphere 6.0 documentation center.