Virtualization: What to do when vSphere goes down



Troubleshooting doesn't have to be difficult, even in the virtual world. Here are some tips.

VMware's vSphere virtualization platform is known for its reliability and offers built-in features like High Availability (HA) to ensure that physical server failures have only minor effects on end-user applications. But not everyone chooses to use these features and, as with any technology, bad things can happen even if you do use them. So, what do you do when the unexpected happens -- when a virtual machine is slow or down or when an ESXi server or vCenter is unresponsive?

vSphere troubleshooting overview

As part of my preparation for the VMware Certified Advanced Professional -- Data Center Administrator certification, and throughout the process of creating my (caution: shameless plug alert) vSphere Troubleshooting course, I have spent a lot of time troubleshooting vSphere. I have intentionally broken it and then tried to fix it, sometimes with success and sometimes without.

What to do when vSphere goes down

Anytime someone says something is "down," start by getting more information. What is down, exactly? Has a physical server failed? Has the vCenter VM blue-screened, or have just its services stopped? Is the core network switch locked up? Has the SAN lost power? Are all VMs down, or just one?

Users don't know what's "down" -- nor should they care; that's your job. Because you understand the various pieces in play and can perform some simple tests, you should be able to quickly determine where the problem lies. Still, make sure that you test thoroughly before deciding what the cause is. More than once I've run a single quick test and (incorrectly) concluded that the problem was the server, for example, when it was actually the entire network.
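As a rough illustration of testing several components before blaming any one of them, here is a minimal Python sketch. It assumes that a successful TCP connection to port 443 (the HTTPS port that ESXi and vCenter both listen on) is a fair first signal that a component is up. The hostnames in TARGETS are hypothetical placeholders, and the verdict strings are just one possible way to summarize the results.

```python
import socket

# Hypothetical hosts for illustration only; substitute your own.
TARGETS = {
    "ESXi host": ("esxi01.example.com", 443),
    "vCenter": ("vcenter.example.com", 443),
}


def tcp_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS lookup failures.
        return False


def run_checks(targets):
    """Probe every target and return {component name: reachable?}."""
    return {name: tcp_reachable(host, port) for name, (host, port) in targets.items()}


def classify(results):
    """Turn a dict of {component name: reachable?} into a rough verdict."""
    down = sorted(name for name, ok in results.items() if not ok)
    if not down:
        return "all checked components reachable"
    if len(down) == len(results):
        # Everything failing at once points away from any single server.
        return "nothing reachable: suspect the network or the test machine itself"
    return "unreachable: " + ", ".join(down)
```

Calling classify(run_checks(TARGETS)) gives a one-line summary. The point of the "nothing reachable" branch is the lesson above: when every test fails, the culprit is more likely the network than one server.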

Use Windows Task Manager to check RAM utilization inside the VM that runs the affected app. You will likely find that the process is eating up a lot of RAM. In this example, the VM has only 1GB of RAM and the process is using 462MB (about 45 percent).

RAM utilization by process
Use Windows Task Manager to see how much RAM the guilty process is using before you decide whether to stop it.

If the memory is being used by an application that you don't want (for example, a malicious application or a game that a user is running on a virtual desktop), you can kill the process or uninstall the application.
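The arithmetic behind that judgment is simple enough to sketch. The Python below is a minimal illustration, assuming you have already read per-process memory figures (in MB) out of Task Manager; the process names and the 25 percent threshold in the usage note are hypothetical choices, not values from VMware or Microsoft.

```python
def memory_share(process_mb, total_mb):
    """Return the fraction of the VM's total RAM used by one process."""
    if total_mb <= 0:
        raise ValueError("total_mb must be positive")
    return process_mb / total_mb


def top_consumers(usage_mb, total_mb, threshold=0.25):
    """List (name, share) pairs at or above threshold, largest share first.

    usage_mb maps process name -> memory in MB; threshold is an arbitrary
    cutoff chosen for illustration.
    """
    shares = {name: memory_share(mb, total_mb) for name, mb in usage_mb.items()}
    heavy = [(name, share) for name, share in shares.items() if share >= threshold]
    return sorted(heavy, key=lambda item: item[1], reverse=True)
```

With the figures from the example above, memory_share(462, 1024) works out to roughly 0.45, and top_consumers({"app.exe": 462, "svchost.exe": 40}, 1024) would flag only the hypothetical app.exe.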
