ESX Troubleshooting – The PSOD (Purple Screen of Death)…


Unlike the BSOD of Windows fame, there is actually hope with a PSOD on ESX.  As I learned at VMworld 2008, a PSOD points to a specific hardware problem in the majority of cases, and examining the screen dump can steer you toward the cause and the fix.

As I was building my junk server cluster (in a lab, not for production use, so a great way to learn safely), I was swapping NICs to add more Gigabit Ethernet connectivity to the Cisco 6509 I am using.  One of my servers (the big one) was already almost completely configured in VIM, right down to the NFS mountpoint it was using.  Without thinking it through, I grabbed a couple of gig NICs and installed them, since the server still had room, removing two unsupported NICs in the process and sliding the cards over into the blank PCI slots (grouping all the NICs together).  Upon rebooting, the host threw up a red log entry proclaiming a pCPU0 warning about something.  Shortly thereafter, the console stopped responding.  Checking further, I saw that the host had a PSOD.  I rebooted, got the same log message on the initial ESX console screen, and hit another PSOD within minutes.

This time, I dug into the PSOD and noticed that the dump was referencing network drivers for the cards I had just installed.  Aha!  I realized that the vmnic numbering had changed – and the server was trying to do all kinds of things using the old vmnic PCI references, including mounting the NFS share.  No wonder it vomited!
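To make the renumbering problem concrete, here is a minimal sketch in Python that compares two snapshots of vmnic-name-to-PCI-address mappings and reports which names now point somewhere else.  The snapshots below are made-up values typed in by hand, and the function name `renumbered_nics` is my own; on a real host you would probably build the two dictionaries from NIC listings (such as the output of `esxcfg-nics -l`) saved before and after the hardware change.

```python
def renumbered_nics(before, after):
    """Return {vmnic_name: (old_pci, new_pci)} for every name whose PCI address differs.

    A value of None means the name did not exist in that snapshot.
    """
    changes = {}
    for name in sorted(set(before) | set(after)):
        old = before.get(name)
        new = after.get(name)
        if old != new:
            changes[name] = (old, new)
    return changes


# Hypothetical snapshots: vmnic name -> PCI address (not taken from a real host).
before = {"vmnic0": "02:00.0", "vmnic1": "03:00.0", "vmnic2": "05:00.0"}
after = {
    "vmnic0": "02:00.0",
    "vmnic1": "04:00.0",
    "vmnic2": "04:00.1",
    "vmnic3": "05:00.0",
}

for name, (old, new) in renumbered_nics(before, after).items():
    print(f"{name}: {old} -> {new}")
```

Any name that shows up in the result is one that existing vswitch or NFS settings may be referencing under a stale assumption about which physical card it is.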

The solution was to first shut down and pull the new NICs, reboot, and see if the PSOD went away – it did.  Next, I removed the NFS share and updated the vmnic-to-vswitch assignments to account for any changes.  I rebooted again to make sure all was well.  When that proved to be the case, I shut down and added in the two NICs I wanted to use, rebooted, and everything worked.  I was then able to update my configs with the new vmnics, reboot to make sure there was no PSOD event, and re-enable the NFS share.  I rebooted again, one last time, just to tempt fate, but still no PSOD.
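The "update my configs with the new vmnics" step is where it is easy to slip, so here is a rough sketch of one way to think about it: write down which physical card (by PCI address) each vswitch should use, then look up whatever vmnic name that card currently has.  This is only an illustration in Python with invented data; the function name `resolve_uplinks` and every address below are assumptions, not something ESX provides.

```python
def resolve_uplinks(wanted, current):
    """Translate desired PCI addresses into the vmnic names they currently carry.

    wanted:  vswitch name -> list of PCI addresses it should uplink through
    current: vmnic name   -> PCI address, as seen after the hardware change
    A None entry means no NIC is present at that PCI address.
    """
    by_pci = {pci: name for name, pci in current.items()}
    return {
        vswitch: [by_pci.get(pci) for pci in pcis]
        for vswitch, pcis in wanted.items()
    }


# Hypothetical data, for illustration only.
wanted = {"vSwitch0": ["02:00.0"], "vSwitch1": ["04:00.0", "04:00.1"]}
current = {
    "vmnic0": "02:00.0",
    "vmnic1": "04:00.0",
    "vmnic2": "04:00.1",
    "vmnic3": "05:00.0",
}

for vswitch, nics in resolve_uplinks(wanted, current).items():
    print(vswitch, nics)
```

Keying the plan on the physical slot rather than the vmnic name means a renumbering changes the answer the lookup gives you, instead of silently leaving a vswitch attached to the wrong card.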

Been stable ever since.

So don’t give up on the PSOD – it’s natural to want to do that with Windows, but this sure ain’t Windows, is it?  You CAN troubleshoot and resolve these cases, even if you have to open a support call.  The dump can help you zero in on the bad memory module, failing CPU, or even the occasional misplaced network card, and help you get your server back up on its feet.

Of course, I would never be this reckless in a production environment – which is why everyone should have a lab to play with.  If you can afford the time, effort, and junk servers, it is a great way to learn in safety.

5 Responses

  1. [...] shares his experience in troubleshooting a PSoD (Purple Screen of Death) with VMware ESX. In his case, the issue turned out to be related to NICs, [...]

  2. [..] A little unrelated, but I really liked this site post [..]

  3. This article is very interesting.I like it.

  5. How to handle this on Dell PowerEdge servers..?
