Stateless VMware ESXi 3.5 on an HP c7000 Blade Server…

NOTE:  This is only an overview.  Due to the detailed nature of this project, I will break it up over several more-focused articles over time for easier reference.

Well, despite my more negative impression of this year’s VMworld conference, it still really paid off.  There I learned about stateless ESX deployment.  Using this information, I was able to build in my lab, after a couple months of trial and error, a highly robust VMware environment, fully managed and licensed, using the midwife scripts I modified for this effort.  And configuration is hands-free.

Here are the system components:

  • SERVER – HP c7000 Blade Enclosure with sixteen Bl465c blades, two 4 GB FC modules, and four VC Enet modules
  • Each blade has two dual-core AMD CPUs, 16 GB RAM, two 72 GB SAS drives (hardware RAID-1), two embedded gig NICs, and a mezzanine card with two more gig NICs/iSCSI initiators and two FC HBAs
  • NETWORK – Cisco 6509 with two SUP 720 cards, two 48 port LC Gig-E fiber cards, and four 48 port gig copper cards
  • MANAGMENT – Dell 1850 with two 146 GB SAS drives (hardware RAID-1) for management and boot services
  • STORAGE – Scavenged proof-of-concept totally ghetto Dell Optiplex desktop with four internal 1.5 TB SATA drives (software RAID-10 formatted with tuned XFS) providing 3 TB of NFS shared storage
  • Scavenged HP IP-KVM box for OOB-management of the two Dells

Here are the steps I took:

  1. First I had to update all the firmware on the blade server.  This includes the two OA cards for the Onboard Administrator, the Virtual Connect software, the iLO2 software for each blade, the BIOS on each blade, and the Power Management Controller firmware.  There is a particular order this is done in, and it is not easy, but it really needs to be done.  The fixes that come with these updates are often vital to success.  Overall, I spent a week researching and updating.  I set all the blades to boot via PXE.
  2. Next, I built the storage server.  I really had no choice – nothing was available but a Dell Optiplex desktop.  It had four internal SATA ports available, and room for four 1GB RAM modules.  It also had a single dual-core Intel CPU and PCI slots for more NICs, and a PCI-Express mini-slot as well.  I had to order parts, and it took a little while, but once done, it had a total of four gig NICs (one embedded, two PCI, one PCI-Express), four 1.5 TB SATA drives, and 4 GB RAM.  I loaded it with 64-bit Ubuntu-9.04, hand-carved the partitions and RAID-10 setup, formatted the 3 TB volume with XFS, tuned as best I knew how, and then put it on the 2.6.31 kernel (I later updated it to  There were no BIOS or other firmware updates needed.
  3. I then built the management server on the Dell 1850.  It only has one power supply (I cannot find a second one), but it does have 8 GB RAM and two dual-core CPUs.  I loaded 64-bit Ubuntu-9.04 on it afte installing two 146 GB SAS drives in a RAID-1 mirror (hardware-based).  I also updated the BIOS and other firmware on it.
  4. Having these components in place, I studied the blade server to see what I could get away with, and ultimately decided to use each NIC on a blade server to support a set of traffic types, and balanced the likelyhood of traffic demands across them.  For example, Vmotion traffic, while it may be intense, should be relatively infrequent, so it shares a V-Net with another type of traffic that is low-bandwidth (the alternate management  network).  Altogether, I ended up with a primary management network on up V-Net, Vmotion and the alternate on another V-Net, storage traffic (NFS and iSCSI) on a third V-Net, and VM traffic on its own V-Net.  Each V-Net maps to the its own NIC on a blade, the same NIC on each blade.

The physical network design:

For the V-Nets, the management network went on NIC 1 as an untagged VLAN.  It has to be untagged, because when it boots up, it needs to get a DHCP address and talk to the boot server for its image.  Since it comes up untagged, it will not be able to talk out to the DHCP/PXE server if the V-Net is set to pass through tags.  The other V-Nets support tagged VLANs to further separate traffic.  Each V-Net has four links to the Cisco 6509, except for the storage V-Net, which has eight.  Two links form an LACP bundle from the active side (VC-Enet module in Bay 1), and two make up an LACP bundle (or etherchannel) from the module in Bay 2, which is the offline side.  This is repeated for the other networks across the other modules in Bays 5 and 6.  Bays 3 and 4 house the Fiber Channel modules, which I am not using.  Everything is on its own individual private 10.x.x.x network as well, except for the VM traffic net, which will contain the virtual machine traffic.

The storage design:

Like I said, a really ghetto NFS server.  It does not have enough drives, so even though it would be overkill for a home PC, it will not cut it in this situation.  I expect it to run out of steam after only a few VMs are added, but it does tie everything together and provides the shared storage component needed for HA, Vmotion, and DRS.  I am working on an afforable and acceptable solution, rack-mounted, with more gig NICs and up to 24 hot-swap drives – more spindles should offer more thoughput.  I bonded the NICs together into a single LACP link, untagged back the the Cisco, on the NFS storage VLAN.  Once working, I stripped out all unneeded packages for a very minimal 64-bit Ubuntu server.  It boots in seconds, and has no GUI.  Unfortuately, I did not get into the weeds enough to align the partitions/volumes/etc.  I just forgot to do that.  I will have to figure that out next time I get a storage box in.

The management server:

It is also on a very minimal 64-bit Ubuntu-9.04 install.  Ithas four NICs, but I only use two (the other two are only 100 MB).  The two gig NICs are also bonded into one LACP link back to the Cisco, untagged.  The server is running a stripped down 2.6.31 kernel, and has VMware Server 2.0.x installed for the vCenter Server (running on a Windows 2003 server virtual machine).  On the Ubuntu host server, I have installed and configured DHCP, TFTP, and gPXE.  I also extracted the boot guts from the ESXi 3.5.0 Update 4 ISO and set up the tftpboot directory so that each blade will get the image installed.  On the vCenter Server virtual machine, I installed the Microsoft PowerShell tool (which installed ActiveState PERL), and the VMware PowerCLI tool.  I also downloaded the midwife scripts and installed Notepad++ for easy editing.  The vCenter Server VM is on a private 10.x.x.x net for isolated management, but this gets in the way of the Update Manager plugin, so I still have some work to do later to get around this.

Really key things I learned from this:

  1. The blade server VC-Enet modules are NOT layer-2 switches.  They may look and feel that way in some aspects, but they, by design, actually present themselves to network devices as server ports (NICs), not as more network devices.  Learn about them – RTFM.  It makes a difference.  For instance, it may be useful to know that the right side bay modules are placed in standby by default, and the left-side are active – they are linked via an internal 10Gig connection.  I know of another lab with the same hardware that could not figure out why they could not connect the blade modules to the network if all the modules were enabled, so they solved it by disabling all but Bay-1, instead of learning about the features and really getting the most out of it.
  2. Beware old 64-bit CPUs.  Just because it lets you load a cool 64-bit OS on it does NOT mean it will let you load a cool 64-bit virtual machine on it.  If it does not have virtualization instruction sets in its CPU(s), you will run into failure.  I found this out the hard way, after trying to get the RCLI appliance (64-bit) from VMware in order to manage the ESXi hosts.  I am glad I failed, because it forced me to try the PowerCLI/PowerShell tools.  Without those tools, I seriously doubt I could have gotten this project working.
  3. Learn PowerShell.  The PowerCLI scripts extend it for VMware management, but there are plenty of cool tricks you can do using the base PowerShell scripts as well.  I am no fan of Microsoft, so it is not often I express satisfaction with one of their products.  Remember where you were on this day, ‘cuz it could be a while before it happens again.
  4. Name resolution is pretty important.  HA wants it in a real bad way.  Point your hosts to a DNS server, or give them identical hosts files (a little ghetto, but a good failsafe for a static environment).  I did both.
  5. Remember those Enet modules?  Remember all that cool LACP stuff I mentioned?  Rememeber RTFM?  Do it, or you will miss the clue that while the E-net modules like to play with LACP, only one link per V-Net is set active to avoid loops.  So if, on your active V-Net, you have two LACP links, each for a different tagged VLAN, and your NFS devices won’t talk to anyone, you will know that it is because it saw your iSCSI V-Net first, so it set your NFS link offline.  Meaning, the iSCSI link on Bay-1 and it’s offline twin on Bay-2 both have to fail before your NFS link on Bay-1 will come up.  Play it safe – one LACP link per V-Net per bay.  Tag over multiple VLANs on the link instead. The E-net modules only see the LACP links, and do not care if they support different VLANs – only one is set active at a time.
  6. Be careful with spanning tree (this can be said for everything related to networking).    Use portfast on your interfaces to the E-net modules, and be careful with spanning tree guards on the Cisco side.  In testing, I would find that by pulling one of the pairs in a link, it would isolate the VLAN instead of carrying on as if nothing had happened.  Turns out a guard on the interface was disabling the link to avoid potential loops.  Once I disabled that, the port-channel link functioned as desired.
  7. Doesn’t it suck to get everything working, and then not have a clean way to import in VMs?  I mean, now that you built it, how do you get stuff into it?  I ended up restructuring my NFS server and installing Samba as well.  This is because when importing a VM from the GUI (say, by right-clicking on a resource pool), the “Other Virtual Machine” option is the only one that fits.  However, it then looks for a UNC path (Windows share-style) to the .vmx file.  I could browse the datastore and do it that way, but for VMs not on the NFS datastore already, I needed to provide a means for other labs to drop in their VMs.  Samba worked.  Now they can drop in their VMs on the NFS server via Samba, and the vCenter Server can import the VMs from the same place.

Currently, we are restructuring phycial paths between labs for better management.  It is part of an overall overhaul of the labs in my building.  Once done, my next step is to start building framework services, such as repository proxy servers, WSUS servers, DHCP/DNS/file/print, RADIUS/S-LDAP/AD, etc., etc.  I also need to wrap in a management service framework as well that extends to all the labs so everyone has an at-a-glance picture of what is happening to the network and the virtual environment.  One last issue I am fighting is that I am unable to complete importing VMs I made on ESX 3.5 U2 earlier this year.  It keeps failing to open the .vmdk files.  I will have to pin that down first.

The end result?

  1. If I run the midwife service on the vCenter server and reboot a blade, it is reloaded and reconfigured within minutes.
  2. If I upgrade to beefier blades, I pop them in and let them build.
  3. If I update to a newer release of ESXi (say, update 5 or 6), I extract from the ISO to the tftpboot directory and reboot the blades.  The old configs get applied on the new updated OS.
  4. All configs are identical – extremely important for cluster harmony.  No typos.
  5. If someone alters a config and “breaks” something, I reboot it and it gets the original config applied back.
  6. If I make a change to the config, I change it in the script once, not on each blade individually.  This also allows for immediate opportunity to DOCUMENT YOUR CHANGES AS YOU GO.  Which is just a little bit important.

As stated before, this is an overview.  I will add more detailed articles later, which will include scripts and pictures as appropriate.  I am at home now and do not have access to my documentation, but once I get them, I will post some goodies that hopefully help someone else out.  To include myself.

ESX Troubleshooting – The PSOD (Purple Screen of Death)…

Unlike the BSOD of Windows fame, there is actually hope with a PSOD on ESX.  As I learned at VMWorld 2008, this indicates a specific hardware problem in the majority of cases.  Examining the screen dump can actually point you in the right direction to resolving this.

As I was building my junk server cluster (in a lab, not for production use, so a great way to learn safely), I was swapping NICs to plus-up on Gigabit Ethernet connectivity to the Cisco 6509 I am using.  One of my servers (the big one) was largely already completely configured in VIM, right down to the NFS mountpoint it was using.  Without thinking it through, I grabbed a couple of gig NICs to install, since it still had room, and did it, removing two unsupported NICs in the process and sliding the cards over into the blank PCI slots (grouping all the NICs together).  Upon rebooting, it threw up a red log entry proclaiming a pCPU0 warning about something.  Shortly thereafter, the console stopped responding.  Checking further, I saw that the host had a PSOD.  I rebooted, got the same log message on initial ESX console screen, and another PSOD within minutes.

This time, I dug into the PSOD and noticed that the dump was referencing network drivers for the cards I had just installed.  Aha!  I realized that the vmnic numbering had changed – and the server was trying to do all kinds of things using the old vmnic PCI references, including mount the NFS share.  No wonder it vomited!

The solution was to first shut down and pull the new NICs, reboot, and see if the PSOD went away – it did.  Next, I removed the NFS share and updated the vmnic assignments to vswitches to account for any changes.  I rebooted again to make sure all was well.  When that proved to be the case, I shut down and added in the two NICs I wanted to use, rebooted, and everything worked.  I was then able to update my configs with the new vmnics, reboot to make sure there was no PSOD event, and reenable the NFS share.  I rebooted again, one last time, just to tempt fate, but still no PSOD.

Been stable ever since.

So don’t give up on the PSOD – it’s natural to want to do that with Windows, but this sure ain’t Windows, is it?  You CAN troubleshoot and resolve these cases, even if you have to open a support call.  The dump can help you zero in on the bad memory module, failing CPU, or even the occasional misplaced network card, and help you get your server back up on its feet.

Of course, I would never be this reckless in a production environment – which is why everyone should have a lab to play with.  If you can afford the time, effort, and junk servers, it is a great way to learn in safety.

Possibilities Within ESX…

As I learn more about VMware ESX, I am starting to see the flexibility and possibilities available.  You have five major sets of pieces to play with – vswifs, vmknics, portgroups, vswitches, and vmnics.

  • You can tag or untag your portgroups, and can assign multiple portgroups to a vmnic.
  • You can have multiple vswifs on multiple vswitches.
  • You can have multiple vmnics assigned to a portgroup.
  • You can have vswitches with no uplinks (no vmnics assigned).
  • You can have portgroups with no uplinks (no vmnics assigned).
  • You can have vswifs assigned to non-service console portgroups for different traffic cases.
  • You can have up to 100 vswifs (0 to 99).

Things I have yet to determine on my own:

  • How many vmknics can you have?  I assume 100 also – you do not name them like you do with vswifs; you create and assign them to portgroups and they are automatically named and numbered.
  • Can a portgroup span multiple vswitches?  I don’t see why not.
  • Can a vmnic be assigned to multiple vswitches?  I think so…

I am sure that I will come up with plenty more questions.

Then throw in the firewall configs, and appliance VM’s (like firewall/IDS/IPS/proxy devices).  I saw demonstrations of an entire DMZ within a physical server, using such appliances spanning multiple vswitches (some with uplinks, some without).  Talk about amazing – I had not even considered thinking in that direction.  Just imagine how you can move all these pieces around to create new network functionality within an ESX host server.  The more complex it gets, though, the more you [A.] need to know the ESX command line, and [B.] need a kickstart script on a floppy to autoconfigure your stroke of genius onto new ESX servers you deploy.  (Because hand-jamming sucks.)

And finally – this is just the ESX side.  VIM comes along and adds in clusters, resource pools, the concept of shares, VMotion, HA, and DRS, just to name a few.  All configurable, and with a new set of caveats, such as:

  • DRS, VMotion, and HA need shared storage (SAN, iSCSI, or NFS) available before they are enabled.
  • DRS needs to be set to Manual when importing VMs from images or machines – deploying from templates does not (I think).
  • DRS and HA are available only for hosts within a cluster (I think).
  • HA, I believe, requires identical network configs on each ESX host in the cluster to work – so if you build your cluster out of dissimilar junk machines like I have (it’s all I have to work with for now), with different NIC quantities, portgroup assignments, and so on, then HA probably won’t work.  At least, it doesn’t for me, and the differing network configs are the first thing I would suspect.  And if you think it through, it sorta makes sense that it won’t work.

When VMware and Cisco come out with the virtual switch concept they discussed at VMworld2008, this HA limitation should change.  This is where, as I understand it, essentially the network configs are shadowed across each clustered host.  The Cisco switch interconnecting them is reconfigured when a HA event happens to allow those network changes incurred to function.  I think this is basically how it is supposed to work.  Too cool, eh?

NFS Fixed…

Didn’t have a default gateway defined for the two problem servers’ vmkernels.  Once I did that, they mounted the NFS share just fine.  Oddly, it worked mounting from within ESX at the command line…

I see beers in my future…..

Need to Fix NFS…

It occurs to me that I had better fix that NFS issue I am having.  Why?  Well, if I have five servers clustered, and three can mount the NFS datastore with VMs on it, could there be a chance of DRS moving a VM to a server not talking to the VM’s NFS orgin?  I do not think so, but if true, things would fail.  If not true, then my cluster is only as good as three servers, not five.

My strategy:  Mount at the command line on one of the ESX servers first to test.  If that fails, unmount the same NFS share from one of the other servers and try to remount it, from within VIM.  This will tell me quite a lot about what is going on, I hope.  The vmknics on four of the servers (two that can and the two that cannot mount) are on the same subnet, which differs from the subnet the NFS mount is on.  So why can two mount, but the other two not?  They fail instantly, so it is not a timeout.  The firewalls are all off for now, so that is not part of the issue.

And of course, dig through the logs on each of the servers – /usr/var/messages, /vmkernel, /vmksummary, and /vmkwarning at a minimum.

My task list has otherwise been eradicated in the past week (YES!) – outside of NFS, all that really remains is for me to build a golden master of Windows 2003 Server, and maybe fork some application templates (DHCP, DNS, print, AD, web, SQL, FTP, etc.) off of it.  Cake, right?

More Work, More ESX…

Figured out my issue from yesterday – the Service Console NICs were on the wrong port group.  They had the right IPs but were assigned to a portgroup with a different subnet mask, so they were never talking to their gateway.  Fixed.  I knew I was being a chowderhead.

Another thing I learned about ESX and Virtual Center – importing is cool, but be careful to make sure you import machines to ESX hosts that have at least as many CPUs as the target machine.  Otherwise it’ll come over, but fail to start up, and the logs will declare failure.  Just migrate to a more suitable ESX host and start it up.

Now I have fixed almost every issue I am having (still can’t get my two newest servers to mount one particular NFS share, even though they can ping the IP – the logs still say, “no route to host”).  I’ll get to it later.  Feeling pretty good right now – why spoil it?


Today I worked.  No breaks, almost no emails (like, seven maybe), no phone calls, no meetings, no chit-chat watercooler stuff, barely had lunch (a sandwich from home) – while I worked.  From 8:30 AM straight through to 6:30 PM, and I am tired.  I got a LOT done.

  • I located two possible rack shelves to use, since I need to install one more shelf int the roll around rack I am stuffing with hardware.  Neither was a good fit, but I then found the exact type I needed in another rack.  It was supposed to have been removed anyway, so I did it, adjusted the mounting brackets and seaching for missing screws before finally mounting it where I wanted.
  • I updated the VLAN configs on my Cisco 6509 to account for some new changes I had come up with.  I had to do some minor repatching of my existing ESX servers afterwards.
  • I helped another team diagnose a Layer-2 loop problem (didn’t take long).
  • Next came two legacy 2U servers (old HP DL 308 G3s).  I had to pull them apart and remove the three 100BaseT NICs in each of them.  Then I had to scavenge six Intel 1000BaseT NICs from three old IBM 1U servers that are going away (we have a stack of them, so I will probably be making another trip for more NICs later).  After installing the cards, I stacked them on the new shelf (waaayyy up high), connected the KVM and power, and popped an ESX install CD into each.
  • I loaded ESX on each server using my standard configuration, cabled up all the network cards, rebooted.
  • I imported a 287 GB server image onto a new 81 GB VM in my ESX cluster.
  • I helped yet another team get into their HP blade server chassis switches (didn’t take long).
  • I KVM’d into each new server and hand configured everything from the command line, making new vswif interfaces, vswitches, portgroups, and vmknics.  This took forever – as soon as I get more time, I am making my own kickstart script to do this stuff for me.  I have three vswitches, four vswif interfaces, and four vmknics.  Two vswitches have four portgroups each, and the other has, uh, <doing math in head> 22 portgroups.  Most are not used, but are there for uniformity and flexibility.
  • I updated all my documentation and posted the newest updates on the rack doors (front and back).
  • I worked with our LAN admin to set some routes up for some new networks I will be using, and updated the static routes in my Foundry switch tying my networks to his.
  • I spent the rest of my time troubleshooting why these two new servers cannot talk to most networks, but can talk to a couple.  I am so tired from typing in commands, vlan IDs, etc., I can’t think straight.  I bet I dorked up some VLAN tags on the vswitches or mispatched something.  I checked all the cables meticulously to ensure I had good link on everything, and sure enough, there were loose cables.  I confirmed everything was good on the physical layer with esxcfg-nics -l.  Default routes look good.  Just so much stuff to keep track of…

I am still not done with one server, not quite.  But tomorrow I’ll tackle it and these problems with both, and maybe grab a few more NICs for later.  Kinda sucks having no help, but oh well.  It’ll still get done.

I also need to make templates within ESX, so I have to start copying ISOs to install from.  That can happen tomorrow too.  Just a fairly typical workday.  Oh, and my boss put me in charge of a portion of a big project I am building all this stuff for.  Bonus.

I need a beer….