Stateless VMware ESXi 3.5 on an HP c7000 Blade Server…

NOTE:  This is only an overview.  Due to the detailed nature of this project, I will break it up over several more-focused articles over time for easier reference.

Well, despite my more negative impression of this year’s VMworld conference, it still really paid off.  There I learned about stateless ESX deployment.  Using this information, I was able to build in my lab, after a couple months of trial and error, a highly robust VMware environment, fully managed and licensed, using the midwife scripts I modified for this effort.  And configuration is hands-free.

Here are the system components:

  • SERVER – HP c7000 Blade Enclosure with sixteen BL465c blades, two 4 Gb FC modules, and four VC-Enet modules
  • Each blade has two dual-core AMD CPUs, 16 GB RAM, two 72 GB SAS drives (hardware RAID-1), two embedded gig NICs, and a mezzanine card with two more gig NICs/iSCSI initiators and two FC HBAs
  • NETWORK – Cisco 6509 with two SUP 720 cards, two 48 port LC Gig-E fiber cards, and four 48 port gig copper cards
  • MANAGEMENT – Dell 1850 with two 146 GB SAS drives (hardware RAID-1) for management and boot services
  • STORAGE – Scavenged proof-of-concept totally ghetto Dell Optiplex desktop with four internal 1.5 TB SATA drives (software RAID-10 formatted with tuned XFS) providing 3 TB of NFS shared storage
  • Scavenged HP IP-KVM box for OOB-management of the two Dells

Here are the steps I took:

  1. First I had to update all the firmware on the blade server.  This includes the two OA cards for the Onboard Administrator, the Virtual Connect software, the iLO2 software for each blade, the BIOS on each blade, and the Power Management Controller firmware.  There is a particular order this is done in, and it is not easy, but it really needs to be done.  The fixes that come with these updates are often vital to success.  Overall, I spent a week researching and updating.  I set all the blades to boot via PXE.
  2. Next, I built the storage server.  I really had no choice – nothing was available but a Dell Optiplex desktop.  It had four internal SATA ports available and room for four 1 GB RAM modules.  It also had a single dual-core Intel CPU, PCI slots for more NICs, and a PCI-Express mini-slot.  I had to order parts, and it took a little while, but once done, it had a total of four gig NICs (one embedded, two PCI, one PCI-Express), four 1.5 TB SATA drives, and 4 GB RAM.  I loaded it with 64-bit Ubuntu-9.04, hand-carved the partitions and RAID-10 setup, formatted the 3 TB volume with XFS, tuned it as best I knew how, and then put it on the 2.6.31 kernel (later updated to 2.6.31.5).  No BIOS or other firmware updates were needed.  (A rough sketch of the RAID/XFS commands follows this list.)
  3. I then built the management server on the Dell 1850.  It only has one power supply (I cannot find a second one), but it does have 8 GB RAM and two dual-core CPUs.  I loaded 64-bit Ubuntu-9.04 on it after installing two 146 GB SAS drives in a hardware RAID-1 mirror.  I also updated the BIOS and other firmware on it.
  4. Having these components in place, I studied the blade server to see what I could get away with, and ultimately decided to use each NIC on a blade to support a set of traffic types, balancing the likelihood of traffic demands across them.  For example, Vmotion traffic, while it may be intense, should be relatively infrequent, so it shares a V-Net with another type of traffic that is low-bandwidth (the alternate management network).  Altogether, I ended up with a primary management network on one V-Net, Vmotion and the alternate on another V-Net, storage traffic (NFS and iSCSI) on a third V-Net, and VM traffic on its own V-Net.  Each V-Net maps to its own NIC on a blade, the same NIC on each blade.
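
Since step 2 is where most of the hand-carving happened, here is a minimal sketch of the software RAID-10 and XFS commands involved.  The device names, label, and mount point are placeholders rather than copies from my build notes, and the mount options are just a starting point for tuning:

    # Four SATA drives (assumed here to be sdb-sde) in a Linux software RAID-10 set
    mdadm --create /dev/md0 --level=10 --raid-devices=4 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1

    # Format the resulting 3 TB volume with XFS
    mkfs.xfs -f -L vmstore /dev/md0

    # Mount it where the NFS export will live; noatime is an easy win for VM storage
    mount -o noatime,logbufs=8 /dev/md0 /export/vmstore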

The physical network design:

For the V-Nets, the management network went on NIC 1 as an untagged VLAN.  It has to be untagged, because when it boots up, it needs to get a DHCP address and talk to the boot server for its image.  Since it comes up untagged, it will not be able to talk out to the DHCP/PXE server if the V-Net is set to pass through tags.  The other V-Nets support tagged VLANs to further separate traffic.  Each V-Net has four links to the Cisco 6509, except for the storage V-Net, which has eight.  Two links form an LACP bundle from the active side (VC-Enet module in Bay 1), and two make up an LACP bundle (or etherchannel) from the module in Bay 2, which is the offline side.  This is repeated for the other networks across the other modules in Bays 5 and 6.  Bays 3 and 4 house the Fiber Channel modules, which I am not using.  Everything is on its own individual private 10.x.x.x network as well, except for the VM traffic net, which will contain the virtual machine traffic.
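
To give a feel for the Cisco side of one of these bundles, here is a minimal sketch of a two-port LACP link to the VC-Enet module in Bay 1.  The interface numbers, port-channel number, and VLAN IDs are made up for illustration:

    ! Two-port LACP bundle to the Bay 1 VC-Enet module (a tagged V-Net shown)
    interface range GigabitEthernet3/1 - 2
     description Bay1 VC-Enet uplink
     switchport
     switchport trunk encapsulation dot1q
     switchport mode trunk
     switchport trunk allowed vlan 20,21
     channel-group 11 mode active
    !
    ! The E-net modules present as server NICs, so treat the bundle as an edge port
    interface Port-channel11
     spanning-tree portfast trunk

As item 6 in the lessons-learned list below notes, double-check any spanning-tree guard settings on these ports too.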

The storage design:

Like I said, a really ghetto NFS server.  It does not have enough drives – it would be overkill for a home PC, but it will not cut it in this situation.  I expect it to run out of steam after only a few VMs are added, but it does tie everything together and provides the shared storage component needed for HA, Vmotion, and DRS.  I am working on an affordable and acceptable solution, rack-mounted, with more gig NICs and up to 24 hot-swap drives – more spindles should offer more throughput.  I bonded the NICs together into a single LACP link, untagged back to the Cisco, on the NFS storage VLAN.  Once working, I stripped out all unneeded packages for a very minimal 64-bit Ubuntu server.  It boots in seconds and has no GUI.  Unfortunately, I did not get into the weeds enough to align the partitions/volumes/etc.  I just forgot to do that.  I will have to figure that out next time I get a storage box in.
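
For reference, a rough sketch of the two pieces that matter most on the Ubuntu box – the 802.3ad bond and the NFS export.  Interface names, addressing, and the export path are placeholders, and the exact bonding option names can vary between ifenslave versions, so check the package documentation:

    # /etc/network/interfaces (fragment) - LACP bond of the four gig NICs
    auto bond0
    iface bond0 inet static
        address 10.10.12.10
        netmask 255.255.255.0
        bond-slaves eth0 eth1 eth2 eth3
        bond-mode 802.3ad
        bond-miimon 100

    # /etc/exports - share the XFS volume to the storage VLAN (ESX needs no_root_squash)
    /export/vmstore 10.10.12.0/24(rw,no_root_squash,async,no_subtree_check)

    # Apply the exports and verify
    exportfs -ra
    showmount -e localhost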

The management server:

It is also on a very minimal 64-bit Ubuntu-9.04 install.  It has four NICs, but I only use two (the other two are only 100 Mb).  The two gig NICs are also bonded into one LACP link back to the Cisco, untagged.  The server is running a stripped-down 2.6.31 kernel, and has VMware Server 2.0.x installed for the vCenter Server (running on a Windows 2003 Server virtual machine).  On the Ubuntu host server, I have installed and configured DHCP, TFTP, and gPXE.  I also extracted the boot guts from the ESXi 3.5.0 Update 4 ISO and set up the tftpboot directory so that each blade will get the image installed.  On the vCenter Server virtual machine, I installed the Microsoft PowerShell tool (which installed ActiveState Perl), and the VMware PowerCLI tool.  I also downloaded the midwife scripts and installed Notepad++ for easy editing.  The vCenter Server VM is on a private 10.x.x.x net for isolated management, but this gets in the way of the Update Manager plugin, so I still have some work to do later to get around this.
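
The PXE plumbing is the heart of the stateless piece, so here is a rough sketch of the kind of dhcpd.conf stanza involved.  The subnet, MAC, filenames, and addresses are illustrative only, and the boot filename depends on whether you chainload gPXE (I did) or serve pxelinux directly:

    # /etc/dhcp3/dhcpd.conf (fragment) - hand each blade an address and a boot file
    subnet 10.10.10.0 netmask 255.255.255.0 {
        option routers 10.10.10.1;
        next-server 10.10.10.5;            # the management/TFTP server
        filename "gpxelinux.0";            # gPXE chainloader from the tftpboot directory

        host blade01 {
            hardware ethernet 00:17:a4:aa:bb:01;   # placeholder MAC
            fixed-address 10.10.10.101;
        }
        # ...one host stanza per blade...
    }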

Really key things I learned from this:

  1. The blade server VC-Enet modules are NOT layer-2 switches.  They may look and feel that way in some aspects, but they, by design, actually present themselves to network devices as server ports (NICs), not as more network devices.  Learn about them – RTFM.  It makes a difference.  For instance, it may be useful to know that the right side bay modules are placed in standby by default, and the left-side are active – they are linked via an internal 10Gig connection.  I know of another lab with the same hardware that could not figure out why they could not connect the blade modules to the network if all the modules were enabled, so they solved it by disabling all but Bay-1, instead of learning about the features and really getting the most out of it.
  2. Beware old 64-bit CPUs.  Just because it lets you load a cool 64-bit OS on it does NOT mean it will let you load a cool 64-bit virtual machine on it.  If it does not have virtualization instruction sets in its CPU(s), you will run into failure.  I found this out the hard way, after trying to get the RCLI appliance (64-bit) from VMware in order to manage the ESXi hosts.  I am glad I failed, because it forced me to try the PowerCLI/PowerShell tools.  Without those tools, I seriously doubt I could have gotten this project working.
  3. Learn PowerShell.  The PowerCLI scripts extend it for VMware management, but there are plenty of cool tricks you can do using base PowerShell as well (a small taste follows this list).  I am no fan of Microsoft, so it is not often I express satisfaction with one of their products.  Remember where you were on this day, ‘cuz it could be a while before it happens again.
  4. Name resolution is pretty important.  HA wants it in a real bad way.  Point your hosts to a DNS server, or give them identical hosts files (a little ghetto, but a good failsafe for a static environment).  I did both.
  5. Remember those Enet modules?  Remember all that cool LACP stuff I mentioned?  Remember RTFM?  Do it, or you will miss the clue that while the E-net modules like to play with LACP, only one link per V-Net is set active to avoid loops.  So if, on your active V-Net, you have two LACP links, each for a different tagged VLAN, and your NFS devices won’t talk to anyone, you will know that it is because it saw your iSCSI link first, so it set your NFS link offline.  Meaning, the iSCSI link on Bay-1 and its offline twin on Bay-2 both have to fail before your NFS link on Bay-1 will come up.  Play it safe – one LACP link per V-Net per bay.  Tag over multiple VLANs on the link instead.  The E-net modules only see the LACP links, and do not care if they support different VLANs – only one is set active at a time.
  6. Be careful with spanning tree (this can be said for everything related to networking).  Use portfast on your interfaces to the E-net modules, and be careful with spanning tree guards on the Cisco side.  In testing, I found that pulling one member of a link pair isolated the VLAN instead of carrying on as if nothing had happened.  It turns out a guard on the interface was disabling the link to avoid potential loops.  Once I disabled that, the port-channel link functioned as desired.
  7. Doesn’t it suck to get everything working, and then not have a clean way to import VMs?  I mean, now that you built it, how do you get stuff into it?  I ended up restructuring my NFS server and installing Samba as well.  This is because when importing a VM from the GUI (say, by right-clicking on a resource pool), the “Other Virtual Machine” option is the only one that fits.  However, it then looks for a UNC path (Windows share-style) to the .vmx file.  I could browse the datastore and do it that way, but for VMs not on the NFS datastore already, I needed to provide a means for other labs to drop in their VMs.  Samba worked.  Now they can drop in their VMs on the NFS server via Samba, and the vCenter Server can import the VMs from the same place.
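
For item 3, here is a small taste of what PowerCLI makes easy once you are connected to vCenter.  The host, switch, and portgroup names are placeholders, and cmdlet parameters may differ slightly between PowerCLI releases, so check Get-Help on yours:

    # Stamp the same vSwitch/portgroup layout onto every host in the inventory
    Connect-VIServer -Server vcenter01 -User administrator -Password '********'

    foreach ($esx in Get-VMHost) {
        $vs = New-VirtualSwitch -VMHost $esx -Name vSwitch1 -Nic vmnic2
        New-VirtualPortGroup -VirtualSwitch $vs -Name "VM Traffic" -VLanId 33
    }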

Currently, we are restructuring physical paths between labs for better management.  It is part of an overall overhaul of the labs in my building.  Once done, my next step is to start building framework services, such as repository proxy servers, WSUS servers, DHCP/DNS/file/print, RADIUS/S-LDAP/AD, etc.  I also need to wrap in a management service framework that extends to all the labs so everyone has an at-a-glance picture of what is happening to the network and the virtual environment.  One last issue I am fighting is that I am unable to complete importing VMs I made on ESX 3.5 U2 earlier this year.  It keeps failing to open the .vmdk files.  I will have to pin that down first.

The end result?

  1. If I run the midwife service on the vCenter server and reboot a blade, it is reloaded and reconfigured within minutes.
  2. If I upgrade to beefier blades, I pop them in and let them build.
  3. If I update to a newer release of ESXi (say, update 5 or 6), I extract from the ISO to the tftpboot directory and reboot the blades.  The old configs get applied on the new updated OS.
  4. All configs are identical – extremely important for cluster harmony.  No typos.
  5. If someone alters a config and “breaks” something, I reboot it and it gets the original config applied back.
  6. If I make a change to the config, I change it in the script once, not on each blade individually.  This also allows for immediate opportunity to DOCUMENT YOUR CHANGES AS YOU GO.  Which is just a little bit important.

As stated before, this is an overview.  I will add more detailed articles later, which will include scripts and pictures as appropriate.  I am at home now and do not have access to my documentation, but once I get them, I will post some goodies that hopefully help someone else out.  To include myself.

More Work, More ESX…

Figured out my issue from yesterday – the Service Console NICs were on the wrong port group.  They had the right IPs but were assigned to a portgroup with a different subnet mask, so they were never talking to their gateway.  Fixed.  I knew I was being a chowderhead.

Another thing I learned about ESX and Virtual Center – importing is cool, but be careful to make sure you import machines to ESX hosts that have at least as many CPUs as the machine you are importing.  Otherwise it’ll come over, but fail to start up, and the logs will declare failure.  Just migrate it to a more suitable ESX host and start it up.

Now I have fixed almost every issue I was having (I still can’t get my two newest servers to mount one particular NFS share, even though they can ping the IP – the logs still say, “no route to host”).  I’ll get to it later.  Feeling pretty good right now – why spoil it?

Some Tips on VMware ESX…

Well, it has been slow posting recently.  For a while.  OK, a long time now.  But I have been working in a lab, building a virtualization environment using VMware ESX Servers and Virtual Center, and lemme tell ya, there are a LOT of moving parts.  I thought it would be useful to jot down some of the tips I have picked up along the way.  This applies to ESX 3.5.0 update 2 and Virtual Center 2.5.0 update 1.

So here goes.  From memory, so there may be *minor* inaccuracies.

  1. Hardware:  Sure, you want as many CPU cores as you can get (VMware counts up to six cores per physical CPU as one).  Sure you want as much RAM as the machine will hold.  Of course you want terabytes of disk space (well, as much as you can get anyway).  Guess what?  You should also make sure you have plenty of network cards handy.  Whatever space isn’t taken up with fiber channel HBAs, iSCSI initiators, etc., throw a NIC in there.  A Gigabit Ethernet NIC, fiber or copper.  10 gig if you can use it.  Just make sure the cards are supported by VMware, or you may be swapping cards a lot learning the hard way…
  2. Network:  Cisco is good – CDP (Cisco Discovery Protocol) and Etherchannel are both great complements to ESX networking.
  3. Storage:  NFS instead of iSCSI/Fiber Channel.  Huh?  Are you nuts?  Seriously, my mind was blown at VMworld 2008 at the sessions covering NFS access to shared storage.  On a NAS.  NetApp appears to be a natural choice, but any NFS server will do in a pinch.  The VMware ESX kernel currently supports version 3 of NFS.  Some apps are better fits for FC/iSCSI SANs, but most should work just great on a NAS over NFS, and it is *way* cheaper, easier to manage, and more flexible.  There are tradeoffs to everything, of course, so investigate closely.
  4. Which NIC is which?  Two ways to find out – being in the ESX command line is useful now.  Assuming you are plugged into a Cisco, you can use CDP.
    • ESX – Set CDP to both listen and advertise on your virtual switch(es) – the default is listen – with this command: ” esxcfg-vswitch -B both vSwitch0 ". Replace vSwitch0 with your vswitch name. Check with the same command, using -b instead of -B.
    • Cisco – Turn on CDP.
    • Cisco – ” show cdp neighbor ” will show you vmnic0, vmnic1, etc. and the Cisco port connecting them.

    Or you can do it all from ESX by plugging in the NIC to the network, and typing in at the ESX command line, ” esxcfg-nics -l “.  Plug in the NICs one at a time and rerun that command each time.  You’ll see.  Be sure to document everything.

  5. Routing:  Can’t ping a service console NIC?  Can’t get to a vmkernel NIC?  Virtual machines not talking to the rest of the network?  Make sure your default routes are set properly with ” esxcfg-route ” (for vmknics), and ” netstat -r ” for your vswif (service console) NICs.  The ” /etc/sysconfig/network ” file also has the service console default route in it for startup – make sure it is correct, change as needed.
  6. VLANs, portgroups, and vmnics:  This is tricky, and something I had to learn on my own.  The ” esxcfg-vswitch ” command lets you create and delete virtual switches, set CDP, add and remove vmnics (the physical network cards ESX detects), and add and remove portgroups (VLANs and their tags, or IDs).  The -L option links the vmnic to the vswitch, on all portgroups.  The -U option unlinks it.  But then there is also a -M option, which adds a vmnic to a portgroup on the switch, while -N removes vmnics from portgroups.  That is the tricky part – suppose you wanna add a vmnic to one portgroup only (say your switch has three portgroups).  First, you need to be in the command line, because the Virtual Center GUI does not seem to provide this granularity of configuration.  If you add it with -M to that portgroup, it does not fail, and looks right, but the vmnic does not talk to anything.  You must link it (-L) to the switch first.  THEN add and remove from portgroups using the -M and -N options, one vmnic/portgroup at a time, after which your vmnic will work as you expected.  This is not documented well, and the man page does not clearly explain this, so be aware.  (A worked sequence is sketched after the command examples at the end of this post.)
  7. NFS on ESX:  This is not recommended, so do not do it.  Now that you have chosen to ignore my advice, you will need to recompile the kernel with the _NFS_TCPD option set to y – you need this link.  You will need to modprobe two modules, nfs and nfsd.  You will need to start the portmap and NFS services.  You will need to edit the /etc/exports file (using the no_root_squash option) and export it.  Verify with ” showmount -e ” and ” rpcinfo -p “.  Be really careful – I have done this several times in a closed lab environment, just to learn.  But you can seriously gank up your OS trying this – do not miss a step.  I used the /usr/src/linux-2.4.xx source, and did not need to modify the makefile. One more tip – if you do this, and have separated your traffic properly (service consoles, vmotion, NFS, virtual machine nets all on separate IP networks and VLANs), you will need to add in a service console vswif that other machines can access NFS on to keep the traffic away from the service console networks – so if your NFS network traffic flows on the 10.10.11.x network and your service consoles are on the 10.10.10.x network, add a vswif to the .11 NFS network and point other servers to it.  NFS won’t see a vmkernel NIC – it needs to be something that shows up in ifconfig – a vswif.  This allows you to add it to the NFS portgroup (if you are separating the traffic via portgroups/tagged vlans on a single vswitch at layer 2 instead of using multiple vswitches – I do both).  I have had no problems doing this.  YMMV.  Not for production use – get a NetApp (with the NFS license) or build one (FreeNAS, etc.).
  8. Clusters, resource pools, and virtual machines:  So you made a cluster, added your hosts, and created some resource pools.  Ready to import a VM?  READY, SET, FAIL!  I found that importing a Virtual Server 1.x VM from a local disk copy (trying to make it as simple as possible) failed with typically helpful Virtual Center log entries.  You know, the “unknown error” type.  Off to Google.  Turns out that DRS is getting in the way of the import.  Right click on the cluster, edit the properties, set DRS to manual, and then import it directly to the ESX host in the cluster that you want it to go to.  Should work fine after that (knock, knock).  Then you can set DRS back to what it was, and drag the VM to the resource pool desired.  Some posts say to further remove the ESX host from the cluster, but setting DRS to manual was all I needed to do.
  9. ESX host settings:  When you first set up a host (the ESX server itself) on Virtual Center, make sure to set the time properly – use an NTP server source if you can.  You may also want to increase your service console memory  – I max mine out at 800 MB.  This requires a reboot of the ESX host to take effect.  Also, when making partitions during the ESX install (if you do that kind of thing – I always do it manually), make sure you set the vmkcore partition to be larger than 100 MB.  It needs a minimum of 100 MB, so set a number of 104 MB to be sure, as 100 MB may actually format to less than 100 MB, causing your install to fail.
  10. fdisk, vmkcore, and vmfs3:  If you need this, you are really in the deep water.  So, you are not happy and decided to take it out on your partition table.  Using fdisk.  At the ESX command line.  ALRIGHTY THEN.  I assume you know exactly what the hell you are doing, ‘cuz if you don’t, you sure will.  The partition type for vmfs3 is fb.  The partition type for vmkcore is fc.  You do not need to (and cannot) format the vmkcore partition, but you will need to format the vmfs3 partition after rebooting.  You may very well be booted into a maintenance shell (not safe mode, not even that far).  If you change partitions, you change the UUIDs referenced in ” /etc/fstab ” – I got around this by mounting via /dev/ mountpoints instead of UUIDs within /etc/fstab.  (Hope you like vi.)  Here is a link for the lost and desperate (foolhardy and just-plain-nuts in my case).
  11. Using service consoles creatively:  This can be done, as I mentioned above with NFS on ESX.  I have a situation where I need to get to a time server on our LAN, but the ESX interfaces I want to use for NTP are on private nets – which our LAN administrator absolutely refuses to route on the LAN (good for him).  So, I added a tagged VLAN and interface on my Cisco for an unused production network, adjusted the routing on the uplink switch, and created that VLAN portgroup on all my ESX hosts’ service console vswitches.  I then added a vswif interface on that IP network to the new (NTP) portgroup, and added the service console vmnics (using the -M option) to the new portgroup.  I also had to set the Cisco up as an NTP server, using the upstream NTP server as a peer, and voila!  Accurate time on all my ESX hosts is now a reality.  May not be recommended (I don’t actually know), but you *can* use vswif interfaces for special purpose traffic needs, and still hold true to best practice guidelines (I try to keep all the backend ESX traffic on non-routing private nets).
  12. Troubleshooting extras:  This may cover a few gotchas…
    • Disable the firewall (see below examples).  It might be in the way, so get rid of it as a variable.  Don’t forget to turn it back on and configure it later!
    • Before removing a portgroup from a switch at the ESX command line, make sure you have removed all vswif, vmknic, and vmnic interfaces from the portgroup first. It has to be empty before you remove it.
    • Not sure which NIC has a driver for it?  I loaded up lots of scavenged gigabit NICs and couldn’t tell which was loading (I do things the hard way).  Match the PCI IDs from the ” esxcfg-nics -l ” command with the output of ” lspci ” to be sure.
    • Wanna change a portgroup VLAN ID? Just reissue the esxcfg-vswitch command with the new VLAN ID, like this: ” esxcfg-vswitch -p "Current Portgroup With Wrong ID" -v 74 vSwitch2 “. Now the VLAN tag is changed from whatever it was to 74. No need to remove the portgroup and do it all over.  You can also move a vswif to another portgroup in a similar manner, so you do not need to delete it and recreate it (see below for an example).
  13. Document and plan, plan and document:  This all starts with a plan.  The goal is as robust and flexible a virtual environment as you can afford to make.  You do not want to build this on a poor foundation – you do not want to rip everything up later and do it a better way.  PLAN IT OUT.  Mine took over a month of researching and dry-running, and I am still not sure I got it all right, but it is far more sophisticated and robust than my original design.  Document all phases of your work.  There are a LOT of moving parts here – you just cannot do this without ruthlessly precise documentation.  No time to be lazy or cut corners – your environment did not come cheap, and it may very well become a critical part of your network.  It has to be your best effort.  Use the best practices available from VMware, Cisco, NetApp, etc.
  14. ESX command line examples:  Here are a few of the really useful ones I have gotten comfortable with… See the man pages for more.
    • Add a vswitch – ” esxcfg-vswitch -a vSwitch4 “
    • Add a portgroup – ” esxcfg-vswitch -A "My New Portgroup" vSwitch4 “
    • Add a VLAN tag – ” esxcfg-vswitch -p "My New Portgroup" -v 33 vSwitch4 “
    • Add a NIC to a vswitch – ” esxcfg-vswitch -L vmnic2 vSwitch4 “
    • Add a NIC to a portgroup – ” esxcfg-vswitch -M vmnic2 -p "My New Portgroup" vSwitch4 “
    • Remove a NIC from a portgroup – ” esxcfg-vswitch -N vmnic2 -p "My Old Portgroup" vSwitch4 “
    • Remove a portgroup – ” esxcfg-vswitch -D "My Old Portgroup" vSwitch4 “
    • List your vswitches – ” esxcfg-vswitch -l “
    • List your vswif NICs – ” esxcfg-vswif -l “
    • List your vmkernel NICs – ” esxcfg-vmknic -l “
    • List your physical NICs – ” esxcfg-nics -l “
    • Add a vswif to a portgroup – ” esxcfg-vswif -a vswif7 -i 10.11.12.3 -n 255.255.255.192 -p "My New Portgroup" “
    • Move a vswif to a different portgroup – ” esxcfg-vswif -p "new portgroup" vswif3 “
    • Add a vmkernel NIC – ” esxcfg-vmknic -a -i 10.12.13.4 -n 255.255.255.192 "My Other New Portgroup" “
    • Temporarily disable the firewall for troubleshooting purposes – ” esxcfg-firewall --allowIncoming --allowOutgoing “
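
Tying back to item 6, here is the order of operations that finally worked for me when pinning a vmnic to just one portgroup on a multi-portgroup vswitch (the vswitch, vmnic, and portgroup names are made up):

    # Link the physical NIC to the vswitch first; it joins every portgroup on it
    esxcfg-vswitch -L vmnic3 vSwitch1

    # Then remove it from the portgroups it should NOT serve, one at a time...
    esxcfg-vswitch -N vmnic3 -p "Vmotion" vSwitch1
    esxcfg-vswitch -N vmnic3 -p "iSCSI" vSwitch1

    # ...leaving it on the one portgroup you want; verify the layout
    esxcfg-vswitch -l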

Well, this ends a pretty long post.  More to come as I progress through this.