Possibilities Within ESX…

As I learn more about VMware ESX, I am starting to see how flexible its networking is and what it makes possible.  You have five major sets of pieces to play with: vswifs (service console interfaces), vmknics (VMkernel interfaces), portgroups, vswitches, and vmnics (physical NICs).

  • You can assign VLAN tags to your portgroups or leave them untagged, and multiple portgroups can share the same vmnic.
  • You can have multiple vswifs on multiple vswitches.
  • You can have multiple vmnics assigned to a portgroup.
  • You can have vswitches with no uplinks (no vmnics assigned).
  • You can have portgroups with no uplinks (no vmnics assigned).
  • You can have vswifs assigned to non-service console portgroups for different traffic cases.
  • You can have up to 100 vswifs (0 to 99).
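To keep the pieces straight in my own head, here is a toy Python model of how they relate, built only from the bullets above.  It is not any VMware API: the class names, storing a VLAN tag as an integer (None for untagged), and the `vmkN` auto-naming are my own assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class VSwitch:
    name: str
    uplinks: List[str] = field(default_factory=list)   # vmnics; may be empty
    # portgroup name -> VLAN ID (None = untagged)
    portgroups: Dict[str, Optional[int]] = field(default_factory=dict)

    def internal_only(self) -> bool:
        # No vmnic uplinks: traffic on this vswitch never leaves the host.
        return not self.uplinks


@dataclass
class Host:
    vswitches: Dict[str, VSwitch] = field(default_factory=dict)
    vswifs: Dict[str, str] = field(default_factory=dict)  # vswif name -> portgroup
    vmknics: List[str] = field(default_factory=list)      # portgroup of each vmknic

    def find_vswitch(self, portgroup: str) -> VSwitch:
        for vs in self.vswitches.values():
            if portgroup in vs.portgroups:
                return vs
        raise KeyError(f"no vswitch has portgroup {portgroup!r}")

    def add_vswif(self, name: str, portgroup: str) -> None:
        self.find_vswitch(portgroup)   # portgroup must exist on some vswitch
        self.vswifs[name] = portgroup  # the admin chooses the vswifN name

    def add_vmknic(self, portgroup: str) -> str:
        self.find_vswitch(portgroup)
        self.vmknics.append(portgroup)
        return f"vmk{len(self.vmknics) - 1}"  # hypothetical automatic name


host = Host()
# Three portgroups (one tagged, two untagged) sharing the same two vmnics.
host.vswitches["vSwitch0"] = VSwitch(
    "vSwitch0", ["vmnic0", "vmnic1"],
    {"Service Console": None, "VM Network": 10, "VMkernel": None})
# A vswitch with no uplinks at all.
host.vswitches["vSwitch1"] = VSwitch("vSwitch1", [], {"DMZ Inside": None})
host.add_vswif("vswif0", "Service Console")
host.add_vswif("vswif1", "VM Network")  # vswif on a non-service-console portgroup
first_vmk = host.add_vmknic("VMkernel")
```

Anything attached to "DMZ Inside" stays inside the host, since vSwitch1 has no uplinks.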

Things I have yet to determine on my own:

  • How many vmknics can you have?  I assume 100 also – you do not name them like you do with vswifs; you create and assign them to portgroups and they are automatically named and numbered.
  • Can a portgroup span multiple vswitches?  I don’t see why not.
  • Can a vmnic be assigned to multiple vswitches?  I think so…

I am sure that I will come up with plenty more questions.

Then throw the firewall configs and appliance VMs (firewall, IDS, IPS, or proxy devices) into the mix.  I saw demonstrations of an entire DMZ inside one physical server, built from such appliances spanning multiple vswitches (some with uplinks, some without).  Talk about amazing: I had never even thought in that direction.  Just imagine how you can move all these pieces around to create new network functionality within an ESX host server.  The more complex it gets, though, the more you [A.] need to know the ESX command line, and [B.] need a kickstart script on a floppy to apply your stroke of genius automatically to each new ESX server you deploy.  (Because hand-jamming sucks.)
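For item [B.], here is a minimal sketch of the idea: describe the network layout once as data, then generate the esxcfg-vswitch and esxcfg-vmknic lines for a kickstart script.  The flags used (esxcfg-vswitch -a, -L, -A, -v with -p, and esxcfg-vmknic -a -i -n) are the ESX 3.x ones as I understand them, so check them against each tool's --help.  The layout, names, and addresses are hypothetical, and the sketch leaves out vSwitch0 and the service console, which the installer normally sets up.

```python
import shlex
from typing import Dict, List


def esxcfg_lines(vswitches: Dict[str, dict]) -> List[str]:
    """Turn a declarative network layout into esxcfg-* command lines.

    Hypothetical input shape:
      {vswitch: {"uplinks": [vmnic, ...],
                 "portgroups": {name: vlan_id or None},
                 "vmknics": {portgroup: (ip, netmask)}}}
    """
    lines = []
    for vs, spec in vswitches.items():
        lines.append(f"esxcfg-vswitch -a {vs}")            # create the vswitch
        for nic in spec.get("uplinks", []):
            lines.append(f"esxcfg-vswitch -L {nic} {vs}")  # link a vmnic uplink
        for pg, vlan in spec.get("portgroups", {}).items():
            q = shlex.quote(pg)                            # names may contain spaces
            lines.append(f"esxcfg-vswitch -A {q} {vs}")    # add the portgroup
            if vlan is not None:
                lines.append(f"esxcfg-vswitch -v {vlan} -p {q} {vs}")  # tag it
        for pg, (ip, mask) in spec.get("vmknics", {}).items():
            lines.append(f"esxcfg-vmknic -a -i {ip} -n {mask} {shlex.quote(pg)}")
    return lines


layout = {
    "vSwitch1": {"uplinks": ["vmnic2"],
                 "portgroups": {"VMotion": None},
                 "vmknics": {"VMotion": ("10.0.50.11", "255.255.255.0")}},
    "vSwitch2": {"portgroups": {"DMZ Inside": 20}},  # no uplinks: internal only
}
script = "\n".join(esxcfg_lines(layout))
```

Because the VMkernel may not be running during the kickstart %post stage, these lines might need to go into a script that runs on first boot instead; I have not verified that myself.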

And finally – this is just the ESX side.  VIM comes along and adds clusters, resource pools, the concept of shares, VMotion, HA, and DRS, just to name a few.  All of it is configurable, and all of it comes with a new set of caveats, such as:

  • DRS, VMotion, and HA need shared storage (SAN, iSCSI, or NFS) available before they are enabled.
  • DRS needs to be set to Manual when importing VMs from images or machines; deploying from templates does not require this (I think).
  • DRS and HA are available only for hosts within a cluster (I think).
  • HA, I believe, requires identical network configs on each ESX host in the cluster.  So if you build your cluster out of dissimilar junk machines like I have (it’s all I have to work with for now), with different NIC counts, portgroup assignments, and so on, HA probably won’t work.  At least, it doesn’t for me, and the differing network configs are the first thing I would suspect.  If you think it through, it sorta makes sense: a VM restarted on another host needs the same portgroups to exist there.
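If my suspicion is right, a quick sanity check before enabling HA is to compare portgroup names across the hosts.  This sketch only compares names (not VLAN IDs or NIC counts), and whether a mismatch is really what breaks HA is my assumption, not something I have confirmed; the host and portgroup names are made up.

```python
from typing import Dict, List, Set


def portgroup_mismatches(hosts: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """For each host, list portgroup names other cluster hosts have but it lacks."""
    every_pg = set().union(*hosts.values())  # union of all hosts' portgroups
    return {name: sorted(every_pg - pgs)
            for name, pgs in hosts.items()
            if every_pg - pgs}


cluster = {
    "esx1": {"Service Console", "VM Network", "VMotion"},
    "esx2": {"Service Console", "VM Network", "VMotion"},
    "esx3": {"Service Console", "VM Network"},
}
report = portgroup_mismatches(cluster)  # {"esx3": ["VMotion"]}
```

An empty result means every host carries the same portgroup names, which is the minimum I would want before blaming anything else.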

When VMware and Cisco release the virtual switch concept they discussed at VMworld 2008, this HA limitation should go away.  As I understand it, the network configs are essentially shadowed across every host in the cluster, and the Cisco switch interconnecting them is reconfigured during an HA event so that the resulting network changes actually work.  That is my understanding of how it is supposed to work, anyway.  Too cool, eh?


Need to Fix NFS…

It occurs to me that I had better fix that NFS issue I am having.  Why?  Well, if I have five servers clustered and only three can mount the NFS datastore with the VMs on it, could DRS move a VM to a server that cannot reach that VM’s NFS origin?  I do not think so, but if it could, things would fail.  If it cannot, then my cluster is effectively only three servers, not five.
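The question boils down to set containment: a VM can only run on hosts that mount every datastore its files live on.  This sketch assumes nothing about how DRS actually picks a host; the host and datastore names are hypothetical.

```python
from typing import Dict, Iterable, List


def eligible_hosts(mounts: Dict[str, Iterable[str]],
                   vm_datastores: Iterable[str]) -> List[str]:
    """Hosts that can see every datastore holding the VM's files."""
    needed = set(vm_datastores)
    return sorted(h for h, ds in mounts.items() if needed <= set(ds))


mounts = {
    "esx1": ["local1", "nfs-vms"],
    "esx2": ["local2", "nfs-vms"],
    "esx3": ["local3", "nfs-vms"],
    "esx4": ["local4"],  # NFS mount fails here
    "esx5": ["local5"],  # ...and here
}
usable = eligible_hosts(mounts, ["nfs-vms"])  # ["esx1", "esx2", "esx3"]
```

With two hosts unable to mount nfs-vms, only three hosts are candidates for those VMs, which is the "three servers, not five" case.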

My strategy:  First, mount the share at the command line on one of the failing ESX servers as a test.  If that fails, unmount the same NFS share from one of the servers that can mount it and try to remount it from within VIM.  This will tell me quite a lot about what is going on, I hope.  The vmknics on four of the servers (two that can mount and the two that cannot) are on the same subnet, which differs from the NFS server’s subnet.  So why can two mount, but the other two not?  They fail instantly, so it is not a timeout.  The firewalls are all off for now, so they are not part of the issue.
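Since the vmknics and the NFS server sit on different subnets, the mount has to be routed through the VMkernel default gateway (the one esxcfg-route manages, which is separate from the service console’s gateway).  That gateway is one thing worth comparing between the hosts that can mount and the ones that cannot.  Here is a hedged sketch of that comparison using Python’s ipaddress module; all host names and addresses are hypothetical.  The command-line test mount itself would look something like `esxcfg-nas -a -o <nfs-server> -s <share> <label>`.

```python
import ipaddress
from typing import Optional


def nfs_path(vmknic: str, vmk_gateway: Optional[str], nfs_server: str) -> str:
    """Describe how a host's VMkernel would reach the NFS server."""
    iface = ipaddress.ip_interface(vmknic)  # vmknic address with prefix, e.g. "10.0.50.11/24"
    server = ipaddress.ip_address(nfs_server)
    if server in iface.network:
        return "direct"  # same subnet, no gateway needed
    if vmk_gateway is None:
        return "unreachable: no VMkernel default gateway"
    if ipaddress.ip_address(vmk_gateway) not in iface.network:
        return "unreachable: gateway not on vmknic subnet"
    return f"routed via {vmk_gateway}"


hosts = {
    "esx1": ("10.0.50.11/24", "10.0.50.1"),  # can mount
    "esx4": ("10.0.50.14/24", None),         # cannot mount
}
for name, (vmk, gw) in hosts.items():
    print(name, nfs_path(vmk, gw, "10.0.60.5"))
```

A missing or wrong gateway would also fit the "fails instantly" symptom, but that is a guess to check, not a diagnosis.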

And of course, dig through the logs on each of the servers: /var/log/messages, /var/log/vmkernel, /var/log/vmksummary, and /var/log/vmkwarning at a minimum.
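To speed up the digging, a small script can pull every NFS-related line out of those logs.  This is a sketch that assumes the logs live in /var/log as on ESX 3.x and that you run it with a modern Python 3 (the service console’s bundled Python may be too old, so copying the logs to a workstation is an option).  The keyword list is a guess to refine as you go.

```python
import os
from typing import Iterable, List, Tuple

# ESX 3.x service console log locations (verify on your host).
LOGS = ("/var/log/messages", "/var/log/vmkernel",
        "/var/log/vmksummary", "/var/log/vmkwarning")


def scan_lines(source: str, lines: Iterable[str],
               keywords=("nfs",)) -> List[Tuple[str, int, str]]:
    """Return (source, line number, text) for lines mentioning any keyword."""
    hits = []
    for number, line in enumerate(lines, start=1):
        lowered = line.lower()  # case-insensitive match
        if any(k in lowered for k in keywords):
            hits.append((source, number, line.rstrip("\n")))
    return hits


def scan_logs(paths=LOGS, keywords=("nfs",)) -> List[Tuple[str, int, str]]:
    hits = []
    for path in paths:
        if os.path.exists(path):  # skip logs missing on this host
            with open(path, errors="replace") as fh:
                hits.extend(scan_lines(path, fh, keywords))
    return hits


for source, number, text in scan_logs():
    print(f"{source}:{number}: {text}")
```

Running it on a host that can mount and one that cannot, then comparing the output side by side, is the quickest way I can think of to spot the difference.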

My task list has otherwise been eradicated in the past week (YES!).  Outside of NFS, all that really remains is for me to build a golden master of Windows Server 2003, and maybe fork some application templates (DHCP, DNS, print, AD, web, SQL, FTP, etc.) off of it.  Cake, right?