Now Using Kubuntu 10.04 Lucid Lynx…

Well, I finally upgraded my work machine from Jaunty to Lucid about a month ago, and really liked what I saw.  I had been running 64-bit, but got sick and tired of all the little issues with Flash, Java, and Acrobat Reader, so I switched to 32-bit instead.  KDE4 seems much more stable and polished now, and I can once again sign PDFs with my smartcard in Acrobat Reader.  Since it worked so well at work, I went ahead and upgraded at home after a couple of weeks.  This involved swapping my media computer with my main computer (the old SATA RAID setup I have is getting a little squirrelly), and rebuilding both.  The RAID computer was built using the Alternate Install ISO, which worked well.  In both cases, I lost no data except what I chose to discard, so the 300 GB of movies I had copied from our DVDs was wiped from the old media server.  I figure I can always recopy them in a smaller format later.  Yesterday, I updated my wife’s laptop, completely rebuilding it (I wiped everything after backing up the user data).  I restored her data afterward and nothing was lost.

Some common things I am doing to customize my Lucid installs of Kubuntu are:

  1. sudo wget --output-document=/etc/apt/sources.list.d/medibuntu.list http://www.medibuntu.org/sources.list.d/$(lsb_release -cs).list && sudo apt-get --quiet update && sudo apt-get --yes --quiet --allow-unauthenticated install medibuntu-keyring && sudo apt-get --quiet update (from https://help.ubuntu.com/community/Medibuntu)
  2. sudo apt-get --yes install app-install-data-medibuntu apport-hooks-medibuntu
  3. sudo apt-get install libdvdcss2 w32codecs
  4. Update to a later kernel (currently 2.6.35-17) – sudo add-apt-repository ppa:kernel-ppa/ppa && sudo apt-get update
  5. sudo apt-get install linux-headers-2.6.35-17 linux-headers-2.6.35-17-generic linux-image-2.6.35-17-generic linux-maverick-source-2.6.35
  6. Update to a later version of KDE4 (currently KDE 4.4.5) – sudo add-apt-repository ppa:kubuntu-ppa/ppa && sudo apt-get update && sudo apt-get dist-upgrade

So far, things work very well.  The computer with squid, squidGuard, and dansguardian is not going to be upgraded, however.  Another thing – no more XFS.  I now use EXT4 for everything, and have a separate /boot partition so I can more easily convert to btrfs later on.  I read that btrfs suffered a large performance regression in the 2.6.35 kernel, so I will hold out for the 2.6.36 kernel before converting.
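
For the curious, the in-place conversion I have in mind would look roughly like this (a sketch only; the device name is a placeholder, and it assumes the btrfs-convert tool from the btrfs-tools package):

  sudo apt-get install btrfs-tools
  # boot from a live CD so the root filesystem is unmounted, then convert in place
  sudo btrfs-convert /dev/sda2
  # update /etc/fstab (and the bootloader) to mount it as btrfs afterward;
  # the separate ext4 /boot partition stays untouched, which is the whole point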

Stateless VMware ESXi 3.5 on an HP c7000 Blade Server…

NOTE:  This is only an overview.  Due to the detailed nature of this project, I will break it up over several more-focused articles over time for easier reference.

Well, despite my rather negative impression of this year’s VMworld conference, it still really paid off.  There I learned about stateless ESX deployment.  Using that information, and after a couple of months of trial and error, I was able to build a highly robust VMware environment in my lab, fully managed and licensed, using the midwife scripts I modified for this effort.  And configuration is hands-free.

Here are the system components:

  • SERVER – HP c7000 Blade Enclosure with sixteen BL465c blades, two 4 Gb FC modules, and four VC Enet modules
  • Each blade has two dual-core AMD CPUs, 16 GB RAM, two 72 GB SAS drives (hardware RAID-1), two embedded gig NICs, and a mezzanine card with two more gig NICs/iSCSI initiators and two FC HBAs
  • NETWORK – Cisco 6509 with two SUP 720 cards, two 48-port LC Gig-E fiber cards, and four 48-port gig copper cards
  • MANAGEMENT – Dell 1850 with two 146 GB SAS drives (hardware RAID-1) for management and boot services
  • STORAGE – Scavenged proof-of-concept totally ghetto Dell Optiplex desktop with four internal 1.5 TB SATA drives (software RAID-10 formatted with tuned XFS) providing 3 TB of NFS shared storage
  • Scavenged HP IP-KVM box for OOB-management of the two Dells

Here are the steps I took:

  1. First I had to update all the firmware on the blade server.  This includes the two OA cards for the Onboard Administrator, the Virtual Connect software, the iLO2 software for each blade, the BIOS on each blade, and the Power Management Controller firmware.  There is a particular order this is done in, and it is not easy, but it really needs to be done.  The fixes that come with these updates are often vital to success.  Overall, I spent a week researching and updating.  I set all the blades to boot via PXE.
  2. Next, I built the storage server.  I really had no choice – nothing was available but a Dell Optiplex desktop.  It had four internal SATA ports available, and room for four 1GB RAM modules.  It also had a single dual-core Intel CPU and PCI slots for more NICs, and a PCI-Express mini-slot as well.  I had to order parts, and it took a little while, but once done, it had a total of four gig NICs (one embedded, two PCI, one PCI-Express), four 1.5 TB SATA drives, and 4 GB RAM.  I loaded it with 64-bit Ubuntu-9.04, hand-carved the partitions and RAID-10 setup, formatted the 3 TB volume with XFS, tuned as best I knew how, and then put it on the 2.6.31 kernel (I later updated it to 2.6.31.5).  There were no BIOS or other firmware updates needed.
  3. I then built the management server on the Dell 1850.  It only has one power supply (I cannot find a second one), but it does have 8 GB RAM and two dual-core CPUs.  I loaded 64-bit Ubuntu-9.04 on it after installing two 146 GB SAS drives in a RAID-1 mirror (hardware-based).  I also updated the BIOS and other firmware on it.
  4. Having these components in place, I studied the blade server to see what I could get away with, and ultimately decided to use each NIC on a blade to support a set of traffic types, balancing the likelihood of traffic demands across them.  For example, Vmotion traffic, while it may be intense, should be relatively infrequent, so it shares a V-Net with another type of traffic that is low-bandwidth (the alternate management network).  Altogether, I ended up with the primary management network on one V-Net, Vmotion and the alternate management network on another V-Net, storage traffic (NFS and iSCSI) on a third V-Net, and VM traffic on its own V-Net.  Each V-Net maps to its own NIC on a blade, and to the same NIC on every blade.

The physical network design:

For the V-Nets, the management network went on NIC 1 as an untagged VLAN.  It has to be untagged because when a blade boots up, it needs to get a DHCP address and talk to the boot server for its image, and it comes up untagged, so it will not be able to reach the DHCP/PXE server if the V-Net is set to pass through tags.  The other V-Nets support tagged VLANs to further separate traffic.  Each V-Net has four links to the Cisco 6509, except for the storage V-Net, which has eight.  Two links form an LACP bundle from the active side (the VC-Enet module in Bay 1), and two make up an LACP bundle (or etherchannel) from the module in Bay 2, which is the standby side.  This is repeated for the other networks across the other modules in Bays 5 and 6.  Bays 3 and 4 house the Fibre Channel modules, which I am not using.  Each network is on its own private 10.x.x.x subnet as well, except for the VM V-Net, which carries the virtual machines’ own traffic.
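
For reference, the Cisco side of one of those tagged bundles looks roughly like the IOS sketch below (from memory only; the interface numbers, channel-group ID, and description are placeholders, not my actual config):

  interface Port-channel10
   description LACP bundle to VC-Enet Bay 1 (storage V-Net)
   switchport
   switchport trunk encapsulation dot1q
   switchport mode trunk
   spanning-tree portfast trunk
  !
  interface range GigabitEthernet2/1 - 2
   switchport
   switchport trunk encapsulation dot1q
   switchport mode trunk
   channel-group 10 mode active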

The storage design:

Like I said, a really ghetto NFS server.  It does not have enough drives, so even though it would be overkill for a home PC, it will not cut it in this situation.  I expect it to run out of steam after only a few VMs are added, but it does tie everything together and provides the shared storage component needed for HA, Vmotion, and DRS.  I am working on an affordable and acceptable solution, rack-mounted, with more gig NICs and up to 24 hot-swap drives – more spindles should offer more throughput.  I bonded the NICs together into a single LACP link, untagged back to the Cisco, on the NFS storage VLAN.  Once working, I stripped out all unneeded packages for a very minimal 64-bit Ubuntu server.  It boots in seconds, and has no GUI.  Unfortunately, I did not get into the weeds enough to align the partitions/volumes/etc.  I just forgot to do that.  I will have to figure that out next time I get a storage box in.
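
For reference, the RAID-10/XFS/NFS side of that box boils down to something like this (a sketch only; the device names, mount point, export network, and export options are placeholders, and the XFS tuning I actually used is not shown):

  # build the software RAID-10 array from the four 1.5 TB drives
  sudo mdadm --create /dev/md0 --level=10 --raid-devices=4 /dev/sd[abcd]1
  # format the ~3 TB volume with XFS and mount it
  sudo mkfs.xfs /dev/md0
  sudo mkdir -p /srv/nfs/vmware
  sudo mount /dev/md0 /srv/nfs/vmware
  # export it over NFS to the storage network, then activate the export
  echo '/srv/nfs/vmware 10.0.3.0/24(rw,async,no_root_squash,no_subtree_check)' | sudo tee -a /etc/exports
  sudo exportfs -ra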

The management server:

It is also on a very minimal 64-bit Ubuntu-9.04 install.  It has four NICs, but I only use two (the other two are only 100 Mb).  The two gig NICs are also bonded into one LACP link back to the Cisco, untagged.  The server is running a stripped-down 2.6.31 kernel, and has VMware Server 2.0.x installed for the vCenter Server (which runs in a Windows 2003 Server virtual machine).  On the Ubuntu host server, I have installed and configured DHCP, TFTP, and gPXE.  I also extracted the boot guts from the ESXi 3.5.0 Update 4 ISO and set up the tftpboot directory so that each blade will get the image installed.  On the vCenter Server virtual machine, I installed the Microsoft PowerShell tool (which installed ActiveState Perl), and the VMware PowerCLI tool.  I also downloaded the midwife scripts and installed Notepad++ for easy editing.  The vCenter Server VM is on a private 10.x.x.x net for isolated management, but this gets in the way of the Update Manager plugin, so I still have some work to do later to get around this.
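
For reference, the DHCP piece of that boot chain is roughly this fragment of ISC dhcpd configuration (a sketch only; the subnet, addresses, and boot filename are placeholders, and the real filename depends on how the gPXE chainload is set up):

  # /etc/dhcp3/dhcpd.conf (fragment): point the PXE-booting blades at the TFTP server
  subnet 10.0.1.0 netmask 255.255.255.0 {
    range 10.0.1.100 10.0.1.199;
    next-server 10.0.1.10;       # the Ubuntu management server running tftpd
    filename "pxelinux.0";       # placeholder; substitute the gPXE/boot image actually served
  }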

Really key things I learned from this:

  1. The blade server VC-Enet modules are NOT layer-2 switches.  They may look and feel that way in some aspects, but by design they actually present themselves to network devices as server ports (NICs), not as more network devices.  Learn about them – RTFM.  It makes a difference.  For instance, it may be useful to know that the right-side bay modules are placed in standby by default, and the left-side ones are active – they are linked via an internal 10Gig connection.  I know of another lab with the same hardware that could not figure out why they could not connect the blade modules to the network when all the modules were enabled, so they solved it by disabling all but Bay-1, instead of learning about the features and really getting the most out of it.
  2. Beware old 64-bit CPUs.  Just because a box lets you load a cool 64-bit OS on it does NOT mean it will let you load a cool 64-bit virtual machine on it.  If its CPU(s) do not have virtualization instruction sets, you will run into failures.  I found this out the hard way, after trying to run the 64-bit RCLI appliance from VMware in order to manage the ESXi hosts.  I am glad I failed, because it forced me to try the PowerCLI/PowerShell tools.  Without those tools, I seriously doubt I could have gotten this project working.
  3. Learn PowerShell.  The PowerCLI scripts extend it for VMware management, but there are plenty of cool tricks you can do with plain PowerShell as well.  I am no fan of Microsoft, so it is not often I express satisfaction with one of their products.  Remember where you were on this day, ‘cuz it could be a while before it happens again.
  4. Name resolution is pretty important.  HA wants it in a real bad way.  Point your hosts to a DNS server, or give them identical hosts files (a little ghetto, but a good failsafe for a static environment).  I did both.
  5. Remember those Enet modules?  Remember all that cool LACP stuff I mentioned?  Remember RTFM?  Do it, or you will miss the clue that while the E-net modules like to play with LACP, only one link per V-Net is set active to avoid loops.  So if, on the active side of a V-Net, you have two LACP links, each carrying a different tagged VLAN, and your NFS devices will not talk to anyone, you will know it is because the module saw your iSCSI link first and set your NFS link offline.  Meaning, the iSCSI link on Bay-1 and its offline twin on Bay-2 both have to fail before your NFS link on Bay-1 will come up.  Play it safe – one LACP link per V-Net per bay.  Tag multiple VLANs over that link instead.  The E-net modules only see the LACP links, and do not care if they support different VLANs – only one is set active at a time.
  6. Be careful with spanning tree (this can be said for everything related to networking).  Use portfast on your interfaces to the E-net modules, and be careful with spanning tree guards on the Cisco side.  In testing, I found that pulling one cable of a two-link bundle would isolate the VLAN instead of carrying on as if nothing had happened.  It turns out a guard on the interface was disabling the link to avoid potential loops.  Once I disabled that, the port-channel link functioned as desired.
  7. Doesn’t it suck to get everything working, and then not have a clean way to import VMs?  I mean, now that you built it, how do you get stuff into it?  I ended up restructuring my NFS server and installing Samba as well.  This is because when importing a VM from the GUI (say, by right-clicking on a resource pool), the “Other Virtual Machine” option is the only one that fits.  However, it then looks for a UNC path (Windows share-style) to the .vmx file.  I could browse the datastore and do it that way, but for VMs not already on the NFS datastore, I needed to provide a means for other labs to drop in their VMs.  Samba worked.  Now they can drop their VMs onto the NFS server via Samba, and the vCenter Server can import the VMs from the same place.
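
For item 7, the Samba share sitting on top of the NFS export is nothing fancy.  Roughly this in smb.conf (the share name, path, and user are placeholders):

  [vmimport]
     path = /srv/nfs/vmware
     browseable = yes
     writable = yes
     valid users = vmadmin
     create mask = 0664
     directory mask = 0775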

Currently, we are restructuring physical paths between labs for better management.  It is part of an overall overhaul of the labs in my building.  Once done, my next step is to start building framework services, such as repository proxy servers, WSUS servers, DHCP/DNS/file/print, RADIUS/S-LDAP/AD, etc., etc.  I also need to wrap in a management service framework that extends to all the labs so everyone has an at-a-glance picture of what is happening to the network and the virtual environment.  One last issue I am fighting is that I am unable to finish importing VMs I made on ESX 3.5 U2 earlier this year.  It keeps failing to open the .vmdk files.  I will have to pin that down first.

The end result?

  1. If I run the midwife service on the vCenter server and reboot a blade, it is reloaded and reconfigured within minutes.
  2. If I upgrade to beefier blades, I pop them in and let them build.
  3. If I update to a newer release of ESXi (say, Update 5 or 6), I extract from the ISO to the tftpboot directory and reboot the blades.  The old configs get applied to the newly updated OS.
  4. All configs are identical – extremely important for cluster harmony.  No typos.
  5. If someone alters a config and “breaks” something, I reboot it and it gets the original config applied back.
  6. If I make a change to the config, I change it in the script once, not on each blade individually.  This also allows for immediate opportunity to DOCUMENT YOUR CHANGES AS YOU GO.  Which is just a little bit important.

As stated before, this is an overview.  I will add more detailed articles later, which will include scripts and pictures as appropriate.  I am at home now and do not have access to my documentation, but once I have it in hand, I will post some goodies that will hopefully help someone else out.  Myself included.

HOWTO – 64-bit Kernel 2.6.31 and VMware Server 2.0.1…

Assuming you have already installed the 2.6.31 kernel, this link has a patch and script to modify the modules VMware compiles when you run the vmware-config.pl script.  The script is for 2.6.30.4 and later kernels, and works fine for 2.6.31.

  1. Run the vmware-install.sh script that came with VMware Server 2.0.1, but DO NOT run the vmware-config.pl script at the end.
  2. Get the patch script – vmware-server.2.0.1_x64-modules-2.6.30.4-fix.sh and make it executable.
  3. Get the patch – vmware-server.2.0.1_x64-modules-2.6.30.4-fix.patch.
  4. Make a directory, say, /usr/src/vmware-patches and cd to it.
  5. Copy the patch, the script and the four module sources (/usr/lib/vmware/modules/source/*.tar) to the patch directory you are now in.
  6. Run the patch script – it should build for 64-bit systems.  I do not know about 32-bit systems…
  7. Run the vmware-config.pl command, and install as normal.
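
Strung together, the sequence looks roughly like this (run with root privileges; it assumes the patch and script were downloaded to your home directory):

  mkdir -p /usr/src/vmware-patches
  cd /usr/src/vmware-patches
  # copy in the four module sources plus the patch and its script
  cp /usr/lib/vmware/modules/source/*.tar .
  cp ~/vmware-server.2.0.1_x64-modules-2.6.30.4-fix.sh .
  cp ~/vmware-server.2.0.1_x64-modules-2.6.30.4-fix.patch .
  chmod +x vmware-server.2.0.1_x64-modules-2.6.30.4-fix.sh
  ./vmware-server.2.0.1_x64-modules-2.6.30.4-fix.sh
  # only now run the normal VMware configuration
  vmware-config.pl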

There have been reports of minor script errors, so you may need to make some slight edits.  Or you may not – I had no trouble.  If you need to reinstall, make sure you stop the vmware services, rmmod the vmware modules, and delete everything in the /usr/lib/vmware/modules directory before re-running the installer-patch-config steps above.  You will also need to delete the modules from your system – running the installer should generate a failure message telling you what files to delete from where.  Successfully running the installer will put everything you need back in the /usr/lib/vmware/modules directory.

Big thanks to meubeukeu and michelemase for their work in making these patches!

Japanese TV on Linux – KeyHoleTV…

My wife loves watching her TV, but it is too expensive via satellite, we do not have (or want) cable, and the Internet services she WAS using have either gotten too popular (PandaTV) or have been shut down (J-NetTV).

So she found KeyHoleTV.  The Xorsyst web site explains that this is part of a program run by the Japanese Ministry of Internal Affairs to test and demonstrate P2P technology.  They have builds for Windows, Mac, and Linux (32-bit and 64-bit).  It is simple to use and works pretty well.  The streams cannot be downloaded (easily, anyway) for archiving, and video and audio quality is not perfect, but it ain’t bad either.  And it is free.

It does seem to be down right now (first time since we started using it 2 months ago), but that is a side effect of a test program, right?  Try it out if you miss Japanese TV.

Update:  Seems to be back up now.

VMware Server 2.0.1 and Kernel 2.6.30.1…

I finally decided to get VMware Server running on my new kernel.  Whenever the kernel is updated, there are some things you can count on having to reinstall, such as NVidia video drivers and VMware installations.   I expected problems, so my methodology was to attempt a normal install, expect failure, and search on the resulting errors.  This did not pan out, so I tried the VMware Community Forums, and I found this little gem on how to patch the VMware modules:

This apparently works with 32-bit as well, though I have not confirmed it.

I downloaded the patch and shell script, ran the script, and followed the directions of the output:

  • Move original files that could cause issues with VMware – “mv /usr/lib/vmware/modules/binary /usr/lib/vmware/modules/binary-orig”
  • Run the config again, without the -d option (otherwise, root would be the only user allowed to log into the web interface) – “vmware-config.pl”

Essentially, there were no problems getting everything running.  Now I have to figure out what my password was to log into my Windows XP VM.  I have to complete some online training that can only be done in Windows (thanks a ton).  I would hate to have to crack my way into my own VM…

Huge thanks out to both michelmase and Krellan for the patches and scripts!

Kubuntu 9.04 64-Bit, Kernel 2.6.30.1, and NVIDIA…

I went ahead and decided to upgrade my kernel, and to go to the latest NVIDIA driver (180.51).  I downloaded the kernel and the nvidia driver file, built the kernel, and removed the nvidia restricted driver.  This is on a 64-bit build of Kubuntu 9.04.

However, I was not done.

When I tried to install the kernel image file, I kept getting dkms errors relating to nvidia-common.  I eventually removed the nvidia packages using “apt-get remove --purge nvidia*” (as root), but this still would not allow me to install the kernel. Also my xorg.conf file was empty.

I fixed xorg by running “dpkg-reconfigure xserver-xorg” and adding the line Driver "vesa" to the “Device” section, so I would have a working display when I rebooted.
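
The resulting “Device” section looked something like this (the Identifier string is whatever dpkg-reconfigure generates on your system):

  Section "Device"
      Identifier  "Configured Video Device"
      Driver      "vesa"
  EndSection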

Only when I removed dkms (“apt-get remove dkms”) was I able to install kernel 2.6.30.1.  I use lilo since I run RAID-10, but I did not have to update the /etc/lilo.conf file.  Upon reboot, I stopped X with “/etc/init.d/kdm stop”.

I next installed the NVidia driver (and chose to install the 32-bit compatibility files as well).  After that, I ran “nvidia-xconfig” and my xorg file was ready.  When testing with the “X” command, it just pulled up a blank screen, but I took a chance and started KDM (“/etc/init.d/kdm start”).
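
In command form, that stretch (from stopping KDM to restarting it) was roughly the following; the installer filename is approximate, so use whichever .run file you actually downloaded:

  sudo /etc/init.d/kdm stop
  sudo sh NVIDIA-Linux-x86_64-180.51-pkg2.run   # approximate name of the 180.51 64-bit installer
  sudo nvidia-xconfig
  X                                             # optional test; I only got a blank screen
  sudo /etc/init.d/kdm start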

Everything came up fine.  Typing “glxgears” in a terminal showed decent enough acceleration (about 3000 fps).

So far, no other ill effects. And no firmware issues.

Kubuntu 9.04 and Flash Audio…

I finally got sound to work reliably.  Here is what did NOT work:

  • “touch .asoundrc” in the home directory
  • remove and reinstall non-free flash and the installer
  • ensuring the PCM channel was unmuted and not turned down

I did have to make sure that the correct sound card was selected (I have two).  This made a difference system-wide, but as far as Flash went, sites like YouTube were still mute.

I finally found a site that gave instructions to install a PulseAudio management tool – padevchooser.  I don’t remember which site, because I must have trawled through dozens, but this solution worked every time (I had to do this once for each user, as each user).
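
Installing it is just a package away (assuming the package in the Jaunty repositories is still named padevchooser):

  sudo apt-get install padevchooser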

Opening a terminal and running padevchooser put the app in the system tray.  Left-clicking it brought up the context menu.  Under Volume Control, on the Output Devices tab, I was able to ensure that the correct card was the default.  On the Playback tab, I was able to move the stream to the correct card.  For some reason, the streams all seemed to default to the other, unused card, which is integrated into the motherboard.

Once I did these things, Flash audio was just fine.