VMware-Server 2.0 Won’t Start a VM…

The title says it all.  I had taken a snapshot of the VM earlier.  Last night my machine locked up for some reason, and I had to hard-reset it.  Even the magic keys didn’t save me.  Everything seemed to come up ok, but this morning when I tried to start the VM, it failed with the error:

Cannot open the disk ‘/home/vmguests/WinXP/WinXP-000001.vmdk’ or one of the snapshot disks it depends on.  Reason:  Failed to lock the file.

I found that by deleting all the .lck folders in the VM directory (as root), I was able to start it normally.
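
For next time, here is a minimal sketch of that cleanup as a shell function.  The function name is made up for illustration, and it assumes the stale locks are directories ending in .lck sitting directly inside the VM folder, as they were in my case.  Only run it with the VM powered off, since a running VM holds live locks.

```shell
#!/bin/sh
# Remove stale VMware lock directories left behind after a hard reset.
# remove_stale_locks is a hypothetical helper name, not a VMware tool.
remove_stale_locks() {
    vm_dir=$1
    [ -d "$vm_dir" ] || { echo "no such directory: $vm_dir" >&2; return 1; }
    # Only look at the top level of the VM folder, and only at directories
    # whose names end in .lck; disk and config files are left alone.
    find "$vm_dir" -maxdepth 1 -type d -name '*.lck' -exec rm -rf {} +
}

# Example (as root, VM powered off):
#   remove_stale_locks /home/vmguests/WinXP
```

Listing the matches first (the same find command without the -exec part) is a cheap way to see what would be deleted before committing to it.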


Impressions of Kubuntu 9.04 and VMware-Server 2.0.1…

So far, RAID-10/LVM/XFS is working quite well with Kubuntu 9.04.  Jaunty picks up hardware effortlessly.  I plugged in a USB thumb drive, and a little notification pops up.

Ok.

I plug in my camera, and it sees it fine, no muss, no fuss.

Better.

I plug in my webcam – no notification, it just works.

Sweeeet.

I plug in my HP printer, and I have to dig around to see that it was added as quietly and politely as you please, ready to print.

Awesome.

I ran out of things to plug in.  Kubuntu 8.04 (the previous version I was using) didn’t boot nearly as quickly, took longer to load the desktop after login, and was good about detecting devices, mostly, but needed polish and charm.

9.04 has it in spades.  I am really quite impressed with its hardware capabilities.  There are some programs, like adept, that I am missing, but the learning curve for the newer stuff is really more like a learning bump.

Update:  It even loaded the sensors package to track temperatures.  Wow.

I am running 64-bit now, and flash and java work fine.  It took me a while to find the right libjavaplugin and link it into the Firefox plugins folder, but flash 10 worked fine and installed easily.

VMware-Server is a different story.  The 64-bit version is slow, flaky, and cranky.  It times out all the time, it resets often, and it just stalls doing stuff.  I now have a VM ready for loading, but it took all day to fight it into doing so.  And I found no reliable cure, including swapping out the java jre version used for a later version.  I am really disappointed with the 2.0.1 release in terms of ease of install, performance, and reliability.  Oh well, at least it installed without needing a special patch or script.

Update:  After a huge fight, I got a new Windows XP VM made.  Using the command ‘watch "du -s --si /home/vmguests/WinXP"’, I was able to get a sense of the speed of the file system when I was creating the virtual disk files.  I chose to make one large file at once for each of the two disks; C drive (15 GB), and E drive (48 GB).  With the watch command updating every two seconds, I was able to see that the RAID-10 XFS filesystem was handling about 100 Mbps as the disk files were created.
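
Eyeballing the difference between two watch refreshes works, but the arithmetic is easy to script.  This is a rough sketch, assuming GNU du; both function names are invented for illustration.  It reports whole megabytes per second (SI units, like --si), which is not the same thing as megabits.

```shell
#!/bin/sh
# Turn two byte counts taken N seconds apart into whole MB/s (SI megabytes).
rate_mb_per_s() {
    echo $(( ($2 - $1) / $3 / 1000000 ))
}

# Sample the on-disk size of a directory twice and report how fast it grew.
# The interval defaults to 2 seconds, matching the default refresh of watch.
dir_growth_rate() {
    dir=$1
    interval=${2:-2}
    before=$(du -s --block-size=1 "$dir" | cut -f1)
    sleep "$interval"
    after=$(du -s --block-size=1 "$dir" | cut -f1)
    echo "$(rate_mb_per_s "$before" "$after" "$interval") MB/s"
}

# Example:
#   dir_growth_rate /home/vmguests/WinXP
```

Since du counts allocated blocks here, the figure reflects what has actually been written to disk, not the apparent size of a sparse file.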

Once I had made the VM, loading it was uneventful.  Just a regular Windows XP Professional install, like any other.  The vmware-server played nice mostly after that and has continued to do so.  I have only had to log out once due to unresponsiveness, and have not had to restart the server services.  The VM is quite fast, and allows my wife to see her video streams in Media Player 11 with only minor stuttering of the video.   Audio is fine.

I really like the USB visibility of vmware-server.  The VM picked up the printer as if it were directly connected, and once I loaded the drivers for it, I was printing from the VM like normal.  All of my USB devices can be presented to the VM, which is an area I had problems with in the past with the 1.x versions of vmware-server.

Anyway, my wife is set up with her login and has a shortcut to RDP to the Windows XP VM, where she can login and watch her JNet streams.

HOWTO – Kubuntu 9.04, RAID-10, LVM2, and XFS…

Time to rebuild the Beast…

The poor thing had been in sparse use since it started shutting down (really, just POOF! –  it may as well be unplugged) randomly.  I didn’t pursue it for months, because I am fundamentally lazy at home.

(I think I mentioned this before.)

But with a trip coming up, and me needing a laptop for it, and my wife saying I could have hers if I got her Windows-based internet movie playing experience working on another computer, and the release of Ubuntu 9.04, well, let’s just say the planets finally aligned.

So I burned several CDs – Xubuntu Live 9.04 (32-bit), MythBuntu 9.04 (64-bit), Kubuntu 9.04 Live (32-bit), and Kubuntu 9.04 Alternate Install (64-bit).  I rebuilt my targeted media PC first as Kubuntu.  I got a 1 TB SATA drive, put it in my beast computer, and used scp to back up everything in the house to it (except the Vista laptop).  The power dropped several times before I finally got it all (I hope).  Then I popped the drive into my media PC (not Microsoft – I call it that because I have it hooked up to the VGA port on my flat-screen TV).  I then loaded the Live CD of Kubuntu 32-bit and will use it as a backup in case my beastie dies while I am away.  This way, my wife is not stuck unable to watch her Japanese TV program downloads.

You have no idea how important that is to maintaining a happy family.  Seriously.

Anyway, enough boring crap.

I first installed the seven SATA drives I had (four pulled from the media PC, one was already installed, and two were sitting in a drawer), each identical 80 GB Hitachis, and left them powered up overnight to find any serious drive errors.  I got seek errors on the one I suspected of being bad, and tossed it.  Trust me – it was bad.

(That may explain why it was sitting in a drawer…)

The other six have stayed quite civilized.  Maybe they got the hint.

After trial and error, I used a combination of the Live CD to google and hand-build the file system, and the alternate 64-bit CD to install.

Why hand-build?  I guess that’s just how I roll…  And it gave me total control over how I built it.

Playing around with the drives, I found I could reliably pop the power just by running “hdparm -Tt /dev/sda”, so off I went to get a new power supply.  I found a 650W PS that more than makes up for my failing 450W PS, and let me clean up my cable mess as well.  Out with the old, in with the new, and everything is smooth as silk.

Back to googling, I found a collection of sites that allowed me to piece together what I think, and hope, is a very solid compromise between performance and reliability.  Space is not too much of an issue, since only one VM will be running on this system, and we are not huge downloaders.  As long as it has more space than the laptop (160 GB), it is fine.

System specs:

  • Athlon FX-53 (the old obsolete server-board-based one with 959 pins or something).
  • 2 GB of registered memory, I forget how fast.
  • An old NVidia AGP 7600 GT card (I think).
  • Four SATA ports onboard (two controllers, no hardware RAID enabled).
  • One four port add-in PCI SATA controller (RAID disabled).
  • No special BIOS tweaks.
  • Six SATA drives, 80 GB each, /dev/sda, /dev/sdb, /dev/sdc, /dev/sdd, /dev/sde, /dev/sdf.
  • Fans.  Lots of fans.
  • 64-bit Kubuntu 9.04, alternate install CD.  Supports VMware-Server 2, and can run 64-bit and 32-bit virtual machine guests.

From the Live CD:

Opening Konsole and Konqueror:

sudo -i

Partitioning the drives:

cfdisk /dev/sda

  • sda1 primary 82 MB type FD (Linux RAID), bootable – this will be the RAID-1 /boot partition of six drives (ext3)
  • sda2 primary 404 MB type FD – this will be the swap partition on RAID-10 and LVM
  • sda3 primary 82 GB type FD – this will be the OS partition on RAID-10 and XFS

sfdisk -d /dev/sda | sfdisk /dev/sdb

sfdisk -d /dev/sda | sfdisk /dev/sdc

sfdisk -d /dev/sda | sfdisk /dev/sdd

sfdisk -d /dev/sda | sfdisk /dev/sde

sfdisk -d /dev/sda | sfdisk /dev/sdf
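
The five copies above can also be written as a loop.  This is only a sketch: the function name and the DRY_RUN switch are my own invention, and by default it just prints the commands so nothing gets overwritten by accident.

```shell
#!/bin/sh
# Copy the partition table of one source disk onto each target disk.
# clone_partition_table and DRY_RUN are hypothetical names for this sketch.
# DRY_RUN defaults to 1, which only prints the commands; set DRY_RUN=0
# (as root) to actually run sfdisk against the targets.
clone_partition_table() {
    src=$1
    shift
    for dst in "$@"; do
        if [ "${DRY_RUN:-1}" = 1 ]; then
            echo "sfdisk -d $src | sfdisk $dst"
        else
            sfdisk -d "$src" | sfdisk "$dst"
        fi
    done
}

# Prints the same five commands used above:
clone_partition_table /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf
```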

REBOOT (power cycle), run Live CD again, same apps opened:

sudo -i
apt-get install mdadm lvm2 (the live CD does not get RAID and LVM on its own – so install them)

RAID-1 and RAID-10 (all active, no spares):
Link = http://www.howtoforge.org/install-ubuntu-with-software-raid-10

  • boot partition: mdadm -v -C /dev/md0 -c 256 -n 6 -l 1 /dev/sd[abcdef]1 – RAID1 so LILO can boot it, all drives for max redundancy.
  • swap partition: mdadm -v -C /dev/md1 -c 256 -n 6 -l 10 -p f6 /dev/sd[abcdef]2
  • os partition: mdadm -v -C /dev/md2 -c 256 -n 6 -l 10 -p f2 /dev/sd[abcdef]3

cat /proc/mdstat to see RAID sets:

md2 : active raid10 sda3[0] sdf3[5] sde3[4] sdd3[3] sdc3[2] sdb3[1]
239817216 blocks 256K chunks 2 far-copies [6/6] [UUUUUU]
[=================>…] resync = 86.1% (206697408/239817216) finish=12.6min speed=43546K/sec

md1 : active raid10 sda2[0] sdf2[5] sde2[4] sdd2[3] sdc2[2] sdb2[1]
393472 blocks 6 near-copies [6/6] [UUUUUU]

md0 : active raid10 sda1[0] sdf1[5] sde1[4] sdd1[3] sdc1[2] sdb1[1]
79872 blocks 256K chunks 6 far-copies [6/6] [UUUUUU]

unused devices: <none>
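
The block counts are worth a sanity check.  In md RAID-10, usable space is roughly the number of members times the member size, divided by the number of copies.  The sketch below redoes that arithmetic.  The function name is made up, the per-member sizes are back-computed from the mdstat numbers above (an assumption, not something I measured), and metadata overhead and chunk rounding are ignored.

```shell
#!/bin/sh
# Usable size of an md RAID-10 array in 1K blocks:
#   members * member_size / copies
# raid10_usable_kib is a hypothetical helper; it ignores metadata overhead.
raid10_usable_kib() {
    members=$1
    copies=$2
    member_kib=$3
    echo $(( members * member_kib / copies ))
}

# md2: six members, 2 copies, about 79939072 blocks each (assumed member size)
raid10_usable_kib 6 2 79939072   # prints 239817216
# md1: six members, 6 copies, about 393472 blocks each (assumed member size)
raid10_usable_kib 6 6 393472     # prints 393472
```

So with two copies the OS array gets the space of three drives, while six copies give the space of a single member and maximum redundancy.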

Next, set up LVM:
Link = http://www.linuxdynasty.org/lvm2-how-to.html

Physical Volumes:

  • pvcreate /dev/md0
  • pvcreate /dev/md1
  • pvcreate /dev/md2

Volume Groups:
create, with useful names:

  • vgcreate boot-vg /dev/md0
  • vgcreate swap-vg /dev/md1
  • vgcreate os-vg /dev/md2

activate:

  • vgchange -a y boot-vg
  • vgchange -a y swap-vg
  • vgchange -a y os-vg

pvdisplay and pvscan to see physical volumes:

PV /dev/md2 VG os-vg lvm2 [228.71 GB / 4.00 MB free]
PV /dev/md1 VG swap-vg lvm2 [384.00 MB / 0 free]
PV /dev/md0 VG boot-vg lvm2 [76.00 MB / 0 free]
Total: 3 [229.16 GB] / in use: 3 [229.16 GB] / in no VG: 0 [0 ]

vgdisplay and vgscan to see volume groups:

Reading all physical volumes. This may take a while…
Found volume group “os-vg” using metadata type lvm2
Found volume group “swap-vg” using metadata type lvm2
Found volume group “boot-vg” using metadata type lvm2

Logical Volumes, create with useful names:

  • lvcreate -L 76M -n boot-lv boot-vg
  • lvcreate -L 384M -n swap-lv swap-vg
  • lvcreate -L 10G -n root-lv os-vg
  • lvcreate -L 2G -n var-lv os-vg
  • lvcreate -L 3G -n temp-lv os-vg
  • lvcreate -L 213.7G -n home-lv os-vg

lvdisplay and lvscan to see logical volumes:

ACTIVE ‘/dev/os-vg/root-lv’ [10.00 GB] inherit
ACTIVE ‘/dev/os-vg/var-lv’ [2.00 GB] inherit
ACTIVE ‘/dev/os-vg/temp-lv’ [3.00 GB] inherit
ACTIVE ‘/dev/os-vg/home-lv’ [213.70 GB] inherit
ACTIVE ‘/dev/swap-vg/swap-lv’ [384.00 MB] inherit
ACTIVE ‘/dev/boot-vg/boot-lv’ [76.00 MB] inherit

So far, partitioning, RAID-10, and LVM are done. Format using swap, ext3 (boot) and XFS:

  • mkfs.ext3 /dev/boot-vg/boot-lv
  • mkswap /dev/swap-vg/swap-lv

XFS Links:
http://www.csamuel.org/2008/03/23/btrfs-013-and-xfs-benchmarks
http://oss.oracle.com/projects/btrfs/dist/documentation/benchmark.html
http://everything2.com/index.pl?node_id=1479435

http://www.issociate.de/board/post/472270/New_XFS_benchmarks_using_David_Chinner%27s_recommendations_for_XFS-basedoptimizations..html

  • mkfs.xfs -f -d agcount=1 -i attr=2 -l lazy-count=1,size=128m,version=2 /dev/os-vg/root-lv
  • mkfs.xfs -f -d agcount=1 -i attr=2 -l lazy-count=1,size=128m,version=2 /dev/os-vg/var-lv
  • mkfs.xfs -f -d agcount=1 -i attr=2 -l lazy-count=1,size=128m,version=2 /dev/os-vg/temp-lv
  • mkfs.xfs -f -d agcount=1 -i attr=2 -l lazy-count=1,size=128m,version=2 /dev/os-vg/home-lv
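
Since every XFS volume gets the same options, a loop keeps them in one place.  Again just a sketch: the function name, the variable name, and the DRY_RUN switch are hypothetical, and by default it only prints the commands.

```shell
#!/bin/sh
# The mkfs.xfs options used above, kept in one variable (name is my own).
XFS_MKFS_OPTS="-f -d agcount=1 -i attr=2 -l lazy-count=1,size=128m,version=2"

# Format each named logical volume in a volume group with those options.
# DRY_RUN defaults to 1 (print only); set DRY_RUN=0 as root to really format.
format_xfs_lvs() {
    vg=$1
    shift
    for lv in "$@"; do
        cmd="mkfs.xfs $XFS_MKFS_OPTS /dev/$vg/$lv"
        if [ "${DRY_RUN:-1}" = 1 ]; then
            echo "$cmd"
        else
            # Deliberately unquoted so the shell splits it into arguments.
            $cmd
        fi
    done
}

# Prints one mkfs.xfs command per volume:
format_xfs_lvs os-vg root-lv var-lv temp-lv home-lv
```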

df -h to see:

/dev/mapper/os--vg-root--lv
9.9G 4.1M 9.9G 1% /target
/dev/mapper/os--vg-var--lv
1.9G 4.1M 1.9G 1% /target/var
/dev/mapper/os--vg-temp--lv
2.9G 4.1M 2.9G 1% /target/temp
/dev/mapper/os--vg-home--lv
214G 4.1M 214G 1% /target/home
/dev/mapper/boot--vg-boot--lv
74M 5.6M 65M 8% /target/boot

Install using the 64-bit Alternate Install CD for Kubuntu 9.04.  Use the ext3 partition for /boot, the XFS partitions for /, /var, /tmp, and /home, and use the swap partition.  Do not format anything – it will then only demand to format the swap partition.  I hand-formatted to get additional control over how XFS was formatted.

I always separate /home to survive any OS rebuilds I might have to do, or a distro change.  I also separate out /var (so if it fills up, it does not fill up the root space), and /tmp (VMware stores lots of stuff there when snapshotting virtual machines, so make the room).

It installed LILO with the large-memory option on /dev/md0, and reran LILO successfully.  There was some misinformation out there that LILO would boot from a RAID-10 volume.  Yeah, only if it looks exactly like a RAID-1 mirror.  Whoops.

Mount options for XFS I put into /etc/fstab after successfully booting:

-o noatime,nodiratime,logbsize=256k,logbufs=8
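
In /etc/fstab itself the options go into the fourth field without the -o.  The sketch below prints what such entries could look like.  The function name is invented, the device paths are the LVM names from lvscan above, and the trailing "0 0" (no dump, no boot-time fsck) is my assumption, not something copied from my actual file.

```shell
#!/bin/sh
# XFS mount options from the post, in fstab form (no leading -o).
XFS_OPTS="noatime,nodiratime,logbsize=256k,logbufs=8"

# Print one tab-separated fstab line for an XFS volume.
# fstab_xfs_line is a hypothetical helper; "0 0" is an assumed choice.
fstab_xfs_line() {
    printf '%s\t%s\txfs\t%s\t0\t0\n' "$1" "$2" "$XFS_OPTS"
}

fstab_xfs_line /dev/os-vg/root-lv /
fstab_xfs_line /dev/os-vg/var-lv  /var
fstab_xfs_line /dev/os-vg/temp-lv /tmp
fstab_xfs_line /dev/os-vg/home-lv /home
```

Review the printed lines before pasting them into /etc/fstab; a typo there can keep the box from booting cleanly.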

Conclusion:  Well, it hasn’t thrown up yet.  I guess that is good.  It seems plenty fast, but I have not really exercised it.  I do not think I will mess with the kernel for a while – I need it to be very stable while I am away, which is exactly when it is most likely to break.

Next, I will put VMware-Server 2 on it and install a 32-bit XP VM for my wife to use with her JNet TV streaming addiction.  I am assuming it won’t work with Firefox and Linux, but I will try that also, to be sure.  She switched to that after Pandora TV changed for the worse.

Should it all work out, I will putty her Windows settings over (bookmarks really), finish up her XP VM, and finally get around to fixing her Vista laptop with a prescription-case extra-strength dose of Xubuntu.

I can hardly wait.

Fix for VMware-Server error – “version `GCC_4.2.0' not found”

I have had to employ this fix more than once, so it goes on the blog.  It applies to VMware-Server 1.x on newer kernels – I dunno if the 2.x version has the same issue.

Thanks Dave.

The fix:

cd /usr/lib/vmware/lib
sudo mkdir bak
sudo mv libgcc_s.so.1/libgcc_s.so.1 bak/

Make sure to run the vmware command in a console to determine the error you are having – after all, it might not be opening for an entirely different reason.

HOWTO – Vanilla Kernel 2.6.28.3, VMware Server 1.0.8, and Kubuntu 8.04…

First off, this was a mild pain, to put it royally.  The steps from my previous post are the same, but when stepping through the new features in the “make oldconfig” or “make menuconfig” portions, two things stand out:

  1. When prompted about firmware blobs in the kernel, one of the questions does not have a (Y,m,n,?) option – it is just blank.  Press enter and keep on moving (don’t type anything in if you don’t know what to do).  I typed “n” for no, and it failed at compiling the firmware module for n.ko.  The error looked like this:  “No rule to make target `firmware/n'”
  2. When prompted, or if not, under Kernel hacking (make menuconfig step), be sure to check the box to include obsolete symbols, or you will have problems with VMware Server later on.  Like, it won’t compile the modules.  Be aware – the default action is “no”, so if you blow through the make oldconfig portion, obsolete symbols will not be included.
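
Before kicking off a long compile, it is quick to check whether the second item stuck.  This is a sketch that assumes the option behind “obsolete symbols” is CONFIG_UNUSED_SYMBOLS (that is what the menu entry maps to on the kernels I looked at, but verify it on yours); the function name is hypothetical.

```shell
#!/bin/sh
# Succeed if the kernel config enables unused/obsolete exported symbols.
# Assumes the option is called CONFIG_UNUSED_SYMBOLS; the helper name is
# made up for this sketch. Defaults to .config in the current directory.
has_unused_symbols() {
    grep -q '^CONFIG_UNUSED_SYMBOLS=y' "${1:-.config}"
}

# Example, from the top of the kernel source tree:
#   has_unused_symbols .config && echo "ok" || echo "obsolete symbols are OFF"
```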

Once you are booted on your new kernel, extract the file VMware-server-1.0.8-126538.tar.gz, after downloading it from the VMware site.  This will make a new folder called vmware-server-distrib.  To run on 2.6.27 kernels or higher, you need to patch the vmware files.  If not, the errors you get will look like this:

“error: asm/semaphore.h: No such file or directory”

Searching around led me to this site with a patch (vmware-update-2.6.27-5.5.7-2.tar.gz).  Extract, cd, and run the runme.pl file as root (sudo ./runme.pl) to patch and install.

The error message I tracked down leading me to including obsolete symbols was:

“insmod: error inserting ‘/tmp/vmware-config1/vmmon.o’: -1 Unknown symbol in module”

The fix was documented at this site (translated from German).

Finally, everything compiled and completed.  Not out of the woods yet, though.  Starting vmware in the console gave me errors – the errors and the simple fix are on this page at David’s blog.

And now it works.  I started my VM, installed VMware Tools to update the previous version (1.0.4, I think), and everything is smooth again.

I upgraded my kernel from 2.6.25.9 because a need arose to be able to mount DVDs using the UDF 2.5 filesystem – I guess kernels 2.6.26 and later support this.  Anyway, I did, and I was able to mount the DVD media just fine after that.

Hope this helps!

ESX Troubleshooting – The PSOD (Purple Screen of Death)…

Unlike the BSOD of Windows fame, there is actually hope with a PSOD on ESX.  As I learned at VMWorld 2008, this indicates a specific hardware problem in the majority of cases.  Examining the screen dump can actually point you in the right direction to resolving this.

As I was building my junk server cluster (in a lab, not for production use, so a great way to learn safely), I was swapping NICs to plus-up on Gigabit Ethernet connectivity to the Cisco 6509 I am using.  One of my servers (the big one) was largely already completely configured in VIM, right down to the NFS mountpoint it was using.  Without thinking it through, I grabbed a couple of gig NICs to install, since it still had room, and did it, removing two unsupported NICs in the process and sliding the cards over into the blank PCI slots (grouping all the NICs together).  Upon rebooting, it threw up a red log entry proclaiming a pCPU0 warning about something.  Shortly thereafter, the console stopped responding.  Checking further, I saw that the host had a PSOD.  I rebooted, got the same log message on initial ESX console screen, and another PSOD within minutes.

This time, I dug into the PSOD and noticed that the dump was referencing network drivers for the cards I had just installed.  Aha!  I realized that the vmnic numbering had changed – and the server was trying to do all kinds of things using the old vmnic PCI references, including mounting the NFS share.  No wonder it vomited!

The solution was to first shut down and pull the new NICs, reboot, and see if the PSOD went away – it did.  Next, I removed the NFS share and updated the vmnic assignments to vswitches to account for any changes.  I rebooted again to make sure all was well.  When that proved to be the case, I shut down and added in the two NICs I wanted to use, rebooted, and everything worked.  I was then able to update my configs with the new vmnics, reboot to make sure there was no PSOD event, and reenable the NFS share.  I rebooted again, one last time, just to tempt fate, but still no PSOD.

Been stable ever since.

So don’t give up on the PSOD – it’s natural to want to do that with Windows, but this sure ain’t Windows, is it?  You CAN troubleshoot and resolve these cases, even if you have to open a support call.  The dump can help you zero in on the bad memory module, failing CPU, or even the occasional misplaced network card, and help you get your server back up on its feet.

Of course, I would never be this reckless in a production environment – which is why everyone should have a lab to play with.  If you can afford the time, effort, and junk servers, it is a great way to learn in safety.

ESX 4.0….

VMware ESX 4.0 is in limited, restricted beta testing right now (beta 2?).  I found some links discussing some of the features expected, although this is always subject to change.  YMMV.