January 2007

Sun Solaris 10 ZFS – An installation headache.

I’ve been reading about ZFS ever since it hit Slashdot over a year ago, and I finally decided to move a bunch of data onto it. We bought a Sun x4100 M2 server (two dual-core Opterons with 8GB of RAM), a pair of dual-channel PCI Express Ultra320 SCSI controllers, and two Aberdeen XDAS SCSI-to-SATA shelves. Total raw disk space: 24TB. All of the hardware came to a little over $2/GB.

I should note that the custom solution easily beat the quotes from Network Appliance ($10/GB), EMC ($9.40/GB), BlueArc ($20/GB), Panasas ($5.60/GB) and Sun StorageTek ($9.40/GB).

I racked the two shelves, went to connect the SCSI cables, and ran into a little problem. The SCSI cables that I bought (VHDCI) were too fat to fit side-by-side on the PCI Express cards. There is a special type of cable, called a VHDCI offset cable, that has the connector offset to one side so that two of them can sit next to each other.

The off-the-shelf cache configuration on the XDAS boxes was 512MB of battery-backed RAM. I didn’t think that was going to be a problem until I found out that I couldn’t expose each of the SATA disks as an individual LUN, which is what ZFS needs in order to manage the disks itself. I could get 16 LUNs mapped, but the system didn’t have “slots” available to map any more. Aberdeen tech support put me in direct contact with Infortrend’s support group, and they said I simply needed to upgrade the RAM in the shelf to 1GB.

After the 1GB upgrade, I performed a factory reset of the disk shelf, and the shelf showed a full 128 slots for LUN mappings. I booted Solaris and saw no drives. Since the default Solaris LSI driver (MPT) was a bit old, I installed the “unsupported” Solaris drivers from LSI Logic (itmpt-x86-5.07.01), and Solaris was able to see a whole bunch of drives. A very short time later, I had a 24TB ZFS pool made up of four raidz2 groups (vdevs, in ZFS terms) of 12 drives each. Carving up the drives into four raidz2 groups allows me to lose two drives per group, or an entire SCSI chain, without losing the ability to serve data. God forbid I lose an entire SCSI chain, but it’s nice to know that the pool can survive something that bad.

Given the parity loss of 8 drives’ worth of data (two parity drives in each of the four raidz2 sets) and 500GB per drive, I have a usable storage pool of 20TB. Awesome.
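For anyone who wants to check the capacity arithmetic, here is a minimal Python sketch. The function name and parameters are my own invention for illustration, and it assumes decimal units (1TB = 1000GB) and ignores ZFS metadata overhead, so treat the result as a ceiling rather than what the pool will actually report.

```python
def raidz_capacity_tb(groups, drives_per_group, parity_per_group, drive_gb):
    """Return (raw_tb, usable_tb) for a pool built from identical RAID-Z groups.

    Hypothetical helper for illustration only: decimal units, no
    allowance for filesystem metadata or reserved space.
    """
    total_drives = groups * drives_per_group
    parity_drives = groups * parity_per_group
    raw_gb = total_drives * drive_gb
    usable_gb = (total_drives - parity_drives) * drive_gb
    return raw_gb / 1000, usable_gb / 1000


# The layout from this post: four raidz2 groups of twelve 500GB drives.
raw_tb, usable_tb = raidz_capacity_tb(
    groups=4, drives_per_group=12, parity_per_group=2, drive_gb=500
)
print(f"raw: {raw_tb:g} TB, usable: {usable_tb:g} TB")  # raw: 24 TB, usable: 20 TB
```

Changing parity_per_group to 1 shows what single-parity raidz would have bought back: 2TB more space, at the cost of surviving only one failed drive per group.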


VMware Server on Sun Opteron Hardware

Three brand-new Sun x2200 M2 servers, each with 8GB of RAM and two dual-core 2.2GHz Opteron processors, one copy of VMware ESX Server 3.0.1, and a bunch of production servers that need to be virtualized ASAP.

I rack-mount the servers, power them up, and find out that the integrated lights-out manager (ILOM) is pretty rough. Instead of behaving like the nice remote KVM on our x4100 M2, the entire ILOM reboots with the server. This means that a CD install that has a boot menu with a default timeout is a royal PITA. You have to close the browser immediately after rebooting the server, run “ping -t” against the ILOM IP address while waiting for it to come back to life, then reconnect the web browser as soon as possible to get to the CD menu before the 10-second timeout expires.
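The wait-and-reconnect dance can be scripted. Below is a rough Python sketch of the polling idea, not anything Sun ships: the helper names are hypothetical, and it assumes the ILOM web interface answers on TCP port 443, which I have not verified for every firmware revision.

```python
import socket
import time


def port_is_open(host, port=443, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds.

    Port 443 is an assumption (the ILOM web interface over HTTPS).
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_until_up(probe, max_wait=120.0, interval=0.5,
                  sleep=time.sleep, clock=time.monotonic):
    """Call probe() repeatedly until it returns True or max_wait elapses.

    Returns the number of seconds waited, or None if the deadline passed.
    The sleep and clock parameters exist only so the loop can be tested.
    """
    start = clock()
    while clock() - start < max_wait:
        if probe():
            return clock() - start
        sleep(interval)
    return None
```

To use it, kick off the reboot and then call something like wait_until_up(lambda: port_is_open("192.0.2.10")), where 192.0.2.10 stands in for the ILOM address. As soon as it returns a number, open the browser. A short polling interval matters here because the boot menu only waits 10 seconds.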

Next, we start the ESX Server installation and it can’t see any hard drives. Sigh. Turns out that the “Supports VMWare” blurb on Sun’s web page isn’t exactly correct. VMware ESX Server doesn’t support this hardware until version 3.0.2, which is not yet available and has no announced release date. The support group at VMware suggests that I install VMware Server on top of a Linux host instead. OK then, off to download openSUSE.

openSUSE installed without any major issues. I installed the 64-bit version and added the 32-bit compatibility layers for the VMware prerequisites. I also installed a compiler, which the VMware installer needs in order to build its kernel modules for the running kernel. VMware Server installed fine.

Next, I brought up a virtual machine. All looked well, except for the clock, which was running strangely: either way too slow or way too fast. I struggled with the clock for several hours, trying various VMware-specific settings that I found on the discussion forums and in the VMware knowledge base. The timekeeping got better, but under any kind of load at all, the second hand on the Windows guest’s clock was spinning wildly. At one point, I saw it jump 30 seconds in a single “tick”. Egads.

The answer presented itself late last night. As it turns out, “cat /proc/cpuinfo” on the Linux host was showing that I had two dual-core 1GHz processors, not the 2.2GHz parts we bought. AMD has a PowerNow! feature that automatically scales the clock speed up and down with the amount of work being done. Since VMware and its guests are tied very closely to the CPU clock, the system as a whole couldn’t keep time as well as an old 15th-century water clock. I disabled PowerNow! in the BIOS and, lo and behold, the clock drift is now just a few milliseconds per day.
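If you want to catch this before losing an evening to it, the check is easy to automate. Here is a small Python sketch that parses /proc/cpuinfo text and flags any CPU running below its rated speed. The function names, the 5% tolerance, and the sample text are my own assumptions for illustration. The sample is a trimmed-down imitation of what the host reported, not a verbatim capture.

```python
def cpu_mhz_values(cpuinfo_text):
    """Extract the 'cpu MHz' value for each logical CPU from /proc/cpuinfo text."""
    values = []
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "cpu MHz":
            values.append(float(value))
    return values


def throttled_cpus(cpuinfo_text, nominal_mhz, tolerance=0.05):
    """Return (cpu_index, mhz) pairs running more than `tolerance` below nominal."""
    floor = nominal_mhz * (1 - tolerance)
    return [
        (index, mhz)
        for index, mhz in enumerate(cpu_mhz_values(cpuinfo_text))
        if mhz < floor
    ]


# Illustrative sample only: four logical CPUs, all scaled down to 1GHz.
SAMPLE = """\
processor       : 0
cpu MHz         : 1000.000
processor       : 1
cpu MHz         : 1000.000
processor       : 2
cpu MHz         : 1000.000
processor       : 3
cpu MHz         : 1000.000
"""

for index, mhz in throttled_cpus(SAMPLE, nominal_mhz=2200):
    print(f"cpu{index} is running at {mhz:.0f} MHz, expected about 2200 MHz")
```

On a real host, read the text with open("/proc/cpuinfo").read() instead of using SAMPLE. Run it while the machine is idle, since that is when frequency scaling drops the clock the furthest.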

VMware needs to document this. Badly. In big, bold print.
