Technology

Machine Learning and Meta-Meta-Data

I read a recent post on Slashdot about machine learning and some new ideas coming out of MIT, and it got me thinking. As a programmer, I’ve always been fascinated by how well my children learn by experience, training, categorization, and classification of all of their available input to produce (mostly) sensible output. I have two wonderful children, and watching them grow is the ultimate first-hand way for a programmer to see the baby bootstrap.

Here’s a simple example: One night my daughter woke up crying. I asked her what the problem was and she couldn’t really give me an answer. Just some babbling and more crying. Since it was 2 in the morning and my programmer brain was just nodding off, I figured that A) she had some kind of dream and her brain didn’t really know what to do with it and B) she couldn’t focus on what I was asking her because of A. All it took to fix the problem was a simple processor interrupt. I just carried her into the bathroom, sat her on the toilet and said “go potty”. Since that was something we had practiced many times, it was very easy for her to focus on that instead of the problem at hand, and the crying instantly stopped. (Parents take note, beatings and yelling only get you so far…) As a programmer, I found that little trick worked perfectly well.

So back to the learning thing… The brain doesn’t come pre-wired. It doesn’t come with a blueprint, and it sure doesn’t come with a user’s manual. So how in the world does it come up with a superb sensory storage system that doesn’t cook itself given all of the input we experience on a daily basis? A common buzzword in this synergy-infected business world is “thinking outside the box”. Typically this means taking a step back and re-evaluating the problem. In the database world, we had data; we stepped back and came up with metadata to fit the data into. It worked, and worked surprisingly well. All of the techniques that I’ve seen from MIT and Caltech and every other AI research group out there are focused on how to categorize the data. They start with some data, then they find these wonderfully sophisticated algorithms to pore through hundreds of thousands of samples and produce some very good output, but the framework is very rigid.

Brain cells die. New brain cells are created. The brain re-wires itself constantly. You just can’t cram that rigid square-peg-algorithm into a dynamic round-hole-environment. It won’t work. We need to take another step backwards. Call it a meta-meta-data if you’d like. Call it dynamic metadata. I don’t care what you call it, just call it something.

Instead of toiling over algorithms and data structures, come up with a simple genetic sequence of virtual parallel computer instructions. The Cell processor from IBM is a good starting point. Each SPE has a fixed instruction set. Use that. Now write some code that randomly scrambles and reassembles generations of genetic code and run it through fitness trials (i.e. real life situations), but don’t train the model based on example input and yes/no output – train the model on example input and “classification methodologies”. We train our children how to organize by color, size, shape, feel, etc. Train the genetic model that a picture of a car can be a “red car”, a “big car”, a “race car”, whatever. Let the model decide how to store that data. Let the model decide where to store that data.
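To make the scramble, reassemble, and fitness-trial loop a little more concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the four-instruction set is a made-up toy (it is not the Cell SPE instruction set), and the names `run_program`, `fitness`, and `evolve` are hypothetical. It also scores programs against a single target number, which is exactly the rigid kind of training I’m arguing against, so treat it as a picture of the mechanics and nothing more.

```python
import random

# Hypothetical toy instruction set (NOT the real SPE instruction set).
INSTRUCTIONS = {
    "inc": lambda x: x + 1,
    "dec": lambda x: x - 1,
    "dbl": lambda x: x * 2,
    "nop": lambda x: x,
}
OPCODES = list(INSTRUCTIONS)


def run_program(genome, value=0):
    """Execute a genome (a list of opcodes) against a single register."""
    for op in genome:
        value = INSTRUCTIONS[op](value)
    return value


def fitness(genome, target):
    """Toy fitness trial: closer to the target is better, 0 is perfect."""
    return -abs(run_program(genome) - target)


def crossover(a, b, rng):
    """Reassemble two parents by splicing them at a random cut point."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def mutate(genome, rng, rate=0.1):
    """Randomly scramble individual instructions."""
    return [rng.choice(OPCODES) if rng.random() < rate else op for op in genome]


def evolve(target, length=8, population=50, generations=200, seed=0):
    """Evolve a fixed-length program whose output approaches the target."""
    rng = random.Random(seed)
    pool = [[rng.choice(OPCODES) for _ in range(length)] for _ in range(population)]
    for _ in range(generations):
        pool.sort(key=lambda g: fitness(g, target), reverse=True)
        if fitness(pool[0], target) == 0:
            break
        survivors = pool[: population // 2]  # keep the fitter half
        children = [
            mutate(crossover(rng.choice(survivors), rng.choice(survivors), rng), rng)
            for _ in range(population - len(survivors))
        ]
        pool = survivors + children
    return max(pool, key=lambda g: fitness(g, target))


best = evolve(target=20)
print(best, run_program(best))
```

The interesting change would be swapping `fitness` for something that rewards how the model classifies its input rather than whether it hits one answer. I don’t have a sketch for that part.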

Then give the model a toddler to learn with. They might just be able to teach each other something.


Unlocking my Blackberry Pearl

I decided to see how hard it would be to get T-Mobile to unlock my shiny new Blackberry Pearl. I figured since I might need to travel outside the country, popping a cheap long-distance SIM card into the phone would be a nice feature to have. I called the tech support department at 611 from my phone. I got to a tech within just a minute or two, and he had me enter *#06# and read him the IMEI number it displayed (that’s the phone’s own ID, not the SIM’s). He asked if it would be OK to send the unlock instructions to my email address, and I agreed.

The next day, I had an email from an “ARSystem Notify” with a 16-digit unlock code and the following instructions:

  1. Press the Menu key
  2. Scroll to Options and press the trackball
  3. Scroll to Advanced Options and press the trackball
  4. Scroll to SIM Card and press the trackball
  5. Type MEPPD
  6. Press the Alt key and type MEPP2
  7. Enter the unlock code
  8. Press the Enter Key

I followed these instructions to the letter and the phone was unlocked shortly thereafter. No fuss, no problems, nothing.

T-Mobile, you guys rock. I had been a Cingular customer for years and their customer service was mediocre at best. Ever since I switched in November, I’ve had nothing but excellent customer service from your Blackberry support people.


Sun Solaris 10 ZFS – An installation headache.

I’ve been reading about ZFS ever since it hit Slashdot over a year ago, and finally decided to move a bunch of data into ZFS. We bought a Sun x4100 M2 server (two dual-core Opterons with 8GB of RAM), a pair of dual-channel PCI Express Ultra320 SCSI controllers, and two Aberdeen XDAS SCSI-to-SATA shelves. Total raw disk space: 24TB. We bought all the hardware for a little over $2/GB.

I should note that the custom solution easily beat the offers from Network Appliance ($10/GB), EMC ($9.40/GB), BlueArc ($20/GB), Panasas ($5.60/GB) and Sun StorageTek ($9.40/GB).

I racked the two shelves, went to connect the SCSI cables, and ran into a little problem. The SCSI cables that I bought (VHDCI) were too fat to fit side-by-side on the PCI Express cards. There is a special type of cable called a VHDCI Offset Cable that has the connector offset to one side.

The off-the-shelf cache configuration on the XDAS boxes was 512MB of battery-backed RAM. I didn’t think that was going to be a problem until I found out that I couldn’t expose each of the SATA disks as an individual LUN to make full use of ZFS. I could get 16 LUNs mapped, but the system didn’t have “slots” available to map any more LUNs. Aberdeen tech support put me in direct contact with InforTrend’s support group, and they said I simply needed to upgrade the RAM in the shelf to 1GB.

After the 1GB upgrade, I performed a factory-reset of the disk shelf and the shelf showed a full 128 slots for LUN mappings. I booted Solaris and saw no drives. Since the default Solaris LSI driver (MPT) was a bit older, I installed the “unsupported” Solaris drivers from LSI Logic (itmpt-x86-5.07.01) and Solaris was able to see a whole bunch of drives. A very short time later, I had a 24TB ZFS pool made up of four raidz2 groups (vdevs, in ZFS terms) of 12 drives each. Carving up the drives into the four raidz2 groups allows me to lose two drives per group, or an entire SCSI chain, without losing the ability to serve data. God forbid I lose an entire SCSI chain, but it’s nice to know that it can survive something that bad.

Given the parity loss of 8 drives’ worth of data (two per raidz2 group), and 500GB per drive, I have a usable storage pool of 20TB. Awesome.
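For anyone who wants to check the math or plug in their own shelf layout, here is a small Python sketch of the arithmetic. The function name `raidz2_capacity` is my own invention for this post, and the numbers are nominal: it assumes decimal units (1TB = 1000GB) and ignores ZFS metadata overhead, so a real pool will report somewhat less.

```python
def raidz2_capacity(groups, drives_per_group, drive_gb, parity_per_group=2):
    """Return (raw_gb, usable_gb) for a pool built from identical raidz2 groups.

    Nominal figures only: filesystem overhead is not accounted for.
    """
    total_drives = groups * drives_per_group
    parity_drives = groups * parity_per_group  # raidz2 gives up two drives per group
    raw_gb = total_drives * drive_gb
    usable_gb = (total_drives - parity_drives) * drive_gb
    return raw_gb, usable_gb


raw, usable = raidz2_capacity(groups=4, drives_per_group=12, drive_gb=500)
print(f"raw: {raw / 1000:.0f}TB, usable: {usable / 1000:.0f}TB")  # raw: 24TB, usable: 20TB
```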


VMWare Server on Sun Opteron Hardware

The setup: three brand new Sun x2200 M2 servers, each with 8GB of RAM and two dual-core 2.2GHz Opteron processors, one copy of VMWare ESX Server 3.0.1, and a bunch of production servers that need to be virtualized ASAP.

I rack mount the servers, power them up, and find out that the integrated lights-out manager (ILOM) is pretty rough. Unlike the nice remote KVM on our x4100 M2, the entire ILOM reboots along with the server. This means that a CD install that has a boot menu with a default timeout is a royal PITA. You have to close the browser immediately after rebooting the server, run “ping -t” against the ILOM IP address until it comes back to life, then re-connect the web browser as soon as possible to get to the CD menu before the 10 second timeout occurs.
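If you would rather not stare at a ping window, the waiting part is easy to script. The Python sketch below is an assumption-laden stand-in for what I did by hand: it swaps ICMP ping for a plain TCP connect (no special privileges needed), it assumes the ILOM web interface answers on port 443, and `port_is_open` and `wait_until_up` are names I made up for this post.

```python
import socket
import time


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_until_up(probe, interval=1.0, max_wait=300.0,
                  sleep=time.sleep, clock=time.monotonic):
    """Call probe() repeatedly until it returns True or max_wait seconds pass.

    Returns True if the target came back within the limit, else False.
    """
    deadline = clock() + max_wait
    while clock() < deadline:
        if probe():
            return True
        sleep(interval)
    return False


def watch_ilom(host, port=443):
    """Block until the ILOM answers, then tell the user to reconnect.

    The port is an assumption: adjust it to wherever your ILOM web UI listens.
    """
    if wait_until_up(lambda: port_is_open(host, port)):
        print("ILOM is back, reconnect the browser now")
    else:
        print("gave up waiting for the ILOM")
```

Calling `watch_ilom("your-ilom-address")` right after kicking off the reboot should buy back a couple of those 10 seconds, though I make no promises about how quickly the web interface is usable once the port opens.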

Next, we start the ESX Server installation and it can’t see any hard drives. Sigh. Turns out that the “Supports VMWare” blurb on Sun’s web page isn’t exactly correct. This hardware isn’t supported by VMWare ESX Server until version 3.0.2, which at this time is unavailable with no expected release date. The support group at VMWare suggests I install VMWare Server on top of a Linux host instead. OK then, off to download openSUSE.

OpenSUSE installed without any major issues. I installed the 64 bit version and added the 32 bit compatibility layers for the VMWare prerequisites. I also installed a compiler, which the VMWare installer uses to build its kernel modules. VMWare Server installed fine.

Next, I brought up a virtual machine. All looked well, except for the clock, which was running strangely: either way too slow or way too fast. I struggled with the clock for several hours, trying various VMWare-specific settings that I found on the discussion forums, as well as in the VMWare knowledge base. The timekeeping got better, but under any kind of load at all, the second hand on the Windows clock was spinning wildly. At one point, I saw it jump 30 seconds in a single “tick”. Egads.

The conclusion presented itself late last night. As it turns out, “cat /proc/cpuinfo” on the Linux host was showing that I had two dual-core 1GHz processors instead of the 2.2GHz ones I paid for. AMD has a PowerNow! feature that automatically throttles the clock speed to match the amount of work being done. Since VMWare and its guests are tied very closely to the CPU clock, the system as a whole couldn’t keep time as well as an old 15th century water clock. I disabled PowerNow! in the BIOS and lo-and-behold, the clock drift is now down to just a few milliseconds per day.
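The “cat /proc/cpuinfo” check is easy to automate if you have a rack full of these. Here is a rough Python sketch that pulls the “cpu MHz” lines out of cpuinfo-style text and flags any core running well below its rated speed. The function names, the 5% tolerance, and the `SAMPLE` text are all assumptions for illustration (the sample is trimmed, made-up output, not a capture from my servers).

```python
import re


def cpu_mhz_values(cpuinfo_text):
    """Extract every 'cpu MHz' reading from /proc/cpuinfo-style text."""
    pattern = r"^cpu MHz\s*:\s*([\d.]+)"
    return [float(m) for m in re.findall(pattern, cpuinfo_text, re.MULTILINE)]


def throttled_cores(cpuinfo_text, nominal_mhz, tolerance=0.05):
    """Return indexes of cores running more than `tolerance` below nominal speed."""
    floor = nominal_mhz * (1 - tolerance)
    return [i for i, mhz in enumerate(cpu_mhz_values(cpuinfo_text)) if mhz < floor]


# Illustrative sample only: two cores, one throttled down to 1GHz.
SAMPLE = """\
processor\t: 0
cpu MHz\t\t: 1000.000
processor\t: 1
cpu MHz\t\t: 2200.000
"""

print(throttled_cores(SAMPLE, nominal_mhz=2200))  # [0]
```

On a real Linux host you would feed it `open("/proc/cpuinfo").read()` instead of `SAMPLE`. An idle box with frequency scaling enabled will show up as throttled, which is the whole point.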

VMWare needs to document this. Badly. In big, bold print.
