Wednesday, November 9, 2011

Hitachi Deskstar + RocketRAID 232x

Spent the morning tracking down a stupid RAID problem, so hopefully this helps someone...

I use RocketRAID 2320 cards as HBAs in a few Ubuntu servers. I use software RAID (mdadm), so no need for the RAID features.

The trick to using the rr232x as an HBA is to configure each drive as a separate single-disk JBOD "array" in the RocketRAID BIOS. In the servers I've set up before, I've used eight Western Digital (Green) 1.5 TB and 2.0 TB drives, did the JBOD thing, and they appeared to the OS without issue.

My new server has twelve 3.0 TB Hitachi Deskstars (H3IK3000), and when I installed the rr232x driver, I saw no drives. The RAID card could see the drives, and the OS could see the RAID card, but the OS couldn't see the drives.

When the drives that had been attached to the RR2320 were plugged directly into the motherboard, I noticed that they didn't spin up. The OS saw them and tried to talk, but timed out and gave up. Identical drives not exposed to the RR2320 worked fine.

There turned out to be two problems:

  1. The staggered spin-up feature of the RR2320. It's apparently not compatible with these Deskstars: when enabled, it flips a bit telling the drives not to spin up at all. The solution is to enable, then disable, the staggered spin-up setting (Settings menu in the BIOS utility).
  2. The RR2320 just doesn't seem to like H3IK3000s. Solution: run the drives in "Legacy" mode rather than as JBOD arrays, a tidbit I picked up from here.
To run in Legacy mode, put an empty partition on a drive (plug into mobo, "mklabel" and "mkpart" in parted). The RR2320 will ignore it and pass it through to the OS.

Recap:
  1. Enable-then-disable staggered spin-up
  2. Plug each drive into the motherboard and fire up parted
    1. mklabel gpt
    2. mkpart (at the prompts: leave the partition name empty, then xfs, 1, 3tb)
    3. set 1 raid on
  3. Plug drives back into the RR2320 -- should see them on boot
  4. mdadm --create /dev/md0 --level=6 --raid-devices=12 /dev/sdb1 /dev/sdc1 ... /dev/sdm1
  5. mkfs.xfs /dev/md0

Friday, August 26, 2011

Really Super Quick Start Guide to Setting Up SLURM

SLURM is the awesomely-named Simple Linux Utility for Resource Management written by the good people at LLNL. It's basically a smart task queuing system for clusters. My cluster has always run Sun Grid Engine, but it looks like SGE is more or less dead in the post-Oracle Sun software apocalypse. In light of this and since SGE recently looked at me the wrong way, I'm hoping to ditch it for SLURM. I like pop culture references and software that works.

The "Super Quick Start Guide" for LLNL SLURM has a lot of words, at least one of which is "make." If you're lazy like me, just do this:

0. Be using Ubuntu
1. Install: # apt-get install slurm-llnl
2. Create key for MUNGE authentication: /usr/sbin/create-munge-key
3a. Make config file: https://computing.llnl.gov/linux/slurm/configurator.html
3b. Put config file in: /etc/slurm-llnl/slurm.conf
4. Start master: # slurmctld
5. Start node: # slurmd
6. Test that fool: $ srun -N1 /bin/hostname

Bam.

(In my config file, I specified "localhost" as the master and the node. Probably a good place to start.)

Saturday, June 4, 2011

Evil C++ #2: Using GCC's -ftrapv flag to debug integer overflows

In C++, overflowing a signed integer type won't throw an exception; it's undefined behavior, and the resulting weird numbers can silently propagate through your program. GCC's -ftrapv flag has your back: it makes overflowing signed arithmetic call abort() at runtime, so the bug surfaces at the point of overflow instead of miles downstream.

Thursday, June 2, 2011

Evil C++ #1: Brackets and "at" for accessing STL vector elements

This is the first in a series of code snippets that demonstrate C/C++ pitfalls.

(For a thorough explanation of the many ways C++ is out to get you, see Yossi Kreinin's excellent C++ FQA).

Ignoring GCC warnings on a per-file basis

In most cases, ignoring GCC warnings is a Bad Idea. Treating warnings as errors results in better code.

However, sometimes we are forced to deal with other people's code. For instance, a project I work on relies on JsonCpp. We include this in our source tree so that every user doesn't have to go get the JsonCpp source code in order to compile this thing.

Such dependencies can be a problem if you want really strict compiler options, since libraries will often be slightly incompatible with your particular standard (ANSI, C++0x, ...) or not be up to your lofty expectations. In my case, JsonCpp gives me a couple of warnings with GCC options -W, -Wall, -ansi, -pedantic. This means I can't compile my code with -Werror, which makes me sad. I certainly don't want to modify these external libraries.

Fortunately, recent GCC versions have added ways of selectively disabling warnings. If your problems are confined to headers, you can replace -I/path/to/headers with -isystem/path/to/headers and GCC will treat them as system headers, ignoring any warnings they generate.

Another less-desirable solution is to use pragmas. Headers can be marked as system headers by putting at the top:

#pragma GCC system_header


If the problems lie in the source files themselves, neither of these tricks works. We can, however, add lines like the following to the top of each offending file:

#pragma GCC diagnostic ignored "-Wunused-parameter"
#pragma GCC diagnostic ignored "-Woverflow"


to disable specific warnings generated by that file.

To figure out the names of the warnings causing the problems, recompile with -fdiagnostics-show-option added to the g++ line. This is especially useful for default warnings (i.e. those which aren't optional), like -Woverflow, since they are harder to find in the documentation.

This isn't a great solution, since it does require some modification of the libraries. However, you can easily generate a patch from your changes and apply it to any new library versions should you decide later to upgrade them. Hopefully someday GCC will include an "ignore warnings from this file or subdirectory" option, but until then... it works.