Friday, August 26, 2011

Really Super Quick Start Guide to Setting Up SLURM

SLURM is the awesomely-named Simple Linux Utility for Resource Management written by the good people at LLNL. It's basically a smart task queuing system for clusters. My cluster has always run Sun Grid Engine, but it looks like SGE is more or less dead in the post-Oracle Sun software apocalypse. In light of this and since SGE recently looked at me the wrong way, I'm hoping to ditch it for SLURM. I like pop culture references and software that works.

The "Super Quick Start Guide" for LLNL SLURM has a lot of words, at least one of which is "make." If you're lazy like me, just do this:

0. Be using Ubuntu
1. Install: # apt-get install slurm-llnl
2. Create key for MUNGE authentication: # /usr/sbin/create-munge-key
3a. Make config file: https://computing.llnl.gov/linux/slurm/configurator.html
3b. Put config file in: /etc/slurm-llnl/slurm.conf
4. Start master: # slurmctld
5. Start node: # slurmd
6. Test that fool: $ srun -N1 /bin/hostname

Bam.

(In my config file, I specified "localhost" as the master and the node. Probably a good place to start.)
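
For the curious, a single-machine config along those lines might look roughly like the sketch below. Treat it as a starting point, not as the exact file the configurator produces: the keys are standard slurm.conf parameters, but the cluster name, the "debug" partition, the state and spool paths, and the write_slurm_conf helper are all assumptions made up for illustration, and newer SLURM releases use SlurmctldHost in place of ControlMachine. It writes to a scratch path so you can compare it with the configurator's output before copying anything into /etc/slurm-llnl/.

```shell
#!/bin/sh
# Write a minimal single-machine slurm.conf to the path given as $1.
# Sketch only: the values below are illustrative assumptions, not configurator output.
write_slurm_conf() {
    out="$1"
    cat > "$out" <<'EOF'
# Controller (master) and compute node are both this machine.
ClusterName=testcluster
ControlMachine=localhost
AuthType=auth/munge
SlurmUser=slurm
SlurmctldPort=6817
SlurmdPort=6818
# Assumed paths; use whatever the configurator or your package suggests.
StateSaveLocation=/var/lib/slurm-llnl/slurmctld
SlurmdSpoolDir=/var/lib/slurm-llnl/slurmd
# One node, and one partition containing it.
NodeName=localhost CPUs=1 State=UNKNOWN
PartitionName=debug Nodes=localhost Default=YES MaxTime=INFINITE State=UP
EOF
}

# Write to a scratch location first; copy into /etc/slurm-llnl/ once reviewed.
write_slurm_conf "${SLURM_CONF_OUT:-./slurm.conf.example}"
```

The ControlMachine and NodeName lines are the "localhost as the master and the node" part; everything else is scaffolding you would normally accept from the configurator's defaults.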

8 comments:

  1. Is it possible to install it on a 32-core workstation? If yes, how different will the steps be?

  2. Hi, Sreejith, and thanks for the comment.

    That is totally possible, and it's a great use of a cluster system like SLURM to get the most out of a multi-core system. Just be sure that other resources (memory, memory bandwidth, network and disk I/O, etc.) are scaled to match your usage patterns, so you don't end up with a bottleneck somewhere else.

    The steps don't change at all in your scenario. It's all in the configuration file created in step 3a: you should just set "localhost" as the master and set up one node called "localhost" as well.
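
As a rough sketch of the multi-core case, the node line in the config file can carry the processor count. The node_line helper below is hypothetical, and it assumes nproc (from GNU coreutils) is available; CPUs is the standard slurm.conf key for the number of processors on a node.

```shell
#!/bin/sh
# Print a slurm.conf node definition for this machine.
# Falls back to 1 CPU if nproc is unavailable.
node_line() {
    cpus="$(nproc 2>/dev/null || echo 1)"
    echo "NodeName=localhost CPUs=${cpus} State=UNKNOWN"
}

node_line
```

nproc reports the processing units available to the current shell, so check the number against what you expect before pasting the line into slurm.conf.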

  3. This was a big help. I just wanted to be able to test some scripts I was writing on my home computer. I did have a small problem with munge: I needed to manually create /var/run/munge and change the ownership of /var/log/munge to root.

  4. Thanks for a nice super-quick guide. Would it be possible for you to share the slurm.conf you generated? cheers

  5. Interesting. It seems that this is broken on Ubuntu 16.04.

  6. I followed the exact same steps, but when I run the "srun -N1 /bin/hostname" command I get "srun: error: Unable to allocate resources: Unable to contact slurm controller (connect failure)". Any idea why I'm getting this error?
    Thanks a lot.

    Replies
    1. https://github.com/Azure/azure-quickstart-templates/issues/1796

      Following the instructions at the above URL helped me solve the error you mentioned.

  7. Sadly, with a non-homogeneous set of nodes, this will not quite do it for setup. Each OS had a different version, both of which were out of date for 'security vulnerabilities'. Awesome guide for setting up my VM test cluster though.
