SLURM is the awesomely-named Simple Linux Utility for Resource Management written by the good people at LLNL. It's basically a smart task queuing system for clusters. My cluster has always run Sun Grid Engine, but it looks like SGE is more or less dead in the post-Oracle Sun software apocalypse. In light of this and since SGE recently looked at me the wrong way, I'm hoping to ditch it for SLURM. I like pop culture references and software that works.
The "Super Quick Start Guide" for LLNL SLURM has a lot of words, at least one of which is "make." If you're lazy like me, just do this:
0. Be using Ubuntu
1. Install: # apt-get install slurm-llnl
2. Create key for MUNGE authentication: /usr/sbin/create-munge-key
3a. Make config file: https://computing.llnl.gov/linux/slurm/configurator.html
3b. Put config file in: /etc/slurm-llnl/slurm.conf
4. Start master: # slurmctld
5. Start node: # slurmd
6. Test that fool: $ srun -N1 /bin/hostname
Bam.
(In my config file, I specified "localhost" as the master and the node. Probably a good place to start.)
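For a rough idea of what a "localhost for everything" config can look like, here is a minimal sketch wrapped in a small shell function. It is not the file used above and not the configurator's full output: the function name write_minimal_conf, the cluster name "localcluster", the partition name "debug", and CPUs=1 are all illustrative assumptions, and key names vary between SLURM versions.

```shell
#!/bin/sh
# Sketch only: write a single-host slurm.conf to a path of your choosing.
# Newer SLURM releases use SlurmctldHost in place of ControlMachine.
write_minimal_conf() {
    # $1: output path (defaults to ./slurm.conf so nothing in /etc is touched)
    out="${1:-./slurm.conf}"
    cat > "$out" <<'EOF'
ClusterName=localcluster
ControlMachine=localhost
AuthType=auth/munge
SlurmUser=slurm
NodeName=localhost CPUs=1 State=UNKNOWN
PartitionName=debug Nodes=localhost Default=YES MaxTime=INFINITE State=UP
EOF
}
```

Calling write_minimal_conf ./slurm.conf writes the file into the current directory; once it looks right, it can be copied to /etc/slurm-llnl/slurm.conf as root. The configurator's output will typically contain many more settings than this.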
Is it possible to install it on a 32-core workstation? If yes, how different will the steps be?
Hi, Sreejith, and thanks for the comment.
That is totally possible, and it's a great use of a cluster system like SLURM to get the most out of a multi-core system. Just be sure that other resources (memory, memory bandwidth, network and disk I/O, etc.) are scaled to match your usage patterns, so you don't end up with a bottleneck somewhere else.
The steps don't change at all in your scenario. It's all in the configuration file created in step 3a: you should just set "localhost" as the master and set up one node called "localhost" as well.
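As a hedged illustration of the node part of that config, the sketch below prints a node line and a partition line for a single host with a given CPU count. The function name node_lines and the partition name "debug" are made up for this example; the CPU count falls back to whatever nproc reports. On reasonably recent versions, running slurmd -C should also print the hardware it detects in slurm.conf format, which is a good cross-check.

```shell
#!/bin/sh
# Sketch only: print node and partition lines for a single-host setup.
# $1: CPU count; defaults to what nproc reports on this machine.
node_lines() {
    cpus="${1:-$(nproc)}"
    echo "NodeName=localhost CPUs=${cpus} State=UNKNOWN"
    echo "PartitionName=debug Nodes=localhost Default=YES MaxTime=INFINITE State=UP"
}

# Example for the 32-core workstation discussed above.
node_lines 32
```

With lines like these in slurm.conf, a quick check after starting the daemons would be something like srun -n32 /bin/hostname, which asks for 32 tasks on the one node.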
This was a big help. I just wanted to be able to test some scripts I was writing on my home computer. I did have a small problem with munge: I needed to manually create /var/run/munge and change the ownership of /var/log/munge to root.
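The munge workaround described in this comment could be sketched as the shell function below. The function name fix_munge_dirs and its two parameters are hypothetical, added so the commands can be tried against a scratch directory first; whether this fix is needed at all depends on the Ubuntu and munge versions in use.

```shell
#!/bin/sh
# Sketch only: create the munge runtime directory and set the log dir owner.
# $1: filesystem prefix (empty for the real system, a scratch dir for a dry run)
# $2: owner for the log directory (root, as described in the comment above)
fix_munge_dirs() {
    prefix="${1:-}"
    owner="${2:-root}"
    mkdir -p "${prefix}/var/run/munge" "${prefix}/var/log/munge"
    chown "$owner" "${prefix}/var/log/munge"
}
```

Run as root with no arguments, fix_munge_dirs applies the change to the real /var/run/munge and /var/log/munge. On systems where /var/run lives on a temporary filesystem, the directory may need to be recreated after a reboot.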
Thanks for a nice super-quick guide. Would it be possible for you to share the slurm.conf you generated? Cheers.
Interesting. It seems that this is broken on Ubuntu 16.04.
That is certainly my experience: https://github.com/superphy/semantic/issues/93. Any ideas?
I followed the exact same steps, but when I run the "srun -N1 /bin/hostname" command I get "srun: error: Unable to allocate resources: Unable to contact slurm controller (connect failure)". Any idea why I'm getting this error?
Thanks a lot.
https://github.com/Azure/azure-quickstart-templates/issues/1796
Following the instructions in the above URL helped me solve the error you mentioned.
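Independently of whatever the linked issue recommends, a reasonable first check for "unable to contact slurm controller" is whether the daemons are running at all. The sketch below is a hypothetical helper (check_daemons is not part of SLURM) that uses pgrep to report on each process name it is given.

```shell
#!/bin/sh
# Sketch only: report whether each named daemon has a running process.
check_daemons() {
    for name in "$@"; do
        # -x makes pgrep match the process name exactly
        if pgrep -x "$name" > /dev/null 2>&1; then
            echo "$name: running"
        else
            echo "$name: not running"
        fi
    done
}

# The three daemons involved in the single-host setup described in the post.
check_daemons munged slurmctld slurmd
```

If all three report as running, scontrol ping asks the controller directly whether it is responding, and sinfo shows the state of the configured nodes.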
Sadly, with a non-homogeneous set of nodes, this will not quite do it for setup. Each OS packaged a different version, and both versions were out of date with respect to 'security vulnerabilities'. Awesome guide for setting up my VM test cluster though.
Can we set up SLURM on just one VM?