CLSP Cluster

Here is some basic advice on running things on the CLSP cluster. This is a resource shared among many students, researchers, and faculty, so please do your best not to harm other people's work. General guidelines on how to use the cluster are at https://wiki.clsp.jhu.edu/index.php/How_to_work_with_the_CLSP_Grid. Make sure you read them and learn how to run qsub jobs.

The main machines currently (May 2016) are the a* and b* nodes. You can log into them directly with ssh or with qlogin. Do not run major jobs when logged in with ssh, but small scripts like experiment.perl (in cluster mode) are fine. Never run anything on the login machine (login.clsp.jhu.edu). Always log into one of the compute nodes immediately. You may choose to set an alias like the one below in your local shell startup file (for example .bash_profile) to automatically ssh to the machine of your choice.

 alias clsp="ssh -Yt [username]@login.clsp.jhu.edu 'ssh b03'"
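If you switch between compute nodes often, the alias can be generalized into a small shell function. This is only a sketch: the function name clspnode and the variable CLSP_USER are made up for illustration, and b03 is the default simply because the alias above uses it.

```shell
# Sketch only: clspnode and CLSP_USER are hypothetical names, not part of the cluster setup.
# Usage: clspnode [node]   (defaults to b03, the node used in the alias above)
clspnode() {
    # -Y forwards X11, -t allocates a terminal so the inner ssh is interactive.
    # The second argument is the command run on the login machine: a hop to the node.
    ssh -Yt "${CLSP_USER:-$USER}@login.clsp.jhu.edu" "ssh ${1:-b03}"
}
```

With CLSP_USER set to your cluster user name, clspnode a05 would hop through the login machine to a05, and clspnode alone would go to b03. The function deliberately has a different name than the alias, since bash expands aliases when it reads a function definition of the same name.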

Basic Unix Advice

The nature of research is to do a lot of new things. Things that take time to figure out. Things that you may have to do again at some point in the future. So it is a major help to keep track of how to do things.

You should really write nice summaries and detailed instructions on what you do on a Wiki. You can use this Wiki for it. Edit the sidebar, add your name under Members, and you have your own personal Wiki. Others can then find out what you are doing as well, and what you do should be known.

But this is work and takes effort. So, there are two tools I always use. One logs command line interactions on demand and lets you add comments. The other logs everything.

Easy logging on demand

I wrote a small program called philog that makes logging easy.

It is in /home/pkoehn/statmt/bin/philog and life is easier with the following aliases in your .bash_profile

 alias p="philog"
 alias pp="philog -pretty"
 alias ppd="philog -pretty -dir"
 alias pd="philog -dir"
 alias ph="history | philog -history"
 alias pe="vi /home/pkoehn/statmt/bin/philog.LOGFILE".`hostname`

As you can see, you will need to modify the path names in philog to make it work.

When you do something on the command line, you can log it.

 > make -complex -stuff -x magic
 > p '# here is how I did it'
 > p 'make -complex -stuff -x magic'

Since you will often want to log the command you type in, this can be done with ph

 > make -complex -stuff -x magic
 > ph
 > p "# that's how I did it"

You can read the log with p. Since the log contains the directory path, you can look at the log specific to the current directory with pd.

 > p
 Tue May 03 10:40:39 EDT 2016 /home/pkoehn/experiment make -complex -stuff -x magic
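For readers who cannot access philog, the idea behind it fits in a few lines of shell. The sketch below is not the philog source: the names note, note_dir and NOTE_LOG are invented here, and the line format simply imitates the sample output above.

```shell
# Sketch of an on-demand logger in the spirit of philog (NOT the original script).
# NOTE_LOG, note and note_dir are hypothetical names.
NOTE_LOG="${NOTE_LOG:-$HOME/.note_log.$(uname -n)}"

# note 'text' appends a timestamped entry; note without arguments prints the log.
note() {
    if [ "$#" -eq 0 ]; then
        if [ -f "$NOTE_LOG" ]; then
            cat "$NOTE_LOG"
        fi
        return 0
    fi
    # Entry format: date, current directory, then the text given on the command line.
    printf '%s %s %s\n' "$(date)" "$PWD" "$*" >> "$NOTE_LOG"
}

# note_dir prints only the entries that were recorded in the current directory.
note_dir() {
    if [ -f "$NOTE_LOG" ]; then
        # Fixed-string match on the directory field; no match is not an error.
        grep -F " $PWD " "$NOTE_LOG" || true
    fi
}
```

Here note plays the role of p and note_dir the role of pd. The directory filter is a plain substring match, so an entry whose text happens to contain the same path surrounded by spaces would also be shown.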

Automatic logging

Put this in your .bash_profile. Do it now.

  export BASHLOG='/home/pkoehn/statmt/bin/bash_history.log'
  export SAVEBASH='if [ "$(id -u)" -ne 0 ]; then echo "`date +%F:%H:%M:%S` `hostname` `pwd` `history 1`" >> ${BASHLOG}; fi'
  export SAVEBASHLOCAL='if [ "$(id -u)" -ne 0 ]; then echo "`date +%F:%H:%M:%S` `hostname` `pwd` `history 1`" >> .bash_history.local; fi'
  export PROMPT_COMMAND="${SAVEBASH};${SAVEBASHLOCAL}"

This saves everything you type in on the command line in a local file called .bash_history.local and a global file (put in the right path for BASHLOG). This is the zero effort way to keep track of what you did. You will never wonder again, "what exactly did I do to get this output?"

Here's a variation that only requires copy-pasting 7 lines into your terminal window. It also labels each local logfile with the directory name (useful if you have accounts on two clusters and want to make sure you don't accidentally overwrite your logfiles). The date/time format is also a little easier to read. Also, it won't try to make logfiles in directories for which you don't have write permissions.

 mkdir -p /home/$USER/log/
 echo "## philipp koehn autologging + clsp mods:" >> ~/.profile
 echo "export BASHLOG='/home/${USER}/log/history_clsp.log'" >> ~/.profile
 echo "export SAVEBASH='if [[ -w \$PWD ]]; then echo \"\`date +\"%Y %b %d %a %H:%M:%S %Z\"\` \`hostname\` \`pwd\` \`history 1\`\" >> \${BASHLOG}; fi'" >> ~/.profile
 echo "export SAVEBASHLOCAL='if [[ -w \$PWD ]]; then echo \"\`date +\"%Y %b %d %a %H:%M:%S %Z\"\` \`hostname\` \`pwd\` \`history 1\`\" >> .history_clsp.local.\${PWD##/*/}; fi'" >> ~/.profile
 echo "export PROMPT_COMMAND=\"\${SAVEBASH};\${SAVEBASHLOCAL}\"" >> ~/.profile
 source ~/.profile
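Both variants rely on the same mechanism: bash executes the contents of PROMPT_COMMAND just before it prints each prompt, and history 1 prints the most recent command. Written as a function, the mechanism is easier to read than the quoted strings. The sketch below is an alternative formulation, not a replacement for the lines above. The name log_last_command is made up, and it assumes BASHLOG points at your global log file.

```shell
# Sketch: the logging hook written as a function (log_last_command is a hypothetical name).
# Assumes BASHLOG points at your global log file, as in the exports above.
log_last_command() {
    # Entry format follows the first variant: timestamp, host, directory, last history line.
    entry="$(date +%F:%H:%M:%S) $(hostname) $PWD $(history 1)"
    # Global log: skipped when BASHLOG is unset or empty.
    if [ -n "${BASHLOG:-}" ]; then
        echo "$entry" >> "$BASHLOG"
    fi
    # Per-directory log: written only where we have write permission.
    if [ -w "$PWD" ]; then
        echo "$entry" >> .bash_history.local
    fi
}

# bash runs PROMPT_COMMAND before printing each prompt.
PROMPT_COMMAND=log_last_command
```

In a non-interactive shell history 1 prints nothing, so the last field is only filled in when the function runs from an interactive prompt.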

Running Moses experiments

Read the tutorial on experiment.perl on the Moses web site.

You will run experiments on the CLSP cluster with the -cluster switch, which schedules jobs on the compute nodes.

To avoid typing in long commands, I use the following aliases in .bash_profile:

 alias ems="~/moses/scripts/ems/experiment.perl -no-graph -cluster -config"
 alias emsc="~/moses/scripts/ems/experiment.perl -no-graph -cluster -continue"
 alias emsdc="~/moses/scripts/ems/experiment.perl -no-graph -cluster -delete-crashed"
 alias emsx="nice nohup ~/moses/scripts/ems/experiment.perl -no-graph -cluster -exec -config"
 alias emscx="nice nohup ~/moses/scripts/ems/experiment.perl -no-graph -cluster -exec -continue"
 alias emsdcx="~/moses/scripts/ems/experiment.perl -no-graph -cluster -exec -delete-crashed"

So this way I can start an experiment with

 > emsx my-config-file >& OUT.42 &

And when it crashes, fix the problem, delete old files with

 > emsdc 42

and continue on

 > emsc 42 >& OUT.42 &


Running jobs in the background

Anything that runs longer than 5 minutes should be run in the background. This can be done with nohup:

 > nohup gzip tera-byte-file &

This way it does not get interrupted when your internet connection acts up. For jobs that are IO intensive, like the one above, make sure you run them on the machine that hosts the data. So if your tera-byte file resides on b04, run this command on b04. Our cluster's main bottleneck is IO, so adjust the nice value of a process like this to avoid blocking IO for other users.
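Putting these pieces together, a background job with lowered priority and its own log file might be started as follows. This is a sketch in which a small throwaway file stands in for the tera-byte file; demo-file.txt and gzip.log are placeholder names. Note that nice only lowers CPU priority. On Linux, ionice can lower IO priority as well, but whether it is installed on the cluster nodes is an assumption, so it appears only in a comment.

```shell
# Sketch: run a compression job in the background at the lowest CPU priority.
# demo-file.txt and gzip.log are placeholder names for this example.
workdir="$(mktemp -d)"
cd "$workdir"
printf 'some data\n' > demo-file.txt

# nohup makes the job ignore the hangup signal sent when you log out;
# nice -n 19 gives it the lowest CPU priority.
# Output goes to gzip.log instead of nohup.out.
nohup nice -n 19 gzip demo-file.txt > gzip.log 2>&1 &
pid=$!
echo "started gzip as process $pid"

# Assumption: where ionice is available, IO priority could be lowered too, e.g.
#   nohup ionice -c 3 nice -n 19 gzip demo-file.txt > gzip.log 2>&1 &
```

Keeping the process id around makes it easy to check on the job later with ps or to stop it with kill.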

Forgot your nohup and you want to go home? disown lets you sleep. Suspend the running job with CTRL-z, then send it to the background and disown it.

 > gzip tera-byte-file
 (press CTRL-z)
 > bg
 > disown

Now it also runs in the background (it does not even show up with jobs) and you can rest easy.

Screen: An Alternative

Similarly to nohup, screen lets you run jobs without worrying about network interruptions or window closures. It also lets you resume command line sessions after network interruptions, and lets you run multiple windows in one terminal. Once you SSH into the grid machine you want to be on, run:

 > screen

This will start a screen session (which will look just like your standard terminal). Most of the basic screen commands take the form of "CTRL-a [characters]". To see the full list from inside screen, do "CTRL-a ?". To start a new window, press "CTRL-a c". To switch between windows, press "CTRL-a n" (changes to the next window) or "CTRL-a [window number]" (to change to a specific window). See the user manual for more details.

To detach the screen (for example, before going home or switching computers), type "CTRL-a d" or run:

 > screen -d

To reattach a screen, run:

 > screen -r

To close the screen permanently (you will no longer be able to reattach to it) or to close a window within a screen session, run:

 > exit

You can also run multiple screen sessions. To do this, it helps to give them a name. When you create the screen session, use

 > screen -S NAME

For example,

 > screen -S wmt16

You can list your active screen sessions with

 > screen -ls

And you can select which one to reattach to with

 > screen -dr NAME

(The -d forces the session to detach from any other terminal it is still attached to.)
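Named sessions also make it possible to start a long job inside screen without attaching to it first. The helper below is a sketch (screen_run is a made-up name). It relies on screen's -d -m flags, which start a new session in detached mode.

```shell
# Sketch: start a command in a new, detached, named screen session.
# screen_run is a hypothetical helper name.
# Usage: screen_run NAME command [args...]
screen_run() {
    name="$1"
    shift
    # -d -m: create the session without attaching to it; -S: give it a name.
    screen -d -m -S "$name" "$@"
}

# Example (commented out so that sourcing this file starts nothing):
#   screen_run wmt16 gzip tera-byte-file
# Attach to it later with:
#   screen -dr wmt16
```

A session started this way ends when its command finishes, so it will disappear from screen -ls once the job is done.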

If you are an emacs user, you will get annoyed that the screen meta-command is C-a (which is beginning-of-line in emacs). To change it to something more reasonable, you can put some commands in your ~/.screenrc file. For example, I (MJP) have this, which binds the meta-command to C-]:

 escape ^]]
 defescape ^]]

Here is MJP's full .screenrc:

 escape ^]]
 defescape ^]]
 backtick 0 30 30 sh -c 'screen -ls | grep --color=no -o "$PPID[^[:space:]]*" | cut -d. -f2'
 hardstatus string '[%`]'
 caption always '%{= dg} %{G}| %{B}{G}|%=%?%{d}%-w%?%{r}(%{d}t%? {%u}%?%{r})%{d}%?%+w%?%=%{G}| %{B}d s '

Some of this sets the title bar in your xterm so that you can tell what's running.

Page last modified on August 27, 2020, at 05:49 PM