Q: How do you run a bajillion simulations in parallel on an AWS instance?

A:

echo "running experiment..."
ssh -o StrictHostKeyChecking=no -i notactuallymykeyname.pem ubuntu@${IP} "for user in ${UIDS}; do PATH=/usr/local/cuda-7.0/bin/:\$PATH ~/anaconda2/bin/python ~/privknow/facebook_scnn.py 1 1 \${user} ${1} ${3} & for ((i=2;i<=3;i++)); do PATH=/usr/local/cuda-7.0/bin/:\$PATH ~/anaconda2/bin/python ~/privknow/facebook_scnn.py \$((i)) \$((i-1)) \${user} ${1} ${3} & done; done; wait" | tee ${2}
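For the record, here is the same fan-out written long-hand. This is a sketch, not what actually ran. It assumes the whole job list can be built up front and handed to `xargs -P`, which caps how many simulations are alive at once. `SIM`, `MAX_JOBS` (default 4, an arbitrary pick), `job_lines` and `run_jobs` are made-up names for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: the same job fan-out as the one-liner, with a concurrency cap.
# SIM and MAX_JOBS are hypothetical knobs, not part of the original script.

# Print one argument line per job, in the order the one-liner launches them:
# "1 1 <user>", then "<i> <i-1> <user>" for i = 2..3. Any extra arguments
# follow; the one-liner passes its own ${1} and ${3} through.
job_lines() {
  local extra="$1"; shift
  local user i
  for user in "$@"; do
    echo "1 1 ${user}${extra:+ ${extra}}"
    i=2
    while [ "$i" -le 3 ]; do
      echo "${i} $((i - 1)) ${user}${extra:+ ${extra}}"
      i=$((i + 1))
    done
  done
}

# Hand each line to xargs. -L 1 turns one line into one ${SIM} invocation,
# and -P keeps at most MAX_JOBS of them running at the same time.
# No line ends in a blank. That matters because xargs -L treats a trailing
# blank as "continue on the next line".
run_jobs() {
  local extra="$1"; shift
  job_lines "${extra}" "$@" | xargs -P "${MAX_JOBS:-4}" -L 1 ${SIM:-echo}
}
```

On the instance, with the same CUDA `PATH` tweak as the one-liner, something like `SIM="$HOME/anaconda2/bin/python $HOME/privknow/facebook_scnn.py" run_jobs "$1 $3" $UIDS` should launch the same (1 1), (2 1), (3 2) runs per user, just a few at a time. A cap is a blunt instrument. Still, fewer concurrent processes means fewer copies of everything in RAM at once, and that is worth trying before the next `MemoryError`.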

The rage flows strong and true, although apparently it's just 'suggestion rage':

INFO (theano.gof.compilelock): Waiting for existing lock by process '1487' (I am process '1482')
INFO (theano.gof.compilelock): Waiting for existing lock by process '1487' (I am process '1467')
INFO (theano.gof.compilelock): Waiting for existing lock by process '1487' (I am process '1488')
INFO (theano.gof.compilelock): To manually release the lock, delete /home/ubuntu/.theano/compiledir_Linux-3.13--generic-x86_64-with-debian-jessie-sid-x86_64-2.7.11-64/lock_dir
INFO (theano.gof.compilelock): To manually release the lock, delete /home/ubuntu/.theano/compiledir_Linux-3.13--generic-x86_64-with-debian-jessie-sid-x86_64-2.7.11-64/lock_dir
INFO (theano.gof.compilelock): To manually release the lock, delete /home/ubuntu/.theano/compiledir_Linux-3.13--generic-x86_64-with-debian-jessie-sid-x86_64-2.7.11-64/lock_dir
INFO (theano.gof.compilelock): Waiting for existing lock by process '1487' (I am process '1463')
INFO (theano.gof.compilelock): Waiting for existing lock by process '1487' (I am process '1474')
INFO (theano.gof.compilelock): Waiting for existing lock by process '1487' (I am process '1472')
INFO (theano.gof.compilelock): To manually release the lock, delete /home/ubuntu/.theano/compiledir_Linux-3.13--generic-x86_64-with-debian-jessie-sid-x86_64-2.7.11-64/lock_dir
INFO (theano.gof.compilelock): To manually release the lock, delete /home/ubuntu/.theano/compiledir_Linux-3.13--generic-x86_64-with-debian-jessie-sid-x86_64-2.7.11-64/lock_dir
INFO (theano.gof.compilelock): To manually release the lock, delete /home/ubuntu/.theano/compiledir_Linux-3.13--generic-x86_64-with-debian-jessie-sid-x86_64-2.7.11-64/lock_dir
INFO (theano.gof.compilelock): Waiting for existing lock by process '1487' (I am process '1470')
INFO (theano.gof.compilelock): Waiting for existing lock by process '1487' (I am process '1469')
INFO (theano.gof.compilelock): To manually release the lock, delete /home/ubuntu/.theano/compiledir_Linux-3.13--generic-x86_64-with-debian-jessie-sid-x86_64-2.7.11-64/lock_dir
INFO (theano.gof.compilelock): To manually release the lock, delete /home/ubuntu/.theano/compiledir_Linux-3.13--generic-x86_64-with-debian-jessie-sid-x86_64-2.7.11-64/lock_dir
INFO (theano.gof.compilelock): Waiting for existing lock by process '1487' (I am process '1464')
INFO (theano.gof.compilelock): Waiting for existing lock by process '1487' (I am process '1459')
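Every one of those processes shares a single compile directory, so they all share one `lock_dir`. One workaround worth trying is to point each concurrent job at its own cache through Theano's `base_compiledir` option in `THEANO_FLAGS`. This is a sketch; the run above did not use it. `CACHE_ROOT`, `run_isolated` and the job IDs are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch: run a command with a private Theano compile cache, so parallel jobs
# should stop queueing on one shared lock_dir. base_compiledir is a real
# Theano config option; CACHE_ROOT and run_isolated are hypothetical names.
CACHE_ROOT="${CACHE_ROOT:-${HOME}/.theano_jobs}"

run_isolated() {
  local job_id dir
  job_id=$1; shift
  dir="${CACHE_ROOT}/job_${job_id}"
  mkdir -p "${dir}"
  # Keep any flags already set (e.g. device=gpu) and append the cache dir.
  THEANO_FLAGS="${THEANO_FLAGS:+${THEANO_FLAGS},}base_compiledir=${dir}" "$@"
}
```

Usage would be something like `run_isolated "$i" python facebook_scnn.py ... &` inside the loop, followed by `wait`. The trade-off is that every private cache starts cold. Each job pays the full compile cost on its first run, and the disk fills up with near-identical copies. Jobs that run at the same time need distinct IDs. Reusing an ID later lets a job pick up that warm cache.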

Sometimes, it barfs up its own source, because reasons:

Epoch 6 training error: 0.000000
Epoch 6 validation error: 0.000000
Epoch 7 training error: 0.000000
Epoch 7 validation error: 0.000000
Epoch 8 training error: 0.000000
Epoch 8 validation error: 0.000000
Validation loss did not decrease. Stopping early.
fitting the adversary's model...
    return np.asarray(Apow, dtype='float32')
  File "/home/ubuntu/anaconda2/lib/python2.7/site-packages/numpy/core/numeric.py", line 474, in asarray
    return array(a, dtype, copy=False, order=order)
MemoryError
INFO (theano.gof.compilelock): Waiting for existing lock by process '1473' (I am process '1462')
INFO (theano.gof.compilelock): To manually release the lock, delete /home/ubuntu/.theano/compiledir_Linux-3.13--generic-x86_64-with-debian-jessie-sid-x86_64-2.7.11-64/lock_dir
1 #define _CUDA_NDARRAY_C
2
3 #include <Python.h>
4 #include <structmember.h>
5 #include "theano_mod_helper.h"
6
7 #include <numpy/arrayobject.h>
8 #include <iostream>
9
10 #include "cuda_ndarray.cuh"
11
12 #ifndef CNMEM_DLLEXPORT
13 #define CNMEM_DLLEXPORT
14 #endif
15
16 #include "cnmem.h"
17 #include "cnmem.cpp"
18
19 //If true, when there is a gpu malloc or free error, we print the size of allocated memory on the device.
20 #define COMPUTE_GPU_MEM_USED 0
21
22 //If true, we fill with NAN allocated device memory.
23 #define ALLOC_MEMSET 0

Ye spirits of CUDA, lament your unsurfaced mutex, gaze upon my works, and despair.