TensorFlow setup
Posted: 2016-07-22 , Modified: 2016-07-22
Tags: programming
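Load the modules TensorFlow depends on at startup, e.g. by putting this line in .bashrc:
module load python cudatoolkit/7.5 cudann
Then install the GPU build of TensorFlow 0.8.0 (Python 2) into your user site-packages. This only needs to be done once:
pip install --user /tigress/plazonic/public_html/tensorflow/rhel6/tensorflow_pkg_gpu/tensorflow-0.8.0-py2-none-any.whl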
Sample script
#!/bin/bash
#SBATCH -t 10:00:00
#SBATCH -N 1
#SBATCH --ntasks-per-node=4
#SBATCH --ntasks-per-socket=2
#SBATCH --gres=gpu:2
#SBATCH --mail-type=begin
#SBATCH --mail-type=end
#SBATCH --mail-user=holdenl@princeton.edu
module load python
module load cudatoolkit/7.5
module load cudann
THEANO_FLAGS=mode=FAST_RUN,device=gpu,floatX=float32 python cifar10_multi_gpu_train.py --num_gpus=2 --train_dir='/tigress/holdenl/tmp/cifar10_train1'
Save the script as script.cmd and submit it with sbatch script.cmd.
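If you want to submit from Python rather than by hand, the following is a minimal sketch. It assumes sbatch is on the PATH and prints its usual "Submitted batch job N" line. The helper names parse_job_id and submit are made up for illustration and are not part of Slurm.

```python
import re
import subprocess


def parse_job_id(sbatch_output):
    """Extract the numeric job ID from sbatch's 'Submitted batch job N' line."""
    match = re.search(r"Submitted batch job (\d+)", sbatch_output)
    if match is None:
        raise ValueError("unexpected sbatch output: %r" % sbatch_output)
    return int(match.group(1))


def submit(script="script.cmd"):
    """Submit a batch script with sbatch and return its job ID (hypothetical helper)."""
    output = subprocess.check_output(["sbatch", script])
    return parse_job_id(output.decode("utf-8"))
```

Calling submit("script.cmd") on a login node should return the job ID, which can then be passed to squeue -j or scancel.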
Training scripts: cifar10_multi_gpu_train.py (multiple GPUs) or cifar10_train.py (single device).
cifar10.py builds the graph.
cifar10_input.py downloads or loads the data.
data_dir is defined in cifar10.py. (Changed to /tigress/knv/cifar10_data.)
Settings: Override flags as above.
num_gpus=2 seems to work best.
train_dir='/tigress/holdenl/tmp/cifar10_train1'.
train(fs, step_f, output_steps=10, summary_steps=100, save_steps=1000, eval_steps=1000, max_steps=1000000, train_dir="/", log_device_placement=False, batch_size=128, train_data=None, validation_data=None, test_data=None, train_feed={}, eval_feed={}, x_pl="x", y_pl="y_", batch_feeder_args=[])
fs is a dictionary containing the inference and loss functions.
step_f is the function to execute at each training step, taking arguments fs and global_step. Example:
lambda fs, global_step: (
    train_step(fs["loss"], fs["losses"], global_step,
               lambda gs: tf.train.AdamOptimizer(1e-4)))
Constants
NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN = 50000
NUM_EXAMPLES_PER_EPOCH_FOR_EVAL = 10000
NUM_CLASSES = 10
MOVING_AVERAGE_DECAY = 0.9999
NUM_EPOCHS_PER_DECAY = 350.0
LEARNING_RATE_DECAY_FACTOR = 0.1
INITIAL_LEARNING_RATE = 0.1
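To see how the learning-rate constants combine, here is a minimal sketch in plain Python of a staircase exponential decay. It assumes the schedule of the stock CIFAR-10 tutorial code, where cifar10.py passes these values to tf.train.exponential_decay with staircase=True. The function names decay_steps and learning_rate are made up for illustration, and batch_size=128 is simply the default from the train signature.

```python
# Constants copied from the list of constants in cifar10.py.
NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN = 50000
NUM_EPOCHS_PER_DECAY = 350.0
LEARNING_RATE_DECAY_FACTOR = 0.1
INITIAL_LEARNING_RATE = 0.1


def decay_steps(batch_size=128):
    """Number of training steps between learning-rate drops (hypothetical helper)."""
    num_batches_per_epoch = NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN / float(batch_size)
    return int(num_batches_per_epoch * NUM_EPOCHS_PER_DECAY)


def learning_rate(global_step, batch_size=128):
    """Staircase decay: multiply by the decay factor once every decay_steps steps."""
    num_decays = global_step // decay_steps(batch_size)
    return INITIAL_LEARNING_RATE * LEARNING_RATE_DECAY_FACTOR ** num_decays
```

With batch_size=128, decay_steps() is 136718, so under this assumption the rate stays at 0.1 for the first 136718 steps and then drops by a factor of 10 at each further multiple of that step count.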