LSTM Programming

Posted: 2016-03-19 , Modified: 2016-03-19

Tags: programming

Math

Here are the equations for an LSTM.

\[\begin{align}
f_t &= \sigma\left(W_f \begin{bmatrix} h_{t-1} \\ x_t \end{bmatrix} + b_f\right)\\
i_t &= \sigma\left(W_i \begin{bmatrix} h_{t-1} \\ x_t \end{bmatrix} + b_i\right)\\
\tilde{C}_t &= \tanh\left(W_C \begin{bmatrix} h_{t-1} \\ x_t \end{bmatrix} + b_C\right)\\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t\\
o_t &= \sigma\left(W_o \begin{bmatrix} h_{t-1} \\ x_t \end{bmatrix} + b_o\right)\\
h_t &= o_t \odot \tanh(C_t)\\
\hat{y} &= \text{softmax}(W h_t + b).
\end{align}\]

References: Christopher Olah, Understanding LSTM Networks, https://colah.github.io/posts/2015-08-Understanding-LSTMs/ (the notation above follows this post).

LSTM layer

We define a function for a single LSTM step, mapping (x_t, C_{t-1}, h_{t-1}) to (C_t, h_t), and a function that runs this step across a sequence.

(These functions will involve the parameters as well, which we omit here.)

Implementation

Define step_lstm1 by

import theano
import theano.tensor as T

def step_lstm1(x, C, h, Wf, bf, Wi, bi, WC, bC, Wo, bo):
    hx = T.concatenate([h, x])               # dimension m+n
    f = T.nnet.sigmoid(T.dot(hx, Wf) + bf)   # forget gate, dimension m
    i = T.nnet.sigmoid(T.dot(hx, Wi) + bi)   # input gate, dimension m
    C_add = T.tanh(T.dot(hx, WC) + bC)       # candidate cell state, dimension m
    C1 = f * C + i * C_add                   # new cell state, dimension m
    o = T.nnet.sigmoid(T.dot(hx, Wo) + bo)   # output gate, dimension m
    h1 = o * T.tanh(C1)                      # new hidden state, dimension m
    return [C1, h1]                          # two vectors of dimension m

Now define step_lstm as the version with parameters grouped together.

def step_lstm(x, C, h, tparams): 
    Wf, bf, Wi, bi, WC, bC, Wo, bo = unpack_params(tparams, ["Wf", "bf", "Wi", "bi", "WC", "bC", "Wo", "bo"])
    return step_lstm1(x, C, h, Wf, bf, Wi, bi, WC, bC, Wo, bo)
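
For example, a single step on symbolic vectors (a sketch; tparams is assumed to be a dictionary of shared variables keyed by the names above):

x = T.vector('x')   # input, dimension n
C = T.vector('C')   # cell state, dimension m
h = T.vector('h')   # hidden state, dimension m
C1, h1 = step_lstm(x, C, h, tparams)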

To define sequence_lstm we use Theano's scan function. Its arguments are fn (the step function), sequences (the lists to iterate over), outputs_info (the initial values of the accumulators), and non_sequences (fixed arguments).

Thus, to create a scanned function of the form

scan' :: ((a, b, c) -> b) -> [a] -> b -> c -> [b]
scan' f xs init fixed = tail (scanl (\b x -> f (x, b, fixed)) init xs)

we call

theano.scan(fn=f, sequences=xs, outputs_info=init, non_sequences=fixed)

Note here a, b, c can encompass multiple arguments, in which case you pass a list to sequences, outputs_info, and non_sequences. However, a, b, c must appear in that order.

def sequence_lstm(C0, h0, xs, tparams):
    Wf, bf, Wi, bi, WC, bC, Wo, bo = unpack_params(tparams, ["Wf", "bf", "Wi", "bi", "WC", "bC", "Wo", "bo"])
    #the function fn should have arguments in the following order:
    #sequences, outputs_info (accumulators), non_sequences
    #(x, C, h, Wf, bf, Wi, bi, WC, bC, Wo, bo)
    ([C_vals, h_vals], updates) = theano.scan(fn=step_lstm1,
                                          sequences = xs, 
                                          outputs_info=[C0, h0], #initial values of the memory/accumulator
                                          non_sequences=[Wf, bf, Wi, bi, WC, bC, Wo, bo], #fixed parameters
                                          strict=True)
    return [C_vals, h_vals]

Note that scan iterates over the leading axis of each sequence, so this maps automatically over any extra dimensions; to define sequence_multiple_lstm for a batch of sequences, all we have to do is swap two axes so that time comes first, as sketched below.
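
A minimal sketch, assuming xss holds a batch of sequences with shape (batch, time, n), and assuming the step function concatenates along the feature axis (for matrices, axis=1 in T.concatenate):

def sequence_multiple_lstm(C0s, h0s, xss, tparams):
    # scan iterates over the leading axis, so move time to the front:
    # (batch, time, n) -> (time, batch, n)
    xs = xss.dimshuffle(1, 0, 2)
    [C_vals, h_vals] = sequence_lstm(C0s, h0s, xs, tparams)
    # swap back so the results are indexed (batch, time, m) again
    return [C_vals.dimshuffle(1, 0, 2), h_vals.dimshuffle(1, 0, 2)]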

(Note on Theano list in scan.)

Neural net functions

A vanilla neural net layer is

def nn_layer1(x, W, b):
    return T.dot(x, W) + b

def nn_layer(x, tparams):
    W, b = unpack_params(tparams, ["W", "b"])
    return nn_layer1(x, W, b)

Now we can combine these with our LSTM to make the evaluation, prediction, and loss functions. Evaluation gives the probabilities of each output, prediction gives the output with maximum probability, and loss is the log loss between the predicted distribution and the actual outcome. We also include an accuracy function that outputs 1 if the prediction is correct and 0 otherwise.

Note that fns_lstm returns a list of Theano variables (depending on the input lists/parameters) representing the activations, predictions, losses, and accuracies. We haven't compiled these variables into functions yet.

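Here is a minimal sketch of what fns_lstm might look like for a single sequence xs with label y (the original code is missing here, so the names and details are assumptions):

def fns_lstm(C0, h0, xs, y, tparams):
    [C_vals, h_vals] = sequence_lstm(C0, h0, xs, tparams)
    out = nn_layer(h_vals[-1], tparams)              # final linear layer, W h_t + b
    probs = T.nnet.softmax(out.reshape((1, -1)))[0]  # evaluation: probabilities of each output
    pred = T.argmax(probs)                           # prediction: output with max probability
    loss = -T.log(probs[y])                          # log loss on the actual outcome y
    acc = T.eq(pred, y)                              # accuracy: 1 if correct, 0 otherwise
    return [probs, pred, loss, acc]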

Some other functions:

(A further speedup is to concatenate the four weight matrices into one, as sketched below.)
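
A sketch of this speedup (step_lstm_fused is a hypothetical name): stack Wf, Wi, WC, Wo into a single matrix W of shape (m+n, 4m) and the biases into one vector b of length 4m, so each step does a single matrix product.

def step_lstm_fused(x, C, h, W, b):
    hx = T.concatenate([h, x])
    z = T.dot(hx, W) + b              # all four pre-activations at once
    m = C.shape[0]
    f = T.nnet.sigmoid(z[:m])         # forget gate
    i = T.nnet.sigmoid(z[m:2*m])      # input gate
    C_add = T.tanh(z[2*m:3*m])        # candidate cell state
    o = T.nnet.sigmoid(z[3*m:])       # output gate
    C1 = f * C + i * C_add
    h1 = o * T.tanh(C1)
    return [C1, h1]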

Data processing functions

We’ll keep parameters in a dictionary, and unpack them as needed.

def unpack_params(tparams, li):
    return [tparams[name] for name in li]
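
For example, the parameter dictionary might be built like this (a sketch; the initialization scheme and the name init_tparams are assumptions):

import numpy as np

def init_tparams(m, n, k):
    # m = hidden/cell dimension, n = input dimension, k = number of outputs
    def shared(shape, name):
        value = 0.01 * np.random.randn(*shape).astype(theano.config.floatX)
        return theano.shared(value, name=name)
    tparams = {}
    for gate in ["f", "i", "C", "o"]:
        tparams["W" + gate] = shared((m + n, m), "W" + gate)  # gate weights
        tparams["b" + gate] = shared((m,), "b" + gate)        # gate biases
    tparams["W"] = shared((m, k), "W")  # output layer weights
    tparams["b"] = shared((k,), "b")    # output layer bias
    return tparams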

Optimization functions

These are taken from…

The arguments of each are the learning rate, the parameter dictionary tparams, the gradients, the model inputs, and the cost.

Each returns two compiled functions: one that computes the cost and stores the gradients in shared variables, and one that applies the update to the parameters.
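
For instance, plain SGD in this style might look like the following sketch (the source is elided above, so the structure here is an assumption based on the two-function pattern just described):

def sgd(lr, tparams, grads, xs, y, cost):
    # lr: symbolic scalar; grads: gradients of cost w.r.t. tparams.values(),
    # assumed to be in the same order as tparams.items()
    gshared = [theano.shared(p.get_value() * 0., name=name + "_grad")
               for name, p in tparams.items()]
    # f_cost_grads computes the cost and stores the gradients in gshared
    f_cost_grads = theano.function([xs, y], cost, updates=list(zip(gshared, grads)))
    # f_update applies one SGD step using the stored gradients
    updates = [(p, p - lr * g) for p, g in zip(tparams.values(), gshared)]
    f_update = theano.function([lr], [], updates=updates)
    return f_cost_grads, f_update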

What does the train function need?

Pseudocode for train:
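
A sketch, assuming the compiled f_cost_grads and f_update functions from the optimizer section and a learning rate lrate:

def train(data, tparams, f_cost_grads, f_update, lrate, n_epochs):
    for epoch in range(n_epochs):
        total_cost = 0.
        for (xs, y) in data:
            total_cost += f_cost_grads(xs, y)  # forward/backward; stores gradients
            f_update(lrate)                    # apply the optimizer's update
        # optionally: shuffle the data each epoch, track validation
        # loss/accuracy, and save the best parameters seen so far
    return tparams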