Student Version Basis of AI Backprop Hypertext Documentation

Copyright (c) 1990-97 by Donald R. Tveter

Recurrent Network Examples

Recurrent back-propagation networks take values from hidden layer and/or output layer units and copy them down to the input layer for use with the next input. These values that are copied down are a kind of coded record of what the recent inputs to the network have been and this gives a network a simple kind of short-term memory, possibly a little like human short-term memory. For instance, suppose you want a network to memorize the two short sequences, "acb" and "bcd". In the middle of both of these sequences is the letter, `c'. In the first case you want a network to take in `a' and output `c', then take in `c' and output `b'. In the second case you want a network to take in `b' and output `c', then take in `c' and output `d'. To do this a network needs a simple memory of what came before the `c'.

Let the network be an 7-3-4 network where input units 1-4 and output units 1-4 stand for the letters a-d and the `h' stands for the value of a hidden layer unit. So the codes are:

a: 1000
b: 0100
c: 0010
d: 0001
In action, the networks need to do the following. When `a' is input, `c' must be output:

   0010     <- output layer

   hhh      <- hidden layer

1000 stm    <- input layer
In this context, when `c' is input, `b' should be output:

   0100

   hhh

0010 stm
For the other string, when `b' is input, `c' is output:

   0010

   hhh

0100 stm
and when `c' in input, `d' is output:

   0001

   hhh

0010 stm
This is easy to do if the network keeps a short-term memory of what its most recent inputs have been. Suppose we input a and the output is c:

   0010     <- output layer

   hhh      <- hidden layer

1000 stm    <- input layer
Placing `a' on the input layer generates some kind of code (like a hash code) on the 3 units in the hidden layer. On the other hand, placing `b' on the input units will generate a different code on the hidden units. All we need to do is save these hidden unit codes and input them with a `c'. In one case the network will output `b' and in the other case it will output `d'. In one particular run inputting `a' produced:

     0  0  1  0

  0.993 0.973 0.020

 1  0  0  0  0  0  0
When `c' is input the hidden layer units are copied down to input to give:

        0  1  0  0

    0.006 0.999 0.461

0  0  1  0  0.993 0.973 0.020
For the other pattern, inputting `b' gave:

    0  0  1  0

  0.986 0.870 0.020

0  1  0  0  0  0  0
Then the input of `c' gave:

          0  0  0  1

      0.005 0.999 0.264

0  0  1  0  0.986 0.870 0.020
This particular problem can be set up as follows:

m 7 3 4
s 7
ci
t 0.2
f ic   * compressed input format
rt rexample.dat
where the data file rexample.dat is:

1000 H   0010
0010 H   0100

0100 H   0010
0010 H   0001
The first four values on each line are the normal input. The H codes for however many hidden layer units there are. The last four values are the desired outputs.

By the way, this simple problem does not converge particularly fast and you may need to do a number of runs before you hit on initial values that will work quickly. It will work more reliably with more hidden units.

Rather than using recurrent networks to memorize sequences of letters they are probably more useful at predicting the value of some variable at time t+1 given its value at t, t-1, t-2, ... . A very simple of this is to give the value of sin(t+1) given a recent history of inputs to the net. Given a value of sin(t) the curve may be going up or down and the net needs to keep track of this in order to correctly predict the next value. The following setup will do this:

m 1+5 5 1
f ir
a aol dd uq
qp e 0.02
ci
rt rsin.dat
and the data file rsin.bp is:

   0.00000  H   0.15636
   0.15636  H   0.30887
   0.30887  H   0.45378

   . . .

  -0.15950  H  -0.00319
  -0.00319  H   0.15321
and in fact it converges rather rapidly. The complete set of data can be found in the example file rsin.bp.

Another recurrent network included in the examples is one designed to memorize two lines of poetry. The two lines were:

I the heir of all the ages in the foremost files of time
For I doubt not through all the ages ones increasing purpose runs

but for the sake of making the problem simpler each word was shortened to 5 characters giving:

i the heir of all the ages in the frmst files of
time for i doubt not thru the ages one incre purpo runs

The letters were coded by taking the last 5 bits of their ASCII codes. See the file poetry.bp.