The LSTM is one of the most widely used recurrent neural network (RNN) architectures. This post provides a template for implementing an LSTM in TensorFlow, for reference.

LSTM

A recurrent neural network (RNN) is a network that contains loops: its current output depends on both the current input and the hidden state from the previous (or, in bidirectional variants, the following) time step. Because it carries state forward, information can persist across steps. However, a plain RNN suffers from exploding/vanishing gradients and can therefore only remember short-term information. The LSTM addresses this by adding a cell state that preserves long-term information.
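Concretely, in the standard LSTM formulation the cell state c_t is maintained by three gates (forget, input, output), each a sigmoid of the current input and previous hidden state:

```latex
\begin{aligned}
f_t &= \sigma(W_f [h_{t-1}, x_t] + b_f) \\
i_t &= \sigma(W_i [h_{t-1}, x_t] + b_i) \\
\tilde{c}_t &= \tanh(W_c [h_{t-1}, x_t] + b_c) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
o_t &= \sigma(W_o [h_{t-1}, x_t] + b_o) \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
```

The forget gate f_t decides how much of the old cell state to keep, which is what lets gradients flow across many time steps.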


For more on LSTMs, see: Deep Learning for Beginners (6) - Long Short-Term Memory Networks (LSTM)

Implementing an LSTM in TensorFlow

  • Create a cell: tf.nn.rnn_cell.BasicLSTMCell
  • Stack layers: tf.nn.rnn_cell.MultiRNNCell, passing one cell instance per layer (note: `[lstm_cell] * N_LSTM_LAYER` reuses a single cell object, which shares weights across layers and raises an error in newer TF 1.x releases)
  • Build the unrolled RNN: tf.contrib.rnn.static_rnn(lstm_cells, input_net, dtype=tf.float32)
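What BasicLSTMCell computes per time step can be sketched in plain NumPy; this is an illustrative re-implementation (not TensorFlow's actual code), using the i, j, f, o gate ordering and the forget_bias default that BasicLSTMCell uses:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, b, forget_bias=1.0):
    """One LSTM step.

    W has shape (n_input + n_hidden, 4 * n_hidden); the four column blocks
    are input gate i, candidate j, forget gate f, output gate o.
    forget_bias=1.0 mirrors BasicLSTMCell's default.
    """
    z = np.concatenate([x, h_prev], axis=1) @ W + b
    i, j, f, o = np.split(z, 4, axis=1)
    c = sigmoid(f + forget_bias) * c_prev + sigmoid(i) * np.tanh(j)
    h = sigmoid(o) * np.tanh(c)
    return h, c

# Tiny demo with random weights
rng = np.random.RandomState(0)
n_input, n_hidden, batch = 8, 32, 4
W = rng.randn(n_input + n_hidden, 4 * n_hidden) * 0.1
b = np.zeros(4 * n_hidden)
x = rng.randn(batch, n_input)
h = np.zeros((batch, n_hidden))
c = np.zeros((batch, n_hidden))
h, c = lstm_step(x, h, c, W, b)
print(h.shape, c.shape)
```

static_rnn essentially applies this step once per element of the input list, feeding each step's (h, c) into the next.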

The basic pattern looks like this:

import tensorflow as tf

print("Program start...")
train_data, train_label, test_data, test_label = read_dataset()

N_TRAIN = len(train_label)
N_TEST = len(test_label)

N_CLASS = len(train_label[0])
N_STEP = len(train_data[0])
N_INPUT = len(train_data[0][0])
N_HIDDEN = 32
N_LSTM_LAYER = 2
LEARNING_RATE = 0.0025
EPOCHS = 100
BATCH_SIZE = 50

print("Setting up network...")
W1 = tf.Variable(tf.random_normal([N_INPUT, N_HIDDEN]))
B1 = tf.Variable(tf.random_normal([N_HIDDEN]))
W2 = tf.Variable(tf.random_normal([N_HIDDEN, N_CLASS]))
B2 = tf.Variable(tf.random_normal([N_CLASS]))

input_data = tf.placeholder(tf.float32, [None, N_STEP, N_INPUT]) # BATCH_SIZE * N_STEP * N_INPUT
input_label = tf.placeholder(tf.float32, [None, N_CLASS])

input_net = tf.transpose(input_data, [1, 0, 2]) # N_STEP * BATCH_SIZE * N_INPUT
input_net = tf.reshape(input_net, [-1, N_INPUT]) # (N_STEP * BATCH_SIZE) * N_INPUT
input_net = tf.add(tf.matmul(input_net, W1), B1) # (N_STEP * BATCH_SIZE) * N_HIDDEN
input_net = tf.nn.relu(input_net)
input_net = tf.split(input_net, N_STEP, 0) # list of N_STEP tensors, each BATCH_SIZE * N_HIDDEN

# One fresh cell per layer; [cell] * N_LSTM_LAYER would reuse the same object
# (shared weights), which newer TF 1.x releases reject with a ValueError.
lstm_cells = tf.nn.rnn_cell.MultiRNNCell(
    [tf.nn.rnn_cell.BasicLSTMCell(N_HIDDEN, forget_bias=1.0, state_is_tuple=True)
     for _ in range(N_LSTM_LAYER)],
    state_is_tuple=True)
lstm_outputs, _ = tf.contrib.rnn.static_rnn(lstm_cells, input_net, dtype=tf.float32)
lstm_last_output = lstm_outputs[-1]
lstm_net = tf.add(tf.matmul(lstm_last_output, W2), B2)

cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(labels = input_label, logits = lstm_net))
optimizer = tf.train.AdamOptimizer(learning_rate = LEARNING_RATE).minimize(cost)
correct = tf.equal(tf.argmax(lstm_net, 1), tf.argmax(input_label, 1))
accuracy = tf.reduce_mean(tf.cast(correct, dtype=tf.float32))

sess = tf.InteractiveSession(config=tf.ConfigProto(log_device_placement = False))
init = tf.global_variables_initializer()
sess.run(init)

print(N_TRAIN, N_TEST, N_STEP, N_INPUT, N_CLASS)  # sanity-check dataset shapes

print("Start training...")
for i in range(EPOCHS):
    for start in range(0, N_TRAIN, BATCH_SIZE):
        end = start + BATCH_SIZE
        sess.run(optimizer,
            feed_dict = {input_data: train_data[start:end], input_label: train_label[start:end]})
    out, acc, loss = sess.run([lstm_net, accuracy, cost],
        feed_dict = {input_data: test_data, input_label: test_label})
    # print(out)
    print('Epoch: %d, acc: %f, cost: %f' % (i, acc, loss))

Here read_dataset loads the dataset into the four variables train_data, train_label, test_data, and test_label. If you use this template, this is where most of your work will go.
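As a stand-in while you wire up real file parsing, a hypothetical read_dataset can synthesize random data in the shapes the template expects. The shapes below (128 steps, 9 features, 6 one-hot classes) are purely illustrative assumptions, not values from this post:

```python
import numpy as np

N_STEP, N_INPUT, N_CLASS = 128, 9, 6  # hypothetical shapes; adjust to your data

def read_dataset():
    """Placeholder: returns random data shaped like the template expects.

    train_data/test_data: (n, N_STEP, N_INPUT) float32
    train_label/test_label: (n, N_CLASS) one-hot float32
    Replace this with your real loading and preprocessing code.
    """
    rng = np.random.RandomState(42)

    def make(n):
        data = rng.randn(n, N_STEP, N_INPUT).astype(np.float32)
        labels = np.eye(N_CLASS, dtype=np.float32)[rng.randint(0, N_CLASS, n)]
        return data, labels

    train_data, train_label = make(200)
    test_data, test_label = make(50)
    return train_data, train_label, test_data, test_label
```

Running the template against such synthetic data is a quick way to confirm the graph builds and the shapes line up before investing in the real loader.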

The rest builds the LSTM network in TensorFlow and trains it with the Adam optimizer, printing accuracy and loss after each epoch; all fairly routine.

Tags: machine learning, LSTM
