Q: Many-to-many LSTM for video classification with TensorFlow

Stack Overflow user

Asked 2018-02-02 03:14:16

1 answer, 3K views, 0 followers, 5 votes

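I have to build a binary classifier that predicts whether an input video contains an action. The input to the model has the shape [batch, frames, height, width, channel], where batch is the number of videos, frames is the number of images in a video (fixed for every video), height is the number of rows in an image, width is the number of columns, and channel is the RGB color channels. I found in Andrej Karpathy's blog that a many-to-many recurrent neural network is the best fit for this application: http://karpathy.github.io/2015/05/21/rnn-effectiveness/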
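As a quick sanity check of the shapes involved, here is a NumPy sketch using the same sizes as the implementation below (batch=5, frames=20, 60x80 RGB frames, random pixel values as placeholders). Flattening each frame while keeping the batch and time axes gives one 14400-dimensional feature vector per video per time step, which is what the LSTM consumes at each step:

```python
import numpy as np

# Same sizes as in the question's code below
batch, frames, height, width, channel = 5, 20, 60, 80, 3

# One random "video batch": [batch, frames, height, width, channel]
videos = np.random.normal(size=[batch, frames, height, width, channel])

# Flatten every frame into a feature vector, keeping the batch and time axes
per_frame = videos.reshape(batch, frames, height * width * channel)
print(per_frame.shape)  # (5, 20, 14400)
```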

So I need to implement this in TensorFlow:

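I learned how to implement an LSTM from this tutorial: https://github.com/nlintz/TensorFlow-Tutorials/blob/master/07_lstm.py#L52. However, it implements a many-to-one LSTM and uses only the last tensor, outputs[-1], to predict the output and minimize the loss. I want to use several of the output tensors (say 4) to predict the output and minimize the loss over all of them.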
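The difference between the two setups can be sketched with plain NumPy stand-ins for the 20 per-step LSTM outputs (each of shape (batch, lstm_size), like the list returned by static_rnn). Many-to-one keeps only outputs[-1]; taking every fifth step keeps outputs[4], outputs[9], outputs[14] and outputs[19]. The random arrays are placeholders, not real LSTM activations:

```python
import numpy as np

batch, time_step_size, lstm_size = 5, 20, 240

# Placeholder for the list returned by rnn.static_rnn: one (batch, lstm_size) array per step
outputs = [np.random.normal(size=(batch, lstm_size)) for _ in range(time_step_size)]

# Many-to-one: only the last step is used
last = outputs[-1]

# Many-to-many (every 5th step): steps 4, 9, 14, 19, stacked along a new leading axis
steps = list(range(4, time_step_size, 5))
selected = np.stack([outputs[t] for t in steps])  # shape (4, batch, lstm_size)

print(steps)           # [4, 9, 14, 19]
print(selected.shape)  # (4, 5, 240)
```

Stacking the selected steps into one array (rather than keeping a Python list) is what later allows a single dense layer and a single loss computation to cover all steps at once.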

Here is my implementation:

import tensorflow as tf
from tensorflow.contrib import rnn
import numpy as np

# Training Parameters
batch = 5 # number of examples
frames = time_step_size = 20
height = 60
width = 80
channel = 3

lstm_size = 240
num_classes = 2

# Creating random data

input_x = np.random.normal(size=[batch, frames, height, width, channel])
input_y = np.zeros((batch, num_classes))
B = np.ones(batch)
input_y[:,1] = B

X = tf.placeholder("float", [None, frames, height, width, channel], name='InputData')
Y = tf.placeholder("float", [None, num_classes], name='LabelData')

with tf.name_scope('Model'):
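    # static_rnn expects one [batch, features] tensor per time step, so move time to axis 0 first
    XT = tf.transpose(X, [1, 0, 2, 3, 4]) # shape=(frames, ?, 60, 80, 3)
    XR = tf.reshape(XT, [-1, height*width*channel]) # shape=(?, 14400), rows ordered frame by frame
    X_split3 = tf.split(XR, time_step_size, 0) # 20 tensors of shape=(?, 14400), one per frame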

    lstm = rnn.BasicLSTMCell(lstm_size, forget_bias=1.0, state_is_tuple=True)
    outputs, _states = rnn.static_rnn(lstm, X_split3, dtype=tf.float32) # 20 tensors of shape=(?, 240)
    logits = tf.layers.dense(outputs[-1], num_classes, name='logits') # shape=(?, 2)

prediction = tf.nn.softmax(logits)

# Define loss and optimizer
with tf.name_scope('Loss'):
    loss_op = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=Y))

with tf.name_scope('optimizer'):
    optimizer = tf.train.AdamOptimizer(learning_rate=0.001, beta1=0.9, beta2=0.999, epsilon=1e-08, use_locking=False, name='Adam')
train_op = optimizer.minimize(loss_op)

# Evaluate model: fraction of examples whose predicted class matches the label
correct_pred = tf.equal(tf.argmax(prediction, 1), tf.argmax(Y, 1))
with tf.name_scope('Accuracy'):
    accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))

with tf.Session() as sess:
    tf.global_variables_initializer().run()
    logits_output = sess.run(logits, feed_dict={X: input_x})
    print(logits_output.shape) # shape=(5, 2)

    sess.run(train_op, feed_dict={X: input_x, Y: input_y})

    loss, acc = sess.run([loss_op, accuracy], feed_dict={X: input_x, Y: input_y})
    print("Loss: ", loss) # loss:  1.46626135e-05
    print("Accuracy: ", acc) # Accuracy:  1.0
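Why the time axis has to be moved to the front before the reshape/split: reshaping a [batch, frames, ...] tensor straight to rows lays them out video by video (all frames of video 0, then all frames of video 1, ...), so cutting the result into time_step_size chunks along axis 0 does not yield one chunk per frame. The NumPy sketch below uses tiny sizes and tags each frame with 100*video + frame so the row order is visible; np.transpose plays the role of tf.transpose here (the linked tutorial transposes batch and time before reshaping for the same reason):

```python
import numpy as np

batch, frames, feat = 3, 4, 2  # tiny sizes for readability

# x[b, t, :] holds the tag 100*b + t, so every row reveals (video b, frame t)
x = np.zeros((batch, frames, feat))
for b in range(batch):
    for t in range(frames):
        x[b, t, :] = 100 * b + t

# Batch-major: reshape straight to rows, then split into `frames` chunks along axis 0
batch_major = np.split(x.reshape(-1, feat), frames, axis=0)

# Time-major: swap the batch and time axes first, then reshape and split
time_major = np.split(np.transpose(x, (1, 0, 2)).reshape(-1, feat), frames, axis=0)

print(batch_major[0][:, 0])  # tags 0, 1, 2: frames 0-2 of video 0 only
print(time_major[0][:, 0])   # tags 0, 100, 200: frame 0 of every video
```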

Questions:

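1. I need help implementing the many-to-many LSTM and producing predictions at several time steps (say 4). At the moment I only use the last tensor, outputs[-1], to minimize the loss. There are 20 output tensors, one per frame (time_step_size). If I transform every fifth tensor, outputs[4], outputs[9], outputs[14] and outputs[-1], I will get 4 logits, and I want to minimize the loss over all of them.
2. Another problem is that I have to implement a binary classifier, but I only have videos of the action I want to recognize. So input_y is a one-hot representation of the labels in which the first column is always 0 and the second column is always 1 (the action I have to identify); I do not have a single example video where the first column is 1. Do you think this will work?
3. Why does the implementation above reach an accuracy of 1 after only one iteration?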
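A small NumPy illustration of why the accuracy in question 3 says little: when every label is [0, 1], a model that ignores the video entirely and always favours class 1 already scores an accuracy of 1.0. The constant logits below are a hypothetical stand-in, not the trained network:

```python
import numpy as np

batch, num_classes = 5, 2

# Labels built as in the question: every example is class 1
input_y = np.zeros((batch, num_classes))
input_y[:, 1] = 1

# A hypothetical "model" that ignores its input and always favours class 1
constant_logits = np.tile([0.0, 1.0], (batch, 1))

# Same accuracy definition as in the question's graph: argmax match rate
accuracy = np.mean(np.argmax(constant_logits, 1) == np.argmax(input_y, 1))
print(accuracy)  # 1.0
```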

Thanks

Tags: python, tensorflow, machine-learning, neural-network, lstm

1 answer

Stack Overflow user

Answered 2018-02-03 02:37:31

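For 1., Dense accepts any number of batch dimensions, so you should be able to compute the logits for all time steps in one go (then keep treating the step axis like an extra batch dimension until you have a loss for each step, and aggregate those losses, e.g. by taking the mean).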
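The approach described for question 1 can be sketched in NumPy: stack the selected LSTM outputs into a [steps, batch, lstm_size] array, apply one shared dense layer to the last axis (tf.layers.dense likewise applies its kernel to the last axis of inputs with rank > 2), compute the softmax cross-entropy for every step and example, then take the mean. The outputs, weights and bias are random placeholders; in the TensorFlow graph the analogous pieces would be tf.layers.dense on the stacked tensor, softmax_cross_entropy_with_logits with the labels tiled to one copy per step, and tf.reduce_mean:

```python
import numpy as np

rng = np.random.default_rng(0)
steps, batch, lstm_size, num_classes = 4, 5, 240, 2

# Placeholders: stacked outputs[4], outputs[9], outputs[14], outputs[19]
selected = rng.normal(size=(steps, batch, lstm_size))
labels = np.zeros((batch, num_classes))
labels[:, 1] = 1

# One shared dense layer applied to the last axis of a rank-3 input
W = rng.normal(scale=0.01, size=(lstm_size, num_classes))
b = np.zeros(num_classes)
logits = selected @ W + b  # shape (steps, batch, num_classes)

# Numerically stable log-softmax over the class axis
shifted = logits - logits.max(axis=-1, keepdims=True)
log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

# Cross-entropy per (step, example); labels broadcast over the step axis
per_step_loss = -(labels[None] * log_probs).sum(axis=-1)  # shape (steps, batch)

# Aggregate: mean over steps and examples gives the scalar training loss
loss = per_step_loss.mean()
print(logits.shape, per_step_loss.shape)
```

Averaging (rather than summing) keeps the loss scale independent of how many steps are selected, so the learning rate does not need retuning if you later predict at more or fewer frames.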

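For 2. and 3., it sounds like you need to find some negative examples. There is literature on "positive and unlabeled (PU)" learning and on "one-class classification" that may help.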
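To give a feel for what one-class classification means when only positive videos are available, here is a deliberately crude NumPy baseline (an illustrative assumption, not a method from the answer or from the PU literature): summarize each positive training video by a feature vector, store the centroid, and accept a new video as "action" only if its distance to the centroid is within the 95th percentile of the training distances. The feature vectors are random stand-ins for, e.g., the LSTM's final outputs, and predict_action is a hypothetical helper:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical features of positive (action) videos only, e.g. final LSTM outputs
train_pos = rng.normal(loc=0.0, size=(200, 16))

# Fit: centroid of the positives and a distance threshold covering 95% of them
centroid = train_pos.mean(axis=0)
train_dist = np.linalg.norm(train_pos - centroid, axis=1)
threshold = np.percentile(train_dist, 95)

def predict_action(features):
    """Return True where a feature vector lies close enough to the positive centroid."""
    dist = np.linalg.norm(features - centroid, axis=1)
    return dist <= threshold

# Usage: positives from the same distribution vs. clearly shifted "other" videos
new_pos = rng.normal(loc=0.0, size=(100, 16))
new_other = rng.normal(loc=3.0, size=(100, 16))
print(predict_action(new_pos).mean(), predict_action(new_other).mean())
```

Even with a one-class model, measuring how well it rejects non-action videos still requires some non-action videos to test on, which is the point made above.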

Votes: 2

Original page content provided by Stack Overflow; translation provided by Tencent Cloud Xiaowei's IT-domain engine.

Original link:

https://stackoverflow.com/questions/48570146

