
neural network - Why do we ignore past memory when performing backpropagation for LSTM cells?

(As always, please note that I'm not an expert or even an engineer.) I'm trying to understand backpropagation for LSTM layers. I grasp the idea that derivatives have to be calculated at each step and summed over the time steps of a batch (e.g. a sentence), but I can't understand why we ignore the past information coming from the memory channel at each step.

To illustrate my problem, I examined this formula for the derivative of an error function with respect to the forget gate of an LSTM (from https://www.geeksforgeeks.org/lstm-derivation-of-back.../):

dE/df = E_delta * o * (1 - tanh^2(c_t)) * c_{t-1}

where:

E is the error,

f is the forget gate,

o is the output gate,

(1 - tanh^2(c_t)) is the derivative of tanh(c_t) with respect to c_t, and

c_{t-1} is the memory at time t-1.
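
To make sure I'm reading the formula the way it's meant, here is a minimal numeric sketch (my own assumptions, not stated in the linked derivation: a single scalar cell, E_delta standing for dE/dh_t, and a toy squared-error loss) that checks it against a finite-difference estimate while holding c_{t-1} constant:

```python
import numpy as np

def cell(f, i, g, o, c_prev):
    """One scalar LSTM cell step with the gate activations given directly."""
    c = f * c_prev + i * g          # new memory: forget old state, add candidate
    h = o * np.tanh(c)              # hidden output
    return c, h

# arbitrary gate activations and a target for a toy squared-error loss
f, i, g, o = 0.6, 0.3, 0.5, 0.8
c_prev, y = 0.9, 0.2

c, h = cell(f, i, g, o, c_prev)
E_delta = h - y                      # dE/dh for E = 0.5 * (h - y)^2

# the quoted formula, with c_{t-1} treated as a constant
dE_df_formula = E_delta * o * (1 - np.tanh(c) ** 2) * c_prev

# finite-difference check: perturb only the forget gate, keep c_prev fixed
eps = 1e-6
_, h_hi = cell(f + eps, i, g, o, c_prev)
_, h_lo = cell(f - eps, i, g, o, c_prev)
dE_df_numeric = (0.5 * (h_hi - y)**2 - 0.5 * (h_lo - y)**2) / (2 * eps)

print(dE_df_formula, dE_df_numeric)  # the two values agree to several decimals
```

The two printed values agree, which is consistent with c_{t-1} being treated as a constant in the derivation; my question below is about whether that treatment is justified.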

My question is: shouldn't we be required to consider c_{t-1} as depending on the forget gate as well, and not treat it only as a constant? Shouldn't we see the product of the forget gate (at time t) and the memory (at time t-1) as a product f(x)g(x)?

We would then have: d(f(x)g(x))/dx = f'g + fg'.

The first term is already in the formula, and the second one takes us back to the very beginning of the LSTM memory. If c_t is set to 0 for t = 0, the recursion would terminate nicely.

The formula I get is shown in the image below.

(image of my derived formula)
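
To spell the recursion out, here is a rough sketch of what I mean (my own toy construction: a single scalar cell, only the c_t = f_t * c_{t-1} + ... path, c_0 = 0, and made-up values for the gates, memories and their derivatives df_t/dx with respect to whatever x we differentiate by):

```python
# toy values: forget gates in (0, 1), memories c_t, and hypothetical df_t/dx
f     = [None, 0.6, 0.7, 0.5]      # f[t] for t = 1..3 (index 0 unused)
c     = [0.0, 0.4, 0.5, 0.45]      # c[t] with c[0] = 0
df_dx = [None, 0.1, 0.2, 0.3]      # made-up derivatives of each forget gate

def dc_dx(t):
    """dc_t/dx along the cell-state path only, expanding
    c_t = f_t * c_{t-1} + (terms without f) by the product rule."""
    if t == 0:
        return 0.0                 # c_0 = 0, so dc_0/dx = 0 ends the recursion
    return df_dx[t] * c[t - 1] + f[t] * dc_dx(t - 1)

print(dc_dx(3))   # earlier terms are multiplied by more and more f[t] factors
```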

I would add two additional questions (assuming my formula is not just plainly wrong):

  1. The terms of the sum in my formula decrease fast, since they contain products of more and more factors that are always strictly between 0 and 1. Is that the justification for not considering them? (See the small numeric illustration after this list.)

  2. Should I use the correct/expected value of the cell at time t-1 (as I do) to calculate this formula, or the h_{t-1} actually produced by the cell?
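
For question 1, this is the decay I mean: each successive term carries a product of one more factor lying strictly between 0 and 1 (my assumption: the factors behave like typical forget-gate activations), so the later terms shrink quickly. A toy illustration:

```python
import numpy as np

# hypothetical forget-gate activations over 10 time steps, all in (0, 1)
gates = np.array([0.9, 0.8, 0.7, 0.85, 0.6, 0.75, 0.9, 0.8, 0.7, 0.65])

# the k-th term of the expanded sum carries a product of k such factors
print(np.cumprod(gates))   # each product is smaller; the last is below 0.1
```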

I would appreciate any advice. Not an expert here, in any sense!



1 Reply

Waiting for an expert to answer.
