deep learning - How does the Gradient function work in Backpropagation?

In backpropagation, is the gradient of the loss w.r.t. layer L calculated using the gradient w.r.t. layer L-1? Or, is the gradient of the loss w.r.t. layer L-1 calculated using the gradient w.r.t. layer L?



1 Reply


Gradient descent is the optimization procedure used together with back-propagation to decide how much to adjust the weights by. Two common variants are (full-batch) gradient descent and stochastic gradient descent.

Gradient descent determines the adjustment applied to the weights on each iteration. The step is proportional to the gradient of the loss, so the further the current weights are from the best weights (the steeper the loss surface), the bigger the adjustment. You can think of it as a ball rolling down a hill: the hill is the loss surface, the ball's position is the current weights, and the ball's movement is the adjustment. You want the ball to settle as close to the bottom of the hill as possible, because the bottom corresponds to the best possible weights; as the ball approaches it, the slope flattens and the adjustments shrink.
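To make that concrete, here is a minimal sketch of plain (full-batch) gradient descent on a toy least-squares problem; the data, learning rate, and variable names are illustrative assumptions, not part of the original answer:

```python
import numpy as np

# Toy least-squares problem: minimise L(w) = mean((X @ w - y)^2).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w

w = np.zeros(3)   # starting weights (the "ball" at the top of the hill)
lr = 0.1          # learning rate: how far the ball moves each step

for step in range(200):
    grad = 2 * X.T @ (X @ w - y) / len(y)  # gradient of the loss w.r.t. w
    w -= lr * grad                         # step opposite to the gradient
    # far from the minimum the gradient is large, so the step is large;
    # near the minimum it shrinks and the updates slow down

print(w)  # ends up close to true_w
```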

Stochastic gradient descent is a variant used when the loss surface may contain a "false best" value (a local minimum), where plain gradient descent would stop at a value it thinks is the best rather than the true best. Instead of computing the gradient over the whole dataset, it estimates the gradient from a randomly chosen sample (or mini-batch) on each step, and the noise in those updates can shake the weights out of a shallow local minimum. In the ball analogy, picture two valleys of different depths: the ball rolls into the first valley and plain gradient descent stops there, believing it has reached the best possible answer, whereas the noisier stochastic updates can carry it over into the second, deeper valley.
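Below is a matching sketch of stochastic (mini-batch) gradient descent on the same kind of toy problem; again the batch size, learning rate, and data are illustrative assumptions rather than anything from the original answer:

```python
import numpy as np

# Same toy problem, but each step uses a random mini-batch instead of all data.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.5, -2.0, 0.5])

w = np.zeros(3)
lr = 0.05
batch_size = 8

for step in range(2000):
    idx = rng.integers(0, len(y), size=batch_size)   # pick a random mini-batch
    Xb, yb = X[idx], y[idx]
    grad = 2 * Xb.T @ (Xb @ w - yb) / batch_size     # noisy gradient estimate
    w -= lr * grad
    # the sampling noise makes each step slightly "wrong", which is what can
    # shake the weights out of a shallow local minimum (the first valley)

print(w)  # close to [1.5, -2.0, 0.5], up to a little sampling noise
```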

(In the diagram that accompanied this answer: the left valley is where plain gradient descent settles, and the right, deeper valley is what stochastic gradient descent can reach, i.e. the best possible value.)

Finally, to answer your question directly: in back-propagation you start at the right-most (output) layer, compute the gradient of the loss with respect to that layer's weight matrix, and adjust its weights accordingly; then you move one layer to the left (layer L-1) and repeat. The gradient with respect to layer L-1 is computed from the gradient with respect to layer L via the chain rule, so the gradients flow from right to left, from the output towards the input.
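Here is a small sketch of that backward pass for a two-layer network with a sigmoid hidden layer and a squared-error loss; the shapes, names, and activation are illustrative assumptions chosen to show the chain rule, not the asker's actual network:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 1))    # one input (column vector)
t = rng.normal(size=(1, 1))    # target output
W1 = rng.normal(size=(3, 4))   # weights of layer L-1
W2 = rng.normal(size=(1, 3))   # weights of layer L (the right-most layer)

# forward pass
a1 = sigmoid(W1 @ x)           # layer L-1 activations
y = W2 @ a1                    # layer L output (linear)
loss = 0.5 * np.sum((y - t) ** 2)

# backward pass: the right-most layer comes first
dy  = y - t                    # dLoss/dy
dW2 = dy @ a1.T                # gradient for layer L's weights
da1 = W2.T @ dy                # dLoss/da1 -- built from layer L's gradient
dW1 = (da1 * a1 * (1 - a1)) @ x.T   # gradient for layer L-1's weights

# each weight matrix is then adjusted with its own gradient, e.g. W2 -= lr * dW2
```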

I have also discussed this in more detail in another question; it might help to check that one out.

