I am trying to implement an adversarial NN, which requires 'freezing' one or the other part of the graph during alternating training minibatches. That is, there are two sub-networks, G and D:
G( Z ) -> Xz
D( X ) -> Y
where the loss function of G depends on D[G(Z)] and D[X].
First I need to train the parameters of D with all G parameters fixed, and then the parameters of G with the parameters of D fixed. The loss function in the first case is the negative of the loss function in the second case, and the update must apply only to the parameters of either the first or the second sub-network.
I saw that TensorFlow has a tf.stop_gradient function. For the purpose of training the D (downstream) sub-network, I can use this function to block the gradient flow as follows:
Z -> [ G ] -> tf.stop_gradient(Xz) -> [ D ] -> Y
The tf.stop_gradient documentation is very succinct, with no in-line example (and the seq2seq.py example is too long and not that easy to read), but it looks like it must be called during graph creation. Does this imply that if I want to block/unblock gradient flow in alternating batches, I need to re-create and re-initialize the graph model?
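To make the setup concrete, here is a minimal sketch of the blocked pipeline above. It assumes the TF1-style graph API, reached through tf.compat.v1 so that it should also run on TensorFlow 2.x. The scalar weights w_g and w_d are hypothetical stand-ins for the real G and D networks, and the losses are placeholders rather than a real adversarial loss.

```python
import tensorflow as tf

tf1 = tf.compat.v1  # TF1-style graph API (assumption: TF >= 1.13 or 2.x)

with tf.Graph().as_default():
    z = tf1.placeholder(tf.float32, shape=[None, 1])

    # Hypothetical one-weight stand-ins for the two sub-networks.
    with tf1.variable_scope("G"):
        w_g = tf1.get_variable("w", initializer=2.0)
    with tf1.variable_scope("D"):
        w_d = tf1.get_variable("w", initializer=3.0)

    xz = w_g * z  # G(Z) -> Xz

    # Path used when training D: gradients stop at Xz.
    y_blocked = w_d * tf.stop_gradient(xz)
    # Path used when training G: gradients flow through D into G.
    y_open = w_d * xz

    d_loss = tf.reduce_mean(y_blocked)  # placeholder loss
    g_loss = -tf.reduce_mean(y_open)    # placeholder loss

    # tf.gradients returns None for variables the loss cannot reach.
    grads_blocked = tf.gradients(d_loss, [w_g, w_d])
    grads_open = tf.gradients(g_loss, [w_g, w_d])

    print("blocked:", grads_blocked)
    print("open:   ", grads_open)
```

In this sketch both paths are built once and live side by side in the same graph, so choosing between them is a matter of which loss is evaluated in a given minibatch. Note that the unblocked path still produces a gradient for w_d, so tf.stop_gradient alone does not freeze D while G is trained.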
Also, it seems that one cannot block the gradient flowing through the G (upstream) network by means of tf.stop_gradient, right?
As an alternative, I saw that one can pass a list of variables to the optimizer call, as in opt_op = opt.minimize(cost, var_list=<list of variables>), which would be an easy solution if one could get all variables in the scope of each sub-network. Can one get a <list of variables> for a tf.variable_scope?
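A minimal sketch of what this alternative might look like, again assuming the TF1-style graph API via tf.compat.v1. The scope names "G" and "D", the one-weight networks, and the log-loss are illustrative assumptions. The sketch relies on tf.compat.v1.get_collection accepting a scope argument that filters variables by name prefix.

```python
import tensorflow as tf

tf1 = tf.compat.v1  # TF1-style graph API (assumption: TF >= 1.13 or 2.x)

with tf.Graph().as_default():
    z = tf1.placeholder(tf.float32, shape=[None, 1])
    x = tf1.placeholder(tf.float32, shape=[None, 1])

    # Hypothetical one-weight generator living in scope "G".
    with tf1.variable_scope("G"):
        w_g = tf1.get_variable("w", initializer=1.0)
        xz = w_g * z

    def discriminator(inp):
        # AUTO_REUSE lets D be applied to both X and Xz with shared weights.
        with tf1.variable_scope("D", reuse=tf1.AUTO_REUSE):
            w_d = tf1.get_variable("w", initializer=1.0)
            return tf.sigmoid(w_d * inp)

    d_real = discriminator(x)
    d_fake = discriminator(xz)

    # Illustrative loss; the G loss is the negative of the D loss.
    d_loss = -tf.reduce_mean(tf.math.log(d_real) + tf.math.log(1.0 - d_fake))
    g_loss = -d_loss

    # Collect the trainable variables of each sub-network by scope name.
    g_vars = tf1.get_collection(tf1.GraphKeys.TRAINABLE_VARIABLES, scope="G")
    d_vars = tf1.get_collection(tf1.GraphKeys.TRAINABLE_VARIABLES, scope="D")

    opt = tf1.train.GradientDescentOptimizer(0.1)
    train_d = opt.minimize(d_loss, var_list=d_vars)  # updates D only
    train_g = opt.minimize(g_loss, var_list=g_vars)  # updates G only

    with tf1.Session() as sess:
        sess.run(tf1.global_variables_initializer())
        feed = {z: [[0.5]], x: [[1.0]]}
        g_before = sess.run(w_g)
        sess.run(train_d, feed_dict=feed)  # one D step
        g_after = sess.run(w_g)            # G weight should be untouched
```

Alternating minibatches would then amount to running train_d and train_g in turn within the same session. The variable list is passed as the var_list keyword because the second positional parameter of minimize is global_step.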