Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


python - Loss not decreasing at RBM loss training

I am learning about RBMs and have copied and analysed Hismael Costa's code from his GitHub repository. I have altered the code so that it works with TensorFlow 2 only. During training, however, the loss jumps all over the place instead of going down.

The data set should definitely allow convergence: the third column is simply the bitwise OR of the first two.

length = 64*100

col0 = np.random.randint(0,2,length)
col1 = np.random.randint(0,2,length)
col2 = np.bitwise_or(col0, col1)
train_X = np.array([col0, col1, col2], dtype=np.float32).T
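
As a quick sanity check on the data, the sketch below rebuilds the same three columns and lists the distinct rows. It uses a seeded NumPy generator (an assumption made only for reproducibility; the snippet above uses np.random.randint) and shows that the RBM only ever sees rows where the third value is the OR of the first two.

```python
import numpy as np

# Seeded generator: an assumption for reproducibility, not part of the original snippet.
rng = np.random.default_rng(0)
length = 64 * 100

col0 = rng.integers(0, 2, length)
col1 = rng.integers(0, 2, length)
col2 = np.bitwise_or(col0, col1)
train_X = np.array([col0, col1, col2], dtype=np.float32).T

# Every row satisfies col2 == col0 OR col1, so at most four distinct rows can occur.
patterns = np.unique(train_X, axis=0)
print(train_X.shape)
print(patterns)
```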

Initially I thought it was a wrong application of GradientTape or wrong optimizer syntax, as this is the part I wrote mostly myself, but it seems all right. Changing the number of hidden nodes, the learning rate, or the number of steps makes no difference. I also tried the SGD optimizer, to no avail.

Would someone please peek into my code:

import tensorflow as tf
import numpy as np

class RBM(tf.keras.Model):
    
    def __init__(self, nv, nh, cd_steps=3):
        super().__init__()
        self.W = tf.Variable(tf.random.truncated_normal((nv, nh), dtype=tf.float64) * 0.01) 
        self.bv = tf.Variable(tf.zeros((nv, 1), dtype=tf.float64))
        self.bh = tf.Variable(tf.zeros((nh, 1), dtype=tf.float64))
        self.cd_steps = cd_steps 
        self.modelW = None 
            
        
    def bernoulli(self, p):
        return tf.nn.relu(tf.sign(p- tf.random.uniform(p.shape, dtype=tf.float64)))
        
    def sample_h(self, v):
        ph_given_v = tf.sigmoid(tf.matmul(v, self.W) + tf.squeeze(self.bh))
        return self.bernoulli(ph_given_v)
    
    def sample_v(self, h):
        pv_given_h = tf.sigmoid(tf.matmul(h, tf.transpose(self.W)) + tf.squeeze(self.bv))
        return self.bernoulli(pv_given_h)
   
    def gibbs_step(self, i, k, vk):
        hk = self.sample_h(vk)
        vk = self.sample_v(hk)
        return i+1, k, vk
    
    def energy(self, v):
        b_term = tf.matmul(v, self.bv)
        linear_tranform = tf.matmul(v, self.W) + tf.squeeze(self.bh)
        h_term = tf.reduce_sum(tf.math.log(tf.exp(linear_tranform) + 1), axis=1) 
        return tf.reduce_mean(-h_term -b_term)
    
    def loss(self, v, vk):
        return self.energy(v) - self.energy(vk) 
     
    
    def call(self, X, lr=0.01, batch_size=64, epochs=5):
        X = tf.data.Dataset.from_tensor_slices(X).batch(batch_size)
        optimizer = tf.keras.optimizers.Adam(learning_rate=lr)
        for epoch in range(epochs):
            losses = []
            for n_batch, batch in enumerate(X):
                print('batch: ', n_batch, end='\r')
                with tf.GradientTape() as tape:
                    v = batch
                    vk = tf.identity(v)

                    i = tf.constant(0)
                    _, _, vk = tf.nest.map_structure(tf.stop_gradient,
                                                     tf.while_loop(cond = lambda i,k, *args: i<=k,
                                                                   body = self.gibbs_step,
                                                                   loop_vars =[i, tf.constant(0), v])
                                                    )

                    l = self.loss(v, vk)
                # grads = tape.gradient(l, [self.W, self.bv, self.bh])
                # optimizer.apply_gradients(zip(grads, self.trainable_variables))
                optimizer.minimize(l, var_list = [self.W, self.bv, self.bh], tape=tape) # this does the same job as the 2 lines above (doesn't it?)
                

            losses.append(l)
            print('Epoch: {} Cost: {}'.format(epoch, np.mean(losses)))
        self.modelW = self.W


    def predict(self, Q, batch_size=64):
        Q = tf.data.Dataset.from_tensor_slices(Q).batch(batch_size)
        answers = []
        for n_batch, batch in enumerate(Q):
            print('batch: ', n_batch, end='\r')
            v = batch
            i = tf.constant(0)
            _, _, vk = tf.nest.map_structure(tf.stop_gradient,
                                                tf.while_loop(cond = lambda i,k, *args: i<=k,
                                                              body = self.gibbs_step,
                                                              loop_vars =[i, tf.constant(0), v])
                                               )
        
            answers.append(vk.numpy())
        return np.concatenate(answers)
                    
        
 

Just execute the code below to run the class with the dataset:

rbm = RBM(nv=train_X.shape[1], nh=5, cd_steps=3)
rbm(X=train_X, lr=0.001, epochs=10)
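
A side observation on the class above: both tf.while_loop calls start the counter at 0 and pass tf.constant(0) as the bound, so the condition i<=k holds exactly once and cd_steps is never read. The NumPy sketch below is not the TensorFlow code; it only emulates that loop condition and, for comparison, runs a Gibbs chain whose length is actually driven by cd_steps. The helper names gibbs_chain and count_iterations are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def bernoulli(p):
    # Draw 0/1 samples with success probability p.
    return (rng.random(p.shape) < p).astype(np.float64)

def gibbs_chain(v0, W, bv, bh, cd_steps):
    """Run cd_steps alternating h|v and v|h sampling steps (hypothetical helper)."""
    vk = v0
    steps_run = 0
    for _ in range(cd_steps):
        hk = bernoulli(sigmoid(vk @ W + bh))
        vk = bernoulli(sigmoid(hk @ W.T + bv))
        steps_run += 1
    return vk, steps_run

def count_iterations(i, k):
    # Emulates a while loop that checks the question's condition i <= k before each step.
    n = 0
    while i <= k:
        i += 1
        n += 1
    return n

nv, nh = 3, 5
W = rng.normal(scale=0.01, size=(nv, nh))
bv = np.zeros(nv)
bh = np.zeros(nh)
v0 = rng.integers(0, 2, size=(64, nv)).astype(np.float64)

vk, steps_run = gibbs_chain(v0, W, bv, bh, cd_steps=3)
print(count_iterations(0, 0))  # 1: with i=0 and k=0 the body runs once
print(steps_run)               # 3: the chain length follows cd_steps
```

If more than one Gibbs step is intended in the TensorFlow version, the loop bound would presumably have to be derived from self.cd_steps rather than from a constant 0.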
question from:https://stackoverflow.com/questions/65945324/loss-not-decreasing-at-rbm-loss-training


1 Reply


OK - it was just me being an idiot:

  1. The initialization of the weights was probably not ideal. It seems to work better when they are completely random:
self.W = tf.Variable(tf.random.truncated_normal((nv, nh), dtype=tf.float32))
self.bv = tf.Variable(tf.random.truncated_normal((nv, 1), dtype=tf.float32))
self.bh = tf.Variable(tf.random.truncated_normal((nh, 1), dtype=tf.float32))
  2. Loss calculation: it works better with a square:
def loss(self, v, vk):
    return (self.energy(v) - self.energy(vk))**2
  3. Wrong indentation: the "losses.append(l)" line should be indented to the same level as the line above it:
    optimizer.minimize(l, var_list = [self.W, self.bv, self.bh], tape=tape)
    losses.append(l)
  4. Tweaking the learning_rate for the optimizer.
  5. Plotting the loss also helped a lot. Subtle changes are easier to appreciate on a plot than in a stack of numbers.
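
The effect of the indentation fix can be seen in a small, framework-free sketch. The loss values below are made up for illustration and the two helper functions are hypothetical; they only mimic where the append sits relative to the batch loop.

```python
import numpy as np

def epoch_cost_last_batch(batch_losses):
    """Mimics the original indentation: append once, after the batch loop."""
    losses = []
    for l in batch_losses:
        pass                 # the optimizer step would happen here
    losses.append(l)         # only the last batch's loss is recorded
    return np.mean(losses)

def epoch_cost_all_batches(batch_losses):
    """Mimics the corrected indentation: append inside the batch loop."""
    losses = []
    for l in batch_losses:
        losses.append(l)     # one entry per batch
    return np.mean(losses)

# Made-up per-batch losses for a single epoch.
fake_losses = [4.0, 2.0, 0.5]
print(epoch_cost_last_batch(fake_losses))   # 0.5
print(epoch_cost_all_batches(fake_losses))  # mean of all three values
```

With the append outside the batch loop, the reported "Cost" is just the last batch's loss, which is noisy from epoch to epoch. Averaging over all batches gives a smoother curve, and collecting those per-epoch means in a list also gives something to plot.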

So the subject is closed.

