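[Repost] [Code-Oriented] Learning Deep Learning (2): Deep Belief Nets (DBNs)
Today's topic is the DBN. Its key steps are all Restricted Boltzmann Machine (RBM) steps, so here is a picture of the RBM structure first to aid understanding.
(Figure taken from a Baidu explanatory slide deck)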
==========================================================================================
As usual, let's start with a complete DBN example program.
This is ex2 from \tests\test_example_DBN.m:
    % train dbn
    dbn.sizes = [100 100];
    opts.numepochs = 1;
    opts.batchsize = 100;
    opts.momentum = 0;
    opts.alpha = 1;
    dbn = dbnsetup(dbn, train_x, opts);   % here!!!
    dbn = dbntrain(dbn, train_x, opts);   % here!!!
    % unfold dbn to nn
    nn = dbnunfoldtonn(dbn, 10);          % here!!!
    nn.activation_function = 'sigm';
    % train nn
    opts.numepochs = 1;
    opts.batchsize = 100;
    nn = nntrain(nn, train_x, train_y, opts);
    [er, bad] = nntest(nn, test_x, test_y);
    assert(er < 0.10, 'Too big error');
The flow is simple and clear: it comes down to three functions, dbnsetup(), dbntrain() and dbnunfoldtonn().
The final fine-tuning step uses nntrain and nntest, which were covered in part (1).
\DBN\dbnsetup.m
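There is really not much to say here: each RBM gets zero-initialised weights W, a visible bias b, a hidden bias c, and the matching momentum buffers vW, vb and vc. (In the toolbox source, the input dimension of train_x is prepended to dbn.sizes before this loop, which is why sizes of [100 100] yield two RBMs.)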
    for u = 1 : numel(dbn.sizes) - 1
        dbn.rbm{u}.alpha    = opts.alpha;
        dbn.rbm{u}.momentum = opts.momentum;
        dbn.rbm{u}.W  = zeros(dbn.sizes(u + 1), dbn.sizes(u));
        dbn.rbm{u}.vW = zeros(dbn.sizes(u + 1), dbn.sizes(u));
        dbn.rbm{u}.b  = zeros(dbn.sizes(u), 1);
        dbn.rbm{u}.vb = zeros(dbn.sizes(u), 1);
        dbn.rbm{u}.c  = zeros(dbn.sizes(u + 1), 1);
        dbn.rbm{u}.vc = zeros(dbn.sizes(u + 1), 1);
    end
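To make the parameter shapes concrete, here is a minimal standalone sketch that needs no toolbox. It assumes a 784-dimensional input (MNIST-sized, chosen only for illustration) that has already been prepended to the hidden sizes; the variable names `sizes` and `rbm` are hypothetical and only the shape-relevant fields are created.

```matlab
% Standalone sketch: parameter shapes produced by a dbnsetup-style loop.
% Assumption: 784 input features, prepended to the hidden sizes [100 100].
sizes = [784 100 100];               % [input, hidden1, hidden2]
rbm = {};                            % one RBM per pair of adjacent layers
for u = 1 : numel(sizes) - 1
    rbm{u}.W = zeros(sizes(u + 1), sizes(u));   % hidden-by-visible weights
    rbm{u}.b = zeros(sizes(u), 1);              % visible bias
    rbm{u}.c = zeros(sizes(u + 1), 1);          % hidden bias
end
for u = 1 : numel(rbm)
    fprintf('rbm{%d}: W is %dx%d, b is %dx1, c is %dx1\n', u, ...
        size(rbm{u}.W, 1), size(rbm{u}.W, 2), numel(rbm{u}.b), numel(rbm{u}.c));
end
```

Note that W is stored as hidden-by-visible, which is why the training code multiplies by the transpose of W when going from visible to hidden units.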
\DBN\dbntrain.m
    function dbn = dbntrain(dbn, x, opts)
        n = numel(dbn.rbm);
        % train the RBM of each layer in turn
        dbn.rbm{1} = rbmtrain(dbn.rbm{1}, x, opts);
        for i = 2 : n
            x = rbmup(dbn.rbm{i - 1}, x);
            dbn.rbm{i} = rbmtrain(dbn.rbm{i}, x, opts);
        end
    end
The first thing that stands out is that the first layer is trained directly with rbmtrain(), while every later layer first calls rbmup, which passes the data up through the previous, already trained RBM so that its hidden-layer output becomes the training input of the next layer.
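The source of rbmup is not listed in this post. The sketch below assumes it computes the hidden-unit probabilities as a sigmoid of the hidden bias plus x times the transpose of W, which is the same expression rbmtrain uses for h2. The names `sigm` and `rbmup_sketch` are defined here for illustration only.

```matlab
% Sketch of the upward pass used between layers (assumed form of rbmup).
sigm = @(z) 1 ./ (1 + exp(-z));                       % logistic sigmoid
rbmup_sketch = @(rbm, x) sigm(repmat(rbm.c', size(x, 1), 1) + x * rbm.W');

% Toy RBM: 4 visible units, 3 hidden units, zero parameters as after setup.
rbm.W = zeros(3, 4);
rbm.c = zeros(3, 1);
x = [0 1 0 1; 1 1 0 0];                               % 2 samples, 4 features
h = rbmup_sketch(rbm, x);                             % 2-by-3 hidden probabilities
disp(size(h));
disp(h);                                              % every entry is 0.5 while W and c are zero
```

The output has one row per sample and one column per hidden unit, so it can be fed straight into rbmtrain for the next layer.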
\DBN\rbmtrain.m
    m = size(x, 1);                          % number of training samples (defined before the loop in the source)
    numbatches = m / opts.batchsize;         % must be an integer
    for i = 1 : opts.numepochs               % number of epochs
        kk = randperm(m);
        err = 0;
        for l = 1 : numbatches
            batch = x(kk((l - 1) * opts.batchsize + 1 : l * opts.batchsize), :);
            v1 = batch;
            h1 = sigmrnd(repmat(rbm.c', opts.batchsize, 1) + v1 * rbm.W');   % Gibbs sampling
            v2 = sigmrnd(repmat(rbm.b', opts.batchsize, 1) + h1 * rbm.W);
            h2 = sigm(repmat(rbm.c', opts.batchsize, 1) + v2 * rbm.W');
            % Contrastive Divergence step.
            % This matches the CD-1 pseudocode in "Learning Deep Architectures for AI".
            c1 = h1' * v1;
            c2 = h2' * v2;
            % For momentum, see Hinton's "A Practical Guide to Training Restricted Boltzmann Machines".
            % It remembers the previous update direction and combines it with the current one,
            % which is more likely to speed up learning.
            rbm.vW = rbm.momentum * rbm.vW + rbm.alpha * (c1 - c2) / opts.batchsize;
            rbm.vb = rbm.momentum * rbm.vb + rbm.alpha * sum(v1 - v2)' / opts.batchsize;
            rbm.vc = rbm.momentum * rbm.vc + rbm.alpha * sum(h1 - h2)' / opts.batchsize;
            % apply the updates
            rbm.W = rbm.W + rbm.vW;
            rbm.b = rbm.b + rbm.vb;
            rbm.c = rbm.c + rbm.vc;
            err = err + sum(sum((v1 - v2) .^ 2)) / opts.batchsize;
        end
    end
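To try the update rule without the toolbox, here is a rough sketch of a single CD-1 step on one random binary mini-batch. The toy sizes, the random data and the local definitions of `sigm` and `sigmrnd` (a Bernoulli sample drawn from the sigmoid probabilities) are assumptions made for illustration, not the toolbox code itself.

```matlab
% Toy CD-1 update on one mini-batch of random binary data (sketch only).
sigm    = @(z) 1 ./ (1 + exp(-z));
sigmrnd = @(z) double(sigm(z) > rand(size(z)));   % sample 0/1 with sigmoid probability

nv = 6; nh = 4; batchsize = 10;                   % hypothetical toy sizes
alpha = 1; momentum = 0;
W = zeros(nh, nv); b = zeros(nv, 1); c = zeros(nh, 1);
vW = zeros(size(W)); vb = zeros(size(b)); vc = zeros(size(c));

v1 = double(rand(batchsize, nv) > 0.5);           % one mini-batch, values in {0, 1}

% One Gibbs step: v1 -> h1 -> v2 -> h2
h1 = sigmrnd(repmat(c', batchsize, 1) + v1 * W');
v2 = sigmrnd(repmat(b', batchsize, 1) + h1 * W);
h2 = sigm(repmat(c', batchsize, 1) + v2 * W');

% Positive-phase and negative-phase statistics
c1 = h1' * v1;
c2 = h2' * v2;

% Momentum-smoothed parameter updates
vW = momentum * vW + alpha * (c1 - c2) / batchsize;
vb = momentum * vb + alpha * sum(v1 - v2)' / batchsize;
vc = momentum * vc + alpha * sum(h1 - h2)' / batchsize;
W = W + vW; b = b + vb; c = c + vc;

err = sum(sum((v1 - v2) .^ 2)) / batchsize;       % mean squared reconstruction error
fprintf('reconstruction error: %g\n', err);
```

Because the samples are random, the printed error differs from run to run. With momentum set to 0, as in the example script, the velocity terms are simply the scaled gradient estimates.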
\DBN\dbnunfoldtonn.m
    function nn = dbnunfoldtonn(dbn, outputsize)
    %DBNUNFOLDTONN Unfolds a DBN to a NN
    %   outputsize is the number of target output labels, e.g. 10 for MNIST.
    %   The DBN only learns features, or in other words initialises the weights;
    %   this is unsupervised learning, and the final supervised step is left to the NN.
        if(exist('outputsize','var'))
            size = [dbn.sizes outputsize];
        else
            size = [dbn.sizes];
        end
        nn = nnsetup(size);
        % use the unfolded weights of each layer to initialise the NN weights
        % note that dbn.rbm{i}.c initialises the bias term
        for i = 1 : numel(dbn.rbm)
            nn.W{i} = [dbn.rbm{i}.c dbn.rbm{i}.W];
        end
    end
Finally, fine-tuning simply means training this NN a little further.
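The reason [c W] works as an initial NN weight can be checked numerically. The sketch below assumes the NN forward pass prepends a constant column of ones to its input, so the first column of the weight matrix acts as the bias. Under that assumption the unfolded layer reproduces the RBM's upward pass. All values here are made-up toy numbers.

```matlab
% Sketch: [c W] applied to [1 x] reproduces the RBM's upward pass.
sigm = @(z) 1 ./ (1 + exp(-z));
W = [0.1 -0.2 0.3; 0.4 0.5 -0.6];    % 2 hidden units, 3 visible units (toy values)
c = [0.05; -0.05];                   % hidden bias
x = [1 0 1; 0 1 1];                  % 2 samples

h_rbm = sigm(repmat(c', size(x, 1), 1) + x * W');   % RBM-style upward pass
nnW   = [c W];                                      % unfolded weight, bias in column 1
x1    = [ones(size(x, 1), 1) x];                    % prepend the constant-1 bias input
h_nn  = sigm(x1 * nnW');                            % NN-style forward pass

disp(max(abs(h_rbm(:) - h_nn(:))));                 % difference is at rounding level
```

Only the layers that have a matching RBM are overwritten this way; the output layer added through outputsize keeps whatever initialisation nnsetup gave it and is learned during fine-tuning.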