Overview of "Neural networks and deep learning"

Date: June 9, 2015

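I recently read the book "Neural networks and deep learning" (an online book, not yet published in print) fairly carefully. The first few chapters cover relatively simple material, so I focused on Chapter 3, "Improving the way neural networks learn", which covers a range of techniques for optimizing and training deep neural networks. I took detailed notes on Chapter 3 (also drawing on other sources; I will add to or revise them as I read more related papers and material). Anyone else reading this book is welcome to discuss it with me. What follows is my personal understanding; please point out any mistakes.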

What this book is about?

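The code in this book is written in Python. Starting from the MNIST example, it introduces artificial neural networks and works step by step toward deep learning, covering the code implementation and some optimization methods along the way. It is well suited as an introductory book.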

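1. Using neural nets to recognize handwritten digits

Chapter summary

This chapter uses an artificial neural network to recognize digits from the MNIST dataset. The implementation is in Python and depends only on the NumPy library.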
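
To make the idea concrete, here is a minimal sketch of a NumPy-only forward pass that maps a flattened image to a digit prediction. The layer sizes, the random weights, and the random input standing in for an MNIST image are all assumptions made for illustration; no data loading or training is shown.

```python
import numpy as np

def sigmoid(z):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-z))

def feedforward(weights, biases, a):
    """Propagate a column vector `a` through every layer of the network."""
    for W, b in zip(weights, biases):
        a = sigmoid(W @ a + b)
    return a

rng = np.random.default_rng(0)
sizes = [784, 30, 10]  # assumed sizes: 28x28 input pixels, one hidden layer, 10 digits
weights = [rng.standard_normal((n, m)) for m, n in zip(sizes[:-1], sizes[1:])]
biases = [rng.standard_normal((n, 1)) for n in sizes[1:]]

x = rng.random((784, 1))  # stand-in for one flattened, normalized image
output = feedforward(weights, biases, x)
predicted_digit = int(np.argmax(output))  # index of the most active output neuron
```

With untrained random weights the prediction is meaningless; the point is only the shape of the computation, one matrix product and one sigmoid per layer.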

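2. How the backpropagation algorithm works

Chapter summary

The previous chapter did not explain how to compute the gradient of the cost function that the optimization relies on. That is what this chapter covers: the backpropagation (BP) algorithm.
BP was introduced in the 1970s, but it only became popular after the 1986 paper by Rumelhart, Hinton, and Williams.
BP implementation code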

The code was contained in the update_mini_batch and backprop methods of the Network class. In particular, the update_mini_batch method updates the Network’s weights and biases by computing the gradient for the current mini_batch of training examples.
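
The structure described above can be sketched as follows. This is a minimal illustration written for these notes, not the book's listing: it assumes sigmoid activations and the quadratic cost, and the `seed` argument and the tiny 3-4-2 example network are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)

class Network:
    def __init__(self, sizes, seed=0):
        rng = np.random.default_rng(seed)
        self.num_layers = len(sizes)
        self.biases = [rng.standard_normal((n, 1)) for n in sizes[1:]]
        self.weights = [rng.standard_normal((n, m))
                        for m, n in zip(sizes[:-1], sizes[1:])]

    def backprop(self, x, y):
        """Gradient of the quadratic cost 0.5 * ||a - y||^2 for one example."""
        nabla_b = [np.zeros(b.shape) for b in self.biases]
        nabla_w = [np.zeros(w.shape) for w in self.weights]
        # Forward pass, storing weighted inputs and activations layer by layer.
        activation, activations, zs = x, [x], []
        for b, w in zip(self.biases, self.weights):
            z = w @ activation + b
            zs.append(z)
            activation = sigmoid(z)
            activations.append(activation)
        # Backward pass: output-layer error first, then earlier layers.
        delta = (activations[-1] - y) * sigmoid_prime(zs[-1])
        nabla_b[-1] = delta
        nabla_w[-1] = delta @ activations[-2].T
        for l in range(2, self.num_layers):
            delta = (self.weights[-l + 1].T @ delta) * sigmoid_prime(zs[-l])
            nabla_b[-l] = delta
            nabla_w[-l] = delta @ activations[-l - 1].T
        return nabla_b, nabla_w

    def update_mini_batch(self, mini_batch, eta):
        """One gradient-descent step using the gradient averaged over the batch."""
        nabla_b = [np.zeros(b.shape) for b in self.biases]
        nabla_w = [np.zeros(w.shape) for w in self.weights]
        for x, y in mini_batch:
            delta_nabla_b, delta_nabla_w = self.backprop(x, y)
            nabla_b = [nb + dnb for nb, dnb in zip(nabla_b, delta_nabla_b)]
            nabla_w = [nw + dnw for nw, dnw in zip(nabla_w, delta_nabla_w)]
        m = len(mini_batch)
        self.weights = [w - (eta / m) * nw for w, nw in zip(self.weights, nabla_w)]
        self.biases = [b - (eta / m) * nb for b, nb in zip(self.biases, nabla_b)]

# Usage on made-up data: a hypothetical 3-4-2 network and two random examples.
rng = np.random.default_rng(1)
net = Network([3, 4, 2])
batch = [(rng.random((3, 1)), np.array([[1.0], [0.0]])) for _ in range(2)]
net.update_mini_batch(batch, eta=0.1)
```

Note that the loop in `update_mini_batch` calls `backprop` once per training example, which is the behavior the matrix-based approach tries to avoid.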
Fully matrix-based approach to backpropagation over a mini-batch

Our implementation of stochastic gradient descent loops over training examples in a mini-batch. It’s possible to modify the backpropagation algorithm so that it computes the gradients for all training examples in a mini-batch simultaneously. The idea is that instead of beginning with a single input vector, x, we can begin with a matrix X = [x_1 x_2 … x_m] whose columns are the vectors in the mini-batch.

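Combining all the examples in a mini-batch into one large matrix and computing the gradients in a single pass lets the code take advantage of linear algebra libraries, which greatly reduces running time.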
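
A minimal sketch of this matrix-based idea, assuming sigmoid activations and the quadratic cost. The function name `backprop_batch` and its argument layout are hypothetical, chosen for this illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_batch(weights, biases, X, Y):
    """Quadratic-cost gradients summed over a whole mini-batch.

    X has shape (n_in, m) and Y has shape (n_out, m); each column is one example.
    """
    # Forward pass on all m columns at once.
    activations = [X]
    for W, b in zip(weights, biases):
        Z = W @ activations[-1] + b  # b has shape (n, 1) and broadcasts over columns
        activations.append(sigmoid(Z))

    nabla_w = [None] * len(weights)
    nabla_b = [None] * len(biases)

    # Output-layer error for every example; sigmoid'(z) equals a * (1 - a).
    A = activations[-1]
    delta = (A - Y) * A * (1.0 - A)

    for l in range(len(weights) - 1, -1, -1):
        # The matrix product sums the per-example outer products over the batch.
        nabla_w[l] = delta @ activations[l].T
        nabla_b[l] = delta.sum(axis=1, keepdims=True)
        if l > 0:
            A_prev = activations[l]
            delta = (weights[l].T @ delta) * A_prev * (1.0 - A_prev)
    return nabla_w, nabla_b

# Usage on made-up data: a 5-4-3 network and a mini-batch of 8 random examples.
rng = np.random.default_rng(0)
sizes = [5, 4, 3]
weights = [rng.standard_normal((n, m)) for m, n in zip(sizes[:-1], sizes[1:])]
biases = [rng.standard_normal((n, 1)) for n in sizes[1:]]
X, Y = rng.random((5, 8)), rng.random((3, 8))
grad_w, grad_b = backprop_batch(weights, biases, X, Y)
```

Dividing the returned sums by the number of columns gives the averaged gradient that a mini-batch update step would use.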
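How fast is the BP algorithm?

When BP was first invented, computing power was extremely limited. BP is now widely used in deep learning thanks to the huge increase in computing power and to many useful tricks.
What’s the algorithm really doing?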

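This part discusses BP in more depth and is essentially a proof. A change made at some node early in the network propagates layer by layer toward the output and ends up changing the cost function. The relationship between these two changes can be expressed as:

[equation image not preserved]

Carrying the derivation through layer by layer, it can also be expressed as:

[equation image not preserved]

There is a lot more after that…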
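
As a hedged sketch of what the two relations above look like, here they are in the book's usual notation. Which exact equations the original post showed is an assumption. Here w^l_{jk} is the weight from neuron k in layer l-1 to neuron j in layer l, a^l_j is the activation of neuron j in layer l, C is the cost, and L is the output layer.

```latex
% Effect of a small change in a single weight on the cost
\Delta C \approx \frac{\partial C}{\partial w^l_{jk}} \, \Delta w^l_{jk}

% The same change, expanded over every path from that weight to the output layer
\Delta C \approx \sum_{mnp\ldots q}
  \frac{\partial C}{\partial a^L_m}
  \frac{\partial a^L_m}{\partial a^{L-1}_n}
  \frac{\partial a^{L-1}_n}{\partial a^{L-2}_p}
  \cdots
  \frac{\partial a^{l+1}_q}{\partial a^l_j}
  \frac{\partial a^l_j}{\partial w^l_{jk}} \, \Delta w^l_{jk}
```

Each factor in the second expression is the rate at which one activation responds to the activation in the layer before it, so the sum accumulates the influence of the weight along all routes through the network.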

For the principles behind BP, I recommend Andrew Ng's UFLDL tutorial; related blog posts are also worth reading.

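3. Improving the way neural networks learn

This chapter discusses techniques that speed up BP and improve the performance of neural networks. These techniques/tricks are commonly used when training and optimizing networks and are listed below. (I have not finished organizing my notes on every part, and they are long, so I will split them into several blog posts and publish them one by one at [article link].)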

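Better than the quadratic cost function: the cross-entropy cost function [article link]
Four regularization methods (to improve generalization and avoid overfitting): [article link]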
- L1 regularization
- L2 regularization
- dropout
- artificial expansion of the training data
Weight initialization methods [article link]
How to choose hyperparameters (learning rate, regularization parameter, mini-batch size) [article link]
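
As a small illustration of the first item in the list above, the sketch below compares the weight gradient of a single sigmoid neuron under the quadratic cost C = (a - y)^2 / 2 and under the cross-entropy cost C = -[y ln a + (1 - y) ln(1 - a)]. The function name and the starting values (w = 2, b = 2, input 1, target 0) are assumptions chosen to put the neuron in a badly wrong, saturated state.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def weight_gradients(w, b, x, y):
    """Return (dC/dw under quadratic cost, dC/dw under cross-entropy cost)."""
    a = sigmoid(w * x + b)
    # Quadratic cost: the gradient carries the factor sigmoid'(z) = a * (1 - a),
    # which is tiny when the neuron is saturated.
    quadratic = (a - y) * a * (1.0 - a) * x
    # Cross-entropy cost: the sigmoid'(z) factor cancels, leaving only the error.
    cross_entropy = (a - y) * x
    return quadratic, cross_entropy

quad_grad, xent_grad = weight_gradients(w=2.0, b=2.0, x=1.0, y=0.0)
```

In this setup the output is about 0.98 while the target is 0, yet the quadratic-cost gradient is only about 0.017, compared with about 0.98 for cross-entropy. That gap is why a saturated neuron learns so slowly under the quadratic cost.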