Neural Networks
I have implemented the Backpropagation (BP) algorithm in Stochastic Gradient Descent (SGD) mode, with two layers of sigmoid units (one hidden layer). It has been used in the stopping module for Coreference Resolution. [1]
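The original implementation is not shown here, so the following is a minimal sketch of such a network: one hidden layer of sigmoid units, trained by backpropagation in SGD mode (weights updated after every single training example). All function names, the learning rate, and the hidden-layer size are illustrative assumptions, not the actual code.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_sgd(X, y, n_hidden=4, lr=0.5, epochs=2000, seed=0):
    """Train a 1-hidden-layer sigmoid network by backpropagation in SGD mode."""
    rng = np.random.default_rng(seed)
    n_in = X.shape[1]
    # Small random initial weights; the extra column holds the bias term.
    W1 = rng.uniform(-0.5, 0.5, (n_hidden, n_in + 1))
    W2 = rng.uniform(-0.5, 0.5, (1, n_hidden + 1))
    for _ in range(epochs):
        for x, t in zip(X, y):           # stochastic: one example at a time
            x1 = np.append(x, 1.0)       # input plus bias
            h = sigmoid(W1 @ x1)         # hidden-layer activations
            h1 = np.append(h, 1.0)
            o = sigmoid(W2 @ h1)         # output-layer activation
            # Backpropagate the error through the sigmoid derivatives.
            delta_o = (t - o) * o * (1 - o)
            delta_h = h * (1 - h) * (W2[:, :-1].T @ delta_o)
            # Update weights immediately (SGD mode).
            W2 += lr * np.outer(delta_o, h1)
            W1 += lr * np.outer(delta_h, x1)
    return W1, W2

def predict(W1, W2, x):
    h = sigmoid(W1 @ np.append(x, 1.0))
    return sigmoid(W2 @ np.append(h, 1.0))[0]
```

For example, trained on the four input/target pairs of logical AND, the network learns to output values above 0.5 only for the input (1, 1).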
There are two strategies for optimization:
(1) Adding momentum when updating the weights during the training phase. The momentum term makes the weight update on the nth iteration depend partially on the update that occurred during the (n-1)th iteration. Thus, momentum can sometimes carry the gradient descent procedure through narrow local minima.
(2) Setting a "fSpeedUp" constant to accelerate training: when the search enters a flat region of the error surface, "fSpeedUp" lengthens the search step. Once the search leaves the flat region, the step is restored to its original length.
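The two strategies above can be sketched together in a single update routine. The momentum rule follows Mitchell (1997); the handling of "fSpeedUp" is my assumed interpretation (stretch the step while the gradient stays below a flatness threshold, revert otherwise), since the original code is not shown. The names `make_updater`, `f_speed_up`, and `flat_threshold` are all hypothetical.

```python
import numpy as np

def make_updater(lr=0.1, alpha=0.9, f_speed_up=4.0, flat_threshold=1e-3):
    """Return a weight-update function with momentum and flat-region speed-up."""
    state = {"prev_delta": None}

    def update(w, grad):
        # Assumed fSpeedUp behavior: lengthen the step inside flat regions
        # (tiny gradient); restore the original step once the gradient grows.
        step = lr * f_speed_up if np.max(np.abs(grad)) < flat_threshold else lr
        delta = -step * grad
        if state["prev_delta"] is not None:
            # Momentum: the nth update depends partially on the (n-1)th one.
            delta += alpha * state["prev_delta"]
        state["prev_delta"] = delta
        return w + delta

    return update
```

As a usage example, iterating this updater on the gradient of f(w) = w^2 drives w toward the minimum at 0; the momentum term lets the iterate coast through shallow regions rather than stalling.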
Reference:
[1] Tom Mitchell, "Machine Learning", McGraw-Hill, 1997.
