Sunday, 22 June 2014

Mid-term Summary 
GSoC 2014: Extending Neural Networks Module for Scikit-Learn

The objective is to implement neural network algorithms in a clean, well-tested code using the scikit-learn API. The algorithms are meant to be user-friendly and easy to edit and scale for those who wish to extend, or debug them if necessary.

Since the start of GSoC 2014 until now, I completed two modules, multi-layer perceptron (mlp) #3204 and mlp with pre-training #3281, which are pending final review for merging. I also implemented the extreme learning machine (elm) algorithm #3306 which hasn't been reviewed yet and more components such as test files, examples, and documentations are required. However, I am confident that I will complete it by the deadline I set in the proposal.

In the following three sections, I will explain the modules in more detail.

1) Multi-layer perceptron  #3204
Multi-layer perceptron (MLP) supports more than one hidden layer allowing it to construct highly non-linear functions. Figure 1 displays an MLP with 1 hidden layer.

Figure 1

To define the number of hidden layers and their neurons, one can simply run the following statement.

The list '[150, 100]' means that two hidden layers are constructed with 150 and 100, neurons respectively.

Further, MLP can be used for reinforcement learning where each time step makes a new training sample. It can use the `partial_fit` method to update its weights on per sample basis in real-time (stochastic update).

MLP also consists of a regularization term `alpha` as part of its parameters, whose value determines the degree of non-linearity the function is meant to have. Therefore, if the algorithm is overfitting, it is desirable to increase `alpha` to have a more linear function. Figure 2 demonstrates  the decision boundaries learnt by mlp with different alpha values.

Figure 2

Figure 2 shows that the higher the value of `alpha`, the less curves the decision boundary will have.

The implementation has passed through various test cases to prevent unexpected behavior. One of the test cases involves comparing between the algorithm's analytic computation of the gradient and its numerical computation. Since the difference between the values was found to be at most a very small value means the backpropagation algorithm is working as expected.

2) MLP with pre-training #3281

One issue with MLP is that it involves random weights' initialization. The weights could land in a poor position in the optimization (see Figure 3) whose final solutions are not as good as they could be.

Figure 3

Pre-training is one scheme to have the initial weights land in a better start. Restricted boltzmann machines (RBMs) can find such initial weights. Figure 4 shows the process of pre-training.

Figure 4
For each layer in the network, there is an RBM that trains on the inputs given for that layer. The final weights of the RBM are given as the initial weights of the corresponding layer in the network.

Running an example of pre-training has showed that RBMs can improve the final performance. For instance, on the digits the dataset, the following results were obtained.

1) Testing accuracy of mlp without pretraining: 0.964
2) Testing accuracy of mlp with pretraining: 0.978

3) Extreme learning machine (elm) #3306 

Much of the criticism towards MLP is in its long training time. MLP uses the slow gradient descent to updates its weights iteratively, involving many demanding computations.

Extreme learning machines (ELMs) [1], on the other hand, can train single hidden layer feedforward networks (SLFNs) using least square solutions instead of gradient descent. This scheme requires only few matrix operations, making it much faster than gradient descent. It also has a strong generalization power since it uses least-squares to find its solutions.

The algorithm has been implemented and it passed the travis tests. But it still awaits more thorough review and test files to anticipate errors.

I believe I will finalize the module by 29 June as per the proposal.

Remaining work

In the remaining weeks my tasks are broken down as follows.

Week 7, 8 (June 30 - July 13)

I will implement and revise regularized ELMs [3] and weighted ELMs [4], and extend the ELMs documentation.

Week 9, 10  (July 14- July 27)

I will implement and revise Sequential ELMs [2], and extend the ELMs documentation.

Week 11, 12 (July 28- August 10)

I will implement and revise Kernel-Based ELMs, and extend the ELMs documentation.

Week 13 - Wrap-up


I would like to thank my mentors and reviewers including @ogrisel, @larsmans @jnothman, @kasternkyle, @AlexanderFabisch for dedicating their time in providing useful feedback and comments, making sure the work meets high-quality standards. I sincerely appreciate the time PSF admins take to oversee the contributers as it encourages us to set a higher bar for quality work. I would also like to thank GSoC 2014, as this wouldn't have been possible if it hadn't been for their support and motivation.





[4]   Zong, Weiwei, Guang-Bin Huang, and Yiqiang Chen. "Weighted extreme learning machine for imbalance learning." Neurocomputing 101 (2013): 229-242.

Thursday, 12 June 2014

(Week 3) GSoC 2014: Extending Neural Networks Module for Scikit learn

This week, with the help of many reviewers I completed a user friendly multi-layer perceptron algorithm in Python. While it is still a Pull Request , the algorithm can be downloaded by following these steps,

1) git clone
2) cd scikit-learn/
3) git fetch origin refs/pull/3204/head:mlp
4) git checkout mlp

Creating an MLP classifier is easy. First, import the scikit-learn library; then, initialize an MLP classifier by executing these statements,

  from sklearn.neural_network import MultilayerPerceptronClassifier
  clf = MultilayerPerceptronClassifier()

If you'd like to have 3 hidden layers of sizes 150-100-50, create an MLP object using this statement,

   clf = MultilayerPerceptronClassifier(n_hidden=[150, 100, 50])

Training and testing are done the same way as any learning algorithm in scikit-learn.

In addition, you can tune many parameters for your classifier. Some of the interesting ones are,

1) the algorithm parameter, which allows users to select the type of algorithm for optimizing the neural network weights, which is either stochastic gradient descent SGD and l-bfgs; and

2) the max_iter parameter, which allows users to set the number of maximum iterations the network can run.

After training mlp, you can view the minimum cost achieved by printing mlp.cost_. This gives an idea of how well the algorithm has trained.

The implementation passed high standard tests and achieved expected performance for the MNIST dataset. MLP with one hidden layer of 200 neurons, and 400 iterations achieved great results in the MNIST benchmark compared to other algorithms shown below,

 Classification performance                                                         
 Classifier                         train-time                          test-time                       error-rate
  Multi layer Perceptron         655.5s                            0.30s                            0.0169
  nystroem_approx_svm         125.0s                            0.91s                            0.0239
  ExtraTrees                             79.9s                           0.34s                            0.0272
  fourier_approx_svm             148.9s                            0.60s                            0.0488
  LogisticRegression                68.9s                            0.14s                            0.0799  

For next week, I will implement extreme learning machine algorithms. These use the least-square solution approach for training neural networks, and therefore they are both quick and efficient with interesting applications.

Sunday, 1 June 2014

(Week 2) GSoC 2014: Extending Neural Networks Module for Scikit learn

I apologize for not posting my progress in GSoC 2014 for the past week. I was not aware of the weekly blog post requirement. However, I have been posting my progress in the Github link:

From this week onward, I will announce my weekly progress in detail.

To get an overview of what I have done so far,

1) I fixed the issues with hyperbolic tan activation function for the multi-layer perceptron pull-request I am implementing for scikit-learn; and

2) I extended multi-layer perceptron to support more than one hidden layer with additional unit tests to make sure the algorithm runs correctly.

While I was expected to finish the documentation for multi-layer perceptron (MLP)in the first week, my mentor and I decided to first extend it to support more than one hidden layer.

For next week, I will complete the documentation for the extended MLP as well as address any comments my mentors will issue for the implementation.

I am saving the detailed description of the algorithm for the documentation. Once completed, I will post the documentation here in full.

Thank you.

Relevant Github links,