Train neural networks in your browser!
Aug. 5, 2018
The red and blue points are used to train the neural network, while the rest of the pixels in the left canvas are used to predict which class they should belong to. Every $^1/_{10}$ of a second the network updates its weights and makes new predictions on data from the left canvas. The canvas on the right is a direct visual of what two neurons look like at a given layer in the network. For example, the demo defaults to the first two neurons of the $2^{nd}$ layer in the network after applying a tanh activation function. The neurons $\boldsymbol{h}^{(2)}_0$ and $\boldsymbol{h}^{(2)}_1$ come from the hidden layer $$\boldsymbol{h}^{(2)}=f(\boldsymbol{W}^{(2)}\boldsymbol{h}^{(1)}+\boldsymbol{b}^{(2)}),$$ where $\boldsymbol{h}^{(l)}$ is the output vector of layer $l$, $\boldsymbol{W}^{(l)}$ is the weight matrix of layer $l$, $\boldsymbol{b}^{(l)}$ is the layer's bias vector, and $f$ is the tanh function. The red and blue points lie on a 2-dimensional grid where each line represents a strip of pixels that runs along the x or y-axis of the left canvas. These lines help us see how the feature space stretches and bends in each layer of the neural network.
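Concretely, each of the two neurons drawn on the right canvas is just one component of that hidden vector, so every input point gets mapped to the coordinates $$\left(\boldsymbol{h}^{(2)}_0,\ \boldsymbol{h}^{(2)}_1\right) = \left(\tanh\Big(\textstyle\sum_j W^{(2)}_{0j} h^{(1)}_j + b^{(2)}_0\Big),\ \tanh\Big(\textstyle\sum_j W^{(2)}_{1j} h^{(1)}_j + b^{(2)}_1\Big)\right),$$ which is exactly the position it is plotted at on the right.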
Every dense layer is decomposed into two separate layers in the visualization: the linear transformation and the nonlinear activation that follows it.
If you select the $1^{st}$ layer to visualize you'll notice that it looks no different from the data in the left canvas. This is simply because the network has not yet applied a nonlinear activation function. Once we apply at least one nonlinear function to our data we'll begin to see the data stretch and bend in the network.
Now if we select the Output layer you'll notice that when the model is able to classify the data correctly on the left, it has actually pulled the red and blue points completely apart from each other on the right, to the point where you could draw a single straight line between them to accurately separate the two groups. This is the beauty of what neural networks are trying to accomplish :)
Tensor
is the building block of the library. It creates a 3-dimensional matrix and stores its weights, derivatives, and dimension details. All inputs, weight matrices, and outputs must be instances of the Tensor class.
tensor = new browsernn.Tensor(1, 2, 1, 0.0); // 1x2x1 tensor initialized with zero
tensor = new browsernn.Tensor(28, 28, 3); // 28x28x3 tensor with randomly initialized values
var data = [[10,20,30],[40,50,60]];
tensor = new browsernn.Tensor(2, 3, 1, data); // 2x3x1 tensor initialized with data values
tensor.w[4]; // returns 50
tensor.dw[4]; // returns 0 since no derivative has been computed yet
The Tensor returned has the following properties:
n_cells: the total number of cells in the tensor, equal to n * d * depth.
w: an array of length n_cells that holds the weights for the tensor.
dw: an array of length n_cells that holds the derivatives for the tensor.
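For example, using the 2x3x1 tensor created above (assuming these properties are exposed directly on the instance, as the w and dw examples suggest):
tensor = new browsernn.Tensor(2, 3, 1, data); // same 2x3x1 tensor as before
tensor.n_cells; // returns 6, i.e. 2 * 3 * 1
tensor.w.length; // returns 6 - one weight per cell
tensor.dw.length; // returns 6 - one derivative per cell, all zero until gradients are computed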
Model()
is used to instantiate a browserNN network.
model = new browsernn.Model();
Each layer must be passed into a list as an object containing a set of parameters to define the layer. Each network must start with an input layer and end with a loss layer like softmax, SVM, etc.
layers = [];
// multilayer perceptron
layers.push({type: 'input', in_n: 1, in_d: 2, in_depth: 1});
layers.push({type: 'dense', n_neurons: 8, activation: 'relu'});
layers.push({type: 'dense', n_neurons: 4, activation: 'relu', drop_prob: 0.4});
layers.push({type: 'softmax', n_classes: 2});
The input layer must always be defined at the start of a network, and indicates the expected input shape that will be passed into the model. If no parameter for in_depth is passed, the model will assume a tensor of size $(n, d, 1)$.
layers.push({type: 'input', in_n: 28, in_d: 28}); // 28x28 input matrix
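If the input does have depth, say an RGB image, in_depth can be set explicitly (this simply mirrors the 28x28x3 tensor example from earlier):
layers.push({type: 'input', in_n: 28, in_d: 28, in_depth: 3}); // 28x28x3 input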
The dense layer acts as a densely-connected layer that returns the output $\boldsymbol{h} = f(\boldsymbol{W} \cdot \boldsymbol{x} + b)$ where $\boldsymbol{W}$ is the weight matrix created by the current layer, $\boldsymbol{x}$ is the input vector from the previous layer, $b$ is the bias parameter created by the current layer, and $f$ is the activation function. The activation functions currently supported are: linear, sigmoid, relu, and tanh.
layers = [];
layers.push({type: 'dense', n_neurons: 64, activation: 'sigmoid'});
layers.push({type: 'dense', n_neurons: 64, activation: 'relu'});
layers.push({type: 'dense', n_neurons: 64, activation: 'tanh'});
layers.push({type: 'dense', n_neurons: 64, activation: 'linear'});
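A dense layer can also take a drop_prob parameter, as in the multilayer perceptron example above; presumably this applies dropout to the layer's outputs with the given probability:
layers.push({type: 'dense', n_neurons: 64, activation: 'relu', drop_prob: 0.4}); // drop_prob: 0.4 assumed to drop ~40% of the activations during training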
The last layer of each network must be a loss layer, one of which can be softmax. For classification the softmax layer outputs a probability for each of the N classes defined by n_classes and predicts the class with the highest probability.
layers.push({type: 'softmax', n_classes: 10}); // predicts among 10 classes
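As a reminder, this is just the standard softmax function (nothing specific to browsernn): given the raw scores $z_1, \dots, z_N$ from the previous layer, the probability of class $i$ is $$p_i = \frac{e^{z_i}}{\sum_{j=1}^{N} e^{z_j}},$$ and the predicted class is simply the $i$ with the largest $p_i$.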
Similar to the softmax layer, SVM creates a Support Vector Machine layer that outputs scores for the n_classes passed through.
layers.push({type: 'SVM', n_classes: 2});
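For intuition, an SVM output layer of this kind is commonly trained with a multiclass hinge loss of roughly the following form (the margin of 1 and the exact formulation are assumptions, not something stated by the library): $$L = \sum_{j \neq y} \max\left(0,\ s_j - s_y + 1\right),$$ where $s_j$ is the score output for class $j$ and $y$ is the correct class, so the loss is zero once the correct class beats every other score by the margin.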
Trainer()
is the final component needed to train a model. This class takes in the initialized model, the layers, and a set of parameters, and begins training once the method .fit() is called. The optimizers currently supported are SGD, Adagrad, Adadelta, and Adam.
trainer = new browsernn.Trainer(model, layers, params);
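Putting it all together, a minimal end-to-end sketch looks like the following. The exact arguments .fit() expects aren't shown in this post, so the call below (one input Tensor plus its class label) is an assumption - check the repository for the actual signature.
model = new browsernn.Model();
layers = [];
layers.push({type: 'input', in_n: 1, in_d: 2, in_depth: 1});
layers.push({type: 'dense', n_neurons: 8, activation: 'tanh'});
layers.push({type: 'softmax', n_classes: 2});
params = {optimizer: 'SGD', learning_rate: 0.01, momentum: 0.1, batch_size: 10};
trainer = new browsernn.Trainer(model, layers, params);
x = new browsernn.Tensor(1, 2, 1, [[0.5, -1.2]]); // a single 2D point
trainer.fit(x, 0); // assumed signature: (input tensor, class label)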
Stochastic Gradient Descent optimizer with support for momentum.
params = {
optimizer: 'SGD',
learning_rate: 0.01,
momentum: 0.9,
l2_decay: 0.01,
batch_size: 10
};
trainer = new browsernn.Trainer(model, layers, params);
Feel free to experiment with different values of momentum and variations of the learning rate; however, for the 2D datasets above a small momentum of 0.1 is typically sufficient, since the gradients will be relatively small.
You can also try values of l2_decay in the range 0.0001 - 0.5 to see its effect on the model's ability to learn.
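For reference, the textbook SGD-with-momentum update with L2 weight decay (the library's exact implementation may differ slightly) is $$v_t = \mu v_{t-1} - \eta \left( \nabla_{\theta} L + \lambda \theta_{t-1} \right), \qquad \theta_t = \theta_{t-1} + v_t,$$ where $\mu$ is momentum, $\eta$ is learning_rate, and $\lambda$ is l2_decay.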
Adagrad is an optimizer that adapts learning rates based on how frequently a parameter is updated during training. It is similar to SGD with momentum; however, it shrinks its learning rate over time by accumulating all historical squared gradients at each step.
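In the standard formulation (again, textbook Adagrad rather than the library's exact code), each parameter keeps a running sum of its squared gradients and divides its step by the square root of that sum: $$G_t = G_{t-1} + g_t^2, \qquad \theta_t = \theta_{t-1} - \frac{\eta}{\sqrt{G_t} + \epsilon}\, g_t,$$ where $\epsilon$ is the epsilon parameter below, added for numerical stability.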
params = {
optimizer: 'adagrad',
learning_rate: 0.01,
epsilon: 1e-7,
batch_size: 10
};
trainer = new browsernn.Trainer(model, layers, params);
Adadelta is an extension of Adagrad that scales the learning rate based on a window of recently accumulated gradients instead of all historical gradients the way adagrad does. This prevents the learning rate from converging to zero, which typically happens after long periods of training with Adagrad.
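The window of recent gradients is typically kept as an exponentially decaying average of squared gradients controlled by rho (the textbook accumulator; the library's exact update may differ): $$E[g^2]_t = \rho\, E[g^2]_{t-1} + (1 - \rho)\, g_t^2,$$ so with rho: 0.95 the effective window is roughly $1/(1-\rho) = 20$ recent steps, and older gradients decay away instead of accumulating forever.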
params = {
optimizer: 'adadelta',
learning_rate: 1.0,
epsilon: 1e-7,
rho: 0.95,
batch_size: 10
};
trainer = new browsernn.Trainer(model, layers, params);
This library is simply for educational purposes. It is in no way intended for production use, but rather for myself and others to explore how different hyperparameters and optimization methods affect neural networks at a low level - hence this 2D demo.
For full source code please see: https://github.com/carbonati/browsernn