# browserNN.js

Train neural networks in your browser!

Aug. 5, 2018

## Introduction

Inspired by Keras, Andrej Karpathy, and dynet, browserNN.js allows you to train deep neural networks in your own browser. The demo below offers 2D datasets (left canvas) to train neural networks on. The canvas on the right visualizes the transformed feature space of 2 neurons at a given layer in the network. You can make changes to the code below to train different networks and hyperparameters. You can also choose to visualize different layers using the dropdown box under the right canvas. Documentation on how the library works is below. Have fun!

## Demo

Web browser not supported. Try google chrome :)

### Overview

The red and blue points are used to train the neural network, and the network predicts a class for every other pixel in the left canvas. Every $1/10$ of a second the network updates its weights and makes new predictions on the data from the left canvas. The canvas on the right directly visualizes two neurons at a given layer in the network. By default, the demo shows the first two neurons of the $2^{nd}$ layer after a tanh activation function is applied. The neurons $\boldsymbol{h}^{(2)}_0$ and $\boldsymbol{h}^{(2)}_1$ come from the hidden layer $$\boldsymbol{h}^{(2)}=f(\boldsymbol{W}^{(2)}\boldsymbol{h}^{(1)}+b^{(2)}),$$ where $\boldsymbol{h}^{(l)}$ is the output vector of layer $l$, $\boldsymbol{W}^{(l)}$ is the weight matrix of layer $l$, $b^{(l)}$ is the layer's bias parameter, and $f$ is the tanh function. The red and blue points lie on a 2-dimensional grid in which each line represents a strip of pixels running along the x- or y-axis of the left canvas. These lines help us see how the feature space stretches and bends in each layer of the neural network.

### Dense Layer

Every dense layer is decomposed into two separate layers:

1. The linear transformation from the previous layer to the current.
• $\boldsymbol{z}^{(l)}=\boldsymbol{W}^{(l)} \boldsymbol{h}^{(l-1)}+b^{(l)}$
2. The hidden output after applying an activation function to the linear transformation.
• $\boldsymbol{h}^{(l)}=f(\boldsymbol{z}^{(l)})=f(\boldsymbol{W^{(l)}}\boldsymbol{h}^{(l-1)}+b^{(l)})$
This allows us to see the before and after effects of applying activation functions.
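
The two steps above can be sketched in plain JavaScript. This is only an illustration on ordinary arrays, not the library's Tensor-based code: `denseForward` and its arguments are hypothetical names, and the bias is assumed to be one value per neuron.

```javascript
// Hypothetical helper, not part of browserNN.js: one dense layer on plain arrays.
// W holds one row of weights per neuron, hPrev is the previous layer's output,
// b holds one bias per neuron (an assumption), f is the activation function.
function denseForward(W, hPrev, b, f) {
  // Step 1: linear transformation z = W * hPrev + b
  var z = W.map(function (row, i) {
    var sum = b[i];
    for (var j = 0; j < row.length; j++) {
      sum += row[j] * hPrev[j];
    }
    return sum;
  });
  // Step 2: hidden output h = f(z), applied element-wise
  var h = z.map(function (value) { return f(value); });
  return { z: z, h: h };
}

var out = denseForward([[1, 0], [0.5, -0.5]], [2, 4], [0, 1], Math.tanh);
console.log(out.z); // linear output, before the activation
console.log(out.h); // hidden output, after tanh
```

Returning both `z` and `h` mirrors the split described above, so either stage could be drawn on a canvas.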

### Output

If you select the $1^{st}$ layer to visualize you'll notice that it looks no different from the data in the left canvas. This is simply because the network has not yet applied a nonlinear activation function. Once we apply at least one nonlinear function to our data we'll begin to see the data stretch and bend in the network.

Now if we select the Output layer, you'll notice that when the model classifies the data correctly on the left, it has pulled the red and blue points completely apart on the right, to the point where you could draw one straight line between them to separate the two groups. This is the beauty of what neural networks are trying to accomplish :)

## Documentation

### Tensor

Tensor is the building block of the library. It creates a 3-dimensional matrix and stores its weights, derivatives, and dimension details. All inputs, weight matrices, and outputs must be Tensor instances.

#### Example

tensor = new browsernn.Tensor(1, 2, 1, 0.0); // 1x2x1 tensor initialized with zero
tensor = new browsernn.Tensor(28, 28, 3); // 28x28x3 tensor with randomly initialized values

var data = [[10,20,30],[40,50,60]];
tensor = new browsernn.Tensor(2, 3, 1, data); // 2x3x1 tensor initialized with data values

tensor.w[4]; // returns 50
tensor.dw[4]; // returns 0 since no derivative has been computed yet

#### Arguments

• in_n: number of rows
• in_d: number of cols
• in_depth: depth of the tensor
• init_weight: constant or array of weights to initialize each value in the tensor
• seed: integer to use as random seed

#### Properties

A Tensor object has the following properties:
• n: number of rows
• d: number of cols
• depth: depth of the tensor
• n_cells: Total number of values the tensor holds. n_cells = n * d * depth
• w: Array of length n_cells that holds the weights for the tensor.
• dw: Array of length n_cells that holds the derivatives for the tensor.
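
To make the flat `w` array more concrete, here is a small sketch of how a (row, col, depth) position could map to a single index. The row-major layout and the `flatIndex` helper are assumptions for illustration, not taken from the library source; they are chosen because they agree with the example above, where `tensor.w[4]` is 50.

```javascript
// Hypothetical helper, not part of browserNN.js.
// Assumes row-major storage with the depth index varying fastest.
function flatIndex(row, col, k, d, depth) {
  return (row * d + col) * depth + k;
}

var data = [[10, 20, 30], [40, 50, 60]];
var n = 2, d = 3, depth = 1;
var w = new Array(n * d * depth); // n_cells = n * d * depth

for (var r = 0; r < n; r++) {
  for (var c = 0; c < d; c++) {
    w[flatIndex(r, c, 0, d, depth)] = data[r][c];
  }
}

console.log(w[4]); // 50, the value at row 1, col 1
```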

### Model

Model() is used to instantiate a browserNN network.

model = new browsernn.Model();

### Layers

Each layer is defined by an object holding its parameters, and the layer objects are collected in a list. Each network must start with an input layer and end with a loss layer such as softmax or SVM.

#### Example

layers = [];

// multilayer perceptron
layers.push({type: 'input', in_n: 1, in_d: 2, in_depth: 1});
layers.push({type: 'dense', n_neurons: 8, activation: 'relu'});
layers.push({type: 'dense', n_neurons: 4, activation: 'relu', drop_prob: 0.4});
layers.push({type: 'softmax', n_classes: 2});

#### Arguments

• type: Name of the layer
• in_n (input): Number of rows for an input layer
• in_d (input): Number of cols for an input layer
• in_depth (input): Depth of an input layer
• n_neurons (dense): Number of units in a hidden layer
• activation (dense): Name of the activation function applied element-wise
• drop_prob: Fraction of the input units to drop. Float between 0 and 1.
• n_classes (output): Number of classes to predict on
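
The ordering rule (input layer first, loss layer last) is easy to check before training. The sketch below is a hypothetical helper, not something browserNN.js provides; it only knows about the two loss layers named in this document.

```javascript
// Hypothetical check, not part of browserNN.js: verifies that a layer list
// starts with an input layer and ends with a loss layer.
var LOSS_TYPES = ['softmax', 'SVM']; // loss layers named in this document

function validateLayers(layers) {
  if (layers.length < 2) {
    return 'a network needs at least an input layer and a loss layer';
  }
  if (layers[0].type !== 'input') {
    return 'the first layer must be of type input';
  }
  if (LOSS_TYPES.indexOf(layers[layers.length - 1].type) === -1) {
    return 'the last layer must be a loss layer';
  }
  return null; // no problem found
}

var layers = [];
layers.push({type: 'input', in_n: 1, in_d: 2, in_depth: 1});
layers.push({type: 'dense', n_neurons: 8, activation: 'relu'});
layers.push({type: 'softmax', n_classes: 2});
console.log(validateLayers(layers)); // null
```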

### Layer types

#### Input

The input layer must always be defined at the start of a network; it indicates the input shape the model expects. If no in_depth parameter is passed, the model will assume a tensor of size $(n, d, 1)$.

layers.push({type: 'input', in_n: 28, in_d: 28}); // 28x28 input matrix

#### Dense

The dense layer is a densely connected layer that returns the output $\boldsymbol{h} = f(\boldsymbol{W} \cdot \boldsymbol{x} + b)$ where $\boldsymbol{W}$ is the weight matrix created by the current layer, $\boldsymbol{x}$ is the input vector from the previous layer, $b$ is the bias parameter created by the current layer, and $f$ is the activation function. The activation functions currently supported are: linear, sigmoid, relu, and tanh.

layers = [];
layers.push({type: 'dense', n_neurons: 64, activation: 'sigmoid'});
layers.push({type: 'dense', n_neurons: 64, activation: 'relu'});
layers.push({type: 'dense', n_neurons: 64, activation: 'tanh'});
layers.push({type: 'dense', n_neurons: 64, activation: 'linear'});
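
For reference, the four activation functions can be written out as follows. These are the standard textbook definitions, shown as an illustration rather than as the library's own source; the `activations` object is a name made up for this example.

```javascript
// Standard definitions of the four activations, written for illustration.
var activations = {
  linear: function (z) { return z; },
  sigmoid: function (z) { return 1 / (1 + Math.exp(-z)); },
  relu: function (z) { return Math.max(0, z); },
  tanh: function (z) { return Math.tanh(z); }
};

// Apply each activation element-wise to the same inputs to compare them.
var z = [-2, 0, 2];
Object.keys(activations).forEach(function (name) {
  var f = activations[name];
  console.log(name, z.map(function (value) { return f(value); }));
});
```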

#### Softmax

The last layer of each network must be a loss layer, one of which can be softmax. For classification, the softmax layer outputs a probability for each of the n_classes classes and predicts the class with the highest probability.

layers.push({type: 'softmax', n_classes: 10}); // predicts among 10 classes
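
As a rough sketch of what a softmax output computes, the snippet below turns raw class scores into probabilities and picks the most likely class. It is written from the standard definition, not copied from browserNN.js, and the function names are illustrative.

```javascript
// Illustrative sketch of softmax, not the library's source.
function softmax(scores) {
  // Subtracting the largest score keeps Math.exp from overflowing.
  var max = Math.max.apply(null, scores);
  var exps = scores.map(function (s) { return Math.exp(s - max); });
  var total = exps.reduce(function (a, b) { return a + b; }, 0);
  return exps.map(function (e) { return e / total; });
}

// Index of the largest value, i.e. the predicted class.
function argmax(values) {
  var best = 0;
  for (var i = 1; i < values.length; i++) {
    if (values[i] > values[best]) { best = i; }
  }
  return best;
}

var probs = softmax([1.0, 3.0, 0.5]);
console.log(probs);         // probabilities that sum to 1
console.log(argmax(probs)); // 1, the class with the largest score
```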

#### SVM

Similar to the softmax layer, SVM creates a Support Vector Machine layer that outputs a score for each of the n_classes classes.

layers.push({type: 'SVM', n_classes: 2});
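
One common way to train on such scores is a multiclass hinge loss, sketched below. Whether browserNN.js uses exactly this formulation, and a margin of 1, is an assumption; the code only illustrates the general idea of penalizing wrong classes that score within a margin of the correct one.

```javascript
// Illustrative multiclass hinge loss. The formulation and the margin value
// are assumptions, not taken from the browserNN.js source.
function hingeLoss(scores, label, margin) {
  var loss = 0;
  for (var j = 0; j < scores.length; j++) {
    if (j === label) { continue; }
    // Penalize class j if it scores within `margin` of the correct class.
    loss += Math.max(0, scores[j] - scores[label] + margin);
  }
  return loss;
}

console.log(hingeLoss([2.0, -1.0], 0, 1)); // 0: the correct class wins by more than the margin
console.log(hingeLoss([0.5, 1.0], 0, 1));  // 1.5: the wrong class scores higher
```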

### Trainers

Trainer() is the last piece needed to train a model. This class takes the initialized model, the layers, and the training parameters, and begins training once the .fit() method is called. The optimizers currently supported are SGD, Adagrad, Adadelta, and Adam.

#### Example

trainer = new browsernn.Trainer(model, layers, params);

#### Arguments common to all optimizers

• batch_size: Number of samples per gradient update. Defaults to 1.
• seed: Integer to use as random seed.
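
To show how these two arguments typically interact, here is a sketch that shuffles sample indices with a seeded generator and splits them into mini-batches. Both helpers are hypothetical, and the simple linear congruential generator is a stand-in; how browserNN.js actually draws its random numbers is not covered in this document.

```javascript
// Hypothetical helpers, not part of browserNN.js.
// A small linear congruential generator so a given seed repeats the same sequence.
function makeRandom(seed) {
  var state = seed >>> 0;
  return function () {
    state = (state * 1664525 + 1013904223) % 4294967296;
    return state / 4294967296; // value in [0, 1)
  };
}

// Shuffle sample indices (Fisher-Yates) and split them into mini-batches.
function makeBatches(nSamples, batchSize, seed) {
  var random = makeRandom(seed);
  var indices = [];
  for (var i = 0; i < nSamples; i++) { indices.push(i); }
  for (var j = nSamples - 1; j > 0; j--) {
    var k = Math.floor(random() * (j + 1));
    var tmp = indices[j];
    indices[j] = indices[k];
    indices[k] = tmp;
  }
  var batches = [];
  for (var s = 0; s < nSamples; s += batchSize) {
    batches.push(indices.slice(s, s + batchSize));
  }
  return batches;
}

console.log(makeBatches(10, 4, 42)); // three batches of sizes 4, 4, and 2
```

With a batch size of 1, every sample triggers its own gradient update; larger batches average the gradient over more samples per update.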

#### SGD

Stochastic Gradient Descent optimizer with support for momentum.

params = {
optimizer: 'SGD',
learning_rate: 0.01,
momentum: 0.9,
l2_decay: 0.01,
batch_size: 10
};
trainer = new browsernn.Trainer(model, layers, params);

#### Arguments

• learning_rate: Learning rate, $\geq$ 0.
• momentum: Parameter used to accelerate SGD by smoothing out gradients and dampening oscillations.
Note: Values of 0.5, 0.9, or 0.99 are typically recommended for momentum, together with a tuned learning rate. For the 2D datasets above, however, a small momentum of 0.1 is usually sufficient, since the gradients will be relatively small.
• l2_decay: $L_2$ weight penalty. A regularizer that shrinks large weights heavily, which typically leads to slower convergence but stronger generalization.
Note: Try different values of l2_decay in the range 0.0001 - 0.5 to see its effect on the model's ability to learn.
• l1_decay: $L_1$ weight penalty.
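
The way these arguments combine can be sketched as a single update step. This is the classical momentum update with weight-decay terms added to the gradient, written for illustration on plain arrays; the exact form used inside browserNN.js may differ, and `sgdStep` is a hypothetical name.

```javascript
// Illustrative SGD-with-momentum step; not taken from the browserNN.js source.
// w, dw, and velocity are plain arrays of equal length.
function sgdStep(w, dw, velocity, params) {
  for (var i = 0; i < w.length; i++) {
    // Add the L2 and L1 penalty gradients to the data gradient.
    var sign = w[i] > 0 ? 1 : (w[i] < 0 ? -1 : 0);
    var grad = dw[i] + params.l2_decay * w[i] + params.l1_decay * sign;
    // Momentum keeps a running direction, which smooths successive gradients.
    velocity[i] = params.momentum * velocity[i] - params.learning_rate * grad;
    w[i] += velocity[i];
  }
}

var w = [1.0, -2.0];
var velocity = [0, 0];
var params = {learning_rate: 0.1, momentum: 0.9, l2_decay: 0.0, l1_decay: 0.0};
sgdStep(w, [1.0, -1.0], velocity, params);
console.log(w); // each weight has moved against its gradient
```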

#### Adagrad

Adagrad is an optimizer that adapts learning rates based on how frequently a parameter is changed while training. It is similar to SGD with momentum, but it shrinks its learning rate over time by accumulating all historical gradients at each step.

#### Example

params = {
learning_rate: 0.01,
epsilon: 1e-7,
batch_size: 10
};
trainer = new browsernn.Trainer(model, layers, params);

#### Arguments

• learning_rate: Initial learning rate adapted over time.
• epsilon: Parameter used to smooth gradients and prevent division by zero. Defaults to $10^{-7}$.
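
The shrinking step size is easiest to see in code. Below is the standard Adagrad update on plain arrays, given as a sketch rather than the library's implementation; `adagradStep` and `cache` are illustrative names.

```javascript
// Illustrative Adagrad step; not taken from the browserNN.js source.
function adagradStep(w, dw, cache, params) {
  for (var i = 0; i < w.length; i++) {
    cache[i] += dw[i] * dw[i]; // accumulate every squared gradient seen so far
    w[i] -= params.learning_rate * dw[i] / (Math.sqrt(cache[i]) + params.epsilon);
  }
}

var w = [0.0];
var cache = [0.0];
var params = {learning_rate: 0.01, epsilon: 1e-7};
var steps = [];
for (var t = 0; t < 3; t++) {
  var before = w[0];
  adagradStep(w, [1.0], cache, params); // same gradient every time
  steps.push(before - w[0]);            // size of this step
}
console.log(steps); // each step is smaller than the one before
```

Because `cache` only ever grows, the effective learning rate keeps falling even when the gradient stays the same.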

#### Adadelta

Adadelta is an extension of Adagrad that scales the learning rate based on recently accumulated gradients instead of all historical gradients, as Adagrad does. This prevents the learning rate from converging to zero, which typically happens after long periods of training with Adagrad.

#### Example

params = {
learning_rate: 1.0,
epsilon: 1e-7,
rho: 0.95,
batch_size: 10
};
trainer = new browsernn.Trainer(model, layers, params);

#### Arguments

• learning_rate: Initial learning rate adapted over time. Defaults to 1.
Note: This parameter should be left at 1.
• epsilon: Parameter used to smooth gradients and prevent division by zero. Defaults to $10^{-7}$.
• rho: Decay factor, corresponding to fraction of gradients to accumulate at each time step. Recommended to use 0.95.
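
For comparison with Adagrad, here is a sketch of the standard Adadelta update, where rho controls how quickly old gradients are forgotten. It follows the published update rule, not the browserNN.js source, and `adadeltaStep` and the `state` object are hypothetical names.

```javascript
// Illustrative Adadelta step; not taken from the browserNN.js source.
// state.g2 and state.dx2 hold decaying averages of squared gradients and updates.
function adadeltaStep(w, dw, state, params) {
  for (var i = 0; i < w.length; i++) {
    // Decaying average of squared gradients (recent history only).
    state.g2[i] = params.rho * state.g2[i] + (1 - params.rho) * dw[i] * dw[i];
    // Step scaled by the ratio of past update size to recent gradient size.
    var ratio = (state.dx2[i] + params.epsilon) / (state.g2[i] + params.epsilon);
    var dx = -Math.sqrt(ratio) * dw[i];
    // Decaying average of squared updates.
    state.dx2[i] = params.rho * state.dx2[i] + (1 - params.rho) * dx * dx;
    w[i] += params.learning_rate * dx;
  }
}

var w = [1.0];
var state = {g2: [0.0], dx2: [0.0]};
var params = {learning_rate: 1.0, epsilon: 1e-7, rho: 0.95};
adadeltaStep(w, [2.0], state, params);
console.log(w[0]); // slightly below 1.0: the weight moved against the gradient
```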

## Conclusion

This library is purely for educational purposes. It is in no way intended for production use, but rather for myself and others to explore how different hyperparameters and optimization methods affect neural networks at a low level, hence this 2D demo.

### Source

For full source code please see: https://github.com/carbonati/browsernn