Train neural networks in your browser!

Aug. 5, 2018


Inspired by Keras, Andrej Karpathy, and dynet, browserNN.js allows you to train deep neural networks in your own browser. The demo below offers 2D datasets (left canvas) to train neural networks on. The canvas on the right visualizes the transformed feature space of 2 neurons at a given layer in the network. You can make changes to the code below to train different networks and hyperparameters. You can also choose to visualize different layers using the dropdown box under the right canvas. Documentation on how the library works is below. Have fun!




The red and blue points are used to train the neural network, while the remaining pixels in the left canvas are inputs for which the network predicts a class. Every $^1/_{10}$ of a second the network updates its weights and makes new predictions on data from the left canvas. The canvas on the right directly visualizes two neurons at a given layer in the network. For example, the demo defaults to the first two neurons of the $2^{nd}$ layer in the network after applying a tanh activation function. The neurons $\boldsymbol{h}^{(2)}_0$ and $\boldsymbol{h}^{(2)}_1$ come from the hidden layer $$\boldsymbol{h}^{(2)}=f(\boldsymbol{W}^{(2)}\boldsymbol{h}^{(1)}+b^{(2)})$$ where $\boldsymbol{h}^{(l)}$ is the output vector of layer $l$, $\boldsymbol{W}^{(l)}$ is the weight matrix of layer $l$, $b^{(l)}$ is the layer's bias parameter, and $f$ is the tanh function. The red and blue points lie on a 2-dimensional grid where each line represents a strip of pixels that runs along the x- or y-axis of the left canvas. These lines help us see how the feature space stretches and bends in each layer of the neural network.

Dense Layer

Every dense layer is decomposed into two separate layers:

  1. The linear transformation from the previous layer to the current.
    • $\boldsymbol{z}^{(l)}=\boldsymbol{W}^{(l)} \boldsymbol{h}^{(l-1)}+b^{(l)}$
  2. The hidden output after applying an activation function to the linear transformation.
    • $\boldsymbol{h}^{(l)}=f(\boldsymbol{z}^{(l)})=f(\boldsymbol{W^{(l)}}\boldsymbol{h}^{(l-1)}+b^{(l)})$
This allows us to see the before and after effects of applying activation functions.
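The two steps above can be sketched in plain JavaScript. This is only an illustration of the math, not browserNN.js code: `denseForward` is a hypothetical helper, and it assumes the bias is a vector with one entry per neuron.

```javascript
// Hypothetical helper, not part of browserNN.js: computes one dense layer
// and returns both the pre-activation z and the activated output h.
function denseForward(W, hPrev, b, f) {
  // Step 1: z[i] = sum_j W[i][j] * hPrev[j] + b[i]
  const z = W.map((row, i) =>
    row.reduce((sum, w, j) => sum + w * hPrev[j], b[i])
  );
  // Step 2: h[i] = f(z[i])
  const h = z.map((v) => f(v));
  return { z, h };
}

// Example values chosen only for illustration.
const W = [[1, -1], [0.5, 0.5]];
const b = [0, 1];
const out = denseForward(W, [2, 1], b, Math.tanh);
console.log(out.z); // linear transformation, before the activation
console.log(out.h); // hidden output, after tanh
```

Returning `z` and `h` separately mirrors the split described above: one value per neuron before the activation and one after it.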


If you select the $1^{st}$ layer to visualize, you'll notice that it looks no different from the data in the left canvas. This is simply because the network has not yet applied a nonlinear activation function. Once we apply at least one nonlinear function to our data, we'll begin to see the data stretch and bend in the network.

Now, if you select the Output layer, you'll notice that when the model classifies the data on the left correctly, it has pulled the red and blue points completely apart on the right, to the point where a single straight line drawn between them separates the two groups. This is the beauty of what neural networks are trying to accomplish :)



Tensor is the building block of the library. It creates a 3-dimensional matrix and stores its weights, derivatives, and dimension details. All inputs, weight matrices, and outputs must be instances of the Tensor class.


tensor = new browsernn.Tensor(1, 2, 1, 0.0); // 1x2x1 tensor initialized with zero
tensor = new browsernn.Tensor(28, 28, 3); // 28x28x3 tensor with randomly initialized values

var data = [[10,20,30],[40,50,60]];
tensor = new browsernn.Tensor(2, 3, 1, data); // 2x3x1 tensor initialized with data values

tensor.w[4]; // returns 50
tensor.dw[4]; // returns 0 since no derivative has been computed yet
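To see why `tensor.w[4]` is 50 in the example above, here is a small standalone sketch of flat indexing. It assumes a row-major layout with the depth channel varying fastest; the library's actual storage order is not documented here, and `flatIndex` is a hypothetical helper rather than a browserNN.js function.

```javascript
// Hypothetical helper: maps (row, col, channel) to a flat array index,
// assuming row-major order with the depth channel varying fastest.
function flatIndex(row, col, channel, d, depth) {
  return (row * d + col) * depth + channel;
}

// Flatten the 2x3 example data the same way (depth = 1).
const exampleData = [[10, 20, 30], [40, 50, 60]];
const flat = exampleData.flat();          // [10, 20, 30, 40, 50, 60]
const idx = flatIndex(1, 1, 0, 3, 1);     // row 1, column 1, channel 0 -> 4
console.log(flat[idx]);                   // 50
```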



The object Tensor returns has the following properties:


Model() is used to instantiate a browserNN network.

model = new browsernn.Model();


Each layer must be passed into a list as an object containing a set of parameters to define the layer. Each network must start with an input layer and end with a loss layer like softmax, SVM, etc.


layers = [];

// multilayer perceptron
layers.push({type: 'input', in_n: 1, in_d: 2, in_depth: 1});
layers.push({type: 'dense', n_neurons: 8, activation: 'relu'});
layers.push({type: 'dense', n_neurons: 4, activation: 'relu', drop_prob: 0.4});
layers.push({type: 'softmax', n_classes: 2});


Layer types


The input layer must always be defined at the start of a network; it indicates the input shape the model expects. If no in_depth parameter is passed, the model assumes a tensor of size $(n, d, 1)$.

layers.push({type: 'input', in_n: 28, in_d: 28}); // 28x28 input matrix


The dense layer acts as a densely connected layer that returns the output $\boldsymbol{h} = f(\boldsymbol{W} \cdot \boldsymbol{x} + b)$ where $\boldsymbol{W}$ is the weight matrix created by the current layer, $\boldsymbol{x}$ is the input vector from the previous layer, $b$ is the bias parameter created by the current layer, and $f$ is the activation function. The activation functions currently supported are: linear, sigmoid, relu, and tanh.

layers = [];
layers.push({type: 'dense', n_neurons: 64, activation: 'sigmoid'});
layers.push({type: 'dense', n_neurons: 64, activation: 'relu'});
layers.push({type: 'dense', n_neurons: 64, activation: 'tanh'});
layers.push({type: 'dense', n_neurons: 64, activation: 'linear'});


The last layer of each network must be a loss layer, one of which is softmax. For classification, the softmax layer outputs a probability for each class (the number of classes is set by n_classes) and predicts the class with the highest probability.

layers.push({type: 'softmax', n_classes: 10}); // predicts among 10 classes
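As a rough standalone illustration of what a softmax layer computes, the sketch below turns raw class scores into probabilities and picks the most likely class. `softmax` and `argmax` here are hypothetical helpers written for this example, not functions exposed by browserNN.js.

```javascript
// Hypothetical helper: converts raw class scores into probabilities.
function softmax(scores) {
  const max = Math.max(...scores);                    // subtract the max for numerical stability
  const exps = scores.map((s) => Math.exp(s - max));
  const total = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / total);
}

// Hypothetical helper: index of the largest value, i.e. the predicted class.
function argmax(values) {
  return values.indexOf(Math.max(...values));
}

const probs = softmax([2.0, 1.0, 0.1]);
console.log(probs);          // three probabilities that sum to 1
console.log(argmax(probs));  // 0, the class with the highest score
```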


Similar to the softmax layer, SVM creates a Support Vector Machine layer that outputs scores for the n_classes passed through.

layers.push({type: 'SVM', n_classes: 2});
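The text above does not say which SVM loss the layer uses. A common choice is the multiclass hinge loss with a margin of 1, sketched below under that assumption; `hingeLoss` is a hypothetical helper for illustration only.

```javascript
// Hypothetical helper: multiclass hinge loss for one example.
// scores: one raw score per class; y: index of the correct class.
function hingeLoss(scores, y, margin = 1) {
  let loss = 0;
  for (let j = 0; j < scores.length; j++) {
    if (j === y) continue;
    // Penalize every wrong class that comes within `margin` of the correct one.
    loss += Math.max(0, scores[j] - scores[y] + margin);
  }
  return loss;
}

console.log(hingeLoss([3.0, 1.0], 0)); // 0: the correct class wins by more than the margin
console.log(hingeLoss([1.0, 1.5], 0)); // 1.5: the wrong class scores higher
```

Unlike softmax, this loss works on raw scores and stops penalizing an example once the correct class leads by the margin.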


Trainer() is the last piece needed to train a model. This class takes in the initialized model, the layers, and the parameters, and begins training once the method .fit() is called. The optimizers currently supported are SGD, Adagrad, Adadelta, and Adam.


trainer = new browsernn.Trainer(model, layers, params);

Arguments common to all optimizers


Stochastic Gradient Descent optimizer with support for momentum.

params = {
	optimizer: 'SGD',
	learning_rate: 0.01,
	momentum: 0.9,
	l2_decay: 0.01,
	batch_size: 10
};
trainer = new browsernn.Trainer(model, layers, params);
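For intuition, here is a minimal standalone sketch of one SGD-with-momentum update, using the same parameter names as the params object above. It assumes the classical momentum formulation with L2 decay added to the gradient; the library's internal update may differ, and `sgdMomentumStep` is a hypothetical helper.

```javascript
// Hypothetical helper: one SGD-with-momentum step over flat parameter arrays.
// w and velocity are updated in place.
function sgdMomentumStep(w, dw, velocity, params) {
  const { learning_rate, momentum, l2_decay } = params;
  for (let i = 0; i < w.length; i++) {
    const grad = dw[i] + l2_decay * w[i];   // gradient plus L2 penalty term
    velocity[i] = momentum * velocity[i] - learning_rate * grad;
    w[i] += velocity[i];
  }
}

// Example values chosen only for illustration.
const weights = [1.0, -2.0];
const velocity = [0, 0];
const sgdParams = { learning_rate: 0.1, momentum: 0.9, l2_decay: 0 };
sgdMomentumStep(weights, [0.5, -0.5], velocity, sgdParams);
console.log(weights); // each weight has moved against its gradient
```

Calling the step again with the same gradient moves the weights further than the first call did, because the velocity carries over between steps.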



Adagrad is an optimizer that adapts learning rates based on how frequently a parameter is updated during training. It is similar to SGD with momentum; however, it shrinks its learning rate over time by accumulating all historical gradients at each step.


params = {
	optimizer: 'adagrad',
	learning_rate: 0.01,
	epsilon: 1e-7,
	batch_size: 10
};
trainer = new browsernn.Trainer(model, layers, params);
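The shrinking step size can be seen in a small standalone sketch of the standard Adagrad update, which accumulates squared gradients per parameter. `adagradStep` is a hypothetical helper that reuses the parameter names above; it is not the library's own code.

```javascript
// Hypothetical helper: one Adagrad step over flat parameter arrays.
// cache holds the running sum of squared gradients for each parameter.
function adagradStep(w, dw, cache, params) {
  const { learning_rate, epsilon } = params;
  for (let i = 0; i < w.length; i++) {
    cache[i] += dw[i] * dw[i];
    // The effective step shrinks as the accumulated squared gradient grows.
    w[i] -= (learning_rate * dw[i]) / (Math.sqrt(cache[i]) + epsilon);
  }
}

// Example values chosen only for illustration.
const adaW = [1.0];
const adaCache = [0];
const adaParams = { learning_rate: 0.1, epsilon: 1e-7 };
adagradStep(adaW, [2.0], adaCache, adaParams);
const firstStep = 1.0 - adaW[0];
adagradStep(adaW, [2.0], adaCache, adaParams);
const secondStep = 1.0 - firstStep - adaW[0];
console.log(firstStep, secondStep); // the second step is smaller than the first
```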



Adadelta is an extension of Adagrad that scales the learning rate based on recently accumulated gradients instead of all historical gradients, as Adagrad does. This prevents the learning rate from converging to zero, which typically happens after long periods of training with Adagrad.


params = {
	optimizer: 'adadelta',
	learning_rate: 1.0,
	epsilon: 1e-7,
	rho: 0.95,
	batch_size: 10
};
trainer = new browsernn.Trainer(model, layers, params);
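A minimal sketch of the standard Adadelta update follows, assuming decaying averages of squared gradients and squared updates controlled by rho. `adadeltaStep` and the `state` object are hypothetical names for this example; the library's implementation may be organized differently.

```javascript
// Hypothetical helper: one Adadelta step over flat parameter arrays.
// state.gradSq: decaying average of squared gradients.
// state.deltaSq: decaying average of squared updates.
function adadeltaStep(w, dw, state, params) {
  const { learning_rate, rho, epsilon } = params;
  for (let i = 0; i < w.length; i++) {
    state.gradSq[i] = rho * state.gradSq[i] + (1 - rho) * dw[i] * dw[i];
    // Scale the gradient by the ratio of recent update size to recent gradient size.
    const delta =
      -Math.sqrt((state.deltaSq[i] + epsilon) / (state.gradSq[i] + epsilon)) * dw[i];
    state.deltaSq[i] = rho * state.deltaSq[i] + (1 - rho) * delta * delta;
    w[i] += learning_rate * delta;
  }
}

// Example values chosen only for illustration.
const deltaW = [1.0];
const deltaState = { gradSq: [0], deltaSq: [0] };
const deltaParams = { learning_rate: 1.0, rho: 0.95, epsilon: 1e-7 };
adadeltaStep(deltaW, [1.0], deltaState, deltaParams);
console.log(deltaW[0]); // slightly below 1: the weight moved against the gradient
```

Because old squared gradients decay by rho at every step, the denominator does not grow without bound the way Adagrad's accumulator does.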



This library is purely for educational purposes. It is in no way intended for production use, but rather for myself and others to explore how different hyperparameters and optimization methods affect neural networks at a low level, hence this 2D demo.


For full source code please see: