
Introduction to Machine Learning – Perceptron Algorithm

The perceptron algorithm is one of the groundbreaking developments in machine learning, especially in supervised learning.

It was invented in 1958 by Rosenblatt and is still used as the basis of very powerful algorithms like neural networks.

The main idea behind it is a simple optimization problem. The following description belongs to perceptrons, but it tells a lot about most other machine learning algorithms as well.

  1. Define your error function.
  2. Minimize your error function.

Error Function

E(w) = -Σ_{m ∈ M} y_m · wᵀx_m

where M is the set of misclassified points

To break it down, let's first take a look at the inside of the parentheses. The first thing we should understand is the variable w.

Weight Vector

The above-mentioned weight vector w – or scalar, depending on the dimensionality of our dataset – is everything.

A weight vector decides how important a single data feature is.


Imagine a data set of 10 000 runners, where we want to predict whether each runner is an amateur or a professional.

| Name | Leg Length (cm) | Lung Volume (cm^3) | Nationality | Hair Length | Shoe Size | Mobile Carrier | Amateur |
|------|-----------------|--------------------|-------------|-------------|-----------|----------------|---------|
| Joe  | 104             | …                  | …           | …           | …         | …              | …       |

Imaginary data set

Going back to the perceptron and its math: the first 7 columns each represent a feature in our dataset, and the last column is the corresponding class of each item.

Here, x1, x2 … x7 each correspond to a feature in our data set. If we take the first item in the data set,

  • x1 will be Joe
  • x2 will be 104, etc.

To give a more intuitive feel for weights and the perceptron: I suppose some of you have already figured out that the weight w2 (leg length) would have a bigger value than the one for mobile carrier, because leg length obviously has more effect on an athlete's performance.


Suppose we now have a weight vector – not necessarily the most optimal one – and we insert the data for the test object "Joe" into our perceptron.

This is a very straightforward process, as one can imagine.

  1. Multiply each weight wi with its feature value xi
  2. Sum up the weighted feature values
  3. Use that sum as input to a non-linear function f(.)
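These three steps can be sketched in a few lines of NumPy; the weights and the numerically encoded feature values below are made up purely for illustration:

```python
import numpy as np

# hypothetical weights and numerically encoded features for one runner
w = np.array([0.0, 0.8, 0.5, 0.1, 0.0, 0.2, 0.0])        # one weight per feature
x = np.array([1.0, 104.0, 5500.0, 2.0, 3.0, 44.0, 7.0])  # one runner's features

weighted_sum = np.dot(w, x)          # steps 1 and 2: multiply and sum
prediction = np.sign(weighted_sum)   # step 3: non-linear function f(.)
print(prediction)  # +1.0 -> professional, -1.0 -> amateur
```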

But which non-linear function f should we use with the perceptron?

For the perceptron, f is the sign function:

f(a) = +1 if a ≥ 0, and -1 otherwise

It means that our final function – or predictor, activation function, etc. – must output +1 if the test object is a professional and -1 if an amateur.

But, given a random weight vector, f(x) does not always result in the correct classification.

Going back to the Error Function

We need to find a weight vector w such that each test object is classified correctly.

E(w) = -Σ_{m ∈ M} y_m · wᵀx_m

where M is the set of misclassified points:
  • y_m is the expected class (+1 or -1)
  • wᵀx_m is our prediction score, where w is the vector containing w1, w2, …

Since M is the set of misclassified points, wᵀx_m · y_m can never be a positive value: if y_m is +1, the sign of wᵀx_m is -1, and vice versa. For the sake of simplicity, we negate the whole term and get ourselves a nice positive value to minimize.
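As a minimal sketch, this error can be computed directly from the definition; the toy data and the (deliberately bad) weight vector below are made up:

```python
import numpy as np

# made-up toy data: 3 points with 2 features each
X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [-1.0, -2.0]])
Y = np.array([1, 1, -1])           # true classes y_m
w = np.array([1.0, -1.0])          # a (bad) candidate weight vector

scores = X @ w                     # w^T x_m for every point
M = np.sign(scores) != Y           # boolean mask of misclassified points
error = -np.sum(Y[M] * scores[M])  # E(w) = -sum over M of y_m * w^T x_m
print(error)  # 2.0
```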

Now that we have a function of w that we can minimize, we are almost done.

Let’s proceed to training our model…

Gradient Descent

This is the core update step of the famous gradient descent algorithm.
  1. Initialize w_old – could be random values or 1/n
  2. While there is a misclassified data point:
    • Pick a single "random" misclassified point x_m
    • Descend in the direction of the negative gradient at that single point
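At a single misclassified point x_m, the gradient of the error term -y_m · wᵀx_m with respect to w is -y_m · x_m, so one descent step is simply w_new = w_old + η · y_m · x_m. A sketch with made-up numbers:

```python
import numpy as np

w = np.array([0.5, -0.5])   # current weight vector
xm = np.array([1.0, 2.0])   # a misclassified point
ym = 1                      # its expected class
eta = 0.1                   # learning rate

w_new = w + eta * ym * xm   # step against the gradient -ym * xm
print(w_new)
```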

Learning Rate

The learning rate η mostly affects the convergence speed.

  • But with a variable learning rate, convergence is proven for non-linearly separable sets as well.
  • A good convergence speed can be achieved with η(t) = 1/t
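As a quick sketch, the schedule η(t) = 1/t simply shrinks the step size with each iteration t:

```python
etas = [1.0 / t for t in range(1, 6)]  # eta(t) = 1/t for t = 1..5
print(etas)
```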

Problems of Perceptron

  • Each update might lead to new misclassifications.
  • Many solutions might exist
  • Convergence might be slow
  • Only solves linearly separable problems
  • Does not give the same result every time, because xm is chosen randomly

Tips and Notes

  • After a single update, the error at that data point is always reduced
  • If there is a solution – i.e. if the set is linearly separable – the perceptron always finds a solution
  • Perceptron solves Binary Separation Problems

Full Batch GD vs Stochastic GD

| Full Batch GD                                      | Stochastic GD                                   |
|----------------------------------------------------|-------------------------------------------------|
| Weight update is calculated with all training data | Weight update is the gradient of a single point |
| More stable                                        | Faster, state of the art                        |
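The difference between the two update rules is only in how many points contribute to the gradient; a sketch using the same kind of made-up toy data as before:

```python
import numpy as np

X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [-1.0, -2.0]])   # 3 points, 2 features
Y = np.array([1, 1, -1])
w = np.array([1.0, -1.0])
eta = 0.1

miss = np.sign(X @ w) != Y     # mask of misclassified points

# full-batch GD: gradient summed over all misclassified points
batch_step = eta * np.sum(Y[miss][:, None] * X[miss], axis=0)

# stochastic GD: gradient of a single misclassified point
m = np.flatnonzero(miss)[0]
sgd_step = eta * Y[m] * X[m]

print(batch_step, sgd_step)
```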

Bonus – Perceptron Code

import numpy as np

def train_perceptron(X, Y, iterations=200, eta=.1, option=0):
    ''' Trains a linear perceptron
    Definition:  w, b, acc  = train_perceptron(X,Y,iterations=200,eta=.1)
    Input:       X       -  DxN array of N data points with D features
                 Y       -  1D array of length N of class labels {-1, 1}
                 iterations -  optional, number of iterations, default 200
                 eta     -  optional, learning rate, default 0.1
                 option  -  optional, defines how eta is updated in each iteration
    Output:      w       -  1D array of length D, weight vector
                 b       -  bias term for linear classification
                 acc     -  1D array of length iterations, contains classification
                            accuracies after each iteration
                            Accuracy = #correctly classified points / N
    '''
    assert option in (0, 1, 2)
    np.random.seed(1)  # do not change
    acc = np.zeros(iterations)
    # include the bias term by adding a row of ones to X
    X = np.concatenate((np.ones((1, X.shape[1])), X))
    # initialize weight vector
    weights = np.ones((X.shape[0])) / X.shape[0]
    for it in np.arange(iterations):
        # indices of misclassified data points
        wrong = (np.sign(weights @ X) != Y).nonzero()[0]
        # classification accuracy after this iteration
        acc[it] = (X.shape[1] - wrong.shape[0]) / X.shape[1]

        if wrong.shape[0] > 0:
            # pick a random misclassified data point
            point = wrong[np.random.randint(wrong.shape[0])]
            xm = X[:, point]
            ym = Y[point]

            # update weight vector, using the chosen learning-rate schedule
            if option == 0:
                new_eta = eta / (1 + it)   # decreasing learning rate
            elif option == 1:
                new_eta = eta              # constant learning rate
            else:
                new_eta = eta * (1 + it)   # increasing learning rate
            weights += new_eta * ym * xm

    b = -weights[0]
    w = weights[1:]
    # return weight vector, bias and accuracies
    return w, b, acc
