Different types of vector norms used in machine learning

A vector norm measures the length or magnitude of a vector. There are many types, such as the L0, L1 (Manhattan), L2, and L-infinity (max) norms.
Emroj Hossain
Thu Jan 30 2020

Machine learning algorithms often need to calculate vector norms, for example to measure the difference between predicted and actual results. The norm of a vector is a non-negative number that represents its magnitude, that is, the extent of the vector in space. There are many types of norms, depending on how they are calculated. Here, I will discuss the different types of vector norms used in machine learning and deep learning, and how to calculate them in Python. For simplicity, I will first summarize the different types and then discuss each of them in detail.

Different types of vector norms

L0 norm: L0 norm denotes the number of non-zero elements in a vector.

Lp or p-norm: a class of norms that share the same formula but differ in the value of p.

L1 norm (Taxicab or Manhattan norm): L1 norm is the sum of absolute values of the vector elements.

L2 norm: L2 norm is calculated by taking the square root of the sum of the squares of the components or the elements of the vector.

L-infinity or max norm: L-infinity norm is calculated by taking the largest absolute value among the vector elements.

Other norms: There are many other types of norms as well, such as the absolute-value norm and matrix norms.

Now let me discuss each type of norm in detail, including how to calculate it in Python and where it is used in machine learning and deep learning applications.

Vector L0 norm

The L0 norm denotes the total number of non-zero elements in a vector. It is not a norm in the usual sense, as it does not satisfy all the properties of a norm: for example, multiplying a vector by a non-zero constant leaves its L0 norm unchanged. Let us take some examples to understand it.

The three vectors (0, 0), (3, 0), and (4, 5) have L0 norms of 0, 1, and 2, respectively, because the first vector has zero, the second vector has one, and the third vector has two non-zero elements.

Python can find the norm of a vector using the function norm(vec, p), which is defined in the linear algebra (linalg) package of numpy. The first parameter is the array of elements of the vector whose norm is to be calculated, and the second parameter (p, named ord in numpy) is the order of the norm. For the L0, L1, and L2 norms, the value of p is 0, 1, and 2, respectively.

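# A program to check login using L0 norm
import numpy as np #importing numpy package
from numpy.linalg import norm #importing norm function

# define function to check login
# (assumed reconstruction: the function body is missing from the source,
# and the stored and entered codes below are made-up values)
def check_login(stored, entered):
    # the L0 norm of the difference is 0 only when every element matches
    return norm(stored - entered, 0) == 0

stored = np.array([1, 4, 2, 8])
entered = np.array([1, 4, 2, 8])
print('Login successful' if check_login(stored, entered) else 'Login failed')

#Output:
#Login successful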
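
To see why L0 is not a norm in the usual sense, the short sketch below counts the non-zero elements of the three example vectors and then checks the scaling property. A true norm doubles when the vector is doubled, but doubling a vector does not change how many of its elements are non-zero. The helper name `l0_norm` is made up for this illustration.

```python
import numpy as np
from numpy.linalg import norm

def l0_norm(vec):
    """Count the non-zero elements of a 1-D vector (hypothetical helper)."""
    return int(np.count_nonzero(vec))

vectors = [np.array([0, 0]), np.array([3, 0]), np.array([4, 5])]
l0_values = [l0_norm(v) for v in vectors]
print(l0_values)

# numpy's norm with order 0 returns the same counts, as floats
numpy_values = [float(norm(v, 0)) for v in vectors]
print(numpy_values)

# Scaling check: doubling the vector leaves the count unchanged,
# which a true norm would not allow
v = np.array([4, 5])
scaled_same = l0_norm(2 * v) == l0_norm(v)
print(scaled_same)
```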

Lp norm or p-norm

The most commonly used vector norms belong to the family of Lp norms, or p-norms, which are defined as

$$||x||_p=(\sum_{i=1}^{n}{|x_i|^p})^{\frac{1}{p}}$$

For p ≥ 1 this formula defines a vector norm. The L1 and L2 norms are members of the Lp norm family, with p = 1 and p = 2.
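
As a rough sketch, the formula can be evaluated directly and compared with numpy's norm function. The helper name `p_norm` and the test vector are assumptions made for this illustration.

```python
import numpy as np
from numpy.linalg import norm

def p_norm(vec, p):
    """Evaluate (sum_i |x_i|^p)^(1/p) directly (hypothetical helper, p >= 1)."""
    vec = np.asarray(vec, dtype=float)
    return float(np.sum(np.abs(vec) ** p) ** (1.0 / p))

x = np.array([4, 3])
for p in (1, 2, 3):
    # the hand-written formula and numpy's norm should agree for each p
    print(p, p_norm(x, p), norm(x, p))
```

For p = 1 and p = 2 this reproduces the L1 and L2 norms discussed in the following sections.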

Vector L1 norm (Taxicab or Manhattan norm)

The L1 norm is the sum of the magnitudes (absolute values) of the elements of the vector, and is also known as the Taxicab or Manhattan norm. It is defined as follows.

$$||x||_1 = \sum_{i=1}^{n}{|x_i|} = |x_1|+|x_2|+\dots+|x_n|$$

With this way of calculating the norm, every component carries equal weight.

As an example, for a two-component vector a=(4, 3), the L1 norm will be 7 as shown in the figure below.

$$||a||_1=|4|+|3|=7$$

The L1 norm measures the distance travelled from (0, 0) to the destination point (4, 3) by moving along one axis first and then along the other. This norm is used in AdderNet, which builds CNN-type deep neural networks using additions instead of multiplications. The L1 norm can be calculated in Python with the norm function by passing the value 1 as its second parameter, as shown in the example below.

# A program to calculate L1 norm
import numpy as np #importing numpy package
from numpy.linalg import norm #importing norm function
# set vector
vec1 = np.array([4, 3])
#calculate L1 norm
vecNorm=norm(vec1, 1) # The second parameter 1 for L1
print('The L1 norm of the vector {} is {}'.format(vec1, vecNorm))

#Output:
#The L1 norm of the vector [4 3] is 7.0
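
The same idea gives the Manhattan distance between any two points: take the L1 norm of their difference. The sketch below uses two made-up points, `start` and `end`, purely for illustration.

```python
import numpy as np
from numpy.linalg import norm

start = np.array([1, 2])
end = np.array([4, 6])

# L1 norm of the difference = |4 - 1| + |6 - 2|
manhattan = norm(end - start, 1)
print(manhattan)
```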

Vector L2 norm

The L2 norm is the square root of the sum of the squares of the vector elements. It is the straight-line (shortest) distance from the origin to the point represented by the vector, and is also known as the Euclidean norm. It is defined as follows.

$$||x||_2=(\sum_{i=1}^{n}{|x_i|^2})^{\frac{1}{2}}=\sqrt{x_1^2+x_2^2+\dots+x_n^2}$$

For the same vector a=(4, 3), the L2 norm will be 5, represented by the shortest distance in the image.

$$||a||_2=\sqrt{4^2+3^2}=\sqrt{25}=5$$

The L2 norm can be found by passing the value 2 as the second parameter of the norm() function.

# A program to calculate L2 norm
import numpy as np #importing numpy package
from numpy.linalg import norm #importing norm function
# set vector
vec1 = np.array([4, 3])
#calculate L2 norm
vecNorm=norm(vec1, 2) # The second parameter 2 for L2
print('The L2 norm of the vector {} is {}'.format(vec1, vecNorm))

#Output:
#The L2 norm of the vector [4 3] is 5.0

Many deep learning models use the L2 norm to measure the difference between the predicted result and reality. CNNs use the L2 norm during training to minimize the difference between the filters and the features present in the data.
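
As a minimal sketch of measuring the gap between prediction and reality, the snippet below takes the L2 norm of the difference between two made-up vectors, `actual` and `predicted`, and derives the mean squared error from it. The values and variable names are assumptions for illustration and do not come from any particular model.

```python
import numpy as np
from numpy.linalg import norm

actual = np.array([3.0, 5.0, 2.0, 7.0])
predicted = np.array([2.0, 5.0, 4.0, 5.0])

error = predicted - actual          # element-wise difference: [-1, 0, 2, -2]
l2_error = norm(error, 2)           # sqrt(1 + 0 + 4 + 4)
mse = l2_error ** 2 / error.size    # mean of the squared differences

print(l2_error, mse)
```

Because the differences are squared before being summed, a single large error raises the L2 error more than several small ones.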

Vector L-infinity or max norm

The L-infinity or max norm is the magnitude (absolute value) of the largest element of the vector. For a vector a=(4, -8, 6, 3), the L-infinity norm would be 8.

$$||x||_\infty=\max_i(|x_i|)=\max(|x_1|, |x_2|, \dots, |x_n|)$$
$$||a||_\infty=max(|4|, |-8|, |6|, |3|)=8$$

# A program to calculate L-infinity norm
import numpy as np #importing numpy package
from numpy.linalg import norm #importing norm function
# set vector
vec1 = np.array([4, -8, 6, 3])
#calculate L-infinity norm
vecNorm=norm(vec1, np.inf) # The second parameter np.inf for L-infinity
print('The L-infinity norm of the vector {} is {}'.format(vec1, vecNorm))

#Output:
#The L-infinity norm of the vector [ 4 -8  6  3] is 8.0

The L-infinity norm is mostly used to find which element of the vector contributes the most.
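
The norm itself only returns the largest magnitude. To also locate the element responsible, one possible approach (a sketch, not the only way) is to combine it with np.argmax on the absolute values. The variable names below are chosen for this illustration.

```python
import numpy as np
from numpy.linalg import norm

a = np.array([4, -8, 6, 3])

max_norm = norm(a, np.inf)              # largest absolute value in the vector
dominant = int(np.argmax(np.abs(a)))    # index of the element that attains it

print(max_norm, dominant, a[dominant])
```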
