Why Non-Linear Activation Functions Are Used in Neural Networks, and Why the Sigmoid Function Is a Linear Model
Hi, today I am going to explain why we use non-linear activation functions in neural networks, both mathematically and visually. In the upcoming posts in this series, I will explain why the first hidden layer is so important, why we only use linearly separable non-linear activation functions, and how we can approach non-linearly separable non-linear activation functions in a single perceptron.
Let's start. First, we have to know what happens when we do arithmetic operations on linear and non-linear functions.
Linear Addition
When we add two linear functions, we get another linear function:
(2x+1)+(3x+1) = 5x+2
Linear Multiplication
When we multiply two linear functions, we get a non-linear function: (2x+1) * (3x+1) = 6x²+5x+1
Non Linear Addition
(2x²+1) + (3x²+1) = 5x²+2
As you can see, adding one non-linear function to another gives a non-linear function, and addition does not change the order (the shape) of the original curve; it only shifts, expands, or shrinks it.
Non Linear Multiplication
(2x²+1) * (3x²+1) = 6x⁴+5x²+1
As you can see, multiplication changes the order and therefore the shape of the curve.
linear + linear = linear
linear * linear = non-linear
non-linear * linear = non-linear
non-linear + linear = non-linear
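To make these rules concrete, here is a small check (a sketch of my own, using numpy's poly1d) that verifies each rule by looking at the degree of the result:

import numpy as np

lin1 = np.poly1d([2, 1])       # 2x + 1   (degree 1, linear)
lin2 = np.poly1d([3, 1])       # 3x + 1   (degree 1, linear)
nonlin = np.poly1d([2, 0, 1])  # 2x^2 + 1 (degree 2, non-linear)

print((lin1 + lin2).order)     # 1 -> linear + linear = linear
print((lin1 * lin2).order)     # 2 -> linear * linear = non-linear
print((nonlin * lin1).order)   # 3 -> non-linear * linear = non-linear
print((nonlin + lin1).order)   # 2 -> non-linear + linear = non-linear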
If you want to know exactly what linear and non-linear functions are, watch this video.
Before going into neural networks, you should know logistic regression. I am not going to explain it here; here are reference links:
https://medium.com/@vigneshgig/machine-learning-classification-using-logistic-regression-mathematical-concept-220c0103f5cc
https://towardsdatascience.com/logistic-regression-detailed-overview-46c4da4303bc
But I am going to explain why the logistic function is a linearly separable model, using the sigmoid function (logistic function).
What is a hyperplane?
According to Wikipedia: in geometry, a hyperplane is a subspace whose dimension is one less than that of its ambient space. If a space is 3-dimensional, then its hyperplanes are the 2-dimensional planes, while if the space is 2-dimensional, its hyperplanes are the 1-dimensional lines.
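In equation form (the standard definition, with my own symbols): a hyperplane in n-dimensional space is the set of points satisfying w₁x₁ + w₂x₂ + … + wₙxₙ + b = 0. In 2-D this is the line w₁x + w₂y + b = 0, and in 3-D it is the plane w₁x + w₂y + w₃z + b = 0, matching the definition above.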
Topology and Manifold
If your dataset is not linearly separable, you can make it linearly separable by plotting the data in a higher (N+1) dimension, as we can see from the diagram below. In SVMs, the kernel does exactly this job: it turns a non-linearly separable dataset into a linearly separable one by lifting it into a higher dimension. I am not going to explain it in detail; if you are interested, check out this blog http://colah.github.io/posts/2014-03-NN-Manifolds-Topology/ and the SVM kernel trick.
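As a toy illustration of this lifting (a sketch of my own, not from the blog linked above): the 1-D dataset below is not linearly separable, because one class sits between the two halves of the other; mapping each point x to (x, x²) lifts it into 2-D, where a horizontal line separates the classes.

import numpy as np

# 1-D data: class 1 sits between the class 0 points, so no single
# threshold on x can separate them.
x = np.array([-2.0, -1.0, 1.0, 2.0])
y = np.array([0, 1, 1, 0])

# Lift to 2-D with the feature map x -> (x, x^2).
lifted = np.column_stack([x, x ** 2])

# In the lifted space the hyperplane x2 = 2.5 separates the classes.
print((lifted[:, 1] < 2.5).astype(int))  # [0 1 1 0], matches y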
Why is the logistic function, which is non-linear, a linearly separable model?
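The key point is this: the sigmoid σ(z) = 1 / (1 + e⁻ᶻ) is itself a non-linear function, but in logistic regression it is applied to a linear score z = w·x + b. We predict class 1 when σ(z) ≥ 0.5, and because the sigmoid is monotonic with σ(0) = 0.5, that condition is exactly w·x + b ≥ 0. The decision boundary w·x + b = 0 is a hyperplane, so the sigmoid only squashes the score into a probability; it does not bend the boundary. A quick numeric check of this equivalence (a minimal sketch of my own, not the author's code):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Thresholding the sigmoid output at 0.5 gives exactly the same
# labels as thresholding the linear score z at 0.
z = np.linspace(-5.0, 5.0, 11)
print(np.array_equal(sigmoid(z) >= 0.5, z >= 0))  # True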
Why are we using non-linear activation functions in neural networks?
Neural Network with No Activation Function
Here I added two hidden layers. After deriving the equation, I get a linear equation. So without an activation function, a neural network can only solve a linear problem; it cannot solve a complex non-linear problem, and in real life almost all problems are non-linear. Adding more and more hidden layers without activation functions only speeds up learning on a linear problem. Let me show you an example.
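Written out for two hidden layers with weights W₁, W₂, W₃ and biases b₁, b₂, b₃ (my notation), the output is

o = W₃(W₂(W₁x + b₁) + b₂) + b₃ = (W₃W₂W₁)x + (W₃W₂b₁ + W₃b₂ + b₃)

which is again just Wx + b, a single linear function. A quick numpy check of this collapse (a sketch of my own, with arbitrary layer sizes):

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=4)                                # input vector
W1, b1 = rng.normal(size=(5, 4)), rng.normal(size=5)
W2, b2 = rng.normal(size=(3, 5)), rng.normal(size=3)
W3, b3 = rng.normal(size=(1, 3)), rng.normal(size=1)

# Forward pass through two hidden layers with no activation.
deep = W3 @ (W2 @ (W1 @ x + b1) + b2) + b3

# The same network collapsed into one linear layer.
W = W3 @ W2 @ W1
b = W3 @ W2 @ b1 + W3 @ b2 + b3
print(np.allclose(deep, W @ x + b))  # True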
Here I used no hidden layer and no activation function. Let's see how many epochs it takes to reach 100% accuracy.
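A minimal sketch of the kind of setup this describes (the dataset is my own assumption, 100 points from two linearly separable 2-feature blobs, as is the mean-squared-error loss; the point is the architecture, a single output neuron with no activation):

import numpy as np
from tensorflow import keras

# Hypothetical stand-in dataset: 100 points, 2 features, two
# linearly separable blobs (the post does not show its data).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2.0, 0.5, size=(50, 2)),
               rng.normal(2.0, 0.5, size=(50, 2))])
y = np.array([0] * 50 + [1] * 50)

# No hidden layer, no activation: one linear output neuron.
model = keras.Sequential([
    keras.layers.Dense(1, activation=None, input_shape=(2,)),
])
model.compile(optimizer='sgd', loss='mse', metrics=['acc'])
model.fit(X, y, epochs=500, verbose=2)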
Epoch 242/500
100/100 [==============================] - 0s 150us/step - loss: 0.0312 - acc: 1.0000
It took 242 epochs to reach 100% accuracy.
Now I used 2 hidden layers.
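A sketch of this variant, reusing X and y from the sketch above (the layer sizes are my assumption; still no activation anywhere):

model = keras.Sequential([
    keras.layers.Dense(4, activation=None, input_shape=(2,)),
    keras.layers.Dense(4, activation=None),
    keras.layers.Dense(1, activation=None),
])
model.compile(optimizer='sgd', loss='mse', metrics=['acc'])
model.fit(X, y, epochs=500, verbose=2)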
Epoch 51/500
100/100 [==============================] - 0s 250us/step - loss: 0.5813 - acc: 1.0000
As we can see from the log above, it took just 51 epochs to reach 100% accuracy.
And now I used 1 hidden layer with a sigmoid activation function.
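A sketch of this version, again reusing X and y (the hidden layer size and the binary cross-entropy loss are my assumptions):

model = keras.Sequential([
    keras.layers.Dense(4, activation='sigmoid', input_shape=(2,)),
    keras.layers.Dense(1, activation='sigmoid'),
])
model.compile(optimizer='sgd', loss='binary_crossentropy', metrics=['acc'])
model.fit(X, y, epochs=500, verbose=2)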
Epoch 2/500
100/100 [==============================] - 0s 250us/step - loss: 0.5813 - acc: 1.0000
It takes only 2 epochs to reach 100% accuracy.
Activation Function
Here I used one hidden layer with 3 neurons.
Instead of solving this equation (o) in one go, we split it and solve each part separately, so that it is easier to understand how an activation function works in a neural network.
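Written out with my own notation (wᵢ, bᵢ for the hidden neurons and vᵢ, c for the output layer), the network computes

h₁ = σ(w₁·x + b₁)
h₂ = σ(w₂·x + b₂)
h₃ = σ(w₃·x + b₃)
o = v₁h₁ + v₂h₂ + v₃h₃ + c

Splitting o like this makes the role of the activation visible: each hᵢ is a non-linear bend of a linear score, and the output is a weighted sum of those bent pieces, which is how the network builds a non-linear decision function out of linear ingredients.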
You can learn many things about neural networks from playground.tensorflow.org, so I recommend playing with it.
XOR Examples
Coding
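Here is a minimal sketch of this experiment under my own assumptions (400 noisy XOR-style points and mean-squared-error loss; the post's actual code and data may differ). First, the version with no activation function, which can only draw a straight line and therefore cannot separate XOR, so its accuracy stalls:

import numpy as np
from tensorflow import keras

# Hypothetical XOR-style dataset: 400 random 2-D points, labeled 1
# when both coordinates have the same sign. Not linearly separable.
rng = np.random.default_rng(2)
X = rng.uniform(-1.0, 1.0, size=(400, 2))
y = ((X[:, 0] > 0) == (X[:, 1] > 0)).astype(int)

# Linear layers only: no activation function anywhere.
model = keras.Sequential([
    keras.layers.Dense(4, activation=None, input_shape=(2,)),
    keras.layers.Dense(1, activation=None),
])
model.compile(optimizer='adam', loss='mse', metrics=['acc'])
model.fit(X, y, epochs=1000, verbose=2)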
Epoch 1000/1000
400/400 [==============================] - 0s 90us/step - loss: 0.2501 - acc: 0.3600
With an Activation Function (Sigmoid)
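And a sketch of the same network with sigmoid activations added, reusing X and y from above (the layer size and the binary cross-entropy loss are my assumptions):

model = keras.Sequential([
    keras.layers.Dense(4, activation='sigmoid', input_shape=(2,)),
    keras.layers.Dense(1, activation='sigmoid'),
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['acc'])
model.fit(X, y, epochs=1000, verbose=2)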
Epoch 521/1000
400/400 [==============================] - 0s 113us/step - loss: 0.3231 - acc: 1.0000
Epoch 1000/1000
400/400 [==============================] - 0s 130us/step - loss: 0.0609 - acc: 1.0000
As we can see, if we lift a non-linear dataset from N dimensions into N+1 dimensions, it can become linearly separable using a kernel, a polynomial function, or a neural network. In a classification problem, a neural network does exactly this: it converts a non-linearly separable function into a linearly separable one in a higher dimension.