## Intuition behind Gradient Descent for Machine Learning Algorithms

Before jumping to Gradient Descent, let's be clear about difference between backpropagation and gradient descent. Comparing things make it easier to learn !

##### Backpropagation :¶

Backpropagation is an efficient way of calculating gradients using chain rule.

##### Gradient Descent:¶

Gradient Descent is an optimization algorithm which is used in different machine learning algorithms to find parameters/combination of parameters which mimimizes the loss function.

** In case of neural network, we use backpropagation to calculate the gradient of loss function w.r.t to weights. Weights are the parameters of neural network.

** In case of linear regression, coefficients are the parameters!.

** Many machine learning algorithms are convex problems, so using gradient descent to get extrema makes more sense. For example, if you remember the solution of linear regression :

$ \beta = (X^T X)^{-1} X^T y $

Here, we can get the analytical solution by simply solving above equations. But, inverse calculation has $ O(N^3) $ complexity. It will be worst if our data size increases.