Gradient Descent
Definition
Gradient descent is an optimization algorithm used to find a local minimum of a differentiable function. In machine learning, it is used to minimize the model's "loss" or "error" by iteratively adjusting the model's parameters (weights and biases) in the direction of the negative gradient, the direction in which the loss decreases fastest.
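In code, the whole algorithm reduces to one repeated update: move the parameters a small step against the gradient of the loss. The sketch below is illustrative only; the function names and the quadratic example loss are assumptions for the example, not part of any particular library.

```python
import numpy as np

def gradient_descent(grad_fn, theta0, learning_rate=0.1, n_steps=100):
    """Repeat the core update: theta <- theta - learning_rate * (gradient of the loss)."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(n_steps):
        theta = theta - learning_rate * grad_fn(theta)  # step downhill along the negative gradient
    return theta

# Illustrative loss L(theta) = (theta - 3)^2, whose gradient is 2 * (theta - 3).
print(gradient_descent(lambda t: 2 * (t - 3), theta0=0.0))  # converges toward 3.0
```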
Why It Matters
Gradient descent is the engine of learning in most machine learning and deep learning models. It provides a systematic way to adjust the model's parameters to make it more accurate.
Contextual Example
Imagine you are on a mountain in a thick fog and want to get to the lowest point. You would feel the slope of the ground under your feet (the gradient) and take a small step in the steepest downhill direction. You repeat this process until you reach the bottom. Gradient descent works the same way, "stepping" down the "slope" of the loss function.
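For a tiny concrete version of that walk downhill: on the loss L(w) = w^2 (chosen purely for illustration), starting at w = 4 with a learning rate of 0.1, the slope is 2w = 8, so one step moves w to 4 - 0.1 * 8 = 3.2; repeating the step keeps shrinking w toward the minimum at 0.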
Common Misunderstandings
- The "learning rate" is a hyperparameter that controls how big of a step the algorithm takes at each iteration.
- Stochastic Gradient Descent (SGD) and its variants (like Adam) are common forms used in deep learning that use a small batch of data at each step instead of the whole dataset.
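As a rough sketch of the minibatch idea, here is plain SGD fitting a simple least-squares model; the data, function name, and hyperparameter values are illustrative assumptions, not a reference implementation.

```python
import numpy as np

def sgd_linear_regression(X, y, learning_rate=0.05, batch_size=32, n_epochs=20, seed=0):
    """Minibatch SGD for least-squares linear regression (illustrative only)."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    for _ in range(n_epochs):
        order = rng.permutation(n_samples)            # shuffle the data once per epoch
        for start in range(0, n_samples, batch_size):
            idx = order[start:start + batch_size]     # a small batch, not the whole dataset
            Xb, yb = X[idx], y[idx]
            grad = 2 * Xb.T @ (Xb @ w - yb) / len(idx)  # gradient of the mean squared error on the batch
            w -= learning_rate * grad                 # same update rule, noisier gradient estimate
    return w

# Toy usage: recover weights close to [2, -1] from noisy synthetic data.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = X @ np.array([2.0, -1.0]) + 0.01 * rng.normal(size=200)
print(sgd_linear_regression(X, y))  # prints approximately [2, -1]
```

The only change from the full-batch version is that each update uses a random subset of the data, which makes every step cheaper at the cost of a noisier gradient estimate.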