Calculus for AI
Key Math Python Libs for AI

NumPy

A library for numerical computing in Python. It provides support for arrays, matrices, and mathematical functions, and it handles large datasets and numerical operations efficiently.

Example usage of NumPy:

import numpy as np
# Create a NumPy array
array = np.array([1, 2, 3, 4, 5])
print("NumPy Array:", array)

Pandas

A library for data manipulation and analysis. It provides data structures like DataFrames and Series.

Pandas simplifies data manipulation (ETL), exploration, and analysis.

Example usage of Pandas:

import pandas as pd
# Create a Pandas DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]}
df = pd.DataFrame(data)
print("Pandas DataFrame:\n", df)

Calculus

Calculus is the backbone of optimization in AI algorithms. The key calculus concepts for AI include:

  • Derivatives

  • Gradients

  • Optimization

Derivatives measure how a function changes as its input changes.

Gradients generalize derivatives to functions of several variables.

Optimization is the process of finding the minimum or maximum of a function.

Derivative

  • The derivative of a function f(x) measures “the rate of change” of f with respect to x. In simpler terms, it tells us how quickly the function is changing at a given point.

  • Geometric Interpretation: The slope of the tangent line to the curve at a point.

Real-life example

Imagine driving on a highway: your speedometer shows how many kilometres you travel per hour. Speed is the rate of change of your position. If your position is a function of time, your speed is the derivative of that function.
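
To make this concrete, here is a minimal sketch (the position function s(t) = 5t², in kilometres, is an illustrative assumption of mine) that approximates the instantaneous speed with average speeds over shrinking time intervals:

# A minimal sketch: speed as the derivative of position.
# The position function s(t) = 5 * t**2 (km) is an illustrative assumption.
def position(t):
    return 5 * t**2

def average_speed(t, h):
    # Average speed over the interval [t, t + h]
    return (position(t + h) - position(t)) / h

# As h shrinks, the average speed approaches the instantaneous speed,
# i.e. the derivative s'(t) = 10 * t (here s'(2) = 20 km/h).
for h in [1.0, 0.1, 0.001]:
    print(f"h = {h}: average speed ≈ {average_speed(2, h):.4f} km/h")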

Example of calculating a derivative in Python:

import numpy as np
import matplotlib.pyplot as plt

# Define a function
def f(x):
    return x**2

# Define its derivative
def df(x):
    return 2*x

# Plot the function and its derivative
x = np.linspace(-10, 10, 100)
plt.plot(x, f(x), label="f(x) = x^2")
plt.plot(x, df(x), label="f'(x) = 2x")
plt.legend()
plt.title("Function and Its Derivative")
plt.show()
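
As a quick sanity check (this snippet is an addition of mine, not part of the original example), the analytic derivative f'(x) = 2x can be compared against a numerical approximation from np.gradient:

import numpy as np

# A minimal sketch: compare the analytic derivative f'(x) = 2x with a
# finite-difference approximation computed from sampled values of x**2.
x = np.linspace(-10, 10, 100)
numerical = np.gradient(x**2, x)   # numerical derivative on the grid
analytic = 2 * x
print("Max difference:", np.max(np.abs(numerical - analytic)))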

Gradient

A gradient is a vector that represents the direction and rate of the maximum increase of a scalar function: it tells us both the direction of the fastest change and how fast that change is. Geometric Interpretation: the gradient points in the direction of the steepest ascent of the function.

Real-life example: Temperature Distribution in a Room
Imagine you’re in a room where the temperature varies from place to place. The temperature at any point in the room can be described by a function T(x, y, z), where x, y, and z are the coordinates in the room. The gradient of the temperature function, ∇T, at any point tells you:

Direction: The direction in which the temperature is increasing the fastest. If you follow this direction, you’ll get to the warmest spot the quickest.

Rate of Increase: The magnitude of the gradient tells you how quickly the temperature changes in that direction. A large gradient means the temperature changes rapidly over a short distance, while a small gradient means the temperature changes slowly.

For example, if you’re feeling cold and want to move to a warmer spot, you should move in the direction of the gradient of the temperature function. If you want to stay in a place where the temperature is relatively constant, you should move perpendicular to the gradient.

Python code for this example

import numpy as np
import matplotlib.pyplot as plt

# Define the temperature function
def T(x, y):
    return 20 + 5 * np.exp(-0.1 * (x**2 + y**2))

# Create a grid of points
x = np.linspace(-5, 5, 20)
y = np.linspace(-5, 5, 20)

X, Y = np.meshgrid(x, y)

# Compute the temperature values
Z = T(X, Y)

# Compute the gradient of the temperature function.
# With meshgrid's default 'xy' indexing, axis 0 of Z varies with y and
# axis 1 varies with x, so np.gradient returns (dT/dy, dT/dx).
grad_T_y, grad_T_x = np.gradient(Z, y, x)

# Create a contour plot
plt.contourf(X, Y, Z, levels=20, cmap='coolwarm')
plt.colorbar(label='Temperature (°C)')

# Add the gradient vectors as a quiver plot
plt.quiver(X, Y, grad_T_x, grad_T_y, color='white')

# Add labels and title
plt.xlabel('x')
plt.ylabel('y')
plt.title('Temperature Distribution and Gradient Vectors')
plt.show()

Relationship between the derivative and the gradient

The derivative and the gradient are closely related but used in different contexts.
The derivative is a scalar that represents the rate of change of a function of a single variable. The gradient, on the other hand, is a vector that collects the rates of change (the partial derivatives) of a function of multiple variables.
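
To illustrate the relationship, here is a minimal sketch (the function f(x, y) = x² + 3y² and the helper names are illustrative choices of mine) that builds the gradient vector from the two partial derivatives:

import numpy as np

# A minimal sketch: the gradient collects the partial derivatives.
# The function f(x, y) = x**2 + 3 * y**2 is an illustrative assumption.
def f(x, y):
    return x**2 + 3 * y**2

def gradient(x, y):
    df_dx = 2 * x      # partial derivative with respect to x
    df_dy = 6 * y      # partial derivative with respect to y
    return np.array([df_dx, df_dy])

# At (1, 2) the gradient is the vector [2, 12]: it points in the
# direction of steepest ascent, and its magnitude is the rate of change.
g = gradient(1, 2)
print("Gradient at (1, 2):", g, "| magnitude:", np.linalg.norm(g))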

Role of Gradients in AI Optimizer

In machine learning and deep learning, the gradient is the core concept behind optimization algorithms, especially when training models.

As described above, the gradient here is the vector of derivatives of the loss function with respect to the model parameters. It indicates the direction in the parameter space where the loss function changes most rapidly. By calculating the gradients, the optimization algorithm can adjust the model’s parameters to gradually reduce the value of the loss function, thereby improving the model’s performance.
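
As a minimal sketch of this idea (the tiny dataset and the single-weight model y = w * x are illustrative assumptions, not a real training setup), the gradient of a mean-squared-error loss with respect to the weight tells the optimizer which way to adjust it:

import numpy as np

# Illustrative data: inputs and targets that follow y = 3x.
X = np.array([1.0, 2.0, 3.0])
y = np.array([3.0, 6.0, 9.0])

def loss(w):
    # Mean squared error of the model y_hat = w * x
    return np.mean((w * X - y) ** 2)

def grad_loss(w):
    # dJ/dw = mean(2 * (w*x - y) * x)
    return np.mean(2 * (w * X - y) * X)

w = 0.0
print(f"loss at w={w}: {loss(w):.2f}, gradient: {grad_loss(w):.2f}")
# The negative gradient points toward lower loss, i.e. toward w = 3.

Gradient Descent, described next, simply repeats this kind of update step.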

Gradient Descent (GD)

Gradient Descent is an optimization algorithm used to find a local minimum of a differentiable function. In machine learning, it’s widely used to train models by iteratively adjusting the model parameters to minimize the loss function.

How GD Works

  1. Initialize the Parameters: Start with a random set of model parameters.

  2. Compute the Gradient: Calculate the gradient of the loss function with respect to the current parameters. The gradient is a vector that points in the direction where the loss function increases most rapidly.

  3. Update the Parameters (Descend the Gradient): Move the parameters in the opposite direction of the gradient (i.e. the direction where the loss function decreases most rapidly). The step size of the update is controlled by the Learning Rate (LR).

  4. Iterate: Repeat the process until the loss function converges to a local minimum.

Gradient Descent Formula

The Gradient Descent algorithm minimizes a function (the loss function) by iteratively moving in the direction of the negative gradient. The formula for updating the parameters is:

θnew = θold − η ∇J(θold)

Where:

  • θold: The current value of the parameter (e.g., weights in a model).

  • θnew: The updated value of the parameter after one iteration.

  • η: The learning rate, a hyperparameter that controls the step size of the update.

  • ∇J(θold): The gradient of the cost function J with respect to θ, evaluated at θold.
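
For example, with θold = 10, η = 0.1, and J(θ) = θ² (so ∇J(θ) = 2θ), one update gives θnew = 10 − 0.1 × 20 = 8, which matches the first iteration of the code below.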

Code Example: Implementing GD with Python

import numpy as np

# Define a function and its gradient
def f(x):
    return x**2

def df(x):
    return 2*x

# Gradient Descent
def gradient_descent(starting_point, learning_rate, num_iterations):
    x = starting_point
    for i in range(num_iterations):
        grad = df(x)
        x = x - learning_rate * grad
        print(f"Iteration {i+1}: x = {x}, f(x) = {f(x)}")
    return x

# Run gradient descent
starting_point = 10
learning_rate = 0.1
num_iterations = 20
optimized_x = gradient_descent(starting_point, learning_rate, num_iterations)
print("Optimized x:", optimized_x)

Example output (abridged, floating-point values rounded):
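
Iteration 1: x = 8.0, f(x) = 64.0
Iteration 2: x ≈ 6.4, f(x) ≈ 40.96
...
Iteration 19: x ≈ 0.1441, f(x) ≈ 0.0208
Iteration 20: x ≈ 0.1153, f(x) ≈ 0.0133
Optimized x: ≈ 0.1153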

Observation
The goal is to find the value of x that minimizes the cost function f(x). In the example, the cost function is f(x) = x², and its minimum occurs at x = 0. After 20 iterations, the parameter x is optimized from 10 down to about 0.115. At this point, f(x) ≈ 0.115² ≈ 0.0133, which is very close to the minimum value of 0.

Why Is GD Important in AI?

  • In machine learning, the cost function/loss function J(θ) represents the error of the model (e.g., mean squared error for regression).

  • The goal of training a model is to find the parameters θ that minimize the loss/cost function J(θ).

  • Gradient Descent is used to iteratively update the parameters θ to achieve this goal.

  • The "optimized" parameters are the ones that result in the best performance of the model.

Summary

In this article on calculus for AI, I explained the derivative, the gradient, and Gradient Descent, showed why these mathematical concepts are important in AI, and illustrated each of them with Python code. I hope this article was easy to understand and that you are enjoying the AI learning journey so far. Next, I will lift the veil on Machine Learning with some hands-on exercises on data processing in Machine Learning :) See you next time!