
Detailed Specification for Module Methods: Single-Layer

__init__()

For a single-layer subclass of Module (such as a Linear or Dense layer) in the mini-torch framework, the __init__() method is responsible for setting up the layer’s dimensions, initializing its trainable parameters using NumPy, and preparing placeholder variables to cache data for manual gradient calculations.

Here is the specification for the __init__() method:

Method Signature

The method should accept the dimensions required to build the weight matrix and bias vector.

def __init__(self, input_dim, output_dim):

Trainable Parameter Initialization

The layer must define its internal weights and biases as NumPy arrays. Because mini-torch uses row-vector (batch-first) notation, the weight matrix must have shape (input_dim, output_dim) and the bias shape (1, output_dim), so that x @ W + b is valid for an input x of shape (batch_size, input_dim).
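As a concrete illustration of the row-vector convention described above (the specific shapes here are chosen for the sketch, not mandated by the spec):

```python
import numpy as np

batch_size, input_dim, output_dim = 4, 3, 2
x = np.random.randn(batch_size, input_dim)   # (4, 3): one row per sample
W = np.random.randn(input_dim, output_dim)   # (3, 2)
b = np.zeros((1, output_dim))                # (1, 2), broadcast across the batch
out = x @ W + b
print(out.shape)  # (4, 2)
```

Reversing the order of W's dimensions would make `x @ W` raise a shape-mismatch error, which is why the spec calls the ordering "strict".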

Caching for the Backward Pass

Because the mini-torch framework requires manual backpropagation, the layer must cache its input and its parameter gradients between passes. The __init__() method must define these state variables (self.x, self.dW, self.db) and initialize them to None.

Example Implementation

Based on the framework’s core philosophy, the complete __init__() method for a Linear layer looks like this:

def __init__(self, input_dim, output_dim):
    # He Initialization for weights
    self.W = np.random.randn(input_dim, output_dim) * np.sqrt(2.0 / input_dim)
    # Zeros for biases
    self.b = np.zeros((1, output_dim))
    
    # Cache for backward pass
    self.x = None
    self.dW = None
    self.db = None
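A quick sanity check of this initialization (the class name Linear and the chosen dimensions are assumptions for this sketch; the body matches the spec above):

```python
import numpy as np

class Linear:
    def __init__(self, input_dim, output_dim):
        # He Initialization for weights
        self.W = np.random.randn(input_dim, output_dim) * np.sqrt(2.0 / input_dim)
        # Zeros for biases
        self.b = np.zeros((1, output_dim))
        # Cache for backward pass
        self.x = None
        self.dW = None
        self.db = None

layer = Linear(128, 64)
print(layer.W.shape)  # (128, 64)
print(layer.b.shape)  # (1, 64)
print(layer.W.std())  # roughly sqrt(2/128) ≈ 0.125
```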

forward()

The forward() method defines the computation performed at every call of the layer. It takes the input data, applies the layer’s mathematical operations, caches necessary data for the backpropagation step, and returns the output.

Method Signature

def forward(self, x):

Specification

The method must cache the input on self.x (it is needed to compute self.dW during the backward pass), then compute and return the affine transformation x @ self.W + self.b. For an input of shape (batch_size, input_dim), the output has shape (batch_size, output_dim).

Example Implementation

def forward(self, x):
    self.x = x  # Cache for backward pass
    return x @ self.W + self.b
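A minimal usage sketch of the forward pass, assuming a Linear class built per this spec (the class name and dimensions are assumptions):

```python
import numpy as np

class Linear:
    def __init__(self, input_dim, output_dim):
        self.W = np.random.randn(input_dim, output_dim) * np.sqrt(2.0 / input_dim)
        self.b = np.zeros((1, output_dim))
        self.x = None

    def forward(self, x):
        self.x = x  # Cache for backward pass
        return x @ self.W + self.b

np.random.seed(0)
layer = Linear(3, 2)
x = np.random.randn(5, 3)
out = layer.forward(x)
print(out.shape)     # (5, 2)
print(layer.x is x)  # True: the input is cached by reference, not copied
```

Note that caching stores a reference to the input array, so backward() sees exactly the batch that produced the output.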

backward()

The backward() method is responsible for manual gradient calculation using the chain rule of calculus. It computes how much the loss function changes with respect to the layer’s parameters (to update the weights) and with respect to the layer’s inputs (to continue the chain rule backwards to previous layers).

Method Signature

def backward(self, grad_output):

Specification

Given grad_output, the gradient of the loss with respect to this layer's output, the method must (1) store the parameter gradients: self.dW = self.x.T @ grad_output, and self.db = the sum of grad_output over the batch axis (with keepdims=True to preserve the (1, output_dim) shape); and (2) return the gradient with respect to the layer's input, grad_output @ self.W.T, which is passed back to the previous layer.

Example Implementation

def backward(self, grad_output):
    # 1. Calculate and store gradients w.r.t. parameters
    self.dW = self.x.T @ grad_output
    self.db = np.sum(grad_output, axis=0, keepdims=True)
    
    # 2. Calculate and return gradient w.r.t. the layer's input
    grad_input = grad_output @ self.W.T
    return grad_input
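These formulas can be verified numerically with a finite-difference check. The sketch below uses a Linear class assembled from the snippets in this spec (the class name is an assumption) and a scalar loss L = sum(forward(x)), for which the upstream gradient is all ones:

```python
import numpy as np

class Linear:
    def __init__(self, input_dim, output_dim):
        self.W = np.random.randn(input_dim, output_dim) * np.sqrt(2.0 / input_dim)
        self.b = np.zeros((1, output_dim))
        self.x = self.dW = self.db = None

    def forward(self, x):
        self.x = x
        return x @ self.W + self.b

    def backward(self, grad_output):
        self.dW = self.x.T @ grad_output
        self.db = np.sum(grad_output, axis=0, keepdims=True)
        return grad_output @ self.W.T

np.random.seed(0)
layer = Linear(3, 2)
x = np.random.randn(4, 3)

# Analytic gradients for L = sum(out), i.e. dL/d(out) = ones
out = layer.forward(x)
grad_input = layer.backward(np.ones_like(out))

# Central finite difference on a single weight entry
eps = 1e-6
layer.W[0, 0] += eps
plus = layer.forward(x).sum()
layer.W[0, 0] -= 2 * eps
minus = layer.forward(x).sum()
layer.W[0, 0] += eps  # restore
numeric = (plus - minus) / (2 * eps)

print(np.isclose(layer.dW[0, 0], numeric))  # True
print(grad_input.shape == x.shape)          # True
```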

Optimizer Interface

The parameters() and grads() methods work in tandem to expose the layer’s internal state to the Optimizer class, allowing the optimizer to update the weights without needing to know the specific details of the layer’s architecture.

parameters()

The parameters() method provides access to the layer’s trainable variables.

Method Signature

def parameters(self):

Specification

The method must return the layer's trainable parameters as a list whose order exactly matches the list returned by grads(): [self.W, self.b]. The arrays are returned by reference, so in-place updates performed by the optimizer modify the layer's actual parameters.

Example Implementation

def parameters(self):
    # Returns the weight matrix and bias vector
    return [self.W, self.b]

grads()

The grads() method provides access to the gradients calculated during the backward() pass.

Method Signature

def grads(self):

Specification

The method must return the cached gradients as a list in the same order as parameters(): [self.dW, self.db]. Both entries are None until backward() has been called at least once.

Example Implementation

def grads(self):
    # Returns the gradients cached during the backward() pass
    return [self.dW, self.db]

How They Interact with the Optimizer

To understand why this spec is written this way, it is helpful to look at how the Optimizer relies on these two methods. When optimizer.step() is called, it iterates through both lists in lockstep to apply the update rule (such as Stochastic Gradient Descent):

# Inside the Optimizer's step() method:
params = module.parameters()
grads = module.grads()

# The strict 1-to-1 alignment allows this simple loop:
for i in range(len(params)):
    params[i] -= self.lr * grads[i]
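Putting it together, a minimal SGD optimizer built on this interface might look like the following sketch (the class names SGD and Toy, and the toy values, are assumptions; the update loop is the one shown above):

```python
import numpy as np

class SGD:
    # Minimal sketch: assumes each module exposes parameters() and grads()
    # as strictly 1-to-1 aligned lists.
    def __init__(self, modules, lr=0.01):
        self.modules = modules
        self.lr = lr

    def step(self):
        for module in self.modules:
            params = module.parameters()
            grads = module.grads()
            for i in range(len(params)):
                # In-place update: mutates the layer's own arrays, because
                # parameters() returns references rather than copies.
                params[i] -= self.lr * grads[i]

class Toy:
    # Stand-in module with fixed parameters and gradients
    def __init__(self):
        self.W = np.ones((2, 2))
        self.b = np.zeros((1, 2))
        self.dW = np.full((2, 2), 0.5)
        self.db = np.full((1, 2), 0.5)
    def parameters(self):
        return [self.W, self.b]
    def grads(self):
        return [self.dW, self.db]

layer = Toy()
SGD([layer], lr=0.1).step()
print(layer.W[0, 0])  # 1.0 - 0.1 * 0.5 = 0.95
```

The in-place `-=` is load-bearing here: rewriting it as `params[i] = params[i] - self.lr * grads[i]` would rebind the local list entry and leave the layer's weights untouched.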