Detailed Specification for Module Methods: Activation

In the mini-torch framework, the Activation class is an abstract base class that inherits from Module. It serves as a blueprint for element-wise, non-linear transformations (such as ReLU, Sigmoid, or GELU) that are typically placed between linear layers to allow the network to learn complex, non-linear decision boundaries.

From an object-oriented design perspective, the Activation class specializes the Module interface by enforcing the rule that activation functions contain no trainable parameters (no weights or biases). Therefore, its primary responsibilities are applying a mathematical function during the forward pass and computing the local derivative during the backward pass.
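This design can be sketched as a minimal base class. The Module stand-in below is an assumption for illustration only; in mini-torch, Activation inherits from the framework's real Module class.

```python
class Module:
    """Minimal stand-in for the framework's Module base class."""
    def parameters(self):
        return []

    def grads(self):
        return []


class Activation(Module):
    """Abstract base for parameter-free, element-wise non-linearities."""
    def __init__(self):
        super().__init__()
        self.x = None  # cache of the forward-pass input, used by backward()

    def forward(self, x):
        raise NotImplementedError  # subclasses apply f(x) element-wise

    def backward(self, grad_output):
        raise NotImplementedError  # subclasses return f'(x) * grad_output

    def parameters(self):
        return []  # no trainable weights

    def grads(self):
        return []  # no gradients to track
```

Subclasses only need to supply forward() and backward(); the empty parameters() and grads() are inherited.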

Here is the specification for the Activation base class and how it is subclassed:


__init__()

The constructor for an activation layer is drastically simplified compared to a Linear layer because it does not need to define input or output dimensions, nor does it initialize weight matrices.

Method Signature

def __init__(self):

Specification

- Call super().__init__() to run the Module base-class constructor.
- Initialize self.x to None; this attribute will cache the forward-pass input so that backward() can evaluate the local derivative.

Example Implementation

def __init__(self):
    super().__init__()
    self.x = None

forward()

The forward method applies the specific non-linear mathematical function element-wise across the entire input array.

Method Signature

def forward(self, x):

Specification

- Store the input x in self.x so that backward() can use it later.
- Apply the activation function element-wise; the output array has the same shape as the input.
- Return the transformed array.


backward()

The backward method computes the gradient of the loss with respect to the layer’s input by applying the chain rule.

Method Signature

def backward(self, grad_output):

Specification

- Evaluate the local derivative of the activation function at the cached input self.x.
- Multiply the local derivative element-wise by grad_output (the gradient arriving from the next layer), applying the chain rule.
- Return the resulting grad_input, which has the same shape as self.x.
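As a concrete check of this contract, here is a small sketch using tanh as the activation (tanh is an illustrative choice, not part of the spec). The analytic grad_input should agree with a finite-difference estimate:

```python
import numpy as np

x = np.array([-1.5, 0.2, 3.0])
grad_output = np.array([0.5, -1.0, 2.0])  # upstream gradient from the next layer

# Local derivative of tanh, evaluated at the cached input
local_derivative = 1.0 - np.tanh(x) ** 2

# Chain rule: element-wise product with the incoming gradient
grad_input = local_derivative * grad_output

# Finite-difference check: perturb every input by eps and compare the change
# in the scalar sum(grad_output * tanh(x)) against sum(grad_input).
eps = 1e-6
numeric = (np.sum(grad_output * np.tanh(x + eps)) -
           np.sum(grad_output * np.tanh(x - eps))) / (2 * eps)
assert np.isclose(numeric, grad_input.sum(), atol=1e-5)
```

Because each input element affects only its own output element, the directional derivative along the all-ones vector equals the sum of grad_input, which is what the central difference approximates.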


parameters() and grads()

Because activation layers have no learnable weights, they must override the Module base class methods to return empty lists. This prevents the Optimizer from attempting to update non-existent parameters.

Method Signatures

def parameters(self):
def grads(self):

Specification

- Both methods take no arguments and return an empty list.
- parameters() returns [] because the layer stores no trainable arrays.
- grads() returns [] because there are no parameters to accumulate gradients for.
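To see why the empty lists matter, consider a toy SGD step (sgd_step and NoParamLayer below are hypothetical names for illustration). Iterating zip() over two empty lists yields nothing, so activation layers are skipped without any special-casing in the optimizer:

```python
def sgd_step(layer, lr=0.1):
    # Update each parameter in place; assumes parameters are NumPy arrays.
    for p, g in zip(layer.parameters(), layer.grads()):
        p -= lr * g


class NoParamLayer:
    """Stand-in for an activation layer: empty parameters and grads."""
    def parameters(self):
        return []

    def grads(self):
        return []


sgd_step(NoParamLayer())  # no-op: nothing to update, and no error raised
```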


Concrete Subclass Example: ReLU

To illustrate how students will use this base class, here is the full implementation of a ReLU (Rectified Linear Unit) activation layer. ReLU outputs the input directly if it is positive, and outputs zero if it is negative. Its derivative is 1 for positive inputs and 0 for negative inputs.

import numpy as np

class ReLU(Activation):
    def __init__(self):
        super().__init__()
        
    def forward(self, x):
        self.x = x
        # Element-wise maximum: threshold negative values to 0
        return np.maximum(0, x)
        
    def backward(self, grad_output):
        # Derivative of ReLU is 1 where x > 0, and 0 where x <= 0
        local_derivative = (self.x > 0).astype(float)
        
        # Chain rule: multiply local derivative by incoming gradient element-wise
        grad_input = local_derivative * grad_output
        return grad_input
        
    def parameters(self):
        return []
        
    def grads(self):
        return []
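For comparison, here is a sketch of a second subclass following the same pattern. Sigmoid is mentioned above but not specified, so this is illustrative only; note that its derivative, sigmoid'(x) = sigmoid(x) * (1 - sigmoid(x)), is cheapest to express in terms of the output, so this version caches the output as well as the input. (A minimal Activation stand-in is included so the snippet runs on its own.)

```python
import numpy as np

class Activation:  # minimal stand-in for the base class described above
    def __init__(self):
        self.x = None

    def parameters(self):
        return []

    def grads(self):
        return []


class Sigmoid(Activation):
    def forward(self, x):
        self.x = x
        # Logistic function, applied element-wise
        self.out = 1.0 / (1.0 + np.exp(-x))
        return self.out

    def backward(self, grad_output):
        # sigmoid'(x) = sigmoid(x) * (1 - sigmoid(x)), reusing the cached output
        local_derivative = self.out * (1.0 - self.out)
        return local_derivative * grad_output
```

At x = 0, forward() returns 0.5 and the local derivative is 0.25, which is a quick sanity check for student implementations.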