
Mini-Torch Framework Specification

This document outlines the architecture for a “Mini-Torch” framework, designed for computer science students building neural networks from scratch. It mirrors the PyTorch API using only numpy, matplotlib, select elements of scipy, and standard Python, emphasizing manual gradient calculations and batch-first row-vector notation.

To improve encapsulation and modularity, the framework is built around the following major architectural elements: an Optimizer base class (parameter updating), a Loss base class (error and initial gradient calculation), a Dataset/DataLoader pipeline (base classes to structure and iterate through data), a Module base class (layers, forward and backward passes), and a Sequential container (a Module subclass that manages the chaining of multiple layers).

Core Philosophy

To bridge the gap between foundational mathematics and the modern architecture of Generative AI (LLMs, GANs), this course utilizes a “Mini-Torch” Framework.

Allowable Libraries

Implementations may use only numpy, matplotlib, select elements of scipy, and the Python standard library. Consistent with the framework's emphasis on manual gradient calculation, no automatic differentiation or deep learning libraries are used.

Object-Oriented Principles and Design Patterns

The revised architecture relies heavily on established software design patterns to ensure the framework is modular, scalable, and easy to maintain.

Using these patterns means that students can first implement simplified versions of many of these classes and later add more elaborate versions without needing to change other classes.

UML Framework Architecture

classDiagram
    class Module {
        <<abstract>>
        +forward(x)
        +backward(grad_output)
        +parameters() list
        +grads() list
    }

    class Sequential {
        -modules : list~Module~
        +forward(x)
        +backward(grad_output)
        +parameters() list
    }

    class Linear {
        -W : array
        -b : array
        +forward(x)
        +backward(grad_output)
    }

    class Activation {
        <<abstract>>
        +forward(x)
        +backward(grad_output)
    }

    Module <|-- Sequential
    Module <|-- Linear
    Module <|-- Activation
    Sequential o-- Module : contains

    class Optimizer {
        <<abstract>>
        #params : list
        #lr : float
        +zero_grad()
        +step()
    }

    class SGD {
        +step()
    }

    class AdamW {
        +step()
    }

    Optimizer <|-- SGD
    Optimizer <|-- AdamW
    Optimizer --> Module : modifies parameters

    class Loss {
        <<abstract>>
        +forward(predictions, targets)
        +backward()
    }

    class MSELoss {
        +forward(predictions, targets)
        +backward()
    }

    class CrossEntropyLoss {
        +forward(predictions, targets)
        +backward()
    }

    Loss <|-- MSELoss
    Loss <|-- CrossEntropyLoss

    class Dataset {
        <<abstract>>
        +__len__()
        +__getitem__(idx)
    }

    class DataLoader {
        -dataset : Dataset
        -batch_size : int
        -shuffle : bool
        +__iter__()
    }

    DataLoader o-- Dataset : iterates over

Core Component Specifications

The Neural Network Hierarchy (Module, Sequential, and Activation)

The Module base class is the foundational building block of the neural network. Every layer must implement __init__(), forward(x), and backward(grad_output) methods.
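
A minimal sketch of the Module interface and a Linear layer following it might look like the following. Method names match the UML above; the weight initialization scale and the caching of the input in forward are illustrative conventions, not requirements.

```python
import numpy as np

class Module:
    """Abstract base class: every layer implements forward and backward."""
    def forward(self, x):
        raise NotImplementedError

    def backward(self, grad_output):
        raise NotImplementedError

    def parameters(self):
        return []  # parameter-free modules return an empty list

    def grads(self):
        return []

class Linear(Module):
    """Fully connected layer in batch-first row-vector notation: y = x @ W + b."""
    def __init__(self, in_features, out_features):
        # small random initialization; the 0.01 scale is an arbitrary choice
        self.W = np.random.randn(in_features, out_features) * 0.01
        self.b = np.zeros(out_features)
        self.dW = np.zeros_like(self.W)
        self.db = np.zeros_like(self.b)

    def forward(self, x):
        self.x = x  # cache the input for the backward pass
        return x @ self.W + self.b

    def backward(self, grad_output):
        # gradients for the batch-first convention: dW = x^T g, db sums over rows
        self.dW = self.x.T @ grad_output
        self.db = grad_output.sum(axis=0)
        return grad_output @ self.W.T  # gradient with respect to the input

    def parameters(self):
        return [self.W, self.b]

    def grads(self):
        return [self.dW, self.db]
```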

Sequential Container

The Sequential container chains multiple sub-modules: forward passes data through them in order, and backward propagates gradients through them in reverse order.
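
Assuming the Module interface sketched above, Sequential can be written as follows; a minimal Module stub is repeated here only so the example runs on its own.

```python
class Module:
    """Stub of the Module base class, repeated so this sketch is standalone."""
    def parameters(self): return []
    def grads(self): return []

class Sequential(Module):
    """Chains sub-modules: forward runs left-to-right, backward right-to-left."""
    def __init__(self, *modules):
        self.modules = list(modules)

    def forward(self, x):
        for m in self.modules:
            x = m.forward(x)
        return x

    def backward(self, grad_output):
        # reverse order: each module receives the gradient of the module after it
        for m in reversed(self.modules):
            grad_output = m.backward(grad_output)
        return grad_output

    def parameters(self):
        return [p for m in self.modules for p in m.parameters()]

    def grads(self):
        return [g for m in self.modules for g in m.grads()]
```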

Activation Layer

The Activation abstract class specializes the Module interface to act as a blueprint for parameter-free, non-linear transformations such as ReLU, Sigmoid, or GELU. Because an activation layer’s sole purpose is to apply a mathematical function element-wise to the outputs of the preceding linear layer, its implementation of the Module interface is specialized in two ways: it holds no trainable parameters, so parameters() and grads() return empty lists, and its backward pass simply multiplies grad_output element-wise by the derivative of the activation function evaluated at the cached input.
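
ReLU is one way to sketch this specialization. The Activation class is shown standalone here with the same interface as Module; caching a boolean mask in forward is one possible implementation choice.

```python
import numpy as np

class Activation:
    """Parameter-free Module specialization; subclasses define the non-linearity."""
    def forward(self, x):
        raise NotImplementedError

    def backward(self, grad_output):
        raise NotImplementedError

    def parameters(self):
        return []  # activations own no trainable parameters

    def grads(self):
        return []

class ReLU(Activation):
    def forward(self, x):
        self.mask = x > 0          # cache which inputs were positive
        return np.where(self.mask, x, 0.0)

    def backward(self, grad_output):
        # element-wise chain rule: derivative is 1 where input > 0, else 0
        return grad_output * self.mask
```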

The Optimization Engine (Optimizer)

The Optimizer base class handles mathematical optimization of model parameters.
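
A sketch of Optimizer and SGD follows. The constructor signature used here, taking parallel lists of parameter and gradient arrays as returned by Module.parameters() and Module.grads(), is one possible convention (PyTorch's real optimizers take parameters only); updates are done in place so the model's array references remain valid.

```python
import numpy as np

class Optimizer:
    """Holds references to parameter/gradient arrays and updates them in place."""
    def __init__(self, params, grads, lr=0.01):
        self.params = params
        self.grads = grads
        self.lr = lr

    def zero_grad(self):
        for g in self.grads:
            g.fill(0.0)  # reset accumulated gradients in place

    def step(self):
        raise NotImplementedError

class SGD(Optimizer):
    def step(self):
        # vanilla gradient descent: p <- p - lr * dp
        for p, g in zip(self.params, self.grads):
            p -= self.lr * g
```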

The Error Calculation (Loss)

Loss functions quantify the difference between model predictions and target values and initiate the backpropagation process.
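
MSELoss illustrates the pattern: forward returns a scalar error and caches what backward needs, and backward returns the initial gradient with respect to the predictions, which is then fed to the model's backward pass. Averaging over all elements of the batch is one possible convention.

```python
import numpy as np

class Loss:
    def forward(self, predictions, targets):
        raise NotImplementedError

    def backward(self):
        raise NotImplementedError

class MSELoss(Loss):
    """Mean squared error, averaged over every element in the batch."""
    def forward(self, predictions, targets):
        self.diff = predictions - targets      # cache for backward
        return np.mean(self.diff ** 2)

    def backward(self):
        # d(mean(diff^2)) / d(predictions) = 2 * diff / N
        return 2.0 * self.diff / self.diff.size
```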

Data Management (Dataset and DataLoader)

These classes separate data handling logic from the main training loop.
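
A sketch of the pipeline, assuming the dunder-based Dataset interface from the UML above. ArrayDataset is a hypothetical concrete subclass added here for illustration; the DataLoader yields stacked numpy batches and reshuffles the index order each epoch when shuffle is set.

```python
import numpy as np

class Dataset:
    def __len__(self):
        raise NotImplementedError

    def __getitem__(self, idx):
        raise NotImplementedError

class ArrayDataset(Dataset):
    """Illustrative concrete Dataset wrapping parallel input/target arrays."""
    def __init__(self, X, y):
        self.X, self.y = X, y

    def __len__(self):
        return len(self.X)

    def __getitem__(self, idx):
        return self.X[idx], self.y[idx]

class DataLoader:
    """Yields (batch_X, batch_y) arrays, optionally shuffled each epoch."""
    def __init__(self, dataset, batch_size=32, shuffle=False):
        self.dataset = dataset
        self.batch_size = batch_size
        self.shuffle = shuffle

    def __iter__(self):
        order = np.arange(len(self.dataset))
        if self.shuffle:
            np.random.shuffle(order)  # new order every epoch
        for start in range(0, len(order), self.batch_size):
            idx = order[start:start + self.batch_size]
            samples = [self.dataset[i] for i in idx]
            xs, ys = zip(*samples)
            yield np.stack(xs), np.stack(ys)
```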

The Standard Training Loop

With the revised architecture, students will implement a clean training loop that maps directly onto the standard PyTorch workflow:

  1. Iterate over epochs.
  2. Iterate over batches yielded by the DataLoader.
  3. Forward Pass: Pass the batch through the Sequential model to generate predictions.
  4. Loss Calculation: Pass predictions and targets to the Loss object’s forward method.
  5. Zero Gradients: Call optimizer.zero_grad().
  6. Backward Pass: Call loss.backward() to get the initial gradient, then pass it to model.backward(grad_output) to calculate all internal gradients.
  7. Parameter Update: Call optimizer.step() to adjust the weights.
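
The seven steps above can be sketched as a single function. The model, loss, optimizer, and loader arguments are assumed to follow the interfaces sketched in the earlier sections; this is an illustration of the control flow, not a definitive implementation.

```python
def train(model, loss_fn, optimizer, loader, epochs=10):
    for epoch in range(epochs):                       # 1. iterate over epochs
        for batch_x, batch_y in loader:               # 2. iterate over batches
            predictions = model.forward(batch_x)      # 3. forward pass
            loss = loss_fn.forward(predictions,
                                   batch_y)           # 4. loss calculation
            optimizer.zero_grad()                     # 5. zero gradients
            grad = loss_fn.backward()                 # 6. initial gradient...
            model.backward(grad)                      #    ...then backpropagate
            optimizer.step()                          # 7. parameter update
    return loss  # loss from the final batch, for monitoring
```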

Important References and Further Reading

The following resources are highly recommended for students to deepen their understanding of the design patterns and architectural concepts used in this framework.