Prerequisites
You might be interested in my related articles on Matrix Multiplication and Concurrency before reading further!
Introduction
My goal for this project was to train a simple 2-layer multi-layer perceptron (MLP) by creating a deep learning framework of my own. Representing models with a Directed Acyclic Graph (DAG) built at runtime typically provides a better debugging experience for the user, so data and calculations had to flow through the graph dynamically.
So, I gave myself the following restrictions:
- Learn and incorporate as many design patterns and C++20 features as I could
- Use little or no dependencies in my code
This is what a sample training loop might look like with my library:

1. Introduction
Let's begin by defining a piece of system software called the Operating System (OS), which is responsible for orchestrating the sophisticated resource management of a given machine's hardware, as well as providing an abstracted interface on which software can be built.
At the time of writing this article, I have a web browser, my Spotify playlist, my VS Code editor, and a terminal all open.

Vector Space
A vector space is a set of mathematical objects that can be scaled and added together to produce objects of the same kind. This notion of vector spaces proves to be a very useful framework for extending methods and structures to very different types of problems. A few special types of vector spaces you may already be familiar with:
Function Spaces
We can add functions together and scale them as well.
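Concretely, addition and scaling of functions are defined pointwise; for example:

```latex
(f + g)(x) = f(x) + g(x), \qquad (c\,f)(x) = c\,f(x)
% e.g. with f(x) = x^2 and g(x) = \sin x:
% (2f + g)(x) = 2x^2 + \sin x, which is again a function of x
```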

Introduction
Remember the good ol' days when 6 × 5 easily made sense as adding up five 6s, and voilà, you ended up with 30? Now you're in college and things are hard 😭 Hopefully running through an example can give you a bit of a glimpse as to how and why we do matrix multiplication.
How to Compute Matrix Multiplication
Consider two matrices A and B. We denote the dimensions by rows and columns, in that order.

Since the concept of "n choose k" seems to appear a lot in my life, I decided to make a quick post explaining the intuition behind it. Let's start with a simple example.
Say we have a set of three Greek characters representing the names of three friends, \( F = \{ \alpha, \beta, \gamma \}\), and we are interested in knowing how many unique matches could be played between pairs of these friends in table tennis.
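Counting the pairs directly agrees with the binomial coefficient "3 choose 2":

```latex
\binom{3}{2} = \frac{3!}{2!\,(3-2)!} = 3,
\qquad
\{\alpha,\beta\},\ \{\alpha,\gamma\},\ \{\beta,\gamma\}
```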

Introduction
Since signals are sets of data or information and systems process that data, we are interested in the analysis of systems. When we deal with a special type of system that has the properties of linearity and time-invariance, we are able to construct methods of analysis that are extremely useful for Linear Time-Invariant (LTI) systems. Fourier analysis, which will be a separate blog post, and the convolution integral are examples of exploiting system properties to decompose inputs into basic signals that are easy to work with analytically.

I hope this article serves as a basic introduction to the terminology of probability theory!
Random Variables
An experiment is a procedure that produces well-defined outcomes, like taking a course and finishing with a certain letter grade. A random variable is a function that maps outcomes of an experiment to numerical values, \(X : \Omega \to \mathbb{R}\). The set of all possible numerical values attainable is called the support of the random variable.
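For instance, sticking with the course-grade experiment, mapping letter grades to grade points gives a concrete random variable (the specific point values here are just a common convention, used for illustration):

```latex
\Omega = \{A, B, C, D, F\}, \qquad
X(A) = 4,\ X(B) = 3,\ X(C) = 2,\ X(D) = 1,\ X(F) = 0
% support of X: \{0, 1, 2, 3, 4\}
```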