Re-learning AI/ML from scratch

The why

I've recently been looking at LLMs, and I think I somewhat have an idea about their architecture, at least at a high level. This kinda feels like how I used to write kernels before I went to uni: I had a lot of fun and thought I understood stuff, but I only really acquired a decent understanding after taking all those systems classes at uni. I can imagine it's the same for AI/ML. I could look at one or two things that I think are important or relevant, but having a broad overview of the field first will probably help a lot. I can't really say that I "understand" stuff in ML; I just know that it exists. If I have learned one thing in systems, it's that really understanding what u r working with helps IMMENSELY. For ML, I want to understand how the science, software tools (libraries etc), and the hardware work in detail. I'll start with the science.

The what

I'm gonna follow CMU courses, cuz most of them have at least their lecture slides available publicly. For now, I plan to do these in sequence:

Update: Other great classes and resources below!!!

That being said, I'm not just gonna look at the lecture slides/videos/readings, I'll very likely look outside too. The courses are more like a structured index of topics that are probably helpful to know about. The most important thing would be to implement everything I learn, cuz if I can't implement it then I probably don't understand it. Here's an overview of each class and what it covers (as I understand it):

15-281 AI

The broad topics as seen on the schedule are:

Agents, search spaces & problems
Uninformed and Informed search
Adversarial Search
Classical Planning
Constraint Satisfaction Problems (sounds like linear programming?)
Convex Optimization
Integer Optimization
Bayesian Networks
Markov Chains, Hidden Markov Models
Markov Decision Processes
Reinforcement Learning
Game Theory

There are only one to four lectures on each of these, so they're not covered in depth (which makes sense, this is an introductory class). It does look like I will have to dust off my probability textbook: I took probability in my freshman year and don't remember much.

10-315 Intro to ML

Oh boy. I've "officially" taken this class (I took 281 too but dropped it right at the end, and I don't remember much from it). But as always, I wasn't able to learn as much as I'd have liked from "taking" the class, so I'll do it again myself. This was just last sem so I'm a lot more familiar with the topics:

KNN
Decision Trees
Linear Regression
Kernel Regression
Bayes Optimal Classifier
MLE/MAP
Logistic Regression
Ensemble Methods
MLPs
CNNs
PCA, clustering
Gaussian Mixture Models
SVM
Learning Theory

Again, only one to three lectures on each topic, so a very broad class. Which, I think, is gud for an intro to ML class. This one is gonna be fun.

11-785 Intro to DL

I've only heard stories of suffering from people who took this class. Here's what I think the class covers from looking at the lecture topics:

ANN training (backpropagation, SGD, momentum, normalization, dropout, etc)
CNNs
RNNs and LSTMs
LMs, transformers
VAEs
Diffusion
GANs
Graph neural nets

Many lectures are spent on ANN basics like training, convergence issues, normalization, dropout, etc. Fewer lectures are spent on specific architectures. Seems like a really good class to learn the basics. The recitations and the bootcamps seem to be as important as the lectures, and I'm guessing the homeworks and quizzes are where all the pain comes from. It's good that implementation is emphasized as much as the theory is; that's the kind of class I'm familiar with from systems land.

10-714 DLsys

This was a class I came across, I think on YouTube? (sometimes the recs r good). I thought this would be a perfect class for me to learn about DL, cuz I love systems and it has both "deep learning" and "systems" in its title.

Backprop, autograd
NN library implementations/abstractions
Hardware accelerators, GPUs
CNNs & their implementation
RNNs & their implementation
Transformers & implementation
Distributed training (brief)
GANs & implementation
Model Deployment

Wow... Sounds like a very implementation-heavy class, probably much more so than intro to DL. I'm really excited for this one.

11-868 LLM sys

This one seems to be more focused on LLMs (well, that's obvious from the title).

GPU programming, autograd, DL framework design
Transformers, Tokenization, LLaMA, GPT3
Acceleration of transformers on GPUs
Distributed training
Serving (Triton, LightLLM)
Quantization & compression (GPTQ), LoRA & QLoRA
ZeRO
Orca
PagedAttention
JAX
MoE, FlashAttention
Speculative decoding, RAG

That pretty much sounds like the popular recent research that I've heard of and that has also been widely implemented. I think I've been trying to jump into these without going through all the basics first. Explains why I'm not having a great time...

11-777 MMML

"Multimodal ML". This shud act as a good segue into other kinds of models than just LMs. I really want to look at diffusion models, but it's impractical for me at this moment to jump into those. I'm not going to talk about the material, because the topics are pretty much alien to me (feel free to have a look yourself!).

10-625 Convex Optimization

This is a weird one. I knew about this class cuz a friend had some trouble getting in, and we talked about it at the time (it was quite an amusing chain of events). At a glance, it looked like this would be an important class for learning more about training. I asked him whether it's worth it, and he said:
[Image: interesting convo]
He's an ML guy and I don't know much about this, so I'll have to take his word on that. But it doesn't hurt to learn, right? So I might do this one too.

Moving Forward

This is probably going to take a lot of time, with an expected workload of 10-20 hr/week for about 4 months per course. Time which I don't think I have. But if I don't do all this, I'll probably never make it, so it doesn't hurt to try. It does seem like the first class (AI) may be skippable, but I'll do it anyway (and it does seem to be the easiest of the bunch). I won't start right away, but probably in a week or two from now when I hopefully will have time. I plan on posting my progress on this site approximately weekly, with detailed reports on what I learned/found interesting. I'll hopefully no longer be an imposter in the ML community by the end of this.

Also, if you have any comments or suggestions, feel free to DM me on twitter!

Other great classes & resources

Here's some great ones:

6.5940: TinyML and Efficient Deep Learning Computing (suggested by @base9dev)

20 Apr 2024