2026
-
Apr 18, 2026
My Workflow for Understanding LLM Architectures
A learning-oriented workflow for understanding new open-weight model releases
-
Apr 4, 2026
Components of A Coding Agent
How coding agents use tools, memory, and repo context to make LLMs work better in practice
-
Mar 22, 2026
A Visual Guide to Attention Variants in Modern LLMs
From MHA and GQA to MLA, sparse attention, and hybrid architectures
-
Mar 14, 2026
New LLM Architecture Gallery
I put together a new LLM Architecture Gallery that collects the architecture figures from my recent comparison articles in one place, together with compact fact sheets and links.
-
Feb 25, 2026
A Dream of Spring for Open-Weight LLMs: 10 Architectures from Jan-Feb 2026
A Round Up And Comparison of 10 Open-Weight LLM Releases in Spring 2026
-
Feb 1, 2026
State of AI 2026 with Sebastian Raschka, Nathan Lambert, and Lex Fridman
I recently sat down with Lex Fridman and Nathan Lambert for a comprehensive 4.5-hour interview to discuss the current state of progress in AI, and what the...
-
Jan 24, 2026
Categories of Inference-Time Scaling for Improved LLM Reasoning
And an Overview of Recent Inference-Scaling Papers (Including Recursive Language Models)
Inference scaling has become one of the most effective ways to improve answer quality and accuracy in deployed LLMs. The idea is straightforward. If we are...
2025
-
Dec 30, 2025
The State Of LLMs 2025: Progress, Problems, and Predictions
A 2025 review of large language models, from DeepSeek R1 and RLVR to inference-time scaling, benchmarks, architectures, and predictions for 2026.
-
Dec 30, 2025
LLM Research Papers: The 2025 List (July to December)
A curated list of LLM research papers from July–December 2025, organized by reasoning models, inference-time scaling, architectures, training efficiency...
-
Dec 8, 2025
From Random Forests to RLVR: A Short History of ML/AI Hello Worlds
Two years ago, I posted a list of Hello World examples for machine learning and AI on social. Here, the Hello World means beginner-friendly examples to...
-
Dec 3, 2025
From DeepSeek V3 to V3.2: Architecture, Sparse Attention, and RL Updates
Understanding How DeepSeek's Flagship Open-Weight Models Evolved
Similar to DeepSeek V3, the team released their new flagship model over a major US holiday weekend. Given DeepSeek V3.2's really good performance (on GPT-5...
-
Nov 12, 2025
Recommendations for Getting the Most Out of a Technical Book
This short article compiles a few notes I previously shared when readers ask how to get the most out of my building large language model from scratch books...
-
Nov 4, 2025
Beyond Standard LLMs
Linear Attention Hybrids, Text Diffusion, Code World Models, and Small Recursive Transformers
After I shared my Big LLM Architecture Comparison a few months ago, which focused on the main transformer-based LLMs, I received a lot of questions with...
-
Oct 29, 2025
DGX Spark and Mac Mini for Local PyTorch Development
First Impressions and Benchmarks
The DGX Spark for local LLM inferencing and fine-tuning was a pretty popular discussion topic recently. I got to play with one myself, primarily working...
-
Oct 5, 2025
Understanding the 4 Main Approaches to LLM Evaluation (From Scratch)
Multiple-Choice Benchmarks, Verifiers, Leaderboards, and LLM Judges with Code Examples
-
Sep 6, 2025
Understanding and Implementing Qwen3 From Scratch
A Detailed Look at One of the Leading Open-Source LLMs
Previously, I compared the most notable open-weight architectures of 2025 in The Big LLM Architecture Comparison. Then, I zoomed in and discussed the...
-
Aug 9, 2025
From GPT-2 to gpt-oss: Analyzing the Architectural Advances
And How They Stack Up Against Qwen3
OpenAI just released their new open-weight LLMs this week: gpt-oss-120b and gpt-oss-20b, their first open-weight models since GPT-2 in 2019. And yes, thanks...
-
Jul 19, 2025
The Big LLM Architecture Comparison
From DeepSeek V3 to GLM-5: A Look At Modern LLM Architecture Design
It has been seven years since the original GPT architecture was developed. At first glance, looking back at GPT-2 (2019) and forward to DeepSeek-V3 and...
-
Jul 1, 2025
LLM Research Papers: The 2025 List (January to June)
A topic-organized collection of 200+ LLM research papers from 2025
The latest in LLM research with a hand-curated, topic-organized list of over 200 research papers from 2025.
-
Jun 17, 2025
Understanding and Coding the KV Cache in LLMs from Scratch
KV caches are one of the most critical techniques for efficient inference in LLMs in production. KV caches are an important component for compute-efficient...
-
May 10, 2025
Coding LLMs from the Ground Up: A Complete Course
Why build an LLM from scratch? It's probably the best and most efficient way to learn how LLMs really work. Plus, many readers have told me they had a lot...
-
Apr 19, 2025
The State of Reinforcement Learning for LLM Reasoning
Understanding GRPO and New Insights from Reasoning Model Papers
A lot has happened this month, especially with the releases of new flagship models like GPT-4.5 and Llama 4. But you might have noticed that reactions to...
-
Mar 29, 2025
First Look at Reasoning From Scratch: Chapter 1
As you know, I've been writing a lot lately about the latest research on reasoning in LLMs. Before my next research-focused blog post, I wanted to offer...
-
Mar 8, 2025
Inference-Time Compute Scaling Methods to Improve Reasoning Models
Part 1: Inference-Time Compute Scaling Methods
This article explores recent research advancements in reasoning-optimized LLMs, with a particular focus on inference-time compute scaling methods that have emerged...
-
Feb 5, 2025
Understanding Reasoning LLMs
Methods and Strategies for Building and Refining Reasoning Models
In this article, I will describe the four main approaches to building reasoning models, or how we can enhance LLMs with reasoning capabilities. I hope this...
-
Jan 23, 2025
Noteworthy LLM Research Papers of 2024
—12 influential AI papers from January to December 2024
This article covers 12 influential AI research papers of 2024, ranging from mixture-of-experts models to new LLM scaling laws for precision.
-
Jan 17, 2025
Implementing A Byte Pair Encoding (BPE) Tokenizer From Scratch
This is a standalone notebook implementing the popular byte pair encoding (BPE) tokenization algorithm, which is used in models like GPT-2 to GPT-4, Llama...
2024
-
Dec 29, 2024
LLM Research Papers: The 2024 List
I want to share my running bookmark list of many fascinating (mostly LLM-related) papers I stumbled upon in 2024. It's just a list, but maybe it will come...
-
Nov 3, 2024
Understanding Multimodal LLMs
An Introduction to the Main Techniques and Latest Models
There has been a lot of new research on the multimodal LLM front, including the latest Llama 3.2 vision models, which employ diverse architectural...
-
Sep 21, 2024
Building A GPT-Style LLM Classifier From Scratch
Finetuning a GPT Model for Spam Classification
This article shows you how to transform pretrained large language models (LLMs) into strong text classifiers. But why focus on classification? First...
-
Sep 1, 2024
Building LLMs from the Ground Up: A 3-hour Coding Workshop
This tutorial is aimed at coders interested in understanding the building blocks of large language models (LLMs), how LLMs work, and how to code them from...
-
Aug 17, 2024
New LLM Pre-training and Post-training Paradigms
-- A Look at How Modern LLMs Are Trained
There are hundreds of LLM papers each month proposing new techniques and approaches. However, one of the best ways to see what actually works well in...
-
Jul 20, 2024
Instruction Pretraining LLMs
-- The Latest Research in Instruction Finetuning
This article covers a new, cost-effective method for generating data for instruction finetuning LLMs; instruction finetuning from scratch; pretraining LLMs...
-
Jun 2, 2024
Developing an LLM: Building, Training, Finetuning
A Deep Dive into the Lifecycle of LLM Development
This is an overview of the LLM development process. This one-hour talk focuses on the essential three stages of developing an LLM: coding the architecture...
-
Jun 2, 2024
LLM Research Insights: Instruction Masking and New LoRA Finetuning Experiments?
Discussing the Latest Model Releases and AI Research in May 2024
This article covers three new papers related to instruction finetuning and parameter-efficient finetuning with LoRA in large language models (LLMs). I work...
-
May 12, 2024
How Good Are the Latest Open LLMs? And Is DPO Better Than PPO?
Discussing the Latest Model Releases and AI Research in April 2024
What a month! We had four major open LLM releases: Mixtral, Meta AI's Llama 3, Microsoft's Phi-3, and Apple's OpenELM. In my new article, I review and...
-
Apr 20, 2024
Using and Finetuning Pretrained Transformers
What are the different ways to use and finetune pretrained large language models (LLMs)? The three most common ways to use and finetune pretrained LLMs...
-
Mar 31, 2024
Tips for LLM Pretraining and Evaluating Reward Models
Research Papers in March 2024
It's another month in AI research, and it's hard to pick favorites. This month, I am going over a paper that discusses strategies for the continued...
-
Mar 3, 2024
Research Papers in February 2024
— A LoRA Successor, Small Finetuned LLMs Vs Generalist LLMs, and Transparent LLM Research
Once again, this has been an exciting month in AI research. This month, I'm covering two new openly available LLMs, insights into small finetuned LLMs, and...
-
Feb 18, 2024
Improving LoRA: Implementing Weight-Decomposed Low-Rank Adaptation (DoRA) from Scratch
Low-rank adaptation (LoRA) is a machine learning technique that modifies a pretrained model (for example, an LLM or vision transformer) to better suit a...
2023
-
Sep 15, 2023
Optimizing LLMs From a Dataset Perspective
This article focuses on improving the modeling performance of LLMs by finetuning them using carefully curated datasets. Specifically, this article...
-
Aug 10, 2023
The NeurIPS 2023 LLM Efficiency Challenge Starter Guide
Large language models (LLMs) offer one of the most interesting opportunities for developing more efficient training methods. A few weeks ago, the NeurIPS...
-
Jul 1, 2023
Optimizing Memory Usage for Training LLMs and Vision Transformers in PyTorch
Peak memory consumption is a common bottleneck when training deep learning models such as vision transformers and LLMs. This article provides a series of...
-
Jun 14, 2023
Finetuning Falcon LLMs More Efficiently With LoRA and Adapters
Finetuning allows us to adapt pretrained LLMs in a cost-efficient manner. But which method should we use? This article compares different...
-
May 11, 2023
Accelerating Large Language Models with Mixed-Precision Techniques
Training and using large language models (LLMs) is expensive due to their large compute requirements and memory footprints. This article will explore how...
-
Apr 26, 2023
Parameter-Efficient LLM Finetuning With Low-Rank Adaptation (LoRA)
Pretrained large language models are often referred to as foundation models for a good reason: they perform well on various tasks, and we can use them as a...
-
Apr 12, 2023
Understanding Parameter-Efficient Finetuning of Large Language Models: From Prefix Tuning to LLaMA-Adapters
In the rapidly evolving field of artificial intelligence, utilizing large language models in an efficient and effective manner has become increasingly...
-
Mar 28, 2023
Finetuning Large Language Models On A Single GPU Using Gradient Accumulation
Previously, I shared an article using multi-GPU training strategies to speed up the finetuning of large language models. Several of these strategies include...
-
Mar 23, 2023
Keeping Up With AI Research And News
When it comes to productivity workflows, there are a lot of things I'd love to share. However, the one topic many people ask me about is how I keep up with...
-
Feb 23, 2023
Some Techniques To Make Your PyTorch Models Train (Much) Faster
This blog post outlines techniques for improving the training performance of your PyTorch model without compromising its accuracy. To do so, we will wrap a...
-
Feb 9, 2023
Understanding and Coding the Self-Attention Mechanism of Large Language Models From Scratch
In this article, we are going to understand how self-attention works from scratch. This means we will code it ourselves one step at a time. Since its...
-
Feb 7, 2023
Understanding Large Language Models -- A Transformative Reading List
Since transformers have such a big impact on everyone's research agenda, I wanted to flesh out a short reading list for machine learning researchers and...
-
Feb 1, 2023
What Are the Different Approaches for Detecting Content Generated by LLMs Such As ChatGPT? And How Do They Work and Differ?
Since the release of the AI Classifier by OpenAI made big waves yesterday, I wanted to share a few details about the different approaches for detecting...
-
Jan 29, 2023
Comparing Different Automatic Image Augmentation Methods in PyTorch
Data augmentation is a key tool in reducing overfitting, whether it's for images or text. This article compares three Auto Image Data Augmentation...
-
Jan 16, 2023
Curated Resources and Trustworthy Experts: The Key Ingredients for Finding Accurate Answers to Technical Questions in the Future
Conversational chatbots such as ChatGPT probably will not be able to replace traditional search engines and expert knowledge anytime soon. With the vast...
-
Jan 15, 2023
Training an XGBoost Classifier Using Cloud GPUs Without Worrying About Infrastructure
Imagine you want to quickly train a few machine learning or deep learning models on the cloud but don't want to deal with cloud infrastructure. This short...
-
Jan 5, 2023
Open Source Highlights 2022 for Machine Learning & AI
Recently, I shared the top 10 papers that I read in 2022. As a follow-up, I am compiling a list of my favorite 10 open-source releases that I discovered...
-
Jan 3, 2023
Influential Machine Learning Papers Of 2022
Every day brings something new and exciting to the world of machine learning and AI, from the latest developments and breakthroughs in the field to emerging...
2022
-
Oct 15, 2022
Ahead Of AI, And What's Next?
About monthly machine learning musings, and other things I am currently working on ...
-
Jul 24, 2022
A Short Chronology Of Deep Learning For Tabular Data
Occasionally, I share research papers proposing new deep learning approaches for tabular data on social media, which is typically an excellent discussion...
-
Jul 5, 2022
No, We Don't Have to Choose Batch Sizes As Powers Of 2
Regarding neural network training, I think we are all guilty of doing this: we choose our batch sizes as powers of 2, that is, 64, 128, 256, 512, 1024, and...
-
Jun 30, 2022
Sharing Deep Learning Research Models with Lightning Part 2: Leveraging the Cloud
In this article, we will deploy a Super Resolution App on the cloud using lightning.ai. The primary goal here is to see how easy it is to create and...
-
Jun 17, 2022
Sharing Deep Learning Research Models with Lightning Part 1: Building A Super Resolution App
In this post, we will build a Lightning App. Why? Because it is 2022, and it is time to explore a more modern take on interacting with, presenting, and...
-
Jun 12, 2022
Taking Datasets, DataLoaders, and PyTorch’s New DataPipes for a Spin
The PyTorch team recently announced TorchData, a prototype library focused on implementing composable and reusable data loading utilities for PyTorch. In...
-
May 18, 2022
Running PyTorch on the M1 GPU
Today, PyTorch officially introduced GPU support for Apple's ARM M1 chips. This is an exciting day for Mac users out there, so I spent a few minutes trying...
-
Apr 25, 2022
Creating Confidence Intervals for Machine Learning Classifiers
Developing good predictive models hinges upon accurate performance evaluation and comparisons. However, when evaluating machine learning models, we...
-
Apr 4, 2022
Losses Learned
-- Optimizing Negative Log-Likelihood and Cross-Entropy in PyTorch (Part 1)
The cross-entropy loss is our go-to loss for training deep learning-based classifiers. In this article, I am giving you a quick tour of how we usually...
-
Mar 24, 2022
TorchMetrics
-- How do we use it, and what's the difference between .update() and .forward()?
TorchMetrics is a really nice and convenient library that lets us compute the performance of models in an iterative fashion. It's designed with PyTorch (and...
-
Feb 25, 2022
Machine Learning with PyTorch and Scikit-Learn
-- The *new* Python Machine Learning Book
Machine Learning with PyTorch and Scikit-Learn has been a long time in the making, and I am excited to finally get to talk about the release of my new book...
2021
-
Dec 29, 2021
Introduction to Machine Learning
-- Video Lectures about Python Basics, Tree-based Methods, Model Evaluation, and Feature Selection
About half a year ago, I organized all my deep learning-related videos in a handy blog post to have everything in one place. Since many people liked this...
-
Jul 9, 2021
Introduction to Deep Learning
-- 170 Video Lectures from Adaptive Linear Neurons to Zero-shot Classification with Transformers
I just sat down this morning and organized all deep learning related videos I recorded in 2021. I am sure this will be a useful reference for my future...
-
Feb 11, 2021
Datasets for Machine Learning and Deep Learning
-- Some of the Best Places to Explore
With the semester being in full swing, I recently shared this set of dataset repositories with my deep learning class. However, I thought that beyond using...
-
Jan 21, 2021
Book Review: Deep Learning With PyTorch
-- A Practical Deep Learning Guide With a Computer Vision Focus and an Interesting Structure
After its release in August 2020, Deep Learning with PyTorch has been sitting on my shelf before I finally got a chance to read it during this winter break...
-
Jan 3, 2021
How I Keep My Projects Organized
Since I started my undergraduate studies in 2008, I have been obsessed with productivity tips, notetaking solutions, and todo-list management. Over the...
2020
2019
2018
2016
-
Oct 2, 2016
Model evaluation, model selection, and algorithm selection in machine learning
Part III - Cross-validation and hyperparameter tuning
Almost every machine learning algorithm comes with a large number of settings that we, the machine learning researchers and practitioners, need to specify...
-
Aug 13, 2016
Model evaluation, model selection, and algorithm selection in machine learning
Part II - Bootstrapping and uncertainties
In this second part of this series, we will look at some advanced techniques for model evaluation and techniques to estimate the uncertainty of our...
-
Jun 11, 2016
Model evaluation, model selection, and algorithm selection in machine learning
Part I - The basics
Machine learning has become a central part of our life -- as consumers, customers, and hopefully as researchers and practitioners! Whether we are applying...
2015
-
Sep 24, 2015
Writing 'Python Machine Learning'
– A Reflection on a Journey
It's been about time. I am happy to announce that "Python Machine Learning" was finally released today! Sure, I could just send an email around to all the...
-
Aug 24, 2015
Python, Machine Learning, and Language Wars
– A Highly Subjective Point of View
This has really been quite a journey for me lately. And regarding the frequently asked question “Why did you choose Python for Machine Learning?” I guess it...
-
Mar 24, 2015
Single-Layer Neural Networks and Gradient Descent
This article offers a brief glimpse of the history and basic concepts of machine learning. We will take a look at the first algorithmically described neural...
-
Jan 27, 2015
Principal Component Analysis
in 3 Simple Steps
Principal Component Analysis (PCA) is a simple yet popular and useful linear transformation technique that is used in numerous applications, such as stock...
-
Jan 11, 2015
Implementing a Weighted Majority Rule Ensemble Classifier
in scikit-learn
Here, I want to present a simple and conservative approach of implementing a weighted majority rule ensemble classifier in scikit-learn that yielded...
2014
-
Dec 5, 2014
MusicMood
– A Machine Learning Model for Classifying Music by Mood Based on Song Lyrics
In this article, I want to share my experience with a recent data mining project which probably was one of my most favorite hobby projects so far. It's all...
-
Nov 28, 2014
Turn Your Twitter Timeline into a Word Cloud
– using Python
Last week, I posted some visualizations in the context of the Happy Rock Song data mining project, and some people were curious about how I created the word clouds...
-
Oct 4, 2014
Naive Bayes and Text Classification
– Introduction and Theory
Naive Bayes classifiers, a family of classifiers that are based on the popular Bayes’ probability theorem, are known for creating simple yet well performing...
-
Sep 14, 2014
Kernel tricks and nonlinear dimensionality reduction via RBF kernel PCA
The focus of this article is to briefly introduce the idea of kernel methods and to implement a Gaussian radial basis function (RBF) kernel that is used to...
-
Aug 25, 2014
Predictive modeling, supervised machine learning, and pattern classification
— the big picture
When I was working on my next pattern classification application, I realized that it might be worthwhile to take a step back and look at the big picture of...
-
Aug 3, 2014
Linear Discriminant Analysis
– Bit by Bit
I received a lot of positive feedback about the step-wise Principal Component Analysis (PCA) implementation. Thus, I decided to write a little follow-up...
-
Jul 19, 2014
Dixon's Q test for outlier identification
– A questionable practice
I recently faced the impossible task of identifying outliers in a dataset with very, very small sample sizes, and Dixon's Q test caught my attention. Honestly...
-
Jul 11, 2014
About Feature Scaling and Normalization
– and the effect of standardization for machine learning algorithms
I received a couple of questions in response to my previous article (Entry point: Data) where people asked me why I used Z-score standardization as feature...
-
Jun 27, 2014
Entry Point Data
– Using Python's sci-packages to prepare data for Machine Learning tasks and other data analyses
In this short tutorial I want to provide a short overview of some of my favorite Python tools for common procedures as entry points for general pattern...
-
Jun 26, 2014
Molecular docking, estimating free energies of binding, and AutoDock's semi-empirical force field
Discussions and questions about methods, approaches, and tools for estimating (relative) binding free energies of protein-ligand complexes are quite...
-
Jun 20, 2014
An introduction to parallel programming using Python's multiprocessing module
The default Python interpreter was designed with simplicity in mind and has a thread-safe mechanism, the so-called "GIL" (Global Interpreter Lock). In order...
-
Jun 19, 2014
Numeric matrix manipulation
– The cheat sheet for MATLAB, Python NumPy, R, and Julia
At its core, this article is about a simple cheat sheet for basic operations on numeric matrices, which can be very useful if you are working and experimenting...
-
Jun 19, 2014
Kernel density estimation via the Parzen-Rosenblatt window method
– explained using Python
The Parzen-window method (also known as Parzen-Rosenblatt window method) is a widely used non-parametric approach to estimate a probability density function...
-
Jun 1, 2014
The key differences between Python 2.7.x and Python 3.x with examples
Many beginning Python users are wondering with which version of Python they should start. My answer to this question is usually something along the lines...
-
May 28, 2014
5 simple steps for converting Markdown documents into HTML and adding Python syntax highlighting
In this little tutorial, I want to show you in 5 simple steps how easy it is to add code syntax highlighting to your blog articles.
-
May 20, 2014
Creating a table of contents with internal links in IPython Notebooks and Markdown documents
Many people have asked me how I create the table of contents with internal links for my IPython Notebooks and Markdown documents on GitHub. Well, no...
-
May 12, 2014
A Beginner's Guide to Python's Namespaces, Scope Resolution, and the LEGB Rule
A short tutorial about Python's namespaces and the scope resolution for variable names using the LEGB-rule with little quiz-like exercises.
-
Apr 21, 2014
Diving deep into Python
– the not-so-obvious language parts
Some while ago, I started to collect some of the not-so-obvious things I encountered when I was coding in Python. I thought that it was worthwhile sharing...
-
Apr 13, 2014
Implementing a Principal Component Analysis (PCA)
– in Python, step by step
In this article I want to explain how a Principal Component Analysis (PCA) works by implementing it in Python step by step. At the end we will compare the...
-
Mar 13, 2014
Installing Scientific Packages for Python3 on MacOS 10.9 Mavericks
I just went through some pain (again) when I wanted to install some of Python's scientific libraries on my second Mac. I summarized the setup and...
-
Mar 7, 2014
A thorough guide to SQLite database operations in Python
After I wrote the initial teaser article "SQLite - Working with large data sets in Python effectively" about how awesome SQLite databases are via sqlite3 in...
-
Feb 23, 2014
Using OpenEye software for substructure alignments
and best-matching low-energy conformer overlays
This is a quick guide showing how to use OpenEye software command line tools to align target molecules to a query based on substructure matches and how to...
2013