Companion interactive apps & notebooks for Complete Reinforcement Learning Journey: From Basics to RLHF
Don't just read about algorithms – watch them think.
Each chapter has two companions:
| Chapter | Web App | Colab Notebook | Concepts |
|---|---|---|---|
| Ch 2: MDPs | ▶ MDP Explorer | Ch2_MDP_Environments_Lab.ipynb | States, Actions, Rewards, Transitions, Deterministic vs Stochastic |
| Ch 3: DP | ▶ Policy Iteration | Ch3_Dynamic_Programming_Lab.ipynb | Policy Evaluation, Policy Iteration, Value Iteration, Convergence |
| Ch 4 | coming soon | coming soon | Monte Carlo Methods |
| Ch 5 | coming soon | coming soon | TD Learning, SARSA, Q-Learning |
| Ch 6 | coming soon | coming soon | Deep RL, DQN |
| Ch 7 | coming soon | coming soon | Policy Gradients, RLHF |
The notebooks can be run in three ways – pick whichever you prefer:
| Platform | How to Open | Setup Required |
|---|---|---|
| Google Colab | Click the "Open in Colab" badge above → runs instantly in your browser | None – everything pre-installed |
| VS Code | Clone this repo → open the .ipynb file → select a Python kernel → run cells | Install Python + libraries (see below) |
| Jupyter Notebook / JupyterLab | Clone this repo → `jupyter notebook` → open the .ipynb file | Install Python + libraries (see below) |
```bash
git clone https://github.com/mlnjsh/rl-book-labs.git
cd rl-book-labs
pip install gymnasium numpy matplotlib seaborn pandas
```
In VS Code: open the .ipynb file → click "Select Kernel" → choose your Python environment → run cells with Shift+Enter.

In Jupyter: run `jupyter notebook`, then navigate to the .ipynb file and open it.

Build 7 MDP environments from scratch in Python:
| # | Environment | States | Actions | Key Lesson |
|---|---|---|---|---|
| 1 | GridWorld 5×5 | 22 cells | ↑↓←→ | Walls, goal, pit, spatial navigation |
| 2 | FrozenLake 4×4 | 16 cells | ↑↓←→ | Slippery ice, holes |
| 3 | Traffic Light | 6 states | keep/switch | Real-world control |
| 4 | Thermostat | 3 states | heat/cool/off | Energy vs comfort tradeoff |
| 5 | Contextual Bandit | 3 contexts | machine A/B/C | Context-dependent rewards |
| 6 | Inventory Management | 5 levels | order 0/1/2 | Supply chain, stockouts |
| 7 | Robot Rooms | 4 rooms | go/stay | Locked doors, path planning |
What you'll do: Inspect transition tables, compute Q-values, visualize value heatmaps, compare deterministic vs stochastic dynamics, experiment with γ.
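For a flavor of what the notebook builds, here is a minimal sketch of a tabular MDP with a one-step Q-value lookup. The grid layout, transition format, and helper names below are illustrative assumptions, not the notebook's exact code:

```python
# Minimal sketch of a tabular MDP (illustrative only -- the grid layout
# and helper names here are NOT the notebook's exact code).
# States are cells of a tiny 2x2 grid, numbered 0..3; state 3 is the goal.

GAMMA = 0.9  # discount factor

# Transition table: P[state][action] = (next_state, reward, done)
# Actions: 0 = right, 1 = down
P = {
    0: {0: (1, 0.0, False), 1: (2, 0.0, False)},
    1: {0: (1, 0.0, False), 1: (3, 1.0, True)},   # down from 1 hits the goal
    2: {0: (3, 1.0, True),  1: (2, 0.0, False)},  # right from 2 hits the goal
    3: {0: (3, 0.0, True),  1: (3, 0.0, True)},   # goal is absorbing
}

def q_value(state, action, V):
    """One-step lookahead: Q(s, a) = r + gamma * V(s')."""
    next_state, reward, done = P[state][action]
    return reward + (0.0 if done else GAMMA * V[next_state])

V = {s: 0.0 for s in P}   # all-zero value estimate
print(q_value(1, 1, V))   # -> 1.0 (immediate reward of reaching the goal)
```

The notebook environments follow the same pattern at larger scale: an explicit transition table you can inspect, plus helpers to compute and plot values.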
Implement 3 core DP algorithms and run them on all 7 environments:
| Algorithm | What It Does |
|---|---|
| Policy Evaluation | Compute V^π sweep by sweep – watch values converge |
| Policy Iteration | Evaluate → Improve loop until π* is found |
| Value Iteration | Single Bellman max update – faster convergence |
What you'll do: Animated convergence plots, PI vs VI comparison table, the effect of γ on convergence speed, stochastic policy comparison (5 slip values side by side).
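For a taste of what the notebook implements, here is a compact value-iteration sketch on a tiny 3-state chain. The transition format mirrors gymnasium's `P[s][a]` convention; the toy environment and function names are assumptions for illustration, not the notebook's code:

```python
# Value-iteration sketch on a 3-state chain (illustrative only -- the
# notebook runs the algorithm on all 7 environments, not this toy chain).
# Transition format mirrors gymnasium: P[s][a] = [(prob, next_s, reward, done)]

GAMMA = 0.9
THETA = 1e-8  # stop when the largest value change in a sweep is below this

P = {
    0: {0: [(1.0, 0, 0.0, False)], 1: [(1.0, 1, 0.0, False)]},
    1: {0: [(1.0, 0, 0.0, False)], 1: [(1.0, 2, 1.0, True)]},
    2: {0: [(1.0, 2, 0.0, True)],  1: [(1.0, 2, 0.0, True)]},  # terminal
}

def backup(P, V, s, a, gamma):
    """Expected return of taking a in s, then following V."""
    return sum(p * (r + (0.0 if done else gamma * V[s2]))
               for p, s2, r, done in P[s][a])

def value_iteration(P, gamma=GAMMA, theta=THETA):
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:  # in-place sweep with the Bellman optimality update
            best = max(backup(P, V, s, a, gamma) for a in P[s])
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < theta:
            break
    # Greedy policy extracted from the converged values
    pi = {s: max(P[s], key=lambda a: backup(P, V, s, a, gamma)) for s in P}
    return V, pi

V, pi = value_iteration(P)
print(V, pi)  # V[0] = 0.9, V[1] = 1.0; pi picks action 1 in states 0 and 1
```

Policy iteration differs only in the inner loop: it evaluates a fixed policy to convergence before each improvement step, instead of taking the max inside every sweep.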
- `gymnasium` – RL environments (FrozenLake, etc.)
- `numpy` – numerical computation
- `matplotlib` – plotting and visualization
- `seaborn` – heatmaps for value functions
- `pandas` – data tables
Quick install:

```bash
pip install gymnasium numpy matplotlib seaborn pandas
```
💡 Google Colab users: All libraries are pre-installed. Just click "Open in Colab" and run – no setup needed!

💡 VS Code / Jupyter users: Run the install command above once, then you're good to go.
Three modes: 🔍 Explore (click cells → see S, A, R, P), π Policy (click to change arrows), V Value (heatmap). Features: deterministic/stochastic toggle, editable grid, robot episodes, Q-value inspector.
Step through PI on FrozenLake: ▶ One Eval Sweep (cells light up blue), ⚡ Improve (arrows flash green), 🤖 Run Robot (animated walk). Speed control, γ and slip sliders.
```
rl-book-labs/
├── README.md
├── Ch2_MDP_Environments_Lab.ipynb    # Notebook: 7 MDP environments
├── Ch3_Dynamic_Programming_Lab.ipynb # Notebook: PI, VI, convergence
├── ch2/
│   └── index.html                    # Web App: MDP Explorer
├── ch3/
│   └── index.html                    # Web App: Policy Iteration
├── ch4/                              # (coming soon)
└── ch5/                              # (coming soon)
```
Complete Reinforcement Learning Journey: From Basics to RLHF
The only book that takes you from "What is a Markov Decision Process?" all the way to "How do we align language models with human values?" – with intuition, math, code, and interactive labs at every step.
Found a bug? Have an idea for a new visualization? Contributions welcome!
To contribute, create a feature branch (`git checkout -b feature/new-lab`) and open a pull request.

MIT License – free to use, modify, and distribute.
Built with ❤️ as a companion to the book.
"The best way to learn an algorithm is to watch it think."