rl-book-labs

🎮 RL Book Interactive Labs

Companion interactive apps & notebooks for Complete Reinforcement Learning Journey: From Basics to RLHF

Don't just read about algorithms - watch them think.

License: MIT | GitHub Pages


🧪 What Is This?

Each chapter has two companions: an interactive web app you can explore in the browser, and a Colab notebook where you build everything in code.


📚 Available Labs

| Chapter | Web App | Colab Notebook | Concepts |
| --- | --- | --- | --- |
| Ch 2: MDPs | ▶ MDP Explorer | Open In Colab | States, Actions, Rewards, Transitions, Deterministic vs Stochastic |
| Ch 3: DP | ▶ Policy Iteration | Open In Colab | Policy Evaluation, Policy Iteration, Value Iteration, Convergence |
| Ch 4 | coming soon | coming soon | Monte Carlo Methods |
| Ch 5 | coming soon | coming soon | TD Learning, SARSA, Q-Learning |
| Ch 6 | coming soon | coming soon | Deep RL, DQN |
| Ch 7 | coming soon | coming soon | Policy Gradients, RLHF |

📓 Colab Notebooks

💻 How to Run the Notebooks

The notebooks can be run in three ways - pick whichever you prefer:

| Platform | How to Open | Setup Required |
| --- | --- | --- |
| Google Colab | Click the "Open in Colab" badge above - runs instantly in your browser | None - everything pre-installed |
| VS Code | Clone this repo → open the .ipynb file → select a Python kernel → run cells | Install Python + libraries (see below) |
| Jupyter Notebook / JupyterLab | Clone this repo → run `jupyter notebook` → open the .ipynb file | Install Python + libraries (see below) |

Running in VS Code

  1. Install the Jupyter extension for VS Code
  2. Clone this repo:
    git clone https://github.com/mlnjsh/rl-book-labs.git
    cd rl-book-labs
    
  3. Install dependencies:
    pip install gymnasium numpy matplotlib seaborn pandas
    
  4. Open any .ipynb file in VS Code → click "Select Kernel" → choose your Python environment → run cells with Shift+Enter

Running in Jupyter Notebook

  1. Clone and install (same as above)
  2. Launch Jupyter:
    jupyter notebook
    
  3. Navigate to the .ipynb file and open it

Ch2: MDP Environments Lab

Open In Colab

Build 7 MDP environments from scratch in Python:

| # | Environment | States | Actions | Key Lesson |
| --- | --- | --- | --- | --- |
| 1 | GridWorld 5×5 | 22 cells | ←↓→↑ | Walls, goal, pit, spatial navigation |
| 2 | FrozenLake 4×4 | 16 cells | ←↓→↑ | Slippery ice, holes |
| 3 | Traffic Light | 6 states | keep/switch | Real-world control |
| 4 | Thermostat | 3 states | heat/cool/off | Energy vs comfort tradeoff |
| 5 | Contextual Bandit | 3 contexts | machine A/B/C | Context-dependent rewards |
| 6 | Inventory Management | 5 levels | order 0/1/2 | Supply chain, stockouts |
| 7 | Robot Rooms | 4 rooms | go/stay | Locked doors, path planning |

What you'll do: Inspect transition tables, compute Q-values, visualize value heatmaps, compare deterministic vs stochastic dynamics, and experiment with the discount factor γ.
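To give a flavor of what "inspect transition tables and compute Q-values" means, here is a minimal illustrative sketch (not the notebook's actual code) of a tiny tabular MDP whose transition table follows the Gymnasium convention `P[s][a] = [(prob, next_state, reward, done), ...]`, plus a one-step Q-value lookup:

```python
import numpy as np

# A tiny 2-state MDP, Gymnasium-style: P[s][a] = [(prob, next_state, reward, done)]
P = {
    0: {0: [(1.0, 0, 0.0, False)],   # action 0 in state 0: stay put, no reward
        1: [(1.0, 1, 1.0, True)]},   # action 1 in state 0: reach the goal, +1
    1: {0: [(1.0, 1, 0.0, True)],    # state 1 is terminal
        1: [(1.0, 1, 0.0, True)]},
}
gamma = 0.9  # discount factor

def q_value(s, a, V):
    """One-step lookahead: Q(s,a) = sum over outcomes of p * (r + gamma * V(s'))."""
    return sum(p * (r + gamma * V[s2] * (not done))
               for p, s2, r, done in P[s][a])

V = np.zeros(2)
print(q_value(0, 1, V))  # 1.0: the immediate reward for reaching the goal
```

The same `q_value` helper works unchanged for stochastic dynamics - a slippery action simply lists several `(prob, next_state, ...)` outcomes.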

Ch3: Dynamic Programming Lab

Open In Colab

Implement 3 core DP algorithms and run them on all 7 environments:

| Algorithm | What It Does |
| --- | --- |
| Policy Evaluation | Compute V^π sweep by sweep - watch the values converge |
| Policy Iteration | Evaluate → Improve loop until π* is found |
| Value Iteration | Single Bellman max update - faster convergence |

What you'll do: Animated convergence plots, a PI vs VI comparison table, the effect of γ on convergence speed, and a stochastic-policy comparison (5 slip values side by side).
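All three algorithms revolve around the same Bellman backup. As a rough sketch (illustrative names, not the notebook's code), value iteration over a Gymnasium-style transition table `P[s][a] = [(prob, next_state, reward, done), ...]` looks like this:

```python
import numpy as np

def value_iteration(P, n_states, n_actions, gamma=0.99, tol=1e-8):
    """Repeated Bellman max updates until V stops changing."""
    V = np.zeros(n_states)
    while True:
        # Q[s, a] = expected one-step return under the current V
        Q = np.array([[sum(p * (r + gamma * V[s2] * (not done))
                           for p, s2, r, done in P[s][a])
                       for a in range(n_actions)]
                      for s in range(n_states)])
        V_new = Q.max(axis=1)          # Bellman optimality: max over actions
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=1)  # optimal values + greedy policy
        V = V_new

# Two-state example: action 1 in state 0 reaches a terminal goal for +1
P = {0: {0: [(1.0, 0, 0.0, False)], 1: [(1.0, 1, 1.0, True)]},
     1: {0: [(1.0, 1, 0.0, True)], 1: [(1.0, 1, 0.0, True)]}}
V, pi = value_iteration(P, n_states=2, n_actions=2)
print(V, pi)  # V[0] == 1.0 and pi[0] == 1 (take the goal action)
```

Policy evaluation is the same loop with the `max` replaced by an average under a fixed policy, which is exactly the PI-vs-VI contrast the lab visualizes.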


📦 Required Libraries

gymnasium    - RL environments (FrozenLake, etc.)
numpy        - numerical computation
matplotlib   - plotting and visualization
seaborn      - heatmaps for value functions
pandas       - data tables

Quick install:

pip install gymnasium numpy matplotlib seaborn pandas

💡 Google Colab users: All libraries are pre-installed. Just click "Open in Colab" and run - no setup needed!

💡 VS Code / Jupyter users: Run the install command above once, then you're good to go.


๐ŸŒ Interactive Web Apps

Ch2: MDP Explorer

▶ Launch App

Three modes: 🔍 Explore (click cells → see S, A, R, P), π Policy (click to change arrows), V Value (heatmap). Features: deterministic/stochastic toggle, editable grid, robot episodes, Q-value inspector.

Ch3: Policy Iteration Visualizer

▶ Launch App

Step through PI on FrozenLake: ① One Eval Sweep (cells light up blue), ② Improve (arrows flash green), 🤖 Run Robot (animated walk). Speed control, plus γ and slip sliders.


๐Ÿ—๏ธ Project Structure

rl-book-labs/
├── README.md
├── Ch2_MDP_Environments_Lab.ipynb      # 📓 Notebook: 7 MDP environments
├── Ch3_Dynamic_Programming_Lab.ipynb   # 📓 Notebook: PI, VI, convergence
├── ch2/
│   └── index.html                      # 🌐 Web App: MDP Explorer
├── ch3/
│   └── index.html                      # 🌐 Web App: Policy Iteration
├── ch4/                                # (coming soon)
└── ch5/                                # (coming soon)

🎓 About the Book

Complete Reinforcement Learning Journey: From Basics to RLHF

The only book that takes you from "What is a Markov Decision Process?" all the way to "How do we align language models with human values?", with intuition, math, code, and interactive labs at every step.

Key Features


๐Ÿค Contributing

Found a bug? Have an idea for a new visualization? Contributions welcome!

  1. Fork the repo
  2. Create a branch (git checkout -b feature/new-lab)
  3. Commit your changes
  4. Open a Pull Request

📄 License

MIT License - free to use, modify, and distribute.


Built with ❤️ as a companion to the book.
"The best way to learn an algorithm is to watch it think."