EC418 Final Project: DQN for Pac-Man with Tabular Q-Learning Targets and a ResNet
This project goes beyond achieving performance improvements in a reinforcement learning benchmark. Its primary contribution is demonstrating how reproducible experimentation, systematic ablation, and curriculum design can dramatically affect stability and learning outcomes in deep RL systems. Rather than taking a results-oriented approach, it demonstrates a strong process-oriented understanding.
Rather than treating training instability as a hyperparameter issue, this work investigates why and how different architectural and algorithmic choices succeed or fail under sparse rewards and complex dynamics.
The methods used in this project generalize to real-world ML systems where instability, overfitting, and misleading evaluation metrics can derail otherwise promising approaches.
- Simple DQN utilizing a single CNN
FINAL RESULTS: Wins: 2/10 (20.0%) Layout: spiral_harder Model device: cpu
- ResNet DQN: allows a deeper model that trains more efficiently with fewer parameters. A projection on the skip connection is needed only when the number of channels or the resolution changes. This promotes better feature reuse and lets the network learn "deeper" semantic features of gameplay, improving decision making.
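A minimal sketch of the residual block described above, assuming a standard PyTorch implementation (the class name and layer sizes are illustrative, not taken from the repo):

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Illustrative residual block: the skip path uses a 1x1 convolution
    only when the channel count or spatial resolution changes, so the
    shortcut stays parameter-free everywhere else."""
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_ch)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_ch)
        # Projection shortcut only when the shape changes (channels or resolution).
        if stride != 1 or in_ch != out_ch:
            self.skip = nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 1, stride=stride, bias=False),
                nn.BatchNorm2d(out_ch),
            )
        else:
            self.skip = nn.Identity()

    def forward(self, x):
        out = torch.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return torch.relu(out + self.skip(x))
```

The 1x1 projection fires only on shape changes, so most shortcuts add no parameters, which is where the parameter savings come from.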
FINAL RESULTS (simplified ResNet): Wins: 1/10 (10.0%) Layout: spiral_harder Model device: cuda
- Uses tabular Q-learning as a target to initialize the ResNet's weights, allowing better convergence. Additionally, the learning rate is annealed with a cosine scheduler.
python train_dqn_with_tabular.py --layout classic --episodes 25 --epochs 10 --batch_size 64
FINAL RESULTS: Wins: 3/10 (30.0%) Layout: spiral_harder Model device: cuda
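A hedged sketch of the two ingredients above; `tabular_q_update` and `cosine_lr` are hypothetical names illustrating a tabular backup used as a pretraining target and a cosine learning-rate schedule, not functions from the actual script:

```python
import math
import numpy as np

def tabular_q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """One tabular Q-learning backup; the resulting table can later serve
    as a regression target when pretraining the ResNet."""
    td_target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (td_target - Q[s, a])
    return Q

def cosine_lr(step, total_steps, lr_max=1e-3, lr_min=1e-5):
    """Cosine-annealed learning rate: lr_max at step 0, lr_min at the end."""
    cos = 0.5 * (1.0 + math.cos(math.pi * step / total_steps))
    return lr_min + (lr_max - lr_min) * cos
```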
- The original version was trained only on the classic layout. Training over a randomized selection of layouts should increase robustness in model initialization and prevent overfitting, improving the generalized ResNet DQN's performance.
python train_dqn_with_tabular.py --episodes 25 --epochs 10 --batch_size 64
FINAL RESULTS: Wins: 0/10 (0.0%) Layout: spiral_harder Model device: cuda
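Drawing a fresh layout each episode might look like the following sketch; the layout pool is assumed from the names used elsewhere in this log:

```python
import random

# Assumed layout pool (names taken from this log, not verified against the repo).
LAYOUTS = ["classic", "spiral", "spiral_harder", "empty"]

def sample_layout(rng=random):
    """Draw a fresh layout each episode so the agent cannot overfit
    to a single maze geometry."""
    return rng.choice(LAYOUTS)
```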
- Pretraining with classic gave better results, but I believe that staging a brief multi-layout training phase afterward will increase robustness and give better results still. Non-staged multi-layout training seemed too broad: on average Pac-Man would survive many more steps but not win.
python train_dqn_tabular_staged.py --fast
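The proposed staging could be sketched like this; the stage length and layout pool are assumptions for illustration, not values from the actual script:

```python
def staged_layouts(episode, stage1_len=20,
                   pool=("classic", "spiral", "spiral_harder", "empty")):
    """Stage 1: train only on 'classic' to build a solid policy.
    Stage 2: a brief multi-layout phase afterward for robustness."""
    if episode < stage1_len:
        return "classic"
    # After the classic stage, cycle through the full pool.
    return pool[episode % len(pool)]
```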
- To make sure we have a solid baseline, I ran:
python train_dqn_with_tabular_s1.py --layout classic --episodes 25 --epochs 10 --batch_size 64
FINAL RESULTS: Wins: 66/100 (66.0%) Layout: spiral_harder Model device: cuda
- Now for the staged version:
python train_dqn_tabular_staged.py --fast
FINAL RESULTS: Wins: 58/100 (58.0%) Layout: spiral_harder Model device: cuda
Win rate by evaluation layout (the table was left incomplete; only the staged result on spiral_harder was recorded):

| pretraining \ layout | classic | spiral | spiral_harder | empty |
|---|---|---|---|---|
| staged | | | 58 | |
| classic | | | | |
| spiral | | | | |
| spiral_harder | | | | |
| empty | | | | |
For fast-paced progress we test on spiral_harder, as it is quick to train. Once better results are reached, we fine-tune on classic.
python spiral_curr_train.py --episodes 50 --epochs 10 --batch_size 32
FINAL RESULTS: Wins: 20/50 (40.0%) Layout: spiral_harder Model device: cuda
To correct for overestimation of Q-values, and to increase wins by keeping the model from becoming overconfident:
FINAL RESULTS: Wins: 27/50 (54.0%) Layout: spiral_harder Model device: cuda
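The standard remedy for Q-value overestimation is to decouple action selection (online network) from action evaluation (target network), as in Double DQN. Assuming that is the change made here, a minimal target computation in NumPy:

```python
import numpy as np

def double_dqn_targets(q_online_next, q_target_next, rewards, dones, gamma=0.99):
    """Double-DQN-style targets (assumed technique, not confirmed by the log).
    q_online_next, q_target_next: (batch, n_actions) Q-values for next states."""
    # Online network picks the greedy action in the next state...
    best_actions = np.argmax(q_online_next, axis=1)
    # ...but the target network supplies the value estimate for it.
    next_values = q_target_next[np.arange(len(best_actions)), best_actions]
    return rewards + gamma * (1.0 - dones) * next_values
```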
Update theta to match the distribution of actual play:
FINAL RESULTS: Wins: 0/50 (0.0%) Layout: spiral_harder Model device: cuda
Expand the distributional hyperparameters:
FINAL RESULTS: Wins: 59/100 (59.0%) Layout: spiral_harder Model device: cuda
FINAL RESULTS: Wins: 9/100 (9.0%) Layout: spiral_harder Model device: cuda
FINAL RESULTS: Wins: 32/100 (32.0%) Layout: spiral_harder Model device: cuda
I assume the model must be overfitting, and the replay memory cap is still small.
Drop a ResNet block and boost the replay memory cap from 20,000 to 200,000.
FINAL RESULTS: Wins: 17/100 (17.0%) Layout: spiral_harder Model device: cuda