This repository provides code implementations for popular Reinforcement Learning algorithms.
The main idea is to generalise the main RL algorithms and provide a unified interface for testing them on any gym environment.
For example, you can now create your own Double Dueling Deep Recurrent Q-Learning agent (let's call it 3DRQ).
For simplicity, all the main agent building blocks are in the `agents` folder.
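The actual block classes in the `agents` folder may be named differently; the following is only an illustrative sketch (all class names are hypothetical) of how such extensions can be stacked on a base agent with cooperative Python mixins:

```python
class DQNAgent:
    """Hypothetical base agent."""
    def describe(self):
        return ["deep Q-network"]

class DoubleQMixin:
    """Hypothetical extension: double Q-learning target."""
    def describe(self):
        return ["double Q-learning"] + super().describe()

class DuelingMixin:
    """Hypothetical extension: dueling value/advantage heads."""
    def describe(self):
        return ["dueling heads"] + super().describe()

class RecurrentMixin:
    """Hypothetical extension: recurrent state encoder."""
    def describe(self):
        return ["recurrent encoder"] + super().describe()

class DDDRQAgent(DoubleQMixin, DuelingMixin, RecurrentMixin, DQNAgent):
    """The '3DRQ' agent: all three extensions composed via the MRO."""
    pass

print(DDDRQAgent().describe())
```

Each mixin's `super().describe()` call walks the method resolution order, so all three extensions compose without any mixin knowing about the others.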
For now, the repository is undergoing after-course refactoring, so much of the documentation is still missing.
All code is written in Python 3 and uses RL environments from OpenAI Gym. The deep RL agents use TensorFlow for their neural network implementations.
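All agents interact with environments through the standard Gym interface (`reset`/`step`). A minimal sketch of that loop, using a hypothetical toy stand-in environment rather than a real Gym one, so it runs with no dependencies:

```python
import random

class ToyEnv:
    """Stand-in for a gym.Env: +1 reward per step, episode ends after 10 steps."""
    def reset(self):
        self.t = 0
        return 0  # initial observation
    def step(self, action):
        self.t += 1
        obs, reward, done = self.t, 1.0, self.t >= 10
        return obs, reward, done, {}

def run_episode(env, policy):
    """Generic rollout loop, the same shape as for any Gym environment."""
    obs, done, total_reward = env.reset(), False, 0.0
    while not done:
        obs, reward, done, _ = env.step(policy(obs))
        total_reward += reward
    return total_reward

print(run_episode(ToyEnv(), policy=lambda obs: random.choice([0, 1])))  # 10.0
```

Swapping `ToyEnv()` for `gym.make("CartPole-v0")` gives the same loop the agents in this repo run against.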
- Berkeley CS188x
- David Silver's Reinforcement Learning Course
- dennybritz/reinforcement-learning
- yandexdataschool/Practical_RL
- yandexdataschool/AgentNet
Additional thanks to JustHeuristic for the Practical_RL course.
- Genetic algorithm
- Dynamic Programming
- Cross Entropy Method
- Monte Carlo Control
- Temporal Difference
- Deep Q-Networks
- Policy Gradient
- Asynchronous Advantage Actor-Critic
- Optimality Tightening [TODO]
- Trust Region Policy Optimization [TODO]
- Continuous action space [TODO]
- Monte Carlo Tree Search [TODO]
For more information, see the readme in each algorithm's folder.
To run the scripts, you need to install an additional repository with neural network optimization utilities:
pip install git+https://github.com/Scitator/rstools
DQN:
PYTHONPATH=. python DQN/run_dqn.py --plot_history --env CartPole-v0 \
--feature_network linear --layers 128-128 --hidden_size 64 \
--n_epochs 1000 --n_games 4 --batch_size 128 --t_max 500 --episode_limit 500 \
--replay_buffer simple --replay_buffer_size 2000 \
--qvalue_lr 0.0001 --feature_lr 0.0001 --value_lr 0.0001 \
--initial_epsilon 0.8 --final_epsilon 0.1 \
--gpu_option 0.25 \
--api_key <paste_your_gym_api_key_here>
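The `--initial_epsilon`/`--final_epsilon` flags above control the exploration schedule. Assuming a linear anneal over training (the repo's actual schedule may differ), it behaves roughly like this sketch:

```python
def epsilon_at(epoch, n_epochs, initial_epsilon=0.8, final_epsilon=0.1):
    """Linearly anneal the exploration rate from initial to final epsilon.
    Sketch only: the repository's actual schedule may differ."""
    frac = min(epoch / max(n_epochs - 1, 1), 1.0)
    return initial_epsilon + frac * (final_epsilon - initial_epsilon)

print(epsilon_at(0, 1000))    # 0.8  (mostly random actions early on)
print(epsilon_at(999, 1000))  # ~0.1 (mostly greedy actions at the end)
```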
Reinforce:
PYTHONPATH=. python PG/run_reinforce.py --plot_history --env CartPole-v0 \
--feature_network linear --layers 128-128 --hidden_size 64 \
--n_epochs 10000 --n_games 1 --batch_size 1 --t_max 500 --episode_limit 500 \
--entropy_factor 0.005 --policy_lr 0.0000001 --feature_lr 0.0000001 --grad_clip 10.0 \
--gpu_option 0.25 --time_major \
--api_key <paste_your_gym_api_key_here>
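REINFORCE weights each action's log-probability gradient by the discounted return from that step onward. A minimal sketch of the return computation (the gamma value here is illustrative, not the repo's default):

```python
def discounted_returns(rewards, gamma=0.99):
    """G_t = r_t + gamma * G_{t+1}, computed backwards over one episode."""
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    return returns[::-1]

print(discounted_returns([1.0, 1.0, 1.0], gamma=0.5))  # [1.75, 1.5, 1.0]
```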
Feed-Forward Asynchronous Advantage Actor-Critic:
PYTHONPATH=. python A3C/run_a3c.py --plot_history --env CartPole-v0 \
--feature_network linear --layers 128-128 --hidden_size 64 \
--n_epochs 500 --n_games 1 --batch_size 1 --t_max 100 --episode_limit 500 \
--entropy_factor 0.005 --policy_lr 0.00001 --feature_lr 0.00001 --value_lr 0.00001 --grad_clip 10.0 \
--gpu_option 0.25 --time_major \
--api_key <paste_your_gym_api_key_here>
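A3C's actor update uses the advantage (n-step return minus the critic's value baseline) rather than the raw return. A minimal sketch of that computation (illustrative, not the repo's code):

```python
def n_step_advantages(rewards, values, bootstrap_value, gamma=0.99):
    """A_t = (discounted reward sum from t, bootstrapped with V(s_T)) - V(s_t)."""
    g = bootstrap_value  # critic's estimate of the value after the rollout
    advantages = []
    for r, v in zip(reversed(rewards), reversed(values)):
        g = r + gamma * g
        advantages.append(g - v)
    return advantages[::-1]

print(n_step_advantages([1.0, 1.0], [0.5, 0.5], bootstrap_value=0.0, gamma=1.0))
# [1.5, 0.5]
```

Bootstrapping with the critic's value is what lets A3C learn from short `t_max`-step rollouts instead of full episodes.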
If the agent starts to play well, you can always stop training with the Ctrl+C hotkey.
If something goes wrong, you can always evaluate the agent through the magic `--load --n_epochs 0` flag combination.
- loss - the typical neural network training loss
- reward - the typical environment reward; because an Environment Pool is always used, it is not very informative for now
- steps - the mean number of episode ends per epoch session
You need to reinstall the NVIDIA drivers and add `bash xvfb start; DISPLAY=:1` before the run command.
Found a bug, or know how to write something more simply? Or maybe you want to create your own agent? Just follow PEP8 and make a merge request.
We have a lot of RL algorithms, and even more gym environments to test them on. So, play a game and save:
- the agent parameters (so anyone can reproduce the result)
- the agent itself (`model.ckpt*`)
- the plots (they will be generated automatically with the `--plot_history` flag)
- the gym link (main results)
- and make a merge request (solutions should go in `field/solutions.md`, for example `DQN/solutions.md`)