Rainy 0.8.0

Description:

rainy 0.8.0

Rainy

Reinforcement learning utilities and algrithm implementations using PyTorch.
Example
Rainy has a main decorator which converts a function that returns rainy.Config
to a CLI app.
All function arguments are re-interpreted as command line arguments.
import os

from torch.optim import RMSprop

import rainy
from rainy import Config, net
from rainy.agents import DQNAgent
from rainy.envs import Atari
from rainy.lib.explore import EpsGreedy, LinearCooler

@rainy.main(DQNAgent, script_path=os.path.realpath(__file__))
def main(
envname: str = "Breakout",
max_steps: int = int(2e7),
replay_size: int = int(1e6),
replay_batch_size: int = 32,
) -> Config:
c = Config()
c.set_env(lambda: Atari(envname))
c.set_optimizer(
lambda params: RMSprop(params, lr=0.00025, alpha=0.95, eps=0.01, centered=True)
)
c.set_explorer(lambda: EpsGreedy(1.0, LinearCooler(1.0, 0.1, int(1e6))))
c.set_net_fn("dqn", net.value.dqn_conv())
c.replay_size = replay_size
c.replay_batch_size = replay_batch_size
c.train_start = 50000
c.sync_freq = 10000
c.max_steps = max_steps
c.eval_env = Atari(envname)
c.eval_freq = None
return c

if __name__ == "__main__":
main()

Then you can use this script like
python dqn.py --replay-batch-size=64 train --eval-render

See examples directory for more.
API documentation
COMING SOON
Supported python version
Python >= 3.6.1
Implementation Status

Algorithm
Multi Worker(Sync)
Recurrent
Discrete Action
Continuous Action
MPI support

DQN/Double DQN
:heavy_check_mark:
:x:
:heavy_check_mark:
:x:
:x:

BootDQN/RPF
:x:
:x:
:heavy_check_mark:
:x:
:x:

DDPG
:heavy_check_mark:
:x:
:x:
:heavy_check_mark:
:x:

TD3
:heavy_check_mark:
:x:
:x:
:heavy_check_mark:
:x:

SAC
:heavy_check_mark:
:x:
:x:
:heavy_check_mark:
:x:

PPO
:heavy_check_mark:
:heavy_check_mark:
:heavy_check_mark:
:heavy_check_mark:
:heavy_check_mark:

A2C
:heavy_check_mark:
:small_red_triangle:(1)
:heavy_check_mark:
:heavy_check_mark:
:x:

ACKTR
:heavy_check_mark:
:x:(2)
:heavy_check_mark:
:heavy_check_mark:
:x:

AOC
:heavy_check_mark:
:x:
:heavy_check_mark:
:heavy_check_mark:
:x:

PPOC
:heavy_check_mark:
:x:
:heavy_check_mark:
:heavy_check_mark:
:x:

ACTC(3)
:heavy_check_mark:
:x:
:heavy_check_mark:
:heavy_check_mark:
:x:

(1): Very unstable
(2): Needs https://openreview.net/forum?id=HyMTkQZAb implemented
(3): Incomplete implementation. β is often too high.
Sub packages

intrinsic-rewards

Contains an implementation of RND(Random Network Distillation)

References
DQN (Deep Q Network)

https://www.nature.com/articles/nature14236/

DDQN (Double DQN)

https://www.aaai.org/ocs/index.php/AAAI/AAAI16/paper/viewPaper/12389

Bootstrapped DQN

https://arxiv.org/abs/1602.04621

RPF(Randomized Prior Functions)

https://arxiv.org/abs/1806.03335

DDPQ(Deep Deterministic Policy Gradient)

https://arxiv.org/abs/1509.02971

TD3(Twin Delayed Deep Deterministic Policy Gradient)

https://arxiv.org/abs/1802.09477

SAC(Soft Actor Critic)

https://arxiv.org/abs/1812.05905

A2C (Advantage Actor Critic)

http://proceedings.mlr.press/v48/mniha16.pdf , https://arxiv.org/abs/1602.01783 (A3C, original version)
https://blog.openai.com/baselines-acktr-a2c/ (A2C, synchronized version)

ACKTR (Actor Critic using Kronecker-Factored Trust Region)

https://papers.nips.cc/paper/7112-scalable-trust-region-method-for-deep-reinforcement-learning-using-kronecker-factored-approximation

PPO (Proximal Policy Optimization)

https://arxiv.org/abs/1707.06347

AOC (Advantage Option Critic)

https://arxiv.org/abs/1609.05140 (DQN-like option critic)
https://arxiv.org/abs/1709.04571 (A3C-like option critic called A2OC)

PPOC (Proximal Option Critic)

https://arxiv.org/abs/1712.00004

ACTC (Actor Critic Termination Critic)

http://proceedings.mlr.press/v89/harutyunyan19a.html

Implementaions I referenced
Thank you!
https://github.com/openai/baselines
https://github.com/ikostrikov/pytorch-a2c-ppo-acktr
https://github.com/ShangtongZhang/DeepRL
https://github.com/chainer/chainerrl
https://github.com/Thrandis/EKFAC-pytorch (for ACKTR)
https://github.com/jeanharb/a2oc_delib (for AOC)
https://github.com/mklissa/PPOC (for PPOC)
https://github.com/sfujim/TD3 (for DDPG and TD3)
https://github.com/vitchyr/rlkit (for SAC)
License
This project is licensed under Apache License, Version 2.0
(LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0).

Overview

For personal and professional use. You cannot resell or redistribute these repositories in their original state.

You're allowed to use the code bits in the repositories in unlimited projects.
Attribution is not required to use the code bits.

What you can do with it

Use them freely in your personal and professional work.

What you can't do with it

Don't be greedy. Selling or distributing these repositories in their original state is prohibited.

zed

Languages

Categories

Description:

License:

Share

Overview

What you can do with it

What you can't do with it

Related Products

Views For YouTube Bot writed on Python

AI-Web-Scraper

quivr

roop

More From This Creator

airflow-plugins 0.1.3

airflow-fs 0.1.0

airflowdaggenerator 0.0.2

airflow-dag-artifact 0.2.1

airflow-bootstrap-utils 0.2.0

rainy 0.8.0

Languages

Categories

Description:

License:

Share

Customer Reviews

License

Overview

What you can do with it

What you can't do with it

Related Products

Views For YouTube Bot writed on Python

AI-Web-Scraper

quivr

roop

zed

More From This Creator

airflow-plugins 0.1.3

airflow-fs 0.1.0

airflowdaggenerator 0.0.2

airflow-dag-artifact 0.2.1

airflow-bootstrap-utils 0.2.0