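imitation_learning: PyTorch implementations of some reinforcement learning algorithms: Advantage Actor-Critic (A2C), Proximal Policy Optimization (PPO), V-MPO, Behavior Cloning (BC). More algorithms will be added (source code)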

Uploader: 42128015 | Upload time: 2021-02-02 16:36:47 | File size: 11.42MB | File type: ZIP
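Imitation learning

This repo contains simple PyTorch implementations of some reinforcement learning algorithms:

  • Advantage Actor-Critic (A2C): a synchronous variant of A3C
  • Proximal Policy Optimization (PPO): the most popular RL algorithm
  • On-policy Maximum a Posteriori Policy Optimization (V-MPO): the algorithm DeepMind used in its recent work (doesn't work yet...)
  • Behavior Cloning (BC): a simple technique for cloning some expert behavior into a new policy

Each algorithm supports vector/image/dict observation spaces and discrete/continuous action spaces.

Why is the repo called "imitation learning"?

When I started this project and created the repo, I thought imitation learning would be my main focus, and that model-free methods would only be used at the beginning to train "experts". However, the PPO implementation (and its tricks) took more time than I expected. As a result, most of the code is now related to PPO, but I am still interested in imitation learning and plan to add some related algorithms.

Current features

Currently this repo contains implementations of some model-free on-policy algorithms: A2C, PPO, V-MPO and BC. Each algorithm supports discrete (Categorical, Bernoulli, GumbelSoftmax) and continuous (Beta, Normal, tanh(Normal)) policy distributions, and vector or image observation environments. Beta and tanh(Normal) work best in my experiments (in BipedalWalker and Huma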
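To illustrate the tanh(Normal) policy distribution mentioned above, here is a minimal sketch in plain Python. It is not the repo's PyTorch code: the function names are hypothetical, and it handles a single scalar action only. It assumes the standard construction, where a Gaussian sample u is squashed to a = tanh(u) in (-1, 1) and the log-density is corrected with the change-of-variables term log(1 - tanh(u)^2).

```python
import math
import random


def normal_log_prob(u, mean, std):
    """Log-density of a univariate Normal(mean, std) evaluated at u."""
    var = std * std
    return (-((u - mean) ** 2) / (2.0 * var)
            - math.log(std)
            - 0.5 * math.log(2.0 * math.pi))


def tanh_normal_sample(mean, std, rng=random):
    """Draw u ~ Normal(mean, std) and squash it with tanh.

    Returns (action, pre_tanh_value); keeping u avoids an unstable atanh later.
    """
    u = rng.gauss(mean, std)
    return math.tanh(u), u


def tanh_normal_log_prob(u, mean, std):
    """Log-density of a = tanh(u) under the squashed Gaussian.

    log p(a) = log N(u; mean, std) - log(1 - tanh(u)^2)
    The correction uses the numerically stable identity
    log(1 - tanh(u)^2) = 2 * (log 2 - u - softplus(-2u)).
    """
    # softplus(x) = max(x, 0) + log1p(exp(-|x|)), evaluated at x = -2u
    softplus = max(-2.0 * u, 0.0) + math.log1p(math.exp(-2.0 * abs(u)))
    log_det = 2.0 * (math.log(2.0) - u - softplus)
    return normal_log_prob(u, mean, std) - log_det


# Usage: sample a bounded action and score it under the same policy.
action, pre_tanh = tanh_normal_sample(mean=0.0, std=1.0)
log_prob = tanh_normal_log_prob(pre_tanh, mean=0.0, std=1.0)
```

Passing the pre-tanh value u to the log-probability function, instead of recovering it from the action with atanh, is a common design choice because atanh loses precision when the action is close to -1 or 1.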

Resource details

imitation_learning-master (41 files, 11.42MB)
    utils/
        vec_env.py              11.44KB
        batch_crop.py           773B
        utils.py                816B
        __init__.py             0B
        env_wrappers.py         8.27KB
    algorithms/
        agents/
            v_mpo.py            7.77KB
            bc.py               3.01KB
            ppo.py              7.62KB
            agent_train.py      7.81KB
            a2c.py              2.69KB
        kl_divergence.py        1.56KB
        real_nvp.py             6.71KB
        nn/
            conv_encoders.py        3.25KB
            recurrent_encoders.py   2.48KB
            actor_critic.py         3.31KB
            agent_model.py          5.47KB
        normalization.py        2.82KB
        distributions.py        9.36KB
    test.py                     6.31KB
    requirements.txt            92B
    trainers/
        base_trainer.py         2.29KB
        rollout.py              7.51KB
        on_policy.py            9.09KB
        behavior_cloning.py     1.51KB
    train_scripts/
        bc/
            cart_pole_10_episodes.py    1.62KB
        ppo/
            bipedal_rnn.py          2.38KB
            car_racing.py           2.21KB
            cart_pole.py            1.69KB
            bipedal_hardcore.py     2.58KB
            bipedal.py              1.80KB
            humanoid.py             1.94KB
            cart_pole_rnn.py        2.16KB
        a2c/
            cart_pole.py            1.61KB
            cart_pole_rnn.py        2.13KB
    .gitignore                  90B
    gifs/
        cartpole.gif            84.82KB
        car_racing.gif          5.93MB
        humanoid.gif            3.67MB
        bipedal.gif             1.78MB
    custom_environments/
        mario_wrapper.py        1.42KB
    readme.md                   7.00KB
