armed-bandit
Solving n-armed bandit problems using different policies to find the path with the least regret. The policies used in this project are policy gradient and Thompson sampling. All environments and agents are implemented with the aid of the Amalearn library. This project was carried out as part of the Reinforcement Learning master's course offered at the University of Tehran under the supervision of Prof. Nili.
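
To make the two policies and the regret measure concrete, here is a minimal, self-contained sketch. It is not the project's code and does not use Amalearn, whose API is not described here. It assumes Bernoulli arms with known success probabilities (used only to compute regret), and the names `thompson_sampling`, `policy_gradient`, `probs`, and `lr` are hypothetical, chosen for illustration.

```python
import math
import random


def thompson_sampling(probs, steps, rng):
    """Bernoulli Thompson sampling with Beta(1, 1) priors.

    Returns the cumulative expected regret over `steps` pulls.
    `probs` holds the true success probability of each arm (an assumption
    of this sketch; a real agent never sees these values).
    """
    n = len(probs)
    alpha = [1.0] * n  # 1 + number of observed successes per arm
    beta = [1.0] * n   # 1 + number of observed failures per arm
    best = max(probs)
    regret = 0.0
    for _ in range(steps):
        # Draw one plausible mean per arm from its posterior, play the largest.
        samples = [rng.betavariate(alpha[a], beta[a]) for a in range(n)]
        arm = max(range(n), key=samples.__getitem__)
        reward = 1.0 if rng.random() < probs[arm] else 0.0
        alpha[arm] += reward
        beta[arm] += 1.0 - reward
        regret += best - probs[arm]
    return regret


def policy_gradient(probs, steps, rng, lr=0.1):
    """Softmax gradient-bandit policy with a running-average reward baseline.

    Returns the cumulative expected regret over `steps` pulls.
    """
    n = len(probs)
    prefs = [0.0] * n  # action preferences fed into the softmax
    baseline = 0.0     # running average of all rewards seen so far
    best = max(probs)
    regret = 0.0
    for t in range(1, steps + 1):
        # Numerically stable softmax over the preferences.
        top = max(prefs)
        exps = [math.exp(h - top) for h in prefs]
        total = sum(exps)
        pi = [e / total for e in exps]
        arm = rng.choices(range(n), weights=pi)[0]
        reward = 1.0 if rng.random() < probs[arm] else 0.0
        baseline += (reward - baseline) / t
        # Stochastic gradient ascent step on the expected reward.
        for a in range(n):
            indicator = 1.0 if a == arm else 0.0
            prefs[a] += lr * (reward - baseline) * (indicator - pi[a])
        regret += best - probs[arm]
    return regret


if __name__ == "__main__":
    arms = [0.2, 0.5, 0.8]  # hypothetical arm probabilities
    print("Thompson sampling regret:", thompson_sampling(arms, 2000, random.Random(0)))
    print("Policy gradient regret:  ", policy_gradient(arms, 2000, random.Random(0)))
```

Regret here is the sum, over all pulls, of the gap between the best arm's expected reward and the chosen arm's expected reward, so a lower value means the policy settled on good arms sooner. The learning rate and the choice of baseline are illustrative defaults, not tuned values.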