Robust-Asynchronous-Q-Learning-with-Markovian-Data
The first provably robust variant of asynchronous Q-learning that tolerates adversarially corrupted rewards. Our algorithm is distribution-agnostic and achieves near-optimal finite-time guarantees, up to a provably unavoidable corruption-dependent additive term.
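
For orientation, here is a minimal sketch of the setting described above. It shows the classical (non-robust) asynchronous Q-learning update on a single Markovian trajectory, not this repository's robust algorithm. The function name `async_q_learning`, the `corrupt` hook, and the toy two-state MDP are all assumptions made for illustration. Only the visited state-action entry is updated at each step, and the optional hook stands in for an adversary that replaces some observed rewards.

```python
import random


def async_q_learning(P, R, gamma, steps, lr, seed=0, corrupt=None):
    """Classical asynchronous Q-learning on one Markovian trajectory.

    P[s][a] is a list of next-state probabilities and R[s][a] a reward in [0, 1].
    `corrupt` is a hypothetical hook (reward, step) -> reward that models an
    adversary; it is not part of any published interface.
    """
    rng = random.Random(seed)
    n_states, n_actions = len(P), len(P[0])
    Q = [[0.0] * n_actions for _ in range(n_states)]
    s = 0
    for t in range(steps):
        a = rng.randrange(n_actions)  # uniform behaviour policy
        s_next = rng.choices(range(n_states), weights=P[s][a])[0]
        r = R[s][a]
        if corrupt is not None:
            r = corrupt(r, t)  # the learner only sees the corrupted reward
        target = r + gamma * max(Q[s_next])
        Q[s][a] += lr * (target - Q[s][a])  # update the visited entry only
        s = s_next  # continue along the same trajectory (no resets)
    return Q


# Toy MDP (assumed for illustration): 2 states, 2 actions, rewards in [0, 1].
P = [[[0.9, 0.1], [0.2, 0.8]],
     [[0.5, 0.5], [0.1, 0.9]]]
R = [[0.0, 1.0],
     [0.5, 0.2]]

Q_clean = async_q_learning(P, R, gamma=0.9, steps=5000, lr=0.1)
Q_corrupted = async_q_learning(
    P, R, gamma=0.9, steps=5000, lr=0.1,
    corrupt=lambda r, t: 100.0 if t % 10 == 0 else r,  # every 10th reward replaced
)
```

With clean rewards in [0, 1], every entry of `Q_clean` stays within [0, 1/(1 - gamma)]. In the corrupted run the plain update can leave that range, which is the failure mode a robust variant is meant to control.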