Abstract: Actor-critic methods such as Twin Delayed Deep Deterministic Policy Gradient (TD3) depend on basic noise-based exploration, which can result in suboptimal policy convergence. In this ...