Chinmaya Andukuri

Invariance of the Vanilla Policy Gradient to a State-Dependent Baseline

A handwritten proof, written to understand why the vanilla policy gradient does not change in expectation when a state-dependent baseline of our choice is subtracted. I used OpenAI's Spinning Up in Deep RL as a starting point.
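
The key step of the argument can be sketched as follows. This is only an outline under stated assumptions, not a transcription of the handwritten proof: it assumes a differentiable policy \pi_\theta(a_t \mid s_t) over a discrete action set (for continuous actions the sum becomes an integral) and a baseline b(s_t) that depends on the state only. The notation follows Spinning Up and may differ from the handwritten version.

```latex
% Assumptions (not from the original text): discrete actions, differentiable
% policy \pi_\theta, and a baseline b(s_t) that does not depend on a_t.
\begin{align*}
\mathbb{E}_{a_t \sim \pi_\theta(\cdot \mid s_t)}
  \left[ \nabla_\theta \log \pi_\theta(a_t \mid s_t)\, b(s_t) \right]
&= b(s_t) \sum_{a_t} \pi_\theta(a_t \mid s_t)\,
   \nabla_\theta \log \pi_\theta(a_t \mid s_t) \\
&= b(s_t) \sum_{a_t} \nabla_\theta \pi_\theta(a_t \mid s_t) \\
&= b(s_t)\, \nabla_\theta \sum_{a_t} \pi_\theta(a_t \mid s_t) \\
&= b(s_t)\, \nabla_\theta 1 \\
&= 0.
\end{align*}
```

Because this term has zero expectation at every time step, subtracting b(s_t) from the return that multiplies \nabla_\theta \log \pi_\theta(a_t \mid s_t) leaves the expected gradient unchanged, although it can change the variance of a sample estimate.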