Value Smoothing via Latent Embedding Similarity

Published:

Designed an experiment that modifies the reward structure of reinforcement learning algorithms to improve learning in environments with sparse rewards.

The method uses similarity in embedding space to teach a model that an output receiving a negative reward may still be "almost correct" — smoothing the value landscape around near-correct states.
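The idea above can be sketched as a reward-shaping function: when the sparse reward is zero or negative, add a small bonus proportional to how close the output's embedding is to the target's. This is a minimal illustration, not the exact formulation from the experiment — the embedding source, the cosine similarity measure, and the shaping coefficient `beta` are all assumptions.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def shaped_reward(sparse_reward: float,
                  output_embedding: np.ndarray,
                  target_embedding: np.ndarray,
                  beta: float = 0.5) -> float:
    """Smooth a sparse reward: an unrewarded output gets a partial bonus
    when it is close to the target in embedding space ("almost correct").
    (Illustrative shaping scheme, not the author's exact method.)"""
    if sparse_reward > 0:  # correct outputs keep their full reward
        return sparse_reward
    sim = cosine_similarity(output_embedding, target_embedding)
    # Map similarity in [-1, 1] to a bonus in [0, beta]
    return sparse_reward + beta * (sim + 1.0) / 2.0

# A near-miss output earns more shaped reward than a far-off one,
# giving the agent a gradient to follow even when the raw reward is 0.
target = np.array([1.0, 0.0, 0.0])
near_miss = np.array([0.9, 0.1, 0.0])
far_miss = np.array([-1.0, 0.0, 0.0])
print(shaped_reward(0.0, near_miss, target) > shaped_reward(0.0, far_miss, target))
```

In a training loop, `shaped_reward` would replace the raw environment reward before the value update, so states near a correct answer have higher estimated value than equally "wrong" but dissimilar ones.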

Experiments showed that the algorithm outperforms standard approaches in environments with sparse rewards.

Categories: