Value Smoothing via Latent Embedding Similarity

Published:

Designed an experiment that modifies the reward structure of reinforcement learning algorithms to improve learning in environments with sparse rewards.

The method uses similarity in embedding space to teach a model that an output receiving a negative reward may still be "almost correct" — smoothing the value landscape around near-correct states.
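The idea above can be sketched as a reward-shaping function: when the sparse reward is zero or negative, add a small bonus proportional to how close the output's embedding is to the target's. This is a minimal illustration, not the exact formulation from the experiment — the embedding source, the cosine similarity measure, and the shaping coefficient `beta` are all assumptions.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def shaped_reward(sparse_reward: float,
                  output_embedding: np.ndarray,
                  target_embedding: np.ndarray,
                  beta: float = 0.5) -> float:
    """Smooth a sparse reward: an unrewarded output gets a partial bonus
    when it is close to the target in embedding space ("almost correct").
    (Illustrative shaping scheme, not the author's exact method.)"""
    if sparse_reward > 0:  # correct outputs keep their full reward
        return sparse_reward
    sim = cosine_similarity(output_embedding, target_embedding)
    # Map similarity in [-1, 1] to a bonus in [0, beta]
    return sparse_reward + beta * (sim + 1.0) / 2.0

# A near-miss output earns more shaped reward than a far-off one,
# giving the agent a gradient to follow even when the raw reward is 0.
target = np.array([1.0, 0.0, 0.0])
near_miss = np.array([0.9, 0.1, 0.0])
far_miss = np.array([-1.0, 0.0, 0.0])
print(shaped_reward(0.0, near_miss, target) > shaped_reward(0.0, far_miss, target))
```

In a training loop, `shaped_reward` would replace the raw environment reward before the value update, so states near a correct answer have higher estimated value than equally "wrong" but dissimilar ones.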

Experiments showed that the algorithm outperforms standard approaches in environments with sparse rewards.

Categories: