From 3ed7d3f784c7dc0ec760940436c75aed22f9dca8 Mon Sep 17 00:00:00 2001
From: Quirin Anton Klaus Hosse
Date: Wed, 30 Apr 2025 13:27:30 +0200
Subject: [PATCH] Update reinforcement_learning_terms.md

---
 docs/Cheatsheets/reinforcement_learning_terms.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/Cheatsheets/reinforcement_learning_terms.md b/docs/Cheatsheets/reinforcement_learning_terms.md
index e021c4f..d8c5937 100644
--- a/docs/Cheatsheets/reinforcement_learning_terms.md
+++ b/docs/Cheatsheets/reinforcement_learning_terms.md
@@ -69,7 +69,7 @@ Think of the advantage function as a measure of how much better it was to take t
 
 ## The Learning Process
 
-Most learning algorithms consider an *objective function* $J(\pi)$, which is a function that maps a policy $\pi$ to a real number. The goal of learning is then to find a policy $\pi^*$ that maximizes the objective function, i.e. $J(\pi^*) = \max_{\pi} J(\pi)$. A convenient choice for $J$ would be any of the Q function, value function, or advantage function. For our purposes we will focus on the advantage function, because the Proximal Policy Optimization (PPO) algorithm uses that as an bjective.
+Most learning algorithms consider an *objective function* $J(\pi)$, which is a function that maps a policy $\pi$ to a real number. The goal of learning is then to find a policy $\pi^*$ that maximizes the objective function, i.e. $J(\pi^*) = \max_{\pi} J(\pi)$. A convenient choice for $J$ would be any of the Q function, value function, or advantage function. For our purposes we will focus on the advantage function, because the Proximal Policy Optimization (PPO) algorithm uses that as an objective.
 
 ## Generalized Advantage Estimation
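A note on the corrected sentence, not part of the patch itself: the "objective" that PPO builds from the advantage function is its clipped surrogate objective. For reference, the standard formulation from Schulman et al. (2017), where $\hat{A}_t$ is the estimated advantage, $r_t(\theta)$ the probability ratio between the new and old policies, and $\epsilon$ the clip range (symbols shown here for context only; they are not defined in the patched file):

$$
L^{\mathrm{CLIP}}(\theta) = \hat{\mathbb{E}}_t\!\left[\min\!\left(r_t(\theta)\,\hat{A}_t,\ \mathrm{clip}\!\left(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\right)\hat{A}_t\right)\right],
\qquad
r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)}
$$

The clip term bounds how far a single update can move the policy away from $\pi_{\theta_{\mathrm{old}}}$, which is why PPO maximizes this advantage-weighted surrogate rather than $J(\pi)$ directly.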