Tuning Hyperparameters Learning Rate and Gamma in Gym Environment Inverted Pendulum
    1. Department of Telecommunication and Network Engineering, Institute of Technology of Cambodia, Russian Federation Blvd., P.O. Box 86, Phnom Penh, Cambodia

Received: August 28, 2024 / Revised: September 11, 2024 / Accepted: September 19, 2024 / Available online: August 30, 2025

 Abstract: In this paper, reinforcement learning, a subfield of artificial intelligence and machine learning, is examined in terms of how it draws on interdependent concepts from statistics, optimization, and mathematics, and how agents use trial-and-error learning to solve tasks. Hyperparameter tuning in machine learning lets researchers directly influence the architecture, usability, and effectiveness of a model in finding the right solution to a particular problem. This paper investigates the impact of the learning rate and the discount factor (gamma) on the reinforcement algorithm used in deep reinforcement learning, in a Gym environment using Inverted Pendulum version 4 (InvertedPendulum-v4), in which the inverted pendulum acts as the agent, run on Google Colab. The results show that the choice of hyperparameters significantly affects the agent's sample efficiency, learning stability, and cumulative reward; that is, how quickly and reliably the agent learns from interactions, how consistently it performs across training sessions, and how close it comes to optimal outcomes over time. Careful tuning of these hyperparameters is therefore crucial to maximizing the agent's effectiveness, as even small adjustments can lead to substantial differences in performance metrics. In the experiments, the best results were observed with a learning rate of 0.0001 and a gamma value of 0.99. These settings yielded the highest cumulative rewards and more stable learning than the other values tested.
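To make the role of gamma concrete, the following minimal Python sketch shows how the discount factor weights future rewards when computing returns. It is an illustration only, not the code used in the paper: the helper names `discounted_returns` and `effective_horizon` are hypothetical, and the constant reward of +1 per step is an assumption chosen to mimic a balancing task.

```python
def discounted_returns(rewards, gamma):
    """Compute G_t = r_t + gamma * G_{t+1} for every step t of one episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk the episode backwards so each return reuses the one after it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def effective_horizon(gamma):
    """Rough number of future steps that matter: 1 / (1 - gamma), for gamma < 1."""
    return 1.0 / (1.0 - gamma)


if __name__ == "__main__":
    # Assumed reward signal: +1 for every step of a 200-step episode.
    rewards = [1.0] * 200
    for gamma in (0.9, 0.99):
        g0 = discounted_returns(rewards, gamma)[0]
        print(f"gamma={gamma}: return at t=0 is {g0:.2f}, "
              f"effective horizon is about {effective_horizon(gamma):.0f} steps")
```

Under these assumptions, a gamma of 0.99 makes the agent weigh rewards roughly 100 steps ahead, while 0.9 shortens that to about 10 steps, which is one way to see why the choice of gamma can change cumulative reward and learning stability.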