Understanding Proximal Policy Optimization | PPO | Machine Learning Series | TFD Workshop

Recorded video of the online workshop "Understanding Proximal Policy Optimization", part of the Machine Learning Series.

Download the demo and exercises: https://github.com/tfd-ed/tfd-worksho...
TFD Workshop Repo: https://github.com/tfd-ed/tfd-workshop

🔑 What you will learn

Part 1: Reinforcement Learning Foundations
- The RL framework: agents, environments, rewards, and policies
- States, observations, and action spaces (discrete vs continuous)
- The credit assignment problem and why RL is challenging
- Real-world RL applications (games, robotics, control systems)

Part 2: Policy Gradient Methods
- From value-based to policy-based methods
- Understanding the policy gradient theorem
- Why vanilla policy gradients are unstable
- The importance of trust regions in learning

Part 3: Understanding PPO
- The fundamental problem PPO solves
- Clipping mechanism and surrogate objectives
- Actor-Critic architecture
- Generalized Advantage Estimation (GAE)

Part 4: Complete PPO Implementation
- Actor and Critic neural networks in PyTorch
- Memory buffer for experience collection
- Computing advantages and returns
- The PPO update loop with clipping

Part 5: Training the Lunar Lander
- Environment setup with Gymnasium
- Hyperparameter configuration
- Training loop implementation
- Monitoring and debugging training metrics
- Visualizing learned behaviors

Live Demonstrations
- Lunar Lander Environment - Understanding the observation space and actions
- Untrained Agent Behavior - Random actions and crashes
- PPO Training Process - Watching the agent learn in real-time
- Trained Agent Performance - Successful landings and optimal behavior
- Training Metrics Visualization - Interpreting reward curves and losses

Hands-On Lab Exercises
- Exercise 1: Understanding the environment and action space
- Exercise 2: Implementing the Actor-Critic networks
- Exercise 3: Computing advantages with GAE
- Exercise 4: The PPO update step
- Exercise 5: Training your own agent

IG: / darachaukh
YouTube: / @tfdevs
Website: https://www.tfdevs.com/
LinkedIn: / qiang-cun-zhi
TikTok: https://www.tiktok.com/@chaudarakh?_r...
Telegram Channel: https://t.me/tfdTech
Facebook Page: / chaudarascienceengineer

#MachineLearning #ReinforcementLearning #AI #PPO #Workshop #TechEducation #LearningByDoing #AIWorkshop #DeepLearning #PyTorch
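
The workshop outline names two computations at the core of PPO: Generalized Advantage Estimation and the clipped surrogate objective. The sketch below is not the workshop's code. It is a minimal, dependency-free illustration of the standard formulas, using plain Python lists instead of PyTorch tensors. The function names (`compute_gae`, `clipped_surrogate`) and the default values for `gamma`, `lam`, and `clip_eps` are assumptions chosen for illustration.

```python
import math


def compute_gae(rewards, values, dones, last_value, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over one rollout.

    rewards, values, dones are equal-length lists for steps 0..T-1;
    last_value is the critic's estimate for the state after step T-1.
    Names and defaults are illustrative assumptions.
    """
    advantages = [0.0] * len(rewards)
    gae = 0.0
    next_value = last_value
    # Walk backwards so each step can reuse the accumulated future advantage.
    for t in reversed(range(len(rewards))):
        not_done = 0.0 if dones[t] else 1.0
        # One-step TD error; bootstrapping stops at episode boundaries.
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        gae = delta + gamma * lam * not_done * gae
        advantages[t] = gae
        next_value = values[t]
    # Value targets for the critic are advantage plus baseline.
    returns = [a + v for a, v in zip(advantages, values)]
    return advantages, returns


def clipped_surrogate(new_logp, old_logp, advantage, clip_eps=0.2):
    """Per-sample PPO clipped objective (to be maximized)."""
    ratio = math.exp(new_logp - old_logp)  # pi_new(a|s) / pi_old(a|s)
    clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
    # Taking the minimum removes any incentive to push the ratio
    # outside [1 - clip_eps, 1 + clip_eps] in the favorable direction.
    return min(ratio * advantage, clipped * advantage)


# Usage: a two-step episode that terminates at the second step.
adv, ret = compute_gae(
    rewards=[1.0, 1.0], values=[0.5, 0.5], dones=[False, True],
    last_value=0.0, gamma=1.0, lam=1.0,
)
objective = clipped_surrogate(math.log(1.5), 0.0, advantage=1.0)
```

With `gamma` and `lam` both set to 1, the advantage reduces to the Monte Carlo return minus the value baseline, which makes the result easy to check by hand. In a real training loop the objective would typically be averaged over a minibatch and negated to form a loss.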
