Minimizing Sim2Real Error in Robotics: Maybe the Best Open-Source Content Available
Polymorphic Bipedal Robot TRON1 SDK Development Guide - Open Source Algorithm Update
Click here for the website: https://www.limxdynamics.com/en
Thanks to our friends at LimX Dynamics. This content was written by the LimX team, and I have permission to post it.
Many students encounter the following questions when using TRON 1 for secondary development:
1. Why might policy performance be poor after following the open-source RL training process in the SDK Development Guide?
2. How many training rounds are generally required for satisfactory results?
3. Why does a policy perform well in the simulator but behave differently when deployed on the real machine?
Today's article will answer these questions.
01 Why update the open-source training algorithms?
Let's address the first question: Why might policy performance be poor after following the open-source RL training process in the SDK Development Guide?
Previously, the learning method involved starting with open-source algorithms, such as PPO, to help users familiarize themselves with the framework and principles of reinforcement learning, as well as the RL development process for legged robots.
However, the standard open-source PPO algorithm has limitations, making it difficult to achieve optimal training results when used directly.
Optimizing an open-source policy also involves a significant learning curve, and comprehensive learning materials are currently limited. As a result, many students spend one or two months before achieving a relatively stable walking gait for TRON 1 in developer mode.
Due to these challenges, LimX Dynamics has optimized the existing open-source algorithms, aiming to further lower the entry threshold for embodied-intelligence RL.
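For readers wondering what "tuning PPO" involves in practice, below is a minimal sketch of the hyperparameters an open-source legged-robot training pipeline typically exposes. The class, field names, and default values are illustrative assumptions, not the actual LimX SDK configuration.

```python
from dataclasses import dataclass

@dataclass
class PPOConfig:
    # Illustrative hyperparameters only -- not the LimX SDK defaults.
    learning_rate: float = 1.0e-3      # step size for the Adam optimizer
    clip_param: float = 0.2            # PPO clipping range for the policy ratio
    entropy_coef: float = 0.01         # exploration bonus; too small can cause premature convergence
    gamma: float = 0.99                # discount factor for future rewards
    lam: float = 0.95                  # GAE lambda for advantage estimation
    num_learning_epochs: int = 5       # gradient passes per batch of rollouts
    num_mini_batches: int = 4          # mini-batches per learning epoch
    max_grad_norm: float = 1.0         # gradient clipping for training stability

cfg = PPOConfig()
print(cfg)
```

Each of these knobs interacts with the reward design, which is a large part of why getting a stable gait out of vanilla PPO takes newcomers so long.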
02 What improvements have been made in this update?
In the paper "CTS: Concurrent Teacher-Student Reinforcement Learning for Legged Locomotion," co-published by LimX Dynamics, a new CTS reinforcement learning architecture was proposed. The core idea is to enable teacher and student policies to mutually enhance each other and progress together during training.
(Image Placeholder: Diagram of CTS architecture)
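As a rough illustration of the concurrent teacher-student idea, the sketch below shows a teacher branch that encodes privileged simulator state and a student branch that encodes only onboard-observable proprioceptive history, with both updated in the same training iteration. This is a simplification, not the authors' implementation: the dimensions, network sizes, and latent-imitation term are assumptions, and the PPO surrogate losses are omitted. See the CTS paper for the actual architecture.

```python
import torch
import torch.nn as nn

def mlp(in_dim, out_dim, hidden=(256, 128)):
    layers, d = [], in_dim
    for h in hidden:
        layers += [nn.Linear(d, h), nn.ELU()]
        d = h
    layers.append(nn.Linear(d, out_dim))
    return nn.Sequential(*layers)

# All dimensions below are placeholders, not the values used for TRON 1.
PRIV_DIM, PROPRIO_HIST_DIM, LATENT_DIM, ACT_DIM = 64, 300, 32, 12

teacher_encoder = mlp(PRIV_DIM, LATENT_DIM)          # encodes privileged simulator state
student_encoder = mlp(PROPRIO_HIST_DIM, LATENT_DIM)  # encodes onboard-observable history only
actor = mlp(LATENT_DIM + PROPRIO_HIST_DIM, ACT_DIM)  # shared action head

def forward_both(priv_obs, proprio_hist):
    """Run the teacher and student branches in the same iteration.
    The key extra term is a latent-imitation loss that pulls the student's
    encoding toward the teacher's, so the two policies progress together."""
    z_teacher = teacher_encoder(priv_obs)
    z_student = student_encoder(proprio_hist)
    act_teacher = actor(torch.cat([z_teacher, proprio_hist], dim=-1))
    act_student = actor(torch.cat([z_student, proprio_hist], dim=-1))
    imitation_loss = nn.functional.mse_loss(z_student, z_teacher.detach())
    return act_teacher, act_student, imitation_loss

# Random tensors stand in for a batch of rollout data.
a_t, a_s, l_im = forward_both(torch.randn(8, PRIV_DIM), torch.randn(8, PROPRIO_HIST_DIM))
print(a_t.shape, a_s.shape, l_im.item())
```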
This open-source update incorporates engineering improvements based on the CTS architecture. The updated TRON 1 open-source training algorithm significantly improves training effectiveness in simulation and also reduces the Sim2Real (Simulation-to-Reality) gap. Beginners can now achieve stable flat-ground locomotion with TRON 1 after completing the open-source training.
This methodology has been updated on the LimX Dynamics technical support website. Please refer to the "RL Model Training" and "RL Training Results Deployment" chapters in the "TRON 1 SDK Development Guide." Remember to update your local repository before use.
(Image Placeholder: Training in Isaac Gym)
After training, we first need to confirm the Sim2Sim performance. The first 'Sim' refers to our training simulation environment (e.g., Isaac Gym, Isaac Sim), and the second 'Sim' refers to our simulation deployment environment (e.g., Gazebo, MuJoCo). If the Sim2Sim gap is large, the training policy itself needs to be re-examined, in particular whether the movement behavior is adequately shaped by the reward functions and whether individual reward terms contradict or conflict with one another.
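A minimal sketch of what such a Sim2Sim check can look like, assuming the trained policy has been exported to TorchScript and a MuJoCo model of the robot is available. The file names, observation assembly, and actuator mapping below are placeholders; the actual export and deployment steps are described in the SDK Development Guide.

```python
import mujoco
import numpy as np
import torch

# Placeholder paths -- substitute your exported policy and robot model.
policy = torch.jit.load("policy.pt").eval()
model = mujoco.MjModel.from_xml_path("robot.xml")
data = mujoco.MjData(model)

def build_observation(data):
    """Assemble the observation exactly as during training (order, scaling, history).
    This toy version just concatenates joint positions and velocities."""
    return np.concatenate([data.qpos, data.qvel]).astype(np.float32)

for _ in range(1000):
    obs = torch.from_numpy(build_observation(data)).unsqueeze(0)
    with torch.no_grad():
        action = policy(obs).squeeze(0).numpy()
    data.ctrl[:] = action           # assumes policy outputs map one-to-one to actuators
    mujoco.mj_step(model, data)     # advance the MuJoCo simulation by one step
```

If a robot that walks in Isaac Gym falls over in a loop like this, the gap points back to the policy and reward design, as noted above.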
Addressing the second question: How many training rounds are generally required for satisfactory results?
The number of training iterations needed in reinforcement learning depends mainly on several factors.
Our experience indicates it typically ranges from 10,000 to 20,000 rounds; a well-designed training framework might require even less time. An excessive number of training rounds can lead to policy overfitting, diminishing its generalization performance.
(Image Placeholder: Simulation results after 15,000 rounds of training)
(Image Placeholder: Use TensorBoard to view the training curve. A normal curve will gradually converge.)
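As a sketch of how such curves are produced and monitored (the training framework handles this logging itself; the tag names and values below are illustrative):

```python
import math
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter(log_dir="logs/tron1_flat")  # placeholder log directory

# In a real run the values come from the RL runner each iteration; dummy
# numbers are used here only so the script runs and produces a visible curve.
for iteration in range(20000):
    mean_reward = 20.0 * (1.0 - math.exp(-iteration / 3000.0))
    writer.add_scalar("Train/mean_reward", mean_reward, iteration)

writer.close()
# Then inspect the curves with:  tensorboard --logdir logs
```

A healthy curve gradually converges; once it has long since flattened, continuing to train mainly risks the overfitting mentioned above.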
Finally, let's tackle the third question: Why does a policy perform well in the simulator but differently when deployed on a real machine?
How to reduce the Sim2Real Gap
When deploying a trained policy to the real robot, we often encounter stable simulation performance but markedly different real-world behavior: this is the Sim2Real Gap, a common challenge in robot locomotion training. When facing a significant gap, first check that the training setup and the real-machine deployment are properly aligned.
If a large gap persists after checking these conditions, we need to analyze whether our policy is overfitting, meaning it's only applicable within a narrow range of environments. In this case, focus on the appropriateness of Domain Randomization. Adding randomization items or increasing the randomization range can enhance the model's generalization performance.
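As a sketch of the kind of domain-randomization settings such training configs typically expose: the names and ranges below are illustrative assumptions, not the LimX defaults. Widening ranges or enabling additional items is the usual first step when the policy has overfit to the simulator.

```python
from dataclasses import dataclass

@dataclass
class DomainRandomizationConfig:
    # Illustrative items and ranges only -- tune them against your own Sim2Real gap.
    randomize_friction: bool = True
    friction_range: tuple = (0.5, 1.25)       # ground friction coefficient
    randomize_base_mass: bool = True
    added_mass_range: tuple = (-1.0, 1.0)     # kg added to or removed from the base
    randomize_motor_strength: bool = True
    motor_strength_range: tuple = (0.9, 1.1)  # scale applied to actuator torque
    push_robots: bool = True
    push_interval_s: float = 8.0              # seconds between random external pushes
    max_push_vel_xy: float = 0.5              # m/s imparted by each push

cfg = DomainRandomizationConfig()
print(cfg)
```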
After optimization, the open-source algorithm's performance on the real robot closely approaches that of the deeply optimized algorithm in remote control mode.
(Image Placeholder: Real machine deployment effect)
03 Opportunities for Further Optimization
The algorithm released in this update is implemented on the CTS framework. To further enhance performance, researchers are encouraged to explore further optimization directions in depth.
Do you own a LimX biped? What are you doing differently from us? Reach out to LimX to discuss more techniques and robotics solutions: https://www.linkedin.com/company/limx-dynamics/
Click here for the website: https://www.limxdynamics.com/en
#Robotics #BipedalRobot #TRON1 #ReinforcementLearning #RL #EmbodiedIntelligence #AI #MachineLearning #OpenSource #SDK #Simulation #Sim2Real #IsaacGym #CTS #RobotLocomotion #Algorithm #LimX #TechInnovation #Python #ControlSystems
The AI framework used to train the robot is here: https://clearlab-sustech.github.io/concurrentTS/. Congratulations on an amazing paper, Hua Chen!