Synthetic and Real Data for Robot Training


Summary

Synthetic and real data for robot training means using both computer-generated scenarios and real-world demonstrations to teach robots how to perform tasks. This approach helps robots learn new skills quickly by multiplying limited human-collected data in simulation, making training faster and more flexible.

  • Combine data sources: Use both simulated data and real-world examples to build a more adaptable and well-rounded robot (a minimal sketch of one simple mixing scheme follows after this list).
  • Multiply training scenarios: Generate many virtual environments and actions to expose robots to a wider variety of situations without needing physical setups.
  • Standardize formats: Adopt standardized ways to record and use robot actions so that new training data can be shared and reused easily across different tasks.
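
    To make the "combine data sources" point concrete, below is a minimal Python sketch of one common way to mix scarce real demonstrations with abundant synthetic ones: sampling each training batch with a fixed real/synthetic ratio. This is an assumption for illustration, not a recipe from the posts; the function name and the 25% ratio are hypothetical.

        # Hypothetical sketch: weighted mixing of real and synthetic demonstrations.
        # Not taken from any of the posts below; names and ratios are placeholders.
        import random

        def sample_mixed_batch(real_demos, synthetic_demos, batch_size=32, real_fraction=0.25):
            """Draw a training batch that mixes scarce real data with abundant synthetic data.

            real_fraction controls how much of each batch comes from real robot
            demonstrations; the rest is filled with simulation-generated ones.
            """
            n_real = max(1, int(batch_size * real_fraction))
            n_syn = batch_size - n_real
            batch = random.choices(real_demos, k=n_real) + random.choices(synthetic_demos, k=n_syn)
            random.shuffle(batch)  # avoid ordering effects during training
            return batch

        # Example usage with placeholder trajectory records
        real = [{"source": "teleop", "id": i} for i in range(10)]
        synthetic = [{"source": "sim", "id": i} for i in range(10_000)]
        batch = sample_mixed_batch(real, synthetic)

    The mixing ratio is a tunable design choice: too little real data and the policy may inherit simulation artifacts, too much and the scarce real demonstrations dominate and limit diversity.
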
  • Jim Fan

    NVIDIA Director of AI & Distinguished Scientist. Co-Lead of Project GR00T (Humanoid Robotics) & GEAR Lab. Stanford Ph.D. OpenAI's first intern. Solving Physical AGI, one motor at a time.

    223,331 followers

    Exciting updates on Project GR00T! We discovered a systematic way to scale up robot data, tackling the most painful pain point in robotics. The idea is simple: a human collects demonstrations on a real robot, and we multiply that data 1000x or more in simulation. Let's break it down:

    1. We use Apple Vision Pro (yes!!) to give the human operator first-person control of the humanoid. Vision Pro parses the human's hand pose and retargets the motion to the robot hand, all in real time. From the human's point of view, they are immersed in another body, like in Avatar. Teleoperation is slow and time-consuming, but we can afford to collect a small amount of data.

    2. We use RoboCasa, a generative simulation framework, to multiply the demonstration data by varying the visual appearance and layout of the environment. In Jensen's keynote video below, the humanoid is now placing the cup in hundreds of kitchens with a huge diversity of textures, furniture, and object placements. We only have one physical kitchen at the GEAR Lab in NVIDIA HQ, but we can conjure up infinite ones in simulation.

    3. Finally, we apply MimicGen, a technique that multiplies the above data even further by varying the *motion* of the robot. MimicGen generates a vast number of new action trajectories based on the original human data and filters out failed ones (e.g. those that drop the cup) to form a much larger dataset.

    To sum up: given 1 human trajectory with Vision Pro -> RoboCasa produces N (varying visuals) -> MimicGen further augments to NxM (varying motions). This is how we trade compute for expensive human data via GPU-accelerated simulation.

    A while ago, I mentioned that teleoperation is fundamentally not scalable, because we are always limited by 24 hrs/robot/day in the world of atoms. Our new GR00T synthetic data pipeline breaks this barrier in the world of bits. Scaling has been so much fun for LLMs, and it's finally our turn to have fun in robotics!

    We are creating tools to enable everyone in the ecosystem to scale up with us:
    - RoboCasa: our generative simulation framework (Yuke Zhu). It's fully open-source: http://robocasa.ai
    - MimicGen: our generative action framework (Ajay Mandlekar). The code is open-source for robot arms, and we will have another version for humanoids and 5-finger hands: https://lnkd.in/gsRArQXy
    - We are building a state-of-the-art Apple Vision Pro -> humanoid robot "Avatar" stack. The open-source libraries from Xiaolong Wang's group laid the foundation: https://lnkd.in/gUYye7yt
    - Watch Jensen's keynote yesterday. He cannot hide his excitement about Project GR00T and robot foundation models! https://lnkd.in/g3hZteCG

    Finally, the GEAR Lab is hiring! We want the best roboticists in the world to join us on this moon-landing mission to solve physical AGI: https://lnkd.in/gTancpNK
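
    The 1 -> N -> N x M multiplication described in the post can be illustrated with a short sketch. This is not the RoboCasa or MimicGen API; every function here is a hypothetical stub standing in for what those systems do in physics simulation (scene randomization, motion augmentation, and success filtering).

        # Hypothetical sketch of the 1 -> N -> N x M data multiplication loop.
        # randomize_scene, perturb_motion, and rollout_succeeds are placeholders,
        # not calls into RoboCasa, MimicGen, or any GR00T code.
        import random
        from dataclasses import dataclass

        @dataclass
        class Demo:
            scene: dict      # e.g. kitchen layout, textures, object poses
            actions: list    # teleoperated action trajectory

        def randomize_scene(scene, seed):
            # Stand-in for RoboCasa-style visual/layout randomization.
            rng = random.Random(seed)
            return {**scene, "texture_id": rng.randrange(1000), "layout_id": rng.randrange(500)}

        def perturb_motion(actions, seed):
            # Stand-in for MimicGen-style motion variation (here: small action noise).
            rng = random.Random(seed)
            return [a + rng.gauss(0.0, 0.01) for a in actions]

        def rollout_succeeds(scene, actions):
            # Placeholder for a simulated success check (e.g. cup not dropped).
            return random.random() > 0.3

        def multiply_demo(demo, n_scenes=100, m_motions=10):
            """Expand 1 human demo into up to n_scenes * m_motions simulated trajectories."""
            dataset = []
            for s in range(n_scenes):
                scene = randomize_scene(demo.scene, seed=s)        # step 2: vary visuals
                for m in range(m_motions):
                    traj = perturb_motion(demo.actions, seed=m)    # step 3: vary motion
                    if rollout_succeeds(scene, traj):              # filter failed rollouts
                        dataset.append((scene, traj))
            return dataset

        demo = Demo(scene={"kitchen": "physical-lab-kitchen"}, actions=[0.1, 0.2, 0.3])
        print(len(multiply_demo(demo)))  # roughly N x M, minus filtered failures

    The key property is that the expensive outer quantity (one teleoperated demo) is fixed, while the two inner loops scale with compute rather than with human time.
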

  • Adithya Murali

    Senior Research Scientist at NVIDIA | MIT TR35, Prev CMU PhD, Berkeley AI Research

    3,008 followers

    I’m super excited to release a multi-year project we have been cooking at NVIDIA Robotics. Grasping is a foundational challenge in robotics 🤖, whether for industrial picking or general-purpose humanoids. VLA + real data collection is all the rage now, but it is expensive and scales poorly for this task: for every new embodiment and/or scene, the dataset has to be recollected to get the best performance.

    Key idea: since grasping is a well-defined task in physics simulation, why can't we just scale synthetic data generation and train a GenAI model for grasping? By embracing modularity and standardized grasp formats, we can make this a turnkey technology that works zero-shot in multiple settings.

    Introducing… 🚀 GraspGen: A Diffusion-Based Framework for 6-DOF Grasping

    GraspGen is a modular framework for diffusion-based 6-DOF grasp generation that scales across embodiment types, observability conditions, clutter, and task complexity.

    Key features:
    ✅ Multi-embodiment support: suction, antipodal pinch, and underactuated pinch grippers
    ✅ Generalization to both partial and complete 3D point clouds
    ✅ Generalization to both single objects and cluttered scenes
    ✅ Modular design that relies on other robotics packages and foundation models (SAM2, cuRobo, FoundationStereo, FoundationPose). This allows GraspGen to focus on only one thing: grasp generation
    ✅ Training recipe: the grasp discriminator is trained with on-generator data from the diffusion model, so that it learns to correct the diffusion generator's mistakes
    ✅ Real-time performance (~20 Hz) before any GPU acceleration; low memory footprint

    📊 Results:
    • SOTA on the FetchBench [Han et al., CoRL 2024] benchmark
    • Zero-shot sim-to-real transfer on unknown objects and cluttered scenes
    • Dataset of 53M simulated grasps across 8K objects from Objaverse

    We're also releasing:
    🔹 Simulation-based grasp data generation workflows
    🔹 Standardized formats and gripper definitions
    🔹 Full training infrastructure

    📄 arXiv: https://lnkd.in/gaYmcfz4
    🌐 Website: https://lnkd.in/gGiKRCMX
    💻 Code: https://lnkd.in/gYR77bEh

    A huge thank you to everyone involved in this journey; excited to hear the feedback from the community! Joint work with Clemens Eppner, Balakumar Sundaralingam, Yu-Wei Chao, Mark T. Carlson, Jun Yamada and other collaborators. Many thanks to Yichao Pan, Shri Sundaram, Spencer Huang, Buck Babich, Amit Goel for product management and feedback. #robotics #grasping #physicalAI #simtoreal
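
    The generate-then-score pattern described above (a diffusion generator proposes grasps, a discriminator trained on the generator's own samples ranks them) can be sketched as follows. This is not the GraspGen API; the class and method names are invented placeholders, and the "models" are random stubs. The real code is at the linked repository.

        # Hypothetical sketch of diffusion-generator + discriminator grasp selection.
        # ToyGraspGenerator / ToyGraspDiscriminator are placeholders, not GraspGen classes.
        import random

        class ToyGraspGenerator:
            """Stand-in for a diffusion model that samples 6-DOF grasp poses from a point cloud."""
            def sample(self, point_cloud, num_grasps=64):
                # A real generator would denoise grasp poses conditioned on the point cloud.
                return [[random.uniform(-1, 1) for _ in range(6)] for _ in range(num_grasps)]

        class ToyGraspDiscriminator:
            """Stand-in for a discriminator trained on the generator's own ("on-generator")
            samples, so it learns to down-weight the generator's characteristic mistakes."""
            def score(self, point_cloud, grasps):
                return [random.random() for _ in grasps]

        def best_grasp(point_cloud, generator, discriminator, num_grasps=64):
            grasps = generator.sample(point_cloud, num_grasps)
            scores = discriminator.score(point_cloud, grasps)
            # Execute the highest-scoring candidate grasp.
            return max(zip(scores, grasps), key=lambda pair: pair[0])[1]

        cloud = [[0.0, 0.0, 0.5]] * 1024  # placeholder partial point cloud
        grasp = best_grasp(cloud, ToyGraspGenerator(), ToyGraspDiscriminator())

    The modular split matters: perception (SAM2, FoundationStereo, FoundationPose) and motion planning (cuRobo) stay outside this loop, so the grasp model only has to do one job.
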

  • Antonio Loquercio

    Assistant Professor at the University of Pennsylvania

    3,212 followers

    How can you scale up robot learning? We introduce HOP: extract hand-object trajectories from in-the-wild videos, train a next-token-prediction model to obtain a manipulation prior, then adapt it to your task using BC or RL. Check out our webpage for the paper and code: http://bit.ly/47tyeDv. Short summary below:

    We generate 70,000 robot trajectories from in-the-wild videos of human activities using a novel sim-in-the-loop retargeting scheme. Our proposed retargeting framework enables scalable robot data generation from videos. Check out our repo to access the dataset.

    BC-finetuning our prior with real-world data leads to more sample-efficient training than baselines. We find that initializing RL training with our prior leads to policies that are more robust to disturbances and generalize better to unseen objects compared to training from scratch.
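
    The "pretrain a prior, then BC-finetune on a little real data" step can be sketched in a few lines of PyTorch. This is not the HOP codebase: the architecture, dimensions, and synthetic tensors below are placeholders, and a real prior would be a sequence model pretrained with next-token prediction on the video-derived trajectories.

        # Hypothetical sketch of BC-finetuning a pretrained action-prediction prior.
        # Dimensions, model, and data are placeholders, not HOP's released code.
        import torch
        import torch.nn as nn

        OBS_DIM, ACT_DIM = 32, 8

        # Stand-in for a prior pretrained on hand-object trajectories from videos.
        prior = nn.Sequential(nn.Linear(OBS_DIM, 128), nn.ReLU(), nn.Linear(128, ACT_DIM))

        # A small set of real (observation, action) pairs; in practice these come
        # from teleoperated demonstrations on the target robot.
        obs = torch.randn(256, OBS_DIM)
        act = torch.randn(256, ACT_DIM)

        # Small learning rate so finetuning stays close to the pretrained prior.
        optimizer = torch.optim.Adam(prior.parameters(), lr=1e-4)
        for epoch in range(10):
            pred = prior(obs)
            loss = nn.functional.mse_loss(pred, act)  # standard behavior-cloning regression loss
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

    For the RL variant, the same pretrained weights would instead initialize the policy before reinforcement learning, which is what the post credits for the improved robustness and generalization.
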
