Exciting updates on Project GR00T! We've discovered a systematic way to scale up robot data, tackling the most painful bottleneck in robotics. The idea is simple: a human collects demonstrations on a real robot, and we multiply that data 1000x or more in simulation. Let's break it down:

1. We use Apple Vision Pro (yes!!) to give the human operator first-person control of the humanoid. Vision Pro parses the human hand pose and retargets the motion to the robot hand, all in real time. From the human's point of view, they are immersed in another body, like in Avatar. Teleoperation is slow and time-consuming, but we can afford to collect a small amount of data.
2. We use RoboCasa, a generative simulation framework, to multiply the demonstration data by varying the visual appearance and layout of the environment. In Jensen's keynote video below, the humanoid is now placing the cup in hundreds of kitchens with a huge diversity of textures, furniture, and object placements. We only have 1 physical kitchen at the GEAR Lab in NVIDIA HQ, but we can conjure up infinite ones in simulation.
3. Finally, we apply MimicGen, a technique that multiplies the above data even further by varying the *motion* of the robot. MimicGen generates a vast number of new action trajectories based on the original human data and filters out failed ones (e.g., those that drop the cup) to form a much larger dataset.

To sum up: 1 human trajectory with Vision Pro -> RoboCasa produces N (varying visuals) -> MimicGen further augments to NxM (varying motions). This is the way to trade compute for expensive human data through GPU-accelerated simulation.

A while ago, I mentioned that teleoperation is fundamentally not scalable, because we are always limited by 24 hrs/robot/day in the world of atoms. Our new GR00T synthetic data pipeline breaks this barrier in the world of bits. Scaling has been so much fun for LLMs, and it's finally our turn to have fun in robotics! We are creating tools to enable everyone in the ecosystem to scale up with us:

- RoboCasa: our generative simulation framework (Yuke Zhu). It's fully open-source! Here you go: http://robocasa.ai
- MimicGen: our generative action framework (Ajay Mandlekar). The code is open-source for robot arms, and we will have another version for humanoids and 5-finger hands: https://lnkd.in/gsRArQXy
- We are building a state-of-the-art Apple Vision Pro -> humanoid robot "Avatar" stack. The open-source libraries from Xiaolong Wang's group laid the foundation: https://lnkd.in/gUYye7yt
- Watch Jensen's keynote yesterday. He cannot hide his excitement about Project GR00T and robot foundation models! https://lnkd.in/g3hZteCG

Finally, GEAR Lab is hiring! We want the best roboticists in the world to join us on this moon-landing mission to solve physical AGI: https://lnkd.in/gTancpNK
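For intuition, here is a minimal, hypothetical Python sketch of the 1 -> N -> NxM multiplication described in the post above. The names (`Demo`, `randomize_scene`, `perturb_trajectory`, `run_rollout`) are placeholders rather than RoboCasa or MimicGen APIs, and the "physics" is a random stub that accepts most motion variants.

```python
import random
from dataclasses import dataclass

@dataclass
class Demo:
    scene_seed: int        # stands in for textures, layout, object placement
    actions: list          # simplified stand-in for a full robot trajectory
    success: bool = True

def run_rollout(scene_seed, actions):
    """Placeholder physics rollout; accepts ~80% of motion variants at random."""
    return random.random() < 0.8

def randomize_scene(demo, n_scenes):
    """RoboCasa-style step: replay the same actions in N visually distinct scenes."""
    return [Demo(random.randint(0, 10**6), list(demo.actions)) for _ in range(n_scenes)]

def perturb_trajectory(demo, m_motions):
    """MimicGen-style step: M motion variants per demo, keeping only successes."""
    variants = []
    for _ in range(m_motions):
        noisy = [a + random.gauss(0.0, 0.01) for a in demo.actions]
        if run_rollout(demo.scene_seed, noisy):   # drop failures (e.g. dropped cup)
            variants.append(Demo(demo.scene_seed, noisy))
    return variants

human_demo = Demo(scene_seed=0, actions=[0.0, 0.1, 0.2])       # 1 teleoperated demo
visual_variants = randomize_scene(human_demo, n_scenes=100)    # -> N demos
dataset = [v for d in visual_variants for v in perturb_trajectory(d, m_motions=10)]
print(len(dataset), "synthetic demos from 1 human demonstration")  # roughly N * M * 0.8
```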
Managing High-Quality Robot Data at Scale
Explore top LinkedIn content from expert professionals.
Summary
Managing high-quality robot data at scale means creating and maintaining large, reliable datasets that help robots learn, adapt, and perform tasks more accurately. This involves collecting initial real-world data, expanding it using simulations and automated quality checks, and ensuring that only useful information is used for training and deployment.
- Automate data checks: Build processes that automatically catch and fix errors in sensor data before moving forward, so you don’t waste resources on flawed information.
- Multiply training scenarios: Use simulation tools to create diverse environments and variations, quickly turning limited real-world robot demonstrations into thousands of unique learning experiences.
- Filter out failures: Apply quality filtering that removes unsuccessful robot actions from the dataset, helping machines learn only from examples that work well.
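As a minimal illustration of the failure-filtering step in the last bullet, the sketch below keeps only episodes whose rollout metadata marks the task as successful. The field names (`episode_id`, `success`, `num_steps`) are illustrative, not a specific dataset schema.

```python
# Toy episode records; in practice these would come from simulation rollouts.
episodes = [
    {"episode_id": 0, "success": True,  "num_steps": 212},
    {"episode_id": 1, "success": False, "num_steps": 67},   # e.g. dropped the cup
    {"episode_id": 2, "success": True,  "num_steps": 198},
]

# Keep only successful episodes for training and report the retention rate.
clean = [ep for ep in episodes if ep["success"]]
retention = len(clean) / len(episodes)
print(f"kept {len(clean)}/{len(episodes)} episodes ({retention:.0%}) for training")
```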
🔥 Introducing Physical AI Workbench: how Voxel51 and NVIDIA are solving the data pipeline bottleneck that's blocking Physical AI at scale. 🔥

As autonomous vehicles, humanoid robots, and industrial robots move from lab to deployment, teams need high-fidelity simulations to deploy with confidence. However, over 50% of Physical AI simulations fail because of bad input data, slowing teams and wasting millions in compute costs.

⚠️ The problem: Physical AI systems process petabytes of multimodal sensor data (LiDAR, radar, cameras, IMU). Even a small calibration or timing error between LiDAR, radar, and camera sensors can ripple downstream, resulting in inaccurate neural reconstructions and wasted compute dollars.

✅ The solution: Physical AI Workbench integrates with NVIDIA Omniverse NuRec and NVIDIA Cosmos, giving teams a standardized way to audit, enrich, and prepare multimodal data for simulation and neural reconstruction, ensuring every test starts with trusted data.

How it works:
🔍 Catch and fix errors automatically by auditing sensor data across 75+ critical checkpoints
🔧 Transform raw sensor streams into structured, searchable data with AI data enrichment
⚡ Trigger neural reconstructions and generate synthetic scene variations
📈 Scale simulation workflows with complete traceability and speed

Automated QA checks let teams catch data quality issues before wasting valuable compute resources, prevent downstream failures, and increase simulation ROI.

Walking the floor at #NVIDIAGTC DC last week, the signal for Physical AI is stronger than ever. If you're building AVs with NVIDIA DRIVE, humanoid robots with NVIDIA Robotics, or manufacturing automation, this is the data engine you need between your sensors and your simulations.

🔗 See how it works here: https://lnkd.in/e2zgyx7K
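To make the "audit before you simulate" idea concrete, here is a generic Python sketch, not the Voxel51 or Physical AI Workbench API, of one such automated check: flagging LiDAR/camera timestamp misalignment before data moves on to reconstruction. The 5 ms tolerance and the sample timestamps are assumptions for illustration.

```python
from bisect import bisect_left

MAX_SKEW_S = 0.005  # assumed tolerance: 5 ms between paired LiDAR and camera frames

def nearest(sorted_ts, t):
    """Return the timestamp in sorted_ts closest to t."""
    i = bisect_left(sorted_ts, t)
    candidates = sorted_ts[max(i - 1, 0): i + 1]
    return min(candidates, key=lambda c: abs(c - t))

def audit_sync(lidar_ts, camera_ts):
    """Return (lidar_t, skew) pairs whose best camera match exceeds the tolerance."""
    camera_sorted = sorted(camera_ts)
    issues = []
    for t in lidar_ts:
        skew = abs(nearest(camera_sorted, t) - t)
        if skew > MAX_SKEW_S:
            issues.append((t, skew))
    return issues

# Example: a camera stream that drifts ~8 ms behind the LiDAR after t = 0.2 s.
lidar  = [0.0, 0.1, 0.2, 0.3, 0.4]
camera = [0.0, 0.1, 0.208, 0.308, 0.408]
for t, skew in audit_sync(lidar, camera):
    print(f"frame at {t:.3f}s: camera/LiDAR skew {skew*1000:.1f} ms exceeds tolerance")
```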
How can you scale up robot learning? We introduce HOP: (1) extract hand-object trajectories from in-the-wild videos, (2) train a next-token-prediction model to obtain a manipulation prior, and (3) adapt it to your task using BC or RL. Check out our webpage for the paper and code: http://bit.ly/47tyeDv. Short summary below:

- We generate 70,000 robot trajectories from in-the-wild videos of human activities using a novel sim-in-the-loop retargeting scheme. Our proposed retargeting framework enables scalable robot data generation from videos. Check out our repo to access the dataset.
- BC-finetuning our prior with real-world data leads to more sample-efficient training than baselines.
- We find that initializing RL training with our prior leads to policies that are more robust to disturbances and generalize better to unseen objects than training from scratch.
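A hedged sketch of the "next-token prediction over trajectories" idea, not the HOP architecture itself: continuous trajectory values are discretized into tokens, and a tiny autoregressive model is trained to predict the next token. The bin count, model size, and toy sine trajectory are all assumptions for illustration.

```python
import torch
import torch.nn as nn

NUM_BINS = 256  # assumed discretization of each trajectory dimension

def tokenize(traj, low=-1.0, high=1.0):
    """Map continuous values in [low, high] to integer tokens in [0, NUM_BINS)."""
    scaled = (traj.clamp(low, high) - low) / (high - low)
    return (scaled * (NUM_BINS - 1)).long()

class TinyPrior(nn.Module):
    """Toy stand-in for a trajectory prior: embed tokens, predict the next one."""
    def __init__(self, d_model=64):
        super().__init__()
        self.embed = nn.Embedding(NUM_BINS, d_model)
        self.rnn = nn.GRU(d_model, d_model, batch_first=True)
        self.head = nn.Linear(d_model, NUM_BINS)

    def forward(self, tokens):
        h, _ = self.rnn(self.embed(tokens))
        return self.head(h)  # logits over the next token at each step

traj = torch.sin(torch.linspace(0, 3.14, 32)).unsqueeze(0)   # (1, T) toy trajectory
tokens = tokenize(traj)
model = TinyPrior()
logits = model(tokens[:, :-1])                               # predict token t+1 from prefix
loss = nn.functional.cross_entropy(logits.reshape(-1, NUM_BINS), tokens[:, 1:].reshape(-1))
loss.backward()                                              # one pretraining step on the prior
print(f"next-token loss: {loss.item():.3f}")
```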
NVIDIA researchers are using the Apple Vision Pro headset to control humanoid robots in real time. Imagine putting on a headset and suddenly feeling as if you're inside a robot's body, controlling its movements with your own. According to the researchers, that's exactly the experience: they describe it as feeling "immersed" in another body, much like in the movie Avatar.

𝐒𝐨, 𝐇𝐨𝐰 𝐃𝐨𝐞𝐬 𝐓𝐡𝐢𝐬 𝐖𝐨𝐫𝐤? Let me break it down:
- Human Demonstration with Apple Vision Pro: Operators wear the Apple Vision Pro headset to control humanoid robots. This provides initial demonstration data as they perform the tasks the robot needs to learn.
- RoboCasa Simulation Framework: This simulation tool takes the real-world data from the human demonstrations and multiplies it by generating a variety of virtual environments. Think of it as creating numerous practice scenarios without needing more human input.
- MimicGen Data Augmentation: Building on that, MimicGen creates new robot motion paths based on the human demonstrations. It's like giving the robot creativity to try new ways of performing tasks.
- Quality Filtering: The system automatically filters out any failed attempts, ensuring the robot learns only from successful actions.

This process turns limited human input into a vast, high-quality dataset.

𝐖𝐡𝐲 𝐈𝐬 𝐓𝐡𝐢𝐬 𝐚 𝐁𝐢𝐠 𝐃𝐞𝐚𝐥? Traditionally, training robots requires a lot of human time and effort, which is expensive and slow. NVIDIA's approach can multiply robot training data by 1,000 times or more using simulations. By leveraging powerful GPUs (graphics processing units), researchers can substitute computational power for costly human labor. Just as large language models (like those behind advanced chatbots) have rapidly improved by scaling up training data, this method could lead to advances in robot capabilities and adaptability. We're talking about robots that can learn and adapt much more quickly than before.

The ability to efficiently scale training data means we could see rapid advancements in how robots perform complex tasks, interact with environments, and maybe even integrate into our daily lives sooner than we thought.

Do you see this as a step forward in robotics and AI? How might this impact the future of work and technology?

#innovation #technology #future #management #startups
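To make the teleoperation step above a little more concrete, here is a simplified, hypothetical Python sketch of hand-to-robot retargeting: tracked fingertip positions are mapped to robot finger joint angles. The real Vision Pro -> humanoid stack handles full hand kinematics, wrist pose, and filtering; the calibration constants and tracker output below are assumptions for illustration only.

```python
import math

# Assumed calibration: fingertip-to-palm distance (m) at fully open / fully closed.
OPEN_DIST, CLOSED_DIST = 0.10, 0.03
MAX_JOINT_ANGLE = math.radians(90)  # assumed joint limit for each robot finger

def retarget_finger(fingertip_xyz, palm_xyz):
    """Convert one tracked fingertip position into a robot finger joint angle."""
    dist = math.dist(fingertip_xyz, palm_xyz)
    closure = (OPEN_DIST - dist) / (OPEN_DIST - CLOSED_DIST)   # 0 = open, 1 = closed
    closure = min(max(closure, 0.0), 1.0)
    return closure * MAX_JOINT_ANGLE

# Example frame from a hypothetical hand tracker: palm origin plus two fingertips.
palm = (0.0, 0.0, 0.0)
fingertips = {"index": (0.06, 0.02, 0.01), "middle": (0.09, 0.01, 0.0)}
command = {name: retarget_finger(tip, palm) for name, tip in fingertips.items()}
for name, angle in command.items():
    print(f"{name}: {math.degrees(angle):.1f} deg")
```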