Robotics Model Training Using Community Data

Summary

Robotics model training using community data refers to the process of teaching robots new skills and behaviors by using large, diverse datasets collected and shared by everyday people or organizations. This approach helps robots learn to operate in real-world environments by drawing on a variety of examples instead of relying solely on expert demonstrations or controlled lab setups.

  • Gather diverse data: Use videos and recordings from different robots, homes, and real-world scenarios to help models learn to handle messy, unpredictable situations.
  • Decouple tasks: Break down the training process into stages, such as first predicting motions from visual data and then translating those motions into robot actions, to make learning more flexible and scalable.
  • Utilize open tools: Take advantage of open-source datasets and software to make robotics research more accessible, reproducible, and collaborative for everyone interested in advancing robot capabilities; a minimal dataset-download sketch follows the summary below.
Summarized by AI based on LinkedIn member posts
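
As a concrete starting point for the "utilize open tools" point above, here is a minimal sketch of pulling one of the community datasets hosted on the Hugging Face Hub. It assumes the `huggingface_hub` package is installed, and `lerobot/pusht` is used only as an example repository id; substitute any community dataset you want to inspect or train on.

```python
# Minimal sketch: download an open robotics dataset from the Hugging Face Hub.
# Assumes `pip install huggingface_hub`; the repo id below is only an example,
# substitute any LeRobot community dataset.
from huggingface_hub import snapshot_download

local_path = snapshot_download(
    repo_id="lerobot/pusht",   # example community dataset id (assumption)
    repo_type="dataset",       # robotics datasets are stored as dataset repos
)
print(f"Dataset files downloaded to: {local_path}")
```
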
  • Dana Aubakirova
    ML Research Engineer | SmolVLA Lead 🤗 @ Hugging Face

    🚀 We are introducing SmolVLA-450M, an open-source Vision-Language-Action model for robotics! SmolVLA achieves best-in-class performance and inference speed, and the best part? It’s trained entirely on open-source datasets from the 🤖 LeRobot project hosted on the Hugging Face Hub.

    🔍 Why is SmolVLA so good? It turns out that pretraining on a large, diverse, and noisy collection of real-world community robotics data leads to better generalization and control. We saw a 26% boost in task success rate simply from adding community dataset pretraining!

    ⚡ Why is SmolVLA so fast?
    1. We halved the size of SmolVLM and extract intermediate representations
    2. Introduced interleaved cross- and self-attention layers in the action expert
    3. Enabled asynchronous inference so the robot acts and reacts simultaneously

    💡 Unlike most academic datasets, these community-contributed datasets are naturally diverse:
    ✅ Multiple robots, camera angles, and manipulation tasks
    ✅ Real-world messiness and complexity
    ✅ Crowd-sourced and community-cleaned using Qwen2.5-VL for high-quality task descriptions

    🌍 SmolVLA is a step toward making robotics research more affordable, reproducible, and collaborative.

    📖 Want to dive deeper? Check out our blog post & start using it today: https://lnkd.in/e3Gmy8gT

    Huge thanks to the team who made this possible: @Mustafa Shukor, Francesco Capuano, Remi Cadene, and the entire LeRobot team; the amazing HF team: Andrés Marafioti, Merve Noyan, Aritra Roy Gosthipaty, Pedro Cuenca, Loubna Ben Allal, Thomas Wolf; and the amazing contributors to the LeRobot community: Ville Kuosmanen, Alexandre Chapin, Marina Barannikov, and more!
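
The "asynchronous inference" point above is easy to picture with a small sketch: one thread executes the action chunk the policy has already produced while another thread predicts the next chunk from the latest observation. Everything below (the `policy` and `robot` stubs, chunk sizes, loop counts) is illustrative and is not the actual SmolVLA/LeRobot implementation.

```python
# Conceptual sketch of asynchronous inference: the robot keeps executing the
# current action chunk while the next chunk is predicted in the background.
# `policy.predict_chunk` and `robot` are illustrative stubs, not real APIs.
import threading
import queue

def async_control_loop(policy, robot, num_chunks=100):
    chunk_queue = queue.Queue(maxsize=1)          # holds the next action chunk

    def inference_worker():
        for _ in range(num_chunks):
            obs = robot.get_observation()         # latest camera frames + state
            chunk = policy.predict_chunk(obs)     # e.g. the next 50 actions
            chunk_queue.put(chunk)                # blocks until the last chunk is consumed

    threading.Thread(target=inference_worker, daemon=True).start()

    for _ in range(num_chunks):
        chunk = chunk_queue.get()                 # wait for the freshest chunk
        for action in chunk:                      # execute while the worker is
            robot.send_action(action)             # already predicting the next one
```
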

  • Ilir Aliu
    AI & Robotics | 100k+ | Scaling Deep Tech

    90% success rate in unseen environments. No new data, no fine-tuning. Autonomously.

    Most robots need retraining to work in new places. What if they didn’t? Robot Utility Models (RUMs) learn once and work anywhere... zero-shot. A team from NYU and Hello Robot built a set of general-purpose robot policies that can open drawers, pick up bags, and more, without needing extra training for each new home, setup, or lighting condition.

    Why RUMs matter:
    ✅ Trained once and deployed in 25+ new environments without fine-tuning
    ✅ Use a cheap handheld iPhone-based tool for fast, high-quality data collection
    ✅ Boost performance with an mLLM-based self-check and retry system
    ✅ Achieve a 90% average success rate on real-world zero-shot tasks

    RUMs show that smart data collection and multi-modal learning can unlock truly general robotics.

    Try it yourself: code, models, and data are all open source at https://lnkd.in/dNH_HCat.
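
The "mLLM-based self-check and retry" idea above can be sketched as a simple loop: execute the policy, show the resulting camera image to a multimodal LLM, ask whether the task succeeded, and retry after a reset if it did not. The `policy`, `robot`, and `ask_mllm` helpers below are hypothetical stand-ins, not the released RUMs code.

```python
# Sketch of a self-check-and-retry wrapper around a zero-shot policy.
# `policy`, `robot`, and `ask_mllm` are hypothetical stubs; the real RUMs
# system differs in details (prompting, reset behavior, etc.).
def run_with_retries(policy, robot, task: str, max_attempts: int = 3) -> bool:
    for attempt in range(max_attempts):
        policy.execute(robot, task)                   # run the utility model once
        image = robot.capture_image()                 # grab a post-execution frame
        verdict = ask_mllm(                           # multimodal LLM as a success critic
            image,
            prompt=f"Did the robot successfully complete the task: '{task}'? Answer yes or no.",
        )
        if verdict.strip().lower().startswith("yes"):
            return True                               # self-check passed
        robot.retract_and_reset()                     # back off and try again
    return False
```
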

  • Animesh Garg
    RL + Foundation Models in Robotics. Faculty at Georgia Tech. Previously at NVIDIA

    Robotics data is expensive and slow to collect. A lot of video is available online, but it is not readily usable for robotics because it lacks action labels. AMPLIFY solves this problem by learning Actionless Motion Priors that unlock better sample efficiency, generalization, and scaling for robot learning.

    Our key insight is to factor the problem into two stages:
    The "what": predict the visual dynamics required to accomplish a task.
    The "how": map predicted motions to low-level actions.

    This decoupling enables remarkable generalizability: our policy can perform tasks where we have NO action data, only videos. We outperform SOTA behavior cloning (BC) baselines on this by 27x 🤯

    AMPLIFY is composed of three stages:
    1. Motion Tokenization: We track dense keypoint grids through videos and compress their trajectories into discrete motion tokens.
    2. Forward Dynamics: Given an image and a task description (e.g., "open the box"), we autoregressively predict a sequence of motion tokens representing how keypoints should move over the next second or so. This model can train on ANY text-labeled video data: robot demonstrations, human videos, YouTube videos.
    3. Inverse Dynamics: We decode predicted motion tokens into robot actions. This module learns the robot-specific mapping from desired motions to actions, and it can train on ANY robot interaction data, not just expert demonstrations (think off-task data, play data, or even random actions).

    So, does it actually work?

    Few-shot learning: Given just 2 action-annotated demos per task, AMPLIFY nearly doubles SOTA few-shot performance on LIBERO. This is possible because our Actionless Motion Priors provide a strong inductive bias that dramatically reduces the amount of robot data needed to train a policy.

    Cross-embodiment learning: We train the forward dynamics model on both human and robot videos, but the inverse model sees only robot actions. Result: 1.4× average improvement on real-world tasks. Our system successfully transfers motion information from human demonstrations to robot execution.

    And now my favorite result: AMPLIFY enables zero-shot task generalization. We train on LIBERO-90 tasks and evaluate on tasks where we’ve seen no actions, only pixels. While our best baseline achieves ~2% success, AMPLIFY reaches a 60% average success rate, outperforming SOTA behavior cloning baselines by 27x.

    This is a new way to train VLAs for robotics that doesn’t always start with large-scale teleoperation. Instead of collecting millions of robot demonstrations, we just need to teach robots how to read the language of motion. Then, every video becomes training data.

    Led by Jeremy Collins & Loránd Cheng in collaboration with Kunal Aneja, Albert Wilcox, and Benjamin Joffe at the College of Computing at Georgia Tech.

    Check out our paper and project page for more details:
    📄 Paper: https://lnkd.in/eZif-mB7
    🌐 Website: https://lnkd.in/ezXhzWGQ
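
The two-stage decoupling described above can be summarized as a skeletal pipeline: a forward-dynamics model, trainable on any text-labeled video, predicts discrete motion tokens, and a separately trained inverse-dynamics model decodes those tokens into robot actions. The module names, internals, and dimensions below are placeholder assumptions for illustration, not the authors' actual architecture.

```python
# Skeletal illustration of the AMPLIFY-style decoupling; module internals,
# names, and dimensions are placeholder assumptions, not the paper's code.
import torch
import torch.nn as nn

class ForwardDynamics(nn.Module):
    """Predicts a sequence of discrete motion tokens from image + task text features.
    Trainable on any text-labeled video (no robot actions required)."""
    def __init__(self, vocab_size=1024, dim=256, horizon=16):
        super().__init__()
        self.horizon = horizon
        self.fuse = nn.Linear(2 * dim, dim)       # placeholder fusion of image/text features
        self.token_head = nn.Linear(dim, vocab_size)

    def forward(self, image_feat, text_feat):
        ctx = self.fuse(torch.cat([image_feat, text_feat], dim=-1))
        # Placeholder: a real model would decode the token sequence autoregressively.
        logits = self.token_head(ctx).unsqueeze(1).repeat(1, self.horizon, 1)
        return logits.argmax(-1)                  # (batch, horizon) motion tokens

class InverseDynamics(nn.Module):
    """Decodes predicted motion tokens into robot actions.
    Trainable on any robot interaction data (even non-expert)."""
    def __init__(self, vocab_size=1024, dim=256, action_dim=7):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.action_head = nn.Linear(dim, action_dim)

    def forward(self, motion_tokens):
        return self.action_head(self.embed(motion_tokens))  # (batch, horizon, action_dim)

# Usage: train each stage on its own data source, then chain them at test time.
fwd, inv = ForwardDynamics(), InverseDynamics()
image_feat, text_feat = torch.randn(1, 256), torch.randn(1, 256)
actions = inv(fwd(image_feat, text_feat))         # (1, 16, 7) predicted action sequence
```
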
