Training AI Models With Limited Data

Explore top LinkedIn content from expert professionals.

• Jim Fan (Influencer)

    NVIDIA Director of AI & Distinguished Scientist. Co-Lead of Project GR00T (Humanoid Robotics) & GEAR Lab. Stanford Ph.D. OpenAI's first intern. Solving Physical AGI, one motor at a time.

    223,316 followers

Exciting updates on Project GR00T! We discovered a systematic way to scale up robot data, tackling the most painful pain point in robotics. The idea is simple: a human collects demonstrations on a real robot, and we multiply that data 1000x or more in simulation. Let’s break it down:

1. We use Apple Vision Pro (yes!!) to give the human operator first-person control of the humanoid. Vision Pro parses the human’s hand pose and retargets the motion to the robot hand, all in real time. From the human’s point of view, they are immersed in another body, like in Avatar. Teleoperation is slow and time-consuming, but we can afford to collect a small amount of data.

2. We use RoboCasa, a generative simulation framework, to multiply the demonstration data by varying the visual appearance and layout of the environment. In Jensen’s keynote video below, the humanoid is now placing the cup in hundreds of kitchens with a huge diversity of textures, furniture, and object placements. We only have 1 physical kitchen at the GEAR Lab in NVIDIA HQ, but we can conjure up infinite ones in simulation.

3. Finally, we apply MimicGen, a technique that multiplies the above data even further by varying the *motion* of the robot. MimicGen generates a vast number of new action trajectories based on the original human data and filters out failed ones (e.g. those that drop the cup) to form a much larger dataset.

To sum up: given 1 human trajectory with Vision Pro -> RoboCasa produces N (varying visuals) -> MimicGen further augments to NxM (varying motions). This is the way to trade compute for expensive human data via GPU-accelerated simulation.

A while ago, I mentioned that teleoperation is fundamentally not scalable, because we are always limited by 24 hrs/robot/day in the world of atoms. Our new GR00T synthetic data pipeline breaks this barrier in the world of bits. Scaling has been so much fun for LLMs, and it’s finally our turn to have fun in robotics!

We are creating tools to enable everyone in the ecosystem to scale up with us:
- RoboCasa: our generative simulation framework (Yuke Zhu). It’s fully open-source! Here you go: http://robocasa.ai
- MimicGen: our generative action framework (Ajay Mandlekar). The code is open-source for robot arms, but we will have another version for humanoids and 5-finger hands: https://lnkd.in/gsRArQXy
- We are building a state-of-the-art Apple Vision Pro -> humanoid robot "Avatar" stack. The open-source libraries from Xiaolong Wang’s group laid the foundation: https://lnkd.in/gUYye7yt
- Watch Jensen’s keynote yesterday. He cannot hide his excitement about Project GR00T and robot foundation models! https://lnkd.in/g3hZteCG

Finally, the GEAR lab is hiring! We want the best roboticists in the world to join us on this moon-landing mission to solve physical AGI: https://lnkd.in/gTancpNK
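To make the 1 -> N -> NxM multiplication concrete, here is a minimal Python sketch of the pipeline as described in the post. `randomize_scene`, `perturb_trajectory`, and `rollout_succeeds` are hypothetical stand-ins for RoboCasa- and MimicGen-style functionality, not NVIDIA's actual code.

```python
# Sketch of the GR00T-style data multiplication described above.
# All helper functions are hypothetical placeholders.

def multiply_demo(human_demo, n_visual=100, m_motion=10):
    dataset = []
    for i in range(n_visual):
        # RoboCasa-style step: vary textures, furniture, object layout
        scene = randomize_scene(human_demo.scene, seed=i)
        for j in range(m_motion):
            # MimicGen-style step: synthesize a new action trajectory
            traj = perturb_trajectory(human_demo.actions, scene, seed=j)
            # Keep only rollouts that succeed in simulation (e.g. cup not dropped)
            if rollout_succeeds(scene, traj):
                dataset.append((scene, traj))
    return dataset  # up to n_visual * m_motion synthetic demos from 1 human demo
```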

• Aishwarya Srinivasan (Influencer)
    598,968 followers

If you are wondering how RLHF works, and how we can teach large language models to be helpful, harmless, and honest, read along 👇

The key isn’t just in scaling up model size, it’s in aligning models with human intent. The InstructGPT paper (2022) introduced a three-step process called Reinforcement Learning from Human Feedback (RLHF). Even today, it remains the foundation of how we build instruction-following models like ChatGPT. Let me walk you through the workflow in plain terms, based on the now-famous diagram below 👇

𝟭. 𝗦𝘂𝗽𝗲𝗿𝘃𝗶𝘀𝗲𝗱 𝗙𝗶𝗻𝗲-𝗧𝘂𝗻𝗶𝗻𝗴 (𝗦𝗙𝗧)
→ Start by showing the model examples of great answers to real prompts, written by humans.
→ These examples teach the model how to respond: clear, direct, and grounded.
→ Think of this as training a junior writer by giving them a stack of perfect first drafts.
→ Even with a small dataset (~13k samples), this creates a solid instruction-following base.

𝟮. 𝗥𝗲𝘄𝗮𝗿𝗱 𝗠𝗼𝗱𝗲𝗹 (𝗥𝗠)
→ Next, we collect several outputs for the same prompt and ask humans to rank them from best to worst.
→ We then train a separate model (the reward model) to predict those rankings.
→ Now we’ve turned human preferences into a numerical score the model can optimize for.
→ This is the real magic: turning subjective feedback into something that can guide learning.

𝟯. 𝗥𝗲𝗶𝗻𝗳𝗼𝗿𝗰𝗲𝗺𝗲𝗻𝘁 𝗟𝗲𝗮𝗿𝗻𝗶𝗻𝗴 (𝗣𝗣𝗢)
→ Now the model generates new answers, gets scored by the reward model, and adjusts its behavior to maximize reward.
→ We use Proximal Policy Optimization (PPO), an RL algorithm that gently nudges the model in the right direction without making it forget what it already knows.
→ A “KL penalty” keeps it from straying too far from the SFT model, like a seatbelt keeping it grounded.

𝗪𝗵𝘆 𝗶𝘁 𝘄𝗼𝗿𝗸𝘀❓
✅ A small 1.3B model trained with this pipeline outperformed GPT-3 (175B) in human evaluations.
✅ It generalized to unseen domains with little extra supervision.
✅ And it required orders of magnitude less data than pre-training.

𝗪𝗵𝗮𝘁 𝘁𝗵𝗶𝘀 𝗺𝗲𝗮𝗻𝘀 𝗳𝗼𝗿 𝗯𝘂𝗶𝗹𝗱𝗲𝗿𝘀❓
→ Bigger isn’t always better. Better feedback leads to better behavior.
→ Pairwise comparisons are often more scalable than absolute ratings.
→ RLHF lets us teach models values, not just vocabulary.

If you’re building AI systems, aligning them with human preferences isn’t just a safety concern; it’s a product strategy.

---------
Share this with your network ♻️ Follow me (Aishwarya Srinivasan) for more AI insights.
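For step 2, the trick of "turning subjective feedback into a score" compresses into one pairwise ranking loss. Here is a minimal PyTorch sketch, where `reward_model` is a hypothetical scorer mapping a (prompt, response) pair to a scalar; the loss itself is the standard InstructGPT-style pairwise objective.

```python
import torch.nn.functional as F

# Pairwise ranking loss for reward-model training: push the score of the
# human-preferred response above the rejected one. `reward_model` is a
# hypothetical callable returning a scalar tensor per (prompt, response).

def reward_ranking_loss(reward_model, prompt, chosen, rejected):
    r_chosen = reward_model(prompt, chosen)      # score for the preferred answer
    r_rejected = reward_model(prompt, rejected)  # score for the worse answer
    # -log(sigmoid(r_chosen - r_rejected)): minimized when chosen outranks rejected
    return -F.logsigmoid(r_chosen - r_rejected).mean()
```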

• Dana Aubakirova

    ML Research Engineer | SmolVLA Lead 🤗 @ Hugging Face

    6,852 followers

🚀𝐖𝐞 𝐚𝐫𝐞 𝐢𝐧𝐭𝐫𝐨𝐝𝐮𝐜𝐢𝐧𝐠 𝐒𝐦𝐨𝐥𝐕𝐋𝐀-𝟒𝟓𝟎𝐌, 𝐚𝐧 𝐨𝐩𝐞𝐧-𝐬𝐨𝐮𝐫𝐜𝐞 𝐕𝐢𝐬𝐢𝐨𝐧-𝐋𝐚𝐧𝐠𝐮𝐚𝐠𝐞-𝐀𝐜𝐭𝐢𝐨𝐧 𝐦𝐨𝐝𝐞𝐥 𝐟𝐨𝐫 𝐫𝐨𝐛𝐨𝐭𝐢𝐜𝐬!

SmolVLA achieves best-in-class performance and inference speed, and the best part? It’s trained entirely on open-source datasets from the 🤖 LeRobot project hosted on the Hugging Face Hub.

🔍 Why is SmolVLA so good? It turns out that pretraining on a large, diverse, and noisy collection of real-world community robotics data leads to better generalization and control. We saw a 26% boost in task success rate simply from adding community-dataset pretraining!

⚡ Why is SmolVLA so fast?
1. We halved the size of SmolVLM and extract intermediate-layer representations
2. We introduced interleaved cross- and self-attention layers in the action expert
3. We enabled asynchronous inference, so the robot acts and reacts simultaneously

💡 Unlike most academic datasets, these community-contributed datasets are naturally diverse:
✅ Multiple robots, camera angles, and manipulation tasks
✅ Real-world messiness and complexity
✅ Crowd-sourced and community-cleaned, using Qwen2.5-VL for high-quality task descriptions

🌍 SmolVLA is a step toward making robotics research more affordable, reproducible, and collaborative.

📖 Want to dive deeper? Check out our blog post & start using it today: https://lnkd.in/e3Gmy8gT

Huge thanks to the team who made this possible: Mustafa Shukor, Francesco Capuano, Remi Cadene, and the entire LeRobot team; the amazing HF team: Andrés Marafioti, Merve Noyan, Aritra Roy Gosthipaty, Pedro Cuenca, Loubna Ben Allal, Thomas Wolf; and the amazing contributors to the LeRobot community: Ville Kuosmanen, Alexandre Chapin, Marina Barannikov, and more!
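The asynchronous-inference idea (point 3) is worth a sketch: the robot executes the current action chunk while the policy is already predicting the next one, so control never stalls on model latency. This is a minimal illustration under stated assumptions, not the SmolVLA implementation; `policy` and `robot` are hypothetical objects.

```python
import threading
import queue

chunks = queue.Queue(maxsize=1)  # holds at most one ready action chunk

def inference_loop(policy, robot):
    while True:
        obs = robot.latest_observation()
        # Slow model call runs here, off the control path
        chunks.put(policy.predict_chunk(obs))

def control_loop(robot):
    while True:
        for action in chunks.get():  # consume the ready chunk action by action
            robot.apply(action)

# Usage (with real `policy` / `robot` objects):
# threading.Thread(target=inference_loop, args=(policy, robot), daemon=True).start()
# control_loop(robot)
```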

• Vin Vashishta (Influencer)

    AI Strategist | Monetizing Data & AI For The Global 2K Since 2012 | 3X Founder | Best-Selling Author

    205,701 followers

The majority of AI and agentic workloads will run locally for two main reasons: cost and personalization.

Businesses can save thousands every month per developer by connecting Ollama to VS Code or your favorite JetBrains IDE using Continue. The percentage of requests that truly require the biggest Claude or Gemini models is tiny. Run them head-to-head for a week and you’ll switch to local LLMs for almost everything.

Local LLMs run faster, and I feel like that helps me stay in the flow better. Cloud instances take longer to respond, and those lags break focus. It’s the same for most generative AI workloads, from marketing content creation to meeting summarization. The majority of tasks can be done just as well by smaller models running locally, and often local LLMs have significant workflow advantages.

They also allow for personalization. I can keep my own coding style guide locally and fill it with all the things that make my code uniquely mine. Those little nuances are built on decades of experience. The only way I’m sharing them with an LLM is if I know they’re not going to be used to replace me. That’s a common sentiment I have heard over the last three years implementing AI in enterprises. Experts aren’t revealing their secret sauce to AI. The lack of trust leads to low adoption rates. AI doesn’t perform as well without the expert context and nuanced guidance.

The more time I spend building agents, the more benefits I see to breaking workflows down more granularly (into several detailed subtasks). It gives me more control, explainability, and reliability. The more granular the task, the less difference there is between a large cloud model and a smaller local model.

Big AI like ChatGPT will still have plenty of work to do, but I don’t see a future where the majority of workloads are done by Anthropic, OpenAI, or Google’s largest models. Local AI and smaller models have way too many benefits to be ignored for much longer.
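If you want to run the head-to-head test Vin suggests, the quickest probe is Ollama's local REST API. This assumes Ollama is installed and serving on its default port, with a model already pulled (e.g. `ollama pull llama3`); the model name is an example.

```python
import requests

# Send one prompt to a locally running Ollama server and print the reply.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3",
        "prompt": "Write a Python function that deduplicates a list.",
        "stream": False,  # return one JSON object instead of a token stream
    },
)
print(resp.json()["response"])
```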

• Andreas Sjostrom (Influencer)

LinkedIn Top Voice | AI Agents | Robotics | Vice President at Capgemini's Applied Innovation Exchange | Author | Speaker | San Francisco | Palo Alto

    13,640 followers

LLMs aren’t just pattern matchers... they learn on the fly.

A new research paper from Google Research sheds light on something many of us observe daily when deploying LLMs: models adapt to new tasks using just the prompt, with no retraining. But what’s happening under the hood?

The paper shows that large language models simulate a kind of internal, temporary fine-tuning at inference time. The structure of the transformer, specifically the attention + MLP layers, allows the model to "absorb" context from the prompt and adjust its internal behavior as if it had learned. This isn’t just prompting as retrieval. It’s prompting as implicit learning.

Why this matters for enterprise AI, with real examples:

⚡ Public Sector (Citizen Services): Instead of retraining a chatbot for every agency, embed 3–5 case-specific examples in the prompt (e.g. school transfers, public works complaints). The same LLM now adapts to each citizen's need, instantly.

⚡ Telecom & Energy: Copilots for field engineers can suggest resolutions based on prior examples embedded in the prompt; no model updates, just context-aware responses.

⚡ Financial Services: Advisors using LLMs for client summaries can embed three recent interactions in the prompt. Each response is now hyper-personalized, without touching the model weights.

⚡ Manufacturing & R&D: Instead of retraining on every new machine log or test-result format, use the prompt to "teach" the model the pattern. The model adapts on the fly.

Why is this paper more than "prompting 101"? We already knew prompting works, but we didn't have a precise account of why. This paper, "Learning without training: The implicit dynamics of in-context learning" (Dherin et al., 2025), gives us that why. It mathematically proves that prompting a model with examples performs rank-1 implicit updates to the MLP layer, mimicking gradient descent, without retraining or changing any parameters. Prior research showed this only for toy models; this paper shows it holds for realistic transformer architectures, the kind we actually use in production.

The strategic takeaway: this strengthens the case for LLMs in enterprise environments. It shows that:
* Prompting isn't fragile; it's a valid mechanism for task adaptation.
* You don’t need to fine-tune models for every new use case.
* With the right orchestration and context injection, a single foundation model can power dozens of dynamic, domain-specific tasks.

LLMs are not static tools. They’re dynamic, runtime-adaptive systems, and that’s a major reason they’re here to stay.

📎 Link to the paper: http://bit.ly/4mbdE0L
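The core identity behind the rank-1 claim can be sketched in one line. The notation here is schematic (mine, not verbatim from Dherin et al.): if the prompt's context shifts an MLP input x by Δx, then the unchanged weight matrix W acting on the shifted input is algebraically identical to a rank-1-updated weight matrix acting on the original input.

```latex
% W applied to a context-shifted input equals a rank-1-updated W applied
% to the original input -- the "implicit fine-tuning" effect:
\[
  W\,(x + \Delta x)
  \;=\;
  \Bigl( W + \underbrace{(W \Delta x)\,\frac{x^{\top}}{\lVert x \rVert^{2}}}_{\text{rank-1 update}} \Bigr)\, x
\]
% The update term depends on the prompt, so the model behaves as if it had
% taken a small gradient-like fine-tuning step, with no parameter change.
```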

• Muazma Zahid

    Data and AI Leader | Advisor | Speaker

    17,649 followers

Happy Friday! This week in #learnwithmz, I’m building on my recent post about running LLMs/SLMs locally: https://lnkd.in/gpz3kXhD

Since sharing that, the landscape has rapidly evolved: local LLM tooling is more capable and deployment-ready than ever. In fact, at a conference last week, I was asked twice about private model hosting. Clearly, the demand is real. So let's dive deeper into the frameworks making local inference faster, easier, and more scalable.

Ollama (Most User-Friendly)
Run models like llama3, phi-3, and deepseek with one command. https://ollama.com/

llama.cpp (Lightweight & C++-Based)
Fast inference engine for quantized models. https://lnkd.in/ghxrSnY3

MLC LLM (Cross-Platform Compiler Stack)
Runs LLMs on iOS, Android, and the Web via TVM. https://mlc.ai/mlc-llm/

ONNX Runtime (Enterprise-Ready)
Cross-platform, hardware-accelerated inference from Microsoft. https://onnxruntime.ai/

LocalAI (OpenAI API-Compatible Local Inference)
Self-hosted server with model conversion, Whisper integration, and multi-backend support. https://lnkd.in/gi4N8v5H

LM Studio (Best UI for Desktop)
A polished desktop interface to chat with local models. https://lmstudio.ai/

Qualcomm AI Hub (For Snapdragon-Powered Devices)
Deploy LLMs optimized for mobile and edge hardware. https://lnkd.in/geDVwRb7

LiteRT (short for Lite Runtime, formerly TensorFlow Lite)
Still solid for embedded and mobile deployments. https://lnkd.in/g2QGSt9H

Core ML (Apple)
Optimized for deploying LLMs on Apple devices using Apple Silicon + the Neural Engine. https://lnkd.in/gBvkj_CP

MediaPipe (Google)
Optimized for LLM inference on Android devices. https://lnkd.in/gZJzTcrq

Nexa AI SDK (Nexa AI)
Cross-platform SDK for integrating LLMs directly into mobile apps. https://lnkd.in/gaVwv7-5

Why do local LLMs matter?
- Edge AI and privacy-first features are rising
- Cost, latency, and sovereignty concerns are real
- Mobile + desktop + web apps need on-device capabilities
- Developers + PMs: this is your edge. Building products with LLMs doesn't always need the cloud. Start testing local-first workflows.

What stack are you using or exploring?

#AI #LLMs #EdgeAI #OnDeviceAI #AIInfra #ProductManagement #Privacy #AItools #learnwithmz
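Several of the servers above converge on one integration pattern: an OpenAI-compatible HTTP endpoint, so the standard OpenAI client works against localhost. A minimal sketch using LocalAI's compatible mode (the port and model name are assumptions; LM Studio and Ollama expose similar `/v1` endpoints):

```python
from openai import OpenAI

# Point the standard OpenAI client at a local, OpenAI-compatible server.
# No real API key is needed for a self-hosted endpoint.
client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

reply = client.chat.completions.create(
    model="phi-3",  # whatever model your local server has loaded
    messages=[{"role": "user", "content": "Why does on-device inference matter?"}],
)
print(reply.choices[0].message.content)
```

The design benefit: any app written against the OpenAI API can be repointed at local hardware by changing one base URL.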

• Paul Iusztin

    Senior AI Engineer • Founder @ Decoding AI • Author @ LLM Engineer’s Handbook ~ I ship AI products and teach you about the process.

    86,547 followers

A blueprint for designing production LLM systems: from notebooks to production.

As an example, we will fine-tune an LLM and do RAG on social media data, but it can easily be adapted to any data. We have 4 core components, following the feature/training/inference (FTI) pipeline architecture.

𝟭. 𝗗𝗮𝘁𝗮 𝗖𝗼𝗹𝗹𝗲𝗰𝘁𝗶𝗼𝗻 𝗣𝗶𝗽𝗲𝗹𝗶𝗻𝗲
It is based on an ETL that:
- crawls your data from blogs and socials
- standardizes it
- loads it into a NoSQL database (e.g., MongoDB)
Since we work with text data, which is naturally unstructured, and no analytics are required, a NoSQL database fits like a glove.

𝟮. 𝗙𝗲𝗮𝘁𝘂𝗿𝗲 𝗣𝗶𝗽𝗲𝗹𝗶𝗻𝗲
It takes raw articles, posts, and code data points from the data warehouse, processes them, and loads them into a logical feature store. Let's focus on the logical feature store. As with any RAG-based system, a vector database is one of the central pieces of the infrastructure, so we use it directly as the logical feature store. Unfortunately, a vector database doesn't offer the concept of a training dataset. To implement this, we wrap the retrieved data into a versioned, tracked, and shareable MLOps artifact.
To conclude:
- the training pipeline uses the instruct datasets as artifacts (offline)
- the inference pipeline queries the vector DB for RAG (online)

𝟯. 𝗧𝗿𝗮𝗶𝗻𝗶𝗻𝗴 𝗣𝗶𝗽𝗲𝗹𝗶𝗻𝗲
It consumes instruct datasets from the feature store, fine-tunes an LLM with them, and stores the tuned LLM weights in a model registry. More concretely, when a new instruct dataset is available in the logical feature store, we trigger the training pipeline, consume the artifact, and fine-tune the LLM. We run multiple experiments to find the best model and hyperparameters, using an experiment tracker to compare them. After the experimentation phase, we store and reuse the best hyperparameters for continuous training (CT). The LLM candidate then goes through a testing pipeline for detailed analysis. If it passes, the model is tagged as accepted and deployed to production. Our modular design lets us leverage an ML orchestrator to schedule and trigger the pipelines for CT.

𝟰. 𝗜𝗻𝗳𝗲𝗿𝗲𝗻𝗰𝗲 𝗣𝗶𝗽𝗲𝗹𝗶𝗻𝗲
It is connected to the model registry and the logical feature store: from the model registry it loads the fine-tuned LLM, and from the logical feature store it accesses the vector DB for RAG. It receives client requests as queries through a REST API and uses the fine-tuned LLM plus the vector DB to answer them with RAG. Everything is sent to a prompt monitoring system to analyze, debug, and understand the system.

#artificialintelligence #machinelearning #mlops
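As a reading aid, here is a minimal Python skeleton of the four-pipeline split described above. All class, store, and helper names (`nosql_db`, `vector_db`, `artifact_store`, `model_registry`, and the functions they call) are illustrative stand-ins, not any specific library's API.

```python
# Skeleton of the FTI architecture: each pipeline only talks to the others
# through shared storage (DB, feature store, artifact store, model registry).

class DataCollectionPipeline:
    def run(self, sources):
        docs = [standardize(crawl(s)) for s in sources]  # ETL: crawl + standardize
        nosql_db.insert_many(docs)                       # e.g. MongoDB

class FeaturePipeline:
    def run(self, raw_docs):
        chunks = [clean_and_chunk(d) for d in raw_docs]
        vector_db.upsert(embed(chunks))                  # online path: RAG index
        artifact_store.save(                             # offline path: versioned
            "instruct_dataset", build_instruct_pairs(chunks))  # training artifact

class TrainingPipeline:
    def run(self):
        dataset = artifact_store.load("instruct_dataset")
        model = fine_tune(base_llm, dataset)             # experiment-tracked runs
        model_registry.push(model, tag="candidate")      # "accepted" if tests pass

class InferencePipeline:
    def answer(self, query):
        model = model_registry.pull(tag="accepted")
        context = vector_db.search(embed([query]))       # RAG retrieval
        return model.generate(prompt_with_context(query, context))
```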

• Akshet Patel 🤖

    Robotics Engineer | Creator

    45,427 followers

1. Scan
2. Demo
3. Track
4. Render
5. Train models
6. Deploy

What if robots could learn new tasks from just a smartphone scan and a single human demonstration, without needing physical robots or complex simulations?

[⚡Join 2400+ Robotics enthusiasts - https://lnkd.in/dYxB9iCh]

A paper by Justin Yu, Letian (Max) Fu, Huang Huang, Karim El-Refai, Rares Andrei Ambrus, Richard Cheng, Muhammad Zubair Irshad, and Ken Goldberg from the University of California, Berkeley and the Toyota Research Institute introduces a scalable approach for generating robot training data without dynamics simulation or robot hardware.

"Real2Render2Real: Scaling Robot Data Without Dynamics Simulation or Robot Hardware"

• Utilises a smartphone-captured object scan and a single human demonstration video as inputs
• Reconstructs detailed 3D object geometry and tracks 6-DoF object motion using 3D Gaussian Splatting
• Synthesises thousands of high-fidelity, robot-agnostic demonstrations through photorealistic rendering and inverse kinematics
• Generates data compatible with vision-language-action models and imitation learning policies
• Demonstrates that models trained on this data can match the performance of those trained on 150 human teleoperation demonstrations
• Achieves a 27× increase in data-generation throughput compared to traditional methods

This approach enables scalable robot learning by decoupling data generation from physical robot constraints. It opens avenues for democratising robot training data collection, allowing broader participation using accessible tools.

If robots can be trained effectively without physical hardware or simulations, how will this transform the future of robotics?

Paper: https://lnkd.in/emjzKAyW
Project Page: https://lnkd.in/evV6UkxF

#RobotLearning #DataGeneration #ImitationLearning #RoboticsResearch #ICRA2025
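A schematic sketch of the six-step recipe as summarized above; every function name here is an illustrative stand-in, not the authors' code.

```python
# Real2Render2Real, schematically: one scan + one demo video in,
# thousands of rendered robot demonstrations out. No physics simulation,
# no robot hardware. All helpers are hypothetical placeholders.

def real2render2real(object_scan, demo_video, robots, n_variants=1000):
    mesh = reconstruct_3d(object_scan)            # 3D Gaussian Splatting reconstruction
    object_traj = track_6dof(demo_video, mesh)    # 6-DoF object motion from one demo
    dataset = []
    for robot in robots:                          # robot-agnostic: re-render per embodiment
        for seed in range(n_variants):
            joints = inverse_kinematics(robot, randomize(object_traj, seed))
            frames = photorealistic_render(mesh, robot, joints, seed)
            dataset.append((frames, joints))      # ready for VLA / imitation learning
    return dataset
```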

• Mary Newhauser

    Machine Learning Engineer

    24,649 followers

Fine-tuning generative LLMs used to be a drag. Then it got efficient.

Parameter-Efficient Fine-Tuning (PEFT) has played a huge role in this. PEFT techniques allow you to train models for your specific needs without updating or retraining the entire model from scratch.

So what are the main benefits of PEFT?
✅ Lower computational cost: PEFT approaches make fine-tuning feasible across a wide variety of hardware limitations and requirements.
✅ Better memory usage: all pretrained parameters are frozen, and only the new adapter weights are trained.
✅ Avoids catastrophic forgetting: freezing the pretrained parameters means existing knowledge is not lost while the new weights are updated during fine-tuning.

Here are 3 examples of common PEFT techniques:
• LoRA (Low-Rank Adaptation): Freezes the original model weights and injects low-rank trainable matrices into each layer of the Transformer architecture.
• Prefix Tuning: Optimizes a sequence of continuous task-specific vectors (the "prefix") prepended to the input, while keeping the language model parameters frozen.
• Prompt Tuning: Similar to prefix tuning, but optimizes continuous prompts added to the input text to guide the model towards the desired task.

Hugging Face's 🤗 PEFT library is the best open-source resource out there for all things PEFT! Their developer guide on the docs page also has a great overview of the various techniques.

👨💻 Repo: https://lnkd.in/gcEtJGVu
📓 Developer guide: https://lnkd.in/gq2NZk4B
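Here is what LoRA looks like with the 🤗 PEFT library mentioned above, as a minimal sketch: the base model ("gpt2") and `target_modules` are illustrative choices that you would swap for your own model's attention layers.

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("gpt2")

config = LoraConfig(
    r=8,                        # rank of the trainable low-rank matrices
    lora_alpha=16,              # scaling factor applied to the LoRA updates
    target_modules=["c_attn"],  # which layers get adapters (model-specific)
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

# Freezes all base weights and injects the small trainable adapters.
model = get_peft_model(model, config)
model.print_trainable_parameters()  # typically well under 1% of all parameters
```

That last line is the whole pitch: the optimizer state and gradients only cover the adapter weights, which is where the memory savings come from.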

• Adithya Murali

    Senior Research Scientist at NVIDIA | MIT TR35, Prev CMU PhD, Berkeley AI Research

    3,008 followers

I’m super excited to release a multi-year project we have been cooking at NVIDIA Robotics.

Grasping is a foundational challenge in robotics 🤖, whether for industrial picking or general-purpose humanoids. VLA + real data collection is all the rage now, but it is expensive and scales poorly for this task: for every new embodiment and/or scene, we would have to recollect the dataset to get the best performance.

Key idea: since grasping is a well-defined task in physics simulation, why can’t we just scale synthetic data generation and train a GenAI model for grasping? By embracing modularity and standardized grasp formats, we can make this a turnkey technology that works zero-shot in multiple settings.

Introducing… 🚀 GraspGen: A Diffusion-Based Framework for 6-DOF Grasping

GraspGen is a modular framework for diffusion-based 6-DOF grasp generation that scales across embodiment types, observability conditions, clutter, and task complexity.

Key features:
✅ Multi-embodiment support: suction, antipodal pinch, and underactuated pinch grippers
✅ Generalization to both partial and complete 3D point clouds
✅ Generalization to both single objects and cluttered scenes
✅ Modular design that relies on other robotics packages and foundation models (SAM2, cuRobo, FoundationStereo, FoundationPose), allowing GraspGen to focus on only one thing: grasp generation
✅ Training recipe: the grasp discriminator is trained with On-Generator data from the diffusion model, so it learns to correct the diffusion generator's mistakes
✅ Real-time performance (~20 Hz) before any GPU acceleration; low memory footprint

📊 Results:
• SOTA on the FetchBench [Han et al., CoRL 2024] benchmark
• Zero-shot sim-to-real transfer on unknown objects and cluttered scenes
• Dataset of 53M simulated grasps across 8K objects from Objaverse

We're also releasing:
🔹 Simulation-based grasp data generation workflows
🔹 Standardized formats and gripper definitions
🔹 Full training infrastructure

📄 arXiv: https://lnkd.in/gaYmcfz4
🌐 Website: https://lnkd.in/gGiKRCMX
💻 Code: https://lnkd.in/gYR77bEh

A huge thank you to everyone involved in this journey; excited to hear the feedback from the community! Joint work with Clemens Eppner, Balakumar Sundaralingam, Yu-Wei Chao, Mark T. Carlson, Jun Yamada and other collaborators. Many thanks to Yichao Pan, Shri Sundaram, Spencer Huang, Buck Babich, Amit Goel for product management and feedback.

#robotics #grasping #physicalAI #simtoreal
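The On-Generator training recipe is the most transferable idea here, so a hedged sketch may help: the discriminator is trained on grasps sampled from the diffusion model itself, labeled by simulated outcomes, so it learns the generator's specific failure modes. All names below are illustrative stand-ins, not the released GraspGen API.

```python
# On-Generator discriminator training, schematically: score what the
# generator actually produces, not an independent dataset, so filtering
# targets the generator's own mistakes.

def train_discriminator(diffusion_model, discriminator, point_clouds, sim):
    for cloud in point_clouds:
        grasps = diffusion_model.sample(cloud, n=64)             # 6-DOF grasp poses
        labels = [sim.grasp_succeeds(cloud, g) for g in grasps]  # physics-sim labels
        discriminator.fit_step(cloud, grasps, labels)            # learn to rank/filter

def generate_filtered_grasps(diffusion_model, discriminator, cloud, keep=10):
    grasps = diffusion_model.sample(cloud, n=256)
    ranked = sorted(grasps, key=lambda g: discriminator.score(cloud, g), reverse=True)
    return ranked[:keep]  # only the highest-scoring grasps reach the robot
```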
