Data Quality for AI

Explore top LinkedIn content from expert professionals.

  • View profile for Jim Fan
    Jim Fan Jim Fan is an Influencer

    NVIDIA Director of AI & Distinguished Scientist. Co-Lead of Project GR00T (Humanoid Robotics) & GEAR Lab. Stanford Ph.D. OpenAI's first intern. Solving Physical AGI, one motor at a time.

    223,316 followers

    Exciting updates on Project GR00T! We discover a systematic way to scale up robot data, tackling the most painful pain point in robotics. The idea is simple: human collects demonstration on a real robot, and we multiply that data 1000x or more in simulation. Let’s break it down: 1. We use Apple Vision Pro (yes!!) to give the human operator first person control of the humanoid. Vision Pro parses human hand pose and retargets the motion to the robot hand, all in real time. From the human’s point of view, they are immersed in another body like the Avatar. Teleoperation is slow and time-consuming, but we can afford to collect a small amount of data.  2. We use RoboCasa, a generative simulation framework, to multiply the demonstration data by varying the visual appearance and layout of the environment. In Jensen’s keynote video below, the humanoid is now placing the cup in hundreds of kitchens with a huge diversity of textures, furniture, and object placement. We only have 1 physical kitchen at the GEAR Lab in NVIDIA HQ, but we can conjure up infinite ones in simulation. 3. Finally, we apply MimicGen, a technique to multiply the above data even more by varying the *motion* of the robot. MimicGen generates vast number of new action trajectories based on the original human data, and filters out failed ones (e.g. those that drop the cup) to form a much larger dataset. To sum up, given 1 human trajectory with Vision Pro  -> RoboCasa produces N (varying visuals)  -> MimicGen further augments to NxM (varying motions). This is the way to trade compute for expensive human data by GPU-accelerated simulation. A while ago, I mentioned that teleoperation is fundamentally not scalable, because we are always limited by 24 hrs/robot/day in the world of atoms. Our new GR00T synthetic data pipeline breaks this barrier in the world of bits. Scaling has been so much fun for LLMs, and it's finally our turn to have fun in robotics! We are creating tools to enable everyone in the ecosystem to scale up with us: - RoboCasa: our generative simulation framework (Yuke Zhu). It's fully open-source! Here you go: http://robocasa.ai - MimicGen: our generative action framework (Ajay Mandlekar). The code is open-source for robot arms, but we will have another version for humanoid and 5-finger hands: https://lnkd.in/gsRArQXy - We are building a state-of-the-art Apple Vision Pro -> humanoid robot "Avatar" stack. Xiaolong Wang group’s open-source libraries laid the foundation: https://lnkd.in/gUYye7yt - Watch Jensen's keynote yesterday. He cannot hide his excitement about Project GR00T and robot foundation models! https://lnkd.in/g3hZteCG Finally, GEAR lab is hiring! We want the best roboticists in the world to join us on this moon-landing mission to solve physical AGI: https://lnkd.in/gTancpNK

  • View profile for Pooja Jain
    Pooja Jain Pooja Jain is an Influencer

    Storyteller | Lead Data Engineer@Wavicle| Linkedin Top Voice 2025,2024 | Globant | Linkedin Learning Instructor | 2xGCP & AWS Certified | LICAP’2022

    183,177 followers

    You wouldn't cook a meal with rotten ingredients, right? Yet, businesses pump messy data into AI models daily— ..and wonder why their insights taste off. Without quality, even the most advanced systems churn unreliable insights. Let’s talk simple — how do we make sure our “ingredients” stay fresh? Start Smart → Know what matters: Identify your critical data (customer IDs, revenue, transactions) → Pick your battles: Monitor high-impact tables first, not everything at once Build the Guardrails: → Set clear rules: Is data arriving on time? Is anything missing? Are formats consistent? → Automate checks: Embed validations in your pipelines (Airflow, Prefect) to catch issues before they spread → Test in slices: Check daily or weekly chunks first—spot problems early, fix them fast Stay Alert (But Not Overwhelmed): → Tune your alarms: Too many false alerts = team burnout. Adjust thresholds to match real patterns → Build dashboards: Visual KPIs help everyone see what's healthy and what's breaking Fix It Right: → Dig into logs when things break—schema changes? Missing files? → Refresh everything downstream: Fix the source, then update dependent dashboards and reports → Validate your fix: Rerun checks, confirm KPIs improve before moving on Now, in the era of AI, data quality deserves even sharper focus. Models amplify what data feeds them — they can’t fix your bad ingredients. → Garbage in = hallucinations out. LLMs amplify bad data exponentially → Bias detection starts with clean, representative datasets → Automate quality checks using AI itself—anomaly detection, schema drift monitoring → Version your data like code: Track lineage, changes, and rollback when needed Here's the amazing step-by-step guide curated by DQOps - Piotr Czarnas to deep dive in the fundamentals of Data Quality. Clean data isn’t a process — it’s a discipline. 💬 What's your biggest data quality challenge right now?

  • View profile for Bob Carver

    CEO Cybersecurity Boardroom ™ | CISSP, CISM, M.S. Top Cybersecurity Voice

    51,191 followers

    Russian propaganda may be influencing certain answers from AI chatbots, including OpenAI’s ChatGPT and Meta’s Meta AI, according to a new report. NewsGuard, a company that develops rating systems for news and information websites, claims to have found evidence that a Moscow-based network named “Pravda” is publishing false claims to affect the responses of AI models. Pravda has flooded search results and web crawlers with pro-Russian falsehoods, publishing 3.6 million misleading articles in 2024 alone, per NewsGuard, citing statistics from the nonprofit American Sunlight Project. NewsGuard’s analysis, which probed 10 leading chatbots, found that the chatbots collectively repeated false Russian disinformation narratives, like that the U.S. operates secret bioweapons labs in Ukraine, 33% of the time. According to NewsGuard, the Pravda network’s effectiveness in infiltrating AI chatbot outputs can be largely attributed to its techniques, which involve search engine optimization strategies to boost the visibility of its content. This may prove to be an intractable problem for chatbots heavily reliant on web engines. #AI #Russia #Pravda #influenceoperations #datapoisoning

  • View profile for Daniel Gural

    Machine Learning and DevRel @ Voxel51

    6,268 followers

    🔥 Introducing Physical AI workbench, how Voxel51 and NVIDIA are solving the data pipeline bottleneck that’s blocking Physical AI at scale. 🔥 As autonomous vehicles, humanoid and industrial robots move from lab to deployment, teams need high-fidelity simulations to deploy with confidence. However, over 50% of Physical AI simulations fail because of bad input data, slowing teams and wasting millions in compute costs. ⚠️ The problem: Physical AI systems process petabytes of multimodal sensor data—LiDAR, radar, cameras, IMU. Even a small calibration or timing error between LiDAR, radar, and camera sensors can ripple downstream, resulting in inaccurate neural reconstructions and wasted compute dollars. ✅ The solution: Physical AI Workbench integrates with NVIDIA Omniverse NuRec and NVIDIA Cosmos, giving teams a standardized way to audit, enrich, and prepare multimodal data for simulation and neural reconstruction, ensuring every test starts with trusted data. How it works: 🔍 Catch and fix errors automatically by auditing sensor data across 75+ critical checkpoints 🔧 Transform raw sensor streams into structured, searchable data with AI data enrichment ⚡ Trigger neural reconstructions and generate synthetic scene variations 📈 Scale simulation workflows with complete traceability and speed Automated QA checks enable teams to catch data quality issues before wasting valuable compute resources, prevent downstream failures, and increase simulation ROI. Walking the floor at #NVIDIAGTC DC last week, the signal for Physical AI is stronger than ever. If you're building AVs NVIDIA DRIVE, humanoid robots NVIDIA Robotics, manufacturing automation—this is the data engine you need between your sensors and your simulations. 🔗 See how it works here: https://lnkd.in/e2zgyx7K

  • View profile for Golok Kumar Simli

    Visionary Tech Leader | President - Technology & Innovation | Expert in Digital Public Infrastructure, AI Strategy & eGovernance | Advancing Global Digital Transformation

    5,676 followers

    Govern Data for excellence in Governance and Business Objectives: Organisations be it private or public need to deploy a Data Governance framework to capture, process & store data aligned with People, Execution Model, Data Mangement Rules and Tools & Technologies. It is also important for all stakeholders to abide by the rules of engagements (compliances, regulations and law of the land say DPDP) be it Data Principals, Data Fiduciary to protect data in its possession or control including processing by itself or on its behalf a data processor and User of the Data. Leveraging AI would help achieve the above objectives and organisations may use AI in the following key areas - To leverage AI effectively in data governance, consider the following steps: 1. Data Categorisation- Use AI algorithms to automatically identify and classify data based on its sensitivity, applicability, importance, and regulatory requirements. This would help in prioritizing data protection efforts. 2. Data Quality Assessment - Veracity and Noise in the Data yields to catastrophic. Employ AI techniques to assess data quality by detecting anomalies/noises, inconsistencies, and errors. This helps in maintaining high-quality data for better decision-making and analysis. 3. Data Lineage Tracking - Implement AI-driven tools to track the lineage of data, including its origin, the journey, transformations, and usage throughout its lifecycle. This ensures data traceability and transparency. 4. Access Control and Authorization : Utilize AI-driven access control mechanisms to manage user permissions and enforce security policies based on data sensitivity and user roles for effective execution and adherence. 5. Regulatory Compliance - Leverage AI to automate compliance monitoring, orchestrating resources and reporting processes, ensuring adherence to regulations such as recently enabled DPDP and others like GDPR, HIPAA, and CCPA. 6. Data Stewardship - Implement AI-powered data stewardship platforms to facilitate collaboration among data stewards, automate data governance workflows, and resolve data-related issues efficiently for better insights & informed decisions. 7. Predictive Analytics - Use AI and machine learning models to analyze data trends, models and patterns, identify potential risks, and anticipate future data governance challenges. 8. Natural Language Processing (NLP) - Employ NLP techniques to analyze unstructured data such as documents, emails, and social media posts for insights. 9. Continuous Improvement - Continuous monitoring and refining of AI models and algorithms to adapt to evolving data governance requirements, business objectives and data landscape changes. By incorporating AI as above for data governance, organizations can enhance data management capabilities, ensure regulatory compliance, and derive actionable insights from their data assets for business and service excellence. #datagovernance #technologymanagement #innovation

  • View profile for José Manuel de la Chica
    José Manuel de la Chica José Manuel de la Chica is an Influencer

    Global Head of Santander AI Lab | Leading frontier AI with responsibility. Shaping the future with clarity and purpose.

    15,021 followers

    AI meet Consensus? A New Consensus Framework that Makes Models More Reliable and Collaborative. This paper addresses the challenge of ensuring the reliability of LLMs in high-stakes domains such as healthcare, law, and finance. Traditional methods often depend on external knowledge bases or human oversight, which can limit scalability. To overcome this, the author proposes a novel framework that repurposes ensemble methods for content validation through model consensus. Key Findings: Improved Precision: In tests involving 78 complex cases requiring factual accuracy and causal consistency, the framework increased precision from 73.1% to 93.9% with two models (95% CI: 83.5%-97.9%) and to 95.6% with three models (95% CI: 85.2%-98.8%). Inter-Model Agreement: Statistical analysis showed strong inter-model agreement (κ > 0.76), indicating that while models often concurred, their independent errors could be identified through disagreements. Scalability: The framework offers a clear pathway to further enhance precision with additional validators and refinements, suggesting its potential for scalable deployment. Relevance to Multi-Agent and Collaborative AI Architectures: This framework is particularly pertinent to multi-agent systems and collaborative AI architectures for several reasons: Enhanced Reliability: By leveraging consensus among multiple models, the system can achieve higher reliability, which is crucial in collaborative environments where decisions are based on aggregated outputs. Error Detection: The ability to detect errors through model disagreement allows for more robust systems where agents can cross-verify information, reducing the likelihood of propagating incorrect data. Scalability Without Human Oversight: The framework's design minimizes the need for human intervention, enabling scalable multi-agent systems capable of operating autonomously in complex, high-stakes domains. In summary, the proposed ensemble validation framework offers a promising approach to improving the reliability of LLMs, with significant implications for the development of dependable multi-agent AI systems. https://lnkd.in/d8is44jk

  • View profile for Kuldeep Singh Sidhu
    Kuldeep Singh Sidhu Kuldeep Singh Sidhu is an Influencer

    Senior Data Scientist @ Walmart | BITS Pilani

    13,285 followers

    Unlocking the Next Era of RAG System Evaluation: Insights from the Latest Comprehensive Survey Retrieval-Augmented Generation (RAG) has become a cornerstone for enhancing large language models (LLMs), especially when accuracy, timeliness, and factual grounding are critical. However, as RAG systems grow in complexity-integrating dense retrieval, multi-source knowledge, and advanced reasoning-the challenge of evaluating their true effectiveness has intensified. A recent survey from leading academic and industrial research organizations delivers the most exhaustive analysis yet of RAG evaluation in the LLM era. Here are the key technical takeaways: 1. Multi-Scale Evaluation Frameworks The survey dissects RAG evaluation into internal and external dimensions. Internal evaluation targets the core components-retrieval and generation-assessing not just their standalone performance but also their interactions. External evaluation addresses system-wide factors like safety, robustness, and efficiency, which are increasingly vital as RAG systems are deployed in real-world, high-stakes environments. 2. Technical Anatomy of RAG Systems Under the hood, a typical RAG pipeline is split into two main sections: - Retrieval: Involves document chunking, embedding generation, and sophisticated retrieval strategies (sparse, dense, hybrid, or graph-based). Preprocessing such as corpus construction and intent recognition is essential for optimizing retrieval relevance and comprehensiveness. - Generation: The LLM synthesizes retrieved knowledge, leveraging advanced prompt engineering and reasoning techniques to produce contextually faithful responses. Post-processing may include entity recognition or translation, depending on the use case. 3. Diverse and Evolving Evaluation Metrics The survey catalogues a wide array of metrics: - Traditional IR Metrics: Precision@K, Recall@K, F1, MRR, NDCG, MAP for retrieval quality. - NLG Metrics: Exact Match, ROUGE, BLEU, METEOR, BertScore, and Coverage for generation accuracy and semantic fidelity. - LLM-Based Metrics: Recent trends show a rise in LLM-as-judge approaches (e.g., RAGAS, Databricks Eval), semantic perplexity, key point recall, FactScore, and representation-based methods like GPTScore and ARES. These enable nuanced, context-aware evaluation that better aligns with real-world user expectations. 4. Safety, Robustness, and Efficiency The survey highlights specialized benchmarks and metrics for: - Safety: Evaluating robustness to adversarial attacks (e.g., knowledge poisoning, retrieval hijacking), factual consistency, privacy leakage, and fairness. - Efficiency: Measuring latency (time to first token, total response time), resource utilization, and cost-effectiveness-crucial for scalable deployment.

  • View profile for Nivedan Rathi
    Nivedan Rathi Nivedan Rathi is an Influencer

    Founder @Future & AI | 500k Subscribers | TEDx Speaker | IIT Bombay | AI Strategy & Training for Decision Makers in Top Companies | Building AI Agents for Sales, Marketing & Operations

    29,182 followers

    𝗕𝗲𝘀𝘁 𝗘𝘅𝗮𝗺𝗽𝗹𝗲 𝗼𝗳 𝗔𝗜'𝘀 𝗜𝗺𝗽𝗮𝗰𝘁 𝗶𝗻 𝗔𝗴𝗿𝗶𝗰𝘂𝗹𝘁𝘂𝗿𝗲: 𝗠𝗮𝗵𝗮𝗿𝗮𝘀𝗵𝘁𝗿𝗮 𝗙𝗮𝗿𝗺𝗲𝗿𝘀 𝗜𝗻𝗰𝗿𝗲𝗮𝘀𝗲𝗱 𝗬𝗶𝗲𝗹𝗱𝘀 𝗯𝘆 𝟮𝟬% People tend to focus only on the parts where technology brings misery, but we need to realise that technology is actually a gift. The Microsoft-AgriPilot.ai partnership in Maharashtra proves this point spectacularly. Their innovative "no-touch" approach using satellite imagery and AI analysis has achieved a 20% increase in crop yields for small-scale farmers. How exactly did AI drive this transformation? Well, their solution combines satellite imagery and drone data to create comprehensive farm assessments without setting foot on the land. Then, advanced AI algorithms analyse this data to generate customised recommendations for: · Precise soil nutrient management based on soil composition analysis. · Optimal irrigation scheduling using predictive moisture modelling. · Weather-based planting decisions from pattern recognition. · Early pest and disease detection through image analysis. 👉🏻 What makes this truly amazing? They delivered these insights in local languages like Marathi. This made advanced agricultural science easily accessible to farmers. And the results speak volumes: • Sugarcane grew THREE TIMES larger than conventional methods. • Successful cultivation of exotic crops like strawberries and dragon fruit. • Income increased by up to 10X for small-scale farmers. What sets this initiative apart is their deliberate focus on farmers with less than two acres of land – those who traditionally get left behind in technological revolutions. This exemplifies what I believe about the future of AI – it creates a golden era for all those people who have a compelling vision, care about solving real-world problems, and have the persistence to make things happen. Are we thinking boldly enough about how AI can transform traditional industries? Or are we just "doing the same things a little faster"?

  • View profile for Anurag(Anu) Karuparti

    Agentic AI Leader @Microsoft | Author - Generative AI for Cloud Solutions | LinkedIn Learning Instructor | Responsible AI Advisor | Ex-PwC, EY | Marathon Runner

    22,824 followers

    As we scale GenAI from demos to real-world deployment, one thing becomes clear: 𝗘𝘃𝗮𝗹𝘂𝗮𝘁𝗶𝗼𝗻 𝗱𝗮𝘁𝗮𝘀𝗲𝘁𝘀 𝗰𝗮𝗻 𝗺𝗮𝗸𝗲 𝗼𝗿 𝗯𝗿𝗲𝗮𝗸 𝗮 𝗚𝗲𝗻𝗔𝗜 𝘀𝘆𝘀𝘁𝗲𝗺. A model can be trained on massive amounts of data, but that doesn’t guarantee it understands context, nuance, or intent at inference time. You can teach a student all the textbook theory in the world. But unless you ask the right questions, in the right setting, under realistic pressure, you’ll never know what they truly grasp. This snapshot outlines the 6 dataset types that AI teams use to rigorously evaluate systems at every stage of maturity: The Evaluation Spectrum 1. 𝐐𝐮𝐚𝐥𝐢𝐟𝐢𝐞𝐝 𝐚𝐧𝐬𝐰𝐞𝐫𝐬 Meaning: Expert-reviewed responses Use: Measure answer quality (groundedness, coherence, etc.) Goal: High-quality, human-like responses 2. 𝐒𝐲𝐧𝐭𝐡𝐞𝐭𝐢𝐜 Meaning: AI-generated questions and answers Use: Test scale and performance Goal: Maximize response accuracy, retrieval quality, and tool use precision 3. 𝐀𝐝𝐯𝐞𝐫𝐬𝐚𝐫𝐢𝐚𝐥 Meaning: Malicious or risky prompts (e.g., jailbreaks) Use: Ensure safety and resilience Goal: Avoid unsafe outputs 4. 𝐎𝐎𝐃 (𝐎𝐮𝐭 𝐨𝐟 𝐃𝐨𝐦𝐚𝐢𝐧) Meaning: Unusual or irrelevant topics Use: See how well the model handles unfamiliar territory Goal: Avoid giving irrelevant or misleading answers 5. 𝐓𝐡𝐮𝐦𝐛𝐬 𝐝𝐨𝐰𝐧 Meaning: Real examples where users rated answers poorly Use: Identify failure modes Goal: Internal review, error analysis 6. 𝐏𝐑𝐎𝐃 Meaning: Cleaned, real user queries from deployed systems Use: Evaluate live performance Goal: Ensure production response quality This layered approach is essential for building: • Trustworthy AI • Measurable safety • Meaningful user experience Most organizations still rely on "accuracy-only" testing. But GenAI in production demands multi-dimensional evaluation — spanning risk, relevance, and realism. If you’re deploying GenAI at scale, ask: Are you testing the right things with the right datasets? Let’s sharpen the tools we use to measure intelligence. Because better testing = better AI. 👇 Would love to hear how you’re designing your eval pipelines. #genai #evaluation #llmops #promptengineering #aiarchitecture #openai

  • View profile for Davide Giacobbe

    Automotive & EVs | Co-Founder @ Voltest

    5,305 followers

    Complete transparency on exterior damage. Total blindness on battery health. Another day, another dealer learns the hard way. Last week, a new dealer in Arizona joined the Voltest network. With every new customer, first thing we do is schedule an onboarding call with our team to make sure they have all the necessary elements to test their inventory and to clearly explain the value of our reports to their customers. During this first call, we invite them to run a test together on an EV they have in their lot. They told us "Let's try this on the 2019 Tesla Model 3 Long Range we bought at an auction a few days before. We noticed it has some problem with charging..." What you see in the picture is what came out after a 2-minute test with Voltest. Something we have started to see more frequently in the last few months. Significant cell imbalance: 150 mV difference between highest and lowest cell voltages. Cell voltage range from 3.726V to 3.876V caused by a single underperforming cell group. What does that mean? Essentially, the car is almost inoperable as it charges up to a very limited charge level. Even worse, at this mileage, this vehicle is past battery warranty coverage. As per the dealer's exact words: "We would never have bought this car if we had this information." Now they're facing a $15k+ battery replacement on a vehicle they thought was a solid acquisition. Two minutes of testing would have saved them from this mistake. Auction houses continue operating as if battery condition is somehow unknowable. They'll document every scratch on the bumper but won't tell you if the most expensive component is failing. This scenario doesn't happen often, but when it does, the financial impact is devastating. Smart dealers are starting to demand this data upfront. Because buying blind in the EV market is just financially reckless. The technology to prevent these situations exists today and is called Voltest. The question is: if you're in the car business, what are you waiting for? Used EVs will be more and more part of your day-to-day. And even if you don't prioritize EVs in your inventory, you might want to be informed before getting a bad trade-in. You have all the right tools to determine residual value on internal combustion engine vehicles. Why miss the same peace of mind on EVs? What are you doing today to protect your EV purchases?

Explore categories