Improving Predictive Accuracy

Explore top LinkedIn content from expert professionals.

  • Aishwarya Srinivasan
    598,968 followers

    If you’re an AI engineer trying to understand and build with GenAI, RAG (Retrieval-Augmented Generation) is one of the most essential components to master. It’s the backbone of any LLM system that needs fresh, accurate, and context-aware outputs. Let’s break down how RAG works, step by step, from an engineering lens, not a hype one:

    🧠 How RAG Works (Under the Hood)

    1. Embed your knowledge base
    → Start with unstructured sources - docs, PDFs, internal wikis, etc.
    → Convert them into semantic vector representations using embedding models (e.g., OpenAI, Cohere, or HuggingFace models)
    → Output: N-dimensional vectors that preserve meaning across contexts

    2. Store in a vector database
    → Use a vector store like Pinecone, Weaviate, or FAISS
    → Index embeddings to enable fast similarity search (cosine, dot-product, etc.)

    3. Query comes in - embed that too
    → The user prompt is embedded using the same embedding model
    → Perform a top-k nearest neighbor search to fetch the most relevant document chunks

    4. Context injection
    → Combine retrieved chunks with the user query
    → Format this into a structured prompt for the generation model (e.g., Mistral, Claude, Llama)

    5. Generate the final output
    → LLM uses both the query and retrieved context to generate a grounded, context-rich response
    → Minimizes hallucinations and improves factuality at inference time

    📚 What changes with RAG?
    Without RAG: 🧠 “I don’t have data on that.”
    With RAG: 🤖 “Based on [retrieved source], here’s what’s currently known…”
    Same model, drastically improved quality.

    🔍 Why this matters
    You need RAG when:
    → Your data changes daily (support tickets, news, policies)
    → You can’t afford hallucinations (legal, finance, compliance)
    → You want your LLMs to access your private knowledge base without retraining
    It’s the most flexible, production-grade approach to bridge static models with dynamic information.

    🛠️ Arvind and I are kicking off a hands-on workshop on RAG
    This first session is designed for beginner to intermediate practitioners who want to move beyond theory and actually build. Here’s what you’ll learn:
    → How RAG enhances LLMs with real-time, contextual data
    → Core concepts: vector DBs, indexing, reranking, fusion
    → Build a working RAG pipeline using LangChain + Pinecone
    → Explore no-code/low-code setups and real-world use cases
    If you're serious about building with LLMs, this is where you start.

    📅 Save your seat and join us live: https://lnkd.in/gS_B7_7d
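
A minimal sketch of the five steps above, using FAISS as the vector store; embed_texts() and call_llm() are placeholders standing in for whichever embedding and generation models you actually use, not any specific vendor SDK.

```python
# Minimal RAG skeleton: embed -> index -> retrieve -> inject context -> generate.
# embed_texts() and call_llm() are placeholders for a real embedding model and
# generation model (OpenAI, Cohere, HuggingFace, Mistral, Claude, Llama, ...).
import numpy as np
import faiss

def embed_texts(texts: list[str]) -> np.ndarray:
    # Placeholder: swap in a real embedding model; output shape is (n, dim).
    rng = np.random.default_rng(abs(hash(tuple(texts))) % 2**32)
    vecs = rng.normal(size=(len(texts), 384)).astype("float32")
    faiss.normalize_L2(vecs)  # unit vectors, so inner product == cosine similarity
    return vecs

def call_llm(prompt: str) -> str:
    # Placeholder for the generation model.
    return f"[answer grounded in]\n{prompt[:200]}"

# Steps 1-2: embed the knowledge base and store it in a vector index.
docs = ["Refunds are processed within 5 business days.",
        "Support tickets are triaged by severity."]
index = faiss.IndexFlatIP(384)
index.add(embed_texts(docs))

# Step 3: embed the query with the SAME model and run a top-k search.
query = "How long do refunds take?"
_, ids = index.search(embed_texts([query]), 2)

# Steps 4-5: inject the retrieved chunks into the prompt and generate.
context = "\n".join(docs[i] for i in ids[0])
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
print(call_llm(prompt))
```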

  • Leon Chlon, PhD
    Founder & CEO @ Reliably AI, the pre-generation, training-free trust infra between users & frontier APIs. Build today with confidence.
    37,032 followers

    LLM hallucinations aren't bugs, they're compression artefacts. And we just figured out how to predict them before they happen.

    400 stars in one week, the reception has been unreal. Our toolkit is open source and anyone can use it. https://lnkd.in/e4s3X8GK

    When your LLM confidently states that "Napoleon won the Battle of Waterloo," it's not broken. It's doing exactly what it was trained to do: compress the entire internet into model weights, then decompress on demand. Sometimes, there isn't enough information to perfectly reconstruct rare facts, so it fills gaps with statistically plausible but wrong content. Think of it like a ZIP file corrupted during compression. The decompression algorithm still runs, but outputs garbage where data was lost.

    The breakthrough: We proved hallucinations occur when information budgets fall below mathematical thresholds. Using our Expectation-level Decompression Law (EDFL), we can calculate exactly how many bits of information are needed to prevent any specific hallucination, before generation even starts.

    This resolves a fundamental paradox: LLMs achieve near-perfect Bayesian performance on average, yet systematically fail on specific inputs. We proved they're "Bayesian in expectation, not in realisation", optimising average-case compression rather than worst-case reliability.

    Why this changes everything: instead of treating hallucinations as inevitable, we can now:
    - Calculate risk scores before generating any text
    - Set guaranteed error bounds (e.g. 95%)
    - Know precisely when to gather more context vs. abstain

    The full preprint is being released on arXiv this week. Until then, read the preprint PDF we uploaded here: https://lnkd.in/eRf_ecu3

    The toolkit works with any OpenAI-compatible API. Zero retraining required. Provides mathematical SLA guarantees for compliance. Perfect for healthcare, finance, legal, anywhere errors aren't acceptable.

    The era of "trust me, bro" AI is ending. Welcome to bounded, predictable AI reliability.

    Big thanks to Ahmed K. Maggie C. for all the help putting this + the repo together!

    #AI #MachineLearning #ResponsibleAI #OpenSource #LLM #Innovation
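
A toy illustration of the pre-generation gating idea described above: answer only when an estimated information budget clears a threshold, otherwise gather more context or abstain. The estimate_information_bits() function and the BITS_REQUIRED threshold are hypothetical stand-ins, not the EDFL computation from the open-source toolkit.

```python
# Toy pre-generation gate: generate only if the estimated information budget
# clears a threshold; otherwise gather more context or abstain.
# estimate_information_bits() and BITS_REQUIRED are hypothetical stand-ins,
# NOT the EDFL computation from the authors' toolkit.
import math

BITS_REQUIRED = 3.0  # assumed per-claim budget; tune against your own risk target

def estimate_information_bits(prompt: str, retrieved_context: list[str]) -> float:
    # Stand-in estimate: more (and longer) supporting passages -> more bits.
    return sum(math.log2(1 + len(chunk.split())) for chunk in retrieved_context)

def gate(prompt: str, retrieved_context: list[str]) -> str:
    bits = estimate_information_bits(prompt, retrieved_context)
    if bits >= BITS_REQUIRED:
        return "GENERATE"             # enough evidence: call the LLM
    if retrieved_context:
        return "GATHER_MORE_CONTEXT"  # some evidence, but below budget
    return "ABSTAIN"                  # no evidence: refuse rather than hallucinate

print(gate("Who won the Battle of Waterloo?",
           ["Wellington's coalition defeated Napoleon at Waterloo in 1815."]))
```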

  • Sahar Mor
    I help researchers and builders make sense of AI | ex-Stripe | aitidbits.ai | Angel Investor
    40,978 followers

    In the last three months alone, over ten papers outlining novel prompting techniques were published, boosting LLMs’ performance by a substantial margin. Two weeks ago, a groundbreaking paper from Microsoft demonstrated how a well-prompted GPT-4 outperforms Google’s Med-PaLM 2, a specialized medical model, solely through sophisticated prompting techniques.

    Yet, while our X and LinkedIn feeds buzz with ‘secret prompting tips’, a definitive, research-backed guide aggregating these advanced prompting strategies is hard to come by. This gap prevents LLM developers and everyday users from harnessing these novel frameworks to enhance performance and achieve more accurate results. https://lnkd.in/g7_6eP6y

    In this AI Tidbits Deep Dive, I outline six of the best recent prompting methods:

    (1) EmotionPrompt - inspired by human psychology, this method utilizes emotional stimuli in prompts to gain performance enhancements
    (2) Optimization by PROmpting (OPRO) - a DeepMind innovation that refines prompts automatically, surpassing human-crafted ones. This paper discovered the “Take a deep breath” instruction that improved LLMs’ performance by 9%.
    (3) Chain-of-Verification (CoVe) - Meta's novel four-step prompting process that drastically reduces hallucinations and improves factual accuracy
    (4) System 2 Attention (S2A) - also from Meta, a prompting method that filters out irrelevant details prior to querying the LLM
    (5) Step-Back Prompting - encouraging LLMs to abstract queries for enhanced reasoning
    (6) Rephrase and Respond (RaR) - UCLA's method that lets LLMs rephrase queries for better comprehension and response accuracy

    Understanding the spectrum of available prompting strategies and how to apply them in your app can mean the difference between a production-ready app and a nascent project with untapped potential.

    Full blog post: https://lnkd.in/g7_6eP6y
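
To make one of these methods concrete, here is a minimal sketch of a Chain-of-Verification (CoVe)-style loop: draft an answer, plan verification questions, answer them independently, then rewrite. The call_llm() helper and the prompt wording are placeholders, not Meta's reference implementation.

```python
# Chain-of-Verification (CoVe) style loop in four steps:
# 1) draft an answer, 2) plan verification questions, 3) answer them
# independently of the draft, 4) rewrite the answer to match the checks.
# call_llm() is a placeholder for whatever chat-completion client you use.

def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM client here")

def chain_of_verification(question: str) -> str:
    draft = call_llm(f"Answer the question:\n{question}")

    plan = call_llm(
        "List short fact-checking questions that would verify the claims "
        f"in this answer, one per line:\n{draft}"
    )
    checks = [q.strip() for q in plan.splitlines() if q.strip()]

    # Answer each verification question WITHOUT showing the draft,
    # so errors in the draft cannot leak into the checks.
    verifications = [(q, call_llm(q)) for q in checks]

    notes = "\n".join(f"Q: {q}\nA: {a}" for q, a in verifications)
    return call_llm(
        f"Question: {question}\n\nDraft answer:\n{draft}\n\n"
        f"Verified facts:\n{notes}\n\n"
        "Rewrite the answer so it is consistent with the verified facts."
    )
```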

  • Pascal Biese
    AI Lead at PwC </> Daily AI highlights for 80k+ experts 📲🤗
    83,570 followers

    LLMs tend to fail catastrophically after a few hundred steps. A new approach just solved a task with over 1 million steps perfectly - according to the authors.

    Even state-of-the-art reasoning models inevitably make errors when chaining their capabilities into extended processes. Recent experiments showed that after a few hundred steps, the process becomes derailed. This fundamental reliability problem has been blocking LLMs from executing the kind of large-scale tasks that organizations and societies routinely perform.

    Researchers at Cognizant AI Lab and UT Austin just demonstrated MAKER, a system that successfully completed over one million LLM steps with zero errors. Their approach might sound counterintuitive: instead of relying on increasingly intelligent base models, they achieve reliability through extreme decomposition and error correction.

    MAKER breaks tasks into minimal subtasks - each handled by a focused microagent - and applies multi-agent voting at each step. This modular approach enables effective error correction that scales logarithmically with task length. The team provides formal scaling laws showing why this works: under maximal decomposition, the system scales log-linearly (Θ(s log s)) rather than failing exponentially like traditional approaches.

    Relatively small non-reasoning models like GPT-4.1-mini prove more cost-effective than advanced reasoning models for this architecture. When each agent focuses on a single tiny step, raw reasoning power matters less than reliability and cost efficiency.

    This might open an alternative path to AI scaling beyond building ever-larger models. By decomposing intelligence into millions of coordinated pieces, we might build systems that are not just more capable, but fundamentally more reliable and controllable.

    ↓ 𝐖𝐚𝐧𝐭 𝐭𝐨 𝐤𝐞𝐞𝐩 𝐮𝐩? Join my newsletter with 50k+ readers and be the first to learn about the latest AI research: llmwatch.com 💡
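
A minimal sketch of the per-step voting idea described above: each tiny subtask is answered by several independent samples from a small model, and the majority answer is kept before the chain moves on. sample_microagent() and the vote count are illustrative choices, not the MAKER implementation.

```python
# Per-step error correction by majority voting, in the spirit of extreme
# decomposition: each minimal subtask is sampled several times and the
# modal answer is kept before moving to the next step.
# sample_microagent() is a stand-in for a call to a small LLM.
from collections import Counter

def sample_microagent(step_input: str) -> str:
    raise NotImplementedError("call a small, cheap model here")

def voted_step(step_input: str, votes: int = 5) -> str:
    answers = [sample_microagent(step_input) for _ in range(votes)]
    winner, count = Counter(answers).most_common(1)[0]
    if count <= votes // 2:
        # No clear majority: escalate (re-sample, or flag for review).
        raise RuntimeError(f"no majority for step: {step_input!r}")
    return winner

def run_task(steps: list[str]) -> list[str]:
    # Chain a long sequence of minimal subtasks, each individually corrected.
    return [voted_step(s) for s in steps]
```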

  • Armand Ruiz
    building AI systems
    202,543 followers

    Explaining the evaluation method LLM-as-a-Judge (LLMaaJ).

    Token-based metrics like BLEU or ROUGE are still useful for structured tasks like translation or summarization. But for open-ended answers, RAG copilots, or complex enterprise prompts, they often miss the bigger picture. That’s where LLMaaJ changes the game.

    𝗪𝗵𝗮𝘁 𝗶𝘀 𝗶𝘁?
    You use a powerful LLM as an evaluator, not a generator. It’s given:
    - The original question
    - The generated answer
    - The retrieved context or gold answer

    𝗧𝗵𝗲𝗻 𝗶𝘁 𝗮𝘀𝘀𝗲𝘀𝘀𝗲𝘀:
    ✅ Faithfulness to the source
    ✅ Factual accuracy
    ✅ Semantic alignment, even if phrased differently

    𝗪𝗵𝘆 𝘁𝗵𝗶𝘀 𝗺𝗮𝘁𝘁𝗲𝗿𝘀:
    LLMaaJ captures what traditional metrics can’t. It understands paraphrasing. It flags hallucinations. It mirrors human judgment, which is critical when deploying GenAI systems in the enterprise.

    𝗖𝗼𝗺𝗺𝗼𝗻 𝗟𝗟𝗠𝗮𝗮𝗝-𝗯𝗮𝘀𝗲𝗱 𝗺𝗲𝘁𝗿𝗶𝗰𝘀:
    - Answer correctness
    - Answer faithfulness
    - Coherence, tone, and even reasoning quality

    📌 If you’re building enterprise-grade copilots or RAG workflows, LLMaaJ is how you scale QA beyond manual reviews.

    To put LLMaaJ into practice, check out EvalAssist, a new tool from IBM Research. It offers a web-based UI to streamline LLM evaluations:
    - Refine your criteria iteratively using Unitxt
    - Generate structured evaluations
    - Export as Jupyter notebooks to scale effortlessly

    A powerful way to bring LLM-as-a-Judge into your QA stack.
    - Get Started guide: https://lnkd.in/g4QP3-Ue
    - Demo site: https://lnkd.in/gUSrV65s
    - GitHub repo: https://lnkd.in/gPVEQRtv
    - Whitepapers: https://lnkd.in/gnHi6SeW
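
A minimal LLM-as-a-Judge sketch: the judge model receives the question, the candidate answer, and the retrieved context, and returns rubric scores. The call_llm() helper, the rubric, and the JSON schema are illustrative assumptions, not EvalAssist's interface.

```python
# Minimal LLM-as-a-Judge: give the judge model the question, the candidate
# answer, and the retrieved context, and ask for rubric-based scores.
# call_llm() is a placeholder for your chat-completion client; the rubric
# and JSON schema are illustrative, not tied to any specific tool.
import json

def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM client here")

JUDGE_PROMPT = """You are an impartial evaluator.
Question: {question}
Retrieved context: {context}
Candidate answer: {answer}

Rate the candidate answer on:
- faithfulness: is every claim supported by the context? (1-5)
- correctness: is it factually right for the question? (1-5)
Reply with JSON only:
{{"faithfulness": <int>, "correctness": <int>, "rationale": "<one sentence>"}}"""

def judge(question: str, context: str, answer: str) -> dict:
    raw = call_llm(JUDGE_PROMPT.format(question=question, context=context, answer=answer))
    return json.loads(raw)  # in practice, validate and retry on malformed output
```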

  • Pavan Belagatti
    AI Evangelist | Developer Advocate | Tech Content Creator
    95,717 followers

    Throw out the old #RAG approaches; use Corrective RAG instead!

    Corrective RAG introduces an additional layer of checking and correcting retrieved documents, ensuring more accurate and relevant information before generating a final response. This approach enhances the reliability of the generated answers by refining or correcting the retrieved context dynamically. The key idea is to retrieve document chunks from the vector database as usual and then use an LLM to check whether each retrieved chunk is relevant to the input question. The process roughly goes as follows:

    ⮕ Step 1: Retrieve context documents from the vector database for the input query.
    ⮕ Step 2: Use an LLM to check whether the retrieved documents are relevant to the input question.
    ⮕ Step 3: If all documents are relevant (Correct), no specific action is needed.
    ⮕ Step 4: If some or all documents are not relevant (Ambiguous or Incorrect), rephrase the query and search the web to get relevant context information.
    ⮕ Step 5: Send the rephrased query and the context documents to the LLM for response generation.

    I have made a complete video on Corrective RAG using LangGraph: https://lnkd.in/gKaEjEvk

    Learn more about Corrective RAG in this paper: https://lnkd.in/g8FkrMzS
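
A minimal sketch of those five steps, with placeholder retrieve(), web_search(), and call_llm() helpers; it illustrates the flow, not the LangGraph implementation from the video or the paper's code.

```python
# Corrective-RAG-style flow: retrieve, grade each chunk with an LLM, and fall
# back to a rewritten query plus web search when grading flags problems.
# retrieve(), web_search(), and call_llm() are placeholders for your stack.

def call_llm(prompt: str) -> str: raise NotImplementedError
def retrieve(query: str) -> list[str]: raise NotImplementedError
def web_search(query: str) -> list[str]: raise NotImplementedError

def grade(question: str, chunk: str) -> bool:
    verdict = call_llm(
        f"Question: {question}\nChunk: {chunk}\n"
        "Is this chunk relevant to answering the question? Reply yes or no."
    )
    return verdict.strip().lower().startswith("yes")

def corrective_rag(question: str) -> str:
    chunks = retrieve(question)                            # Step 1
    relevant = [c for c in chunks if grade(question, c)]   # Steps 2-3
    query = question
    if not relevant or len(relevant) < len(chunks):        # Step 4
        query = call_llm(f"Rewrite this as a web search query: {question}")
        relevant += web_search(query)
    context = "\n".join(relevant)                          # Step 5
    return call_llm(f"Context:\n{context}\n\nQuestion: {query}")
```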

  • Damien Benveniste, PhD
    Founder @ TheAiEdge | Follow me to learn about Machine Learning Engineering, Machine Learning System Design, MLOps, and the latest techniques and news about the field.
    173,014 followers

    Most people do not look beyond the basic RAG pipeline, and it rarely works out as expected! RAG is known to lack robustness due to the LLM's weaknesses, but that doesn't mean we cannot build robust pipelines! Here is how we can improve them.

    The RAG pipeline, in its simplest form, is composed of a retriever and a generator. The user question is used to retrieve data from the database that can serve as context to answer the question better. The retrieved data is used as context in a prompt for an LLM to answer the question. Instead of using the original user question as the query to the database, it is typical to rewrite the question for optimized retrieval.

    Instead of blindly returning the answer to the user, we are better off assessing the generated answer. That is the idea behind Self-RAG. We can check for hallucinations and for relevance to the question. If the model hallucinates, we retry the generation, and if the answer doesn't address the question, we restart the retrieval by rewriting the query. If the answer passes the validation, we can return it to the user. It can also help to feed back what went wrong, so that the new retrieval and the new generation are performed in a more educated manner. If we go through too many iterations, we assume we have reached a state where the model should apologize for not being able to provide an answer to the question.

    When we retrieve documents, we are likely to pull in irrelevant ones, so it is a good idea to keep only the relevant documents before providing them to the generator. Once the documents are filtered, a lot of the information they contain is still likely to be irrelevant, so it also helps to extract only what could be useful for answering the question. This way, the generator only sees information relevant to the question.

    The assumption in typical RAG is that the question will be about the data stored in the database, but this is a very rigid assumption. We can use the idea behind Adaptive-RAG, where we assess the question first and route it to a datastore RAG, a web search, or a simple LLM call. We may also realize that none of the documents are actually relevant to the question, in which case we are better off rerouting the question to web search. That is part of the idea behind Corrective RAG. If we reach the maximum number of web-search retries, we give up and apologize to the user.

    Here is how I implemented this pipeline with LangGraph: https://lnkd.in/g8AAF7Fw
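
A minimal sketch of the self-checking loop described above: generate, grade the answer for grounding and for relevance, retry or rewrite the query, and apologize after too many iterations. The helper functions are placeholders, not the LangGraph pipeline from the link.

```python
# Self-RAG-style validation loop: generate, check for hallucination and for
# relevance, and either retry, rewrite the query, or give up gracefully.
# call_llm() and retrieve() are placeholders, not the linked LangGraph code.
MAX_ITERS = 3

def call_llm(prompt: str) -> str: raise NotImplementedError
def retrieve(query: str) -> list[str]: raise NotImplementedError

def is_grounded(answer: str, context: list[str]) -> bool:
    reply = call_llm(f"Context: {context}\nAnswer: {answer}\n"
                     "Is every claim supported by the context? Reply yes or no.")
    return reply.strip().lower().startswith("yes")

def addresses_question(answer: str, question: str) -> bool:
    reply = call_llm(f"Question: {question}\nAnswer: {answer}\n"
                     "Does the answer address the question? Reply yes or no.")
    return reply.strip().lower().startswith("yes")

def self_checking_rag(question: str) -> str:
    query = question
    for _ in range(MAX_ITERS):
        context = retrieve(query)
        answer = call_llm(f"Context: {context}\nQuestion: {question}\nAnswer:")
        if not is_grounded(answer, context):
            continue                  # hallucination: try the generation again
        if not addresses_question(answer, question):
            query = call_llm(f"Rewrite for better retrieval: {question}")
            continue                  # off-topic: rewrite the query and re-retrieve
        return answer                 # passed validation
    return "Sorry, I could not find a reliable answer to that question."
```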

  • Sarthak Rastogi
    AI engineer | Posts on agents + advanced RAG | Experienced in LLM research, ML engineering, Software Engineering
    22,062 followers

    5 steps that Amazon Finance took to improve their RAG pipeline's accuracy from 49% to 86% 📈

    - They started by fixing document chunking problems. The original fixed-size chunks were causing inaccuracies because they didn’t capture complete context. Using the QUILL Editor, they turned unstructured text into HTML and then identified logical structures based on HTML tags. Just chunking the docs differently raised the accuracy from 49% to 64%. 😦

    - Next, prompt engineering. They aimed to: 1. stop hallucinations when there wasn’t relevant context, 2. support both concise and detailed answers, and 3. give citations. They also worked on implementing chain-of-thought reasoning to improve how the LLM structured its answers. This got the accuracy to 76%.

    - Finally, they optimised their embedding models. They tested different first-party and third-party models and found that models like bge-base-en-v1.5 offered better performance on their dataset. Ultimately, they settled on Amazon Titan Embeddings G1. Better retrieval got them to a final accuracy of 86%.

    These were targeted improvements across the RAG pipeline, and they all added up.

    Link to the article from Amazon Web Services (AWS): https://lnkd.in/gFDBfhJm

    #AI #LLMs #GenAI
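
A minimal sketch of the structure-aware chunking described in the first step: split on logical HTML blocks (headings, paragraphs, tables) instead of fixed-size windows. It uses BeautifulSoup as a stand-in parser and is not the QUILL-based pipeline from the AWS article.

```python
# Structure-aware chunking: split on logical HTML blocks (headings, lists,
# tables, paragraphs) instead of fixed-size character windows, so each chunk
# keeps complete context. Illustrative only; not the QUILL-based AWS pipeline.
from bs4 import BeautifulSoup

def chunk_html(html: str, max_words: int = 200) -> list[str]:
    soup = BeautifulSoup(html, "html.parser")
    chunks, current, heading = [], [], ""
    for tag in soup.find_all(["h1", "h2", "h3", "p", "li", "table"]):
        text = tag.get_text(" ", strip=True)
        if tag.name in ("h1", "h2", "h3"):
            if current:                              # close the previous section
                chunks.append(heading + "\n" + " ".join(current))
                current = []
            heading = text                           # start a new logical section
        else:
            current.append(text)
            if sum(len(t.split()) for t in current) > max_words:
                chunks.append(heading + "\n" + " ".join(current))
                current = []
    if current:
        chunks.append(heading + "\n" + " ".join(current))
    return chunks

# Each chunk carries its section heading, which helps retrieval keep context.
print(chunk_html("<h2>Refund policy</h2><p>Refunds take 5 business days.</p>"))
```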

  • Aurimas Griciūnas
    Founder @ SwirlAI • UpSkilling the Next Generation of AI Talent • Author of SwirlAI Newsletter • Public Speaker
    173,390 followers

    What impact does 𝗽𝗿𝗼𝗺𝗽𝘁 𝗳𝗼𝗿𝗺𝗮𝘁𝘁𝗶𝗻𝗴 have on your 𝗟𝗟𝗠 𝗣𝗲𝗿𝗳𝗼𝗿𝗺𝗮𝗻𝗰𝗲?

    There is an interesting debate happening in the community around the impact of both input and output formatting on the performance of your LLM applications. In general, we are converging on the conclusion that both matter and should be part of your Prompt Engineering strategy. A paper was recently released that specifically evaluates the impact of input formatting.

    Key takeaways I am bringing from the paper, which 𝗔𝗜 𝗘𝗻𝗴𝗶𝗻𝗲𝗲𝗿𝘀 𝗯𝘂𝗶𝗹𝗱𝗶𝗻𝗴 𝗔𝗜 𝘀𝘆𝘀𝘁𝗲𝗺𝘀 𝘀𝗵𝗼𝘂𝗹𝗱 𝗰𝗼𝗻𝘀𝗶𝗱𝗲𝗿 𝗮𝘀 𝘄𝗲𝗹𝗹:

    ➡️ Testing different variations of prompt formatting, even with the same instructions, should be part of your prompt engineering process. Consider:
    👉 Plain text
    👉 Markdown
    👉 YAML
    👉 JSON
    👉 XML
    ❗️The difference in performance driven by prompt formatting can be as much as 40%! It is clearly worth experimenting with.

    ➡️ The effectiveness of a given prompt format is likely not consistent between LLMs, even within the same family (e.g. GPT).
    ❗️ You should reevaluate your application's performance when switching the underlying model.

    ➡️ Evaluating and keeping track of your LLM application parameters is critical if you want to bring your applications to production.

    ✅ In general, I consider this good news, as we have more untapped space in which to improve our application performance.

    ℹ️ As models keep improving, we should see formatting have a smaller impact on result variability.

    Read the full paper here: https://lnkd.in/d-AD-Ptq

    Kudos to the authors! Looking forward to the follow-up research.

    #AI #LLM #MachineLearning
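
A minimal sketch of the experiment these takeaways suggest: render the same instruction and fields in several input formats and score each variant on your own eval set. call_llm() and score() are placeholders for your client and metric, and the rendering choices are illustrative.

```python
# Render the SAME task fields in several input formats, then score each
# variant on an eval set to see which format your model handles best.
# call_llm() and score() are placeholders for your client and your metric.
import json
import yaml  # pip install pyyaml

def call_llm(prompt: str) -> str: raise NotImplementedError
def score(output: str, expected: str) -> float: raise NotImplementedError

def render(task: dict, fmt: str) -> str:
    if fmt == "plain":
        return "\n".join(f"{k}: {v}" for k, v in task.items())
    if fmt == "markdown":
        return "\n".join(f"**{k}**: {v}" for k, v in task.items())
    if fmt == "json":
        return json.dumps(task, indent=2)
    if fmt == "yaml":
        return yaml.safe_dump(task, sort_keys=False)
    if fmt == "xml":
        return "\n".join(f"<{k}>{v}</{k}>" for k, v in task.items())
    raise ValueError(f"unknown format: {fmt}")

def compare_formats(dataset: list[tuple[dict, str]]) -> dict[str, float]:
    results = {}
    for fmt in ("plain", "markdown", "yaml", "json", "xml"):
        scores = [score(call_llm(render(task, fmt)), expected)
                  for task, expected in dataset]
        results[fmt] = sum(scores) / len(scores)
    return results  # e.g. {"plain": 0.71, "json": 0.78, ...} on your data
```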

  • Ross Dawson
    Futurist | Board advisor | Global keynote speaker | Humans + AI Leader | Bestselling author | Podcaster | LinkedIn Top Voice | Founder: AHT Group - Informivity - Bondi Innovation
    34,041 followers

    LLMs struggle with rationality in complex game-theoretic situations, which are very common in the real world. However, integrating structured game-theoretic workflows into LLMs enables them to compute and execute optimal strategies such as Nash equilibria. This will be vital for bringing AI into real-world situations, especially with the rise of agentic AI.

    The paper "Game-theoretic LLM: Agent Workflow for Negotiation Games" (link in comments) examines the performance of LLMs in strategic games and how to improve them. Highlights from the paper:

    💡 Strategic Limitations of LLMs in Game Theory: LLMs struggle with rationality in complex game scenarios, particularly as game complexity increases. Despite their ability to process large amounts of data, LLMs often deviate from Nash equilibria in games with larger payoff matrices or sequential decision trees. This limitation suggests a need for structured guidance to improve their strategic reasoning capabilities.

    🔄 Workflow-Driven Rationality Improvements: Integrating game-theoretic workflows significantly enhances the performance of LLMs in strategic games. By guiding decision-making with principles like Nash equilibria, Pareto optimality, and backward induction, LLMs showed an improved ability to identify optimal strategies and robust rationality even in negotiation scenarios.

    🤝 Negotiation as a Double-Edged Sword: Negotiation improved outcomes in coordination games but sometimes led LLMs away from Nash equilibria in scenarios where those equilibria were not Pareto optimal. This reflects a tendency for LLMs to prioritize fairness or trust over strict game-theoretic rationality when engaging in dialogue with other agents.

    🌐 Challenges with Incomplete Information: In incomplete-information games, LLMs demonstrated difficulty handling private valuations and uncertainty. Novel workflows incorporating Bayesian belief updating allowed agents to reason under uncertainty and propose envy-free, Pareto-optimal allocations. However, these scenarios highlighted the need for more nuanced algorithms to account for real-world negotiation dynamics.

    📊 Model Variance in Performance: Different LLM models displayed varying levels of rationality and susceptibility to negotiation-induced deviations. For instance, the o1 model consistently adhered more closely to Nash equilibria than others, underscoring the importance of model-specific optimization for strategic tasks.

    🚀 Practical Implications: The findings suggest LLMs can be optimized for strategic applications like automated negotiation, economic modeling, and collaborative problem-solving. However, careful design of workflows and prompts is essential to mitigate their inherent biases and enhance their utility in high-stakes, interactive environments.
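
To make the structured-workflow idea concrete, here is a minimal sketch that computes the pure-strategy Nash equilibria of a small bimatrix game classically and passes them to the model as guidance. The payoff matrices, the prompt, and the call_llm() placeholder are illustrative assumptions, not taken from the paper.

```python
# One step of a game-theoretic workflow: compute the pure-strategy Nash
# equilibria of a bimatrix game with ordinary code, then hand the result to
# the LLM as structured guidance. Payoffs and prompt are illustrative only.
import itertools

def pure_nash_equilibria(payoffs_a, payoffs_b):
    """Return (row, col) cells where neither player gains by deviating unilaterally."""
    rows, cols = len(payoffs_a), len(payoffs_a[0])
    equilibria = []
    for r, c in itertools.product(range(rows), range(cols)):
        row_best = all(payoffs_a[r][c] >= payoffs_a[r2][c] for r2 in range(rows))
        col_best = all(payoffs_b[r][c] >= payoffs_b[r][c2] for c2 in range(cols))
        if row_best and col_best:
            equilibria.append((r, c))
    return equilibria

# Prisoner's dilemma payoffs: action 0 = cooperate, 1 = defect.
A = [[3, 0], [5, 1]]  # row player's payoffs
B = [[3, 5], [0, 1]]  # column player's payoffs
equilibria = pure_nash_equilibria(A, B)  # -> [(1, 1)]: mutual defection

guidance = (f"Payoff matrices: A={A}, B={B}. "
            f"Pure-strategy Nash equilibria: {equilibria}. "
            "Choose your next action in light of this analysis and justify it.")
# call_llm(guidance)  # placeholder: send the structured guidance to your model
```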
