Why Can’t AI Understand That a Border Goes Both Ways?

In the world of #ArtificialIntelligence, we often marvel at language models that can write poems, answer questions, and even simulate philosophical debates. Yet beneath this eloquence lies a surprising cognitive blind spot: the understanding of relational logic. A recent study by Moy Yuan and Andreas Vlachos at the University of Cambridge brings this to light with remarkable clarity.

The researchers asked a simple question: can large language models understand when a relationship is symmetric (e.g., “France borders Germany” implies “Germany borders France”) or antisymmetric (“Alice is the parent of Bob” does not imply the reverse)? Surprisingly, even the most advanced language models, including those fine-tuned on powerful #inference datasets like #MNLI, performed no better than chance. This highlights a critical weakness in today’s #AI systems: their struggle to grasp the basic structure of the world they describe.

But the study does more than reveal a problem. The authors introduce a symmetry-aware retraining approach based on a dataset they constructed from #Wikidata, a rich source of factual and relational knowledge. Instead of just fine-tuning classification layers (as is common), they retrained the core encoder itself using contrastive learning and k-nearest neighbors, allowing the model to internalize relational structures rather than relying on shortcuts or memorization (a toy sketch of this idea follows this post).

The results are striking:
• 100% accuracy on both lexicalized and delexicalized relational tasks
• Minimal training data required to generalize
• Significant resistance to catastrophic forgetting (where models lose prior knowledge after retraining)

What does this mean? Real intelligence involves knowing not just what words say, but what they imply, how they relate, and what structure they reveal. Perhaps, in this journey to teach machines how to reason, we’re learning something about the structure of our own understanding too.

You can explore the dataset and methodology here: https://lnkd.in/eQeCTmWm

#AI #NLP #MachineLearning #Symmetry #KnowledgeGraphs #Wikidata #Reasoning #ContrastiveLearning #LanguageModels #LLMs #CambridgeResearch
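To make the "retrain the encoder, then classify with kNN" idea concrete, here is a toy sketch. It is not the authors' implementation: the encoder choice (bert-base-uncased), the contrastive loss form, the margin, and the handful of relation pairs are all assumptions for illustration; the real training data comes from the Wikidata-derived dataset linked above.

```python
# Toy sketch (not the paper's code): contrastive fine-tuning of an encoder on
# statement/reversal pairs, then kNN classification over the learned embeddings.
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel
from sklearn.neighbors import KNeighborsClassifier

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
enc = AutoModel.from_pretrained("bert-base-uncased")

def embed(sentences):
    batch = tok(sentences, padding=True, truncation=True, return_tensors="pt")
    cls = enc(**batch).last_hidden_state[:, 0]      # [CLS] token embedding
    return F.normalize(cls, dim=-1)

# Toy pairs: (statement, reversed statement, label); 1 = symmetric, 0 = antisymmetric
pairs = [
    ("France borders Germany", "Germany borders France", 1),
    ("Alice is the parent of Bob", "Bob is the parent of Alice", 0),
    ("Ann is married to Tom", "Tom is married to Ann", 1),
    ("Paris is the capital of France", "France is the capital of Paris", 0),
]

opt = torch.optim.AdamW(enc.parameters(), lr=2e-5)
labels = torch.tensor([p[2] for p in pairs], dtype=torch.float)
for _ in range(3):                                   # a few illustrative steps
    opt.zero_grad()
    a, b = embed([p[0] for p in pairs]), embed([p[1] for p in pairs])
    sim = (a * b).sum(dim=-1)                        # cosine similarity per pair
    # Pull symmetric reversals together, push antisymmetric reversals apart (margin 0.2)
    loss = (labels * (1 - sim) + (1 - labels) * F.relu(sim - 0.2)).mean()
    loss.backward()
    opt.step()

# Classify pairs with k-nearest neighbours over the retrained embeddings,
# instead of attaching a fine-tuned classification head.
with torch.no_grad():
    feats = torch.cat(
        [embed([p[0] for p in pairs]), embed([p[1] for p in pairs])], dim=-1
    ).numpy()
knn = KNeighborsClassifier(n_neighbors=1).fit(feats, [p[2] for p in pairs])
```

The point of the sketch is the shape of the approach, not its details: the relational signal is pushed into the encoder itself, and the lightweight kNN on top makes it easy to see whether the geometry, rather than a task-specific head, carries the symmetric/antisymmetric distinction.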
Addressing Rare Knowledge Gaps in AI Language Models
Summary
Addressing rare knowledge gaps in AI language models means finding ways to help these systems understand concepts or relationships that are missing or poorly represented in their training data, such as unique facts, new ideas, or subtle logical connections. This effort focuses on improving how AI models reason, retrieve knowledge, and express information they “know” but don’t reliably show in their answers.
- Spot hidden knowledge: Encourage methods that help AI models reveal correct answers they already “know” internally but struggle to express in outputs.
- Retrain for relationships: Use specialized retraining techniques and relational datasets to help models grasp complex connections, like symmetry or context, that typical training might miss.
- Bridge new frontiers: Recognize that human creativity and new frameworks often go beyond what AI models have learned, so ongoing updates and custom model development can help close the gap between old data and emerging possibilities.
-
Many companies have started experimenting with simple RAG systems, probably as their first use case, to test the effectiveness of generative AI in extracting knowledge from unstructured data like PDFs, text files, and PowerPoint files. If you've used basic RAG architectures with tools like LlamaIndex or LangChain, you might have already encountered three key problems:

𝟭. 𝗜𝗻𝗮𝗱𝗲𝗾𝘂𝗮𝘁𝗲 𝗘𝘃𝗮𝗹𝘂𝗮𝘁𝗶𝗼𝗻 𝗠𝗲𝘁𝗿𝗶𝗰𝘀: Existing metrics fail to catch subtle errors like unsupported claims or hallucinations, making it hard to accurately assess and improve system performance.

𝟮. 𝗗𝗶𝗳𝗳𝗶𝗰𝘂𝗹𝘁𝘆 𝗛𝗮𝗻𝗱𝗹𝗶𝗻𝗴 𝗖𝗼𝗺𝗽𝗹𝗲𝘅 𝗤𝘂𝗲𝘀𝘁𝗶𝗼𝗻𝘀: Standard RAG methods often struggle to find and combine information from multiple sources effectively, leading to slower responses and less relevant results.

𝟯. 𝗦𝘁𝗿𝘂𝗴𝗴𝗹𝗶𝗻𝗴 𝘁𝗼 𝗨𝗻𝗱𝗲𝗿𝘀𝘁𝗮𝗻𝗱 𝗖𝗼𝗻𝘁𝗲𝘅𝘁 𝗮𝗻𝗱 𝗖𝗼𝗻𝗻𝗲𝗰𝘁𝗶𝗼𝗻𝘀: Basic RAG approaches often miss the deeper relationships between pieces of information, resulting in incomplete or inaccurate answers that don't fully meet user needs.

In this post I will introduce three useful papers that address these gaps:

𝟭. 𝗥𝗔𝗚𝗖𝗵𝗲𝗰𝗸𝗲𝗿: introduces a framework for evaluating RAG systems with a focus on fine-grained, claim-level metrics. It proposes a comprehensive set of metrics: claim-level precision, recall, and F1 score to measure the correctness and completeness of responses; claim recall and context precision to evaluate the effectiveness of the retriever; and faithfulness, noise sensitivity, hallucination rate, self-knowledge reliance, and context utilization to diagnose the generator's performance. Consider using these metrics to help identify errors, improve accuracy, and reduce hallucinations in generated outputs (see the sketch after this post for the claim-level scores).

𝟮. 𝗘𝗳𝗳𝗶𝗰𝗶𝗲𝗻𝘁𝗥𝗔𝗚: uses a labeler-and-filter mechanism to identify and retain only the most relevant parts of retrieved information, reducing the need for repeated large language model calls. This iterative approach refines search queries efficiently, lowering latency and costs while maintaining high accuracy on complex, multi-hop questions.

𝟯. 𝗚𝗿𝗮𝗽𝗵𝗥𝗔𝗚: leverages structured data from knowledge graphs to enhance the retrieval process, capturing complex relationships and dependencies between entities that traditional text-based retrieval methods often miss. This enables the generation of more precise and context-aware content, making it particularly valuable in domains that require a deep understanding of interconnected data, such as scientific research, legal documentation, and complex question answering. In tasks such as query-focused summarization, for example, GraphRAG shows substantial gains by leveraging graph structure to capture local and global relationships within documents.

It's encouraging to see how quickly gaps are identified and improvements are made in the GenAI world.
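As a concrete illustration of the claim-level idea behind RAGChecker, here is a minimal sketch; it is not the RAGChecker library or its exact metric definitions. It assumes the claims have already been extracted from the response and the gold answer, and that you supply an entailment judge `is_entailed(claim, text)` (an NLI model or an LLM prompt); both the helper names and the simplification are assumptions.

```python
# Minimal sketch of claim-level precision/recall/F1 for a RAG answer.
# Assumes claim extraction and the entailment judge are provided by the caller.
from typing import Callable, List

def claim_scores(
    response_claims: List[str],   # claims extracted from the model's response
    gold_claims: List[str],       # claims extracted from the gold answer
    response: str,
    gold_answer: str,
    is_entailed: Callable[[str, str], bool],
) -> dict:
    # Precision: fraction of response claims supported by the gold answer
    supported = sum(is_entailed(c, gold_answer) for c in response_claims)
    precision = supported / len(response_claims) if response_claims else 0.0
    # Recall: fraction of gold claims covered by the response
    covered = sum(is_entailed(c, response) for c in gold_claims)
    recall = covered / len(gold_claims) if gold_claims else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}
```

The retriever- and generator-level diagnostics described in the paper follow the same pattern, just with different claim/evidence pairings (e.g., checking claims against the retrieved context instead of the gold answer).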
-
Large Language Models face a critical challenge: how to enhance factual accuracy without sacrificing either inference speed or general capabilities. Current solutions fall short: RAG systems suffer from high latency and shallow integration, while fine-tuning methods like LoRA risk catastrophic forgetting. Researchers from Shanghai Jiao Tong University and the Shanghai AI Laboratory propose MLP Memory, a parametric memory module that learns retrieval patterns during pretraining without requiring explicit document access at inference time.

How it works: The system trains a lightweight MLP network to mimic the behavior of k-nearest neighbor retrieval across an entire pretraining corpus. During training, the MLP learns to map hidden representations from a frozen language model to probability distributions that match what a kNN retriever would produce, essentially compressing 40TB of datastore information into a 4GB parametric module. The architecture uses stacked feed-forward layers without token-mixing operations, leveraging recent findings that FFN layers function as key-value memories within transformers. The training objective combines a KL divergence loss to match the retrieval distributions with a cross-entropy loss to maintain grounding in actual next-token predictions. At inference, the MLP Memory processes hidden states from approximately 70% network depth (not the final layer, as conventional kNN-LM does) and interpolates its output with the base model's predictions through simple probability mixing.

Performance gains: On question-answering benchmarks, MLP Memory achieves a 12.3% relative improvement over base models, outperforming both RAG and continued pretraining. On HaluEval, it reduces hallucinations by up to 10 points. Critically, it delivers 2.5x faster time-to-first-token than RAG and maintains constant inference speed regardless of corpus size, a fundamental advantage over retrieval-based methods whose latency scales with datastore size. The approach demonstrates that learning retrieval patterns parametrically can bridge the efficiency-effectiveness gap, offering a practical alternative that combines the knowledge access benefits of RAG with the speed of purely parametric methods.
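A rough PyTorch sketch of the idea as described above. This is not the authors' code: the base model (gpt2), the 70% tap point, the MLP shape, the loss weight alpha, and the mixing weight lambda are placeholder assumptions, and the kNN target distributions are assumed to be precomputed offline over the pretraining corpus.

```python
# Illustrative sketch of an MLP memory: a frozen LM, an MLP that imitates kNN
# retrieval distributions, and probability mixing at inference.
import torch
import torch.nn as nn
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModelForCausalLM

tok = AutoTokenizer.from_pretrained("gpt2")
lm = AutoModelForCausalLM.from_pretrained("gpt2").eval()
for p in lm.parameters():
    p.requires_grad_(False)                       # base LM stays frozen

hidden, vocab = lm.config.n_embd, lm.config.vocab_size
tap_layer = int(0.7 * lm.config.n_layer)          # read hidden states at ~70% depth

# Stacked feed-forward layers, no token mixing: each position is mapped independently.
mlp_memory = nn.Sequential(
    nn.Linear(hidden, 4 * hidden), nn.GELU(),
    nn.Linear(4 * hidden, 4 * hidden), nn.GELU(),
    nn.Linear(4 * hidden, vocab),
)

def memory_logits(input_ids):
    out = lm(input_ids, output_hidden_states=True)
    h = out.hidden_states[tap_layer]              # (batch, seq, hidden)
    return mlp_memory(h), out.logits

def training_loss(input_ids, knn_probs, alpha=0.5):
    """knn_probs: precomputed kNN-retrieval next-token distributions to imitate."""
    mem_logits, _ = memory_logits(input_ids)
    log_p = F.log_softmax(mem_logits[:, :-1], dim=-1)
    kl = F.kl_div(log_p, knn_probs[:, :-1], reduction="batchmean")   # match retrieval
    ce = F.nll_loss(log_p.transpose(1, 2), input_ids[:, 1:])         # stay grounded
    return alpha * kl + (1 - alpha) * ce          # only mlp_memory receives gradients

@torch.no_grad()
def next_token_probs(input_ids, lam=0.3):
    mem_logits, lm_logits = memory_logits(input_ids)
    p_mem = F.softmax(mem_logits[:, -1], dim=-1)
    p_lm = F.softmax(lm_logits[:, -1], dim=-1)
    return lam * p_mem + (1 - lam) * p_lm         # simple probability mixing
```

The design choice worth noting is that nothing is retrieved at inference time: the datastore only exists during training as the source of the kNN distributions, which is why latency stays constant no matter how large the corpus was.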
-
Did you notice that LLMs sometimes hide their actual knowledge? This raises an important question: do LLMs really "know" more facts deep inside their systems than they actually "say" when answering questions?

Researchers from the Technion and Google Research defined what this "knowing" means and explored it. They state that a model "knows" the answer to a question if it can rank the correct answer above wrong ones. They tested this in two ways:
• External knowledge: what the model expresses through its output token-level probabilities.
• Internal knowledge: what it "knows" based on deeper, hidden computations during processing.
If a model scores better using internal signals than external ones, it has "hidden knowledge".

▪️ So what did the researchers find? LLMs often know more than they say, but getting them to "say" it can be surprisingly hard.
1. The models consistently know more internally than they show in their answers: on average, about 40% more.
2. In over half the questions, none of the 1,000 generated answers were right.
3. In 9% of questions, the model never said the right answer, even once, but its internal signals clearly ranked the right answer as the best when it was added manually.
This means that sampling answers from the model won't always help; some correct answers are buried so deep they almost never come out.

▪️ Why are these findings important?
• They are key to understanding how these models work inside.
• If models secretly know more than they show, we might find better ways to help them express that knowledge and improve their answers.
• From a safety angle, if hidden knowledge suddenly shows up in the wrong context, it could reveal sensitive information.

Overall, understanding why this happens, and how to fix it, could lead to more accurate, transparent, and trustworthy AI systems.
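To make the external-vs-internal distinction concrete, here is a minimal sketch, not the paper's protocol. "External" scores a candidate answer by the log-probability the model assigns to its tokens; "internal" scores it with a probe over hidden states. The model name, the choice of layer, and the `probe` scorer are placeholder assumptions; the definition of "knowing" as ranking the correct answer above wrong ones is taken from the post.

```python
# Sketch: compare ranking candidate answers by output probabilities (external)
# vs. by a hidden-state probe (internal). `probe` is a hypothetical trained scorer.
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModelForCausalLM

tok = AutoTokenizer.from_pretrained("gpt2")
lm = AutoModelForCausalLM.from_pretrained("gpt2").eval()

@torch.no_grad()
def external_score(question: str, answer: str) -> float:
    """Log-probability the model assigns to the answer tokens given the question."""
    prompt_ids = tok(question + " ", return_tensors="pt").input_ids
    answer_ids = tok(answer, return_tensors="pt").input_ids
    ids = torch.cat([prompt_ids, answer_ids], dim=1)
    log_probs = F.log_softmax(lm(ids).logits[:, :-1], dim=-1)
    targets = ids[:, 1:]
    # Sum log-probs over the answer tokens only
    answer_positions = range(prompt_ids.shape[1] - 1, ids.shape[1] - 1)
    return sum(log_probs[0, t, targets[0, t]].item() for t in answer_positions)

@torch.no_grad()
def internal_score(question: str, answer: str, probe) -> float:
    """Score from a probe over a late-layer hidden state (hypothetical scorer)."""
    ids = tok(question + " " + answer, return_tensors="pt").input_ids
    h = lm(ids, output_hidden_states=True).hidden_states[-2][:, -1]
    return probe(h).item()

def knows(question, gold, distractors, score_fn, **kw) -> bool:
    """'Knows' = ranks the gold answer above every wrong candidate under score_fn."""
    gold_score = score_fn(question, gold, **kw)
    return all(gold_score > score_fn(question, d, **kw) for d in distractors)
```

Hidden knowledge, in this framing, is simply the set of questions where `knows(..., internal_score, probe=...)` succeeds while `knows(..., external_score)` fails.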
-
At the edge of the intersection of AI capabilities and human intelligence lies an interesting frontier I fought with this weekend…

In spending the weekend trying to get Claude to understand my PVP framework (permissionless value prop), and to prompt it consistently, one thing emerged for me: these things default to existing paradigms, hard. And because new knowledge that could only be created because of AI has yet to proliferate across the web, an intelligence gap has emerged. A gap that's a bit meta and funny. Let me explain.

1. AI is trained on all the web.
2. The web existed before AI.
3. AI lets us do things never before possible.
4. You ask AI to do something BRAND new, something only possible because of its capabilities.
5. AI is not yet intrinsically "aware" of #3. It has no default understanding of a world embedded with AI capabilities.
6. AI has a hard time with #4, because it's trained on #2, and #3 isn't a big part of its knowledge yet.
7. AI sort of defaults back to what's possible from most of the dataset acquired in #1.

So there is a beautiful new knowledge frontier that LLMs have a VERY hard time defining or getting their heads around, and it's basically damn near impossible to get them to reliably produce this form of knowledge that has no precedent before it, that's not written about, that's not really talked about relative to their training data.

Maybe this gets fixed with fine-tuning, custom model development, or a small language model trained only on all the cool stuff AI can do; I don't know. But at least for now, and at least with today's models (even the frontier ones), human tastemaking is still absolutely necessary to generate new ideas, new frameworks, and new possibilities that literally could never have come before.

That gap is your value in an AI-first world. And you, knowledge worker, are not yet obsolete.

P.S. My new course made some progress, and some regression, this weekend, so it will likely have to end with a note: "you, kind reader, must still think, must still evaluate, must still invent. For you are the large thinking model this course is designed to train."