AI can now write your resume in 20 seconds. But can it get you the job?

I’ve trained 2L+ professionals, and let me tell you: AI-generated resumes are everywhere. Perfect formatting. Strong action verbs. Zero typos. Everything looks flawless.

But the major issue? They all start to look the same. AI can handle structure, but it misses the personal touch. The essence of who you are, the stories, the real results, the unique personality, gets lost in those polished but cookie-cutter templates.

And guess who notices that? >>> Recruiters.

To understand the impact, I recently asked 12 HR managers: “Would you trust a resume written entirely by AI?”
➡️ 7 said, “No.”
➡️ 5 said, “Only if it sounds real.”

So, here’s my advice: use AI to support your resume, but don’t let it replace you. Let AI help with structuring, bullet points, and fine-tuning the language. But don’t let it strip your resume of the real you. HR managers don’t just want a flawless resume—they want one that feels real.

Have you tried using an AI-generated resume? Did it help or hurt your job search? Let me know. 👇🏻

#AI #resume #interviewcoach #CV #resumetips
AI Limitations Overview
Explore top LinkedIn content from expert professionals.
-
You know all those arguments that LLMs think like humans? Turns out it's not true 😱

In our new paper we put this to the test by checking whether LLMs form concepts the same way humans do. Do LLMs truly grasp concepts and meaning analogously to humans, or is their success primarily rooted in sophisticated statistical pattern matching over vast datasets? We used classic cognitive experiments as benchmarks. What we found is surprising... 🧐

We used seminal datasets from cognitive psychology that mapped how humans actually categorize things like "birds" or "furniture" ('robin' as a typical bird). The nice thing about these datasets is that they are not crowdsourced; they're rigorous scientific benchmarks.

We tested 30+ LLMs (BERT, Llama, Gemma, Qwen, etc.) using an information-theoretic framework that measures the trade-off between:
- Compression (how efficiently you organize info)
- Meaning preservation (how much semantic detail you keep)

Finding #1: The Good News
LLMs DO form broad conceptual categories that align with humans significantly above chance. Surprisingly (or not?), smaller encoder models like BERT outperformed much larger models. Scale isn't everything!

Finding #2: But LLMs struggle with fine-grained semantic distinctions. They can't capture "typicality" - like knowing a robin is a more typical bird than a penguin. Their internal concept structure doesn't match human intuitions about category membership.

Finding #3: The Big Difference
Here's the kicker: LLMs and humans optimize for completely different things.
- LLMs: aggressive statistical compression (minimize redundancy)
- Humans: adaptive richness (preserve flexibility and context)

This explains why LLMs can be simultaneously impressive AND miss obvious human-like reasoning. They're not broken - they're just optimized for pattern matching rather than the rich, contextual understanding humans use.

What this means:
- Current scaling might not lead to human-like understanding
- We need architectures that balance compression with semantic richness
- The path to AGI ( 😅 ) might require rethinking optimization objectives

Our paper gives tools to measure this compression-meaning trade-off. This could guide future AI development toward more human-aligned conceptual representations. Cool to see cognitive psychology and AI research coming together!

Thanks to Chen Shani, Ph.D., who did all the work, and Yann LeCun and Dan Jurafsky for their guidance.
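To make the compression-vs-meaning trade-off concrete, here is a minimal Python sketch of the general idea, not the paper's actual information-theoretic framework: cluster a model's embeddings of category members, then score how well the clusters line up with human categories. The names `embeddings` and `human_labels` are placeholders filled with dummy data.

```python
# Minimal sketch (not the paper's framework): how well does a model's embedding
# space reproduce human category structure, and how few clusters does it need?
# `embeddings` and `human_labels` are placeholders -- swap in real model vectors
# and the cognitive-psychology category labels.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_mutual_info_score

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(90, 64))    # e.g. vectors for 90 items ("robin", "chair", ...)
human_labels = np.repeat([0, 1, 2], 30)   # e.g. human categories: bird / furniture / fruit

for k in (2, 3, 5, 10):                   # fewer clusters = more aggressive compression
    model_labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(embeddings)
    # "Meaning preservation" proxy: agreement between model clusters and human categories.
    ami = adjusted_mutual_info_score(human_labels, model_labels)
    print(f"k={k:2d}  alignment with human categories (AMI) = {ami:.3f}")
```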
-
🚨 BIG! Apple's latest paper argues that Large Reasoning Models have significant limitations and COLLAPSE at high complexity. Has AI hit a wall? Was it AI hype all along? 😱

What the paper says:

"Our findings reveal fundamental limitations in current models: despite sophisticated self-reflection mechanisms, these models fail to develop generalizable reasoning capabilities beyond certain complexity thresholds. We identified three distinct reasoning regimes: standard LLMs outperform LRMs at low complexity, LRMs excel at moderate complexity, and both collapse at high complexity. Particularly concerning is the counterintuitive reduction in reasoning effort as problems approach critical complexity, suggesting an inherent compute scaling limit in LRMs. Our detailed analysis of reasoning traces further exposed complexity-dependent reasoning patterns, from inefficient “overthinking” on simpler problems to complete failure on complex ones. These insights challenge prevailing assumptions about LRM capabilities and suggest that current approaches may be encountering fundamental barriers to generalizable reasoning. Finally, we presented some surprising results on LRMs that lead to several open questions for future work. Most notably, we observed their limitations in performing exact computation; for example, when we provided the solution algorithm for the Tower of Hanoi to the models, their performance on this puzzle did not improve. Moreover, investigating the first failure move of the models revealed surprising behaviors. For instance, they could perform up to 100 correct moves in the Tower of Hanoi but fail to provide more than 5 correct moves in the River Crossing puzzle. We believe our results can pave the way for future investigations into the reasoning capabilities of these systems."

👉 Read the paper below.
👉 Never miss my updates and recommended papers: join my newsletter's 63,400+ subscribers.
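For reference, the "solution algorithm for the Tower of Hanoi" that the authors handed to the models is essentially the textbook recursion below (my rendering, not code from the paper). An optimal solution takes 2^n - 1 moves, which is why the disk count makes such a clean complexity knob.

```python
# Textbook recursive Tower of Hanoi (not code from the paper): moving n disks
# takes 2**n - 1 moves, so the optimal trace grows exponentially with n.
def hanoi(n, source="A", target="C", spare="B", moves=None):
    if moves is None:
        moves = []
    if n == 1:
        moves.append((source, target))
        return moves
    hanoi(n - 1, source, spare, target, moves)   # park n-1 disks on the spare peg
    moves.append((source, target))               # move the largest disk
    hanoi(n - 1, spare, target, source, moves)   # bring the n-1 disks back on top
    return moves

print(len(hanoi(7)))   # 127 == 2**7 - 1
```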
-
I’ve never had two editorials in top-tier journals focused on the same paper. It’s flattering, of course — even a pig blushes when praised, as Yattaman once wrote.

But what really struck me about the PNAS commentary is something else: it got the point. Not the easy one. Not the marketable one. Not the usual talk about “how good” or “how human” or “how enabling” these systems are. The real question is more uncomfortable: if the promise is delegation, how do LLMs actually construct a judgment?

Our work is not about accuracy. It is about architecture. Not what they answer, but how answers are produced. And when you look at that closely, the illusion of equivalence collapses. Humans and models may produce similar sentences, similar ratings, similar decisions — but they do not get there through the same cognitive structure. And this is not a technical detail. It’s an ontological fracture.

Human judgment is embodied. It emerges from experience, memory, emotion, context, intention. It is grounded in a life. LLM “judgment” is not. It has no experience, no time, no world. It operates on statistical regularities among symbols, not on events in reality. And yet — this is the trap — the outputs can look the same.

When radically different processes generate indistinguishable language, the problem is no longer technological. It becomes epistemological. This is what we call Epistemia: the moment when linguistic plausibility starts replacing verification, and the form of knowledge substitutes for the labor of knowing. Not because models lie — but because they simulate judgment so well that we forget what judgment actually is.

https://lnkd.in/dcu7cuZm
-
LLMs struggle with rationality in complex game-theory situations, which are very common in the real world. However, integrating structured game-theoretic workflows into LLMs enables them to compute and execute optimal strategies such as Nash equilibria. This will be vital for bringing AI into real-world situations, especially with the rise of agentic AI.

The paper "Game-theoretic LLM: Agent Workflow for Negotiation Games" (link in comments) examines the performance of LLMs in strategic games and how to improve them.

Highlights from the paper:

💡 Strategic Limitations of LLMs in Game Theory: LLMs struggle with rationality in complex game scenarios, particularly as game complexity increases. Despite their ability to process large amounts of data, LLMs often deviate from Nash equilibria in games with larger payoff matrices or sequential decision trees. This limitation suggests a need for structured guidance to improve their strategic reasoning capabilities.

🔄 Workflow-Driven Rationality Improvements: Integrating game-theoretic workflows significantly enhances the performance of LLMs in strategic games. By guiding decision-making with principles like Nash equilibria, Pareto optimality, and backward induction, LLMs showed improved ability to identify optimal strategies and robust rationality even in negotiation scenarios.

🤝 Negotiation as a Double-Edged Sword: Negotiation improved outcomes in coordination games but sometimes led LLMs away from Nash equilibria in scenarios where these equilibria were not Pareto optimal. This reflects a tendency for LLMs to prioritize fairness or trust over strict game-theoretic rationality when engaging in dialogue with other agents.

🌐 Challenges with Incomplete Information: In incomplete-information games, LLMs demonstrated difficulty handling private valuations and uncertainty. Novel workflows incorporating Bayesian belief updating allowed agents to reason under uncertainty and propose envy-free, Pareto-optimal allocations. However, these scenarios highlighted the need for more nuanced algorithms to account for real-world negotiation dynamics.

📊 Model Variance in Performance: Different LLM models displayed varying levels of rationality and susceptibility to negotiation-induced deviations. For instance, model o1 consistently adhered more closely to Nash equilibria than others, underscoring the importance of model-specific optimization for strategic tasks.

🚀 Practical Implications: The findings suggest LLMs can be optimized for strategic applications like automated negotiation, economic modeling, and collaborative problem-solving. However, careful design of workflows and prompts is essential to mitigate their inherent biases and enhance their utility in high-stakes, interactive environments.
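To illustrate the kind of computation such a workflow can delegate to code instead of asking the model to reason its way there, here is a minimal sketch of my own (not the paper's agent workflow): a brute-force search for pure-strategy Nash equilibria in a two-player normal-form game, using classic Prisoner's Dilemma payoffs purely as an example.

```python
# Minimal sketch (not the paper's workflow): brute-force search for pure-strategy
# Nash equilibria in a 2-player normal-form game. Payoffs below are the classic
# Prisoner's Dilemma, chosen only as an illustration.
import itertools

# payoffs[(row_action, col_action)] = (row_payoff, col_payoff)
payoffs = {
    ("cooperate", "cooperate"): (3, 3),
    ("cooperate", "defect"):    (0, 5),
    ("defect",    "cooperate"): (5, 0),
    ("defect",    "defect"):    (1, 1),
}
actions = ["cooperate", "defect"]

def is_nash(row_a, col_a):
    row_pay, col_pay = payoffs[(row_a, col_a)]
    # Nash condition: no unilateral deviation improves either player's payoff.
    row_ok = all(payoffs[(alt, col_a)][0] <= row_pay for alt in actions)
    col_ok = all(payoffs[(row_a, alt)][1] <= col_pay for alt in actions)
    return row_ok and col_ok

equilibria = [p for p in itertools.product(actions, actions) if is_nash(*p)]
print(equilibria)   # [('defect', 'defect')]
```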
-
How much do language models actually think?

A recent paper from Apple, 'The Illusion of Thinking', explores this question by probing the limits of Large Reasoning Models (LRMs) such as Claude 3.7 Sonnet Thinking and DeepSeek-R1. These models aim to improve reasoning by generating long Chain-of-Thought (CoT) traces before producing an answer. Instead of relying on traditional math benchmarks, the authors designed controlled puzzle environments (like Tower of Hanoi and River Crossing) that allow them to systematically vary problem complexity and analyze model behavior step by step.

Key takeaways from the paper:
🔹 Three performance regimes:
→ At low complexity: non-thinking models often outperform LRMs in both accuracy and token efficiency.
→ At medium complexity: LRMs show benefits thanks to more elaborate reasoning traces.
→ At high complexity: both model types collapse (accuracy drops to zero).
🔹 As problems grow more complex, models actually use fewer thinking tokens despite having a sufficient budget, which highlights a possible inference-time scaling limitation.
🔹 On simple tasks, models often reach the correct solution early but then continue generating incorrect or redundant reasoning.
🔹 Even when the correct algorithm is provided in the prompt, models still fail at execution as complexity increases.

The authors raise an important question: are today's LRMs truly engaging in reasoning, or just producing more elaborate pattern completions?

You can read the paper here: https://lnkd.in/dn3GTT66
The image used in the post is taken from the same paper.

Curious to hear your take, especially if you work on reasoning, interpretability, or evaluation design.

#technology #generativeai #artificialintelligence #llms #innovation
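As a rough illustration of what "analyze model behavior step by step" can look like in a puzzle environment (a harness of my own construction, not the paper's), the sketch below simulates Tower of Hanoi and reports the index of the first invalid move in a model-proposed move list.

```python
# Minimal sketch (assumptions, not the paper's harness): simulate Tower of Hanoi
# and report the first invalid move in a model-proposed move list.
def first_invalid_move(n_disks, moves):
    pegs = {"A": list(range(n_disks, 0, -1)), "B": [], "C": []}  # largest disk at the bottom
    for i, (src, dst) in enumerate(moves):
        if not pegs[src]:
            return i  # moving from an empty peg
        disk = pegs[src][-1]
        if pegs[dst] and pegs[dst][-1] < disk:
            return i  # larger disk placed on a smaller one
        pegs[dst].append(pegs[src].pop())
    if pegs["C"] == list(range(n_disks, 0, -1)):
        return None  # solved, every move was legal
    return len(moves)  # legal prefix, but the puzzle is unfinished

# A hypothetical model answer for n=2; move index 3 pulls from an empty peg.
proposed = [("A", "C"), ("A", "B"), ("C", "B"), ("A", "C"), ("B", "C")]
print(first_invalid_move(2, proposed))   # 3
```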
-
DON’T rely on AI to do your research…

Large language models (LLMs) are often praised for their ability to process information and assist with problem-solving, but can they really reason the way we do? The latest study by Apple researchers reveals significant limitations in their capacity for genuine mathematical reasoning - and raises important questions about their reliability in research contexts.

What Apple Found:
1. Inconsistent results: LLMs struggle with variations of the same problem, even at a basic grade-school math level. This variability challenges the validity of current benchmarks like GSM8K, which rely on single-point accuracy metrics.
2. Fragility to complexity: As questions become slightly more challenging, performance drops drastically, exposing a fragile reasoning process.
3. Susceptibility to irrelevant information: When distracting but inconsequential details were included in problems, model performance plummeted by up to 65%. Even repeated exposure to similar questions or fine-tuning couldn’t fix this.
4. Pattern matching ≠ reasoning: The models often “solve” problems by sophisticated pattern matching, not genuine logical understanding.

What this means for research: While LLMs are powerful tools for speeding up certain tasks, their inability to discern critical from irrelevant information, and their reliance on pattern recognition, makes them unreliable for rigorous, logic-based research. This is particularly true in fields like mathematics, engineering, and data-driven sciences, where accuracy and reasoning are non-negotiable.

As exciting as these tools are, they’re not ready to replace human critical thinking (yet?). How do you see AI evolving in research applications?

#research #chemicalengineering #scientist #engineering #professor

PS. Full paper available on arXiv under 2410.05229
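The robustness tests described above boil down to asking the same question many ways. Here is a toy, hypothetical illustration of that idea (not the paper's GSM-Symbolic templates): generate variants of one grade-school problem by swapping names and numbers and optionally injecting an irrelevant detail, then compare a model's answers against the known ground truth.

```python
# Toy illustration (not the paper's templates): generate variants of one
# grade-school problem by swapping names/numbers and optionally injecting an
# irrelevant clause, so the same model can be probed for consistency.
import random

TEMPLATE = ("{name} picks {a} apples on Monday and {b} apples on Tuesday. "
            "{distractor}How many apples does {name} have in total?")
DISTRACTORS = ["", "Five of the apples are slightly smaller than average. "]

def make_variant(seed):
    rng = random.Random(seed)
    a, b = rng.randint(2, 30), rng.randint(2, 30)
    text = TEMPLATE.format(
        name=rng.choice(["Sara", "Liam", "Priya", "Tom"]),
        a=a, b=b,
        distractor=rng.choice(DISTRACTORS),
    )
    return text, a + b   # question plus its ground-truth answer

for s in range(3):
    question, answer = make_variant(s)
    print(answer, "|", question)
```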
-
Why I stopped using ChatGPT on LinkedIn (Confession)

I ran an experiment over the last 8 months on using ChatGPT to improve LinkedIn #engagement. Here are the results:

My understanding of LinkedIn is that the platform wants to be your digital business community. As I've said in the past, unlike other social media platforms, LinkedIn rewards community building and engaging with each other in a meaningful and productive way. So I embarked on a journey to determine if using GPT to create meaningful replies to people would help.

Six months ago, I hired someone to go through my feed daily, copy relevant posts into GPT, and then create meaningful, supportive comments. We tweaked the prompt to think and respond as closely as possible to how I would. Every day, the team spent an hour doing this manually, and we managed to engage with 10X more connections than otherwise.

And guess what happened to my engagement rate? Nothing.

In addition, as ChatGPT got dumber, the quality of the comments declined majorly, and it just felt disingenuous. So we stopped altogether. For the last two months I've engaged less, but each engagement is real and personal.

I know many connections in my feed are still using the GPT method for LinkedIn engagement; would love to hear your thoughts.

Haley Deming Beth Kutscher Graham Walker, MD Spencer Dorn
-
The Illusion of Thinking in LLMs - Apple researchers have spilled the beans on the strengths and limitations of reasoning models. Reasoning models "collapse" beyond certain task complexities.

"The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity" highlights several limitations of Large Language Models (LLMs) and their specialized variants, Large Reasoning Models (LRMs), particularly in the context of reasoning and problem-solving. Below is a list of the key limitations identified by the Apple researchers:

(1) Poor Performance on Reasoning Benchmarks: Earlier iterations of LLMs exhibited poor performance on reasoning benchmarks, indicating fundamental challenges in reasoning capabilities (Page 4, Section 2).

(2) Lack of Generalizable Reasoning: Despite advancements, LLMs and LRMs fail to develop generalizable problem-solving capabilities, especially for planning tasks. Performance collapses to zero beyond certain complexity thresholds in controlled puzzle environments (Page 3, Section 1; Page 11, Section 5).

(3) Data Contamination Issues: Established mathematical and coding benchmarks suffer from data contamination, where models may have been exposed to similar problems during training, skewing performance evaluations (Page 2, Section 1; Page 5, Section 3).

(4) Inefficiency in Low-Complexity Tasks: For simpler, low-compositional problems, standard LLMs demonstrate greater efficiency and accuracy compared to LRMs, suggesting that additional "thinking" mechanisms in LRMs may introduce unnecessary overhead (Page 3, Section 1; Page 7, Section 4.2.1).

(5) Complete Collapse at High Complexity: Both LLMs and LRMs experience complete performance collapse when problem complexity exceeds a critical threshold, indicating a fundamental limitation in handling highly complex, compositionally deep tasks (Page 3, Section 1; Page 8, Section 4.2.2).

(6) Counterintuitive Scaling Limitation: LRMs reduce their reasoning effort (measured by inference-time tokens) as problem complexity increases beyond a certain point, despite having ample token budgets, revealing a scaling limitation in reasoning capabilities (Page 3, Section 1; Page 8, Section 4.2.2). (See the sketch after this list.)

(7) Overthinking Phenomenon: In simpler problems, LLMs and LRMs often identify correct solutions early but continue exploring incorrect alternatives, wasting computational resources in an "overthinking" pattern (Page 3, Section 1; Page 9, Section 4.3).
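Point (6) measures "reasoning effort" as inference-time thinking tokens. Below is a minimal sketch of how one might tally that per problem size; it is my construction, not the paper's measurement code, and the `traces` dict holds placeholder text rather than real reasoning traces.

```python
# Minimal sketch (my construction, not the paper's measurement code): tally
# "reasoning effort" as thinking-token counts per problem size, the quantity
# behind point (6). `traces` maps problem size to a placeholder reasoning trace.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
traces = {
    3: "Move disk 1 to C, then disk 2 to B, then disk 1 to B, ...",
    7: "Let me think carefully about the larger instance before committing ...",
    10: "This looks hard.",
}
for n, trace in sorted(traces.items()):
    print(f"n={n:2d}  thinking tokens = {len(enc.encode(trace))}")
```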
-
Hallucination in large language models (LLMs) has been widely studied, but the key question remains: Can it ever be eliminated?

A recent paper systematically dismantles the idea that hallucination can be fully eradicated. Instead, it argues that hallucination is not just an incidental flaw but an inherent limitation of LLMs.

1️⃣ Hallucination is Unavoidable
The paper establishes that LLMs cannot learn all computable functions, meaning they will inevitably generate incorrect outputs. Even with perfect training data, LLMs cannot always produce factually correct responses due to inherent computational constraints. No matter how much we refine architectures, training data, or mitigation techniques, hallucination cannot be eliminated—only minimized.

2️⃣ Mathematical Proofs of Hallucination
They use concepts from learning theory and diagonalization arguments to prove that any LLM will fail on certain inputs. The research outlines that LLMs, even in their most optimized state, will hallucinate on infinitely many inputs when faced with complex, computation-heavy problems.

3️⃣ Identifying Hallucination-Prone Tasks
Certain problem types are guaranteed to trigger hallucinations due to their computational complexity:
🔹 NP-complete problems (e.g., Boolean satisfiability)
🔹 Presburger arithmetic (exponential complexity)
🔹 Logical reasoning and entailment (undecidable problems)
This means that asking LLMs to reason about intricate logic or mathematical problems will often lead to errors.

4️⃣ Why More Data and Bigger Models Won’t Fix It
A common assumption is that hallucination can be mitigated by scaling—adding more parameters or training data. The paper challenges this notion: while larger models improve accuracy, they do not eliminate hallucination for complex, unsolvable problems.

5️⃣ Mitigation Strategies and Their Limitations
Various techniques have been introduced to reduce hallucinations, but none can completely eliminate them:
✅ Retrieval-Augmented Generation (RAG) – helps provide factual grounding but does not guarantee accuracy.
✅ Chain-of-Thought Prompting – improves reasoning but does not fix fundamental hallucination limits.
✅ Guardrails & External Tools – can reduce risk but require human oversight.
They suggest LLMs should never be used for fully autonomous decision-making in safety-critical applications.

The Bigger Question: How Do We Build Safe AI?
If hallucination is an unavoidable reality of LLMs, how do we ensure safe deployment? The research makes it clear: LLMs should not be blindly trusted. They should be integrated into workflows with:
🔹 Human in the loop
🔹 External fact-checking systems
🔹 Strict guidelines

Are we designing AI with realistic expectations, or are we setting ourselves up for failure by expecting perfection? Should LLMs be used in high-stakes environments despite their hallucinations, or should we rethink their applications?

#ai #artificialintelligence #technology
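In that spirit, external checks can be mechanical. A minimal example of my own (not from the paper): for Boolean satisfiability, one of the hallucination-prone tasks listed above, never take a model's word that a formula is satisfiable; verify the proposed assignment against the formula directly.

```python
# Minimal sketch of an external check (my example, not from the paper): verify an
# LLM-proposed satisfying assignment for a CNF formula instead of trusting it.
# A clause is a list of integer literals; a negative literal means the variable is negated.
def satisfies(cnf, assignment):
    """True iff `assignment` (var -> bool) makes every clause true."""
    return all(
        any(assignment[abs(lit)] == (lit > 0) for lit in clause)
        for clause in cnf
    )

cnf = [[1, 2], [-1, 3], [-2, -3]]          # (x1 or x2) and (not x1 or x3) and (not x2 or not x3)
llm_claim = {1: True, 2: False, 3: True}   # assignment the model might return
print(satisfies(cnf, llm_claim))           # True -> the claim checks out; False -> hallucination
```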