LLMs in the service of Active Perception

Perception and action (e.g. planning and control) are fundamental to intelligence, yet they are often studied separately in robotics. Traditionally, perception transforms sensory signals into symbolic representations, while action relies on symbolic models to generate movement. This approach assumes perception provides a static 3D model for planning, but it does not reflect biological reality. In nature, perception and action co-evolve, forming an interwoven process. For example, when cutting a tomato, one must first perceive and locate the knife, then grasp the knife, then perceive the tomato, then bring the knife to the tomato, perform the cutting motion, and finally verify the cut.

These interleaved sequences of perception and action can be captured in Perception-Action-Coordination (PAC) programs (consisting of both perceptual and motor functions), which serve as modular, compositional building blocks of intelligence, enabling counterfactual reasoning and control. By shifting from traditional planning to PAC programs, we can integrate active perception and visual reasoning, which Large Language Models (LLMs) can help structure. Unlike natural language, programs have a clear grammar, making them ideal for LLMs.

A key application is in attribute learning, where robots learn about object properties (e.g., weight, size) by interacting with them rather than relying on static datasets, which do not scale. Vision-Language Models (VLMs) may align linguistic instructions with visual information but fail to grasp non-visual attributes like weight. To address this, the proposed framework combines LLMs, VLMs, and robotic control functions to generate PAC programs that actively explore object attributes. These programs invoke sensory, manipulation, and navigation functions, allowing robots to reason beyond visual perception and understand an object’s properties through interaction. This approach moves towards more intelligent robots that can see, think, and act to explore their environments.

The work, titled “Discovering Object Attributes by Prompting Large Language Models with Perception-Action APIs” by @Angelos Mavrogiannis, @Dehao Yuan, and @Yiannis Aloimonos, will be presented at ICRA 2025 in Atlanta.

Project Page: https://lnkd.in/eff8grGq
arXiv: https://lnkd.in/e2dDEFz3

#AI #Robotics #LLM #VLM #ActivePerception
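As an illustration of what such a program might look like, here is a minimal sketch of a PAC-style program for discovering a non-visual attribute (weight) through interaction. The helper functions it calls (detect, go_to, pick_up, read_wrist_force) are hypothetical stand-ins for sensory, navigation, and manipulation primitives, not the paper's actual Perception-Action API.

```python
# Minimal sketch of a PAC-style program for discovering a non-visual
# attribute (weight) through interaction. The robot/VLM methods below are
# hypothetical placeholders used only to show how perceptual and motor
# functions interleave; they are not the paper's actual API.

from dataclasses import dataclass


@dataclass
class Observation:
    object_id: str
    attribute: str
    value: float


def discover_weight(robot, vlm, target: str) -> Observation:
    """Interleave perception and action to estimate an object's weight."""
    pose = vlm.detect(target)              # perceptual function: locate the object
    robot.go_to(pose)                      # navigation function
    robot.pick_up(pose)                    # manipulation function
    force_n = robot.read_wrist_force()     # sensory function: non-visual signal
    robot.put_down(pose)
    # Convert the measured force (N) to an estimated mass (kg) via F = m * g.
    return Observation(target, "weight_kg", force_n / 9.81)
```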
Role of Perception in Robotic Planning
Summary
The role of perception in robotic planning refers to how robots use sensors and artificial intelligence to interpret their surroundings and make decisions about movement, tasks, or interactions, blending sight, touch, and reasoning into smarter actions. This concept is reshaping robotics, allowing machines to adapt in real time and learn about their environment, much like humans do.
- Combine sensing and action: Encourage robotic systems to use both sensory input and movement together, so they can handle complex, unpredictable tasks.
- Prioritize adaptive strategies: Guide robots to update their plans as they receive new information, making them more flexible and reliable in changing environments (see the loop sketch after this list).
- Integrate semantic reasoning: Support the use of AI models that help robots not just see objects but also understand their meaning and context for better decision-making.
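To make the adaptive-strategies point concrete, the sketch below shows a bare-bones sense-plan-act loop that replans whenever new observations invalidate the current plan. Every robot method in it is a hypothetical placeholder, not a specific robotics API.

```python
# Minimal sense-plan-act loop with replanning. robot.sense(), plan(),
# plan_still_valid(), execute_step(), and goal_reached() are hypothetical
# placeholders for a robot's perception, planner, and controller.

def run_adaptive_task(robot, goal, max_steps: int = 100) -> bool:
    """Replan whenever fresh sensor data invalidates the current plan."""
    plan = []
    for _ in range(max_steps):
        world_state = robot.sense()                  # fresh perception each cycle
        if robot.goal_reached(world_state, goal):
            return True
        if not plan or not robot.plan_still_valid(plan, world_state):
            plan = robot.plan(world_state, goal)     # adapt to new information
        robot.execute_step(plan.pop(0))              # act on the next step
    return False
```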
From Rigid Paths to Adaptive Perception: How Robots Learn to See

Not long ago, industrial robots relied on rigidity: fixed jigs, calibrated paths, and zero tolerance for variation. Efficient, yes, but brittle. Today, perception has changed that. Robots now see and adapt. Cameras, depth sensors, and AI models allow them to align, grasp, and inspect without human correction.

At Black Coffee Robotics, we have worked across this evolution:
• Using ICP for millimeter-level AMR docking and part alignment in welding systems
• Applying RANSAC to identify waterlines and docks from noisy marine LiDAR data
• Combining SIFT/ORB with geometric solvers for cluttered object localization
• Deploying YOLO and ClipSeg for assistive and construction robots
• Experimenting with Vision-Language-Action models for zero-shot task execution in simulation

Our latest blog explains these techniques, how they are used in practice, and how teams can select the right perception approach for their own robotic applications. It concludes with a case study of a vision-guided welding system that adapts its weld path in real time using perception.

Read the full post and watch the welding demo: https://lnkd.in/gp6jQS3r

#robotics #AI #perception #ros2 #isaacsim
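For a feel of two of the classical techniques mentioned above (RANSAC plane fitting and ICP alignment), here is a brief, illustrative Open3D sketch. The file names, thresholds, and the docking scenario are placeholders chosen for the example, not values or code from the blog post.

```python
# Illustrative sketch: RANSAC plane removal followed by ICP alignment with
# Open3D. "dock_scan.pcd" and "dock_model.pcd" are hypothetical file names,
# and the thresholds are untuned example values.

import numpy as np
import open3d as o3d

scan = o3d.io.read_point_cloud("dock_scan.pcd")     # noisy LiDAR scan
model = o3d.io.read_point_cloud("dock_model.pcd")   # reference model of the dock

# RANSAC: fit the dominant plane (e.g. floor or water surface) and drop it,
# keeping only the structure of interest.
plane, inliers = scan.segment_plane(distance_threshold=0.05,
                                    ransac_n=3,
                                    num_iterations=1000)
structure = scan.select_by_index(inliers, invert=True)

# ICP: refine the alignment of the remaining points against the reference
# model, starting from an identity initial guess.
result = o3d.pipelines.registration.registration_icp(
    structure, model,
    max_correspondence_distance=0.1,
    init=np.eye(4),
    estimation_method=o3d.pipelines.registration.TransformationEstimationPointToPoint())

print("Estimated docking transform:\n", result.transformation)
```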
From Foundation to Reflex: The New Standard for Next-Gen, Scalable, and Assured Autonomy

For more than a decade, autonomous vehicle development has been dominated by two opposing design philosophies: modular pipelines and end-to-end learning. Modular architectures, which break autonomy into separate perception, planning, and control blocks, offer explainability and task-specific optimization. However, they often suffer from integration complexity, brittle hand-offs between modules, and challenges in generalizing across domains. In contrast, pure end-to-end models simplify the system, directly mapping sensor inputs to driving commands. But such systems lack semantic understanding and are often unable to explain or adapt their behavior in unstructured or novel scenarios.

A new architectural paradigm is emerging that aims to combine the strengths of both approaches. This involves integrating global scene understanding models with reflexive control layers. At its core, the architecture leverages semantic understanding of the environment, extracted via large-scale pre-trained models capable of processing multimodal inputs such as images, LiDAR, and radar. These models generalize across environments and provide high-level interpretations like lane topology, object classification, and actor behaviors. Crucially, this semantic representation becomes the structured input for control layers responsible for direct action generation. In essence, autonomous systems begin to drive based on what they understand, not just what they sense.

Equally critical is the control side of the architecture. Reflexive control networks designed for low-latency, real-time actuation process environmental context and generate precise motion commands. These networks are optimized for direct control objectives such as collision avoidance, energy-efficient routing, and path stability. By decoupling scene interpretation from control logic, this approach allows perception and control components to evolve independently, while their fusion ensures coordinated, real-time decision-making. This structure also supports adaptive behaviors in complex, partially structured environments like logistics yards, ports, and industrial campuses, where fixed HD maps or conventional rule-based planners are inadequate.

This architectural shift from “foundation to reflex” is no longer theoretical. It represents a practical, scalable strategy for deploying autonomous systems in real-world environments where both generalized intelligence and specialized control are required. In the years ahead, the autonomous systems that lead will not be those that think the best, nor those that react the fastest, but those that can do both: thinking like a human, reacting like a machine.

#AutonomousVehicles #FoundationModels #ScalableAutonomy #ReflexiveControl #EndToEndAI #NextGenAutonomy #RealTimeAI #SemanticPerception #AIInnovation #IntelligentMobility
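As a rough illustration of the decoupling described above, the sketch below separates a slow semantic-perception layer from a fast reflexive controller that consumes its structured output. All class and method names are hypothetical and do not refer to any specific product or framework.

```python
# Minimal sketch of the "foundation to reflex" split: a slow semantic layer
# produces a structured scene summary, and a fast reflexive controller maps
# (scene, ego state) to actuation. Everything here is a hypothetical
# illustration, not a real autonomy stack.

from dataclasses import dataclass, field


@dataclass
class SemanticScene:
    """Structured output of the foundation-model layer."""
    lane_topology: dict = field(default_factory=dict)
    objects: list = field(default_factory=list)          # classified actors with poses
    predicted_behaviors: dict = field(default_factory=dict)


class FoundationPerception:
    """Slow path: multimodal scene understanding (camera + LiDAR + radar)."""
    def update(self, images, lidar, radar) -> SemanticScene:
        # A large pre-trained multimodal model would run here; stubbed for the sketch.
        return SemanticScene()


class ReflexController:
    """Fast path: low-latency control on the latest semantic snapshot."""
    def act(self, scene: SemanticScene, ego_state) -> dict:
        # Collision avoidance and path-stability logic would go here.
        return {"steer": 0.0, "throttle": 0.0, "brake": 0.0}


def control_loop(sensors, ego, perception: FoundationPerception, controller: ReflexController):
    """Run the reflex layer every tick while refreshing semantics at a lower rate."""
    scene = SemanticScene()                      # start from an empty scene
    while True:
        if sensors.new_keyframe():               # slow path: semantic refresh
            scene = perception.update(sensors.images(), sensors.lidar(), sensors.radar())
        yield controller.act(scene, ego.state()) # fast path: reflexive actuation
```

Keeping the two layers behind this narrow interface is what lets them evolve independently, as the post argues: the semantic model can be retrained or swapped without touching the real-time control code.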