For robot dexterity, a missing piece is general, robust perception. Our new Science Robotics article combines multimodal sensing with neural representations to perceive novel objects in-hand. See it on the cover of the November issue! https://lnkd.in/ezZRs5dN

We estimate pose and shape by learning neural field models online from a stream of vision, touch, and proprioception. The frontend performs robust segmentation and depth prediction for both vision and touch; the backend fuses this information into a neural field while also optimizing for pose. Vision-based touch (digit.ml/digit) captures contact geometries as images, and we train an image-to-depth tactile transformer in simulation. For visual segmentation, we combine powerful foundation models (SAMv1) with robot kinematics. The system also doubles as a multimodal pose tracker when provided CAD models of the objects at runtime.

Across different levels of occlusion, we find that "touch, at the very least, refines and, at the very best, disambiguates visual estimates during in-hand manipulation."

We release a large dataset of real-world and simulated visuo-tactile interactions, along with the tactile transformer models, on Hugging Face: bit.ly/hf-neuralfeels

This has been in the pipeline for a while, thanks to my amazing collaborators from AI at Meta, Carnegie Mellon University, University of California, Berkeley, Technische Universität Dresden, and CeTI: Haozhi Qi, Tingfan Wu, Taosha Fan, Luis Pineda, Mike Maroje Lambeta, Jitendra Malik, Mrinal Kalakrishnan, Roberto Calandra, Michael Kaess, Joseph Ortiz, and Mustafa Mukadam.

Paper: https://lnkd.in/ezZRs5dN
Project page: https://lnkd.in/dCPCs4jQ
#ScienceRoboticsResearch
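To make the frontend/backend split above concrete, here is a minimal sketch, assuming PyTorch and already-segmented surface points from vision and touch expressed in a common frame. The names (`NeuralSDF`, `pose_delta`) and the translation-only pose correction are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (not the authors' code): jointly optimizing a neural SDF and a
# pose correction from a stream of fused visual + tactile surface points.
import torch
import torch.nn as nn

class NeuralSDF(nn.Module):
    """Small MLP mapping 3D points to signed-distance values."""
    def __init__(self, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, pts):            # pts: (N, 3)
        return self.net(pts).squeeze(-1)

sdf = NeuralSDF()
pose_delta = torch.zeros(3, requires_grad=True)   # translation-only pose correction, for illustration
opt = torch.optim.Adam([*sdf.parameters(), pose_delta], lr=1e-3)

def online_step(surface_pts):
    """One online update: observed surface points should lie on the SDF's zero level set.
    A real system would add free-space and regularization terms to avoid trivial solutions."""
    opt.zero_grad()
    pts_obj = surface_pts + pose_delta            # move measurements into the object frame
    loss = sdf(pts_obj).abs().mean()              # pull the zero level set onto the observed surface
    loss.backward()
    opt.step()
    return loss.item()

# Example: feed a stream of fused visuo-tactile surface points.
for _ in range(100):
    online_step(torch.randn(256, 3) * 0.05)
```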
Artificial Sensory Systems for Robotics Applications
Explore top LinkedIn content from expert professionals.
Summary
Artificial sensory systems for robotics are technologies that enable robots to “see,” “feel,” and “hear” through sensors that mimic human senses, allowing more precise and intelligent interaction with their environment. These systems combine data from sensors such as cameras, tactile pads, and microphones to help robots perform delicate tasks, understand objects, and adapt their actions in real time.
- Mix sensory data: Combine visual, tactile, and auditory information to teach robots to recognize and handle objects with greater accuracy and care (a minimal fusion sketch follows this list).
- Refine grip control: Use advanced fingertip sensors that detect both quick touches and sustained pressure so robots can safely grasp fragile items without damage.
- Train with feedback: Guide robotic hands to improve their dexterity by using human demonstrations and feedback, helping them learn how to gently and adaptively manipulate different objects.
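As a concrete illustration of the first point, here is a generic late-fusion sketch in PyTorch. The module name, feature dimensions, and weights are illustrative assumptions and are not tied to any of the systems featured below.

```python
# Generic sketch: late fusion of visual, tactile, and audio features into one
# embedding that a recognition or grasping head could consume.
import torch
import torch.nn as nn

class MultimodalFusion(nn.Module):
    def __init__(self, d_vision=512, d_touch=64, d_audio=128, d_out=256):
        super().__init__()
        # Project each modality into a shared width, then mix with a small head.
        self.proj = nn.ModuleDict({
            "vision": nn.Linear(d_vision, d_out),
            "touch": nn.Linear(d_touch, d_out),
            "audio": nn.Linear(d_audio, d_out),
        })
        self.head = nn.Sequential(nn.ReLU(), nn.Linear(3 * d_out, d_out))

    def forward(self, feats):  # feats: dict of per-modality tensors (B, d_mod)
        parts = [self.proj[name](x) for name, x in feats.items()]
        return self.head(torch.cat(parts, dim=-1))

fusion = MultimodalFusion()
out = fusion({"vision": torch.randn(2, 512),
              "touch": torch.randn(2, 64),
              "audio": torch.randn(2, 128)})
print(out.shape)  # torch.Size([2, 256])
```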
-
Ever wondered what robots 🤖 could achieve if they could not just see, but also feel and hear? We introduce FuSe: a recipe for finetuning large vision-language-action (VLA) models with heterogeneous sensory data such as vision, touch, sound, and more.

We use language instructions to ground all sensing modalities by introducing two auxiliary losses. We find that naively finetuning on a small-scale multimodal dataset results in the VLA over-relying on vision and ignoring the much sparser tactile and auditory signals. With FuSe, pretrained generalist robot policies finetuned on multimodal data consistently outperform baselines finetuned only on vision data. This is particularly evident in tasks with partial visual observability, such as grabbing objects from a shopping bag.

FuSe policies reason jointly over vision, touch, and sound, enabling tasks such as multimodal disambiguation, generation of object descriptions upon interaction, and compositional cross-modal prompting (e.g., “press the button with the same color as the soft object”). Moreover, the same general recipe applies to generalist policies with diverse architectures, including a large 3B VLA with a PaliGemma vision-language-model backbone.

We open-source the code, the models, and the dataset, which comprises 27k (!) action-labeled robot trajectories with visual, inertial, tactile, and auditory observations.

This work is the result of an amazing collaboration at Berkeley Artificial Intelligence Research with the other co-leads Joshua Jones and Oier Mees, as well as Kyle Stachowicz, Pieter Abbeel, and Sergey Levine!

Paper: https://lnkd.in/dDU-HZz9
Website: https://lnkd.in/d7A76t8e
Code: https://lnkd.in/d_96t3Du
Models and dataset: https://lnkd.in/d9Er5Jsx
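The post does not spell out the two auxiliary losses, but one common way to ground observations in language is a CLIP-style contrastive objective added to the usual action-prediction loss. The sketch below is an assumption-level illustration (the function name, weight, and temperature are made up), not the FuSe implementation.

```python
# Illustrative sketch: a contrastive auxiliary loss that ties per-sample
# multimodal embeddings to their language instructions, so the policy cannot
# simply ignore the sparser touch and sound signals.
import torch
import torch.nn.functional as F

def grounding_loss(modality_emb, language_emb, temperature=0.07):
    """Symmetric InfoNCE between observation and instruction embeddings."""
    modality_emb = F.normalize(modality_emb, dim=-1)   # (B, D)
    language_emb = F.normalize(language_emb, dim=-1)   # (B, D)
    logits = modality_emb @ language_emb.t() / temperature
    targets = torch.arange(logits.size(0))
    # Each observation should match its own instruction, and vice versa.
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))

# Example with dummy fused (vision+touch+audio) and text embeddings.
obs = torch.randn(8, 256)
txt = torch.randn(8, 256)
action_loss = torch.tensor(0.0)                      # placeholder for the policy's action loss
total_loss = action_loss + 0.1 * grounding_loss(obs, txt)
print(total_loss.item())
```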
-
I often get asked: how can robots sense and control the pressure needed to grab different objects? The answer lies at the intersection of vision models, VLA systems, and tactile sensing.

SpikeATac combines two complementary types of sensing:
✨ Dynamic sensing, using a PVDF film that detects rapid pressure changes (the instant of contact).
✨ Static sensing, using capacitive sensors to measure sustained forces (the firmness of a grip).

The result? A fingertip that can distinguish between brushing against glass and holding it firmly, just like a human finger. In tests, this system could grasp fragile materials (like seaweed sheets) at high speed without damage, something traditional pressure sensors fail to do.

Even more impressive: the robot hand was trained using reinforcement learning with human feedback. A base policy learned from demonstrations, then refined its tactile sensitivity through human-labeled feedback (“good” vs. “bad” grasps), learning over time to make softer, more adaptive grips.

This kind of breakthrough bridges the gap between perception and action, a step closer to robots that don’t just see, but also feel.
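A toy control loop shows how a fast transient channel and a sustained-force channel can work together: the transient spike stops the fast closing motion at first contact, then the sustained reading regulates a gentle grip. The thresholds, gains, and function names below are hypothetical, not SpikeATac's firmware or calibration values.

```python
# Toy sketch: fuse a dynamic (transient) signal with a static (sustained-force)
# signal to stop closing at contact and then hold a light, steady grip.
CONTACT_SPIKE = 0.5   # assumed threshold on the dynamic channel
TARGET_FORCE = 0.2    # assumed desired sustained force, arbitrary units
GAIN = 0.05           # proportional gain for grip adjustment

def grip_update(dynamic_signal, static_force, closing, grip_cmd):
    """One control tick; returns the updated (closing, grip_cmd) state."""
    if closing and dynamic_signal > CONTACT_SPIKE:
        closing = False                                    # spike = first contact: stop fast closing
    if closing:
        grip_cmd += 0.01                                   # keep closing until contact is felt
    else:
        grip_cmd += GAIN * (TARGET_FORCE - static_force)   # regulate a gentle sustained force
    return closing, max(0.0, grip_cmd)

closing, grip = True, 0.0
for dyn, stat in [(0.0, 0.0), (0.7, 0.05), (0.1, 0.15), (0.0, 0.22)]:
    closing, grip = grip_update(dyn, stat, closing, grip)
    print(closing, round(grip, 3))
```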
-
Tactile and visual perception are essential for humans to interact precisely with their surroundings, and robots can benefit from similar sensing abilities to improve their manipulation skills. Researchers from Columbia University, the University of Illinois Urbana-Champaign, and the University of Washington introduced 3D-ViTac, a system that uses both touch and sight to help robots handle objects skillfully with two hands.

The system includes low-cost, flexible tactile sensors that cover small areas, each only 3 mm², providing detailed information on physical contact that complements what the robot sees. To make the most of this data, touch and vision are combined into a single 3D space, keeping their shapes and positions intact, and this unified representation then guides the robot's imitation learning.

In tests, the researchers show that even affordable robots can handle objects with great precision, outperforming vision-only systems, especially when handling delicate items and performing tasks that require careful manipulation.

📝 Research Paper: https://lnkd.in/gdGXk2xN
📊 Project Page: https://lnkd.in/g9tX-DYM
🎥 Project Video: https://lnkd.in/gnnZ8B2T
#robotics #research
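One simple way to picture "combining touch and sight into a single 3D space" is to merge both into one point cloud where every point carries a modality tag. The sketch below assumes that layout (and that taxel positions are already known in the camera frame); it is not the 3D-ViTac code.

```python
# Minimal sketch: merge a visual point cloud with tactile taxel readings into
# one array of (x, y, z, modality, signal) rows that a policy can consume.
import numpy as np

def fuse_visuo_tactile(visual_pts, taxel_pts, taxel_pressure):
    """visual_pts: (N, 3); taxel_pts: (M, 3) taxel positions in the same frame;
    taxel_pressure: (M,) contact readings. Returns an (N + M, 5) array."""
    vis = np.concatenate([visual_pts,
                          np.zeros((len(visual_pts), 1)),           # modality 0 = vision
                          np.zeros((len(visual_pts), 1))], axis=1)  # no pressure for visual points
    tac = np.concatenate([taxel_pts,
                          np.ones((len(taxel_pts), 1)),             # modality 1 = touch
                          taxel_pressure[:, None]], axis=1)
    return np.concatenate([vis, tac], axis=0)

cloud = fuse_visuo_tactile(np.random.rand(1000, 3), np.random.rand(16, 3), np.random.rand(16))
print(cloud.shape)  # (1016, 5)
```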
-
Rodney Brooks recently reminded us why today’s humanoids are still unable to perform basic tasks, and why dexterity doesn’t “emerge” from scale. Rather, it demands structure: new sensors, new AI, and new datasets that go beyond what everyone else is doing. At Analog Devices, we agree with this basic observation and are taking on the challenge with a full-stack tactile intelligence platform built to collapse the boundary between sensing, compute, AI, and action. Unlike much of the field, we do not treat sensors as passive observers feeding AI models that never touch the physics. One example is embedded reflexes: systems that feel before they think and act without waiting for central compute or the cloud. #PhysicalIntelligence #Robotics #EmbeddedAI #TactileSensing #RodneyBrooks #AnalogDevices https://lnkd.in/efBAuATr Tao Yu
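The post does not describe how an "embedded reflex" is built, but the idea of acting locally before involving higher-level compute can be sketched as below. The threshold, loop rate, and stub functions are hypothetical, not an Analog Devices API.

```python
# Conceptual sketch: a local reflex loop that reacts to a slip signature
# on-device and only afterwards notifies the higher-level planner.
SLIP_THRESHOLD = 0.3   # assumed normalized slip/vibration level

def reflex_loop(read_slip, tighten_grip, notify_planner, ticks=5):
    """Fast local loop; the three callables stand in for hardware hooks."""
    for _ in range(ticks):
        if read_slip() > SLIP_THRESHOLD:
            tighten_grip(0.02)              # act immediately on-device
            notify_planner("slip_reflex")   # inform higher-level control afterwards

# Example with simple stubs standing in for sensor and actuator drivers.
readings = iter([0.1, 0.4, 0.2, 0.5, 0.0])
reflex_loop(lambda: next(readings),
            lambda step: print(f"tighten by {step}"),
            lambda event: print(f"notify: {event}"))
```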