AI-generated summary
Artificial intelligence (AI) is undergoing a significant transformation from purely data-driven digital systems to embodied models that interact physically with their environment. This shift, discussed at the Bankinter Innovation Foundation’s Future Trends Forum, opens new opportunities and challenges across multiple sectors. A central issue highlighted by Ramón López de Mántaras, a leading AI researcher, is AI’s current lack of common sense and genuine understanding of the world, unlike humans who learn through direct sensory experience and interaction. Present AI systems excel in specific tasks but rely solely on data correlations, lacking the ability to reason about causal relationships or generalize knowledge to unfamiliar situations. This limitation is particularly problematic in unstructured, real-world environments where flexible learning and autonomous knowledge generation are essential.
To overcome these challenges, López de Mántaras advocates for the development of physical or embodied AI, which integrates sensory interaction with the environment to build structured knowledge and understanding akin to human cognition. This approach aligns with philosophical perspectives emphasizing the body’s role in intelligence and is supported by projects like DeepMind’s PLATO, which learns causal concepts through observation. While promising as a path toward artificial general intelligence (AGI), embodied AI faces technical hurdles including hardware constraints, sensory integration, and the need for flexible abstraction and knowledge transfer. Although still in early stages, advancing physical AI is crucial for creating systems that move beyond data processing to genuinely understand and interact with the world.
Artificial intelligence has made amazing advances, but it has yet to solve one of its fundamental problems: the lack of common sense. Ramón López de Mántaras discusses why AI needs a physical body to understand the world and how embodied AI could be the key to achieving more advanced intelligence.
Artificial intelligence is in the midst of a transformation. From purely digital, data-driven systems, it is evolving towards models that interact with the physical world, which opens up new possibilities and challenges across multiple sectors. In this context, the Bankinter Innovation Foundation’s Future Trends Forum brought together leading experts to examine the rise of physical AI.
In this series of articles, we’ve explored the perspectives of leading specialists such as Jeremy Kahn and Antonio Damasio on the future of physical AI. Now, we delve into the vision of Ramón López de Mántaras, who analyzes one of the great limitations of current artificial intelligence: its lack of common sense and its inability to understand the world as humans do.
You can see the full presentation by Ramón López de Mántaras at the Future Trends Forum here:
Can AI understand the world like humans?
Artificial intelligence has achieved impressive levels of performance in specific tasks, from natural language processing to computer vision and decision-making in defined environments. However, a fundamental difference persists between how humans understand the world and how AI does. Ramón López de Mántaras, Research Professor Emeritus at the CSIC and former director of the CSIC’s Institute for Research in Artificial Intelligence, warns that AI lacks an essential element in human cognition: common-sense knowledge.
Common sense allows humans to interpret reality without the need for explicit prior data. We don’t need to process millions of examples to know that if we drop a glass, it will fall to the ground and probably break. This ability arises from combining our direct experience of the environment with the integration of multiple sources of sensory information. By contrast, AI systems rely entirely on the data they have been trained on and lack a real understanding of the causal relationships between events in the world.
López de Mántaras stresses that the problem is not only technical, but also philosophical and epistemological. While humans build their knowledge through interaction with the environment, AI relies on statistical correlations within a dataset. This creates obvious limitations: a language model can generate sophisticated answers, but if asked about a situation outside its training, it is likely to fail outright or simply produce a plausible but incorrect answer.
The researcher notes that this lack of understanding becomes critical in applications where AI must operate in unstructured environments. A human can quickly infer that water spilled on the floor makes it dangerous to walk there. A traditional AI robot, on the other hand, would need to be explicitly programmed to recognize that danger, or to have been trained on data that includes enough similar cases. The ability to generalize remains an unsolved challenge in AI.
In addition, López de Mántaras stresses that current AI cannot build knowledge autonomously. In humans, learning is not based only on the accumulation of data, but on the ability to experiment, to abstract concepts, and to apply what has been learned to new situations. In contrast, current AI systems cannot make discoveries on their own or generate hypotheses from direct experience. They can only extrapolate from previously seen data patterns, which limits their ability to respond flexibly to unprecedented situations.
To overcome these limitations, the researcher proposes that AI must move towards systems that integrate a model of the world based on physical interaction with its environment. Without this capability, AI will remain a powerful tool, but fundamentally different from human intelligence.
The problem of common sense in AI
Large language models have created the illusion that AI understands the world. However, as López de Mántaras explains, these systems are limited to recombining linguistic patterns learned from large volumes of text, without a true capacity for reasoning. This, together with the phenomenon known as data contamination, in which evaluation questions leak into the training data, calls into question whether these models really “think” or simply repeat information they have already seen.
The researcher cites recent studies showing that, when faced with questions about hypothetical events not included in their training data, AI performance drops dramatically. This suggests that current models do not have structured knowledge of the world, but instead depend on the presence of matching patterns in their training data.
This problem, known as the AI measurement problem, has led experts such as Demis Hassabis and Yann LeCun to recognize that current models are not the path to artificial general intelligence. Instead, López de Mántaras proposes an alternative: integrating AI into the physical world. Another avenue that has gained interest in recent years is the development of neuro-symbolic architectures, which combine the power of machine learning with traditional symbolic approaches to improve AI’s reasoning ability. These architectures seek to integrate knowledge representation and adaptive learning, allowing systems not only to recognize patterns, but also to reason about them. This merging of approaches could help overcome the limitations of purely connectionist models and bring AI closer to a true understanding of the world.
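To make the neuro-symbolic idea more concrete, here is a minimal, hypothetical Python sketch, not taken from any system mentioned in the talk: a stand-in for a neural perception model emits symbolic facts with confidence scores, and a tiny forward-chaining rule engine derives new conclusions from them. The predicates, rules, and threshold are invented purely for illustration.

```python
# Minimal, hypothetical sketch of a neuro-symbolic loop (not from the talk):
# a stand-in "neural" perception stage emits symbolic facts with confidences,
# and a tiny forward-chaining rule engine reasons over them. The predicates,
# rules, and threshold are invented for illustration only.

from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    predicate: str      # e.g. "wet"
    subject: str        # e.g. "floor"
    confidence: float   # score that a real perception model would produce


def neural_perception(frame_id: str) -> list[Fact]:
    """Stand-in for a trained neural network mapping raw input to facts."""
    # A real system would run a model on the frame; here the output is fixed.
    return [Fact("wet", "floor", 0.92), Fact("person", "hallway", 0.88)]


# Hand-written symbolic rules: (required predicates) -> derived predicate.
RULES = [
    ({"wet"}, "slippery"),
    ({"slippery", "person"}, "hazard"),
]


def symbolic_reasoner(facts: list[Fact], threshold: float = 0.5) -> set[str]:
    """Forward-chain the rules until no new predicates can be derived."""
    known = {f.predicate for f in facts if f.confidence > threshold}
    changed = True
    while changed:
        changed = False
        for premises, conclusion in RULES:
            if premises <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known


if __name__ == "__main__":
    facts = neural_perception("frame_001")
    print(symbolic_reasoner(facts))  # {'wet', 'person', 'slippery', 'hazard'}
```

The division of labour is the point: the learned component absorbs noisy perception, while the symbolic component carries explicit, inspectable reasoning steps that can be applied to combinations of facts it has never seen paired in the training data.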
Embodied AI: the role of the body in intelligence
The idea that intelligence requires physical interaction with the environment is not new. John Locke, in the seventeenth century, stated that human knowledge is based on sensory experience, which implies that without direct interaction with the world, it is not possible to develop a real understanding. This view was reinforced centuries later by Maurice Merleau-Ponty, one of the leading exponents of phenomenology, who stressed that the body is not just an object in the world, but the medium through which we understand it. In a similar vein, Ludwig Wittgenstein posited that “the limits of my language are the limits of my world,” suggesting that our ability to conceptualize reality is directly linked to the way we experience it.
For AI to achieve a true understanding of the world, it must have a multisensory body that allows it to interact with its environment and learn in a similar way to humans. Inspired by this idea, López de Mántaras developed a project in which a humanoid robot learned cause-effect relationships by touching a keyboard and listening to the resulting sounds. This study showed that robots can acquire structured knowledge about the world from physical interaction, a fundamental principle of human learning according to Jean Piaget.
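As a rough illustration of this kind of sensorimotor learning, the following Python sketch, which is not the actual project code, has a simulated robot press keys on a toy keyboard, listen to what follows, and accumulate action-outcome counts until a reliable cause-and-effect mapping emerges. The keyboard, the notes, and the noise rate are all invented for the example.

```python
# Illustrative sketch only (not the actual project code): a simulated robot
# presses keys, listens to the sound that follows, and keeps action-outcome
# counts until a reliable cause-and-effect mapping emerges. The keyboard,
# notes, and noise rate are invented.

import random
from collections import defaultdict

# Hypothetical "world": each key deterministically produces a note, but
# unrelated background noise sometimes masks it.
KEY_TO_NOTE = {"key_C": "note_C", "key_D": "note_D", "key_E": "note_E"}


def press_and_listen(key: str) -> str:
    """One sensorimotor step: act (press a key) and perceive (hear a sound)."""
    if random.random() < 0.1:          # 10% of trials: distracting noise
        return "background_noise"
    return KEY_TO_NOTE[key]


def learn_contingencies(trials: int = 300) -> dict[str, tuple[str, float]]:
    """Self-generated experiments: count which sound follows which key."""
    counts: dict[str, dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for _ in range(trials):
        key = random.choice(list(KEY_TO_NOTE))   # the robot chooses its action
        counts[key][press_and_listen(key)] += 1
    # For each action, keep the outcome that most reliably follows it.
    beliefs = {}
    for key, outcomes in counts.items():
        sound, n = max(outcomes.items(), key=lambda kv: kv[1])
        beliefs[key] = (sound, n / sum(outcomes.values()))
    return beliefs


if __name__ == "__main__":
    for key, (sound, reliability) in learn_contingencies().items():
        print(f"{key} -> {sound} (followed in {reliability:.0%} of presses)")
```

Even this toy version captures the core idea: the knowledge is generated by the agent's own actions rather than supplied as a labelled dataset.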
This concept is aligned with what has been proposed in other presentations at the Future Trends Forum. Jeremy Kahn discussed how world models allow AI to interpret its environment and act more autonomously. Antonio Damasio went further, pointing out that without a biological basis, AI cannot develop genuine consciousness or emotions.
Is physical AI the path to general intelligence?
The concept of artificial general intelligence (AGI) has been an ambitious goal in the field of AI since its inception. It refers to an artificial intelligence that not only excels at specific tasks, but can reason, adapt, and learn flexibly in different contexts, just like a human being. So far, progress in AI has been impressive, but limited to well-defined domains. Current systems can defeat the best chess players or generate coherent text, but they are still unable to extrapolate knowledge to new scenarios or understand the world with the depth and flexibility of the human mind.
Ramón López de Mántaras argues that physical AI could be a fundamental step on this path, as it would allow machines to develop knowledge through direct experience with the world. An example of this approach is PLATO (Physics Learning through Auto-encoding and Tracking Objects), a system developed by DeepMind that learns basic causal relationships by watching videos, inspired by child developmental psychology. In experiments, PLATO has been able to infer concepts such as object persistence without the need for training with explicit labels. These types of advances suggest that endowing AI with a perceptual capacity similar to that of humans in their childhood could be key to bringing us closer to a more generalized intelligence.
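The following toy Python sketch illustrates the violation-of-expectation logic behind this line of work rather than PLATO's actual architecture, which relies on learned object-centric representations: a simple constant-velocity predictor tracks an object, and "surprise" is scored as prediction error, spiking when object persistence is violated. All positions and the disappearance scenario are made up for the example.

```python
# Toy illustration of the violation-of-expectation idea (not DeepMind's PLATO
# code): predict where a tracked object should appear next, then score
# "surprise" as the prediction error. A persistence violation, where the
# object simply vanishes, produces maximal surprise. All numbers are made up.

from typing import Optional

import numpy as np


def predict_next(positions: np.ndarray) -> np.ndarray:
    """Constant-velocity prediction from the last two observed positions."""
    return positions[-1] + (positions[-1] - positions[-2])


def surprise(predicted: np.ndarray, observed: Optional[np.ndarray]) -> float:
    """Prediction error; a vanished object yields unbounded surprise."""
    if observed is None:               # object persistence violated
        return float("inf")
    return float(np.linalg.norm(predicted - observed))


if __name__ == "__main__":
    # An object moving to the right at constant speed across three frames.
    track = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
    plausible_next = np.array([3.0, 0.05])   # physically consistent next frame
    impossible_next = None                   # the object disappears

    pred = predict_next(track)
    print("surprise (plausible frame):", surprise(pred, plausible_next))    # small
    print("surprise (impossible frame):", surprise(pred, impossible_next))  # inf
```

In this spirit, a surprise signal of this kind is what separates physically plausible scenes from impossible ones, echoing the looking-time measures used with infants in developmental psychology.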
This approach presents a key difference from current AI models. While neural network-based systems rely entirely on the data they’ve been trained on, a physical AI equipped with advanced sensors and interaction capabilities could generate its own knowledge as it experiences the world. This would bring it closer to more human-like learning, where direct experience shapes understanding and decision-making.
However, López de Mántaras warns that this path is not without obstacles. Building autonomous robots with general learning capabilities is still in its early stages. Although there have been advances in robotics and reinforcement learning, integrating multiple sources of sensory information and translating them into structured knowledge remains an open challenge. In addition, physical AI introduces further problems in terms of hardware, power consumption, and adaptation to complex, dynamic environments.
Another critical challenge is the development of AI architectures that allow information from diverse experiences to be combined and flexibly applied in novel situations. Currently, most physical AI systems are designed for specific tasks, such as manipulating objects or navigating in familiar environments. To approach general intelligence, these systems would need a capacity for abstraction and knowledge transfer that is still limited today.
López de Mántaras concludes that, although physical AI represents an important advance towards artificial general intelligence, we are still far from reaching a system that can match the cognitive versatility of humans. However, exploring this path is critical to overcoming the limitations of current models and moving towards an AI that not only processes data, but also understands the world through experience.