The first time I visited New York City I arrived at Kennedy airport and promptly got lost. If you’ve been through that airport, you can relate. The layout and signage aren’t as efficient as you might hope— with multiple terminals and inconsistent connections between them. It took a bit of walking around, checking signs and asking questions before I could figure out how to get out and on my way to the city.
Now imagine a robot trying to find its way around JFK. How would it get its bearings? Indeed, how would it even know that it’s in a busy airport and not a corn field? And what if it didn’t have any access to external data like a previously constructed map or GPS? Most robots today would fail to work at all, and the reason is a challenge in robotics called simultaneous localization and mapping (SLAM).
SLAM is a difficult computational problem that, when solved, allows mobile autonomous units, such as wheeled or legged robots and flying drones, to figure out where they are and where they’re going within a space. It’s especially tricky when the robot or drone has no access to external references like GPS or a pre-built map.
SLAM systems work by calculating a robot’s position and orientation relative to other objects in a space while concurrently creating a map of its surroundings. The paradox is that this is a chicken-and-egg problem. It seems counter-intuitive to be able to find your location before you have a map of your surroundings, and likewise to build a map if you haven’t a clue where you are. So how does our robot stuck at JFK Terminal 5 go about doing this?
Let’s go back to my own experience of being lost at JFK. In order to get my bearings I used my eyes to look around. Additionally I walked around to get a feel for the space, using my vestibular system (a set of organs between the ears that sense motion and keep us balanced) to orient myself spatially. Just like our robot, I would have used sensors to gather data about the space around me.
For robots, there are a number of different sensors that can be used, from the relatively cheap to the very expensive. Cameras can be used quite effectively, as can LIDAR (just like RADAR, but using light instead of radio waves), and a unit that measures acceleration and rotation called an IMU (inertial measurement unit) is also useful. And when it comes to SLAM, the more data the better, so these sensors are often used in conjunction to improve accuracy and robustness (which in robotics means the extent to which a robot can deal with adverse situations).
All of these sensors gather data. Lots of data. And just like you use your brain to crunch all the data coming in from your senses, our robot needs the right algorithms to turn all that sensor data into spatial awareness.
The miracle of your own biology is that even your brain doesn’t know exactly where it is or even what it’s seeing. The brain is constantly taking shortcuts and making guesses that evolution has honed and tweaked over eons to get to the point where it’s actually really accurate. But that’s why it’s so easy to fool the brain with optical illusions and tricks of motion— where you are in space is only ever your brain’s best guess.
The brain is an enigma, and it’s not fully understood how it processes the sensory data it receives from your body. But for robots, scientists have come up with clever algorithms that process data from one or multiple sensors to allow them to build their own sense of space.
Our robot at the airport is equipped with a camera, so it will receive a stream of 2D images of the surrounding scene. By taking lots of geometric measurements of features (which can be anything that stands out, such as the corner of a table or a nail in a wall), sophisticated algorithms make probabilistic calculations to estimate the robot’s position relative to them. The closer a feature is to the camera, the more it will appear to move between frames, and the system uses this fact both to improve the accuracy of the calculated position and to estimate the direction and distance the robot has travelled.
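To make this concrete, here is a rough sketch of how a camera-only system might estimate its motion between two frames using the open-source OpenCV library. The image file names and the camera intrinsics matrix K are placeholders, and real SLAM pipelines are far more elaborate than this, but the core idea of matching features and recovering the camera’s rotation and translation is the same.

```python
# A minimal sketch of camera motion estimation between two frames with OpenCV.
# File names and camera intrinsics are illustrative placeholders.
import cv2
import numpy as np

# Camera intrinsics: focal lengths and principal point (assumed values).
K = np.array([[700.0,   0.0, 320.0],
              [  0.0, 700.0, 240.0],
              [  0.0,   0.0,   1.0]])

# Two consecutive grayscale frames from the robot's camera (hypothetical files).
frame1 = cv2.imread("frame_000.png", cv2.IMREAD_GRAYSCALE)
frame2 = cv2.imread("frame_001.png", cv2.IMREAD_GRAYSCALE)

# Detect distinctive features (corners, blobs) and compute descriptors for them.
orb = cv2.ORB_create(nfeatures=1000)
kp1, des1 = orb.detectAndCompute(frame1, None)
kp2, des2 = orb.detectAndCompute(frame2, None)

# Match features between the two frames.
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)
pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])

# Estimate how the camera moved: rotation R and translation direction t.
E, inliers = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC, threshold=1.0)
_, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=inliers)

print("Estimated rotation:\n", R)
print("Estimated translation direction:\n", t.ravel())
```

Note that a single camera can only recover the direction of travel, not the absolute distance; that scale ambiguity is one reason additional sensors are so valuable.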
The more features available to our robot, and the more it can move around and view them from different angles, the more confident it becomes about its position. And as it calculates its position relative to these points, it’s also building a map of those points as it goes along (putting the S for ‘simultaneous’ in SLAM!). What’s more, augmenting the SLAM system with additional sensors, such as a second camera, an IMU and wheel encoders that count how many times the wheels rotate, can provide it with additional visual and motion data that greatly improves the accuracy of the estimated position and the resulting map.
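As a toy illustration of why those extra sensors help, the sketch below blends a camera-based motion estimate with one derived from wheel encoders and a gyroscope heading. The wheel radius, encoder resolution and sensor weighting are invented numbers, and the naive weighted average stands in for the probabilistic filters or optimizers a real SLAM system would use.

```python
# A toy example of sensor fusion: wheel encoders and a gyro provide an
# independent motion estimate that is blended with the camera's estimate.
import numpy as np

WHEEL_RADIUS = 0.05    # metres (assumed)
TICKS_PER_REV = 360    # encoder ticks per wheel revolution (assumed)

def wheel_odometry(left_ticks, right_ticks, heading):
    """Distance from encoder ticks, projected along the current heading."""
    dist = np.pi * WHEEL_RADIUS * (left_ticks + right_ticks) / TICKS_PER_REV
    return dist * np.array([np.cos(heading), np.sin(heading)])

def fuse(visual_delta, wheel_delta, visual_weight=0.7):
    """Blend two motion estimates; the weight reflects how much we trust each sensor."""
    return visual_weight * visual_delta + (1.0 - visual_weight) * wheel_delta

# Hypothetical measurements for a single time step.
heading = np.deg2rad(15.0)               # heading from the IMU's gyroscope
visual_delta = np.array([0.48, 0.12])    # camera-based motion estimate (metres)
wheel_delta = wheel_odometry(180, 182, heading)

print("Fused motion estimate:", fuse(visual_delta, wheel_delta))
```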
SLAM systems are made up of two essential components: sensors and algorithms. The synthesis of many measurements and calculations from these components gives our robot a single, coherent perception of the world, which it uses to understand its surroundings, pinpoint its location, and judge its next move. In order for it to make better decisions as it moves through space, we need to provide it with the richest map possible, ideally one that lets it understand what it is seeing beyond simply an arbitrary group of objects in its field of view.
The simplest maps are sparse ‘point clouds’ of features that can be used mostly for localization but don’t provide much of a sense of the shape and size of objects. But by taking measurements based on every single pixel in its field of view, our robot can build a ‘dense’ map of its surroundings, giving it a full 3D rendering of the space. This is called dense scene mapping, and it represents a second level of SLAM competence, capturing the shape, size, color and texture of the objects in its space. This enables much more accurate obstacle detection and avoidance, and the ability to plan movements in advance.
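As a minimal sketch of that idea, the snippet below back-projects every pixel of a depth image into a 3D point using the pinhole camera model, producing a dense point cloud for a single frame. The depth values and intrinsics are stand-ins; a real dense SLAM system fuses many such frames into one consistent model.

```python
# A minimal sketch of dense mapping: back-project every pixel of a depth image
# into a 3D point cloud. The depth image and intrinsics are placeholders.
import numpy as np

fx, fy, cx, cy = 700.0, 700.0, 320.0, 240.0   # assumed camera intrinsics

def depth_to_point_cloud(depth):
    """Turn an HxW depth image (in metres) into an Nx3 array of 3D points."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    points = np.stack([x, y, depth], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]            # drop pixels with no depth reading

# A fake 480x640 depth image standing in for real sensor data (everything 2.5 m away).
depth = np.full((480, 640), 2.5)
cloud = depth_to_point_cloud(depth)
print(cloud.shape)   # (307200, 3): one 3D point per valid pixel
```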
But further to that, by augmenting the SLAM system with deep learning, our robot can gain a sense of what it’s actually seeing. This third level of SLAM competence is called semantic understanding: the extent to which the robot can identify what the objects in its surroundings are. It’s at this level that our robot lost at Kennedy starts to become aware that the objects around it collectively represent something resembling an airport, as opposed to simply a set of arbitrary objects.
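Conceptually, semantic understanding amounts to attaching a class label to each part of the map. The sketch below assumes a segmentation network (not shown) has already predicted a label for every pixel, and simply pairs those labels with the 3D points from the dense map; the class names are invented for our airport setting.

```python
# A conceptual sketch of semantic mapping: attach per-pixel class labels,
# predicted by a segmentation network (not shown), to the dense map's 3D points.
import numpy as np

CLASS_NAMES = {0: "floor", 1: "wall", 2: "luggage trolley", 3: "departure board"}

def label_point_cloud(points, label_image):
    """Pair each 3D point with the semantic label predicted for its source pixel."""
    labels = label_image.reshape(-1)
    assert labels.shape[0] == points.shape[0], "one label per point expected"
    return [(point, CLASS_NAMES.get(int(lbl), "unknown"))
            for point, lbl in zip(points, labels)]

# Fake data standing in for a real point cloud and segmentation output.
points = np.random.rand(480 * 640, 3)
label_image = np.random.randint(0, 4, size=(480, 640))

labelled = label_point_cloud(points, label_image)
print(labelled[0])   # e.g. (array([...]), 'wall')
```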
Spatial AI is the culmination of these three levels of SLAM competence (localization, dense scene mapping and semantic understanding). It is the end result of solving the SLAM problem and allows robots to truly understand their space. The robotics industry is nearing a tipping point: the cost of hardware is falling and computational power is rising to the point where mass production of robots and drones will become commercially viable.
But in order for the industry to flourish, robots need to be able to sense and interpret their surroundings in a way that enables them to carry out tasks that make a difference to the human lives they touch: carrying out stock checks in real time so that retail workers can focus more on customers, keeping the house clean so we can spend more time with our families, or dispatching an AED via drone to someone having a heart attack.
It is SLAMcore’s mission to make full-stack Spatial AI available as an easily insertable software module for the robotics industry. We aim to lower the intellectual and financial costs of market entry and empower everyone from garage roboticists to multinational conglomerates to build solutions that have a positive impact on how we live our lives. The wide range of robots that will be designed and developed is likely beyond the scope of our imagination. Eventually, you’ll likely even see them helping passengers with luggage at airports. And thanks to Spatial AI, they won’t get lost.