Probabilistic inference techniques, such as simultaneous localization and mapping (SLAM), are critical to use in dynamic real-world applications. Inference problems in robotics are formulated as probabilistic graphical models. First, even modest environments lead to large graphical models with thousands of edges and nodes but with sparse structure. This motivates large-scale optimization that exploits sparsity. Second, many estimation variables belong to a manifold rather than Euclidean space (e.g., orientation has to satisfy rotation matrix constraints), requiring probability distributions over manifolds and constrained optimization to enforce constraints. Third, the estimation needs to be performed online, in real-time, as the robots explore the environment. This requires dynamic, time-varying updates of the graphical models that scale with the environment complexity.
An essential part of making robots do intelligent actions is the ability to plan actions over time. When considering both motion and dynamic constraints the state space is too large to allow for exhaustive search. The main progress has been through sampling based methods, which are known to generate suboptimal results. In addition, kino-dynamic approaches are known to be NP-hard. Addition of constraints, e.g., limiting the motion to constant acceleration, can make the problem tractable. However, such techniques are ad hoc and there is a need to formalize how kino-dynamic planning can be made tractable for single and multi-agent systems.
It is very challenging to train Reinforcement Learning (RL) agents with high dimensional states and actions, for example, in the tasks of large-scale multi-robot systems and complex dexterous manipulation. Given this high complexity in optimization, it will be data inefficient to completely rely on online training in the traditional RL pipeline. Instead, we should utilize the large-scale dataset collected from other robots (vision, touch and sound) and even human demonstrations to train our robot first in an offline manner, and then perform online optimization to adapt to the real world in dynamic and time-varying scenes. To better utilize large-scale diverse offline data, we will perform optimization across data in multiple modalities and domains, and for multiple objectives at the same time.
Current RL has low sample efficiency and cross-task generalizability. Modularity and compositionality have been hypothesized to be the crucial features underlying the efficiency and generalizability systems. Modules are local experts, which can be recombined to form more to solve different tasks. Building a compositional modularized inference machinery is a challenge – without the modules defined, it is difficult to select and chain them together. Such chicken-and-egg issues are common in optimization, often addressed by inventing novel objective functions. Specifically, the objective here has to involve discrete variables in the design space of module chaining and continuous variables of parameters. Finally, the composition has to be dynamic and dependent on the data. Thus, optimization on multiple time scales and time-varying graphs will be explored.