
Chapter 8: Multimodal Perception and Intelligent Decision-Making

Visual perception is the process by which a machine acquires environmental information through sensors and interprets images using computer vision techniques. It covers tasks such as object detection and recognition, and gives the system awareness of its environment. Obstacle-avoidance decision-making builds on that information: it uses environment modeling, path planning, and decision-making algorithms to produce behavior strategies that avoid obstacles, prevent collisions, and achieve predefined objectives. The two components are interdependent: visual perception supplies the environmental data, and obstacle-avoidance decision-making selects and executes actions based on it. Both are critical in fields such as autonomous driving and robot navigation, where they enable unmanned systems to operate intelligently in complex environments.


8.1 Background and Theory

Multimodal perception and intelligent decision-making are the core technologies that let intelligent unmanned systems collaborate efficiently, operate autonomously, and remain safe. Together they form a closed loop of “perception–cognition–action.”


8.1.1 Multisource Information Fusion and Robust Perception

Traditional positioning and control schemes for unmanned systems often rely on a single sensor input, such as GNSS satellite positioning. In environments with strong interference or signal blockage (forests, urban canyons, open sea surfaces, or indoor spaces), a single source can degrade or drop out, so fusing multimodal sensor data (e.g., vision, LiDAR, IMU) becomes necessary. Techniques such as SLAM use the fused data to build robust, continuous, and dynamically updatable environmental models.
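The benefit of combining sources can be illustrated with a very small example. The sketch below is not an RflySim interface and is much simpler than a SLAM pipeline: it assumes two independent scalar position estimates with known variances (the function name `fuse_estimates` and the sample numbers are hypothetical) and combines them by inverse-variance weighting, so the more reliable sensor contributes more to the result.

```python
def fuse_estimates(estimates):
    """Fuse independent scalar estimates given as (value, variance) pairs.

    Returns the inverse-variance weighted mean and the variance of that mean.
    Assumes every variance is strictly positive and the errors are independent.
    """
    weights = [1.0 / variance for _, variance in estimates]
    total_weight = sum(weights)
    fused_value = sum(
        weight * value for weight, (value, _) in zip(weights, estimates)
    ) / total_weight
    fused_variance = 1.0 / total_weight
    return fused_value, fused_variance


# Hypothetical readings: a noisy GNSS fix and a more precise visual estimate.
gnss_reading = (10.0, 4.0)     # position in metres, variance in m^2
vision_reading = (12.0, 1.0)
fused_position, fused_variance = fuse_estimates([gnss_reading, vision_reading])
```

Under these assumptions the fused variance is never larger than the smallest input variance, which is the basic reason a multi-sensor estimate can stay usable when one source degrades.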

8.1.2 Obstacle-Avoidance Path Planning and Intelligent Decision-Making

Obstacle-avoidance planning takes fused perception data as input. Methods such as deep learning, reinforcement learning, behavior trees, graph search, and optimization algorithms are then used to assess obstacle risk autonomously, adjust speed and heading dynamically, and generate safe, efficient paths in real time.
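Of the methods listed above, graph search is the easiest to show in a few lines. The following is a minimal sketch of A* search on a 2D occupancy grid, assuming 4-connected motion with unit step cost and a Manhattan-distance heuristic. The function name `astar` and the grid encoding (0 = free, 1 = obstacle) are illustrative choices, not part of the RflySim toolchain.

```python
import heapq


def astar(grid, start, goal):
    """Find a shortest 4-connected path on an occupancy grid.

    grid: list of rows, where 0 marks a free cell and 1 marks an obstacle.
    start, goal: (row, col) tuples that refer to free cells.
    Returns the list of cells from start to goal, or None if no path exists.
    """
    rows, cols = len(grid), len(grid[0])

    def heuristic(cell):
        # Manhattan distance never overestimates the cost for 4-connected moves.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_heap = [(heuristic(start), 0, start)]  # (f = g + h, g, cell)
    came_from = {}
    best_cost = {start: 0}

    while open_heap:
        _, cost, cell = heapq.heappop(open_heap)
        if cell == goal:
            # Walk the parent links back to the start and reverse them.
            path = [cell]
            while cell in came_from:
                cell = came_from[cell]
                path.append(cell)
            return path[::-1]
        if cost > best_cost[cell]:
            continue  # stale heap entry; a cheaper route was found already
        row, col = cell
        for neighbor in ((row + 1, col), (row - 1, col), (row, col + 1), (row, col - 1)):
            n_row, n_col = neighbor
            if not (0 <= n_row < rows and 0 <= n_col < cols):
                continue
            if grid[n_row][n_col] != 0:
                continue
            new_cost = cost + 1
            if new_cost < best_cost.get(neighbor, float("inf")):
                best_cost[neighbor] = new_cost
                came_from[neighbor] = cell
                heapq.heappush(open_heap, (new_cost + heuristic(neighbor), new_cost, neighbor))
    return None


# Hypothetical map: a wall across the middle row with a gap on the right.
demo_grid = [
    [0, 0, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 0, 0],
]
demo_path = astar(demo_grid, (0, 0), (2, 0))
```

In a real system the grid would come from the fused perception output and would be re-planned as obstacles move; this sketch covers only the static search step.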


8.2 Framework and Interfaces

This section uses the RflySim toolchain and typical development cases to introduce the platform's support for intelligent perception and decision-making tasks, including sensor interfaces, data acquisition and processing workflows, and typical task algorithm architectures.


8.2.1 Image Acquisition in Virtual Environments

RflySim offers a high-fidelity virtual sensor simulation environment, supporting the generation of multimodal sensor data—including RGB vision, depth images, LiDAR, and IMU—providing realistic test data sources for visual perception algorithms.
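As an illustration of how depth images feed a perception algorithm, the sketch below back-projects a single depth pixel into a 3D point in the camera frame using the standard pinhole camera model. It does not call any RflySim interface. The function name `depth_pixel_to_point` and the intrinsic values (`fx`, `fy`, `cx`, `cy`) are assumptions chosen for the example, and the depth value is assumed to be the distance along the optical axis in metres.

```python
def depth_pixel_to_point(u, v, depth, fx, fy, cx, cy):
    """Back-project pixel (u, v) with a depth value into camera coordinates.

    Pinhole model: x = (u - cx) * z / fx, y = (v - cy) * z / fy, z = depth.
    The camera frame has x pointing right, y pointing down, z pointing forward.
    """
    z = float(depth)
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return x, y, z


# Hypothetical intrinsics for a 640x480 image.
FX, FY, CX, CY = 320.0, 320.0, 320.0, 240.0
obstacle_point = depth_pixel_to_point(480, 240, 2.0, FX, FY, CX, CY)
```

Applying the same mapping to every valid pixel yields a point cloud that obstacle-avoidance logic can consume in the same way as LiDAR data.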

8.2.2 Object Detection and Tracking

The platform supports algorithm validation for typical vision tasks, including object detection and tracking, path planning, and obstacle-avoidance strategies. It provides standardized interface frameworks that help developers move efficiently from simulation validation to deployment on real vehicles.
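A common building block in detection-based tracking is associating new detections with existing tracks. The sketch below shows one simple way to do it, greedy matching by intersection over union (IoU). It is an illustrative example rather than the platform's tracking algorithm: the function names, the box format `(x1, y1, x2, y2)`, and the `min_iou` threshold are all assumptions.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    inter_x1, inter_y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    inter_x2, inter_y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter_area = max(0.0, inter_x2 - inter_x1) * max(0.0, inter_y2 - inter_y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union_area = area_a + area_b - inter_area
    return inter_area / union_area if union_area > 0 else 0.0


def match_detections(tracks, detections, min_iou=0.3):
    """Greedily pair track boxes with detection boxes by descending IoU.

    Returns a dict mapping track index to detection index. Tracks and
    detections whose best overlap is below min_iou stay unmatched.
    """
    candidates = sorted(
        (
            (iou(track, detection), track_idx, det_idx)
            for track_idx, track in enumerate(tracks)
            for det_idx, detection in enumerate(detections)
        ),
        reverse=True,
    )
    matches = {}
    used_detections = set()
    for score, track_idx, det_idx in candidates:
        if score < min_iou:
            break  # remaining pairs overlap even less
        if track_idx in matches or det_idx in used_detections:
            continue
        matches[track_idx] = det_idx
        used_detections.add(det_idx)
    return matches


# Hypothetical boxes from two consecutive frames.
previous_tracks = [(0, 0, 10, 10), (20, 20, 30, 30)]
current_detections = [(21, 21, 31, 31), (1, 1, 11, 11)]
assignments = match_detections(previous_tracks, current_detections)
```

Greedy matching is easy to follow but not globally optimal. Trackers that need optimal assignment typically use the Hungarian algorithm instead, often combined with a motion model.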



8.3 Showcase of Outstanding Cases

Five-UAV Visual Shared SLAM Hardware-in-the-Loop Simulation:

Simulation Algorithm Development and Validation:


8.4 Course-Linked Video Lectures

Public Lecture Replay (Session 7: Multimodal Perception and Intelligent Decision-Making):

8.5 Chapter Experiment Cases

The related verification experiments and guided cases for this chapter are located in the [Installation Directory]\RflySimAPIs\8.RflySimVision folder.

8.5.1 Interface Learning Experiments

Stored in the 8.RflySimVision\0.ApiExps folder, these experiments cover foundational platform interface tutorials and general introductions to various tools.