vision (Vision Module)


The vision module provides image and vision interfaces for image acquisition, camera control, screen capture, 3D point cloud visualization, ROS vision control, and Open3D integration. It supports the development and validation of computer vision algorithms.


Module Features

Feature                       Description
Image Acquisition             Acquire image data from cameras, the screen, or simulation environments
Camera Control                Configure camera parameters and control viewing angle and motion
Screen Capture                Capture and record screen content in real time
3D Point Cloud Visualization  Visualize and process point cloud data
ROS Vision Control            Integrate with and control ROS vision systems
Open3D Integration            Process point clouds and meshes with the Open3D library

Core Classes and Interfaces

Class/Interface   Description
VisionCaptureApi  Core class for visual acquisition, supporting multiple image sources
CameraCtrlApi     Camera control interface, supporting parameter and motion control
ScreenCaptureApi  Screen capture interface, supporting region and full-screen capture
Open3DApi         Open3D integration interface, supporting point cloud and mesh processing
VisionConfig      Vision configuration management, supporting parameter persistence
RosCtrl           ROS vision control interface

Supported Image Sources

Image Source       Description
USB Camera         Real-time video stream from local USB cameras
Network Camera     Network video stream from IP cameras
Simulation Camera  Virtual camera in RflySim3D/UE
Screen Capture     Real-time screenshots of the desktop
Image Files        Reading and playback of local image files
ROS Images         Image data from ROS topics

Use Cases

Computer Vision Algorithm Development

  • Training and validation of object detection and recognition algorithms
  • Image classification and semantic segmentation experiments
  • Development of visual SLAM and localization algorithms

Drone Vision Applications

  • Processing of drone aerial imagery
  • Vision-based navigation and obstacle avoidance systems
  • Vision-based target tracking

Simulation Environment Vision

  • Acquiring training data from simulation environments
  • Domain randomization for data augmentation
  • Sim-to-real transfer of vision algorithms

Data Recording and Playback

  • Recording visual data during flight
  • Creating and managing datasets
  • Playback and analysis of historical data
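
As an illustration of the recording and playback use case, the sketch below keeps a CSV manifest that maps timestamps to image file names and reads it back in time order. It uses only the Python standard library. The manifest layout, the field names, and the helpers `append_record` and `load_records` are hypothetical and are not part of the vision module.

```python
import csv
import tempfile
from pathlib import Path

# Hypothetical manifest columns; the vision module may store data differently.
FIELDS = ["timestamp", "filename"]


def append_record(manifest_path, timestamp, filename):
    """Append one frame record to a CSV manifest, writing the header if the file is new."""
    manifest_path = Path(manifest_path)
    is_new = not manifest_path.exists()
    with manifest_path.open("a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow({"timestamp": f"{timestamp:.6f}", "filename": filename})


def load_records(manifest_path):
    """Return (timestamp, filename) pairs sorted by timestamp, ready for playback."""
    with Path(manifest_path).open(newline="") as f:
        rows = [(float(r["timestamp"]), r["filename"]) for r in csv.DictReader(f)]
    return sorted(rows)


# Demo: records written out of order come back in time order.
with tempfile.TemporaryDirectory() as tmp:
    manifest = Path(tmp) / "frames.csv"
    append_record(manifest, 0.10, "frame_0001.png")
    append_record(manifest, 0.05, "frame_0000.png")
    records = load_records(manifest)
```

Keeping the index separate from the image files means a dataset can be filtered or re-sorted without touching the images themselves.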

Code Examples

Basic Image Acquisition

from RflySimSDK.vision import VisionCaptureApi

# Create a vision acquisition instance
vision = VisionCaptureApi()

# Connect to a simulation camera (RflySim3D)
vision.connect(
    source='rflysim3d',
    ip='127.0.0.1',
    port=9999
)

# Set image resolution
vision.setResolution(640, 480)

# Start image acquisition
vision.startCapture()

# Capture a single frame
frame = vision.getFrame()

# Save the image
frame.save('capture.png')

# Stop acquisition
vision.stopCapture()
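
If frame processing raises an exception between startCapture() and stopCapture(), the stream in the example above would never be stopped. One possible way to guard against this is a small context manager, sketched below. It assumes only that the capture object exposes startCapture() and stopCapture() as shown above. `capture_session` and `FakeCapture` are hypothetical names introduced for this illustration, and `FakeCapture` stands in for VisionCaptureApi so the sketch runs without a simulator.

```python
from contextlib import contextmanager


@contextmanager
def capture_session(vision):
    """Start capture on entry and always stop it on exit.

    `vision` is any object with startCapture() and stopCapture() methods.
    """
    vision.startCapture()
    try:
        yield vision
    finally:
        # Runs even if the body raises, so the stream is always stopped.
        vision.stopCapture()


class FakeCapture:
    """Hypothetical stand-in for VisionCaptureApi, used only for this demo."""

    def __init__(self):
        self.running = False
        self.calls = []

    def startCapture(self):
        self.running = True
        self.calls.append("start")

    def getFrame(self):
        if not self.running:
            raise RuntimeError("capture not started")
        self.calls.append("frame")
        return b"\x00" * 4  # placeholder bytes, not a real image

    def stopCapture(self):
        self.running = False
        self.calls.append("stop")


fake = FakeCapture()
with capture_session(fake) as cam:
    frame = cam.getFrame()
```

With a real VisionCaptureApi instance, the same `with capture_session(vision):` block would wrap the getFrame() calls, provided the methods behave as in the example above.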

Camera Parameter Control

from RflySimSDK.vision import CameraCtrlApi

# Create a camera control instance
camera = CameraCtrlApi()

# Connect to the camera
camera.connect('127.0.0.1', 9999)

# Set camera parameters
camera.setExposure(0.01)      # Exposure time: 10 ms (value given in seconds)
camera.setGain(1.5)           # Gain: 1.5
camera.setWhiteBalance(5500)  # Color temperature: 5500 K

# Control camera motion
camera.moveTo([10, 5, 3])      # Move to position [x, y, z]
camera.rotateTo([0, -30, 45])  # Rotate to [roll, pitch, yaw]: roll 0°, pitch -30°, yaw 45°
camera.setFOV(90)              # Set field of view to 90°

# Follow mode
camera.followTarget(
    target_id=1,
    offset=[-5, -5, 2],
    smooth=True
)
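
Vision algorithms that consume these images usually need the camera intrinsics that follow from the field of view and the resolution. The sketch below derives them for an ideal pinhole camera. It assumes that the value passed to setFOV() is the horizontal field of view in degrees and that pixels are square; neither is confirmed on this page. `pinhole_intrinsics` is a hypothetical helper, not part of CameraCtrlApi.

```python
import math


def pinhole_intrinsics(width, height, hfov_deg):
    """Return (fx, fy, cx, cy) in pixels for an ideal pinhole camera.

    Assumes hfov_deg is the horizontal field of view in degrees and
    that pixels are square, so fx == fy.
    """
    if not 0.0 < hfov_deg < 180.0:
        raise ValueError("horizontal FOV must be between 0 and 180 degrees")
    # Half the image width subtends half the field of view.
    fx = (width / 2.0) / math.tan(math.radians(hfov_deg) / 2.0)
    fy = fx
    # Principal point at the image center (one common convention).
    cx = width / 2.0
    cy = height / 2.0
    return fx, fy, cx, cy


# A 640x480 image with a 90° horizontal FOV gives a focal length of about 320 px.
fx, fy, cx, cy = pinhole_intrinsics(640, 480, 90)
```

If the simulator reports a vertical or diagonal field of view instead, the same formula applies with the image height or diagonal in place of the width.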

Point Cloud Visualization (Open3D)

from RflySimSDK.vision import Open3DApi
import numpy as np

# Create Open3D interface
o3d = Open3DApi()

# Create point cloud data
points = np.random.rand(1000, 3)  # 1000 random points
colors = np.random.rand(1000, 3)  # Random colors

# Create point cloud object
point_cloud = o3d.createPointCloud(points, colors)

# Display point cloud
o3d.show(point_cloud)

# Add coordinate frame
o3d.addCoordinateFrame(size=1.0)

# Create point cloud from Lidar data
lidar_data = o3d.readLidarData('lidar_scan.bin')
o3d.visualizeLidar(lidar_data)
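
This page does not specify the file format that readLidarData() expects. For readers who want to inspect a scan file by hand, the sketch below decodes one common layout: consecutive little-endian float32 values in the order x, y, z, intensity, as used by the KITTI Velodyne .bin files. Whether 'lidar_scan.bin' follows this layout is an assumption. `parse_lidar_points` and `split_xyz` are hypothetical helpers, not part of Open3DApi.

```python
import struct

# Assumed record layout: x, y, z, intensity as little-endian float32.
POINT_FORMAT = "<4f"
POINT_SIZE = struct.calcsize(POINT_FORMAT)  # 16 bytes per point


def parse_lidar_points(raw):
    """Decode raw bytes into a list of (x, y, z, intensity) tuples."""
    if len(raw) % POINT_SIZE != 0:
        raise ValueError(
            f"buffer length {len(raw)} is not a multiple of {POINT_SIZE}"
        )
    return list(struct.iter_unpack(POINT_FORMAT, raw))


def split_xyz(points):
    """Separate coordinates from intensity values."""
    xyz = [p[:3] for p in points]
    intensity = [p[3] for p in points]
    return xyz, intensity


# Demo on two synthetic points packed in the assumed layout.
raw = struct.pack("<8f", 1.0, 2.0, 3.0, 0.5, -1.0, 0.0, 4.0, 0.25)
points = parse_lidar_points(raw)
xyz, intensity = split_xyz(points)
```

For a real file, `parse_lidar_points(open(path, "rb").read())` would apply, and the resulting coordinates could be converted to an N×3 NumPy array before being passed to createPointCloud().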

Performance Optimization

Optimization Item      Description
Image Compression      Use JPEG/PNG compression to reduce transmission bandwidth
Adaptive Resolution    Dynamically adjust resolution based on network conditions
Frame Rate Control     Limit maximum frame rate to avoid excessive resource usage
Hardware Acceleration  Use GPU acceleration for image processing
Caching Mechanism      Local caching and preloading of image data
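
Frame rate control can also be applied on the client side of an acquisition loop. The sketch below shows one way to cap a loop at a maximum frame rate using only the standard library. `FrameRateLimiter` is a hypothetical helper, not an interface of the vision module, and this page does not describe how the module limits frame rate internally.

```python
import time


class FrameRateLimiter:
    """Caps a loop at max_fps by sleeping off the unused part of each frame period."""

    def __init__(self, max_fps, clock=time.monotonic, sleep=time.sleep):
        if max_fps <= 0:
            raise ValueError("max_fps must be positive")
        self.period = 1.0 / max_fps
        self._clock = clock  # injectable for testing
        self._sleep = sleep
        self._next_due = None

    def wait(self):
        """Block until the next frame is due and return the seconds slept."""
        now = self._clock()
        if self._next_due is None:
            # First call: start the schedule without sleeping.
            self._next_due = now + self.period
            return 0.0
        delay = self._next_due - now
        if delay > 0:
            self._sleep(delay)
            self._next_due += self.period
        else:
            # Running behind: reschedule from now instead of bursting to catch up.
            delay = 0.0
            self._next_due = now + self.period
        return delay


limiter = FrameRateLimiter(max_fps=30)
for _ in range(3):
    limiter.wait()
    # Grab and process one frame here.
```

Scheduling against a running deadline, rather than sleeping a fixed amount per iteration, keeps the average rate close to the target even when processing time varies from frame to frame.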


Note: This document serves as the index page for the vision module. For detailed API descriptions of each interface, please refer to the corresponding standalone documentation pages.