Guide to Multi-Sensor Fusion for Follow Me System: Architecture and Challenges of UWB + IMU + Vision

In the development of autonomous following vehicles, relying on a single sensor is rarely sufficient. UWB provides precise positioning but suffers from signal blockage; IMU can capture rapid motion but accumulates drift over time; vision enables human and environment recognition but depends heavily on lighting and computation.

As a result, the combination of UWB + IMU + Vision has become a mainstream solution in the industry. This article explores its architecture, workflow, and the key challenges faced in real-world deployments.

1. Why Fusion is Necessary

  • UWB: Accurate but unstable under obstruction
    UWB can deliver 5–10 cm positioning accuracy, yet signals degrade in reflective or obstructed environments.
  • IMU: High-frequency but drifts over time
    IMU (Inertial Measurement Unit) provides acceleration and angular velocity at high frequency, but its cumulative drift grows quickly without correction.
  • Vision: Intelligent but environment-dependent
    Vision-based systems recognize humans and obstacles, but suffer in low-light, occlusion, or crowded environments.

By combining all three, UWB offers stable baseline positioning, IMU provides high-frequency motion data, and vision enhances user recognition and environmental perception.

2. System Architecture Design

A typical UWB + IMU + Vision fusion architecture includes the following layers:

(1) Perception Layer (Sensor Input)

  • UWB module: Outputs absolute/relative position coordinates (x, y, z).
  • IMU module: Provides acceleration and angular velocity for pose estimation.
  • Vision module: Uses detection/tracking algorithms (YOLO, SORT, DeepSORT) to identify the user or target.
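To make the three input streams concrete, here is a minimal sketch of what the perception layer might hand to the fusion layer. The type names and fields are illustrative assumptions, not a fixed interface:

```python
from dataclasses import dataclass

@dataclass
class UwbFix:
    t: float              # timestamp (s)
    x: float              # position (m) in the UWB local frame
    y: float
    z: float

@dataclass
class ImuSample:
    t: float
    accel: tuple          # (ax, ay, az) in m/s^2
    gyro: tuple           # (wx, wy, wz) in rad/s

@dataclass
class VisionDetection:
    t: float
    bbox: tuple           # (x1, y1, x2, y2) in pixels
    track_id: int         # persistent ID from SORT/DeepSORT
    confidence: float     # detector score, 0..1
```

Keeping a per-sample timestamp on every record is what later makes time synchronization possible.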

(2) Fusion Layer (Data Processing)

  • Time Synchronization: Align timestamps across heterogeneous sensors.
  • Coordinate Alignment: Map sensor outputs into a unified reference frame.
  • Filtering & Fusion:
    • Extended Kalman Filter (EKF): Merges UWB and IMU to suppress drift.
    • Vision-based Correction: Compensates when UWB data is lost or inaccurate.
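The UWB + IMU merge can be illustrated with a deliberately simplified one-dimensional filter: IMU acceleration drives the prediction step, and each UWB fix corrects the accumulated drift. The noise values are assumed for illustration; a real system would use a full 2D/3D EKF with calibrated covariances:

```python
import numpy as np

class UwbImuEkf1D:
    """Minimal 1-D sketch of UWB+IMU fusion: IMU acceleration drives
    prediction, UWB position measurements correct the drift."""

    def __init__(self):
        self.x = np.zeros(2)             # state: [position (m), velocity (m/s)]
        self.P = np.eye(2)               # state covariance
        self.Q = np.diag([1e-4, 1e-2])   # process noise (assumed values)
        self.R = np.array([[0.01]])      # UWB noise, ~10 cm std (assumed)

    def predict(self, accel, dt):
        """Propagate the state with a constant-acceleration model."""
        F = np.array([[1.0, dt], [0.0, 1.0]])
        B = np.array([0.5 * dt**2, dt])
        self.x = F @ self.x + B * accel
        self.P = F @ self.P @ F.T + self.Q

    def update(self, uwb_pos):
        """Correct the prediction with a UWB position measurement."""
        H = np.array([[1.0, 0.0]])
        y = uwb_pos - H @ self.x                  # innovation
        S = H @ self.P @ H.T + self.R             # innovation covariance
        K = self.P @ H.T @ np.linalg.inv(S)       # Kalman gain
        self.x = self.x + (K @ y).ravel()
        self.P = (np.eye(2) - K @ H) @ self.P
```

Because the model here is linear, this reduces to a plain Kalman filter; the EKF structure matters once orientation and nonlinear measurement models enter the state.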

(3) Decision & Control Layer

  • Path Planning: Generates following trajectories while avoiding obstacles.
  • Motion Control: Differential drive and motor control for smooth following.
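A following controller for a differential-drive base can be as simple as proportional control on range error and bearing. The gains, target distance, and wheel geometry below are assumed values for illustration:

```python
def follow_control(distance, bearing, target_distance=1.5,
                   k_lin=0.8, k_ang=1.5, wheel_base=0.4):
    """Differential-drive follower sketch.

    distance: measured range to the user (m)
    bearing:  angle to the user relative to heading (rad, left positive)
    Returns (left, right) wheel speeds in m/s.
    """
    v = k_lin * (distance - target_distance)   # close the range gap
    w = k_ang * bearing                        # turn toward the user
    left = v - w * wheel_base / 2.0
    right = v + w * wheel_base / 2.0
    return left, right
```

In practice the speeds would also be clamped and rate-limited for smooth motion, and obstacle avoidance would override this command when the planner flags a collision.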

3. Workflow Overview

Autonomous Following System Architecture

    graph TD
        subgraph Sensing [Sensing Layer]
            UWB["UWB Positioning<br/>(User → Tag)"]
            IMU["IMU Motion Data<br/>(Inertial)"]
            Vision["Vision Recognition<br/>(Object Detection)"]
        end
        Sensing --> Fusion["Fusion Module (EKF/UKF)<br/>State Estimation"]
        Fusion --> Position["Stable Position Output"]
        Position --> Control["Path Planning & Control"]
        Control --> Result["Autonomous Following"]
        style Sensing fill:#fafafa,stroke:#666,stroke-dasharray: 5 5
        style Fusion fill:#e3f2fd,stroke:#1565c0,stroke-width:2px
        style Result fill:#e8f5e9,stroke:#2e7d32,stroke-width:2px

4. Key Challenges in Real Deployments

(1) Time Synchronization

Different sampling rates and latencies:

  • UWB: 10–50 Hz
  • IMU: 100–200 Hz
  • Vision: 20–60 Hz
    Without proper synchronization, fusion outputs suffer from “misalignment.”
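One common alignment strategy is to resample the slower streams onto the fastest clock. A minimal sketch, assuming timestamps share a common clock and using linear interpolation of UWB fixes onto IMU ticks:

```python
import numpy as np

def align_uwb_to_imu(imu_times, uwb_times, uwb_positions):
    """Resample low-rate UWB fixes (10-50 Hz) onto high-rate IMU
    timestamps (100-200 Hz) by linear interpolation, so each fusion
    step sees time-consistent inputs. All times in seconds, one axis."""
    return np.interp(imu_times, uwb_times, uwb_positions)
```

Real systems additionally compensate for per-sensor latency (e.g. camera exposure plus detection time) by shifting timestamps before interpolating; hardware triggering is better still when available.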

(2) Coordinate Transformation

  • UWB: Local coordinate frame
  • Vision: Camera coordinate frame
  • IMU: Body coordinate frame
    Requires calibration and mapping into a global/world coordinate frame.
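The mapping into a common frame is a rigid-body transform per sensor, obtained from calibration. A sketch with an assumed rotation matrix and translation vector:

```python
import numpy as np

def to_world(p_sensor, R, t):
    """Map a 3-D point from a sensor frame (camera, body, or UWB local)
    into the world frame, given the calibrated rotation R (3x3) and
    translation t (3,) of that sensor: p_world = R @ p_sensor + t."""
    return R @ np.asarray(p_sensor, dtype=float) + np.asarray(t, dtype=float)
```

For example, a camera mounted with a 90° yaw offset and a 1 m forward offset would use the corresponding rotation matrix and translation; chaining such transforms (body→world, camera→body) composes the full calibration.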

(3) Noise & Drift Handling

  • UWB: Susceptible to multipath effects
  • IMU: Long-term drift
  • Vision: Misclassification and false positives
    Robust filtering (EKF/UKF) and redundancy mechanisms are essential (e.g., fallback to IMU + vision when UWB fails).
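The UWB-dropout fallback mentioned above can be sketched as a small supervisor that prefers the full EKF output while UWB fixes keep arriving and switches to the vision + IMU estimate after a timeout. The class name, timeout value, and interface are assumptions for illustration:

```python
class FusionFallback:
    """Redundancy sketch: prefer the UWB+IMU EKF estimate, fall back
    to vision+IMU dead reckoning when UWB fixes stop arriving."""

    def __init__(self, uwb_timeout=0.5):
        self.uwb_timeout = uwb_timeout   # seconds without a fix (assumed)
        self.last_uwb_time = None

    def report_uwb(self, t):
        """Call whenever a valid UWB fix arrives at time t."""
        self.last_uwb_time = t

    def choose(self, t, ekf_estimate, vision_imu_estimate):
        """Pick the position source for the controller at time t."""
        uwb_ok = (self.last_uwb_time is not None
                  and t - self.last_uwb_time < self.uwb_timeout)
        return ekf_estimate if uwb_ok else vision_imu_estimate
```

Production systems usually also gate individual measurements (e.g. reject UWB fixes with implausible innovation, discard low-confidence detections) rather than switching sources wholesale.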

(4) Computation & Power Consumption

  • Vision requires high compute resources (e.g., running YOLO models),
  • UWB/IMU are low-power and lightweight.
    Common strategy: low-frequency full fusion + high-frequency lightweight prediction.
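That tiered strategy amounts to a simple scheduler: run the cheap IMU prediction on every tick, and the expensive full fusion (vision inference plus filter update) only every Nth tick. A sketch with an assumed 10:1 ratio:

```python
class TieredFusion:
    """Scheduling sketch: cheap dead reckoning at full rate, heavy
    fusion (e.g. YOLO + EKF update) at a reduced rate."""

    def __init__(self, full_every=10):
        self.full_every = full_every   # assumed ratio, e.g. 200 Hz / 20 Hz
        self.tick = 0

    def step(self, imu_predict, full_fuse):
        """Call once per IMU tick with two callbacks."""
        self.tick += 1
        imu_predict()                        # always: lightweight prediction
        if self.tick % self.full_every == 0:
            full_fuse()                      # occasionally: full fusion pass
```

On embedded platforms the heavy path is often further offloaded to an NPU/GPU or duty-cycled by battery state.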

5. Application Scenarios

  • Smart baby strollers / wheelchairs: UWB tracks the caregiver, IMU ensures smooth motion, vision adds safety and obstacle avoidance.
  • Smart shopping carts: Vision recognizes the shopper, UWB provides correction, IMU maintains stability.
  • Industrial material carts: UWB anchors to factory stations, vision perceives the environment, IMU bridges continuity.

6. Conclusion

The fusion of UWB, IMU, and vision is currently the most versatile approach for autonomous following:

  • UWB provides stable baseline positioning
  • IMU supplements short-term high-frequency motion data
  • Vision enriches understanding of users and the environment

Real-world deployment, however, requires solving time synchronization, coordinate alignment, noise suppression, and computation optimization.

Once these challenges are addressed, autonomous following vehicles will be ready to operate seamlessly across indoor and outdoor, personal and industrial scenarios.
