Video Processing Latency Correction Algorithm


An approach is described to correct errors in the amount of axial rotation required to align a device with a target, where the target angle was initially calculated by a processing pipeline whose pipeline delay is large enough that the robotic device may have rotated after the sample used to calculate the target angle was acquired. This approach may be used in distributed processing environments and is based on globally-accessible Timestamped IMU Orientation Histories.

This approach is then extended to also correct for similarly-caused errors in linear displacement to a target, and to interpolate both axial rotation and displacement errors at a resolution exceeding that of the IMU Orientation samples.


Well-trained modern Video Processing Algorithms (e.g., Haar Classifiers, Neural Networks) can yield robust detection of target objects even when significant image noise exists. Modern Video Processing (VP) Hardware (e.g., NVIDIA Jetson TX1) enable these calculations in real time. However, using current technology the required VP execution time limits the effective rate at which real-time frames are processed to far less (e.g., on the order of 5 fps, or 200ms) than the typical robot motor control update frequency (e.g., on the order of 50Hz, or 20ms).

This order-of-magnitude frequency difference translates directly to a delay between the point in time when a real-world event is observed/detected (VP_t), and the time (RCS_t) when a Robot Control System (RCS) can initiate device motion (e.g., rotation of a turret, or a drive train) towards the detected target position.

If we first assume the VP algorithm derives angle to the target of interest (Angle_vp) from the image, and if we further assume the robot has physically rotated between VP_t and RCS_t, the implied error between Angle_vp and the true target angle will equal this amount of rotation.


An accurate estimate of the orientation change during VP_t and RCS_t is used to correct Angle_vp and yield the true current target angle.

Via an IMU with a high-resolution (1ms) timestamp and dual communication interfaces (e.g. nav-X MXP), both VP detection events and IMU orientation data are timestamped based on a common clock, and thus VP_t and RCS_t are globally known. The Orientation at each moment is retrieved from a timestamped Orientation History in the RCS, and used to calculate Orientation error (Orientation_err). The Orientation_err term is finally added to the target angle originally calculated by the VP, yielding the true current target angle.

Detailed Description


  • A Neural Network VP algorithm detects and calculates an angle to target at an effective rate of 5 Hz (200ms/video frame)
  • IMU Orientation samples are generated (navX-MXP) at 100Hz (10ms/sample)
  • IMU (navX-MXP) is connected (via USB) to VP (Jetson TX1), and (via SPI) to RCS (RoboRIO).
  • RCS maintains an n-second history of IMU Orientation/timestamp tuples.
  • VP acquires the current timestamp immediately as each video frame is capture.

Immediately after each video frame is captured (i.e., before decoding/pre-processing), the Timestamp corresponding to the moment when video frame integration began is appended to the frame is calculated; when detection occurs and target angle is computed, the computed angle and Orientation Timestamp are sent to the RCS.  Note:  further accuracy improvements may be achieved by characterizing the camera integration start time to camera frame delivery to VP time, and subtracting that amount from the Timestamp.

When the RCS receives the detection event description, it locates the “Event Orientation” within the Orientation History which most closely matches that timestamp.
Next, the RCS subtracts the most recently-acquired “Current Orientation” from the “Event Orientation”, and the resulting “Delta” Orientation (Orientation Error) fully describes in 3D-space the orientation change occurring during this period.

Orientation_err = Orientation_at_VPS_t - Orientation_at_RCS_t

This Orientation Error if finally used to adjust the angle calculated by the VP, as follows:

Angle_target_adjusted = Angle_target_vp + Yaw(Orientation_err)

Note 1:  Quaternions

In order to avoid “gimbal lock” during orientation error calculation, unit quaternions are used to represent Orientation in 3D-space. This approach implies that errors in Pitch, Roll and Yaw (Tait-Bryan) angles may all be corrected using this method.

Note 2: Extending Approach to Displacement

Although the above discussion relates to changes in relative robot rotation, this concept can be extended further to correct for changes in target range ocurring between VP_t and RCS_t, by incorporating in the RCS additional data from wheel encoders which are also timestamped with the same high accuracy clock. When fused with heading-change data from the Orientation_error term, the displacement error can be estimated as:

Displacement_err = Displacement_at_VPS_t - Displacement_at_RCS_t

And corrected as:

Range_target_adjusted = Range_target_vp + Dispacement_err

As an aside, in these cases another possible approach would be to employ a fast ranging (e.g., LIDAR) sensor (as before, sampled at both VP_t and RCS_t), assuming the range did not differ significantly between the LIDAR and the Target at VP_t and RCS_t).

Note 3: Orientation Error Interpolation

If we continue to assume the time period from camera frame integration start to camera frame capture by the VP can be well-characterized, and if the temporal resolution derived from the discrete IMU orientation samples received at the 10ms quaternion update rate is insufficient, intermediate quaternion position estimates up to the timestamp clock resolution are achievable via spherical linear interpolation (SLERP) as follows:

Orientation_err_interpolated = Orientation_err * (IMU_sample_resolution_ms / (n * Timestamp_resolution_ms))

Given the navX-MXP 1ms timestamp accuracy, orientation may be interpolated to a 1ms resolution.