
Object Tracking in Drone Cinematography


Authors: I. Pitas, V. Mygdalis, University of Bristol



How to keep moving objects in focus

Drone platforms are a recent addition to the cinematographer’s arsenal, enabling important media production applications such as the coverage of outdoor (e.g. sports) events. One part of the basic MultiDrone mission is to film a moving target during such an event (e.g. a cyclist, a rowing boat or a football player). Audiovisual (AV) shooting of a target typically requires the drone to follow it, as shown in Figure 1. One way to achieve target following is through computer vision techniques, which first require tracking the target as it appears in the video stream.

In the research community, this task is called “2D-video-tracking”. The overall objective is to track a moving target without failure and with good 2D target localisation accuracy for the entire duration of a drone mission (typically less than 30 minutes). Furthermore, in terms of execution time, a good 2D-video-tracker should process frames at least as fast as the nominal video recording rate, i.e. 25-50 frames per second.
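The real-time requirement translates directly into a per-frame time budget. The short Python sketch below is only illustrative arithmetic: the helper names are hypothetical, and it assumes a constant video frame rate and a tracker speed measured in frames per second.

```python
def frame_budget_ms(video_fps: float) -> float:
    """Maximum processing time per frame (ms) for a tracker to keep up with the video."""
    return 1000.0 / video_fps


def frames_in_mission(minutes: float, video_fps: float) -> int:
    """Number of frames a tracker must handle without failure during one mission."""
    return round(minutes * 60 * video_fps)


def keeps_up(tracker_fps: float, video_fps: float) -> bool:
    """True if the tracker processes frames at least as fast as they are recorded."""
    return tracker_fps >= video_fps


# Budgets for the two ends of the nominal frame-rate range and a 30-minute mission.
for fps in (25, 50):
    print(f"{fps} fps: {frame_budget_ms(fps):.0f} ms per frame, "
          f"{frames_in_mission(30, fps)} frames in a 30-minute mission")
```

At 25 fps the tracker has 40 ms per frame and must survive 45,000 frames in a 30-minute mission; at 50 fps the budget halves to 20 ms and the frame count doubles to 90,000, so a tracker that keeps up with 25 fps video does not necessarily keep up with 50 fps video.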

Figure 1: Two drones tracking and following a rowing boat

In computer vision, 2D tracking has been a major research area for the last 20 years. The task can be summarised as follows: given a 2D video input and a specified region of interest (ROI) in the t-th frame containing the target to be tracked, the algorithm should re-detect/track the target in the next, (t+1)-th, frame, as shown in Figure 2. This procedure is repeated for subsequent frames until the target can no longer be detected/tracked.
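The frame-to-frame procedure described above can be sketched as a simple loop. This is a minimal, hypothetical Python sketch: `init` and `update` stand for whatever tracker is plugged in (they are not the API of any particular library), the `(x, y, width, height)` ROI format is an assumption, and `update` is assumed to return `None` once it loses the target.

```python
from typing import Any, Callable, List, Optional, Sequence, Tuple

# Hypothetical ROI format: (x, y, width, height) in pixels.
Box = Tuple[float, float, float, float]


def track_sequence(
    frames: Sequence[Any],
    initial_roi: Box,
    init: Callable[[Any, Box], Any],
    update: Callable[[Any, Any], Tuple[Any, Optional[Box]]],
) -> List[Box]:
    """Track the ROI annotated in frame 0 through the following frames.

    init(frame, roi) builds the tracker state from the first frame.
    update(state, frame) returns the new state and the ROI found in that
    frame, or None for the ROI when the target can no longer be tracked.
    """
    state = init(frames[0], initial_roi)
    trajectory = [initial_roi]
    for frame in frames[1:]:
        state, roi = update(state, frame)  # locate the target in frame t+1
        if roi is None:  # target lost: stop tracking
            break
        trajectory.append(roi)
    return trajectory
```

The returned list holds one ROI per successfully tracked frame, so its length is a direct measure of how long the tracker survived.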

Figure 2: Illustration of 2D-video-tracking


Finding the right tracker for MultiDrone

A state-of-the-art approach in 2D-video-tracking is so-called “correlation filter-based tracking”. The 2D-tracker builds a representation of the target ROI and then trains a correlation filter that can detect this ROI in the following frame(s). The correlation filter is then adapted and retrained iteratively over subsequent video frames, using the results of the previously processed frames at each new step. The representation of the initial ROI and the method used to train the correlation filter are the core components that characterise this family of algorithms.
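To make the idea concrete, here is a heavily simplified, MOSSE-style correlation filter in pure Python. It is a sketch under stated assumptions, not STAPLE or any MultiDrone code: it works on 1-D signals instead of image patches, uses raw intensities as the ROI representation, computes a naive DFT, and omits the windowing and feature extraction that real trackers rely on. The filter is trained in the Fourier domain so that correlating it with the target patch yields a Gaussian peak at the target centre; the displacement of that peak in a new patch gives the target motion, and a running average with a hypothetical learning rate `eta` adapts the filter over time.

```python
import cmath
import math


def dft(x):
    """Naive O(n^2) discrete Fourier transform; adequate for a tiny demo."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * math.pi * k * m / n) for k in range(n))
            for m in range(n)]


def idft(spectrum):
    """Inverse of dft()."""
    n = len(spectrum)
    return [sum(spectrum[m] * cmath.exp(2j * math.pi * k * m / n) for m in range(n)) / n
            for k in range(n)]


def gaussian_peak(n, centre, sigma=2.0):
    """Desired filter response: a Gaussian peak at the target centre."""
    out = []
    for k in range(n):
        d = min(abs(k - centre), n - abs(k - centre))  # circular distance
        out.append(math.exp(-d * d / (2 * sigma * sigma)))
    return out


class CorrelationFilter1D:
    """Simplified MOSSE-style filter on 1-D patches (illustrative only)."""

    def __init__(self, patch, lam=1e-3):
        self.n = len(patch)
        self.centre = self.n // 2
        self.lam = lam  # regulariser that avoids division by ~0
        self.G = dft(gaussian_peak(self.n, self.centre))
        F = dft(patch)
        # Conjugate filter in the Fourier domain: H* = A / (B + lam).
        self.A = [g * f.conjugate() for g, f in zip(self.G, F)]
        self.B = [f * f.conjugate() for f in F]

    def detect(self, patch):
        """Correlate a new patch with the filter and return the target shift."""
        Z = dft(patch)
        response = idft([a / (b + self.lam) * z
                         for a, b, z in zip(self.A, self.B, Z)])
        peak = max(range(self.n), key=lambda k: response[k].real)
        # Convert the circular peak index into a signed shift in [-n/2, n/2).
        return (peak - self.centre + self.n // 2) % self.n - self.n // 2

    def update(self, patch, eta=0.125):
        """Adapt the filter with a patch re-centred on the new target position."""
        F = dft(patch)
        self.A = [eta * g * f.conjugate() + (1 - eta) * a
                  for g, f, a in zip(self.G, F, self.A)]
        self.B = [eta * f * f.conjugate() + (1 - eta) * b
                  for f, b in zip(F, self.B)]
```

For example, a filter trained on a 32-sample patch `x` reports a shift of 3 when shown `[x[(k - 3) % 32] for k in range(32)]`, i.e. the same pattern moved three samples to the right. Keeping the numerator `A` and denominator `B` separate is what makes the running-average update cheap: each new frame only adds one weighted term to each.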

To achieve its objectives, MultiDrone partners began benchmarking the existing 2D-tracking algorithms for filming sports in outdoor environments. Existing 2D-video-tracking benchmarks were only partially useful, as they did not focus on sports videos. The initial approach of the University of Bristol (UoB) was to implement and evaluate the state-of-the-art 2D-trackers on videos captured by commercial drones. To this end, 26 sports videos filmed by drones were collected and ground truth (target ROIs) was created for every video frame, in a joint effort by UoB and the Aristotle University of Thessaloniki (AUTH).

Figure 3: 2D video tracking precision plot

In total, 14 different trackers were benchmarked. Figure 3 shows the tracker precision plots, which measure the percentage of frames whose estimated target location is within 20 px of the ground truth. Among the top-performing trackers in our evaluation was the STAPLE tracker. UoB modified STAPLE to incorporate some additional parameter thresholds, producing a variant called STAPLE2. Both STAPLE and STAPLE2 run faster than real time (43 and 49 fps, respectively).
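The precision score can be computed from per-frame centre location errors. The sketch below is a hypothetical illustration, not the benchmark code used by UoB: it assumes ROIs are given as `(x, y, width, height)` in pixels, measures the Euclidean distance between box centres, and counts a frame as correct when that distance is at most the threshold.

```python
import math
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # assumed format: (x, y, width, height)


def centre(box: Box) -> Tuple[float, float]:
    """Centre point of an axis-aligned box."""
    x, y, w, h = box
    return (x + w / 2.0, y + h / 2.0)


def centre_errors(predicted: Sequence[Box], ground_truth: Sequence[Box]) -> List[float]:
    """Euclidean distance (px) between predicted and ground-truth box centres."""
    return [math.dist(centre(p), centre(g)) for p, g in zip(predicted, ground_truth)]


def precision(predicted: Sequence[Box], ground_truth: Sequence[Box],
              threshold: float = 20.0) -> float:
    """Fraction of frames whose centre error is at most `threshold` pixels."""
    errors = centre_errors(predicted, ground_truth)
    return sum(e <= threshold for e in errors) / len(errors)


def precision_curve(predicted: Sequence[Box], ground_truth: Sequence[Box],
                    max_threshold: int = 50) -> List[float]:
    """Precision for every integer threshold 0..max_threshold px (a precision plot)."""
    errors = centre_errors(predicted, ground_truth)
    return [sum(e <= t for e in errors) / len(errors) for t in range(max_threshold + 1)]
```

Plotting `precision_curve` against the threshold gives a curve of the kind shown in Figure 3, and `precision(..., threshold=20.0)` reads off the 20 px value. `math.dist` requires Python 3.8 or later.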

Overall, we found that some 2D-trackers are dependable and fast enough to be used in drone cinematography, particularly if they are combined with periodic target re-detection. The coming weeks will show which one works best for MultiDrone. Stay tuned for more updates.
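Periodic re-detection can be layered on top of any frame-to-frame tracker. The Python sketch below is one hypothetical way to do it, not the MultiDrone implementation: `init`, `update` and `detect` are placeholder callables, and both the fixed re-detection interval `redetect_every` and the choice to re-initialise the tracker after every successful detection are assumptions.

```python
from typing import Any, Callable, List, Optional, Sequence, Tuple


def track_with_redetection(
    frames: Sequence[Any],
    initial_roi: Any,
    init: Callable[[Any, Any], Any],
    update: Callable[[Any, Any], Tuple[Any, Optional[Any]]],
    detect: Callable[[Any], Optional[Any]],
    redetect_every: int = 25,
) -> List[Optional[Any]]:
    """Track frame to frame, but re-run a detector periodically or after a loss.

    Returns one ROI per frame; None marks frames where neither the tracker
    nor the detector found the target.
    """
    state = init(frames[0], initial_roi)
    trajectory: List[Optional[Any]] = [initial_roi]
    for t in range(1, len(frames)):
        frame = frames[t]
        state, roi = update(state, frame)
        # Re-detect when the tracker has lost the target or at a fixed interval.
        if roi is None or t % redetect_every == 0:
            detected = detect(frame)
            if detected is not None:
                roi = detected
                state = init(frame, roi)  # restart the tracker on the detection
        trajectory.append(roi)
    return trajectory
```

Unlike a plain tracking loop, this version does not stop at the first failure: a drifting or lost tracker is corrected at the next re-detection, which bounds how long accumulated drift can persist.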


