
All Computer Vision Tracking Algorithms Are Not Created Equal

We receive a lot of positive feedback from customers who are impressed with how well our Ikena ISR and Spotlight software applications detect and track moving objects from both aerial and ground-based cameras. A common question we hear is why our tracking is so much better than what is found in mobile phones, drone cameras, and even expensive gimbals, and whether our tracking algorithms could be embedded on those kinds of small devices.

The short answers are:

  • Our trackers are better because our R&D team spent years researching, developing, and refining them. It certainly didn’t happen overnight.

  • Robust, reliable tracking requires more compute than most low-power mobile and embedded processors offer today. However, better processors are on the horizon from companies like NVIDIA, Qualcomm, and ARM, opening up new possibilities for running high-performance image processing and computer vision software like ours on embedded devices.

 

The Challenges with Open Source Tracking Algorithms and Limited Compute

One might assume that you can just take an open source algorithm like meanshift from OpenCV, drop it into an Android app, and voilà, you have a great tracker. Unfortunately, it’s not that simple.
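
For context, here is roughly what that naive approach looks like, following OpenCV’s standard meanshift recipe in Python. The file path, initial window, and threshold values below are illustrative, not from any particular product:

```python
import cv2
import numpy as np

# Open a clip and grab the first frame (path is illustrative).
cap = cv2.VideoCapture("clip.mp4")
ok, frame = cap.read()

# Hand-picked initial window around the target: (x, y, w, h).
track_window = (300, 200, 80, 60)
x, y, w, h = track_window

# Model the target as a hue histogram of the initial region.
roi = frame[y:y + h, x:x + w]
hsv_roi = cv2.cvtColor(roi, cv2.COLOR_BGR2HSV)
mask = cv2.inRange(hsv_roi, np.array((0., 60., 32.)), np.array((180., 255., 255.)))
roi_hist = cv2.calcHist([hsv_roi], [0], mask, [180], [0, 180])
cv2.normalize(roi_hist, roi_hist, 0, 255, cv2.NORM_MINMAX)

# Stop after 10 iterations or when the window moves less than 1 pixel.
term_crit = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    # Back-project the histogram and shift the window toward the mode.
    dst = cv2.calcBackProject([hsv], [0], roi_hist, [0, 180], 1)
    _, track_window = cv2.meanShift(dst, track_window, term_crit)
    x, y, w, h = track_window
    cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.imshow("meanshift", frame)
    if cv2.waitKey(30) & 0xFF == 27:  # Esc quits
        break

cap.release()
cv2.destroyAllWindows()
```

A few dozen lines does give you a tracker, but it locks onto a single color histogram at a fixed window size, so it drifts or loses lock as soon as the object changes scale, the lighting shifts, or the background shares the target’s colors.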

Nearly every company that makes video software or cameras advertises some sort of tracking technology. Motion detection and object tracking from video have been around since security camera footage was first digitized. You can find video-based trackers in everything from security cameras and mobile phones to 600-pound aircraft gimbals that cost north of $1.5 million.

As one might expect, not all trackers are created equal. Devices such as security cameras, mobile phones, and drones have small, embedded processors that trade compute for power efficiency. A drone needs to devote most of its power to staying in the air, and a mobile phone needs to last a day without the battery dying. The chips that run these devices must therefore draw very little power, usually just a few watts, which limits what you can do computationally.

Many mobile phone apps and small cameras use meanshift, the aforementioned open source tracking algorithm. It’s lightweight and can run on a small embedded CPU, such as the ARM processors found in smartphones, and it uses about 1/8th the compute that our video redaction software, Ikena Spotlight, does. However, as you can see in the video below, it can’t reliably track the license plate, which is the requirement for this particular redaction use case.

In the video below, a camera mounted on a manned aircraft has an onboard tracker built into the gimbal. Again, because it runs on a low-power processor, it fails to track the boat (see the red box), whereas MotionDSP’s tracker (in yellow) tracks the object properly.

Conditions Vastly Affect Tracking

Aside from compute requirements, a variety of other conditions can increase the complexity of tracking. For example:

  • Complexity of Motion
    If your camera is fixed, such as a wall-mounted security camera, tracking is far simpler. It becomes dramatically more difficult when the camera itself is moving, like one mounted on a drone, which can also be buffeted by wind. While many aerial gimbals have lighter-weight trackers embedded in their image processing chips, those trackers often fail, even under the conditions they commonly operate in.

  • Video Resolution
    Tracking on standard-definition video (640×480, or roughly 1/3 of a megapixel) requires a fraction of the compute (under 4%) that 4K video (3840×2160) needs. In other words, 4K video requires 27x the compute that SD video does (see the arithmetic sketch after this list). So while an embedded processor might manage tracking on low-resolution video, tracking 4K video is well beyond the reach of most embedded processors.

  • Size of Moving Objects
    Most videos demonstrating tracking algorithms feature moving objects that are large, upwards of 1/8th of the video frame. That’s relatively easy to handle with something like meanshift. What happens if your movers are tiny? If your drone is flying high in the air, or your mobile phone is far from its subject, the things you want to detect and track can be as small as a few pixels. That requires a far more sensitive tracker, which in turn produces more false detections and forces your downstream filtering to work harder (see the toy detector sketch below).

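To make the resolution arithmetic above concrete, here is the raw pixel math, a rough proxy that assumes tracking cost scales roughly linearly with the number of pixels processed:

```python
# Per-frame pixel counts as a rough proxy for tracker compute cost,
# assuming cost scales roughly linearly with pixels processed.
sd_pixels = 640 * 480      # 307,200 pixels (~1/3 megapixel)
uhd_pixels = 3840 * 2160   # 8,294,400 pixels (4K UHD)

print(f"4K vs. SD: {uhd_pixels / sd_pixels:.0f}x the pixels")  # 27x
print(f"SD as a share of 4K: {sd_pixels / uhd_pixels:.1%}")    # 3.7%
```
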
We are familiar with these challenges: we’ve put a lot of work into tracking moving objects as small as one to two pixels wide, as seen in the video below demonstrating our wide-area tracker. It can simultaneously track hundreds of pixel-sized objects from a very high-resolution video source.
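
To illustrate the sensitivity trade-off described in the list above, here is a toy frame-differencing detector for a static camera. This is emphatically not MotionDSP’s wide-area tracker, just a sketch of why lowering detection thresholds buys sensitivity at the cost of false alarms; the function name and default values are illustrative:

```python
import cv2

def detect_tiny_movers(prev_gray, gray, diff_thresh=25, min_area=2):
    """Toy motion detector using frame differencing (static camera only).

    Catching pixel-scale movers means lowering diff_thresh and min_area,
    but sensor noise then survives the threshold too, so every extra bit
    of sensitivity hands more false detections to downstream filtering.
    """
    diff = cv2.absdiff(prev_gray, gray)  # per-pixel change between frames
    _, mask = cv2.threshold(diff, diff_thresh, 255, cv2.THRESH_BINARY)
    n, _, stats, centroids = cv2.connectedComponentsWithStats(mask)
    # Label 0 is the background; keep blobs at or above the area floor.
    return [tuple(centroids[i]) for i in range(1, n)
            if stats[i, cv2.CC_STAT_AREA] >= min_area]
```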

What We’ve Learned Over the Years

Tracking moving objects from moving cameras is extremely difficult. Open source software is great, but it’s not a complete solution. Embedded processors are getting better, but don’t expect robust tracking from lightweight algorithms. When you’re researching this type of technology, keep in mind that simply checking the “tracking” box on a capabilities list doesn’t always mean your tracking problem is solved.