
Using MotionDSP’s video processing filters to improve deep learning-based image classification results

In April, MotionDSP announced Ikena® Cloud, our new API for cloud-based processing of video files and live video streams. Since then, we've tested it with developers in a variety of use cases, one of which is pre-processing video to improve image classification results. The results have been promising.

Amazon Rekognition, Google Cloud Vision API, and Microsoft Azure's Computer Vision API all offer high-quality image classification: you give their APIs an image, and they return tags that describe what's in it.
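Each returned tag comes with a confidence score. As a minimal sketch, the function below filters a response shaped like Amazon Rekognition's `DetectLabels` output (a `Labels` list whose entries carry a `Name` and a 0-100 `Confidence`) down to a tag-to-confidence map. `tags_from_rekognition` is a hypothetical helper and the sample response is made up; with boto3, a real response would come from `boto3.client("rekognition").detect_labels(Image={"Bytes": jpeg_bytes}, MinConfidence=...)`.

```python
from typing import Any


def tags_from_rekognition(response: dict[str, Any], min_confidence: float = 50.0) -> dict[str, float]:
    """Map tag name -> confidence (0-100) for labels at or above min_confidence."""
    return {
        label["Name"]: label["Confidence"]
        for label in response.get("Labels", [])
        if label["Confidence"] >= min_confidence
    }


# Made-up response, trimmed to the fields used above (real labels carry more fields).
sample = {
    "Labels": [
        {"Name": "Person", "Confidence": 98.2},
        {"Name": "Guitar", "Confidence": 71.5},
        {"Name": "Night", "Confidence": 40.0},
    ]
}
print(tags_from_rekognition(sample, min_confidence=60.0))  # {'Person': 98.2, 'Guitar': 71.5}
```

Filtering on the client like this lets you re-threshold saved responses later; the `MinConfidence` request parameter does the same filtering on the service side.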

One challenge we see with our customers is that much of their real-world video doesn't resemble the imagery these services' models were trained on. Take this example:

If you look carefully, you can see what's in the video, but a classification engine may not. Google, Amazon, and Microsoft could expand their training sets to include dark versions of all their images, but that would make their training more complex, and more complex still if they also had to cover every other environmental condition: overly bright scenes, haze, and so on.

Solution: use video pre-processing

Ikena Cloud can apply any combination of MotionDSP's 20+ GPU-accelerated image processing filters to pre-process video, output enhanced still images at user-selected intervals, and send those images to various image classification APIs. Here is one result:

MotionDSP video filters used: automatic brightness, contrast, super-resolution


In the above example, much more of the scene was recognized (“guitar,” “musician,” etc.), enough that you could argue the tags on the right describe the scene better than the tags on the left.
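MotionDSP's automatic brightness and contrast filters are proprietary and not described in this post. To illustrate the general idea only, here is a common textbook substitute: a percentile-based linear contrast stretch, written with NumPy for a single-channel 8-bit frame. `auto_contrast` and its default percentiles are assumptions, not MotionDSP settings.

```python
import numpy as np


def auto_contrast(img: np.ndarray, low_pct: float = 1.0, high_pct: float = 99.0) -> np.ndarray:
    """Linearly stretch an 8-bit grayscale image so the low/high percentiles map to 0/255."""
    lo, hi = np.percentile(img, [low_pct, high_pct])
    if hi <= lo:  # flat image: nothing to stretch
        return img.copy()
    out = (img.astype(np.float64) - lo) * (255.0 / (hi - lo))
    # Round before casting so a value at `hi` becomes 255 instead of truncating to 254.
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)


# Synthetic "dark" frame: every pixel crowded into 10..60 of the 0..255 range.
rng = np.random.default_rng(0)
dark = rng.integers(10, 61, size=(120, 160), dtype=np.uint8)
bright = auto_contrast(dark)
print(dark.min(), dark.max(), "->", bright.min(), bright.max())  # 10 60 -> 0 255
```

Stretching between percentiles rather than the absolute minimum and maximum keeps a few hot or dead pixels from dictating the mapping; the trade-off is that the most extreme 1% at each end is clipped.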

Here is another example, showing how our super-resolution algorithm improves OCR accuracy in the Google Cloud Vision API.

MotionDSP video filters used: super-resolution, de-blurring


The OCR result above is not perfect, but it is clearly improved: “Oppenheim Schafer” on the enhanced image versus “ForReno sTHAFER” on the original.
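Super-resolution and MotionDSP's de-blurring are not reproduced here. As a much simpler illustration of why crisper edges help OCR, the sketch below applies an unsharp mask (adding back the difference between an image and a blurred copy to boost edge contrast), a classic sharpening technique swapped in for illustration. It is neither deconvolution nor MotionDSP's algorithm, and the 3x3 box blur and the `amount` parameter are assumptions.

```python
import numpy as np


def box_blur3(img: np.ndarray) -> np.ndarray:
    """3x3 mean filter with edge replication; returns float64."""
    padded = np.pad(img.astype(np.float64), 1, mode="edge")
    h, w = img.shape
    acc = np.zeros((h, w), dtype=np.float64)
    for dy in range(3):
        for dx in range(3):
            acc += padded[dy:dy + h, dx:dx + w]
    return acc / 9.0


def unsharp_mask(img: np.ndarray, amount: float = 1.0) -> np.ndarray:
    """Sharpen an 8-bit grayscale image by adding back `amount` times the blur residual."""
    f = img.astype(np.float64)
    sharpened = f + amount * (f - box_blur3(img))
    return np.clip(np.rint(sharpened), 0, 255).astype(np.uint8)


# A soft vertical edge between gray levels 50 and 150.
edge = np.full((5, 6), 50, dtype=np.uint8)
edge[:, 3:] = 150
print(unsharp_mask(edge)[0].tolist())  # [50, 50, 17, 183, 150, 150]
```

On the step edge the output undershoots on the dark side and overshoots on the bright side; that halo is the added edge contrast, and `amount` controls how strong it is.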

Benefits of video pre-processing

  1. Benefits to Image Classification APIs: more tags, more accurate results, higher confidence scores
  2. Benefits to Deep Learning Training: less data required for training and inference/prediction
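Two of these benefits, more tags and higher confidence, can be checked directly by comparing classifier output for an original still and its enhanced version. A minimal sketch, assuming each result is a map from tag name to confidence score; the helper names and the sample numbers are hypothetical. "More accurate results" would additionally require ground-truth labels, which this does not attempt.

```python
from statistics import mean


def summarize(tags: dict[str, float]) -> tuple[int, float]:
    """Return (number of tags, mean confidence) for a tag -> confidence map."""
    return len(tags), (mean(tags.values()) if tags else 0.0)


def confidence_change(original: dict[str, float], enhanced: dict[str, float]) -> dict[str, float]:
    """Per-tag confidence change, for tags reported on both versions of the still."""
    return {name: enhanced[name] - original[name] for name in original.keys() & enhanced.keys()}


# Hypothetical classifier output for one still, before and after enhancement.
original = {"Person": 80.0, "Night": 60.0}
enhanced = {"Person": 95.0, "Guitar": 85.0, "Musician": 75.0}
print(summarize(original), summarize(enhanced))  # (2, 70.0) (3, 85.0)
print(confidence_change(original, enhanced))     # {'Person': 15.0}
```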


The workflow

  1. Send video (file or live stream) to the Ikena Cloud API
  2. Process the video with a desired preset (combination of one or many of our 20+ GPU-accelerated image processing filters and their settings)
  3. Output JPEG still images at user-specified time-points
  4. Send these enhanced images to the Amazon Rekognition API or Google's Cloud Vision API (we also plan to add Microsoft's Computer Vision API)
  5. Compare the classification tags returned for the enhanced images with those returned for their un-enhanced originals
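Steps 3 to 5 can be sketched in a few lines. The first helper maps a sampling interval to timestamps and frame indices, assuming a constant frame rate; the second classifies each pair of stills (original and enhanced, taken at the same time-point) and diffs the tag sets. `frame_indices`, `compare_stills`, `StillComparison`, and `fake_classify` are hypothetical names, not part of the Ikena Cloud API; in practice `classify` would wrap a Rekognition or Cloud Vision call, and the fake classifier exists only so the example runs offline.

```python
import math
from dataclasses import dataclass
from typing import Callable

Jpeg = bytes  # an encoded still image


def frame_indices(duration_s: float, interval_s: float, fps: float) -> list[tuple[float, int]]:
    """Step 3 arithmetic: (timestamp_s, frame_index) every interval_s seconds from t = 0.

    Assumes a constant frame rate; frame k is on screen during [k / fps, (k + 1) / fps).
    """
    if interval_s <= 0 or fps <= 0:
        raise ValueError("interval_s and fps must be positive")
    points = []
    n = 0
    while n * interval_s < duration_s:
        t = n * interval_s  # multiply rather than accumulate, to avoid float drift
        # The tiny epsilon keeps a product that lands a hair below an integer
        # (binary floating point) from being floored to the previous frame.
        points.append((t, math.floor(t * fps + 1e-9)))
        n += 1
    return points


@dataclass
class StillComparison:
    timestamp_s: float
    gained: set[str]  # tags found only on the enhanced still
    lost: set[str]    # tags found only on the original still
    common: set[str]  # tags found on both


def compare_stills(
    pairs: dict[float, tuple[Jpeg, Jpeg]],
    classify: Callable[[Jpeg], set[str]],
) -> list[StillComparison]:
    """Steps 4-5: classify each (original, enhanced) pair and diff the tag sets."""
    results = []
    for t in sorted(pairs):
        original, enhanced = pairs[t]
        before, after = classify(original), classify(enhanced)
        results.append(StillComparison(t, after - before, before - after, before & after))
    return results


# A 10 s clip at 30 fps, sampled every 2.5 s.
print(frame_indices(10.0, 2.5, 30.0))  # [(0.0, 0), (2.5, 75), (5.0, 150), (7.5, 225)]


# Stand-in classifier for the example: treats the bytes as space-separated tags.
def fake_classify(image: Jpeg) -> set[str]:
    return set(image.decode().split())


report = compare_stills({0.0: (b"person night", b"person guitar musician")}, fake_classify)
print(sorted(report[0].gained), sorted(report[0].lost))  # ['guitar', 'musician'] ['night']
```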


Last Example

This image came from an aircraft, using a camera on a large gimbal to capture standard-definition video over a distance of a few miles. It suffers from atmospheric haze, lighting problems, and compression artifacts. Note the difference in the tags reported by Amazon Rekognition.

To better see what our filters are doing, watch the video below. Notice the effect the super-resolution filter has on the noise and compression artifacts.

We are still in the early stages of testing, but so far the results are promising. We have also seen benefits for other image classification features, for example using super-resolution to add detail that improves face sentiment, face recognition, and optical character recognition results.
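Atmospheric haze was one of the degradations in the aircraft clip above. MotionDSP's own dehazing approach is not described in this post; as a reference point, below is a simplified version of the dark channel prior of He et al., a well-known published single-image dehazing method, swapped in for illustration. It skips the original method's transmission-refinement step (so expect halos near strong edges) and averages the candidate atmospheric-light pixels instead of picking the single brightest one; the patch radius, `omega`, and `t0` defaults are assumptions.

```python
import numpy as np


def min_filter(img2d: np.ndarray, radius: int) -> np.ndarray:
    """Minimum over a (2r+1) x (2r+1) window, with edge replication."""
    padded = np.pad(img2d, radius, mode="edge")
    h, w = img2d.shape
    out = np.full((h, w), np.inf)
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            out = np.minimum(out, padded[dy:dy + h, dx:dx + w])
    return out


def dehaze_dcp(rgb: np.ndarray, radius: int = 7, omega: float = 0.95, t0: float = 0.1) -> np.ndarray:
    """Simplified dark-channel-prior dehazing of an (H, W, 3) uint8 image."""
    img = rgb.astype(np.float64) / 255.0
    # Dark channel: per-pixel minimum over color channels, then over a local patch.
    dark = min_filter(img.min(axis=2), radius)
    # Atmospheric light A: mean color of the pixels with the top 0.1% dark-channel values.
    n = max(1, dark.size // 1000)
    brightest = np.argsort(dark.ravel())[-n:]
    atmos = np.maximum(img.reshape(-1, 3)[brightest].mean(axis=0), 1e-6)
    # Transmission estimate; omega < 1 deliberately leaves a little haze.
    t = 1.0 - omega * min_filter((img / atmos).min(axis=2), radius)
    t = np.maximum(t, t0)[..., None]  # floor t so dense-haze regions don't blow up noise
    # Invert the haze model I = J * t + A * (1 - t) for the scene radiance J.
    radiance = (img - atmos) / t + atmos
    return np.clip(np.rint(radiance * 255.0), 0, 255).astype(np.uint8)


# Synthetic check: random scene, hazed with A = 0.8 and uniform transmission 0.5.
rng = np.random.default_rng(1)
scene = rng.uniform(0.0, 1.0, size=(40, 40, 3))
hazy = np.rint((0.5 * scene + 0.4) * 255.0).astype(np.uint8)
restored = dehaze_dcp(hazy, radius=3)
print(hazy.std(), restored.std())  # restored contrast should be clearly higher
```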

If you have data you are interested in pre-processing with our cloud API, Ikena Cloud, get in touch with us by signing up for our private beta.











