In April, MotionDSP announced Ikena® Cloud, our new API for cloud-based processing of video files and live video streams. Since then we’ve tested it with developers on several use cases, one of which is pre-processing video to improve image classification results. The results have been promising.
Amazon Rekognition, Google Cloud Vision API, and Microsoft Azure’s Computer Vision API all offer high-quality image classification: you give their APIs an image, and they return tags that describe what’s in it.
One challenge we see with our customers is that a lot of their real-world video doesn’t match the imagery that these classification models were trained on. Like this example:
Solution: use video pre-processing
Ikena Cloud offers any combination of MotionDSP’s 20+ GPU-accelerated image processing filters for video pre-processing, and can output enhanced still images at user-selected intervals, sending the enhanced images to various image classification APIs. Here is one result:
MotionDSP video filters used: automatic brightness, contrast, super-resolution
In the above example, a lot more of the scene was recognized (“guitar,” “musician,” etc.), enough that you could argue the tags on the right describe the scene better than the tags on the left.
Here is another example showing what our super-resolution algorithm does to improve OCR accuracy in the Google Cloud Vision API.
MotionDSP video filters used: super-resolution, de-blurring
The OCR result in the above example is not perfect, but it is clearly improved: “Oppenheim Schafer” vs “ForReno sTHAFER”.
Benefits of video pre-processing
- Benefit to Image Classification APIs: more tags, more accurate results, stronger confidence
- Benefits to Deep Learning Training: reduce the amount of data required for training and inference/prediction
Test workflow
- Send video (file or live stream) to the Ikena Cloud API
- Process the video with a desired preset (combination of one or many of our 20+ GPU-accelerated image processing filters and their settings)
- Output JPEG still images at user-specified time-points
- Send these enhanced images to the Amazon Rekognition API or Google’s Cloud Vision API (we also plan to add Microsoft)
- Compare the classification tag results of the enhanced images vs their un-enhanced originals
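The last two steps can be sketched in Python. This is only an illustration under stated assumptions, not our production code: the Ikena Cloud calls are left out, and the helper names `rekognition_labels` and `compare_tags` are hypothetical. The Rekognition call uses the `detect_labels` operation from `boto3` and assumes AWS credentials are already configured.

```python
from typing import Dict, List, Union

TagReport = Dict[str, Union[List[str], Dict[str, float]]]


def rekognition_labels(image_bytes: bytes, min_confidence: float = 50.0) -> Dict[str, float]:
    """Return {tag name: confidence} for one JPEG using Amazon Rekognition."""
    # Imported here so compare_tags() can be used without boto3 installed.
    import boto3

    client = boto3.client("rekognition")
    response = client.detect_labels(
        Image={"Bytes": image_bytes},
        MinConfidence=min_confidence,
    )
    return {label["Name"]: label["Confidence"] for label in response["Labels"]}


def compare_tags(original: Dict[str, float], enhanced: Dict[str, float]) -> TagReport:
    """Compare the tags of an un-enhanced image with those of its enhanced version."""
    gained = sorted(set(enhanced) - set(original))  # tags only the enhanced image gets
    lost = sorted(set(original) - set(enhanced))    # tags only the original image gets
    shared = sorted(set(original) & set(enhanced))  # tags both images get
    # Positive values mean the classifier is more confident on the enhanced image.
    confidence_change = {
        name: round(enhanced[name] - original[name], 2) for name in shared
    }
    return {"gained": gained, "lost": lost, "confidence_change": confidence_change}


if __name__ == "__main__":
    # Placeholder values for illustration only, not measured results.
    original_tags = {"Person": 60.0}
    enhanced_tags = {"Person": 85.5, "Guitar": 72.0}
    print(compare_tags(original_tags, enhanced_tags))
```

In a real run, both dictionaries would come from `rekognition_labels`, called once on the original still and once on the enhanced JPEG for the same time-point.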
This image came from an aircraft with a large gimbal-mounted camera shooting standard-definition video from a few miles away. It suffers from atmospheric haze, lighting issues, and compression artifacts. See the difference in the tags reported by Amazon Rekognition.
To better see what our filters are doing, watch the video below. Notice the difference the super-resolution filter makes to the noise and compression artifacts.
We are still in the early stages of testing, but so far the results are promising. We have also seen benefits for other image analysis features: for example, using super-resolution to add detail has improved face sentiment results, face recognition, and optical character recognition.