Image and Video Understanding: 3 Practical Applications


In the previous installment of our series, we explored what image and video understanding is, how it works, and why it has improved so much in recent years. However, technology on its own rarely improves social and economic outcomes. Rather, it is the useful application of technology, and its ability to improve our current processes, that makes the biggest difference.

Today’s AI and machine learning methods work by automatically inferring relationships between items in a dataset. Although we traditionally think of datasets in terms of Excel spreadsheets, machine learning and deep learning have allowed us to adopt a wider definition. We can describe an image using the location and colour of each pixel, and build a dataset by attaching a label to each picture. From the relationships between pixels and labels, the AI model learns that some patterns of pixels correspond to cats and others to dogs.
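To make the idea of "an image as a row in a dataset" concrete, here is a minimal sketch in Python. It is not how a deep learning model works internally: it swaps in a much simpler nearest-average classifier, and the 2x2 "images", their pixel values, and the function names are all made up for illustration.

```python
# Toy illustration: an image is a grid of pixel intensities plus a label.
# All pixel values and labels below are invented for demonstration.

def flatten(image):
    """Turn a 2D grid of pixel values into a flat list of features."""
    return [pixel for row in image for pixel in row]

def centroid(vectors):
    """Average several feature vectors element by element."""
    return [sum(values) / len(values) for values in zip(*vectors)]

def squared_distance(a, b):
    """Sum of squared differences between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def train(dataset):
    """Compute one average 'pattern of pixels' per label."""
    by_label = {}
    for image, label in dataset:
        by_label.setdefault(label, []).append(flatten(image))
    return {label: centroid(vectors) for label, vectors in by_label.items()}

def predict(model, image):
    """Return the label whose average pattern is closest to the image."""
    features = flatten(image)
    return min(model, key=lambda label: squared_distance(model[label], features))

# Each example is (2x2 grayscale image, label); 0 is black, 255 is white.
dataset = [
    ([[250, 240], [10, 20]], "cat"),
    ([[230, 255], [0, 30]], "cat"),
    ([[15, 5], [245, 250]], "dog"),
    ([[25, 0], [235, 255]], "dog"),
]

model = train(dataset)
prediction = predict(model, [[240, 245], [5, 15]])
print(prediction)
```

The new image is bright on top and dark below, like the two "cat" examples, so it lands closest to the "cat" average. Real models learn far richer patterns, but the dataset has the same shape: pixels in, label out.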

In this article, we will explain some exciting applications of computer vision and give insight into state-of-the-art performance.

Medical Image Classification

Although it can be fun to sort pictures of cats and dogs into different buckets, the most impactful applications of image classification lie at the edge: teaching machines to complete classification tasks that currently require human experts.

Let’s take ophthalmology as an example, specifically the process of diagnosing diabetic retinopathy (DR). It is estimated that up to 40% of Americans diagnosed with diabetes will be affected by DR, and up to 15% will suffer from macular edema, which can result in visual impairment. Like many diseases, DR is best treated if detected early. Despite this, many patients with diabetes don’t consult their eye doctor on a yearly basis.

Computer Vision for Medical Image Analysis

This problem is well suited to computer vision, as screening for DR involves examining images of the retina for abnormalities. A panel of ophthalmologists can create an annotated dataset of eye scans, labelling each image as either positive or negative for DR. The AI model then learns from these labelled images to detect the disease in future scans.

Earlier this year, the FDA granted regulatory approval for a device capable of screening for early symptoms without the involvement of an ophthalmologist. The device will be able to detect mild DR and refer the patient to an eye specialist for further treatment. In clinical trials, the company reported an accuracy rate of over 85%, which is higher than that of current generalist physicians. There are already initiatives to incorporate this technology into smartphone cameras to make diagnosing DR even easier.
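A single "accuracy rate" can hide a lot in a screening setting, so results are usually also described in terms of sensitivity (how many diseased eyes are caught) and specificity (how many healthy eyes are correctly cleared). The sketch below shows how these figures could be computed from expert labels and model predictions. The ten labels are invented for illustration and have nothing to do with the trial mentioned above; `screening_metrics` is a hypothetical helper name.

```python
def screening_metrics(true_labels, predicted_labels):
    """Compare predictions with expert labels (1 = DR present, 0 = no DR)."""
    tp = fp = tn = fn = 0
    for truth, guess in zip(true_labels, predicted_labels):
        if truth and guess:
            tp += 1  # disease present and flagged
        elif truth and not guess:
            fn += 1  # disease present but missed
        elif not truth and guess:
            fp += 1  # healthy eye flagged by mistake
        else:
            tn += 1  # healthy eye correctly cleared
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,
        # Guard against datasets that contain only one class.
        "sensitivity": tp / (tp + fn) if (tp + fn) else None,
        "specificity": tn / (tn + fp) if (tn + fp) else None,
    }

# Made-up example: 4 scans with DR, 6 without.
expert_labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
model_output = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

metrics = screening_metrics(expert_labels, model_output)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

In this toy example the model is right on 8 of 10 scans, but it misses one of the four diseased eyes, which is the kind of detail a single accuracy number does not reveal.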

Semantic Segmentation For Self-Driving Cars

Sometimes just being able to classify an image isn’t all that valuable. Perhaps there are multiple objects in a picture and you’d like to understand where they are in relation to one another, or maybe you’re analyzing a video and would like to understand how things move from one frame to the next. More detailed tasks like these require the computer to recognize and distinguish objects within an image from each other and assign them appropriate labels. This is called semantic segmentation.

The datasets used to train these models are labelled either at the pixel level, which captures each object's shape, or with bounding boxes drawn around objects to indicate their presence and location. Accuracy on the most widely used public dataset in this space, Common Objects in Context (COCO), has doubled since 2015.
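The two labelling styles are easy to picture with a small sketch. Below, a pixel-level label is a grid of 0s and 1s (a mask), a bounding box is derived from it, and a predicted mask is scored with intersection over union (IoU), a commonly used overlap measure. The masks are invented, the function names are hypothetical, and this is not the exact scoring procedure any particular benchmark uses.

```python
def bounding_box(mask):
    """Smallest (top, left, bottom, right) box covering all 1-pixels, inclusive."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    if not rows:
        return None  # no object pixels in this mask
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    return (rows[0], cols[0], rows[-1], cols[-1])

def mask_iou(mask_a, mask_b):
    """Intersection over union of two same-sized binary masks."""
    intersection = union = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            intersection += 1 if (a and b) else 0
            union += 1 if (a or b) else 0
    return intersection / union if union else 0.0

# Pixel-level label for one object (1 = object, 0 = background).
ground_truth = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 0, 0],
]

# A hypothetical model prediction that misses two object pixels.
predicted = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0],
]

box = bounding_box(ground_truth)
iou = mask_iou(ground_truth, predicted)
print(box, round(iou, 2))
```

The box only records where the object is, while the mask also records its outline, which is why pixel-level labels take more effort to produce but support finer-grained tasks.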

One of the most exciting applications for this technology is in self-driving cars. Human drivers deal with an immense number of incoming objects when driving and have to know how to react to each of them differently. Autonomous vehicles will need to differentiate trucks from small cars and adult pedestrians from children. These kinds of segmentation tasks are enabled through datasets such as Berkeley Deep Drive. In addition to segmentation labels, these datasets contain GPS information about the location of the car and readings from inertial measurement units (IMUs) that describe its motion.

Computer Vision for Self-driving Cars

Automakers, technology companies and universities are all racing to be the first to market. The 2018 Audi A8 will be the first car in the world to sport Level 3 autonomy, meaning that in certain conditions drivers will be able to take their hands off the wheel and truly relax. Other systems, such as Tesla’s Autopilot, require the driver to keep their hands on the wheel and remain attentive at all times.

Facial Recognition and Identity Verification

One of the applications of artificial intelligence most frequently seen in science fiction movies is the ability to verify someone’s identity using facial recognition. Urbanites of the future approach shiny sliding doors, which open and greet them by name. We may not be too far away from this reality, at least in certain situations.

To train such a model, researchers feed the AI many images of the same person, so that the algorithm can learn to spot similarities between faces seen at different angles and under different lighting. In last year’s Face Recognition Vendor Test, which acts as a benchmark for facial recognition algorithms, the winning team achieved an accuracy rate of 95%. This result is significant because the dataset contained 1 million previously unseen images, so researchers could not have optimized their algorithm for this specific dataset.
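The article does not say how a trained system decides that two photos show the same person. One common approach is to turn each face into a list of numbers (an embedding) and compare those lists. The sketch below assumes that approach: the three-number embeddings and the 0.8 threshold are hypothetical values chosen for illustration, whereas real systems use much longer vectors produced by a trained network and a threshold tuned on validation data.

```python
import math

def cosine_similarity(a, b):
    """Similarity of two vectors: 1.0 means they point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def same_person(embedding_a, embedding_b, threshold=0.8):
    """Declare a match when the embeddings are similar enough.

    The threshold is an assumed value for this sketch.
    """
    return cosine_similarity(embedding_a, embedding_b) >= threshold

# Hypothetical embeddings; a real model would produce these from images.
enrolled_photo = [0.9, 0.1, 0.4]
same_face_new_lighting = [0.8, 0.2, 0.5]
different_face = [0.1, 0.9, 0.2]

match = same_person(enrolled_photo, same_face_new_lighting)
mismatch = same_person(enrolled_photo, different_face)
print(match, mismatch)
```

Training on many images of the same person is what pushes the embeddings of one face close together despite changes in angle and lighting, which is what makes a simple comparison like this workable.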

Facial recognition technology is already being deployed at scale in places like airports to help security officers perform identity verification. In China, facial recognition has been used at 80% of the country’s airports, handling over 30 million passengers a year. The vision is that security checks could one day be completely automated, creating a more streamlined experience and allowing security officials to focus on high-risk cases.

Computer Vision for Identity Verification

Although impressive advances have been made in facial recognition, we should be mindful of its limitations. It is imperative for machine learning practitioners to build robust datasets that are representative of the problems they are trying to solve. This is particularly important in facial recognition, as datasets that are poorly balanced or incomplete can result in bias. Using datasets and algorithms specific to your problem will yield better results.
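A simple first check on balance is to count how often each group or condition appears in the dataset. The sketch below does this for a made-up set of image records. The `lighting` attribute, the records themselves, and the 15% cut-off are all assumptions for illustration; which attributes matter and what share counts as "too small" depend on the problem.

```python
from collections import Counter

def group_shares(records, key):
    """Fraction of the dataset that falls into each value of `key`."""
    counts = Counter(record[key] for record in records)
    total = sum(counts.values())
    return {group: count / total for group, count in counts.items()}

def underrepresented(shares, minimum_share):
    """Groups whose share of the dataset is below the chosen minimum."""
    return sorted(group for group, share in shares.items() if share < minimum_share)

# Made-up metadata for ten training images.
records = (
    [{"lighting": "daylight"}] * 7
    + [{"lighting": "indoor"}] * 2
    + [{"lighting": "night"}]
)

shares = group_shares(records, "lighting")
flagged = underrepresented(shares, minimum_share=0.15)
print(shares, flagged)
```

Here night-time images make up only a tenth of the data, so a model trained on it would have seen few examples of that condition. A count like this does not prove a model is biased, but it points to where more data may be needed.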

Key Takeaways

Although we only touched on 3 concrete use cases in this article, the applications for computer vision are practically limitless. What is important for those looking to implement AI going forward is a strong framework for building appropriate datasets and employing models that are suitable for their own specific problem.

In the next and final article in our series, we will provide a framework for thinking about how to build these datasets and implement computer vision solutions.

Stay tuned!

Interested in starting your AI journey? Contact us today.
