14 Jan Computer Vision: What It Is, How It Works and What It Does
What is Computer Vision?
Computer Vision is a rapidly advancing technology that aims to enable computers to automatically gain a high-level understanding of image and video content through Machine Learning and Deep Learning techniques, so they can see and interpret things the way humans do.
This technology goes beyond standard image processing and can be used to identify, track, detect, and classify objects in images and videos. Thus, the amount of information Computer Vision can provide about the world around us is massive, and that’s why computer scientists have been hard at work, extending vision to computers for over half a century.
So, computers being able to see as we do is pretty nifty and all, but to truly grasp how big a deal this is, it really helps first to understand how we do this.
How Human Vision Works and Carrying that Over to Machines
While we usually take it for granted, Human Vision is as intricate as it is immaculate. To keep things simple, we have eyes that capture light, photo-receptors that convert light into a form the brain can process, and finally, the visual cortex in the brain that does all the processing.
Thus, a lot is going on behind the scenes, even for doing something as simple as watching a movie, stepping across obstacles, or admiring the sunset. Over the years, we’ve successfully transferred components of Human Vision over to machines. The pinhole camera dates back to the 11th century, then came the first photographic cameras (19th century), and presently we have highly advanced versions that can capture photos digitally.
However, capturing images is only one piece of the puzzle. Now comes the tricky part: having machines understand what’s actually in the image. Unlike traditional computers, our visual cortex can interpret images in a snap, and we instinctively know what’s happening in them. This boils down to context, and we edge out computers by an eon’s worth of evolutionary context. Thus, bringing machines up to speed is one of the major challenges of Computer Vision.
How Computer Vision Works
When a computer sees an image of, let’s say, a cat, it only sees a massive array of numbers that represent intensities and colors of the image, but not the image itself. As we said, there’s no context, and that’s what’s needed to get algorithms to comprehend an image the same way a human does.
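To make the "array of numbers" idea concrete, here is a minimal sketch in plain Python. The 4x4 pixel values and the orange pixel are made up for illustration, and real code would normally load an image file into an array library rather than type out nested lists.

```python
# A tiny 4x4 grayscale "image": each number is one pixel's intensity
# (0 = black, 255 = white). The values are made up for illustration.
image = [
    [0, 0, 255, 255],
    [0, 0, 255, 255],
    [0, 0, 255, 255],
    [0, 0, 255, 255],
]

height = len(image)
width = len(image[0])

# A color image stores several numbers per pixel, commonly one each
# for red, green, and blue. This is a hypothetical orange pixel.
color_pixel = (255, 128, 0)


def mean_intensity(img):
    """Return the average brightness over all pixels."""
    total = sum(sum(row) for row in img)
    return total / (len(img) * len(img[0]))


print(height, width)          # 4 4
print(mean_intensity(image))  # 127.5
```

To us, this grid is clearly a dark half next to a bright half. To the computer, it is just sixteen numbers until an algorithm gives them meaning.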
Enter Machine Learning. Machine Learning leverages various statistical techniques to allow algorithms to learn and improve as they come across new data, so they’re eventually able to decipher what numbers in a particular sequence actually mean. In the case of Computer Vision, this data is image data.
Given enough representative training data, a Machine Learning model can become highly accurate. On some image recognition benchmarks, Computer Vision models now go toe-to-toe with humans. These models typically rely on deep neural networks such as Convolutional Neural Networks and Recurrent Neural Networks. Let’s talk about them.
Convolutional Neural Networks (CNN)
Like all neural network architectures, CNNs are modeled loosely after biological neural networks and learn to perform tasks through examples. Typical applications of CNNs in the context of Computer Vision include image classification, object detection, and object tracking.
CNNs are so called because of their hidden convolution layers. Each convolution layer can have several filters for detecting distinct features in the image. The input image, represented as a grid of pixel values, is passed through the filters in each convolution layer. Layers at the start of the CNN usually detect simpler features (e.g., edges, curves, and basic shapes), while deeper layers identify more complex features and objects (eyes, ears, faces, cats, dogs).
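As a rough illustration of what a single filter does, the sketch below slides a hand-written 3x3 vertical-edge filter over a small made-up image. The function name `convolve2d`, the image, and the filter values are all assumptions chosen for this example. In a real CNN the filter values are learned, and a library would do this work far more efficiently.

```python
def convolve2d(img, kernel):
    """Slide kernel over img (no padding, stride 1) and return the feature map.

    Like most deep learning libraries, this computes cross-correlation:
    the kernel is applied as written, without flipping it.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(img) - kh + 1
    out_w = len(img[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = 0
            # Multiply the kernel with the patch under it and sum the result.
            for di in range(kh):
                for dj in range(kw):
                    acc += img[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        output.append(row)
    return output


# Made-up image: dark on the left, bright on the right.
image = [
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
]

# Hand-written filter that responds where intensity rises from left to right.
vertical_edge = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

feature_map = convolve2d(image, vertical_edge)
print(feature_map)  # [[0, 27, 27, 0], [0, 27, 27, 0]]
```

The feature map is zero over the flat regions and large where the dark-to-bright boundary sits, which is the sense in which an early-layer filter "detects" an edge.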
Labeled training data is used to improve a CNN’s accuracy iteratively. At the start of training, all filter values are randomized, so predictions tend to be way off. With each iteration, the filter values are adjusted to reduce the error, and the predictions move closer to each image’s actual label.
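The training loop can be sketched on a toy scale. The example below is not a full CNN. It learns a single three-value filter with plain gradient descent on a mean squared error, using a hypothetical set of 1x3 patches whose labels equal the right pixel minus the left pixel. The data, learning rate, and step count are assumptions picked so the example converges quickly.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical training set: 1x3 image patches and the response we want
# from the filter (label = right pixel minus left pixel).
patches = [
    [0.0, 0.0, 1.0],
    [0.0, 1.0, 1.0],
    [1.0, 1.0, 1.0],
    [1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],
    [1.0, 1.0, 0.0],
]
labels = [1.0, 1.0, 0.0, -1.0, 0.0, -1.0]


def predict(weights, patch):
    """Apply the filter to one patch (a dot product)."""
    return sum(w * x for w, x in zip(weights, patch))


def mean_squared_error(weights):
    """Average squared gap between predictions and labels."""
    errors = [(predict(weights, p) - y) ** 2 for p, y in zip(patches, labels)]
    return sum(errors) / len(errors)


# Start from random filter values, so early predictions are poor.
weights = [random.uniform(-1, 1) for _ in range(3)]
initial_loss = mean_squared_error(weights)

learning_rate = 0.2
for step in range(500):
    # Gradient of the mean squared error with respect to each filter value.
    grads = [0.0, 0.0, 0.0]
    for p, y in zip(patches, labels):
        error = predict(weights, p) - y
        for k in range(3):
            grads[k] += 2 * error * p[k] / len(patches)
    # Nudge every filter value in the direction that reduces the error.
    weights = [w - learning_rate * g for w, g in zip(weights, grads)]

final_loss = mean_squared_error(weights)
print(initial_loss, final_loss)
print([round(w, 3) for w in weights])
```

After training, the filter values settle close to -1, 0, and 1, the same left-to-right edge pattern one might write by hand. A real CNN adjusts many filters across many layers in the same spirit, with the gradients computed by backpropagation.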
Recurrent Neural Networks (RNN)
On the surface, videos are basically a series of moving images. However, to analyze them, it’s essential to consider the context in a single image frame (spatial) as well as the context between frames (temporal). Architectures like CNNs can only pick up on spatial features, and this renders them unsuitable for videos where temporal features are also important.
This is where RNNs come in. For video analysis, they are often paired with CNNs: the CNN extracts spatial features from each frame, and the RNN models how those features change over time. Unlike a CNN, an RNN can retain information that it’s already processed and use this temporal context to make better predictions.
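The idea of retaining information between frames can be shown with a minimal recurrent step. In this sketch the hidden state is a single number, the weights are hand-picked rather than trained, and each "frame" is a made-up pair of feature values standing in for what a CNN might extract. The names `rnn_step` and `run_sequence` are hypothetical.

```python
import math


def rnn_step(hidden, features, w_hidden, w_input):
    """One recurrent step: mix the previous state with the current frame."""
    weighted_input = sum(w * x for w, x in zip(w_input, features))
    return math.tanh(w_hidden * hidden + weighted_input)


def run_sequence(frames, w_hidden=0.8, w_input=(0.5, -0.5)):
    """Feed frames through the RNN in order and return the final state."""
    hidden = 0.0  # no memory before the first frame
    for features in frames:
        hidden = rnn_step(hidden, features, w_hidden, w_input)
    return hidden


# Two made-up per-frame feature vectors.
frame_a = (1.0, 0.0)
frame_b = (0.0, 1.0)

# Both sequences end on the same frame but have different histories.
state_ab = run_sequence([frame_a, frame_b])
state_bb = run_sequence([frame_b, frame_b])
print(state_ab, state_bb)
```

Both sequences end on `frame_b`, yet the final states differ because the hidden state carries over what came before. A model that only looks at one frame at a time would produce the same result in both cases.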
Real-World Applications of Computer Vision
Computer Vision is crucial to a number of current and emerging technologies. Let’s take a look at some of its most popular real-world applications.
According to the WHO Global status report on road safety 2018, roughly 1.35 million people die each year due to road traffic injuries. Automobile engineers have been busy making our cars safer, and many modern accident prevention features rely on Computer Vision. Most new vehicles have specialized built-in sensors that can detect pedestrians, motorists, and other objects in the vehicle’s vicinity. Computer Vision helps these vehicles better comprehend their surroundings, warn the driver when an accident is likely, and even brake automatically if there’s an obstacle or pedestrian up ahead.
Some companies like Tesla, Uber, and Waymo are taking this even further with self-driving cars. While we’ve already seen autonomous cars in action, and they’re pretty cool, immediate widespread adoption seems unlikely owing to regulatory approval processes, safety concerns, and cultural norms, among other reasons.
Currently, Computer Vision benefits various areas of healthcare, including medical imaging, active health monitoring, and diagnostics. The technology serves as a valuable tool for healthcare personnel, as it augments their capabilities and equips them to provide better care for their patients.
Computer Vision helps doctors track their patients’ conditions more accurately and takes out some of the guesswork. Gauss Surgical has developed an FDA-approved blood monitoring solution that scans images of blood-soaked sponges and suction canisters and uses Computer Vision to estimate blood loss in real time, helping clinicians catch the onset of postpartum hemorrhage, one of the most preventable causes of maternal death.
In medical imaging, Computer Vision can be used to make medical images such as MRI and X-ray scans more interactive, and to give doctors more context for understanding and interpreting their patients’ conditions.
Whether it’s textiles, food processing, or pharmaceuticals, Computer Vision has quickly found its place in industry settings. It is reshaping production lines and supply chains and is helping manufacturers digitize and automate their processes to be safer, more efficient, and more intelligent.
Besides ensuring product quality and inspecting packaging, Computer Vision also helps manufacturers keep a close eye on the health of critical infrastructure and equipment, and empowers them to proactively take corrective measures before a breakdown occurs and results in expensive downtime.
Computer Vision has the potential to breathe new life into brick-and-mortar retail stores by allowing them to deliver the personalized and hassle-free shopping experiences that we usually associate with online shopping. From the automation of processes like inventory management, checkout, and customer compliance to providing contextual information about products and tailored product recommendations for each customer, the possibilities are nearly endless.
If you live in Manhattan, San Francisco, Chicago, or Seattle, you can check out your nearest Amazon Go store to see Computer Vision’s applications in retail in all their glory. There are sensors and overhead cameras that detect when items are taken from or returned to shelves, and your virtual cart is updated accordingly. Payments are processed automatically, and you get charged via your Amazon account after you leave.
Intagleo Systems has extensive hands-on experience in developing Deep Learning and Computer Vision software solutions for various industry applications. Contact us today for a free consultation and learn more about what we can do for you.