In a world where more and more information takes the form of video, image, and audio content, artificial intelligence that can take action based on image content is increasingly important.
What is Vision AI?
Vision AI is a subset of AI addressing the ability of computers to be visually enabled, so they can support--and even make--decisions based on the content of images.
Vision AI typically provides two distinct types of AI models:
- Classification enables machines to "see" what type of content is contained within an image--often by classifying images against a taxonomy of categories or tags.
- Object Detection focuses on training a machine to detect specific, predefined objects within an image--typically identifying both the presence of each object and where in the image it was detected.
Off-loading image review and analysis from humans to machines vastly increases the amount of image analysis that can be economically done. Example applications include:
- Image classification
- Image search engines
- Quality assurance
- Text extraction
Microsoft Azure AI Custom Vision
This post uses the Microsoft Azure AI Custom Vision service to develop an example solution relevant to quality assurance.
Azure AI Custom Vision is a version of AI Vision that allows us to train our own models. In this case, we want a model tuned for the types of eggs our egg packaging plant uses, so training our own model should give more accurate results than a general-purpose model.
The rest of this post is available in video format! The written post continues after the embedded video.
Demo Solution
In this post, we'll create a demo solution that uses Azure AI Custom Vision to analyze cartons of eggs and determine how many eggs each one contains.
The following are the types of images we'll train an AI model to analyze:
Once the AI Vision model is developed, we'll use it to detect and count the eggs contained within a carton.
In the US, a carton of eggs should have 12 eggs, and the solution will create an AI model that can count how many eggs are in an image. The practical application may be to identify packing problems without the need for human evaluation.
This approach could easily be extended to other use cases involving other types of products--but using eggs keeps it fun!
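To make the counting step concrete, here is a minimal sketch of how detections might be turned into a pass/fail check for a carton. It assumes each detection is a dictionary with `tagName` and `probability` keys, that the tag is called `egg`, and that 0.5 is a sensible probability cutoff; these names and the threshold are illustrative assumptions, not settings taken from the demo.

```python
EXPECTED_EGGS = 12  # a full US carton, per the text above


def count_eggs(predictions, tag_name="egg", min_probability=0.5):
    """Count detections of `tag_name` at or above `min_probability`.

    The dictionary keys and the default threshold are assumptions
    made for this sketch.
    """
    return sum(
        1
        for p in predictions
        if p["tagName"] == tag_name and p["probability"] >= min_probability
    )


def carton_status(predictions, expected=EXPECTED_EGGS):
    """Classify a carton as ok, underfilled, or overfilled."""
    count = count_eggs(predictions)
    if count == expected:
        return "ok"
    return "underfilled" if count < expected else "overfilled"


# Hypothetical detections: 11 confident eggs plus one low-confidence guess.
sample = [{"tagName": "egg", "probability": 0.97}] * 11 + [
    {"tagName": "egg", "probability": 0.21}
]
print(count_eggs(sample), carton_status(sample))  # prints "11 underfilled"
```

The threshold matters: set it too low and spurious detections inflate the count, set it too high and real eggs are missed, so it is worth tuning against images with known counts.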
Creating an Azure Resource
The first step in creating any Azure AI Custom Vision solution is to create a Custom Vision resource in the Azure Portal.
Custom Vision can be provisioned in a Free tier (F0), which allows a single project and enough capacity for proof-of-concept projects. The Standard tier (S0) provides for multiple projects and greater capacity (10 requests/second) and is suitable for production workloads.
Creating an AI Vision Project
The Custom Vision resource is used to dispatch training and prediction resources, and as a mechanism to meter usage and billing of Azure AI compute.
Creating models and endpoints can be done using custom code (e.g. C#, Python), or using a web-based AI Vision portal. For this project, we'll use the portal so we can completely evaluate how AI Custom Vision will work in this scenario without writing a single line of code!
A project is a container that collects the training data, records the desired model type (Classification or Object Detection), and orchestrates training of the model and subsequent prediction requests.
In this example, we create an object detection model using the General (Compact) domain.
Tagging Objects
As with most supervised machine learning, a pre-training step is done to add labels to the input data. In this case, the input data are images, and the tags are the object names with the location of each object relative to the top-left corner of the image. During training, Azure AI Custom Vision uses these regions to learn all the different ways an egg can look in our custom domain.
The Vision portal UI provides an easy-to-use front-end for selecting and identifying objects in each training image.
In a large-scale solution, custom code could be used to inject labeled training images into the training data repository. But for solutions with modest training data volumes, the portal UI is both convenient and efficient.
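If labels already exist as pixel bounding boxes, custom code would first need to express them in whatever form the training service expects. The sketch below assumes regions are given as fractions of the image size (left, top, width, height) measured from the top-left corner; the `Region` class and `to_normalized_region` helper are hypothetical names for illustration, not part of any SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """A tagged region in normalized (0..1) image coordinates."""
    tag: str
    left: float
    top: float
    width: float
    height: float


def to_normalized_region(tag, x, y, w, h, image_width, image_height):
    """Convert a pixel box (x, y, w, h) into a normalized Region.

    (x, y) is the box's top-left corner in pixels, measured from the
    top-left corner of the image.
    """
    if image_width <= 0 or image_height <= 0:
        raise ValueError("image dimensions must be positive")
    if x < 0 or y < 0 or w <= 0 or h <= 0:
        raise ValueError("box must have a non-negative origin and positive size")
    if x + w > image_width or y + h > image_height:
        raise ValueError("box extends beyond the image")
    return Region(
        tag=tag,
        left=x / image_width,
        top=y / image_height,
        width=w / image_width,
        height=h / image_height,
    )


# Hypothetical egg at pixel (200, 150), sized 100x120, in an 800x600 image.
region = to_normalized_region("egg", 200, 150, 100, 120, 800, 600)
print(region)
```

Normalizing up front keeps labels valid even if images are later resized, since the fractions do not depend on pixel dimensions.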
Training the Model
Once labeled training images are prepared, the next step is to train the vision model.
Model training options are Quick or Advanced. While advanced training may result in a more accurate AI model, it consumes more compute resources and is thus more expensive. Often, it's advisable to start with quick training and graduate to advanced training as needed.
Training Performance
After training completes, AI Vision reports its internal testing of the model it created. As with any machine learning algorithm, the precision and recall of the model must be considered before putting it into production use.
For our relatively simple input data (eggs), even a quick training run against a small number of images (17) resulted in a highly accurate object detection model.
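For readers who want to see what precision and recall mean for object detection, here is a rough sketch. It matches each predicted box to at most one labeled box using intersection over union (IoU) and a 0.5 overlap cutoff. The greedy matching and the cutoff are assumptions chosen for illustration; this is not a description of how the service computes its own numbers.

```python
def iou(a, b):
    """Intersection over union of two (left, top, width, height) boxes."""
    inter_w = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def precision_recall(predicted, actual, iou_threshold=0.5):
    """Greedily match predictions to labeled boxes, then score them.

    precision = true positives / all predictions
    recall    = true positives / all labeled objects
    """
    unmatched = list(actual)
    true_positives = 0
    for box in predicted:
        best = max(unmatched, key=lambda t: iou(box, t), default=None)
        if best is not None and iou(box, best) >= iou_threshold:
            unmatched.remove(best)  # each labeled box can match only once
            true_positives += 1
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall


# Hypothetical pixel boxes: four labeled eggs, three predictions.
actual = [(0, 0, 10, 10), (20, 0, 10, 10), (40, 0, 10, 10), (60, 0, 10, 10)]
predicted = [(0, 0, 10, 10), (21, 0, 10, 10), (80, 80, 10, 10)]
print(precision_recall(predicted, actual))
```

In this example two of the three predictions overlap a labeled egg, so precision is 2/3, and two of the four labeled eggs are found, so recall is 0.5. For egg counting, low recall shows up as undercounts and low precision as overcounts.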
Testing the Model
Finally, we can run some quick tests of the AI Custom Vision model from the portal before deploying it.
By uploading images from the local desktop--or providing an image URL--we can see the output of the trained model against the unseen test images we have, and decide whether to deploy the model for production use.
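The same image-URL test could later be scripted against a published model. The sketch below only builds the HTTP request and does not send it. The URL pattern, the `Prediction-Key` header, and the `{"Url": ...}` body reflect our understanding of the Custom Vision prediction REST API (v3.0) and should be treated as assumptions to check against the official reference; the endpoint, project ID, iteration name, and key shown are placeholders.

```python
import json
import urllib.request


def build_detect_request(endpoint, project_id, published_name,
                         prediction_key, image_url):
    """Build (but do not send) an object detection request for an image URL.

    The path layout and header name are assumptions based on the v3.0
    prediction REST API; verify them before relying on this.
    """
    url = (
        f"{endpoint.rstrip('/')}/customvision/v3.0/Prediction/{project_id}"
        f"/detect/iterations/{published_name}/url"
    )
    body = json.dumps({"Url": image_url}).encode("utf-8")
    headers = {
        "Prediction-Key": prediction_key,
        "Content-Type": "application/json",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


# Placeholder values only; substitute those from your own resource.
request = build_detect_request(
    endpoint="https://example.cognitiveservices.azure.com/",
    project_id="00000000-0000-0000-0000-000000000000",
    published_name="egg-counter",
    prediction_key="<your-prediction-key>",
    image_url="https://example.com/carton.jpg",
)
print(request.get_method(), request.full_url)
```

Sending the request with `urllib.request.urlopen(request)` would require a real resource and a published iteration, which is why the sketch stops at construction.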
Summary
Vision AI is an important subset of the broader set of AI tools available today. The combination of elastic cloud compute and relatively easy-to-use tooling for design, training, and prediction makes Vision AI technologies a compelling option when considering how to use AI to deliver impactful, cost-effective data solutions.