
The Complete Guide To Artificial Intelligence and Deep Learning For Machine Vision Systems – Metrology and Quality News

Author: Earl Yardley – Industrial Vision System

Artificial intelligence (AI) for machine vision systems is now a readily available technology. The area attracts a lot of hype following the introduction of AI language models such as ChatGPT and AI image models like DALL-E. For vision systems, the mass use of AI is starting to take hold, but what’s it all about? And is it viable?

Where are we at?

Let’s start at the beginning. The term ‘AI’ is a new label for a process that has been around in machine vision for the last twenty years but has only now become widespread (or should we say, the technology has matured to the point of mass deployment). We provided vision systems in the early 2000s using low-level neural networks for feature differentiation and classification. These were primarily used for simple segmentation and character verification. The networks were basic and had limited capability. Nonetheless, the process we used then to train the network is the same as it is now, only on a much larger scale. Back then, we called it a ‘neural network’; now it’s termed artificial intelligence.

The term artificial intelligence is used because the network has some level of ‘intelligence’: it learns by example and expands its knowledge over successive rounds of training. The initial research into computer vision AI found that human vision has a hierarchical structure in how neurons respond to various stimuli. Simple features, such as edges, are detected by neurons, which then feed into more complex features, such as shapes, which in turn feed into still more complex visual representations. This builds individual blocks into ‘images’ the brain can perceive. You can think of pixel data in industrial vision systems as the building blocks of synthetic image data. Pixel data is collected on the image sensor and transmitted to the processor as millions of individual pixels of differing greyscale or colour. For example, a line is simply a collection of like-level pixels in a row, with edges where the grey level changes, pixel by pixel, towards that of the adjacent region.
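To make the idea of a line as like-level pixels with edges concrete, here is a minimal Python sketch. The grey levels in `row` and the `min_step` threshold are made-up values for illustration, not taken from a real camera, and `find_edges` is a hypothetical helper rather than a function from any vision library.

```python
# One row of 8-bit grey levels: dark background, a bright line, dark background.
row = [12, 14, 13, 60, 180, 182, 181, 179, 58, 13, 12]


def find_edges(pixels, min_step):
    """Return (index, step) pairs where the grey level changes by at least
    min_step between a pixel and its left-hand neighbour."""
    edges = []
    for i in range(1, len(pixels)):
        step = pixels[i] - pixels[i - 1]
        if abs(step) >= min_step:
            edges.append((i, step))
    return edges


# min_step=40 is an assumed threshold; a real system would tune it per application.
edges = find_edges(row, min_step=40)
for index, step in edges:
    direction = "rising" if step > 0 else "falling"
    print(f"pixel {index}: {direction} edge, grey-level step {step}")
```

Positive steps mark the dark-to-bright side of the line and negative steps the bright-to-dark side. Note that each edge spans two pixels here, which is the pixel-by-pixel transition described above.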

Neural network AI vision attempts to recreate how the human brain learns edges, shapes and structures. Neural networks consist of layers of processing units. The sole function of the first layer is to receive the input signals and pass them on to the next layer. The following layers are called ‘hidden layers’ because they are not directly visible to the user. These do the actual processing of the signals. The final layer translates the internal representation of the pattern created by this processing into an output signal that directly encodes the class membership of the input pattern. Neural networks of this type can realise arbitrarily complex relationships between feature values and class designations. The relationship incorporated in the network depends on the weights of the connections between the layers and the number of layers, and can be derived by special training algorithms from a set of training patterns. The end result is a ‘web’ of connections between the processing units that behaves in a non-linear fashion. All of this is to try and mimic how the brain operates in learning.
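The layer structure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the weights are hand-picked rather than trained, the two input features are hypothetical binary measurements, and the names `layer` and `classify` are invented for this example. The weights are chosen so that class 1 means "exactly one feature is present", a non-linear relationship that a network without a hidden layer cannot represent.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def layer(inputs, weights, biases):
    """One layer of processing units: each unit forms a weighted sum of its
    inputs, adds a bias and squashes the result with a sigmoid."""
    return [
        sigmoid(sum(w * x for w, x in zip(unit_weights, inputs)) + bias)
        for unit_weights, bias in zip(weights, biases)
    ]


# Hand-picked (not trained) weights, assumed purely for illustration.
HIDDEN_W = [[20.0, 20.0], [-20.0, -20.0]]   # unit 0 ~ "either", unit 1 ~ "not both"
HIDDEN_B = [-10.0, 30.0]
OUTPUT_W = [[-20.0, -20.0], [20.0, 20.0]]   # output 0 = class 0, output 1 = class 1
OUTPUT_B = [30.0, -30.0]


def classify(features):
    """Input layer -> hidden layer -> output layer; returns (class, outputs)."""
    hidden = layer(features, HIDDEN_W, HIDDEN_B)
    outputs = layer(hidden, OUTPUT_W, OUTPUT_B)
    return outputs.index(max(outputs)), outputs


for features in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    predicted_class, _ = classify(features)
    print(features, "-> class", predicted_class)
```

In a real system the weights would come from a training algorithm working through example patterns, not from hand tuning.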

Neural network learning

With this information in hand, vision engineering developers have been concentrating their efforts on digitally reproducing human neural architecture, which is how the development of AI came into being. When it comes to perceiving and analysing visual stimuli, computer vision systems employ a hierarchical approach, just like their biological counterparts do. Traditional machine vision inspection continued in this vein, but AI requires a holistic view of all the data to compare and develop against.

So, the last five years have seen an explosion in the development of deep learning AI vision systems. These systems are based on neural network development in computer vision in tandem with the development of AI in other software engineering fields. All are generally built on an automatic differentiation library, which lets neural networks be implemented in a simple, easy-to-configure way. Deep learning AI vision solutions incorporate artificial intelligence-driven defect finding, combined with visual and statistical correlations, to help pinpoint the core cause of a quality control problem.
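To give a feel for what an automatic differentiation library does, here is a deliberately tiny Python sketch. It is not how TensorFlow or PyTorch are implemented; the `Value` class and the `loss_and_gradient` and `fit_weight` functions are hypothetical names invented for this illustration. Each calculation records how it was produced, so the gradient of an error with respect to a weight can be worked out automatically and used to adjust that weight.

```python
class Value:
    """A number that remembers how it was computed, so gradients can flow back."""

    def __init__(self, data, parents=(), local_grads=()):
        self.data = data
        self.grad = 0.0
        self.parents = parents          # Values this one was computed from
        self.local_grads = local_grads  # d(self)/d(parent) for each parent

    def __sub__(self, other):
        return Value(self.data - other.data, (self, other), (1.0, -1.0))

    def __mul__(self, other):
        return Value(self.data * other.data, (self, other), (other.data, self.data))

    def backward(self):
        """Apply the chain rule from this value back to every input."""
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent in node.parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            for parent, local in zip(node.parents, node.local_grads):
                parent.grad += local * node.grad


def loss_and_gradient(weight, x, target):
    """Squared error of the one-weight model weight * x, and d(loss)/d(weight)."""
    w = Value(weight)
    error = w * Value(x) - Value(target)
    loss = error * error
    loss.backward()
    return loss.data, w.grad


def fit_weight(x, target, steps=20, learning_rate=0.1):
    """Plain gradient descent on a single weight, starting from 1.0."""
    weight = 1.0
    for _ in range(steps):
        _, grad = loss_and_gradient(weight, x, target)
        weight -= learning_rate * grad
    return weight


print(fit_weight(x=2.0, target=6.0))  # approaches 3.0, since 3.0 * 2.0 == 6.0
```

A deep learning framework does the same bookkeeping for millions of weights at once, which is why the developer only has to describe the network and not the derivatives.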

To keep up with this development, vision systems have evolved into manufacturing AI and data gathering platforms, because image data for training is essential to every AI vision system in a way that it is not when deploying traditional machine vision algorithms. Traditional vision algorithms still need image data for development, but not to the same extent as deep learning. This means vision platforms have developed to integrate visual and parametric data into a single digital thread, from the development stage of the vision project all the way through to production quality control deployment on the shop floor.

In tandem with the proliferation of the potential use of AI in vision systems has come an explosion in suppliers of data-gathering “AI platforms”. These systems are more of a housekeeping exercise for image gathering, segmentation and human classification before submission to the neural network, rather than a quantum leap in image processing or a clear differentiator compared to traditional machine vision. Note: most of these companies have ‘AI’ in their names. These platforms allow for clear presentation of the images and easy submission to the network for computation of the neural algorithm. Still, all are based on the same overall architecture.

Deep learning frameworks – TensorFlow or PyTorch – what are they?

TensorFlow and PyTorch are the two main deep learning frameworks used by developers of machine vision AI systems. They were developed by Google and Facebook, respectively. They are used as a base for developing the AI models at a low level, generally with a graphical user interface (GUI) on top for image sorting.

TensorFlow is a symbolic math toolkit that is best suited for dataflow programming across a variety of workloads. It provides many abstraction levels for modelling and training. It’s a promising and rapidly growing deep learning solution for machine vision developers. It provides a flexible and comprehensive ecosystem of community resources, libraries, and tools for building and deploying machine-learning apps. TensorFlow has also integrated Keras, a high-level neural network API that began as a separate library, into the framework.

PyTorch is a highly optimised deep learning tensor library built on Python and Torch. It is primarily used for applications that run on graphics processing units (GPUs) and central processing units (CPUs). Vision system vendors often favour PyTorch over other deep learning frameworks such as Keras because it employs dynamic computation graphs and has a Python-first interface. It gives researchers, software engineers, and anyone debugging a neural network the ability to test and run sections of the code in real time. Because of this, users do not have to wait for the entirety of the code to be developed before determining whether or not a portion of the code works.

Whichever solution is used for deployment, the same pros and cons apply in general to machine vision solutions utilising deep learning.

AI Vision Training

What are the pros and cons of using artificial intelligence deep learning in machine vision systems?

Well, there are a few key takeaways when considering whether artificial intelligence deep learning is the appropriate tool for an industrial vision system deployment. They fall into cons and pros.

Cons


Industrial machine vision systems must have high yield, so we are talking 100% correct identification of faults, in exchange for accepting some level of false failure rate. There is always a trade-off in setting up vision systems: you err on the side of caution so that real failures are guaranteed to be picked up and automatically rejected by the vision system. But with AI inspection, the yields are far from perfect, primarily because the user has no control over the processing that decides what is deemed a failure. A result of pass or fail is simply given as the outcome, the neural network having been trained on a vast array of both good and bad examples. This “low” yield (though still above 96%) is absolutely fine for most uses of deep learning (e.g. e-commerce, language models, general computer vision), but it is not acceptable for most industrial machine vision applications. This needs to be considered when thinking about deploying deep learning.
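The trade-off between missed faults and false failures can be illustrated with a short Python sketch. The defect scores and labels below are invented for the example, and the function names are hypothetical; the sketch assumes an inspection step that outputs a score between 0 and 1 and rejects any part scoring at or above a threshold.

```python
def threshold_with_no_escapes(scores, is_defect):
    """Highest reject threshold that still rejects every known defect."""
    return min(s for s, d in zip(scores, is_defect) if d)


def count_outcomes(scores, is_defect, threshold):
    """Return (missed_defects, false_rejects) when parts scoring at or above
    the threshold are rejected."""
    missed = sum(1 for s, d in zip(scores, is_defect) if d and s < threshold)
    false_rejects = sum(1 for s, d in zip(scores, is_defect) if not d and s >= threshold)
    return missed, false_rejects


# Invented validation data: ten parts, three of which are real defects.
scores = [0.02, 0.05, 0.10, 0.35, 0.40, 0.08, 0.30, 0.90, 0.75, 0.03]
is_defect = [False, False, False, False, False, False, True, True, True, False]

cautious = threshold_with_no_escapes(scores, is_defect)
print("cautious threshold:", cautious)
print("cautious (missed, false rejects):", count_outcomes(scores, is_defect, cautious))
print("relaxed  (missed, false rejects):", count_outcomes(scores, is_defect, 0.5))
```

With the cautious threshold no defect escapes, but two good parts are rejected; relaxing the threshold to 0.5 removes the false rejects at the cost of letting one real defect through. A threshold chosen this way only guarantees zero escapes on the parts it was checked against.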

It takes a lot of image data. Sometimes thousands, or tens of thousands, of training images are required before the AI machine vision system can start the process. This shouldn’t be underestimated. And think about the implications for deployment in a manufacturing facility: you have to install and run the system to gather data before the learning part can begin. With traditional machine vision solutions, this is not the case; development can be completed before deployment, so the time-to-market is quicker.

Most processing is completed on reduced-resolution images. It should be understood that most (if not all) deep learning vision systems will reduce the image to a manageable size so that it can be processed in a timely manner. Resolution is therefore immediately lost, from a megapixel image down to a few hundred pixels across. Data is compromised and lost.
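A quick piece of arithmetic shows what this means for a small defect. The figures are assumptions chosen for illustration: a 2448 × 2048 pixel sensor, a 224 × 224 pixel network input and a defect 8 pixels wide on the sensor. The function name is hypothetical.

```python
def size_after_resize(feature_px, sensor_width_px, network_width_px):
    """Approximate width of a feature, in pixels, after the image is scaled
    from the sensor width down to the network input width."""
    return feature_px * network_width_px / sensor_width_px


SENSOR_WIDTH, SENSOR_HEIGHT = 2448, 2048   # assumed camera resolution
NETWORK_SIDE = 224                         # assumed square network input

defect_width = size_after_resize(8, SENSOR_WIDTH, NETWORK_SIDE)
pixel_reduction = (SENSOR_WIDTH * SENSOR_HEIGHT) / (NETWORK_SIDE * NETWORK_SIDE)

print(f"8-pixel defect after resize: {defect_width:.2f} pixels wide")
print(f"total pixels reduced by a factor of about {pixel_reduction:.0f}")
```

Under these assumptions the defect shrinks to less than one pixel, which is why small faults can disappear before the network ever sees them. Processing the image as smaller tiles at full resolution is one way around this, at the cost of more processing per part.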

There are no existing data sets for the specific vision system task. Unlike AI deep learning vision systems used in other industries, such as driverless cars or crowd scene detection, there are usually no pre-defined data sets to work from. In those industries, if you want to detect a “cat”, “dog”, or “human”, there are data sources available for fast-tracking the development of your AI vision system. Invariably in industrial machine vision, we look at a specific widget or part fault with no pre-determined visual data source to refer to. Therefore, the image data has to be collected, sorted and used for training.

You need good, bad and test images. You need to be very careful in sorting the images into the correct category for processing. A “bad” image in the “good” pile of 10,000 images is hard to spot and will train the network to recognise bad as good. So, the deep learning system is only as good as the data provided to it for training and how that data is categorised. You must also have a set of reference images to test the network with.
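One common way to keep a reference set aside is to split the labelled images before any training happens. The Python sketch below is an illustration only: the file names are invented, `split_dataset` is a hypothetical helper, and the 20% test fraction is an assumption rather than a recommendation.

```python
import random


def split_dataset(labelled_images, test_fraction=0.2, seed=0):
    """Split (filename, label) pairs into training and test sets, holding back
    the same fraction of each label so both sets contain good and bad parts."""
    rng = random.Random(seed)  # fixed seed so the split is repeatable
    by_label = {}
    for name, label in labelled_images:
        by_label.setdefault(label, []).append(name)

    train, test = [], []
    for label, names in sorted(by_label.items()):
        names = sorted(names)
        rng.shuffle(names)
        n_test = max(1, round(len(names) * test_fraction))
        test += [(name, label) for name in names[:n_test]]
        train += [(name, label) for name in names[n_test:]]
    return train, test


# Invented file names: 40 good parts and 10 bad parts.
images = [(f"good_{i:03d}.png", "good") for i in range(40)]
images += [(f"bad_{i:03d}.png", "bad") for i in range(10)]

train_set, test_set = split_dataset(images)
print(len(train_set), "training images,", len(test_set), "test images")
```

The test images are never shown to the network during training, so they give an honest check of how it behaves on parts it has not seen. The split does nothing about mislabelled images, which still have to be caught by careful sorting.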

You sometimes don’t get definitive data on the reason for failure. You can think of the AI deep learning algorithm as a black box. So, you feed the data in and it will provide a result. Most of the time you won’t know why it passed or failed a part, or for what reason, only that it did. This makes deploying such AI vision systems into validated industries (such as medical devices, life sciences and pharmaceuticals) problematic.

You need a very decent GPU & PC system for training. A standard PC won’t be sufficient for the processing required in training the neural network. While the PC used for the runtime need be nothing special, your developers will need a PC with plenty of graphics memory and a top-of-the-range processor. Don’t underestimate this.

Pros


They do make good decisions in challenging circumstances. Imagine the part requiring automated inspection comes from a supplier whose surface texture for your widget is constantly changing. This would be a headache for traditional vision systems, and almost unsolvable. AI machine vision is naturally suited to this sort of application.

They are great for detecting anomalies that are out of the ordinary. One of the main benefits of AI-based vision systems is the ability to spot a defect that has never been seen before and is “left field” from what was expected. With traditional machine vision systems, the algorithm is developed to detect a specific condition, be it the grey scale of a feature, the size in pixels which deems a part to fail, or the colour match to confirm a good part. But if your one part in a million that is a failure has a slight flaw that hasn’t been accounted for, the traditional machine vision system might miss it, whereas the AI deep learning machine vision system might well spot it.
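The difference between checking a specific condition and learning what "good" looks like can be sketched in Python. This uses a simple statistical stand-in, not a deep learning model, and every name and number here is an assumption for illustration: each part is described by three made-up measurements (grey level, width in pixels and a surface roughness value), and the traditional rule only checks the first two.

```python
import statistics


def rule_based_pass(part):
    """Traditional check: only the conditions the developer thought of."""
    grey_level, width_px, _roughness = part
    return 115 <= grey_level <= 125 and 48 <= width_px <= 52


def fit_good_model(good_parts):
    """Learn the mean and spread of every measurement from good parts only."""
    columns = list(zip(*good_parts))
    return [(statistics.mean(c), statistics.stdev(c)) for c in columns]


def anomaly_score(model, part):
    """Largest deviation from the good-part mean, in standard deviations."""
    return max(abs(value - mean) / spread for value, (mean, spread) in zip(part, model))


# Invented measurements from known good parts.
good_parts = [
    (120, 50, 0.10),
    (122, 51, 0.12),
    (118, 49, 0.11),
    (121, 50, 0.09),
    (119, 50, 0.10),
]
model = fit_good_model(good_parts)

ANOMALY_LIMIT = 3.0               # assumed cut-off, in standard deviations
typical_part = (121, 50, 0.11)
unexpected_flaw = (120, 50, 0.45)  # grey level and width fine, surface is not

for part in (typical_part, unexpected_flaw):
    flagged = anomaly_score(model, part) > ANOMALY_LIMIT
    print(part, "rule passes:", rule_based_pass(part), "| flagged as anomaly:", flagged)
```

The part with the unexpected surface flaw sails through the rule-based check because nobody wrote a rule for it, while the model built from good examples flags it. A deep learning system applies the same principle to the image itself instead of to a handful of measurements.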

They are useful as a secondary inspection tactic. You’ve exhausted all traditional methods of image processing, but you still have a small percentage of faults which are hard to classify against your known classification database. You have the image data, and the part fails when it should, but you want to drill down and complete a final analysis to improve your yield even more. This is the perfect scenario for AI deep learning deployment. You have the data, understand the fault and can train on lower-resolution segments for more precise classification. This is probably where adoption of AI deep learning vision will grow over the coming years.

They are helpful when traditional algorithms can’t be used. You’ve tried all conventional segmentation methods, pixel measurement, colour matching, character verification, surface inspection or general analysis – nothing works! This is where AI can step in. The probable reason the traditional algorithms fail is inconsistency in the part itself, which is when deep learning should be tried.

Finally, it might just be the case that deep learning AI is not required for the vision system application. Traditional vision system algorithms are still being developed. They are entirely appropriate to use in many applications where deep learning has recently become the first port of call for solving the machine vision requirement. Think of artificial intelligence in vision systems as a great supplement and a potential tool in the armoury, but one to be used wisely and appropriately for the correct machine vision application.
