Machines Start to Learn without Supervision
Written by John Cowles, Senior Director of Engineering and Technology at Analog Devices, on January 17, 2023.
This is a guest blog; NSTXL takes no ownership of the ideas presented. It is intended to stimulate conversation and discussion within the community.
Recap of Our Journey
In the last post, we discussed Supervised Learning, in which labeled data sets are used to teach a model how to predict outcomes from new data. The more diverse the training set, the better the predictions will be. Since it has been a while, let’s review highlights of Supervised Learning before moving on to Unsupervised Learning.
Let’s say we want to create an application that predicts the price of a home in a neighborhood based on its specific attributes or features, such as number of rooms, size of yard, proximity to the center of town, etc. After deciding on the structure of the model (linear regression in this case), a training set is needed to “teach” the model. What does this training data look like? To train the model, we need a curated set of example homes that sold in the area with a wide variety of combinations of attributes AND the actual sale price of each house. This labeled data “teaches” the model the relationship between the features and the value to be predicted. The iterative process of gradient descent tweaks the mathematical weights in the model until the cost function, the error between the predicted prices and the labels, is minimized. The algorithm is then validated against a second, held-out data set and is ready to be deployed to predict the unknown price of a home in the region given a new set of the desired attributes.
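As a concrete (if toy) sketch of that training loop, the gradient-descent fit below uses plain NumPy and a handful of made-up homes; the feature values, prices, learning rate, and iteration count are all invented for illustration, not drawn from this post.

```python
import numpy as np

# Hypothetical training set: each row is [number of rooms, yard size in
# hundreds of sq ft, miles to the center of town]; prices are in $1000s.
X = np.array([[3, 5.0, 2.0],
              [4, 8.0, 1.5],
              [2, 2.0, 4.0],
              [5, 10.0, 0.5],
              [3, 4.0, 3.0]], dtype=float)
y = np.array([250.0, 340.0, 180.0, 420.0, 230.0])

# Standardize the features so one learning rate works for all of them.
mu, sigma = X.mean(axis=0), X.std(axis=0)
Xs = (X - mu) / sigma

# Gradient descent: repeatedly nudge the weights against the gradient
# of the mean-squared-error cost function.
w = np.zeros(X.shape[1])
b = 0.0
lr = 0.1
for _ in range(2000):
    pred = Xs @ w + b
    err = pred - y                     # error between prediction and label
    w -= lr * (Xs.T @ err) / len(y)
    b -= lr * err.mean()

# Deploy: predict the unknown price of a new home from its attributes.
new_home = (np.array([4, 6.0, 1.0]) - mu) / sigma
estimate = new_home @ w + b
```

Standardizing first matters in practice: with raw features on very different scales, a single learning rate either diverges on the large features or crawls on the small ones.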
In the previous example, the goal of the algorithm was to predict the house price based on its attributes. What if we wanted instead to predict who would be a better fit for the house: a group of friends, a family of four, or a retired couple? This becomes a classification problem, where the training label is now a class rather than a continuous value like price. The training set would teach the model which features are more or less important to each class. Once trained and validated, the model could predict whether a home is more suitable to one group or another. This type of Supervised Learning algorithm is often used to classify email as spam or not based on its content, or a movie genre as adventure, romantic comedy, or documentary based on its plot description.
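One simple way to see classification in action is a k-nearest-neighbors sketch; this is just one of many possible classifiers, not one named above, and the labeled homes and class assignments below are entirely made up. A new home gets whichever class dominates among the labeled homes most similar to it.

```python
import numpy as np

# Hypothetical labeled set: [number of rooms, yard size, miles to town].
# Classes: 0 = group of friends, 1 = family of four, 2 = retired couple.
homes = np.array([[5, 1.0, 0.5],
                  [4, 2.0, 1.0],
                  [4, 8.0, 3.0],
                  [3, 6.0, 2.5],
                  [2, 3.0, 1.0],
                  [2, 1.0, 0.8]], dtype=float)
labels = np.array([0, 0, 1, 1, 2, 2])

def classify(features, k=3):
    """Predict the best-fit class for a new home by majority vote
    among its k nearest labeled neighbors."""
    dists = np.linalg.norm(homes - features, axis=1)
    nearest = labels[np.argsort(dists)[:k]]
    return int(np.bincount(nearest).argmax())

# Four rooms, large yard, far from town: closest to the family-home examples.
group = classify(np.array([4, 7.0, 2.8]))
```

The same vote-among-neighbors idea extends directly to the spam and movie-genre examples once the text is turned into numeric features.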
The idea of classification hints at a fundamentally different type of ML algorithm: Unsupervised Learning. In Unsupervised Learning, there are no longer pre-labeled data sets for training; the algorithms instead identify natural groupings within the data, known as Clusters. The learning process still adjusts model Weights to minimize Cost Functions, but the errors are no longer between predictions and labels; they instead rely on correlation strengths hidden within the data set. Unsupervised Learning seeks to identify natural relationships that might not be obvious to us, rather than predicting values or classifying. Because the algorithms are looking for underlying patterns in the data without any guidance, it is even more important to have diverse and representative data sets in Unsupervised Learning than in Supervised Learning. When a new data element appears, the algorithm identifies which cluster is the best fit.
Let’s take a slightly different slant on the Supervised Learning example of predicting house prices based on a set of features and a price-labeled training data set. This time there are no price labels, just a list of houses with their features from which we want to identify patterns. A Clustering algorithm might find that houses naturally fall into three categories, perhaps because yard size, number of rooms, and other features are highly correlated and fall into what we might intuitively guess are cheap, moderately priced, and expensive homes. Visually, you might think of points on a graph that naturally fall into distinct groups or clusters. However, the algorithms do not know ahead of time what clusters will emerge, nor what they mean. It is up to human intelligence to interpret what the clusters represent and to give them meaningful labels, if needed.
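A minimal k-means sketch makes this concrete. The unlabeled homes below are synthetic, with three groupings deliberately baked in; the algorithm is only told to look for three clusters and discovers the groups on its own, without ever knowing they mean "cheap," "moderate," or "expensive."

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up, unlabeled homes described by two correlated features
# (number of rooms, yard size) that happen to form three natural groups.
cheap = rng.normal([2.0, 2.0], 0.3, size=(20, 2))
moderate = rng.normal([3.5, 5.0], 0.3, size=(20, 2))
expensive = rng.normal([5.0, 9.0], 0.3, size=(20, 2))
X = np.vstack([cheap, moderate, expensive])

def farthest_point_init(X, k):
    """Spread the starting centroids out by repeatedly picking the
    point farthest from all centroids chosen so far."""
    centroids = [X[0]]
    for _ in range(k - 1):
        dists = np.min([np.linalg.norm(X - c, axis=1) for c in centroids],
                       axis=0)
        centroids.append(X[dists.argmax()])
    return np.array(centroids)

def kmeans(X, k, iters=25):
    """Plain k-means: alternately assign each point to its nearest
    centroid, then move each centroid to the mean of its points."""
    centroids = farthest_point_init(X, k)
    for _ in range(iters):
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        assign = d.argmin(axis=1)
        for j in range(k):
            if np.any(assign == j):
                centroids[j] = X[assign == j].mean(axis=0)
    return centroids, assign

centroids, assign = kmeans(X, 3)
```

Note that `assign` is just cluster 0, 1, or 2; which one corresponds to "expensive" is the human interpretation step described above.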
Besides identifying associations, the process of clustering usually exposes a set of strong correlations that allows data sets to be reduced without losing important information. Reducing data sets, either in the number of features or the number of entries, means that less computational power is needed to extract useful information. Clustering algorithms are heavily used by e-businesses to identify classes of customer shopping behaviors. Interesting extensions of these models identify associations or correlations in the data as well as spotting exceptions, a task known as anomaly detection. Correlations between what shoppers buy are the basis for recommender systems that tailor product suggestions based on past shopping behaviors. Note that Unsupervised Learning can also assist Supervised Learning, for example, when labeled data is hard to collect: a clustering algorithm extracts labels and attaches them to the data set, which is then used to train a classification or regression algorithm.
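The data-reduction idea can be sketched with principal component analysis (PCA), one common technique for exploiting such correlations (this specific method is my choice of illustration, not one named in the post). The data below are made up so that five features are strongly correlated, and nearly all of the information survives projection onto a single dimension.

```python
import numpy as np

rng = np.random.default_rng(1)

# Made-up data: five features per home, but several track each other
# closely (e.g. rooms, floor area, and yard size all grow together),
# so most of the variation really lives in one underlying dimension.
size = rng.normal(0.0, 1.0, 200)
noise = rng.normal(0.0, 0.05, (200, 5))
X = np.column_stack([size, 2 * size, -size, 0.5 * size, size]) + noise

# PCA via the SVD of the centered data: the right singular vectors are
# the directions of largest variance.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
explained = s**2 / np.sum(s**2)  # fraction of variance per component

# Project onto the first principal component: 5 features -> 1,
# with almost no information lost.
reduced = Xc @ Vt[0]
```

With real house data the correlations are weaker, but the mechanics are the same: keep the few components that explain most of the variance and drop the rest, shrinking the computation needed downstream.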
Now that we have talked about Supervised and Unsupervised Learning, we reach the last major category of ML known as Reinforcement Learning. We now enter the realm of true Artificial Intelligence. This is the world of adventure games and autonomous driving where the models exist within and interact with a larger environment. We will explore the basics of Reinforcement Learning in the next post.