
A practical guide to machine learning in business

Machine learning is poised to have a profound impact on your business, but the hype is sowing confusion. Here’s a clear-eyed look at what machine learning is and how it can be used today.

Machine learning is transforming business. But even as the technology advances, companies still struggle to take advantage of it, largely because they don’t understand how to strategically implement machine learning in service of business goals. Hype hasn’t helped, sowing confusion over what exactly machine learning is, how well it works and what it can do for your company.

Here, we provide a clear-eyed look at what machine learning is and how it can be used today.

What is machine learning?

Machine learning is a subset of artificial intelligence that enables systems to learn and predict outcomes without explicit programming. It is often used interchangeably with the term AI because it is the AI technique that has made the greatest impact in the real world to date, and it's what you're most likely to use in your business. Chatbots, product recommendations, spam filters, self-driving cars and a huge range of other systems leverage machine learning, as do “intelligent agents” like Siri and Cortana.


Instead of writing algorithms and rules that make decisions directly, or trying to program a computer to “be intelligent” using sets of rules, exceptions and filters, machine learning teaches computer systems to make decisions by learning from large data sets. Rule-based systems quickly become fragile when they have to account for the complexity of the real world; machine learning can create models that represent and generalize patterns in the data you use to train it, and it can use those models to interpret and analyze new information.

Machine learning is suitable for classification, which includes the ability to recognize text and objects in images and video, as well as finding associations in data or segmenting data into clusters (e.g., finding groups of customers). Machine learning is also adept at prediction, such as calculating the likelihood of events or forecasting outcomes. Machine learning can also be used to generate missing data; for example, the latest version of CorelDRAW uses machine learning to interpolate the smooth stroke you’re trying to draw from multiple rough strokes you make with the pen tool.
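
A task like the customer segmentation mentioned above can be prototyped in a few lines. The sketch below assumes scikit-learn as the library and uses made-up spend and visit figures purely for illustration:

```python
# A minimal clustering sketch: group customers by annual spend and monthly
# visits without any labels, letting the algorithm find the segments.
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical customer data: [annual_spend, visits_per_month]
customers = np.array([
    [200, 1], [250, 2], [3000, 10], [2800, 12],
    [150, 1], [3200, 11], [220, 2], [2900, 9],
])

model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(customers)
print(model.labels_)           # which segment each customer falls into
print(model.cluster_centers_)  # the "typical" customer in each segment
```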

At the heart of machine learning are algorithms. Some, such as regressions, k-means clustering and support vector machines, have been in use for decades. Support vector machines, for example, use mathematical methods for representing how a dividing line can be drawn between things that belong in separate categories. The key to effective use of machine learning is matching the right algorithm to your problem.
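
For instance, here is a minimal sketch of that idea with scikit-learn (an assumed library choice) on invented two-dimensional points; the fitted model exposes the dividing line it found and classifies new points by which side of that line they fall on:

```python
# A support vector machine finding a dividing line between two categories.
import numpy as np
from sklearn.svm import SVC

X = np.array([[1, 1], [2, 1], [1, 2],      # category 0
              [6, 5], [7, 6], [6, 7]])     # category 1
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear").fit(X, y)

# The learned dividing line is w.x + b = 0; new points are classified
# by which side of it they land on.
print(clf.coef_, clf.intercept_)
print(clf.predict([[2, 2], [6, 6]]))       # -> [0 1]
```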

Neural networks

A neural network is a machine learning algorithm built on a network of interconnected nodes, and it works well for tasks like recognizing patterns.

Neural networks aren’t a new algorithm, but the availability of large data sets and more powerful processing (especially GPUs, which can handle large streams of data in parallel) has only recently made them useful in practice. Despite the name, neural networks are only loosely based on biological neurons. Each node in a neural network has connections to other nodes that are triggered by inputs. When triggered, each node applies a weight to its input, marking how strongly the input does or doesn’t match that node’s function. The nodes are organized in fixed layers that the data flows through, unlike the brain, which creates, removes and reorganizes synapse connections regularly.
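
To make the nodes, weights and layers concrete, here is a toy forward pass through a two-layer network in NumPy; the sizes and random weights are purely illustrative, since a real network learns its weights from training data:

```python
# One input flowing through two layers of weighted nodes.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x = rng.normal(size=4)                          # one example with 4 features

W1, b1 = rng.normal(size=(3, 4)), np.zeros(3)   # layer 1: 4 inputs -> 3 nodes
W2, b2 = rng.normal(size=(1, 3)), np.zeros(1)   # layer 2: 3 nodes -> 1 output

hidden = sigmoid(W1 @ x + b1)   # each node weights its inputs, then "fires"
output = sigmoid(W2 @ hidden + b2)
print(output)                   # a score between 0 and 1
```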

Deep learning

Deep learning is a subset of machine learning based on deep neural networks. Deep neural networks are neural networks with many layers, which perform learning in multiple steps. Convolutional deep neural networks often perform image recognition by processing a hierarchy of features, where each layer looks for more complicated objects. For example, the first layer of a deep network that recognizes dog breeds might be trained to find the shape of the dog in an image, the second layer might look at textures like fur and teeth, with other layers recognizing ears, eyes, tails and other characteristics, and the final layer distinguishing different breeds. Recurrent deep neural networks are used for speech recognition and natural language processing, where sequence and context are important.
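
As a sketch of what such a layered network looks like in code, here is a small convolutional model in tf.keras (an assumed framework); the layer sizes are illustrative rather than a tuned architecture:

```python
# Stacked layers build up from simple features to a final classification.
import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    tf.keras.Input(shape=(128, 128, 3)),
    layers.Conv2D(16, 3, activation="relu"),  # early layers: edges, textures
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, activation="relu"),  # later layers: larger shapes
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(10, activation="softmax"),   # final layer: one score per breed
])
model.summary()
```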

There are many open source deep learning toolkits available that you can use to build your own systems. Theano, Torch and Caffe are popular choices, and Google’s TensorFlow and Microsoft Cognitive Toolkit let you use multiple servers to build more powerful systems with more layers in your network.

Microsoft’s Distributed Machine Learning Toolkit packages up several of these deep learning toolkits with other machine learning libraries, and both AWS and Azure offer VMs with deep learning toolkits pre-installed.

Machine learning in practice

Machine learning results are expressed as a percentage certainty that the data you’re looking at matches what your machine learning model is trained to find. So, a deep network trained to identify emotions from photographs and videos of people’s faces might score an image as “97.6% happiness, 0.1% sadness, 5.2% surprise, 0.5% neutral, 0.2% anger, 0.3% contempt, 0.01% disgust, 12% fear.” Using that information means working with probabilities and uncertainty, not exact results.
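
In practice that means code consuming these scores should treat them as evidence rather than answers, for example by acting only when the top score clears a confidence threshold. The snippet below reuses the illustrative numbers above:

```python
# Pick the highest-scoring label, but only act on it if it is confident enough.
scores = {"happiness": 0.976, "sadness": 0.001, "surprise": 0.052,
          "neutral": 0.005, "anger": 0.002, "contempt": 0.003,
          "disgust": 0.0001, "fear": 0.12}

CONFIDENCE_THRESHOLD = 0.7
top_label, top_score = max(scores.items(), key=lambda kv: kv[1])

if top_score >= CONFIDENCE_THRESHOLD:
    print(f"Predicted emotion: {top_label} ({top_score:.1%})")
else:
    print("Not confident enough to act on this prediction")
```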

Probabilistic machine learning uses the concept of probability to let you perform machine learning without writing the algorithm yourself. Instead of the set values of variables in standard programming, some variables in probabilistic programming have values that fall in a known range and others have unknown values. Treat the data you want to understand as if it were the output of that program and you can work backwards to fill in what those unknown values would have to be to produce that result. With less coding, you can do more prototyping and experimenting, and probabilistic machine learning is also easier to debug.

This is the technique the Clutter feature in Outlook uses to filter messages that are less likely to be interesting to you based on what messages you’ve read, replied to and deleted in the past. It was built with Infer.NET, a .NET framework you can use to build your own probabilistic systems.
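
As a generic Python illustration of the “work backwards from the data” idea (not how Infer.NET itself is written), the sketch below assumes an unknown rate somewhere between 0 and 1 and infers which values best explain some observed outcomes:

```python
# Grid-based Bayesian inference: which values of an unknown rate are most
# plausible given the observed data?
import numpy as np

rate = np.linspace(0, 1, 101)           # possible values of the unknown rate
prior = np.ones_like(rate) / len(rate)  # start with no preference

observed = [1, 0, 1, 1, 0, 1, 1, 1]     # e.g. messages read (1) or deleted (0)
k, n = sum(observed), len(observed)

likelihood = rate**k * (1 - rate)**(n - k)  # how well each value explains the data
posterior = prior * likelihood
posterior /= posterior.sum()

print(rate[np.argmax(posterior)])       # most plausible value of the unknown rate
```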

Cognitive computing is the term IBM uses for its Watson offerings, because back in 2011 when an earlier version won Jeopardy, the term AI wasn't fashionable; over the decades it’s been worked on, AI has gone through alternating periods of hype and dismissal.

Watson isn't a single tool. It's a mix of models and APIs that you can also get from other vendors such as Salesforce, Twilio, Google and Microsoft. These give you so-called “cognitive” services, such as image recognition, including facial recognition, speech (and speaker) recognition, natural language understanding, sentiment analysis and other recognition APIs that look like human cognitive abilities. Whether it's Watson or Microsoft's Cognitive Services, the “cognitive” label is really just a marketing brand wrapped around a collection of (very useful) technologies. You could use these APIs to create a chatbot from an existing FAQ page that can answer text queries and also recognize photos of products to give the right support information, or use photos of shelf labels to check stock levels.

Many “cognitive” APIs use deep learning, but you don’t need to know how they’re built because many work as REST APIs that you call from your own app. Some let you create custom models from your own data. Salesforce Einstein has a custom image recognition service and Microsoft’s Cognitive APIs let you create custom models for text, speech, images and video.
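
Calling such a service typically looks like an ordinary HTTP request. The endpoint, key and response format below are hypothetical placeholders, so check your provider’s documentation for the real details:

```python
# Sketch of sending an image to a cognitive-style REST API.
# The URL, key and response fields are hypothetical, for illustration only.
import requests

API_URL = "https://example.com/vision/v1/analyze"   # hypothetical endpoint
API_KEY = "YOUR_API_KEY"

with open("shelf_label.jpg", "rb") as f:
    response = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {API_KEY}",
                 "Content-Type": "application/octet-stream"},
        data=f.read(),
    )

result = response.json()
print(result)   # typically a list of detected labels or objects with scores
```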

Creating custom models from your own data is made easier by transfer learning, which is less a technique and more a useful side effect of deep networks. A deep neural network that has been trained to do one thing, like translating between English and Mandarin, turns out to learn a second task, like translating between English and French, more efficiently. That may be because the long strings of numbers that represent, say, the mathematical relationships between words like big and large are to some degree common between languages, but we don’t really know.

Transfer learning isn't well understood but it may enable you to get good results from a smaller training set. The Microsoft Custom Vision Service uses transfer learning to train an image recognizer in just a few minutes using 30 to 50 images per category, rather than the thousands usually needed for accurate results.
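
A sketch of the general transfer learning recipe in tf.keras (an assumed framework, not the internals of Custom Vision Service): reuse a network pre-trained on ImageNet as a fixed feature extractor and train only a small new head on your own, much smaller set of labelled images.

```python
# Reuse a pre-trained network and train only a new classification head.
import tensorflow as tf
from tensorflow.keras import layers

base = tf.keras.applications.MobileNetV2(
    input_shape=(160, 160, 3), include_top=False, weights="imagenet")
base.trainable = False                      # keep the pre-trained weights

model = tf.keras.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(5, activation="softmax"),  # e.g. 5 categories of your own
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(your_images, your_labels, epochs=5)  # tens of images per category
```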

Build your own machine learning system

If you don’t want pre-built APIs, and you have the data to work with, there’s an enormous range of tools for building machine learning systems, from R and Python scripts, to predictive analytics using Spark and Hadoop, to specific AI tools and frameworks.

Rather than set up your own infrastructure, you can use machine learning services in the cloud to build data models. With cloud services, you don’t need to install a range of tools, and these services build in more of the expertise needed to get successful results.


Amazon Machine Learning offers several machine learning models you can use with data stored in S3, Redshift or RDS, but you can’t export the models, and the training set size is rather limited. Microsoft’s Azure ML Studio has a wider range of algorithms, including deep learning, plus R and Python packages, and a graphical user interface for working with them. It also offers the option to use Azure Batch to periodically load extremely large training sets, and you can use your trained models as APIs to call from your own programs and services. There are also machine learning features such as image recognition built into cloud data services like Azure SQL Database and Azure Data Lake, so that you can do your machine learning where your data is.

Supervised learning

Many machine learning techniques use supervised learning, in which a function is derived from labelled training data. Developers choose and label a set of training data, set aside a proportion of that data for testing, and score the results from the machine learning system to help it improve. The training process can be complex, and results are often probabilities, with a system being, for example, 30 percent confident that it has recognized a dog in an image, 80 percent confident it’s found a cat, and maybe even 2 percent certain it’s found a bicycle. The feedback developers give the system is likely a score between zero and one indicating how close the answer is to correct.
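
The workflow looks roughly like this in scikit-learn (an assumed library), using a built-in sample data set as a stand-in for your own labelled data:

```python
# Label data, hold some back for testing, train, then score on unseen data.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)                 # stand-in labelled data set
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)         # set aside data for testing

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(accuracy_score(y_test, model.predict(X_test)))  # score on held-out data
```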

It’s important not to train the system too precisely to the training data; that’s called overfitting and it means the system won’t be able to generalize to cope with new inputs. If the data changes significantly over time, developers will need to retrain the system due to what some researchers refer to as “ML rot.”

Machine learning algorithms — and when to use them

If you already know what the labels for all the items in your data set are, assigning labels to new examples is a classification problem. If you’re trying to predict a result like the selling price of a house based on its size, that’s a regression problem, because house price is a continuous value rather than a discrete category. (Predicting whether a house will sell for more or less than the asking price is a classification problem, because those are two distinct categories.)
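
Here is a sketch of the same house data treated both ways, with invented numbers and scikit-learn as an assumed library:

```python
# Regression predicts a continuous price; classification predicts a category.
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

size_sqm = np.array([[50], [70], [90], [120], [150]])
sale_price = np.array([200_000, 260_000, 310_000, 400_000, 480_000])
sold_over_asking = np.array([0, 0, 1, 1, 1])

reg = LinearRegression().fit(size_sqm, sale_price)
print(reg.predict([[100]]))          # regression: a continuous price estimate

clf = LogisticRegression().fit(size_sqm, sold_over_asking)
print(clf.predict([[100]]))          # classification: one of two categories
```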

If you don’t know all the labels, you can’t use them for training; instead, the system has to devise its own rules for making sense of the data, in what’s known as unsupervised learning. The most common unsupervised learning algorithm is clustering, which derives the structure of your data by looking at relationships between variables in the data. Amazon’s product recommendation system that tells you what people who bought an item also bought uses unsupervised learning.
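
A toy version of the “also bought” idea simply counts which items appear together in past baskets, with no labels involved; the baskets below are made up:

```python
# Count item co-occurrences across purchase baskets.
from collections import Counter
from itertools import combinations

baskets = [
    {"laptop", "mouse", "bag"},
    {"laptop", "mouse"},
    {"phone", "case"},
    {"laptop", "bag"},
]

together = Counter()
for basket in baskets:
    for a, b in combinations(sorted(basket), 2):
        together[(a, b)] += 1

# Items most often bought alongside a laptop
print([pair for pair, n in together.most_common() if "laptop" in pair])
```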

With reinforcement learning, the system learns as it goes by seeing what happens. You set up a clear set of rewards so the system can judge how successful its actions are. Reinforcement learning is well suited to game play because there are obvious rewards. Google’s DeepMind AlphaGo used reinforcement learning to learn Go, Microsoft’s Project Malmo system allows researchers to use Minecraft as a reinforcement learning environment, and a bot built with OpenAI’s reinforcement learning algorithm recently beat several top-ranked players at Valve’s Dota 2 game.

The complexity of creating accurate, useful rewards has limited the use of reinforcement learning, but Microsoft has been using a specific form of reinforcement learning called contextual bandits (based on the concept of a multi-armed slot machine) to significantly improve click-through rates on MSN. That system is now available as the Microsoft Custom Decision Service API. Microsoft is also using a reinforcement learning system in a pilot where customer service chatbots monitor how useful their automated responses are and offer to hand you off to a real person if the information isn’t what you need; the human agent also scores the bot to help it improve.
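
A minimal epsilon-greedy bandit sketch in Python captures the core loop (context is left out to keep it short); the click rates are invented, and in practice the “reward” would be a click or a helpful-answer rating:

```python
# Epsilon-greedy bandit: mostly exploit the best-looking option, sometimes explore.
import random

true_click_rates = [0.02, 0.05, 0.03]     # unknown to the learner
counts = [0, 0, 0]
values = [0.0, 0.0, 0.0]                  # running reward estimate per option
epsilon = 0.1

for _ in range(10_000):
    if random.random() < epsilon:
        arm = random.randrange(3)         # explore a random option
    else:
        arm = values.index(max(values))   # exploit the best estimate so far
    reward = 1 if random.random() < true_click_rates[arm] else 0
    counts[arm] += 1
    values[arm] += (reward - values[arm]) / counts[arm]

print(values)   # estimates should approach the true click rates
```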

Combining machine learning algorithms for best results

Often, it takes more than one machine learning method to get the best result; ensemble learning systems use multiple machine learning techniques in combination. For example, the DeepMind system that beat expert human players at Go uses not only reinforcement learning but also supervised deep learning to learn from thousands of recorded Go matches between human players. That combination is sometimes known as semi-supervised learning.
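
A simple ensemble can be as little as two different models voting on each prediction; this sketch assumes scikit-learn and its built-in sample data:

```python
# Combine two different models and let them vote on each prediction.
from sklearn.datasets import load_iris
from sklearn.ensemble import VotingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
ensemble = VotingClassifier([
    ("lr", LogisticRegression(max_iter=1000)),
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
])
print(cross_val_score(ensemble, X, y, cv=5).mean())
```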

Similarly, the machine learning system that Microsoft Kinect uses to recognize human movements was built with a combination of a discriminative system and a generative system. To build the discriminative system, Microsoft rented a Hollywood motion-capture suite, extracted the position of the skeleton and labelled the individual body parts to classify which of the various known postures the body was in; the generative system then used a model of the characteristics of each posture to synthesize thousands more images, giving the system a large enough data set to learn from.

Predictive analytics often combines different machine learning and statistical techniques; one model might score how likely a group of customers is to churn, with another model predicting which channel you should use to contact each person with an offer that might keep them as a customer.

Navigating the downsides of machine learning

Because machine learning systems aren't explicitly programmed to solve problems, it’s difficult to know how a system arrived at its results. This is known as a “black box” problem, and it can have consequences, especially in regulated industries.

As machine learning becomes more widely used, you’ll need to explain why your machine learning-powered systems do what they do. Some markets — housing, financial decisions and healthcare — already have regulations requiring you to give explanations for decisions. You may also want algorithmic transparency so that you can audit machine learning performance. Publishing details of the training data and the algorithms in use isn’t enough; there are many layers of non-linear processing going on inside a deep network, making it very difficult to understand why it reaches a particular decision. A common technique is to use another machine learning system to describe the behavior of the first.
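
One way to approximate that approach is to train a small, inspectable model to mimic the predictions of a more complex one and read its rules as an approximate explanation. The sketch below assumes scikit-learn and sample data:

```python
# Surrogate model: a shallow decision tree learns to imitate a "black box"
# model, and its rules serve as an approximate explanation.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
black_box = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

surrogate = DecisionTreeClassifier(max_depth=3, random_state=0)
surrogate.fit(X, black_box.predict(X))    # learn to imitate the black box

print(export_text(surrogate))             # human-readable decision rules
```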

You also need to be aware of the dangers of algorithmic bias, such as when a machine learning system reinforces the bias in a data set that associates men with sports and women with domestic tasks because all its examples of sporting activities have pictures of men and all the people pictured in kitchens are women. Or when a system that correlates non-medical information makes decisions that disadvantage people with a particular medical condition.

Machine learning can only be as good as the data it trains on to build its model and the data it processes, so it’s important to scrutinize the data you’re using. Machine learning also doesn't understand the data or the concepts behind it the way a person might. For example, researchers can create pictures that look like random static but get recognized as specific objects.

There are plenty of recognition and classification problems that machine learning can solve more quickly and efficiently than humans, but for the foreseeable future machine learning is best thought of as a set of tools to support people at work rather than replace them.
