It may seem hard to believe, but not so long ago gathering enough data to provide useful insights was a difficult problem. Today, many businesses have more data than they can deal with. It arrives in enormous volume from customer interactions, social media, sensors, and numerous other sources.
Machine learning — the application of algorithms and systems that can spot patterns and insights in data — helps companies sift through their data for valuable insights. The major benefit of machine learning is that analysts don’t have to know which patterns in the data are useful; they don’t have to tell the machine exactly what to look for.
“All of these things mean it’s possible to quickly and automatically produce models that can analyze bigger, more complex data and deliver faster, more accurate results — even on a very large scale. The result? High-value predictions that can guide better decisions and smart actions in real time without human intervention.”
Just as with big data, many executives associate machine learning with the cloud, but while virtualized platforms have their strengths, machine learning with large volumes of data is often faster and more economical on bare metal.
Machine learning is being used throughout industry and enterprise, tackling real-time data analytics problems that would have been more expensive and time-consuming to solve with earlier technologies.
When you search Google, the advertising you see has been processed by machine learning algorithms that churn through staggering amounts of data to match you with a product. Machine learning technology is used to spot suspicious activity on video, to diagnose diseases with levels of accuracy equal to experienced doctors, to discover new medicines, to improve customer service and marketing, and to automatically make decisions that improve the efficiency of everyday business operations in thousands of companies.
All of that requires extensive server and network infrastructure. However smart the algorithms are, they’re not useful without high-performance computing power and fast IO — neither of which is a strength of infrastructure-as-a-service cloud platforms.
As with all infrastructure platforms, virtual cloud servers have advantages and disadvantages. They’re fast to deploy, but there’s a price to be paid — both financially and in performance. Virtual servers run on physical servers, often many virtual servers to each physical server. No single virtual server can take full advantage of the physical server’s capabilities — processes are often left waiting for CPU access, which isn’t great if you’re using the server for real-time machine learning.
Additionally, most cloud platforms use Storage Area Networks (SANs) — storage is connected to the server over a network connection. SAN connections can be fast, but not nearly as fast as storage attached to the server’s internal buses. As with CPU access, network throughput can run into contention issues, seriously degrading performance.
If you’ve ever wondered why a cloud server you were using showed unpredictable degradations in IO performance, this is the likely cause.
Machine learning and big data analytics applications like Apache Spark rely on extremely fast data throughput to provide the best results. The greater the volume of data that can be analyzed in a given timeframe, the better the decisions machine learning tools can make.
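As a back-of-the-envelope illustration (the throughput figures below are hypothetical, chosen only to show the shape of the relationship), sustained read throughput puts a hard ceiling on how much data any analytics job can touch in a given window, regardless of how clever the algorithm is:

```python
def gb_analyzable(throughput_mb_s, window_seconds):
    """Upper bound on data (in GB) a job can scan in one window,
    given a sustained read throughput in MB/s. Throughput, not
    algorithmic cleverness, sets this ceiling."""
    return throughput_mb_s * window_seconds / 1024


# Hypothetical comparison over a one-minute window:
local_nvme = gb_analyzable(2000, 60)  # bus-attached storage
san_link = gb_analyzable(300, 60)     # contended network storage
```

Under these illustrative numbers, the bus-attached configuration can scan roughly six times as much data per window — which translates directly into models trained on more data, more often.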
The amount of data companies have to deal with isn’t going to decrease any time soon, especially as the Internet of Things grows ever larger. Over the next few years, hundreds of millions of smart devices will provide an influx of data of vast proportions. Machine learning technologies are perfect for handling data of this magnitude.
To get the best bang for their buck, businesses adopting machine learning should ask themselves whether it makes sense to pay more for lower performance in the cloud, or to maximize their infrastructure investment by colocating infrastructure they own and control — infrastructure they can rely on to provide the best performance for machine learning.