AlphaGo, Artificial Intelligence, and a Machine-Learning Renaissance
A recent AlphaGo victory is being touted as a Big Win for artificial intelligence (AI). It is no less important a win for machine learning, which might be in the midst of a renaissance.
- By Steve Swoyer
- February 25, 2016
Last month, machine learning (ML) notched another big win in the form of Google Inc.'s AlphaGo, a product of its London-based DeepMind subsidiary, which specializes in artificial intelligence (AI).
AlphaGo got the better of a professional human player in the ancient game of Go, humbling European Go champion Fan Hui in five games (5-0) under tournament conditions. This wasn't just unprecedented, it was unexpected. Most AI watchers believed it would be a decade or more before a computer could reliably beat a top-flight human opponent in Go, which has many more possible positions (2.082 × 10^170) than there are atoms in the universe (approximately 10^80).
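To put those two figures side by side, the short Python sketch below divides one by the other using exact integer arithmetic. The variable names are illustrative only, and the inputs are simply the two estimates quoted above. The result is roughly 2 × 10^90 legal Go positions for every atom in the universe.

```python
# Figures quoted in the article, held as exact integers to avoid float overflow.
GO_POSITIONS = 2082 * 10**167      # 2.082 x 10^170 possible Go positions
ATOMS_IN_UNIVERSE = 10**80         # approximate number of atoms

# How many Go positions are there per atom?
ratio = GO_POSITIONS // ATOMS_IN_UNIVERSE

# Express the ratio in scientific notation for readability.
exponent = len(str(ratio)) - 1
mantissa = ratio / 10**exponent
print(f"about {mantissa:.3f} x 10^{exponent} positions per atom")
```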
AlphaGo isn't technically AI -- or, more precisely, isn't artificial general intelligence (AGI), a kind of posited higher AI that theorists say would be able to perform any human intellectual task and would be capable of self-awareness. The case for AGI goes back as far as the 1930s and 1940s. It's grounded in the seminal work of mathematicians Alonzo Church and Alan Turing.
It wasn't until thirty years ago that AGI achieved theoretical coherence, however. In 1985, physicist David Deutsch published a proof of the so-called Church-Turing Conjecture, which holds that any cardinal function that can be computed by a universal Turing machine (i.e., a general-purpose computer) can also be computed by a human being -- and vice versa (see Note 1). Deutsch attributes this to the law of the "universality of computation." The general-purpose computer he has in mind is (or will be) of the quantum variety.
In computer science, a stronger form of this conjecture has since been codified. It's known as the Church-Turing-Deutsch principle. It says that any physical process -- such as human cognition -- can be emulated (in what we would call "software") on a general-purpose computer. In other words, such a computer could in principle emulate not just a kind of consciousness but a capacity for reflective self-consciousness, too.
AlphaGo has neither. According to Google's DeepMind researchers, it combines machine learning in the form of Monte Carlo tree search algorithms and "deep" neural networks that have been "trained by supervised learning." (AlphaGo's "training" regimen also involved "learning" from human expert play and self-play.)
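For readers unfamiliar with Monte Carlo tree search, the Python sketch below shows the basic idea on a toy game. It is not AlphaGo's implementation: AlphaGo guides its search with neural networks, whereas this sketch uses purely random playouts and the common UCT selection rule. The game (a pile of stones, each player removes one to three, whoever takes the last stone wins) and every name in the code are assumptions chosen for illustration.

```python
import math
import random

def legal_moves(stones):
    """A player may remove 1, 2, or 3 stones, but no more than remain."""
    return [m for m in (1, 2, 3) if m <= stones]

class Node:
    def __init__(self, stones, parent=None, move=None):
        self.stones = stones              # stones left after `move` was played
        self.parent = parent
        self.move = move                  # move that led from parent to here
        self.children = []
        self.untried = legal_moves(stones)
        self.visits = 0
        self.wins = 0.0                   # wins for the player who played `move`

def uct_child(node, c=1.4):
    """Pick the child balancing win rate (exploit) and rarity (explore)."""
    return max(
        node.children,
        key=lambda ch: ch.wins / ch.visits
        + c * math.sqrt(math.log(node.visits) / ch.visits),
    )

def rollout(stones, rng):
    """Random playout; True if the player to move at the start wins."""
    starter_to_move = True
    while stones > 0:
        stones -= rng.choice(legal_moves(stones))
        if stones == 0:
            return starter_to_move        # whoever took the last stone wins
        starter_to_move = not starter_to_move
    return False                          # no stones left: already lost

def mcts(root_stones, iterations=3000, seed=0):
    rng = random.Random(seed)
    root = Node(root_stones)
    for _ in range(iterations):
        node = root
        # 1. Selection: descend while the node is fully expanded.
        while not node.untried and node.children:
            node = uct_child(node)
        # 2. Expansion: add one previously untried move.
        if node.untried:
            move = node.untried.pop(rng.randrange(len(node.untried)))
            child = Node(node.stones - move, parent=node, move=move)
            node.children.append(child)
            node = child
        # 3. Simulation: the opponent of the last mover plays first.
        mover_won = not rollout(node.stones, rng)
        # 4. Backpropagation: flip the perspective at each level.
        while node is not None:
            node.visits += 1
            if mover_won:
                node.wins += 1
            mover_won = not mover_won
            node = node.parent
    # The most-visited move at the root is the recommendation.
    return max(root.children, key=lambda ch: ch.visits).move

best_move = mcts(5)
print("recommended move with 5 stones:", best_move)
```

With five stones the only winning move is to take one, leaving the opponent a pile of four. Go is far too large for random playouts alone to work this well, which is where AlphaGo's trained networks come in.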
AlphaGo achieves a kind of "AI" in a loose sense. For example, it has the capacity to "learn" from its own mistakes, as well as from the mistakes and successes of others. It likewise has the capacity to change and improve itself by virtue of its self-training ML algorithms. This doesn't make it an AGI, however; it remains a specific program designed for a specific task. Strictly speaking, AlphaGo doesn't "learn" because "it" doesn't actually think. There's no "it-self" -- i.e., no reflective awareness of itself-as-object, a prerequisite for conceptual knowledge -- in AlphaGo's world. Indeed, it doesn't have a world at all, because "world" is an abstract concept.
That said, AlphaGo is an impressive achievement as well as an endorsement of machine learning, which -- thanks to the twin forces of commodification and increasing specialization -- might be in the midst of a renaissance. ML, once the province of the black-bespectacled, pocket-protected geek class, has a kind of irresistible cachet: from hot startups (such as BigML, H2O.ai, MetaMind, Predixion, and Skytree, to name just a few) to deep-pocketed computing giants (HP Labs, IBM Research, and Microsoft Research, among others) to established business intelligence (BI) and data warehousing (DW) players -- e.g., Actian, IBM Cognos, MicroStrategy, Pivotal, SAS Institute, and Teradata -- just about everybody has some kind of presence in machine learning.
The latest wrinkle is highly parallelized ML. All of the big, massively parallel processing (MPP) data warehouse vendors tout the ability to run in-database machine learning algorithms in parallel, across 10, 100, or in some cases thousands of clustered nodes. Late last year, Pivotal (which has since been acquired by Dell) donated its MADlib machine learning framework to the Apache Software Foundation. Pivotal says it developed MADlib to run in parallel on its Greenplum MPP database, as well as on HAWQ, a port of Greenplum to the Hadoop general-purpose parallel processing environment.
ML isn't just an MPP data warehousing play, however. Many machine-learning start-ups tout similar capabilities. Take H2O.ai, which says the combination of machine learning and general-purpose parallel-processing technology can radically accelerate the process of building, testing, and training machine-learning models. H2O.ai can run standalone (in its own parallel context), run in the context of a Hadoop cluster, or use the Spark cluster computing framework's machine-learning library, MLlib.
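The general pattern of building and testing many candidate models in parallel, then keeping only the best, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not H2O.ai's API or architecture: it uses a thread pool on one machine as a stand-in for a cluster, a one-variable ridge regression as the "model," and synthetic data. All function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
import random

def make_data(n, seed):
    """Synthetic data (an assumption): y = 3x plus a little Gaussian noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1, 1) for _ in range(n)]
    ys = [3.0 * x + rng.gauss(0, 0.1) for x in xs]
    return xs, ys

def fit_ridge_slope(xs, ys, penalty):
    """Closed-form one-variable ridge regression through the origin."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + penalty)

def mse(slope, xs, ys):
    """Mean squared error of the model y = slope * x."""
    return sum((slope * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def train_and_score(penalty, train, valid):
    """Build one candidate model and score it on held-out data."""
    slope = fit_ridge_slope(*train, penalty)
    return penalty, slope, mse(slope, *valid)

def model_factory(penalties, train, valid, workers=8):
    """Fan candidates out to workers, then keep the lowest-error model."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(
            pool.map(lambda p: train_and_score(p, train, valid), penalties)
        )
    best = min(results, key=lambda r: r[2])
    return best, results

train = make_data(200, seed=1)
valid = make_data(200, seed=2)
penalties = [0.0, 0.01, 0.1, 1.0, 10.0, 100.0]

best, all_results = model_factory(penalties, train, valid)
print("candidates built:", len(all_results))
print("kept: penalty=%s slope=%.3f validation MSE=%.4f" % best)
```

Because each candidate is trained independently, adding workers (or, on a real cluster, nodes) shortens the wall-clock time without changing the result, which is what makes it affordable to build many models and discard most of them.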
"We took a lot of the classic algorithms written by academics and mathematicians in the 1980s and 1990s and we rewrote them in Java and rewrote them in in-memory MapReduce. What this does is it lets you basically throw more machines at the problem. Memory is cheap, storage is cheap. Nowadays, you can actually spin up a 100-node cluster, take a terabyte of data, and build, test, and train your [machine-learning] models really quickly," he says.
Best of all, Iyengar claims, this kind of ML parallelism isn't a technology "solution" in search of a use case. Many organizations -- H2O.ai counts "tens" of customers, he says -- are already heavy-duty users of parallel ML.
"That's the art of data science. You're looking at thousands of variables and it's impossible to know what a lot of these features mean. Back in the predictive age, you had a much smaller [set of variables] to work with: age, demographics, and gender, for example, but if you're looking at social data or interaction data, every swipe could mean something. We have deep-running [neural networks], ensembles, GBM [i.e., gradient boosting machines] -- we have all of these algorithms," he concludes.
"Traditionally, if you had to do this on a large data set, it would take a very long time. Now, it will take minutes. You can try a whole bunch of models -- typically it's not uncommon for a data scientist to have hundreds of models -- and rapidly test them [to see which ones fail]. One of our customers, Cisco, built a modeling factory. They're producing 60,000 models every single day to find the best one for their needs. They can do that because building the models is very cheap. They can actually afford to build a lot of models and throw most of them away."
Note 1: This ignores finite human limitations such as mortality. Verifying the primality of very large numbers, such as the 22-million-digit Mersenne prime just discovered, would be a practically impossible task for a finite human being or team of human beings.
Stephen Swoyer is a technology writer with 20 years of experience. His writing has focused on business intelligence, data warehousing, and analytics for almost 15 years. Swoyer has an abiding interest in tech, but he’s particularly intrigued by the thorny people and process problems technology vendors never, ever want to talk about. You can contact him at firstname.lastname@example.org.