TDWI Upside - Where Data Means Business

At re:Invent, Amazon Enhances AWS Yet Again

At the re:Invent conference, Amazon announced a slew of new products, services, and features, including a technology for querying against its S3 storage service and a new AI framework.

At its re:Invent conference, Amazon announced a slew of new products, services, and features, including Athena, a technology for querying data in its S3 storage service; a new artificial intelligence (AI) framework that includes Lex, the brains behind its popular Alexa personal assistant; and the availability of FPGA and GPU compute instances for Amazon Web Services (AWS).

By all accounts, AWS is the heavyweight champ of cloud services, but this year's re:Invent showed Amazon isn't taking anything for granted.

At Long Last: Interactive Query for Amazon S3

Data warehouse and big data vendors tell Upside that Amazon's Simple Storage Service (S3) is used for a range of data management use cases, including data archiving and, increasingly, elastic persistence for data lake initiatives. Some vendors (e.g., Greenplum and Teradata) already support live query against S3. In Teradata's case, it uses the open source Presto distributed SQL query engine to query data in S3. Amazon (presto!) is taking a similar approach with Athena.

Athena supports interactive SQL queries on a pay-per-query basis, at a rate of $5 per TB of data scanned. It isn't a turnkey query service, however: someone (typically someone in IT) still has to define a schema. (It is SQL, after all.)
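To make the pay-per-query model concrete, here is a back-of-the-envelope sketch that turns the $5-per-TB rate into a per-query estimate. It is not Amazon's billing logic: it assumes 1 TB = 1024^4 bytes and ignores any per-query minimums or rounding Amazon may apply, so check current AWS pricing before relying on it.

```python
# Back-of-the-envelope Athena cost estimate at the $5-per-TB-scanned rate.
# Assumptions (not from AWS documentation): 1 TB = 1024**4 bytes, and no
# per-query minimum or rounding is applied.
PRICE_PER_TB_USD = 5.00
BYTES_PER_TB = 1024 ** 4
BYTES_PER_GB = 1024 ** 3


def estimate_query_cost(bytes_scanned: int, price_per_tb: float = PRICE_PER_TB_USD) -> float:
    """Return the estimated cost in USD of a query that scans `bytes_scanned` bytes."""
    if bytes_scanned < 0:
        raise ValueError("bytes_scanned must be non-negative")
    return bytes_scanned / BYTES_PER_TB * price_per_tb


# Scanning 200 GB costs just under a dollar; a full 2 TB scan costs $10.
print(f"${estimate_query_cost(200 * BYTES_PER_GB):.2f}")  # → $0.98
print(f"${estimate_query_cost(2 * BYTES_PER_TB):.2f}")    # → $10.00
```

Because the charge scales with bytes scanned rather than with query time, anything that shrinks the amount of data a query touches also shrinks its cost.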

As cloud services go, Athena isn't unique, either: it bears a striking resemblance to Google's BigQuery, which (not coincidentally) is also priced at $5 per TB scanned. There's at least one critical difference, however. BigQuery users must first import their data from Google Cloud Storage into BigQuery. With Athena, once you define a schema, you can query immediately against S3.
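Querying S3 right after a table is defined can be sketched with the AWS SDK for Python (boto3), whose Athena client provides `start_query_execution` and `get_query_execution`. The helper `run_athena_query`, the database name, and the S3 result location are hypothetical, and the client is passed in as a parameter so the sketch itself doesn't depend on AWS credentials.

```python
import time

# Terminal states reported by Athena's GetQueryExecution API.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


def run_athena_query(client, sql, database, output_location, poll_seconds=1.0):
    """Submit `sql` to Athena and wait for it to finish (hypothetical helper).

    `client` is assumed to be a boto3 Athena client, i.e. boto3.client("athena").
    Returns (query_id, final_state, bytes_scanned).
    """
    started = client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        # Athena writes query result files under this s3:// prefix.
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]
    while True:
        execution = client.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]
        state = execution["Status"]["State"]
        if state in TERMINAL_STATES:
            # Bytes scanned is the quantity the per-TB price is applied to.
            scanned = execution.get("Statistics", {}).get("DataScannedInBytes")
            return query_id, state, scanned
        time.sleep(poll_seconds)  # still QUEUED or RUNNING; poll again
```

With boto3 installed and credentials configured, a call might look like `run_athena_query(boto3.client("athena"), "SELECT status, COUNT(*) FROM weblogs GROUP BY status", "analytics", "s3://example-bucket/athena-results/")`, where the table, database, and bucket names are placeholders. Note that no load step appears anywhere: the query runs against files already sitting in S3.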

Athena could cannibalize some of Amazon's other offerings: namely Redshift, its massively parallel processing (MPP) database in the cloud, and Aurora, its cloud-optimized MySQL clone. Prior to Athena, analysts or data scientists who wanted to query data in S3 had to spin up a Redshift cluster or an Aurora instance and move the data from S3 into that database engine. Now, they can query S3 directly.

Athena also gives users an easy way to get data out of S3. Teradata uses Presto to extract data from S3 and, in effect, to perform data processing (joins and other transformations) in situ, using Amazon's Elastic MapReduce (EMR) service. In other words, instead of moving a large set of data en bloc from S3 to another cloud service (or context) or to the on-premises enterprise, Teradata uses a SQL query to extract a much smaller subset of data.
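The difference between moving data en bloc and extracting a subset with SQL can be shown with a toy example. Here Python's built-in sqlite3 module stands in for Presto, and an in-memory table stands in for data sitting in S3; the table, its columns, and its rows are invented for illustration. The point is only the contrast between copying every row out and letting a query filter and aggregate at the source, so that only a small result leaves the storage tier.

```python
import sqlite3

# Stand-in for a large data set in S3 (hypothetical table and data).
# SQLite is only a local substitute for a Presto-style query engine here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page_views (user_id INTEGER, country TEXT, bytes INTEGER)")
rows = [(i, "US" if i % 10 == 0 else "DE", 500) for i in range(10_000)]
conn.executemany("INSERT INTO page_views VALUES (?, ?, ?)", rows)

# Option 1: move everything en bloc, then filter downstream.
full_copy = conn.execute("SELECT * FROM page_views").fetchall()

# Option 2: push the filter and aggregation into SQL at the source,
# so only the small result set has to be moved.
subset = conn.execute(
    "SELECT country, COUNT(*), SUM(bytes) FROM page_views "
    "WHERE country = 'US' GROUP BY country"
).fetchall()

print(len(full_copy), "rows moved en bloc vs", len(subset), "row extracted by query")
print(subset)  # → [('US', 1000, 500000)]
```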

It's possible to code your own solution for something like this (as Greenplum, Teradata, and other vendors did); with Athena, however, Amazon offers a Web service that does the same thing. It's a more elegant solution to the problem.

That said, it isn't completely clear how Athena handles data processing workloads. S3 is a storage-only service; it doesn't have a baked-in execution engine. Presto, for its part, needs compute resources to run on. Teradata's Presto-based implementation uses EMR (Amazon's Hadoop-as-a-service offering) as its compute platform. A cursory look reveals that Athena uses Apache Hive data definition statements, which suggests it, too, is probably running on Hadoop/Hive infrastructure, perhaps via EMR.
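For readers unfamiliar with Hive data definition statements, the sketch below builds the kind of `CREATE EXTERNAL TABLE` statement used to lay a schema over files in S3. An external table records metadata only (column names, types, file format, and location); the files themselves are not moved or loaded, which is why querying can start as soon as the schema exists. The function name, table, columns, and bucket are hypothetical, and only the basic delimited-text form of the syntax is shown.

```python
def external_table_ddl(table, columns, s3_location, delimiter=","):
    """Build a Hive-style CREATE EXTERNAL TABLE statement for delimited text files in S3.

    `columns` is a list of (name, hive_type) pairs, e.g. ("status", "INT").
    The statement only records metadata; the files stay where they are.
    """
    column_list = ",\n  ".join(f"`{name}` {hive_type}" for name, hive_type in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n"
        f"  {column_list}\n"
        ")\n"
        f"ROW FORMAT DELIMITED FIELDS TERMINATED BY '{delimiter}'\n"
        f"LOCATION '{s3_location}'"
    )


# Hypothetical table, columns, and bucket, for illustration only.
print(external_table_ddl(
    "weblogs",
    [("request_time", "STRING"), ("status", "INT"), ("bytes", "BIGINT")],
    "s3://example-bucket/weblogs/",
))
```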

AI-as-a-Service

In Greek mythology, Athena sprang fully formed from Zeus' head. The same can't be said of Amazon's Athena, which relies on a technology (Presto) that first incubated at Facebook.

"Sprang fully formed" is a mostly apt description of Amazon's new AI offerings, however.

In a single week, Amazon went from AI-zero to AI-hero. Before re:Invent, Amazon had a single AI-oriented service, Amazon Machine Learning. After re:Invent, it had four AI-oriented services.

The three newcomers are:

  • Amazon Polly: AI text-to-speech. Amazon is offering it free for up to five million characters each month; beyond that, each character costs a vanishingly small fraction of a cent. Polly doesn't sound like Stephen Hawking's synthesized voice, either. Hers is a human-sounding voice, and not just a single voice: Amazon says Polly can speak in any of 47 "lifelike" voices in 24 languages.

  • Amazon Rekognition: AI image recognition. Amazon says Rekognition is capable of identifying faces and objects in different scenes. Rekognition is priced per image analyzed; Amazon also charges for storing face metadata.

  • Amazon Lex: The brains behind Amazon's hit Alexa personal assistant. The Lex service lets developers build conversational capabilities, using voice and text, into their apps.
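Polly's pricing, a monthly free allowance followed by a per-character charge, reduces to simple arithmetic. The per-character rate isn't given above, so this sketch takes it as a parameter; the $4 per million characters used in the example (0.0004 cents per character) is an assumption for illustration and should be checked against Amazon's current price list.

```python
# Monthly Polly bill under the pricing described above: the first five million
# characters each month are free; characters beyond that are billed at a
# per-character rate, expressed here per million characters.
FREE_CHARS_PER_MONTH = 5_000_000


def polly_monthly_cost(chars_used: int, usd_per_million_chars: float) -> float:
    """Return the estimated USD cost for `chars_used` characters in one month."""
    billable = max(0, chars_used - FREE_CHARS_PER_MONTH)
    return billable / 1_000_000 * usd_per_million_chars


# Assumed rate of $4.00 per million characters, for illustration only.
print(polly_monthly_cost(3_000_000, 4.00))  # → 0.0 (within the free allowance)
print(polly_monthly_cost(8_000_000, 4.00))  # → 12.0 (3 million billable characters)
```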

FPGA and GPU Instances, Too

FPGAs (field-programmable gate arrays) are reconfigurable chips that can implement, directly in hardware, features or capabilities that would run much slower in software. GPUs (graphics-processing units) are used for certain kinds of massively parallel processing workloads.

As of re:Invent, existing users of Amazon's Elastic Compute Cloud (EC2) have access to a new instance type ("F1," in Amazon's lexicon) featuring FPGA capabilities. Virtual GPU ("G1") instances are available, too.

FPGAs, in particular, have applications for conventional and advanced analytics, e.g., crunching massive data sets, accelerating iterative cycles, and powering deep learning and/or neural network analytics. Data warehouse appliance specialist Netezza, for example, used FPGAs in the "Snippet Processing Units" (SPUs) of its appliance systems.

An FPGA service would permit developers -- and/or data scientists -- to program their own custom FPGAs for specific workloads. Amazon touts several proposed use cases for its F1 instances, including financial analytics and big data search/analysis.

Amazon made one other notable data management-related announcement at re:Invent: Aurora, Amazon's MySQL-compatible database service, now supports PostgreSQL, too. The move was a long time coming, given that Amazon already offered PostgreSQL through its Relational Database Service (RDS).

Collectively, Amazon's new feature and service blitz at re:Invent should make AWS an even more compelling destination for traditional data management workloads and advanced analytics.

About the Author

Stephen Swoyer is a technology writer with 20 years of experience. His writing has focused on business intelligence, data warehousing, and analytics for almost 15 years. Swoyer has an abiding interest in tech, but he’s particularly intrigued by the thorny people and process problems technology vendors never, ever want to talk about. You can contact him at evets@alwaysbedisrupting.com.
