Cloud Data Integration and the Role of REST
The writing's on the wall: the application infrastructures of the not-so-distant future will be largely REST-ified. This means on-premises IT resources will be augmented with RESTful middleware that exposes them to other (internal and external) REST services.
- By Stephen Swoyer
- September 22, 2015
In the cloud, it's all about APIs, or application programming interfaces. Rich and robust APIs can make or break a successful cloud platform.
In the same way, rich and robust APIs can make or break how (or in what ways) enterprises consume data from and exchange data with cloud services. As enterprise applications and data processing workloads increasingly shift to the cloud, the focus of enterprise application and data integration (DI) will likewise shift with them. This has significant ramifications for DI, which has traditionally relied on direct, stateful connections between and among apps and services.
The dominant application architecture of the cloud is REST, or "representational state transfer." Unfortunately, there isn't anything analogous to direct, stateful connectivity in the REST paradigm, which prescribes loosely-coupled, stateless connections between requesters and providers.
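Statelessness means each request must carry everything the provider needs to answer it; the server keeps no session between calls. The following is a minimal Python sketch of that idea (the resource path, token, and payload are made up for illustration, not drawn from any real API):

```python
# Toy REST-style handler: the response is a pure function of the request.
# Nothing is remembered between calls -- no login session, no cursor state.

def handle_request(method, path, headers):
    """Answer a single, self-contained request."""
    # Credentials travel with every request instead of a server-side session.
    if headers.get("Authorization") != "Bearer demo-token":
        return (401, "unauthorized")
    if method == "GET" and path == "/orders/1001":
        return (200, {"id": 1001, "status": "shipped"})
    return (404, "not found")

# Two independent calls; neither depends on anything the other set up.
ok = handle_request("GET", "/orders/1001",
                    {"Authorization": "Bearer demo-token"})
denied = handle_request("GET", "/orders/1001", {})
```

Because no call depends on an earlier one, any request can be routed to any server instance, which is what makes loosely coupled REST services easy to scale and hard to map onto traditional stateful DI connections.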
Data management (DM) practitioners might prefer not to have to deal with REST, but the writing's on the wall: the application infrastructures of the not-so-distant future will be largely REST-ified. This means on-premises IT resources will be retrofitted with RESTful interfaces or RESTful middleware in order to expose them to other (internal and external) REST services.
In this regard, the right APIs and (just as important) the management and orchestration of these APIs can make the difference between smooth, seamless data flows and broken jobs, brittle processes, exception-ridden logs, and the like. This is the raison d'être behind RESTlet, a framework for designing, developing, and managing RESTful APIs. It's likewise the raison d'être behind RESTlet Inc., a company that both provides commercial support for the RESTlet toolkit and markets a platform-as-a-service implementation -- viz., "APISpark" -- of a RESTlet backend server.
RESTlet-the-company aims to simplify and accelerate the process of RESTful application development. As RESTlet's Jonathan Michaux points out, a huge proportion of REST development is taking place inside enterprise IT organizations; of this, a similarly large proportion involves exposing traditional IT resources as RESTful services. Finally, Michaux argues, much of this RESTful development or retrofitting encompasses traditional data management (DM) workloads and practices, such as data integration, data replication, metadata management, and the like.
"In many cases you will have someone on-premises who wants to expose a stream of data to [a cloud] application that wants to consume that data. You are not going to be able to do this the same way [you traditionally would]; you're going to need [to develop] your own [RESTful] APIs, otherwise you're going to have to rely on what [your cloud services provider] give you, which in most cases will limit what you can do," says Jonathan Michaux, an "API Mad Scientist" with RESTlet-the-company. (RESTlet Inc. encourages its employees to customize their job titles.)
RESTlet-the-framework gives developers a toolkit they can use to manage the API life cycle, says Michaux; it spans API design, API development, and, most important, API obsolescence. (In the REST-scape, especially, even the most elegant or efficient of APIs will at some point be deprecated and phased out. Obsolescence, then, is expected, if not required, to such an extent that it isn't uncommon for commercial or custom-built software projects to support several different versions of the same API at the same time.)
"Rather than just starting to code an API from scratch using whatever framework they have available, it makes more sense [for developers] to start with an API-first design approach. This means thinking about what resources are being exposed, what URLs they're going to use, who's going to consume this [data], what are their needs, etc. You want to be able to fine-tune your APIs [to suit the specific situation]," Michaux explains.
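An API-first design typically starts with a machine-readable descriptor that answers exactly those questions -- which resources, which URLs, what the consumer gets back -- before any code is written. A hypothetical fragment in the Swagger 2.0 format (the de facto descriptor format of the period; the resource names here are invented) might look like this:

```yaml
swagger: "2.0"
info:
  title: Purchase Orders API   # what resource is being exposed
  version: "1.0"
paths:
  /orders:                     # the URL consumers will use
    get:
      summary: List purchase orders
      parameters:
        - name: offset         # anticipates consumers who page through results
          in: query
          type: integer
      responses:
        "200":
          description: A page of purchase orders
```

A descriptor like this can be circulated to consumers for review and fine-tuned before implementation begins, which is the point of designing the API first.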
"Once you've actually designed your API, you want to implement it. You might want to implement that API from scratch, using something like the RESTlet framework. Another alternative is to save yourself some heavy-duty coding and use APISpark, our back-end-as-a-service [cloud offering]."
Michaux uses the example of data exchange between an on-premises SAP system and a sales force automation service -- say, Salesforce.com -- in the cloud. Salesforce itself provides tools and APIs to address scenarios of this kind, but -- as with any on-premises data integration effort -- an organization will likely have its own specific data selection, formatting, and periodicity requirements. In the background, SAP data will likely have to be extracted and transformed prior to being exchanged with Salesforce; this means using RESTful APIs to kick off an internal ETL job -- some ETL vendors, such as Talend, offer connectors for RESTlet's APISpark platform -- as well as to automate the requisite exchange of data with Salesforce in the cloud.
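The two-step flow Michaux describes -- call a REST endpoint to run the internal ETL job, then push its output to the cloud service -- can be sketched in a few lines of Python. Everything here is hypothetical: the URLs, payload shapes, and endpoints stand in for whatever a real Talend job or Salesforce API would expose, and the HTTP transport is injected so the sketch stays self-contained:

```python
# Hypothetical orchestration of the flow described above: one RESTful call
# kicks off an internal ETL job, a second pushes the job's output to a
# cloud CRM endpoint. Not a real Talend or Salesforce API.

def sync_orders(http_post, etl_url, crm_url):
    """Run the ETL job via its REST endpoint, then upload the result."""
    status, extracted = http_post(etl_url, {"job": "extract-sap-orders"})
    if status != 200:
        raise RuntimeError("ETL job failed with HTTP %s" % status)
    status, _ = http_post(crm_url, {"records": extracted})
    return status

# A stub transport stands in for a real HTTP client library.
def fake_post(url, payload):
    if url.endswith("/etl/run"):
        return 200, [{"order": "PO-1001", "amount": 250}]
    return 201, None

result = sync_orders(fake_post, "https://internal.example/etl/run",
                     "https://crm.example/api/orders")
```

Injecting the transport also makes the orchestration logic testable without touching either the on-premises system or the cloud service.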
This presupposes a great deal of preparation and orchestration in the background, Michaux argues, starting with the design and instantiation of a REST application infrastructure and RESTful APIs. "I can start off by writing a description or descriptor of my API and send that to Salesforce so that they can review it. They might say, 'I don't understand what this means,' or perhaps they'll determine that my API exposes a huge list of, say, purchase orders, but [their APIs] might want to paginate through these," he says.
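The pagination concern Michaux raises -- a consumer that doesn't want a huge list of purchase orders in one response -- is conventionally handled by returning one page at a time along with a pointer to the next page. A minimal sketch (parameter names are illustrative, not from any specific API):

```python
# Offset/limit pagination: return one page of results plus the offset of
# the next page, or None when the collection is exhausted.

def list_orders(orders, offset=0, limit=2):
    """Return a single page and a cursor for the following page."""
    page = orders[offset:offset + limit]
    next_offset = offset + limit if offset + limit < len(orders) else None
    return {"items": page, "next": next_offset}

orders = ["PO-1", "PO-2", "PO-3", "PO-4", "PO-5"]
first = list_orders(orders)              # {"items": ["PO-1", "PO-2"], "next": 2}
last = list_orders(orders, offset=4)     # {"items": ["PO-5"], "next": None}
```

Agreeing on details like this up front -- at API-design time, via the descriptor both sides review -- is exactly what keeps the consuming service from having to work around the provider's API later.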
In any case, Michaux says, you're going to have to have some way to manage the life cycle of those APIs, which means being able to support multiple, simultaneous revisions of the same API. This is particularly important as conditions change -- either in your own environment or, more likely, with an external provider -- or if you want to add new/enhanced features to an application, service, or job.
"The obvious way to change or update your APIs is to update both the API and the app itself; [that is,] to create new versions of both," Michaux says. "So you might want to support both the new version [of the API] and the deprecated one, too, so that you don't break support for older [versions of your] applications, or so that others [i.e., applications or APIs] will still work."
In any case, he explains, you can use the open source RESTlet framework to build a RESTlet back-end to manage this life cycle. RESTlet Inc.'s APISpark, by contrast, is a commercial PaaS implementation of such a RESTlet back-end. (The RESTlet framework is designed for any of six different development environments, including Java Standard Edition, Java Enterprise Edition, the Google Web Toolkit, and Android.) Even though it's possible to use the RESTlet framework to build feature-rich APIs, and even though said APIs could incorporate data integration-like capabilities -- e.g., light transformations, simple data de-duping or matching routines -- the RESTlet framework, like REST architecture, is more of an application integration than a DM-oriented data integration solution.
"We don't do DI as such today, but we do have people who use our platform as part of a data integration chain. We have users in the media industry who have a lot of data about film production, actor salaries, scripts, and so on. They have a lot of metadata about actual media files. These guys want to be able to exploit that data in a Web app that people can use to manage their film projects. They use an actual integration tool, like Talend ETL, to process the data and format it in a certain way, and then they send [the output of] this [ETL job] to APISpark."