
Data Integration in a Snap?

SnapLogic says its REST-based approach to data movement is a superior alternative to approaches such as ETL for data integration in hybrid on- and off-premises environments.

By now, Hadoop has become a kind of metonym for "big data," in that it's a single word often used as representative of an entire category. A better metonym might be "SnapLogic," at least from the perspective of the average, everyday data management (DM) practitioner. Unlike Hadoop, SnapLogic deals with the kinds of big data problems most DM technologists can identify with.

That's "SnapLogic" as in the name of a vendor, SnapLogic Inc., which markets a REST-based application integration service. (REST stands for "representational state transfer," the lingua franca -- or dominant API style -- of the Web application world.)

SnapLogic calls what it does "application orchestration." An important caveat is that in SnapLogic's lexicon, the term "application" encompasses a very big category -- one that also includes databases.

Solutions engineer Rich Dill says that SnapLogic's technology supports one-to-one, one-to-many, many-to-one, and many-to-many orchestration scenarios. "What companies are using SnapLogic for is two processes: bulk movement of data from Application A to Application B, what we think of as old or traditional EAI or ETL. However, they're also using it for application orchestration," Dill explains. "For organizations that want to be able to have a hybrid environment of legacy apps and cloud-based apps, we simplify the ability to move data from wherever it is and put it wherever it needs to be."

SnapLogic is a cloud service. It's based on a hub-and-spoke architecture, such that integration or interoperability between on- or off-premises applications ("orchestration," in SnapLogic-speak) is coordinated by SnapLogic Server, its cloud-based hub. The "spokes" in SnapLogic's architecture consist of application- or source-specific "snaps." An integration or orchestration for Salesforce.com might make use of several different snaps, such as on-premises snaps for Oracle or SAP applications. In this scheme, integration or interoperability isn't simply one way, doesn't consist of simple transactions, and often requires interactive back-and-forth between and among systems or services. Hence SnapLogic's insistence on the term "orchestration." In this case, it fits.

"One of the most common things that we demo is that we have a pipeline that captures events in Salesforce [so] that when a salesperson closes a deal ... once they [record that in Salesforce] ... that generates an event that calls a pipeline and passes that on to SnapLogic. We then execute a parameter that connects to SAP ... and generates the invoice in SAP ... then returns the generated sales order number, which is then uploaded back into Salesforce," Dill says.
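The round trip Dill describes can be sketched in a few lines of Python. This is only an illustration of the flow, not SnapLogic's API: the FakeSalesforce and FakeSAP classes, the on_deal_closed function, and the "SO-" numbering scheme are all hypothetical stand-ins invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class FakeSalesforce:
    """Hypothetical stand-in for Salesforce; stores opportunities by id."""
    opportunities: dict = field(default_factory=dict)

    def update(self, opp_id: str, **fields) -> None:
        self.opportunities[opp_id].update(fields)


@dataclass
class FakeSAP:
    """Hypothetical stand-in for SAP; hands out sequential order numbers."""
    next_number: int = 1000
    orders: dict = field(default_factory=dict)

    def create_sales_order(self, deal: dict) -> str:
        number = f"SO-{self.next_number}"
        self.next_number += 1
        self.orders[number] = dict(deal)  # store a copy of the deal
        return number


def on_deal_closed(event: dict, crm: FakeSalesforce, erp: FakeSAP) -> str:
    """Pipeline invoked by a 'deal closed' event (assumed event shape)."""
    opp_id = event["opportunity_id"]
    deal = crm.opportunities[opp_id]
    # Step 1: create the order in the back-office system.
    order_number = erp.create_sales_order(deal)
    # Step 2: write the generated number back to the CRM record.
    crm.update(opp_id, sap_order_number=order_number)
    return order_number


crm = FakeSalesforce({"opp-1": {"account": "Acme", "amount": 5000}})
erp = FakeSAP()
result = on_deal_closed({"opportunity_id": "opp-1"}, crm, erp)
```

The point of the sketch is the back-and-forth: one event touches two systems, and the second system's answer flows back into the first.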

The SnapLogic Server hub facilitates interoperability, communication, and integration between applications, databases, and services. SnapLogic performs data integration (DI) by means of what it calls data services "pipelines." A "pipeline" used for DI describes a way of moving and conforming data from a source (or sources) to a destination (or destinations). Pipelines can be used to perform data integration, application messaging, service orchestration, and other tasks. SnapLogic bundles free interfaces for Oracle, DB2, and SQL Server -- along with ODBC and JDBC snaps -- as part of the SnapLogic service. DI-wise, SnapLogic's data services pipelines can be used to extract and join data, as well as to perform other kinds of data manipulation.
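As a rough illustration of what "extract and join" means in a pipeline, here is a minimal Python sketch that chains generator stages from two sources to one destination. The stage names (extract, join, conform, load), the sample rows, and the customer_id key are assumptions made for the example; they do not reflect how SnapLogic's snaps are actually implemented.

```python
def extract(rows):
    """Read rows from a source, copying each so the source is untouched."""
    for row in rows:
        yield dict(row)


def join(left, right, key):
    """Inner-join two row streams on a shared key."""
    index = {row[key]: row for row in right}
    for row in left:
        match = index.get(row[key])
        if match is not None:
            yield {**row, **match}


def conform(rows):
    """Example manipulation: normalize customer names to upper case."""
    for row in rows:
        yield {**row, "name": row["name"].upper()}


def load(rows, destination):
    """Write the conformed rows to the destination."""
    destination.extend(rows)


# Hypothetical sample data standing in for two source systems.
customers = [
    {"customer_id": 1, "name": "Acme"},
    {"customer_id": 2, "name": "Globex"},
]
orders = [
    {"customer_id": 1, "total": 250.0},
    {"customer_id": 3, "total": 75.0},
]

warehouse = []
joined = join(extract(orders), extract(customers), "customer_id")
load(conform(joined), warehouse)
```

Only the order with a matching customer survives the join, so the destination ends up with a single conformed row.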

Pipelines are built and customized by means of the browser-based SnapLogic Designer tool, which is intended to be used by application developers, database administrators, and business analysts, Dill indicates. SnapLogic Designer exposes pipeline activities (or "objects") via drop-down boxes. These can include certain kinds of pre-defined transformations or manipulations. In this way, savvy users can construct data services pipelines between resources.

"The value of our technology is a level of abstraction above the API that presents the user with objects [which] they consume and focus. We do the 'how,' you figure out the what, so this is a technology that can be used by the savvy business user," he says.

Realistically, even the savviest of users probably won't be able to use SnapLogic Designer to build pipelines that perform one-off or highly specialized kinds of data manipulation, or which require lots of orchestration between and among data sources, applications, and services. If an object (e.g., a transformation) isn't exposed in one of Designer's drop-down fields, it can't easily be implemented.

According to Dill, "There are two kinds of business analyst. There's the guys who are like, 'I don't care about the technology, give me a mouse and a spreadsheet and I'm happy.' There's another one who goes in and has to know the plumbing, has to figure out how to get the most out of [the technology]. The business user who writes the Excel macros, who enhances their computing experience by knowing what's going on, these are the kinds of individuals this [SnapLogic Designer] is designed for."

In addition to managing and monitoring pipeline data flows, SnapLogic Server performs two other functions: auditing and logging.

"Every time you execute a SnapLogic pipeline, it generates a log file. No customer data persists in SnapLogic after the pipeline is completed," Dill says. "We'll access DB2, pull that logic out, slice, dice, and transform it, put it where it needs to be, and it's gone. The log file documents that a pipeline has executed, has these characteristics, used these sources, and took this amount of time."
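The kind of record Dill describes might look something like the following Python sketch, which times a pipeline run and keeps only metadata about it, never the data itself. The run_with_audit function and the field names in the record are hypothetical; the article does not say what SnapLogic's log files actually contain beyond the characteristics Dill lists.

```python
import time
from datetime import datetime, timezone


def run_with_audit(name, sources, pipeline):
    """Run a pipeline callable and return a metadata-only audit record."""
    started = datetime.now(timezone.utc)
    t0 = time.perf_counter()
    status = "succeeded"
    try:
        pipeline()  # the result is deliberately discarded, not logged
    except Exception as exc:
        status = f"failed: {type(exc).__name__}"
    return {
        "pipeline": name,
        "sources": list(sources),
        "started_at": started.isoformat(),
        "duration_seconds": time.perf_counter() - t0,
        "status": status,
    }


# Hypothetical pipeline: any callable that does the actual data movement.
record = run_with_audit("db2_to_warehouse", ["DB2"], lambda: sum(range(1000)))
```

The record documents that the pipeline ran, which sources it used, and how long it took, while the rows it moved leave no trace.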

SnapLogic, which was founded in 2006, is still evolving. Take the Salesforce-to-SAP use case described above. Right now, there's no built-in way to track or monitor whether the pipeline process that was supposed to generate an SAP sales order number actually did so.

"That's the difference between the two-phased commit [e.g., with a database transaction] and a REST-based architecture. Right now, the pipeline has to have the functionality within itself to recognize that the process or the transaction has been successful. We are working on a version of our product right now … that will have guaranteed delivery, so that that will be an easier feature to add into a pipeline," Dill points out, noting that it's possible to build additional "steps" into a SnapLogic pipeline -- e.g., a timer or time-out period or a trigger that looks for a new transaction or which logs the creation of a new file -- to address the Salesforce-to-SAP use case.
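A time-out step of the sort Dill mentions could be approximated as follows. This Python sketch polls a check function until it reports success or a deadline passes; wait_for_confirmation, its parameters, and the simulated SAP responses are assumptions for illustration, not a description of how such a step is built in SnapLogic.

```python
import time


def wait_for_confirmation(check, timeout_seconds, poll_interval=0.01):
    """Poll check() until it returns a non-None value or the timeout expires."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        result = check()
        if result is not None:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("no confirmation before the deadline")
        time.sleep(poll_interval)


# Simulated back end: the order number shows up on the third lookup.
responses = iter([None, None, "SO-1000"])
confirmed = wait_for_confirmation(lambda: next(responses), timeout_seconds=1.0)
```

Without guaranteed delivery, a pipeline has to do this kind of checking for itself: if the confirmation never arrives, the step raises an error rather than assuming the sales order exists.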

Dill argues that SnapLogic's REST-based underpinnings are better suited than what he calls "primitive technology" tools (e.g., ETL or EAI) for the increasingly "hybridized" world of on- and off-premises applications. On top of this, he says SnapLogic's SaaS subscription model is another plus.

"We're SaaS, so it's not [a matter of] spending half a million or a million dollars to buy some software and then it's going to be $25,000 a year in maintenance," Dill says. "Our technology when compared to more primitive technology ... it's a SaaS price model, which reduces that million-dollar bite on day one and then you're not spending huge amounts of money on consultants to build the solution. [SnapLogic] is technology that transcends ETL, enables EAI, enables Web deployment, and does it all with a high-level development tool that allows a person to become productive very rapidly without being an expert on the low-level [interfaces] of the [system]."
