TDWI Articles

Q&A: Storing, Analyzing IT Operational Data Offers Deeper Insights

One growing use of analytics in business is capturing and analyzing operational data. Rocana captures it from any source and holds it in one scalable repository, then delivers it to stakeholders for further analysis.

Using machine learning, Rocana Ops enables users to collect and analyze large quantities of data across both IT and business sources, offering visibility into IT operations. The company was founded in 2014 by Omer Trajman, along with Eric Sammer and Don Brown -- three IT veterans with deep roots in IT operations and big data.

"We recognized that the Fortune 500 market desperately needed a purpose-built solution for total operational visibility," Trajman says. "Our mission was to combine the power of big data and machine learning to help technologists collect and analyze data across IT and business sources -- all data, from all sources, kept online and available for instant and future access."

In this interview, Trajman offers insights into the challenges of storing and analyzing huge volumes of data quickly -- and how his company is taking a fresh approach.

Upside: What are some of the challenges in searching event log data?

Omer Trajman: The number one challenge in searching event log data is that by the time you're doing it, you're already behind. Search in general is a powerful access mechanism for constrained data sets where operators know what they're looking for. As data volumes grow, though, search becomes less useful.

Search solutions generally have a bell-shaped ROI curve; they add value as data initially grows and then start to have negative returns as the increasing data volumes take longer to sift through, generating more false positives and keeping operators from addressing what may be more critical issues.

How are demands on IT changing in terms of working with big data?

Similar to other next-generation technologies, big data is creating both new opportunities and new burdens for IT. Big data can combine previously disparate data sets and provide a central repository for multiple lines of business, often dwarfing the breadth of data sources that feed even the data warehouse. Because of the technology's capabilities, big data is also able to keep more detailed data for longer.

For IT, all of these features mean requirements for new operating policies, additional operational visibility, and potentially higher risk. Big data may mean that more IT infrastructure becomes business critical instead of back office support.

The opportunities for big data open up when IT starts using it to support operations. In fact, IT organizations are finding that even with the overhead of managing the new technologies, they realize ROI in a matter of months when applying big data to IT operations.

What hardware and software evolutions have made it possible to collect, search, and analyze operational data effectively?

Big data technologies have unprecedented power to collect a wide variety of data from across many systems and combine that data with business metrics.

It started in the late 2000s with the introduction of scale-out, flexible data management systems such as the Apache Hadoop ecosystem. These technologies run on industry-standard hardware, which is cost-effective to scale out. New software has also emerged that is compatible with the existing ecosystem and introduces massive-scale, real-time analytics and rich, purpose-built visualizations.

When these technologies are combined, IT can visualize trillions of data points and quickly identify hot spots or emerging issues, then use search proactively when appropriate.

These concepts were previously applied to domains such as high-frequency trading and online advertising. They're now becoming cost effective for IT.

Where do you see the handling and analysis of event data heading in the next few years?

Just like other big data technologies, as analytics systems see more use, we will learn where their capabilities are most effective. Although there is probably a class of problems that can be fully automated (for example, decommissioning a mirrored disk with excessive errors), most problems will still require human troubleshooting. The world of IT is getting more complex by the day, and we won't be able to anticipate all the problems that are to come.

What we will be able to do is proactively identify what information needs to be gathered and analyzed. Big data is very good at poring over more data in seconds than a human could in a lifetime and surfacing the relevant pieces of information. Combined with operator interactions, we could predictively route information to the person best suited to solve a problem before it impacts service reliability or revenue.

Although merely applying big data analytics to IT can increase efficiency by ten times, this kind of intelligent assistance and predictive analytics has the potential to increase efficiency another ten times over. Soon, IT will have individual operators supporting hundreds of thousands or even millions of systems. With the widespread adoption of containers and microservices, the pressure will be on IT to step up.

Can you expand on how the "widespread adoption of containers and microservices" will impact IT?

In order to deliver ever-more complex business services, developers are embracing microservice architectures and container-based deployments. These small, lightweight processes operate in highly scalable and loosely integrated environments. Developers get the benefit of continuous deployment, but IT bears the burden of a continuously changing environment.

Even with continuous integration testing, the constantly changing nature of microservices means that IT no longer has time to perform a complete suite of acceptance testing and staging. This rapid deployment approach also makes it impossible to maintain a proper runbook or to identify which systems to monitor. Whereas IT once had time to "bake in" a new software release and identify proper thresholds and dashboards, they now need to operate in more of a DevOps fashion or risk becoming the bottleneck to business innovation.

Given what we've discussed, what does your company, Rocana, offer?

Rocana Ops is a total visibility system for IT operations. It enables the user to quickly and cost-effectively capture 100 percent of operational data from all sources into a single, highly reliable, scalable repository and analyze it there. The data can then be delivered directly to stakeholders across the organization through Rocana's open APIs. By subscribing to the curated data feeds they want, downstream stakeholders can perform real-time and historical analysis using the tools and methods of their choice.
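The subscribe-to-curated-feeds pattern described here can be sketched as a minimal publish/subscribe model. This is an illustrative sketch only; the feed names, the `FeedBroker` class, and its methods are assumptions for the example and are not Rocana's actual API.

```python
from collections import defaultdict

class FeedBroker:
    """Toy broker illustrating named, curated data feeds (hypothetical)."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, feed, handler):
        # Register a handler for a named curated feed.
        self._subscribers[feed].append(handler)

    def publish(self, feed, event):
        # Deliver an event to every subscriber of that feed only.
        for handler in self._subscribers[feed]:
            handler(event)

# A security team subscribes only to the feed it cares about;
# other departments' feeds never reach it.
broker = FeedBroker()
alerts = []
broker.subscribe("auth-failures", alerts.append)

broker.publish("auth-failures", {"host": "web-01", "count": 7})
broker.publish("checkout-latency", {"p99_ms": 480})  # ignored by this subscriber

print(alerts)  # only the auth-failures event is delivered
```

The point of the pattern is that each downstream team sees only the slice of operational data it subscribed to, while the repository itself stays complete.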

We're all about breaking down the silos associated with traditional IT monitoring and completely redefining what operational visibility means for customers. We believe we're onto something unique, with a combination of unmatched scale, advanced analytics, and open accessibility.

Can you share some examples of how companies are using your solution?

One of our customers is a well-known U.S.-based big box retailer that needed to step up their digital transformation strategy in order to compete with the Amazons of the world. To do that, the decision makers realized they needed to collect, monitor, and analyze much more operational data than they had been -- for example, to maintain compliance and security while integrating IT systems with their thousands of suppliers.

IT wanted to make that information readily available to various internal departments so business users could extract value from that data -- for example, understanding and anticipating customer preferences and maximizing the shopping experience of users on the e-commerce site.

The decision to move forward was hastened by a large security breach. A postmortem revealed that inadequate monitoring and disconnected silos were to blame. However, the existing monitoring infrastructure and tools could not technically or cost-effectively scale from the three terabytes a day of operational data the retailer was collecting (with periodic data loss) to the 100 terabytes a day it wanted to collect without data loss. Nor could the existing system extend retention from the current 6 months to the desired minimum of 18 months (up to five years for some data sources).
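The figures above imply a substantial storage footprint. A back-of-the-envelope calculation using the interview's numbers (100 TB/day ingest, 18-month retention) makes the scale concrete; the replication factor and compression ratio below are illustrative assumptions, not figures from the retailer.

```python
# Rough sizing from the interview's targets: 100 TB/day, 18-month retention.
TB_PER_DAY = 100          # target ingest rate (from the interview)
RETENTION_DAYS = 18 * 30  # 18-month minimum retention, ~540 days
REPLICATION = 3           # typical HDFS-style replication (assumption)
COMPRESSION = 0.25        # assume data compresses to ~25% of raw size

raw_tb = TB_PER_DAY * RETENTION_DAYS          # raw data held at any time
stored_tb = raw_tb * COMPRESSION * REPLICATION  # on-disk after compression + copies

print(f"Raw data over retention window: {raw_tb / 1000:.1f} PB")
print(f"On-disk (compressed, replicated): {stored_tb / 1000:.1f} PB")
```

Even under favorable compression, the target works out to tens of petabytes on disk, which is why scale-out systems on industry-standard hardware, rather than the incumbent tools, were the only economical path.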

The company looked at building its own scalable operational data ingest and warehouse platform from open source big data technologies but quickly realized that building it would be a very complicated and expensive task -- and the company wanted IT resources focused on "Big Bang" work instead of plumbing.

With Rocana Ops, this customer is now well on the way to achieving its digital transformation ambitions and is experiencing greater business insight and agility across the organization. All departments, including IT operations, security, and marketing, now have access to full-fidelity current and historical data from a single source of operational data truth. Users' tools no longer restrict what data they can see. Departments can now ask the questions and perform the analysis they want in order to derive maximum benefit for the business.

Rocana Ops helped optimize existing tools and workflows. The retailer uses Rocana Ops to pull curated feeds into existing monitoring tools and reporting apps. That means no disruption to end users, and users can get more business value from those existing tools because they are being fed with more accurate, relevant, and timely data.

Ultimately, Rocana helps make IT a strategic partner in moving the business forward. With IT operational intelligence now at the core of the retailer's new strategy, IT operations has gone from cost center to strategic competitive advantage.
