Operational Intelligence: Checking the Health of Your Data Center
Data is key to your enterprise, so it's important you know everything you can about your data center and monitor your data sources, including machine-generated data.
By Jorge Lopez, Director of Product Marketing, Syncsort
Data has become the lifeblood of businesses. Corporations, governments, and even non-profit organizations depend on accurate, relevant data to remain competitive. The data center is the heart that pumps information to all areas of the organization, but how much do you know about your data center?
When most people think about big data, they think about relatively new data sources such as mobile, social media, and clickstream data. However, big data also means analyzing existing sources that previously went untapped. One of these sources is your data center itself -- the vast amounts of data generated by hundreds (if not thousands) of servers, network devices, databases, applications, operational systems, and mainframes that support all your data needs.
It only makes sense to monitor and analyze all these data sources. Most people wouldn't drive a car without a dashboard. We need to know if we're running out of gas, oil, or coolant; know how fast we're going; or see how many miles we've driven. Yet when it comes to the data center, many organizations are, in fact, driving blind.
We've all heard horror stories: an unauthorized user goes unnoticed for months, an application or server failure takes a system offline for an hour or two, or invalid transactions cost money and cause frustration.
How can you avoid many of these nightmares? The good news is that many big data technologies are well suited to this task. After all, you're collecting, processing, distributing, analyzing, and visualizing data -- it's just a different type of data.
Machine-generated data generally has three key characteristics.
First, machine data arrives as semi-structured files with no consistent format. Each device has its own way of logging data, which makes the task even more challenging. Even if your company has a corporate standard in place, you can expect dozens (if not hundreds) of devices to each log data in their own way.
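To make the formatting problem concrete, here is a minimal Python sketch that maps two log layouts onto one common record. Both layouts, the field names, and the `normalize` helper are hypothetical, invented for illustration; real devices will need their own patterns.

```python
import re
from datetime import datetime

# Two hypothetical log layouts (assumptions, not taken from any real device).
# Layout A: "2014-03-01 12:00:05 web01 disk usage at 91%"
# Layout B: 'host=db02 time=2014-03-01T12:00:07 msg="connection refused"'
PARSERS = [
    (re.compile(r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (?P<host>\S+) (?P<msg>.*)$"),
     "%Y-%m-%d %H:%M:%S"),
    (re.compile(r'^host=(?P<host>\S+) time=(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}) msg="(?P<msg>.*)"$'),
     "%Y-%m-%dT%H:%M:%S"),
]

def normalize(line):
    """Return a common record for a log line, or None if no layout matches."""
    for pattern, time_format in PARSERS:
        match = pattern.match(line)
        if match:
            return {
                "time": datetime.strptime(match["ts"], time_format),
                "host": match["host"],
                "message": match["msg"],
            }
    return None  # unknown layout: keep the raw line elsewhere for later review
```

Once every source is reduced to the same record shape, the downstream analysis no longer has to care which device produced a line.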
Second, this data is sequential. To fully understand it, you need to look at a chain of events. For example, an unauthorized access could be preceded by three unsuccessful login attempts from the same location.
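The login example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the event tuples, the `"fail"`/`"success"` labels, and the `flag_suspicious_logins` helper are hypothetical, and the events are assumed to arrive already sorted by time.

```python
from collections import defaultdict

def flag_suspicious_logins(events, threshold=3):
    """Flag successful logins that directly follow `threshold` or more
    consecutive failures from the same source.

    `events` is an iterable of (timestamp, source, outcome) tuples,
    already sorted by timestamp; outcome is "fail" or "success".
    """
    consecutive_failures = defaultdict(int)
    flagged = []
    for timestamp, source, outcome in events:
        if outcome == "fail":
            consecutive_failures[source] += 1
        elif outcome == "success":
            if consecutive_failures[source] >= threshold:
                flagged.append((timestamp, source))
            # A success ends the current run of failures for this source.
            consecutive_failures[source] = 0
    return flagged
```

The point is that no single line is suspicious on its own; the signal only appears when the events are read in order, per source.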
Third, volumes are massive. With hundreds of servers, dozens of applications, and thousands of transactions, volumes can reach terabytes of data per day.
These three factors alone make things more complicated than your traditional business intelligence project, so it's important to keep them in mind when embarking on an operational intelligence initiative. Luckily, you aren't limited to traditional tools or approaches. For starters, the sheer volume and semi-structured nature of machine data make Hadoop an ideal platform, not just for storing the data but for transforming and preparing it as well. High-performance sorting tools can help you increase compression ratios by up to 10x by sorting data before you compress it, which reduces storage costs even further.
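The sort-before-compress idea can be tried on a small scale with Python's standard `zlib` module. This is a toy sketch, not a benchmark: the sample log lines, host names, and helper functions are all made up, and the gain you see on real data depends on its content and on the compressor used.

```python
import random
import zlib

def compressed_size(lines):
    """Size in bytes of the zlib-compressed, newline-joined lines."""
    return len(zlib.compress("\n".join(lines).encode("utf-8")))

def make_sample_lines(count=20000, seed=42):
    """Generate synthetic, repetitive log lines in random order (hypothetical data)."""
    rng = random.Random(seed)
    hosts = [f"host{n:02d}" for n in range(50)]
    messages = ["GET /index.html 200", "GET /login 302",
                "POST /order 500", "disk usage warning"]
    return [f"{rng.choice(hosts)} {rng.choice(messages)}" for _ in range(count)]

if __name__ == "__main__":
    lines = make_sample_lines()
    print("unsorted:", compressed_size(lines), "bytes")
    print("sorted:  ", compressed_size(sorted(lines)), "bytes")
```

Sorting places similar lines next to each other, which gives the compressor longer repeated runs to work with. Real log lines carry timestamps, so in practice the sort key has to be chosen so that the original order can still be recovered.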
Building your own solution is definitely not for everyone. We could write an entire book of tips, best practices, and pitfalls to avoid when building a solution to monitor and analyze the operational health of your data center. That's probably one of the key reasons these projects get put on the back burner so often. Even so, there's no reason to put them off.
If a do-it-yourself approach is not for you, there are a few pre-packaged solutions that will spare you the details, allowing you to invest your time in insights that will drive your business. When choosing the right approach, remember that the most important thing is to get real-time, operational insights with a 360-degree view of your organization's IT infrastructure. In most organizations, data flows freely across a whole range of systems -- from Web servers and social media to mainframes, data warehouses, and even Hadoop. It's important to make sure "no system's left behind." After all, a chain is only as strong as its weakest link.
Jorge A. Lopez is the director of product marketing at Syncsort. You can contact the author at firstname.lastname@example.org or follow him on Google+.