Business Intelligence in the Cloud: Getting Started

By Andrew Lampitt, Director of Business Development, Jaspersoft

For data warehouse and business intelligence architects seeking to deploy more effective BI solutions at lower cost, cloud-based BI can help, especially for temporary, unpredictable, prototype, and proof-of-concept projects. Implemented correctly, cloud-based BI eliminates the need to fund or find new infrastructure or consume scarce data warehouse resources. For successful BI in the cloud, a number of preferred practices are emerging:

First, stack the deck. Every BI solution, cloud-based or not, requires a number of components: reporting and/or analytics, ETL, database, and the tools to manage these components as well as the underlying computing infrastructure. Choose each of these elements carefully.

Pick your projects. To get started in the cloud, choose an application that requires a data footprint of less than 1 TB (you can scale out later if your application catches on). Consider BI projects shelved due to resource constraints—you may identify that gem of a project with the potential to make a big business impact. Making a business case may become a non-issue, as your potential cost may not hit the bureaucracy threshold. In any case, you can accurately predict and control costs, which will be low, and your project can be retired as easily and affordably as it can be started.
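To make the cost-prediction point concrete, here is a minimal sketch of the arithmetic behind a pay-as-you-go estimate. The function name, its parameters, and every rate in the usage lines are hypothetical placeholders chosen for illustration; they are not prices from any real cloud provider, so substitute your provider's published rates.

```python
def estimate_monthly_cost(nodes: int,
                          hours_per_node: float,
                          hourly_rate: float,
                          storage_gb: float,
                          storage_rate_per_gb: float) -> float:
    """Return compute cost plus storage cost for one month.

    All rates are supplied by the caller; nothing here is a real price.
    """
    compute_cost = nodes * hours_per_node * hourly_rate
    storage_cost = storage_gb * storage_rate_per_gb
    return compute_cost + storage_cost


# Placeholder inputs (assumptions): a two-node pilot running 160 hours
# each, holding 500 GB of data, well under the 1 TB starting guideline.
pilot_cost = estimate_monthly_cost(
    nodes=2,
    hours_per_node=160,
    hourly_rate=0.50,          # hypothetical rate per node-hour
    storage_gb=500,
    storage_rate_per_gb=0.10,  # hypothetical rate per GB-month
)
print(f"Estimated monthly cost: {pilot_cost:.2f}")
```

Because the only inputs are usage hours and data volume, retiring the project simply drives both terms to zero.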

Insist on open source BI. Aside from architectural concerns and high costs, proprietary solutions can also lock you in—just when you’re trying to become more flexible. Premium, “commercial open source” versions will give you the support, stability, and extra features you may need while remaining highly affordable. You can also use a free trial version to build a complete solution before you ever spend a penny on BI.

Look for a full range of BI functionality: analytics, reporting and report serving, ad hoc query, and in-memory as well as disk-based analytic power. Your BI solution should be easy to administer and use, with a rich but easy-to-learn browser interface. It must be based on a modern architecture that clusters easily for high scalability and works seamlessly in virtualized environments.

Stay with open source for ETL. To minimize demands on busy administrators, use an ETL solution offering a full graphical environment for handling data transformation and integration tasks; scripting is too much trouble for do-it-quickly projects. Your ETL technology should scale well, leverage commodity hardware, play well in virtual environments, and offer a no-cost trial version.

Choose your database carefully. Depending upon your database size and query complexity, a traditional RDBMS may be suitable, or you may prefer a high-performance analytic engine that can ensure the query speed users require. Choose a column-oriented solution so you're not wasting overhead retrieving entire rows; a massively parallel processing (MPP) architecture is a must for scalability and performance. Efficient, high-compression technology is also essential; ideally your choice will perform data operations on compressed data. Administration should be browser-based, and the architecture should be a modern one that scales linearly.
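The row-versus-column point can be illustrated with a small sketch. It is a toy in-memory model, not how any particular analytic database is implemented; the table, field names, and helper functions are all hypothetical.

```python
from typing import Dict, List

# Hypothetical sample data: each dict is one row of a small sales table.
rows: List[Dict[str, object]] = [
    {"order_id": 1, "region": "EMEA", "amount": 120.0},
    {"order_id": 2, "region": "APAC", "amount": 75.5},
    {"order_id": 3, "region": "EMEA", "amount": 210.0},
]


def to_columns(records: List[Dict[str, object]]) -> Dict[str, list]:
    """Pivot a list of row dicts into one list per column."""
    columns: Dict[str, list] = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


def total_from_rows(records: List[Dict[str, object]], field: str) -> float:
    """Row layout: every whole record is visited to read one field."""
    return sum(record[field] for record in records)


def total_from_columns(columns: Dict[str, list], field: str) -> float:
    """Column layout: only the requested column is read."""
    return sum(columns[field])


columns = to_columns(rows)
row_total = total_from_rows(rows, "amount")
column_total = total_from_columns(columns, "amount")
print(row_total, column_total)
```

Both functions return the same total, but the column version never touches `order_id` or `region`. On disk, that difference is what lets a column store skip data an aggregate query does not need, and storing similar values together is also what makes compression effective.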

Keep management options open. Use a cloud management solution to provide for automatic scaling, monitoring, and notifications. Ideally, the solution will include preconfigured templates for push-button provisioning and launching the database, BI, and ETL components of your solution. Also, it should help you manage in and across multiple public and private cloud environments, as well as your on-premises environment.
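As a rough illustration of what automatic scaling with notifications involves, the sketch below shows a simple threshold rule. It does not reflect any specific cloud management product; the function names, thresholds, and node limits are assumptions chosen for the example.

```python
from typing import Optional


def desired_node_count(current_nodes: int,
                       cpu_utilization: float,
                       *,
                       scale_up_at: float = 0.80,
                       scale_down_at: float = 0.30,
                       min_nodes: int = 1,
                       max_nodes: int = 8) -> int:
    """Return the node count a threshold-based rule would ask for.

    cpu_utilization is a fraction between 0.0 and 1.0. All thresholds
    and limits are hypothetical defaults, not recommended values.
    """
    if not 0.0 <= cpu_utilization <= 1.0:
        raise ValueError("cpu_utilization must be between 0.0 and 1.0")
    if cpu_utilization >= scale_up_at:
        target = current_nodes + 1
    elif cpu_utilization <= scale_down_at:
        target = current_nodes - 1
    else:
        target = current_nodes
    # Clamp to the allowed range so the rule cannot run away.
    return max(min_nodes, min(max_nodes, target))


def scaling_notification(current_nodes: int, target_nodes: int) -> Optional[str]:
    """Return a message for administrators, or None if nothing changes."""
    if target_nodes == current_nodes:
        return None
    direction = "up" if target_nodes > current_nodes else "down"
    return f"Scaling {direction}: {current_nodes} -> {target_nodes} nodes"


target = desired_node_count(current_nodes=2, cpu_utilization=0.90)
message = scaling_notification(2, target)
if message is not None:
    print(message)
```

A real management layer would evaluate a rule like this on a schedule against monitored metrics and then provision or release instances from its preconfigured templates.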

Make it easy on yourself. Try to find all four components of your cloud-based BI stack preloaded and ready to use on one of the mainstream public clouds. This way, you can skip most of the administrative work and quickly get on with your proof of concept, prototype, or one-off analysis effort.

