LESSON - Maintaining Mixed-Workload Performance While Loading Data
By Jesse M. Fountain, Vice President, Pre-Sales Services, DATAllegro
As data warehouses become increasingly mainstream and mission critical, data warehouse administrators find themselves plagued with the following dilemma: How to keep data refreshed in a near-real-time manner without degrading overall query performance.
Historically, the cardinal rule of data warehouses has been to avoid updating the warehouse during traditional business hours. In recent years, thanks in part to the Internet marketplace, business hours are no longer 9-to-5 but run around the clock. Gone is the quiet window wherein batch data inserts/updates/deletes can take place.
The most popular workaround has beento maintain two separate instances of thedatabase using a “ping-pong” strategy, withone instance being devoted to queries whilethe other accepts data loads. Periodically,say every hour, the load database becomesthe query database and what was the querydatabase becomes the load database. Thisstrategy is not only expensive—having tokeep two instances of the database—butalso requires unnecessary ETL complexity.
In a recent proof of concept (POC), a large manufacturing company that uses Teradata required participating vendors to demonstrate their ability to allow simultaneous data loads without affecting concurrently running queries. On a single-rack system, DATAllegro set up the environment to run 200 concurrent users while injecting 50 million rows into the same table being queried. The stand-alone load job took approximately 26 seconds. However, in a concurrent scenario, the job ran in under 60 seconds with almost no detectable (< 10%) degradation to the performance of concurrent queries. At the conclusion of the POC, DATAllegro was the only vendor that fulfilled the simultaneous data load requirement to the manufacturing company’s satisfaction.
DATAllegro was the only vendor that fulfilled the simultaneous data load requirement.
The secret to DATAllegro’s capability lies in the manner in which we load data. First, our high-speed bulkloader capitalizes on speedy MPP architecture to move data onto the appliance in the form of a temporary table at blindingly fast speeds approaching 1.2TB per hour. Next, a standard transaction-safe “select into” is invoked to insert/update the data into the permanent table. DATAllegro’s high-speed FASTINSERT facility controls table locks in a manner that does not interfere with pending queries. This entire operation is handled automatically by the DATAllegro Autoloader and is completely failsafe.
Now, organizations with real-time and near-real-time data requirements can rest easy that their end users will not suffer poor query performance while enjoying the business benefits of up-to-the-minute data refreshes.