TDWI | Training & Research | Business Intelligence, Analytics, Big Data, Data Warehousing

Think
- Research & Resources
  - TDWI Playbook | Next Generation Data Science: The AI-Driven Data Science Life Cycle
  - TDWI Data Points | The Data Foundation for AI
  - TDWI Best Practices Report | Data Strategies and Foundations for Modern Data Management
  - TDWI Insight Accelerator | Adopting a Platform Approach for Gaining Insights from Unstructured Data
- Webinars
  - Expert Panel: What's Next in Data Integration: Powering the AI-Driven Enterprise August 25, 2025
  - Expert Panel: Improving Data Quality, Accuracy, and Consistency August 27, 2025
  - Architecting a Modern Martech Stack for Speed, Scale, and AI Readiness August 28, 2025
  - The State of Self-Service Analytics: Results from TDWI’s Latest Research September 8, 2025
- Virtual Summits
  - Virtual Events Keys to Making Your Data AI Ready September 10, 2025
  - Virtual Events Data Quality for BI, Analytics and AI October 22, 2025
  - Virtual Events Modern Data Strategy November 12, 2025
  - Virtual Events What’s Ahead in 2026 for Data & Analytics December 10, 2025
- By Topic
  - By Topic
    
    Explore the Latest AI, Analytics, and Data Research and Training by Topic
  - BI, Analytics, and Data Literacy
  - AI, Data Science, and Machine Learning
  - Data Management and Governance
  - Platforms and Architecture
  - Strategy and Methods
- Speaking of Data Podcast
  
  Current Research Surveys
Train
- In-Person Events
  - Conference TDWI Transform 2025 San Diego August 18, 2025
  - Executive Summit TDWI Modern Data Leader's Summit San Diego: AI in the Enterprise August 18, 2025
  - Conference TDWI Transform 2025 Orlando November 16, 2025
  - Executive Summit TDWI Data & AI Leaders Summit Orlando: Governing Data, Analytics, and AI November 17, 2025
- Virtual Live Seminars
  - Data Governance Week July 30, 2025
  - Platforms & Architecture Week July 30, 2025
  - AI Bootcamp Week July 30, 2025
- Online Learning
- By Topic
  - By Topic
    
    Explore the Latest AI, Analytics, and Data Research and Training by Topic
  - BI, Analytics, and Data Literacy
  - AI, Data Science, and Machine Learning
  - Data Management and Governance
  - Platforms and Architecture
  - Strategy and Methods
- Train Your TeamCustom solutions for training your team
  
  Get CertifiedEarn a professional credential in BI and Analytics, Data Governance, or AI
  
  TDWI MembershipExclusive access to the research, tools, training, and connections
Engage
- Connect
  - Connect and Contribute to Our Vibrant Community of Data Leaders
    
    Subscribe to TDWI Stay up to date on the latest news and events. Sign Up
    
    Become a TDWI Member Gain exclusive access to the research, tools, training, and connections to move your careers, teams, and projects forward. Learn More
    
    Become a Part of the TDWI Research Panel Make a difference in the data and analytics industry and earn incentives by sharing your insights with TDWI. Explore Now
    
    Speak at TDWI Events Share your expertise and build your personal brand as a speaker at a TDWI In-Person or Virtual Event. Submit a Proposal
    
    Become a TDWI Research Fellow Apply to be a member of TDWI’s industry leading research team. Apply Today
    
    Become a Member of the Data & AI Leaders Forum Engage in collaborative discussions, stay ahead of the curve, and stay in the know. Apply Now
    
    Showcase Your Data & AI Solutions Reach and engage with TDWI community through multi-channel marketing programs. Learn More

RESEARCH & RESOURCES

LESSON - Improve Your Analysis Capabilities

April 26, 2005

By Craig Abramson, Senior Technical Analyst, Syncsort Incorporated

Today’s businesses are accumulating ever-increasing volumes of data from a variety of different sources at record paces. For example, a successful Web site alone can generate over a billion clickstream records a day. Now add to this the data generated daily from high-volume transaction processing, coupled with detailed historical transactions already stored in a data warehouse. Then there’s the inventory and billing data, detailed financial information, customer data, and much more. While the growth of the data results in longer processing time, increased hardware costs, and more administrative requirements, demands on the data have also been increasing. This lesson describes two ways in which you can prepare your data in order to meet these demands: by performing data transformations and utilizing metadata.

Transform Your Data for Faster Queries

Transforming your data makes it easier and more efficient for you to directly access just the information that you need, minimizing the elapsed time of your queries. There are three different levels in which the data transformation processes can take place in order to reduce the elapsed times of your applications and speed the analysis of information:

Source level operations. At this level, you can convert database tables to flat files and vice versa.

Record level operations. This would encompass such processes as joins, sorts, merges, or just copying records to the appropriate target(s). Before outputting the records, they can then be filtered or reformatted for faster access. You can also perform aggregations at this level. Aggregates are one of the best ways to speed warehouse queries. A query answered from base-level data can take hours and involve millions of data records and millions of calculations. With precalculated aggregates, the same query can be answered in seconds with just a few records and calculations.

Field level operations. This includes data type and format conversions, arithmetic operations, string operations, date-time operations, pattern matching, and conditional operations.

These kinds of transformations allow you to cleanse data, create business rules for data quality, and concentrate on only the information you need. The data can then be loaded into a data warehouse or other data-intensive application for analysis. Additional processing may have to be completed in order to move specific data to the appropriate data mart.

Utilize Metadata to Quickly Find the Information You Need

Another way to minimize the elapsed times of your queries is to use metadata. Metadata describes what information is contained in your data. It details the context, structure, and location of the data, making it much more manageable. This may not sound like an important feature when you don’t have a large amount of data… but when the gigabytes start adding up, metadata helps you quickly find and utilize the data you need for your specific applications. To highlight the benefit that metadata provides in this type of environment, imagine that you’re in a large warehouse that’s filled with wooden crates. None of the crates have labels. Now imagine that your boss asks you to find and collect all the crates that contain parts relating to the production of widgets. This would be a daunting and time-consuming task for anyone. But what if you approached the same task with crates that are all clearly labeled? You can imagine how much easier the labels would make your job and how much faster you could get it done. That’s one of the advantages that metadata provides.

Metadata is also critical for data integration, which is the process of accumulating and combining data sets from disparate sources in various locations. It helps you identify the data sets so that, once again, you are only collecting the information you need. Data integration then incorporates the ETL process—the sequence of applications that extract data from various sources, bring them to a data staging area, and apply a sequence of processes to prepare the data for migration into the data warehouse and the actual loading process. After the data is extracted, a number of transformations may be applied in preparation for data consolidation and subsequently loaded into data marts, data warehouses, or dimensional data structures used for decision support or business intelligence systems. Metadata then provides the contextual information you need to effectively analyze the data.

Metadata can be located either within the data warehouse or scattered throughout a network. Source metadata is located within the data warehouse and includes all information about the source data. It can provide such details as:

Record layouts. Each record from the source is made up of a series of fields that contain information on a particular item. The pattern of fields, together with the data types and lengths of the individual fields, is termed the record layout.
Business rules. The metadata layer of the data warehouse enforces information consistency by allowing data to be defined in business terms, as opposed to using the database jargon. Rules that specify how business terms are determined or calculated, i.e., the business rules, are also defined within the metadata layer.
Derived data. The transformation definitions derived from the contents of a source field in the records and/or from constants are referred to as derived data or derived values. Derived values provide a mechanism to simplify and reuse transformations defined on the source data.
Customized collating sequences. To order character values according to a sequence other than one of the standard collating sequences, you have to define the rules that detail how character values are to be ordered. These rules describe a special collating sequence that defines the order in which single characters, and possibly double characters, collate.

Metadata makes data more manageable.

Speed Your Backups with Metadata

The use of metadata is also emerging in the area of backup and recovery strategies. For instance, on NetWare versions 6.5 and higher, you can back up data using a pool snapshot, which is a point-in-time metadata copy of a data pool. In addition to increasing the speed of backup jobs, pool snapshots can eliminate job disruptions caused by failed attempts to back up open files. Once a snapshot is taken, the original data pool is immediately available to users, instead of being unavailable for the length of the backup job. Scripts can be run before and after the snapshot to change the state of a database, gather information at the time of a snapshot or job, or perform other manual tasks.

Conclusion

In order to minimize the elapsed times of your applications and speed queries, you’ll need the right software to perform the necessary data transformations, process metadata, and quickly back up your critical information using snapshot technology. It’s important to test the various tools in your environment and take advantage of any proof-of-concept offers that are available. This will ensure that you make the right choice and reach the full potential that your data has to offer. The performance advantages that you’ll gain with the right tools will also give you more time to run additional applications and leverage your existing hardware investment.

TDWI Membership

Get immediate access to training discounts, video library, research, and more.

Find the right level of Membership for you.

Learn More

↑

Research & Resources

Webinars

Virtual Summits

By Topic

In-Person Events

Virtual Live Seminars

Online Learning

By Topic

Connect and Contribute to Our Vibrant Community of Data Leaders

RESEARCH & RESOURCES

LESSON - Improve Your Analysis Capabilities

TDWI Membership

Get immediate access to training discounts, video library, research, and more.

TDWI

Engage

Research