TDWI Upside - Where Data Means Business

Event Data: The Root of All Analytics

What is event data, and what makes it unique and valuable for analytics?

Analytics is reshaping computing, businesses, and many of our day-to-day activities. The fuel that drives the analytics engine is all around us, in everything we do. It's data, of course, but more precisely, it's event data.

Events are happening everywhere, all the time -- in apps, cars, appliances, servers, and even in our brains. With more devices connecting to the Internet, it's becoming easier to collect data from just about anywhere.

Let's take a closer look at event data and what makes it unique and valuable for your analytics. The easiest way to understand event data is by comparing it to another type of data. I've chosen entity data because it's familiar from use in databases and spreadsheets.

Entity Data

Entity data is stored in tables and can be associated with such elements as users, products, and accounts. Typically, a separate table is assigned to each type of entity, with columns that contain related properties. This allows a user to quickly look up information about any entity.

Also, in entity databases, data is normalized and rarely duplicated. For example, a table for accounts might contain attributes such as account name, type, and category. Because multiple users can be associated with the same account, user information wouldn't typically be stored in the accounts table. Instead, a key in each user record would link to its account.

A major drawback to this data model is that in order to analyze entities (for example, to sort employees by department name), you must pull in data from multiple tables. At large scale, these operations take time.
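To make the drawback concrete, here is a minimal sketch of the employees-by-department example using Python's built-in sqlite3 module. The table names, column names, and sample rows are assumptions made up for illustration; the point is only that sorting employees by department name requires a join across two tables.

```python
import sqlite3

# Hypothetical normalized schema: one table per entity type.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE departments (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE employees (
        id INTEGER PRIMARY KEY,
        name TEXT,
        department_id INTEGER REFERENCES departments(id)
    );
    INSERT INTO departments VALUES (1, 'Sales'), (2, 'Engineering');
    INSERT INTO employees VALUES (1, 'Ana', 1), (2, 'Bo', 2), (3, 'Cy', 1);
""")

# The department name lives in another table, so the sort needs a join.
rows = conn.execute("""
    SELECT e.name, d.name
    FROM employees AS e
    JOIN departments AS d ON d.id = e.department_id
    ORDER BY d.name, e.name
""").fetchall()

for employee, department in rows:
    print(department, employee)
```

With three rows the join is instant; the cost shows up when the tables hold millions of records and many such joins are needed per question.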

Event Data

Event data doesn't just describe entities; it describes actions performed by entities (for example, "Publish a blog post"). Event data, sometimes called behavior data, contains three key pieces of information:

  1. Action
  2. Timestamp
  3. State

The action is the thing that's happening (e.g., "publish"). The timestamp is self-explanatory. The state refers to all of the other relevant information we know about this event, including information about entities related to the event, such as the author and content management system associated with the blog post.
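As a rough sketch, the blog post event might be represented like this in Python. The helper name make_event, the field names, and the author and CMS values are all assumptions chosen for illustration, not a format prescribed by any particular event database.

```python
from datetime import datetime, timezone

def make_event(action, state, timestamp=None):
    """Build an event from its three parts: action, timestamp, and state."""
    if timestamp is None:
        timestamp = datetime.now(timezone.utc)
    return {
        "action": action,
        "timestamp": timestamp.isoformat(),
        "state": dict(state),  # everything else we know about the event
    }

# Hypothetical "publish" event with state describing the related entities.
event = make_event(
    "publish",
    {
        "author": {"name": "Example Author"},
        "cms": {"name": "ExampleCMS"},
    },
    timestamp=datetime(2016, 1, 1, 12, 0, tzinfo=timezone.utc),
)
print(event["action"], event["timestamp"])
```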

Let's consider a more complex event: recording every player's "death" in an online video game. Typically, there are many ways the player can experience "death," such as falling from great heights, starvation, drowning, stumbling into lava, or being killed by a zombie.

To analyze the most common type of death, the age of the player at the time of "death," length of time played at the time of "death," the most lethal enemies, or any number of "death"-related questions, we can use a simple event data model with a few specific qualities:

  • The data is rich
  • The data is denormalized
  • The data is nested
  • The data is schemaless

Event Data is Rich

Events can have hundreds of properties; they seek to describe not just one entity but all of the entities involved in an action. In the above example, we can add even more data, such as location of the death, game settings, and software version -- just to name a few.

Event Data is Denormalized

Unlike in a relational database, the same data is repeated again and again in an event database. User attributes, app version, or difficulty settings might be repeated on every single event even if they rarely change. This redundancy is necessary to capture a representation of the application state at the time of the event. In entity databases, when properties (e.g., player settings) are updated, the previous values are typically overwritten and lost, but event databases capture entity data as it was at a point in time. To be clear, event databases are a great companion, not a replacement, for entity databases.
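The point-in-time idea can be sketched in a few lines of Python. The player record, the record_event helper, and the timestamps below are hypothetical; the sketch simply copies the entity's current properties onto each event, so a later update to the entity does not rewrite history.

```python
import copy

# Hypothetical entity record: holds only the current values.
player = {"id": 7, "difficulty": "easy", "app_version": "1.0"}
events = []

def record_event(action, entity, timestamp):
    """Store a full copy of the entity's properties on the event."""
    events.append({
        "action": action,
        "timestamp": timestamp,
        "player": copy.deepcopy(entity),
    })

record_event("death", player, "2016-01-01T12:00:00Z")
player["difficulty"] = "hard"  # the entity record forgets "easy" here
record_event("death", player, "2016-01-02T09:30:00Z")

# Each event still shows the setting that was in effect when it happened.
difficulties = [e["player"]["difficulty"] for e in events]
print(difficulties)
```

The price is storage: the player's properties are written once per event rather than once per player.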

Event Data is Nested

Event data describes multiple entities, each with its own properties; most databases optimized for event data can store it using nested JSON. This is particularly helpful when data sets have many properties and entities to describe.
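Here is an illustrative nested "death" event, round-tripped through JSON with Python's standard json module. The property names and values are assumptions, and get_path is a small hypothetical helper for reading a nested property by a dotted path.

```python
import json

# Hypothetical event: each related entity gets its own nested object.
event = {
    "action": "death",
    "timestamp": "2016-01-01T12:00:00Z",
    "player": {"id": 7, "age": 24, "minutes_played": 95},
    "enemy": {"type": "zombie", "level": 3},
    "game": {"version": "1.0", "settings": {"difficulty": "hard"}},
}

def get_path(obj, path, default=None):
    """Follow a dotted path such as 'game.settings.difficulty'."""
    for key in path.split("."):
        if not isinstance(obj, dict) or key not in obj:
            return default
        obj = obj[key]
    return obj

encoded = json.dumps(event)    # what would be sent to the event store
decoded = json.loads(encoded)  # what comes back when we query it
difficulty = get_path(decoded, "game.settings.difficulty")
print(difficulty)
```

Nesting keeps the properties of each entity grouped together, which is easier to read than a flat record with dozens of prefixed column names.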

Event Data is Schemaless

As mentioned earlier, event data can capture state at the time of an event, and the relevant state differs from one event to the next. Starvation, drowning, or lava deaths, which don't involve an "enemy," might have their own unique properties, such as "lava temperature." In other words, the death events don't follow a strict schema. Event databases are designed to handle a multitude of arbitrary properties.
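A small sketch shows how analysis can still work when events don't share a schema. The sample events, property names, and values (including the lava temperature) are invented for illustration. Every event carries a "cause", but only some carry an "enemy" or an "environment".

```python
import json
from collections import Counter

# Hypothetical death events, one JSON document per line, with varying properties.
raw_events = [
    '{"action": "death", "cause": "zombie", "enemy": {"type": "zombie", "level": 3}}',
    '{"action": "death", "cause": "lava", "environment": {"lava_temperature": 1200}}',
    '{"action": "death", "cause": "zombie", "enemy": {"type": "zombie", "level": 5}}',
    '{"action": "death", "cause": "drowning"}',
]
events = [json.loads(line) for line in raw_events]

# Most common type of death: uses only the property every event shares.
cause_counts = Counter(e["cause"] for e in events)

# Enemy-specific question: skip events that have no "enemy" property.
enemy_levels = [e["enemy"]["level"] for e in events if "enemy" in e]

print(cause_counts.most_common(1))
print(enemy_levels)
```

Queries over schemaless data need to tolerate missing properties, which is why the second question filters on the presence of "enemy" before reading it.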

Event Data at Scale

An online game can have millions of users, and for every user there are many actions. Because event data captures both state information and the history of actions that happen over time, its scale is massive compared to entity data. Fortunately, data storage is now affordable enough to support event databases.

Although entity data will always be a valuable asset for data science, without event data we wouldn't be able to perform analytics as we know it today.

 

About the Author

Michelle Wetzler is chief data scientist at Keen IO, which offers products that enable businesses to add analytics and data science features directly into their applications. She previously developed advanced IT architectures for Fortune 500 enterprises as a consultant with Accenture and has also taught imaging technology at the University of Illinois. You can contact the author on Twitter at @michellewetzler.

