How Data Modeling for NoSQL Improves Agile Development
Many don't fully appreciate how critical data modeling is to understanding data. Modeling doesn't delay agile application development; it helps developers plan ahead.
- By Pascal Desmarets
- October 3, 2017
As we know, data is a corporate asset, and it drives decision making at many businesses. What many don't know, or at least don't fully appreciate, is that data modeling is critical to understanding data, its interrelationships, and its rules.
Far too many people don't see the value that data modeling provides. Some perceive it as just documentation, as a bottleneck obstructing agile development, or even as too expensive to be worth it.
This view of data modeling did not originate with the current generation of programmers, nor is it unique to NoSQL databases or agile development. There have always been, and there always will be, programmers who prefer to jump in and code first, without clear requirements, a formal design, or a data model for their database.
Finding the Right Balance
It should go without saying that agile development and the Scrum methodology have proven their value. Agile can help with the development of projects of all sizes: this is agility at scale. The game changer is a culture of continuously delivering value-adding software to users instead of keeping "inventory on the shelves." This is also summed up as "code is only happy in production."
It's not uncommon for database developers to claim that agile is not possible for the database part of a project and that databases must be approached in a waterfall way. In a complex project, data modeling and database development can then become a bottleneck, given their interdependencies with future sprints.
Granted, traditional relational databases do not make iterative schema evolution easy or fast, whereas NoSQL document databases make it much easier. Beyond the benefits of scaling out infrastructure, the true revolution of NoSQL is its dynamic schema capability. In particular, JSON documents and denormalization are perfectly aligned with development agility. They let teams split database evolution into increments that match the length of a development sprint.
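To make the sprint-by-sprint idea concrete, here is a minimal Python sketch in which plain dictionaries stand in for stored documents (no database driver is assumed). Documents written in two different sprints coexist in the same hypothetical collection, and the application normalizes older ones when it reads them, one common way to take advantage of a dynamic schema. The `schema_version` field, the field names, and the `upgrade` helper are illustrative assumptions, not a feature of any particular database.

```python
from typing import Any

# Hypothetical "customer" documents that might sit side by side in one
# collection. Sprint 1 stored a flat "name" string; Sprint 2 split it into a
# nested object and added an optional "email". No bulk migration was run
# between sprints, so both shapes coexist.
sprint1_doc: dict[str, Any] = {"_id": 1, "schema_version": 1, "name": "Ada Lovelace"}
sprint2_doc: dict[str, Any] = {
    "_id": 2,
    "schema_version": 2,
    "name": {"first": "Grace", "last": "Hopper"},
    "email": "grace@example.com",
}


def upgrade(doc: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of the document in the latest (version 2) shape.

    Older documents are upgraded in memory when read ("lazy migration");
    the stored copy could be rewritten later, e.g. the next time it is saved.
    The input dictionary is never modified.
    """
    if doc.get("schema_version", 1) == 1:
        # Splitting on the first space is a simplification for illustration.
        first, _, last = doc["name"].partition(" ")
        doc = {**doc, "schema_version": 2, "name": {"first": first, "last": last}}
    # "email" was introduced in version 2; default it when absent.
    return {"email": None, **doc}


for d in (sprint1_doc, sprint2_doc):
    print(upgrade(d)["name"])
```

The design choice here is to tolerate old shapes at read time rather than block a sprint on a migration of every stored document; a team could still run a background rewrite later.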
In a waterfall environment, data modeling may have been imposed by IT architecture and governance and performed by dedicated resources. In an agile environment, however, members of a self-organized team wear many hats. Data modeling may need to become a team effort, a role that each developer can take on part time during analysis, design, and sprint planning. Data modeling becomes an exercise in thinking through what-if scenarios: a kind of draft and simulation before diving into code, which helps you avoid costly rework.
Not an Exact Science
The JSON-based, dynamic-schema nature of NoSQL is an opportunity for application developers: they can start storing and accessing data flexibly with minimal effort and setup, and then evolve the structure quickly and easily. Although flexibility brings power, it also brings dangers for designers and developers less experienced with NoSQL, because there is no single correct way to model a document.
Stored documents need to be designed and modeled based on how the data will be accessed and updated. It is permissible to denormalize and repeat data, as long as integrity and coherence are maintained. Unless there is a good reason not to, you should combine (pre-join) related data when writing it, because doing so substantially improves read performance, which is when performance matters most to users. At the same time, avoid designing documents that grow without bound: you don't want to read and write a large amount of data every time a small part is updated.
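The following Python sketch illustrates two of these guidelines, again using plain dictionaries in place of stored documents. An order is written as a single document that embeds its line items and a copy of selected customer fields, so displaying it needs one read. Product reviews, which could accumulate indefinitely, are split into capped "bucket" documents instead of one ever-growing array. The cap, the field names, and the bucketing scheme are hypothetical illustrations, not a rule prescribed by any particular vendor.

```python
from typing import Any

MAX_REVIEWS_PER_BUCKET = 3  # hypothetical cap; a real value depends on the workload


def order_document(
    order_id: int, customer: dict[str, Any], items: list[dict[str, Any]]
) -> dict[str, Any]:
    """Build one self-contained order document (data "joined" at write time).

    Only the customer fields needed to display an order are copied in. The
    copy is a snapshot: keeping it coherent if the customer record changes
    is the application's responsibility.
    """
    return {
        "_id": order_id,
        "customer": {
            "id": customer["id"],
            "name": customer["name"],
            "city": customer["city"],
        },
        "items": items,
        "total": round(sum(i["qty"] * i["unit_price"] for i in items), 2),
    }


def add_review(
    buckets: list[dict[str, Any]], product_id: str, review: dict[str, Any]
) -> None:
    """Append a review without letting any single document grow forever.

    `buckets` stands in for the documents of a hypothetical reviews
    collection. When the newest bucket is full, a new one is started, so
    each update touches one small document instead of a huge one.
    """
    if not buckets or len(buckets[-1]["reviews"]) >= MAX_REVIEWS_PER_BUCKET:
        buckets.append({"product_id": product_id, "bucket": len(buckets), "reviews": []})
    buckets[-1]["reviews"].append(review)


# Usage
customer = {"id": 7, "name": "Ada", "city": "London", "loyalty_points": 120}
order = order_document(
    1001,
    customer,
    [{"sku": "A1", "qty": 2, "unit_price": 9.99}, {"sku": "B2", "qty": 1, "unit_price": 5.0}],
)

review_buckets: list[dict[str, Any]] = []
for n in range(7):
    add_review(review_buckets, "p1", {"review_no": n, "stars": 5})
```

Whether to embed or bucket is exactly the kind of access-pattern decision the paragraph describes: embedding favors reads of the whole order, while bucketing keeps writes small for data that keeps arriving.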
To make matters more interesting, each NoSQL document database adopts a different storage strategy, even though most of them use JSON. Each vendor also prescribes its own approach to defining and using primary keys, as well as its own sharding strategy.
The era of ingesting big data without caring how it is stored is nearly over, because enterprises now realize the importance of making that data actually usable. In many ways, data modeling is more important for NoSQL than for relational databases, where normalization rules both constrained and guided the design.
All these factors create a steep learning curve and are sometimes an unnecessary barrier to NoSQL adoption. However, these challenges should not scare people away from NoSQL; rather, they are a reminder of how powerful and flexible the technology is. Emerging data modeling tools designed specifically for NoSQL databases can now help with the process, reducing risks while letting users take advantage of all the benefits of the technology.
Benefits of Data Modeling for NoSQL
In the end, a data model is not just documentation: it can be forward-engineered into a physical database. A data model describes the business; it is the blueprint of the application. Such a map helps you evaluate design options before jumping into a project, guides your thinking about the implications of different alternatives, and surfaces potential hurdles before you commit significant development effort.
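As a rough illustration of forward engineering, the Python sketch below turns a small, hypothetical in-code model of a customer document into a JSON Schema object. Several document databases can enforce a JSON Schema-style validator, but the exact dialect and how it is applied vary by vendor (MongoDB, for example, uses a `$jsonSchema` operator with its own `bsonType` keyword), so the output is illustrative rather than ready for a specific product. The `Attribute` class, the `kind` vocabulary, and `to_json_schema` are assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Any

# Hypothetical modeling vocabulary mapped to JSON Schema "type" values.
TYPE_MAP = {"string": "string", "int": "integer", "decimal": "number", "bool": "boolean"}


@dataclass
class Attribute:
    """One field of a (hypothetical) logical data model."""
    name: str
    kind: str  # a key of TYPE_MAP, or "object" for a nested sub-document
    required: bool = False
    children: list["Attribute"] = field(default_factory=list)  # used when kind == "object"


def to_json_schema(attrs: list[Attribute]) -> dict[str, Any]:
    """Forward-engineer a list of attributes into a JSON Schema object.

    Nested sub-documents are handled recursively. "additionalProperties" is
    deliberately left unset, so documents may still carry extra fields.
    """
    properties: dict[str, Any] = {}
    for a in attrs:
        if a.kind == "object":
            properties[a.name] = to_json_schema(a.children)
        else:
            properties[a.name] = {"type": TYPE_MAP[a.kind]}
    schema: dict[str, Any] = {"type": "object", "properties": properties}
    required = [a.name for a in attrs if a.required]
    if required:
        schema["required"] = required
    return schema


# Usage: a customer document with a nested name and an optional email.
customer_model = [
    Attribute("customer_id", "int", required=True),
    Attribute(
        "name",
        "object",
        required=True,
        children=[
            Attribute("first", "string", required=True),
            Attribute("last", "string", required=True),
        ],
    ),
    Attribute("email", "string"),
]
schema = to_json_schema(customer_model)
print(schema["required"])
```

Leaving extra properties allowed is a design choice that preserves the dynamic-schema flexibility discussed earlier while still enforcing the fields the model marks as required.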
This is even more vital with an agile development approach, because a data model helps developers plan ahead and thereby minimizes future rework. Data modeling is not a bottleneck for application development. On the contrary, it has been shown time and again to accelerate development, significantly reduce maintenance, increase application quality, and lower execution risks across the enterprise.
Pascal Desmarets is the founder and CEO of Hackolade. He leads the company's efforts to produce visual tools that smooth the onboarding of NoSQL technology into corporate IT landscapes. Hackolade’s software combines the simplicity of graphic data modeling with the power of NoSQL document databases. He can be reached at firstname.lastname@example.org or on LinkedIn at www.linkedin.com/in/pascaldesmarets/.