Copy-on-Write vs. Merge-on-Read: The Quiet Tradeoff Behind Every Lakehouse Update

Updating a single row sounds like it should be simple. In a traditional database it more or less is. But a data lakehouse stores its tables as collections of files in cloud storage, and those files have an awkward property: they're not built to be edited in place. You generally can't reach into a file and change one row. To change anything, you have to write new files. This raises a question that turns out to have two very different answers, and the choice between them quietly shapes the performance of the entire system.

The question is one of timing: when do you actually do the work of incorporating a change? You can do it immediately, at the moment of the write, by rewriting files so they reflect the new state. Or you can do it lazily, recording the change cheaply now and resolving it later, when someone reads the data. These two strategies are called copy-on-write and merge-on-read, and they sit at the heart of how lakehouse tables handle updates.

Start with copy-on-write, the more straightforward of the two. When you update a row under this strategy, the system finds the file containing that row and rewrites the entire file with the change applied. The new file takes the old one's place in the table, so the old version is no longer part of the current state. After the write completes, the data files are fully current: they contain exactly the data as it now stands, with nothing left to reconcile.

The consequence is that reads are fast. A reader just opens the files and reads the data, because the files already hold the finished, correct state. No extra work is needed at read time; the work was all done up front during the write. The cost is that the write itself is expensive. Changing even a single row means rewriting a whole file, potentially a large one, just to alter one record inside it. Under copy-on-write, writes pay a heavy price so that reads can be cheap.
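The mechanics above can be sketched with a toy in-memory model. This is only an illustration under stated assumptions: the class name `CowTable`, the file names, and the `rows_rewritten` counter are all hypothetical, and each "file" is just an immutable tuple of rows rather than a real storage object.

```python
from dataclasses import dataclass, field


@dataclass
class CowTable:
    """Toy copy-on-write table. Each 'file' is an immutable tuple of (key, value) rows."""
    files: dict[str, tuple[tuple[int, str], ...]] = field(default_factory=dict)
    rows_rewritten: int = 0  # rows physically rewritten by updates
    _next_id: int = 0

    def add_file(self, rows):
        # File names are made up for the sketch.
        name = f"data-{self._next_id}.parquet"
        self._next_id += 1
        self.files[name] = tuple(rows)
        return name

    def update(self, key, value):
        # Find the file holding the row, then rewrite that whole file.
        for name, rows in list(self.files.items()):
            if any(k == key for k, _ in rows):
                new_rows = [(k, value if k == key else v) for k, v in rows]
                del self.files[name]  # the old file leaves the current state
                self.rows_rewritten += len(new_rows)
                return self.add_file(new_rows)
        raise KeyError(key)

    def read(self):
        # Reads do no reconciliation: the files already hold the final state.
        return {k: v for rows in self.files.values() for k, v in rows}


table = CowTable()
table.add_file([(1, "a"), (2, "b"), (3, "c")])
table.add_file([(4, "d")])
table.update(2, "B")
result = table.read()
```

Changing one row here rewrites all three rows of its file, while the read is a plain scan. That is the write-side cost and read-side benefit in miniature.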

Merge-on-read inverts the bargain. Instead of rewriting the data file when a row changes, the system writes a small, separate record of the change: a note saying, in effect, "this row was updated to this" or "this row was deleted." The original data file is left untouched. The write is therefore fast and cheap, because you're just appending a little delta file describing what changed, not rewriting a large file to fold the change in.

But the change hasn't actually been incorporated into the main data yet, and that debt comes due at read time. When someone reads the table, the system has to take the original data files and apply the outstanding change records on top of them, merging the deltas with the base data on the fly to produce the correct, current result. Hence the name: the merge happens on read. Reads are now more expensive, because every read has to do the reconciliation work that copy-on-write did during the write. Merge-on-read makes writes cheap by deferring the cost to every subsequent read.
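A comparable toy sketch of merge-on-read, again under assumptions: `MorTable`, the `DELETED` sentinel, and the dictionary-based deltas are hypothetical stand-ins, not how any particular table format encodes its change records.

```python
from dataclasses import dataclass, field

DELETED = object()  # sentinel marking a delete record (an assumption of this sketch)


@dataclass
class MorTable:
    """Toy merge-on-read table: an untouched base plus an append-only list of deltas."""
    base: dict[int, str] = field(default_factory=dict)
    deltas: list[dict] = field(default_factory=list)

    def update(self, key, value):
        # Cheap write: append a small change record, leave the base alone.
        self.deltas.append({key: value})

    def delete(self, key):
        self.deltas.append({key: DELETED})

    def read(self):
        # The merge happens here, on every read, in the order changes arrived.
        merged = dict(self.base)
        for delta in self.deltas:
            for key, value in delta.items():
                if value is DELETED:
                    merged.pop(key, None)
                else:
                    merged[key] = value
        return merged


mor = MorTable(base={1: "a", 2: "b", 3: "c"})
mor.update(2, "B")
mor.delete(3)
snapshot = mor.read()
```

The base data never changes in this sketch. The current state exists only as the result of `read()`, which is why each read carries the reconciliation cost.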

Laid side by side, the tradeoff is clean and symmetric. Copy-on-write does the work at write time, giving you slow writes and fast reads. Merge-on-read defers the work to read time, giving you fast writes and slower reads. The work itself is the same kind of reconciliation in both cases; what differs is when you pay for it, and that timing is exactly what makes one strategy or the other the right fit for a given situation.

So the choice comes down to the shape of your workload, specifically how often the table is written versus how often it's read. A table that's updated rarely but read constantly, the typical pattern for analytics and reporting, is a natural fit for copy-on-write. You pay the expensive write cost only occasionally, and in exchange every one of the many reads is fast. Paying once to write and benefiting on thousands of reads is a good deal when reads vastly outnumber writes.

A table that's updated frequently, on the other hand, leans toward merge-on-read. If data is constantly streaming in or changing, copy-on-write would mean endlessly rewriting files, an enormous and continuous expense. Merge-on-read keeps those frequent writes cheap, absorbing the cost into reads instead. This suits situations with heavy ingestion or frequent small updates, where keeping up with the write volume matters more than squeezing maximum speed out of each read.
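The workload argument can be made concrete with a back-of-the-envelope cost model. Everything here is an assumption chosen for illustration: cost is counted in rows touched, each write changes one row in a file of `file_rows` rows, and merge-on-read adds a fixed `merge_penalty` to every read. Real systems will not follow these numbers.

```python
def cow_cost(writes, reads, file_rows):
    # Every write rewrites a whole file; every read is a plain scan.
    return writes * file_rows + reads * file_rows


def mor_cost(writes, reads, file_rows, merge_penalty):
    # Every write appends one delta row; every read scans and merges.
    return writes * 1 + reads * (file_rows + merge_penalty)


def cheaper(writes, reads, file_rows=1000, merge_penalty=200):
    """Return the strategy with the lower total cost under this toy model."""
    cow = cow_cost(writes, reads, file_rows)
    mor = mor_cost(writes, reads, file_rows, merge_penalty)
    return "copy-on-write" if cow <= mor else "merge-on-read"


read_heavy = cheaper(writes=10, reads=10_000)    # reporting-style table
write_heavy = cheaper(writes=10_000, reads=10)   # ingestion-style table
```

With these made-up parameters, the read-heavy table comes out in favor of copy-on-write and the write-heavy table in favor of merge-on-read, matching the reasoning above.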

There's a complication with merge-on-read worth naming, because it connects to a broader lakehouse concern. As changes accumulate, the pile of outstanding delta files grows, and reads get slower and slower, because each read has to merge an ever-larger stack of changes onto the base data. Left unchecked, a heavily updated merge-on-read table degrades over time. The remedy is compaction: periodically, a background process folds the accumulated deltas into the base files, rewriting them to incorporate all the pending changes at once. After compaction, the deltas are cleared and reads are fast again, until changes build up once more. Merge-on-read, in other words, doesn't avoid the cost of rewriting files; it batches it up and pays it later, all at once, instead of on every individual write.
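Compaction can be sketched in the same toy style. The names below are hypothetical, deletes are left out for brevity, and the number of delta records applied per read stands in for read cost.

```python
from dataclasses import dataclass, field


@dataclass
class MorTable:
    """Toy merge-on-read table with compaction (updates only, for brevity)."""
    base: dict
    deltas: list = field(default_factory=list)

    def update(self, key, value):
        self.deltas.append((key, value))

    def read(self):
        # Returns the current state and how many delta records had to be merged.
        merged = dict(self.base)
        for key, value in self.deltas:
            merged[key] = value
        return merged, len(self.deltas)

    def compact(self):
        # Fold all pending deltas into the base once, then clear them.
        self.base, _ = self.read()
        self.deltas = []


table = MorTable(base={k: "old" for k in range(5)})
for i in range(100):
    table.update(i % 5, f"v{i}")

before, steps_before = table.read()  # merges every pending delta
table.compact()
after, steps_after = table.read()    # nothing left to merge
```

The data returned is identical before and after compaction. Only the amount of merge work per read changes, which is the point of running it in the background.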

This reveals that the two strategies aren't really opposites so much as different schedules for the same underlying work. Both ultimately have to get changes into the data files. Copy-on-write does it eagerly, on each write. Merge-on-read does it lazily, deferring and then batching it through compaction. Many modern lakehouse table formats let you choose between the two per table, and some let you tune the behavior, precisely because the right schedule depends on how the table is used.
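As one concrete illustration of a per-table choice, Apache Iceberg exposes the table properties `write.delete.mode`, `write.update.mode`, and `write.merge.mode`, each accepting `copy-on-write` or `merge-on-read`. The sketch below only builds a Spark SQL statement as a string. The helper name and the table name `analytics.events` are hypothetical, and whether merge-on-read is available depends on the engine and table format version in use.

```python
# Real Apache Iceberg table property names; everything else here is illustrative.
ICEBERG_MODE_PROPERTIES = ("write.delete.mode", "write.update.mode", "write.merge.mode")
VALID_MODES = ("copy-on-write", "merge-on-read")


def set_write_mode_sql(table, mode):
    """Build an ALTER TABLE statement that sets all three row-level write modes."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    props = ", ".join(f"'{name}'='{mode}'" for name in ICEBERG_MODE_PROPERTIES)
    return f"ALTER TABLE {table} SET TBLPROPERTIES ({props})"


# Hypothetical table name, for illustration only.
statement = set_write_mode_sql("analytics.events", "merge-on-read")
```

Because the three properties are independent, a table could in principle use one strategy for deletes and another for updates. The helper sets them together only to keep the example short.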

The reason this quiet, technical-sounding choice is worth understanding is that it's one of the clearest examples of a tradeoff that runs through all of computing: you can often choose when to pay a cost, but rarely whether to pay it at all. Copy-on-write and merge-on-read aren't a matter of one being faster than the other in some absolute sense. They're a matter of where you'd rather spend the time, on the write or on the read, and matching that decision to whether your table is written more or read more. Get the match right and the system feels fast. Get it wrong and you've optimized the side that wasn't the bottleneck, paying for speed exactly where you didn't need it.