Redundant Information
Author
Marcus Held
Hi,
“After importing, we unpack the file and read the meta-information.”
“And where do we write it?”
“This information is stored in different databases. The path information is distributed across all service databases so that they know how to access the files. Other data is stored in the system database and in each tenant database.”
Does that sound normal to you?
It shouldn’t.
What we are experiencing here is information duplication.
“But Marcus, why is that so bad? We just need the information in different places.”
Yes. True. The same information is required in multiple places.
That’s usually the case.
When we work inside a single application, we wouldn't think of creating new tables for each feature.
Of course, we build a data structure that works for all features.
And why do we do that?
We want to easily modify the application.
If all features use the same dataset, then we can change it centrally.
It would also be strange if we changed the price of a book and the new price only appeared in the detail view.
Of course, it should also be displayed in the search results.
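Here is a minimal sketch of that idea. The catalog, the product ID, and both view functions are made up for illustration; the point is only that both views read from the same dataset, so one change shows up everywhere.

```python
# Hypothetical in-memory catalog: the single place where prices live.
catalog = {"book-1": {"title": "Example Book", "price_cents": 1999}}


def detail_view(product_id: str) -> str:
    """Render the detail view by reading from the shared catalog."""
    product = catalog[product_id]
    return f"{product['title']}: {product['price_cents'] / 100:.2f}"


def search_results(query: str) -> list[str]:
    """Render search results by reading from the same shared catalog."""
    return [
        f"{p['title']} ({p['price_cents'] / 100:.2f})"
        for p in catalog.values()
        if query.lower() in p["title"].lower()
    ]


# One central change to the price ...
catalog["book-1"]["price_cents"] = 1499

# ... is visible in every view, because no view keeps its own copy.
print(detail_view("book-1"))
print(search_results("book"))
```

Neither function stores a price of its own, so there is nothing that could get out of sync.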
Redundant storage of the same information is a gateway for bugs.
It makes the application hard to maintain.
A change in logic must be made at all points, on all services, in all databases.
Hardly anyone can keep track of all places in the long run.
That’s why it’s so important that we always have only one truth.
This also applies to invertible transformations.
For example, if we run an online store:
We want to calculate the gross price of a product. The gross price of a product is the net price plus value added tax, that is, the net price * (1 + VAT rate). The gross price is directly linked to the net price. It can be calculated from it at any time. So it’s not new information. That’s why you only need to store the net price.
The gross price is just a transformation of the existing data.
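As a rough sketch, this could look like the following. The Product class, its field names, and the 19 % VAT rate are assumptions for the example, not something from a real store. Only the net price and the VAT rate are stored; the gross price is computed on demand as net price * (1 + VAT rate).

```python
from dataclasses import dataclass
from decimal import Decimal, ROUND_HALF_UP


@dataclass(frozen=True)
class Product:
    """Hypothetical product record: the net price is the only stored price."""

    name: str
    net_price: Decimal
    vat_rate: Decimal  # assumed to be a fraction, e.g. Decimal("0.19") for 19 %

    @property
    def gross_price(self) -> Decimal:
        """Derive the gross price from the net price instead of storing it."""
        gross = self.net_price * (Decimal("1") + self.vat_rate)
        # Round to two decimal places (cents); the rounding mode is an assumption.
        return gross.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


book = Product(name="Example Book", net_price=Decimal("10.00"), vat_rate=Decimal("0.19"))
print(book.gross_price)
```

If the net price or the VAT rate changes, the gross price follows automatically. There is no second column that someone has to remember to update.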
The next time you work on your data schema, consider whether you want to store duplicated information. And avoid it!
Rule the Backend,
~ Marcus