
The Cost of Accidental Complexity in Development

Architecture
March 22, 2024
7 min read

When a project starts, IT teams often implement new features quickly, but over time development tends to slow down. Accidental Complexity is a frequent culprit. This article explains where it comes from and how it can be mitigated.

It’s no secret that systems grow more complex over time, leading to longer development cycles. This is inevitable when dealing with inherently complex problems, known as Essential Complexity. However, in many cases, the implementation of a feature becomes more complex than necessary. Two months for a feature that seemed straightforward? That’s a classic case of Accidental Complexity. It’s frustrating, costly, and avoidable, as we’ll show with some practical examples.

Many IT professionals will recognize this scenario from long-running projects: “It will take about four weeks,” says the lead developer. “Four weeks?” you respond in disbelief: “And then we can start sending out the first custom offers via email?”

“No, in four weeks we’ll be able to link the behavior data from the existing system with the CRM. After that, we still need to extend the user configuration system with the email settings and connect the authorization process to our mail server,” the lead developer explains, slightly annoyed. “And then we need to adjust our CI system to deploy the changes to the conversion service alongside the existing system. These changes will take an additional one to two sprints.”

Two months of development time for two developers for this feature? You do some quick math: two developers for two months comes to roughly 80 developer days, and at an estimated 700 euros per developer day, this feature will cost you more than 50,000 euros. And that doesn’t even account for the time of other stakeholders in the company. Is it still worth it?

Essential vs. Accidental Complexity

It depends. Some projects are inherently complex, and no matter how we twist and turn our estimates, it won’t get any simpler.

However, in most cases, high estimates can be attributed to other factors. To implement a feature, we have to deal with complexity that doesn’t stem from the feature itself, but from the circumstances surrounding us.

Our system is split into several services, distributed across a network, with geo-redundancies and different database technologies. The deployment system is outdated, and we need a feature from the new version. There are countless technical reasons for such avoidable Accidental Complexity.

Accidental Complexity often arises when the team prepares for future requirements. For example, the product owner presents a grand vision during pre-production: “We want to serve hundreds of different customers and offer them limitless integration possibilities with our system.”

This product vision triggers a desire within the development team to prepare. To serve hundreds of customers, we need to distribute user data across multiple databases. The seemingly infinite integration possibilities lead to the idea of implementing a (micro-)service architecture.

Before we know it, a complex system is modeled that is expensive to implement and maintain. Often, the vision is never fully realized: after two years of development, we find that our product implements only one integration option, which isn’t even in demand in the market. Yet we spent months developing a service-oriented architecture, a system that also requires additional maintenance effort.

The Cost of Accidental Complexity

The true cost becomes apparent when we consider the consequences. For example, an integration service could allow customers to be informed about new products on the platform via webhooks.

As described above, this API is developed as a separate service in our architecture. When we add a new product to our system, our webhook service needs to receive the necessary data.

There are, of course, several ways to implement this. If the team prepared for many potential integration services, they might have installed a message queue as middleware. So, our system now contains three components.

From a developer’s perspective, setting up a message queue is quick.

In the development environment, the Docker Compose file is extended with a new service. A message-handling library is added to the main application, and the logic for packaging the necessary data is quickly written. The team repeats this process on the integration-service side, and, on the surface, the requirements are met.
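
To make the scale of that first step concrete, here is a minimal sketch of the publishing side, assuming RabbitMQ as the broker and its official Java client. The article names neither, so the technology choice and all identifiers below are illustrative:

```java
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;

import java.nio.charset.StandardCharsets;

// Hypothetical publisher: sends a "product created" event to the queue
// that the integration service consumes. All names are illustrative.
public class ProductEventPublisher {

    private static final String QUEUE = "product-events";

    public void publishProductCreated(String productJson) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost"); // in dev, the broker from Docker Compose

        try (Connection connection = factory.newConnection();
             Channel channel = connection.createChannel()) {
            // Durable queue so messages survive a broker restart
            channel.queueDeclare(QUEUE, true, false, false, null);
            channel.basicPublish("", QUEUE, null,
                    productJson.getBytes(StandardCharsets.UTF_8));
        }
    }
}
```

The happy path really is this small, which is exactly why the costs that follow are so easy to underestimate.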

However, when this setup moves to production, additional requirements become apparent. Provisioning the service in production and test environments needs to be implemented. In many companies, this requires collaboration with an external system administration team.

This service falls under the product’s Service Level Agreement (SLA), necessitating monitoring and alerting for the message queue. Even if the team knows which metrics matter for operational stability, days to weeks of additional development time must be invested.

Metrics must be captured during deployment, directed to the existing monitoring stack, and meaningfully processed within it. Alerting rules for the new service must be established and tested.
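
As a hedged sketch of what “meaningfully processed” can involve, the queue depth could be exposed as a gauge, assuming Micrometer as the metrics facade and the same RabbitMQ client as above; again, these are assumptions rather than anything the article prescribes:

```java
import com.rabbitmq.client.Channel;
import io.micrometer.core.instrument.Gauge;
import io.micrometer.core.instrument.MeterRegistry;

// Illustrative only: expose the current depth of the product-events queue
// as a gauge that the existing monitoring stack can scrape.
public class QueueDepthMetrics {

    public void register(MeterRegistry registry, Channel channel) {
        Gauge.builder("integration.queue.depth", channel, ch -> {
                    try {
                        // messageCount does a passive declare and returns
                        // the number of ready messages in the queue
                        return ch.messageCount("product-events");
                    } catch (Exception e) {
                        return -1; // an impossible value flags scrape failures
                    }
                })
                .description("Messages waiting in the product-events queue")
                .register(registry);
    }
}
```

The alerting rule on top of this gauge, for example “depth above a threshold for five minutes”, is what then has to be written, reviewed, and tested.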

Soon, it becomes apparent that, similar to the database, changes to the system necessitate migrating existing broker settings. Some changes, like introducing new queues, can be implemented in a blue-green deployment without downtime; others represent breaking changes that cannot be made without downtime.

These changes and their correct execution on deployment day also need to be automated, requiring further investment in developing a migration strategy and versioning our message queues.
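
One possible shape for such a strategy, sketched under the same RabbitMQ assumption: version queue declarations the way database migrations are versioned, so each deployment applies only the topology changes it hasn’t seen yet. The class and queue names here are hypothetical:

```java
import com.rabbitmq.client.Channel;

// Hypothetical migration step, in the spirit of versioned database migrations.
// Queue arguments (durability, TTLs, dead-lettering) cannot be changed on an
// existing RabbitMQ queue, so a breaking change usually means declaring a
// new, versioned queue and draining the old one.
public class BrokerTopologyMigrationV2 {

    public void apply(Channel channel) throws Exception {
        // Additive change: safe during a blue-green deployment, no downtime.
        channel.queueDeclare("product-events.v2", true, false, false, null);

        // Consumers switch over next; once "product-events" is empty, a later
        // migration deletes it. Deleting immediately would lose in-flight
        // messages, hence the multi-step rollout.
    }
}
```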

Gradually, the team discovers gaps in their solution’s provisioning. The allocated memory is insufficient, the queue size is too small in some scenarios, developer access for debugging is unavailable, and the provided admin interface isn’t exposed externally.

End-to-end testing also proves more complex, as the tests must now bring up the message queue as well, significantly extending their runtime. To keep running tests in parallel, dynamic port assignment becomes necessary.
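
Dynamic ports are where a tool like Testcontainers tends to come in (an assumption; the article doesn’t name a test setup): each test run starts the broker on a random free host port, so parallel runs don’t collide:

```java
import org.junit.jupiter.api.Test;
import org.testcontainers.containers.RabbitMQContainer;

// Sketch of an end-to-end test that bootstraps the broker itself,
// assuming JUnit 5 and the Testcontainers RabbitMQ module.
class ProductEventEndToEndTest {

    @Test
    void publishedProductReachesIntegrationService() throws Exception {
        try (RabbitMQContainer rabbit = new RabbitMQContainer("rabbitmq:3.13-management")) {
            rabbit.start();
            // The AMQP port inside the container is mapped to a random free
            // host port, which is what makes parallel test runs possible.
            String amqpUrl = rabbit.getAmqpUrl();

            // ... wire the application and the integration service against
            // amqpUrl, publish a product, and assert the webhook fires.
        }
    }
}
```

Even with this in place, every test run now pays the container startup cost, which is exactly the runtime extension described above.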

While this list of additional expenditures is certainly not exhaustive, it illustrates the wide-reaching impact of adding another system component. All the examples provided apply to the separate integration service as well.

So, where does this leave us? We’ve learned from the market that the initial product vision did not materialize. We’ve built only one integration, yet the complexity of our system is immense.

The team shoulders this complexity with every change, so seemingly simple requirements take a long time to implement. This, in turn, means more employees are needed to make the same progress at the same pace.

Furthermore, onboarding new employees takes additional effort: every team member must understand the setup described above.

And, of course, this architecture leads to higher operating costs. Ultimately, the entire functionality could have been encapsulated within a single application, as a deployment monolith.

Preventing Accidental Complexity: Some Tips

As is often the case in software development, there is no one-size-fits-all solution. The advice to always start with a monolithic application might be good in most cases, but not all.

The way Accidental Complexity manifests within a system can also vary significantly. Perhaps a reactive API was chosen at the outset when the classic thread-per-request model would have satisfied all requirements.

Or a NoSQL database solution was used when, in practice, only relational datasets were present.

The reasons for Accidental Complexity in a system are multifaceted. What is seen as Accidental Complexity today might not have been considered as such at an earlier point due to different requirements.

Another cause is the well-documented phenomenon of overengineering or gold plating. A developer implements (technical) features beyond the requirements, in perceived anticipation of future needs. While done with good intentions, many of these features are never needed, leaving behind a more complex system.

It’s crucial for the team to regularly take a step back and reevaluate their solution. Ask yourself: With my current knowledge, how would I design my system? What would I do differently? What components would I eliminate? Which would I add?

Perhaps deploying the CRM as a separate service wasn’t the right decision. Or maybe you can replace your external authorization service with a simpler system within your existing solution. Are the non-standard processes in your deployment truly necessary, or can you switch to a standard implementation?

Viewing your solution from the perspective of a newcomer who doesn’t know the project’s history reveals complexity that isn’t actually necessary for your solution.

And when you find it, be bold and simplify your solution. Every additional piece of technical debt makes restructuring harder.
