El Niño
Author: Marcus Held
Hi,
El Niño is back. The surface temperature of the equatorial Pacific Ocean fluctuates periodically. This seemingly inconspicuous change triggers a cascade of weather changes that reach the most remote corners of the globe. Floods in South America, droughts in Australia, and even snowstorms in North America can all be traced back to a temperature difference of a few degrees in the Pacific. The phenomenon recurs every 2 to 7 years, and this year it's that time again.
We experience the same thing in the backend. A seemingly small event, such as an additional component or a minor code change, can have massive effects on completely different parts of our system.
A few years ago, I noticed that requests in one of our applications were periodically timing out. We couldn't reproduce the problem locally, and a look at the Java Flight Recorder (JFR) data didn't provide any new insights. So where should we look?
The application was running in Kubernetes. When we looked at the system load, we saw spikes with the same periodic pattern. The problem affected the entire machine, not just our application. So we brought in our trusted SRE, and after only a few words, he knew what was going on. Our application wasn't the only thing running on that node: a log aggregator was running there too. And, of course, it periodically started up and grabbed all the resources it could get.
We share resources. Everything is interconnected. When the surface temperature of the Pacific rises, we in Europe, on the other side of the world, feel the effects too. When we run multiple applications on the same machine, they influence each other.
Take away two things from this. First, understand your system. Sharing responsibility for it with other departments doesn't absolve you. You always need to understand your system. Maybe not in every detail, but you need to know what you don't know.
Second, isolate your components. If you share physical resources through virtualization, ask yourself whether the savings are worth it. Sooner or later, someone will overlook a setting. Maybe it's easier to just use two machines? But whatever you decide, understand it!
Rule the Backend,
~ Marcus