
HashMap

June 25, 2024

Hi,

HashMap? How can a HashMap be a topic for a newsletter? Hold on!

The topic came to mind because yesterday, in an interview for a client, I spent quite a long time discussing the HashMap. It felt a bit like déjà vu. Two to three years ago, I was heavily involved in recruiting and conducted over 150 technical interviews. That period taught me a lot about our industry. First and foremost, we lack seniority, and we fail to develop young developers sufficiently.

And I won’t tire of emphasizing this.

And when you discuss the HashMap, this becomes apparent.

But let’s start from the beginning, in case you’re not a developer and have no idea what I’m talking about: a HashMap is a map implementation, a data structure that stores values under keys. So, if I want a map of translated words, I can fill it with a pair like (Butterfly, Schmetterling). Later, when I want the German translation of “Butterfly”, I can ask the map for it.
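For readers who want to see this in code, here is a minimal Java sketch of the translation example using java.util.HashMap. The class and method names are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

public class TranslationExample {
    // Builds a map from English words (keys) to German translations (values)
    static Map<String, String> buildTranslations() {
        Map<String, String> translations = new HashMap<>();
        translations.put("Butterfly", "Schmetterling");
        return translations;
    }

    public static void main(String[] args) {
        Map<String, String> translations = buildTranslations();
        // Ask the map for the value stored under the key "Butterfly"
        System.out.println(translations.get("Butterfly"));   // prints "Schmetterling"
        // A key that was never stored yields null
        System.out.println(translations.get("Caterpillar")); // prints "null"
    }
}
```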

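A HashMap is a special implementation that uses the hash of each key to decide where to store the entry internally. To look up a value, it recomputes the key’s hash and only has to check the few entries stored in the matching slot, so lookups take constant time on average, regardless of how large the map is. This makes it easy to manage maps with millions of entries.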
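The following is a simplified sketch, loosely modeled on how OpenJDK’s HashMap picks a slot (a “bucket”) for a key: it mixes the high bits of hashCode() into the low bits and then masks the result with a power-of-two table length. BucketIndexSketch is a hypothetical name, and the real implementation additionally handles resizing and long collision chains.

```java
public class BucketIndexSketch {
    // Mix the high 16 bits into the low 16 bits, so that a table indexed
    // by the low bits is still influenced by the whole hash.
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    // For a power-of-two table length, "hash mod length" can be computed
    // with a cheap bitmask; the result is always in [0, tableLength).
    static int bucketIndex(Object key, int tableLength) {
        int h = (key == null) ? 0 : spread(key.hashCode());
        return h & (tableLength - 1);
    }

    public static void main(String[] args) {
        int tableLength = 16; // a power of two, like HashMap's default capacity
        for (String word : new String[] {"Butterfly", "Caterpillar", "Moth"}) {
            System.out.println(word + " -> bucket " + bucketIndex(word, tableLength));
        }
    }
}
```

Because a lookup only compares the entries in one bucket, its cost does not grow with the total number of entries, as long as the hashes are well distributed.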

Many developers don’t even know this basic concept. But it goes further. For a HashMap to work correctly, every object used as a key must produce a hash that does not change while the object is stored in the map. So, we need to understand what a hash is. In Java, Object, the class all objects inherit from, provides a default implementation of hashCode(). This default is based on the object’s identity rather than its contents (the Javadoc long described it as typically derived from the object’s memory address). This ensures that an object’s hash cannot change during its lifetime.

However, this default is not helpful for a HashMap: two objects that are semantically equal do not automatically get the same hash. In most cases, I don’t want to drag the original object reference across my application. I want to create a semantically equal object and use it to retrieve the associated value from the HashMap.
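A minimal sketch of the problem, assuming a hypothetical Word key class that overrides neither equals() nor hashCode() and therefore inherits the identity-based versions from Object:

```java
import java.util.HashMap;
import java.util.Map;

public class IdentityKeyProblem {
    // Hypothetical key class: no equals()/hashCode() overrides,
    // so two instances are only "equal" if they are the same object.
    static final class Word {
        final String text;
        Word(String text) { this.text = text; }
    }

    // Looks up with a new, semantically equal instance
    static String lookupWithNewInstance() {
        Map<Word, String> translations = new HashMap<>();
        translations.put(new Word("Butterfly"), "Schmetterling");
        return translations.get(new Word("Butterfly"));
    }

    // Looks up with the very same reference that was used for put()
    static String lookupWithSameInstance() {
        Map<Word, String> translations = new HashMap<>();
        Word butterfly = new Word("Butterfly");
        translations.put(butterfly, "Schmetterling");
        return translations.get(butterfly);
    }

    public static void main(String[] args) {
        System.out.println(lookupWithNewInstance());  // prints "null"
        System.out.println(lookupWithSameInstance()); // prints "Schmetterling"
    }
}
```

Only the original reference finds the entry; an equal-looking new object does not.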

So, the question is: how do I implement hashCode() correctly? I won’t go into detail here; Josh Bloch’s classic “Effective Java” has an excellent explanation of this topic. If you’re a developer and I’ve already lost you, buy the book right now. You are missing essential basics.

It doesn’t stop there. Once I’ve implemented hashCode(), I need to understand that a hash is a mathematical projection: multidimensional data is reduced to a single value (in Java, a 32-bit int). This reduction inevitably loses information. Consequently, you can never reconstruct the original object from its hash, and collisions can occur: two different objects can produce the same hash.
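The loss of information is easy to demonstrate with String, whose hashCode() is documented as s[0]*31^(n-1) + ... + s[n-1]. Two different two-character strings already collide:

```java
public class HashCollisionDemo {
    public static void main(String[] args) {
        String a = "Aa";
        String b = "BB";
        // String.hashCode() = s[0]*31^(n-1) + ... + s[n-1], using char codes:
        // "Aa" -> 65*31 + 97 = 2112
        // "BB" -> 66*31 + 66 = 2112
        System.out.println(a.hashCode()); // prints 2112
        System.out.println(b.hashCode()); // prints 2112
        // Same hash, yet different strings
        System.out.println(a.equals(b));  // prints false
    }
}
```

Since an int has only 2^32 possible values but there are far more possible strings, collisions like this cannot be avoided (the pigeonhole principle).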

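And precisely because of this property, implementing hashCode() is not enough. You also need a correct implementation of equals() that is consistent with hashCode(): objects that are equal must return the same hash. Otherwise, when several keys share a hash, the HashMap cannot determine which of them is the one you are actually looking for.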
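To see that equals() is what resolves collisions, here is a deliberately bad sketch: a hypothetical CollidingKey whose hashCode() always returns 42. This is legal, because equal keys still get equal hashes, and the map stays correct because equals() tells the colliding keys apart.

```java
import java.util.HashMap;
import java.util.Map;

public class EqualsResolvesCollisions {
    // Hypothetical key with a deliberately terrible (but legal) hashCode():
    // every instance returns 42, so every key collides. Illustration only.
    static final class CollidingKey {
        private final String name;

        CollidingKey(String name) { this.name = name; }

        @Override
        public int hashCode() { return 42; }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof CollidingKey)) return false;
            return name.equals(((CollidingKey) o).name);
        }
    }

    static Map<CollidingKey, String> buildTranslations() {
        Map<CollidingKey, String> translations = new HashMap<>();
        translations.put(new CollidingKey("Butterfly"), "Schmetterling");
        translations.put(new CollidingKey("Caterpillar"), "Raupe");
        return translations;
    }

    public static void main(String[] args) {
        Map<CollidingKey, String> translations = buildTranslations();
        // Both entries share hash 42; equals() picks the right one
        System.out.println(translations.get(new CollidingKey("Butterfly")));   // prints "Schmetterling"
        System.out.println(translations.get(new CollidingKey("Caterpillar"))); // prints "Raupe"
        System.out.println(translations.size());                               // prints 2
    }
}
```

The price is performance: with every key in the same bucket, lookups degrade from constant time toward comparing keys one by one.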

And equals() is not trivial to implement either. A correct equals() must be (mathematically) reflexive, symmetric, transitive, and consistent, and x.equals(null) must return false. Naturally, inheritance must also be taken into account.
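One way to satisfy these rules, in the style recommended by “Effective Java”, is sketched below for a hypothetical TranslationKey. Making the class final sidesteps the inheritance problems, final fields keep the hash stable, and deriving hashCode() from exactly the fields compared in equals() keeps the two methods consistent.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class TranslationKeyExample {
    // Hypothetical key class: final, so no subclass can break symmetry or
    // transitivity; immutable, so its hash never changes inside a map.
    static final class TranslationKey {
        private final String word;
        private final String language;

        TranslationKey(String word, String language) {
            this.word = Objects.requireNonNull(word);
            this.language = Objects.requireNonNull(language);
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;                       // reflexive
            if (!(o instanceof TranslationKey)) return false; // also false for null
            TranslationKey other = (TranslationKey) o;
            return word.equals(other.word) && language.equals(other.language);
        }

        @Override
        public int hashCode() {
            // Uses exactly the fields compared in equals(),
            // so equal keys always produce equal hashes
            return Objects.hash(word, language);
        }
    }

    static String lookup() {
        Map<TranslationKey, String> translations = new HashMap<>();
        translations.put(new TranslationKey("Butterfly", "de"), "Schmetterling");
        // A new but equal key finds the entry
        return translations.get(new TranslationKey("Butterfly", "de"));
    }

    public static void main(String[] args) {
        System.out.println(lookup()); // prints "Schmetterling"
    }
}
```

Since Java 16, records generate equivalent equals() and hashCode() methods automatically.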

And once I’ve implemented all of this correctly, my HashMap will finally work.

This all sounds very complicated, and for a layperson, it is. But these are fundamental concepts for software developers, and I expect every developer to understand them.

Hash map implementations exist in virtually every language, and the concept is the same everywhere. They are also used in all kinds of libraries.

Instead, I hear about team agreements in which developers have banned each other from using HashMaps “because they don’t work.”

Libraries like Lombok and languages like Kotlin make implementing equals() and hashCode() trivial: just model the key as a value class, for example a Kotlin data class, and both methods are generated for you. But then developers wonder why their entities behave strangely when they implement them as data classes.
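The kind of strange behavior is not spelled out here, so the following is an assumption about one common case: an entity whose generated equals() and hashCode() include a mutable field, such as an id that is assigned when the entity is first persisted. The sketch writes the generated methods out by hand in Java; Customer is a hypothetical class.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

public class MutableKeyPitfall {
    // Hypothetical entity whose equals()/hashCode() depend on every field,
    // including a mutable id (roughly what a generator would produce).
    static final class Customer {
        Long id;            // assumption: assigned later, e.g. on first save
        final String name;

        Customer(Long id, String name) { this.id = id; this.name = name; }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Customer)) return false;
            Customer other = (Customer) o;
            return Objects.equals(id, other.id) && Objects.equals(name, other.name);
        }

        @Override
        public int hashCode() { return Objects.hash(id, name); }
    }

    static boolean stillFoundAfterIdChange() {
        Set<Customer> customers = new HashSet<>();
        Customer ada = new Customer(null, "Ada");
        customers.add(ada);          // stored under the hash of (null, "Ada")
        ada.id = 1L;                 // the hash changes while the object is in the set
        return customers.contains(ada);
    }

    public static void main(String[] args) {
        System.out.println(stillFoundAfterIdChange()); // prints false
    }
}
```

The object is still in the set, but it sits in the place chosen by its old hash, so contains() no longer finds it. This is the immutable-hash requirement described earlier, violated.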

I already talked about this in my (k)lean Kotlin presentation in 2021.

These remain fundamentals. This confusion wouldn’t exist if developers understood how a HashMap works.

We lack senior developers who can explain these details to young developers.

Rule the Backend,

~ Marcus
