A microservice journey - part 1: before we begin

It's been a while since I posted anything, but I feel like adding to the millions of posts about microservices.
I considered not adding to the already overpopulated mess of microservices articles, but I thought, what the hey.

I work for a large retailer in Australia and we were having a few issues with our platform.
The very first step was to identify exactly what our problems were and to make sure we chose a path which would alleviate some of them. We had some technical problems, but we also had a bunch of business ambitions which needed to be achieved along the way. Let's never forget, it is the business which drives our decisions, right? There is a buzz around microservices at the moment, but it's important to understand your problem and make sure microservices are the right approach.
The problems we had were:

Scalability
Availability
Dependencies
Domain knowledge
Experimentation
Long deployment cycles

Scalability

Our scalability problem was mainly due to the fact that we had one database server. A single database server can only scale one way, up, and we were already at that limit. So we could fix the performance of our queries, and we did, or we could use different storage vehicles for different parts of our system. By moving parts of the system into different data stores, even if they were all SQL, we would be able to scale the different parts independently, effectively turning a vertically scalable tool such as SQL into horizontal partitions in the form of multiple SQL servers. Once we understand that SQL is just a tool to be used for relational-type problems, and can embrace NoSQL data stores, the problem of having so many domains, each with its own datastore, becomes trivial. Why not have a datastore per query?
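
To make the idea of one store per domain concrete, here is a minimal sketch. It is only an illustration: two in-memory SQLite databases stand in for two separate database servers, and the "catalog" and "orders" domains, table names and data are assumptions made up for the example, not our actual schema.

```python
import sqlite3

# One connection per domain stands in for one database server per domain.
# (Hypothetical domains and tables, chosen only for illustration.)
catalog_db = sqlite3.connect(":memory:")
orders_db = sqlite3.connect(":memory:")

catalog_db.execute("CREATE TABLE products (sku TEXT PRIMARY KEY, name TEXT)")
orders_db.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, sku TEXT, qty INTEGER)"
)

catalog_db.execute("INSERT INTO products VALUES ('A1', 'Kettle')")
orders_db.execute("INSERT INTO orders (sku, qty) VALUES ('A1', 2)")

# A heavy reporting query on orders only touches the orders store...
total_qty = orders_db.execute("SELECT SUM(qty) FROM orders").fetchone()[0]

# ...while catalog reads are served by a different store entirely.
name = catalog_db.execute(
    "SELECT name FROM products WHERE sku = 'A1'"
).fetchone()[0]

print(total_qty, name)
```

The trade-off is that the two stores can no longer be joined in a single query, so any combined view has to be assembled by a service rather than by the database.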

Availability

With the SQL problem come multiple issues with availability. If we have one big database, and everyone is touching it and playing with it, expecting it to return historical data trends for the last 3 years and so on, then there very quickly becomes an availability problem, where SQL becomes unavailable and everything dies.
By moving data into different data stores, we introduce a different availability problem. If one of the new services is not available, the request cannot complete.

We looked at various microservice approaches, and we landed on a particular pattern:

decentralising data, and centralising logic.

We use an event stream to publish business events with the appropriate data in the payload, and the downstream microservices take a copy of the data, or the subset of it they need, to build their own repository of information. Now the availability problem is gone, or rather a lot smaller. By decentralising the data into each service, but centralising the business logic for each function in just one service, each service can work on its own, without the need for any other service to be available. If a service is unavailable, the event stream will hold the business events until the downstream consumer becomes available again.
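
The pattern can be sketched in a few lines. This is a toy, in-memory stand-in for an event stream, not the product we actually run; the `EventStream` class, the `ShippingService` consumer, the `OrderPlaced` event and its fields are all hypothetical names chosen for the example. It shows a producer publishing without waiting for anyone, and a consumer that was down catching up later and keeping only the subset of data it needs.

```python
from dataclasses import dataclass, field


@dataclass
class EventStream:
    """Append-only log; each consumer tracks its own read position."""
    events: list = field(default_factory=list)
    offsets: dict = field(default_factory=dict)

    def publish(self, event_type, payload):
        # The producer never waits on consumers; it only appends.
        self.events.append({"type": event_type, "payload": payload})

    def poll(self, consumer_name):
        # Return everything this consumer has not yet seen, then advance
        # its offset. Events published while it was down are still here.
        start = self.offsets.get(consumer_name, 0)
        pending = self.events[start:]
        self.offsets[consumer_name] = len(self.events)
        return pending


class ShippingService:
    """Hypothetical downstream service keeping only the fields it needs."""

    def __init__(self):
        self.addresses = {}  # local copy: order_id -> delivery address

    def catch_up(self, stream):
        for event in stream.poll("shipping"):
            if event["type"] == "OrderPlaced":
                order = event["payload"]
                self.addresses[order["order_id"]] = order["address"]


stream = EventStream()
shipping = ShippingService()

# Shipping is "down" while these two orders are placed.
stream.publish("OrderPlaced", {"order_id": 1, "address": "1 George St", "total": 50})
stream.publish("OrderPlaced", {"order_id": 2, "address": "2 Pitt St", "total": 75})

# When it comes back it replays what it missed and builds its own store.
shipping.catch_up(stream)
print(shipping.addresses)
```

Note that this sketch advances the offset before the events are processed, which keeps the code short; a real consumer would normally record its position only after processing succeeds, so that a crash mid-batch does not lose events.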

With this approach, we introduced another problem, in that the central point of failure is now the event stream. So making sure that this thing is up and running is critical! Redundancy, monitoring and recovery are critical to the success of this system.

Dependencies

We currently have a couple of really big systems, each with a trove of people building and making things better. However, when we tried to truly accelerate our development, we kept bumping into the old dependency problem. Teams would love to be able to build stuff quickly, but building ours also meant 3 other teams needed to build something first. Those three teams had another six teams collectively that they were waiting on, and so on and so on. With these massive central systems, it was extremely hard to throw more resources at the problem, as more resources means more tripping up, more code merges, more people to train on the "right way" of doing stuff, and more people waiting for other people to finish stuff. The business was also eager to start 10, 20, 30 new initiatives and was willing to throw a ton more resources at them, but unfortunately, the more resources they threw at the problem, the more people were just waiting. Our goal was to be able to get a new idea, hire a bunch of new people, and have them deploy to PROD within a week of starting. We would never be able to do this if we had such deep dependencies.

Domain knowledge

Along with the dependencies, these big systems also commanded a really, really long learning ramp-up time. What would be easier: hire a new developer and ask him to build a brand new component however he would like to do it? Or hire a new developer and ask her to make a single-line change to a system which was built over 20 years, by who knows how many developers, with a mix of old and new technology, with patterns and libraries being decommissioned and transitioned to the new approach?

This task was impossible. Even with the domain knowledge firmly grasped, a random problem would come up which stumped everyone, because one bit of the code that nobody expected to do a certain thing just did, but only when some once-a-year condition was met.

Experimentation

Another aspect of the architecture which was important was the growing need to be able to build new stuff quickly, experiment with our customers, and either throw it away or keep building it out if it proved to be a hit. The nature of the industry meant that if we didn't deliver value often and always, someone else would, and that would be the end of that. Our current large single application meant long release cycles, with lots of regression testing and lots of merging of project work. So the effort to release each fortnight was tremendous.
Our goal was for each team to build their own component and be able to deploy it whenever they liked without affecting the entire ecosystem. No other components would need to be deployed, but the component would appear for some customers as a new feature within the site. If successful, we could roll the new feature out to the entire customer base; otherwise, no harm no foul, just throw it away.
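
One common way to show a feature to only some customers is to hash each customer into a stable bucket and compare it against a rollout percentage. The sketch below is an assumption about how this could be done, not a description of our implementation; the `in_rollout` function, the `new-checkout` feature name and the customer ids are hypothetical.

```python
import hashlib


def in_rollout(customer_id: str, feature: str, percent: int) -> bool:
    """Deterministically place a customer in one of 100 buckets per feature."""
    digest = hashlib.sha256(f"{feature}:{customer_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100
    # Buckets 0..percent-1 see the feature; everyone else gets the old site.
    return bucket < percent


customers = [f"customer-{n}" for n in range(1000)]

# Start the experiment with roughly 10% of customers.
exposed = [c for c in customers if in_rollout(c, "new-checkout", 10)]
print(len(exposed))
```

Because the bucket depends only on the feature name and the customer id, a customer sees the same variant on every visit, and raising the percentage only adds customers without removing anyone already in the experiment. Setting it to 100 rolls the feature out to everyone, and setting it to 0 throws it away.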

Long deployment cycles

With a big system comes a big regression test suite and a big merge nightmare. We could never possibly deliver lots of value quickly if we had a 2-week release cycle. Even if we threw more people at it, the problem would just get bigger, and would make everything even slower.

Next steps

We identified the issues and started thinking about microservices. But before we continued down this path, we needed to secure the most important thing before fully committing to it: buy-in.

Without a common understanding of the problem and a shared belief that microservices would solve it, there is no point in changing your entire IT stack.

I also worked at a largish retail bank before this, and we looked at microservices there, but without understanding why and without buy-in across the board it didn't really go anywhere.

Changing an architecture from a monolith is a massive task and one that certainly cannot be done alone. The first level of commitment required is from the senior leadership team. It will be hard and expensive, and there will be plenty of mistakes along the way, so a real commitment from senior stakeholders is vital for the persistence required to make it all work.

Next, the team around you really needs to understand what we are trying to solve with microservices, what is important to adhere to and what can be a little flexible.
Microservices give the teams a lot of flexibility to build services as they want, and services can be delivered with great speed. But with great flexibility comes great responsibility. It is not just a free-for-all, do whatever you like.
If everyone is on the same page, and understands the core principles of the architecture, then everyone is free to build stuff within the framework. Architecture then becomes just coordinating between teams and making sure there is some alignment between them.
What this leads to is high-performing teams who feel like they own their services and are empowered to make their own decisions. Some really interesting stuff gets built when diversity of thinking is embraced and encouraged.

To me, this is the core benefit of microservices. Every developer loves to make decisions and come up with cool designs. By embracing this concept and letting teams have freedom, we build up high-performing teams that can be proud of what they are doing, because they didn't just get told what to do but had the freedom to make the decisions and come up with the designs.

Good people build good software.

If you want good people to stay, let them do what they are good at.

The Analogy

It's always nice to have a good analogy which can help people understand the benefits, and help solidify what you are trying to achieve.

Our existing monolith is like a block of units and our new microservices are like a community village.

In our current block of units,

when we are building new apartments, and several apartments are being built at once, everyone needs to use the same stairwell and lift, and builders get stuck behind others.
when we want to build a new unit, there is far more planning required, and before finishing we need to test that the whole structure has not been compromised by the new room.
sometimes builders bump into walls and stairs, making little dents that need to be fixed again.
when we get new builders, they need to understand the building structure before they can start building.
so adding new rooms is very hard because of all the above.
when more people want to use a room, it's hard to make just one room bigger.
if we find a good prefab system, we can't just add it in. It needs to become another unit block.
because we don't have a good way to move between the blocks, we need to build custom bridges and ropes to connect to other units.

In our new village,

each service is a separate house
the roads between all the houses are the event stream
each team can build their own house however they want to, but they all need a good foundation, a solid door, and a good roof.
using the roads we can try multiple houses and choose the one which is the best for our problem
we can easily put a prefab house in and build a road to connect it to everything.
we can start with a simple house and a small road so people can test it. As more requirements come along, it can be built out and expanded as needed.
if people are using one house more than any other, we can make it a double story.
new builders can come and start building without too much knowledge about the existing block of units; they just need to know how to build a house.
when a new thing happens in one of the houses, it sends out a message van. Anyone who wants to know about that thing gets a separate van to deliver the message. If nobody is home at one house, the van waits outside until the owner comes back. So each house will always know what has happened and, more importantly, will always do what is required and never miss a van.
