I might have a different take. I think microservices should each be independent such that it really doesn't matter how they end up being connected.
Think more actors/processes in a distributed actor/csp concurrent setup.
Their interface should therefore be hardened and not break constantly, and they shouldn't each need deep knowledge of the intricate details of each other.
Also for many system designs, you would explicitly want a different topology, so you really shouldn't restrict yourself mentally with this advice.
At first this sounds cool but I feel like it falls apart with a basic example.
Let's say you're running a simple e-commerce site. You have some microservices, like, a payments microservice, a push notifications microservice, and a logging microservice.
So what are the dependencies? You might want to send a push notification to a seller when they get a new payment, or if there's a dispute or something. You might want to log that too. And you might want to log whenever any chargeback occurs.
Okay, but now it is no longer a "polytree". You have a "triangle" of dependencies. Payment -> Push, Push -> Logs, Payment -> Logs.
These all just seem really basic, natural examples though. I don't even like microservices, but they make sense when you're essentially just wrapping an external API like push notifications or payments, or a single-purpose datastore like you often have for logging. Is it really a problem if a whole bunch of things depend on your logging microservice? That seems fine to me.
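The triangle above is acyclic as a directed graph, but its underlying undirected graph has a cycle, and that's exactly what disqualifies it as a polytree. A quick sketch of the check (service names are the ones from the example; the function is mine):

```python
# Sketch: the payments/push/logs "triangle" is a DAG, but its underlying
# undirected graph has a cycle, so it is not a polytree.
from collections import defaultdict

edges = [("payments", "push"), ("push", "logs"), ("payments", "logs")]

def is_polytree(edges):
    # A polytree is a directed graph whose underlying undirected graph is
    # a tree: connected with exactly n - 1 edges (which rules out cycles).
    nodes = {n for e in edges for n in e}
    if len(edges) != len(nodes) - 1:
        return False
    # Check connectivity of the undirected version.
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen, stack = set(), [next(iter(nodes))]
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(adj[n] - seen)
    return seen == nodes

print(is_polytree(edges))  # False: 3 nodes, 3 edges -> undirected cycle
print(is_polytree([("payments", "push"), ("payments", "logs")]))  # True
```

Dropping any one edge of the triangle (e.g. logging payments only indirectly via push) restores the polytree property, which is the article's point, whether or not you buy it.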
> Even without a directed cycle this kind of structure can still cause trouble. Although the architecture may appear clean when examined only through the direction of service calls the deeper dependency network reveals a loop that reduces fault tolerance increases brittleness and makes both debugging and scaling significantly more difficult.
While I understand the first counterexample, this one seems a bit blurry. Can anybody clarify why a directed acyclic graph whose underlying undirected graph is cyclic is bad in the context of microservice design?
This is a fair enough point, but you should also try to keep that tree as small as possible. You should have a damn good reason to make a new service, or break an existing one in two.
People treat the edges on the graph like they're free. Like managing all those external interfaces between services is trivial. It absolutely is not. Each one of those connections represents a contract between services that has to be maintained, and that's orders of magnitude more effort than passing data internally.
You have to pull in some kind of new dependency to pass messages between them. Each service's interface has to be documented somewhere. If the interface starts to get complicated you'll probably want a way to generate code to handle serialization/deserialization (which also adds overhead).
In addition, to share code, instead of just having a local module (or whatever your language uses) you now have to manage a new package. It either has to be built and published to some repo somewhere, it has to be a git submodule, or you just end up copying and pasting the code everywhere.
Even if it's well architected, each new service adds a significant amount of development overhead.
The problem with "microservices" is the "micro". Why we thought we needed so many tiny services is beyond me. How about just a few regular-sized services?
This seems cool if all you need is: call service -> get response from service -> do something with response.
How do you structure this for long running tasks when you need to alert multiple services upon their completion?
Like what does your polytree look like if you add a messaging pub/sub type system into it. Does that just obliterate all semblance of the graph now that any service can subscribe to events? I am not sure how you can keep it clean and also have multiple long running services that need to be able to queue tasks and alert every concerned service when work is completed.
This actually makes a lot of sense. I have one question though. Why is having 2 microservices depend on a single service a problem?
The restriction to a polytree might be useful -- but only with quite a few more caveats. In the general case, this is absurd; having dependencies that are common to modules that are themselves dependencies of some single thing is not inherently wrong.
Now, if that common dependency is vending state in a way that can be out of sync along varying dependency pathways, that can be a recipe for problems. But "dependency" covers a very wide range of actual module relationships. If we move away from microservices and consider this within a single system, the entire premise falls apart when you consider that everything ends up depending on a common kernel. That's not an architectural failure; that's just a common dependency. (Process A relies on a print service, which depends on a kernel, along with a network system, which also depends on the kernel. Whoops, no more polytree.)
This is the sort of "simplifying" heuristic that is oversimplified.
For microservice count N > 10, if your interdependence count k > 2.867N − 7.724, you are better off with a monolith. The assertion is based on a complexity metric that has been correlated with cognitive and financial metrics. This came as an interesting side discovery when writing Kütt, Andres, and Laura Kask. "Measuring Complexity of Legislation. A Systems Engineering Approach." In International Congress on Information and Communication Technology, pp. 75-94. Singapore: Springer Singapore, 2020.
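Taking the inequality above at face value, the break-even point is easy to compute (the constants are the comment's; the function name is mine):

```python
def monolith_preferred(n_services, k_edges):
    """Per the heuristic above: for N > 10 services, a monolith wins
    once the interdependence count k exceeds 2.867*N - 7.724."""
    if n_services <= 10:
        return False  # the heuristic is only claimed for N > 10
    return k_edges > 2.867 * n_services - 7.724

# With 20 services the threshold is 2.867*20 - 7.724 = 49.616 edges.
print(monolith_preferred(20, 50))  # True
print(monolith_preferred(20, 49))  # False
```

So at 20 services you cross over at roughly 50 inter-service dependencies, a budget of about 2.5 edges per service.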
What are you trying to protect yourself against?
1. Microservices imply distributed computing. So work with the grain on that - which is basically message passing with shared nothing resources. Most microservices try to do that so we are pretty good from a technical pov
2. Semantic loops - which is kind of what we are doing here with polytrees. This is really trying to model the business in software
Now here comes the hard part: this is not merely hard, it's sometimes bad politics to find out how a business really works. I think far more software projects fail because the business they are in is unwilling to admit it is not the shape they are telling the software developers it is. Politics, fraud, or anything in between.
It doesn't seem possible to maintain the property.
Let's say legal tells us we need a way to let a user delete all of their data. All data is directly or indirectly user data, so we need a request to go to all services.
Examine the first polytree example: https://bytesauna.com/trees/polytree.png
The delete request must go to at least n1 and n4, which can pass it down the hierarchy. If we add some deletion service that connects to both, it's no longer a polytree.
I suppose you could redesign your services to maintain the property, but that would be quite the expense.
It's about the same for most code all the way down to single threaded function flow.
I think what the article is doing wrong is treating all microservices the same.
Microservices can be split into at least 3 different groups:
- infrastructure (auth, messaging, storage etc.)
- domain-specific business logic (user, orders)
- orchestration (when a scenario requires coordination between different domains)
If we split it like this, it's evident that:
- orchestration microservices should only call business logic microservices
- business logic microservices can only call infrastructure microservices
- infra microservices are the smallest building blocks and should not call anything else
This avoids circular dependencies, decreases the height of the tree to 3 in most cases, and also allows you to "break" rule #2 in the article, because come on, no one is going to write several versions of auth just to make it a polytree. It also becomes clearer what a microservice should focus on when it comes to resilience/fault tolerance in a distributed environment:
- infra microservices must be most resilient to failure, because everyone depends on them
- orchestration microservices should focus on compensating logic (compensating transactions/sagas)
- business logic microservices focus on business logic and its correctness

Back in the day an OS called CTOS hosted what were essentially microservices. This acyclic problem was solved there by not letting the essential OS services ever wait on a service response. It simply registered the outstanding service request and went back to servicing its own request queue. I thought at the time, this was an elegant solution to the deadlock problem.
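The layering rule above (orchestration → business logic → infrastructure, with infra calling nothing) lends itself to a mechanical check, e.g. in CI. A sketch with a hypothetical layer map and hand-declared call graph:

```python
# Hypothetical services and layer assignments; the layering rule itself
# is the one from the comment above: orchestration may call business
# logic, business logic may call infra, infra calls nothing.
LAYER = {
    "checkout-flow": "orchestration",
    "orders": "business", "users": "business",
    "auth": "infra", "storage": "infra",
}

ALLOWED = {"orchestration": {"business"}, "business": {"infra"}, "infra": set()}

calls = [("checkout-flow", "orders"), ("orders", "auth"), ("orders", "storage")]

def violations(calls):
    # Flag every edge whose callee's layer is not permitted for the caller.
    return [(a, b) for a, b in calls if LAYER[b] not in ALLOWED[LAYER[a]]]

print(violations(calls))                          # [] -- layering respected
print(violations(calls + [("auth", "orders")]))   # infra calling up: flagged
```

A real check would derive `calls` from service manifests or traces rather than a hand-written list, but the rule itself stays this small.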
Here's a really simple way to get a cycle.
Service A: publish a notification indicating that some new data is available.
Service B: consume these notifications and call back to service A with queries for the changed data and perhaps surrounding context.
What would you recommend when something like this is desired?
Requiring that no service is depended on by two services is nonsense.
You absolutely want the same identity service behind all of your services that rely on an identity concept (and no, you can't just say a gateway should be the only thing talking to an identity service - there are real downstream uses cases such as when identity gets managed).
Similarly there's no reason to have multiple image hosting services. It's fine for two different frontends to use the same one. (And don't just say image hosting should be done in the cloud --- that's just a microservice running elsewhere)
Same for audit logging, outbound email or webhooks, acl systems (can you imagine if google docs, sheets, etc all had distinct permissions systems)
So a data flow path that is a dag. Yeah, sounds right.
Also seems close to Erlang / Elixir supervision trees, which makes sense as Erlang / Elixir basically gives you microservices anyway...
I have a question. Does the directed / no cycles aspect mean that webhooks / callbacks are forbidden?
I work a lot in the messaging space (SMS, email); typically the client wants to send a message and wants to know when it reached its destination (milliseconds to days later). Unless the client is forbidden from also being the report server, which feels like an arbitrary restriction, I'm not sure how to apply this.
Is there any way to actually enforce this in reality? Eventually some leaf service is going to need to hit an API on an upstream node or even just 2 leaf nodes that need to talk to each other.
All sounds like a good plan, but there's no easy way to enforce the lack of cycles. I've seen helper functions that call a service to look something up, called from a library that is running on the service itself. So a service calls itself. There were probably four or five different developers' code abstractions stacked in that loop.
Rule #2 sounds dumb. If there can't be a single source of truth, for let's say permission checking, that multiple other services rely on, how would you solve that? Replicate it everywhere? Or do you allow a new business requirement to cause massive refactors just to create a new root in your fancy graph?
If a service n4 can't be called by separate services n2 and n3 in different parts of the tree (as shown in counterexample #2), then n4 isn't really a service but just a module of either n2 or n3 that happens to be behind a network interface.
In reality their structure is much more like the Box with Christmas lights I just got from the basement. It would take a knot theory expert half a day to analyze what’s happening inside the box.
This seems completely wrong. In an RPC call you have a trivial loop, for example.
It would make more sense to say that the event tree should not have any cycles, but anyway this seems like a silly point to make.
Take Counterexample #2. Add n5 as another arrow from n3. That looks like a legitimate use case to me.
It's fine in theory; in practice it's not going to happen.
In most of the cases, authorization servers are called from each microservice.
evented systems loopback and it's difficult to avoid it, e.g.: order created -> charge -> charge failed -> order cancelled
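That loopback can be made concrete with a toy in-process event bus (all names hypothetical; real systems would use a broker, but the shape of the loop is the same):

```python
# Toy pub/sub showing the loopback: the orders side emits an event,
# billing reacts, and a billing failure comes back around to orders.
handlers = {}

def on(event):
    def register(fn):
        handlers.setdefault(event, []).append(fn)
        return fn
    return register

def emit(event, trace):
    trace.append(event)
    for fn in handlers.get(event, []):
        fn(trace)

@on("order_created")            # orders -> billing
def charge(trace):
    emit("charge_failed", trace)     # pretend the charge fails

@on("charge_failed")            # billing -> back to orders: the loop
def cancel_order(trace):
    trace.append("order_cancelled")

trace = []
emit("order_created", trace)
print(trace)  # ['order_created', 'charge_failed', 'order_cancelled']
```

In call-graph terms orders and billing now depend on each other, even though each individual event flow is perfectly linear, which is why evented systems defeat this kind of topology rule so easily.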
Services (or a set of Microservices) should mimic teams at the company. If we have polytree, that should represent departments.
Why do we use polytree in this context instead of DAG? Because nodes can’t ever come back together?
Oh that's weird, in the hacker news search index, this link was posted 4 days ago.
Good practical explanation of something I felt but couldn't put a name to.
Isn't it the same wisdom as to avoid cyclic dependencies?
Hi, this is my company blog. Hope you like this week's post.
The article is not wrong, but I feel like the polytree restraint is a bit forced, and perhaps not the most important concern.
You really need to consider why you want to use micro services rather than a monolith, and how to achieve those goals.
Here's where I'll get opinionated: the main advantage microservices have over a monolith is the unique failure modes they enable. This might sound weird at first, but bear with me. First of all, there's an uncomfortable fact we need to accept: your web service will fail and fall over and crash. Doesn't matter if you're Google or Microsoft or whatever, you will have failures, eventually. So we have to consider what those failures will look like, and in my book, microservices' biggest strength is that, if built correctly, they fail more gracefully than monoliths.
Say you're targeted by a DDOS attack. You can't really keep a sufficiently large DDOS from crashing your API, but you can do damage control. To use an example I've experienced myself: we foresaw an attack happening (it came fairly regularly, so it was easy to predict) and managed to limit the damage it did to us.
The DDOS targeted our login API. This made sense because most endpoints required a valid token, and without a token the request would be ignored with very little compute wasted on our end. But requests against /login had to hit a database pretty much every time.
We switched to signed JWT for Auth, and every service that exposed an external API had direct access to the public key needed to validate the signatures. This meant that if the Auth service went down, we could still validate tokens. Logged in users were unaffected.
Well, just as predicted, the Auth service got DDOSed and crashed. Even with auto-scaling pods and a service startup time of less than half a second, there was just no way to keep up with the sudden spike. The database ran out of connections, and that was pretty much it for our login service.
So, nobody could log in for the duration of the attack, but everyone who was already logged in could keep using our APIs as if nothing had happened. Definitely not great, but an acceptable cost, given the circumstances.
Had we used a monolith instead, every single API would've gone down, instead of just the Auth ones.
So, what's the lesson here? Services that expose external APIs should be siloed, such that a failure in one, or in its dependencies, does not affect other APIs. A polytree can achieve this, but it's not the only way to do it. And for internal services the considerations are different; I'd even go so far as to say simpler. Just be careful to make sure that any internal service that can be brought down by an attack on an external one doesn't bring other external services down with it.
So rather than a polytree, strive for siloes, or as close to them as you can manage. When you can't make siloes, consider either merging services or creating deliberate weak points to contain damage.
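The "validate tokens without calling the Auth service" idea from this comment can be sketched in pure Python. For brevity this uses an HMAC shared secret, where the commenter's setup used RS256 with a distributed public key; the token format and names are simplified stand-ins, not real JWT:

```python
import base64
import hashlib
import hmac
import json

SECRET = b"demo-shared-secret"  # stand-in for the distributed key material

def b64(data):
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def sign_token(claims):
    # Auth-service side: sign once at login.
    payload = b64(json.dumps(claims, sort_keys=True).encode())
    sig = b64(hmac.new(SECRET, payload.encode(), hashlib.sha256).digest())
    return f"{payload}.{sig}"

def validate(token):
    # Any API-service side: verification is local, no call to Auth needed,
    # so a crashed Auth service doesn't take logged-in users down with it.
    payload, sig = token.split(".")
    expected = b64(hmac.new(SECRET, payload.encode(), hashlib.sha256).digest())
    if not hmac.compare_digest(sig, expected):
        return None
    pad = "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(payload + pad))

token = sign_token({"sub": "user-42"})
forged = token.split(".")[0] + ".forged"
print(validate(token))   # {'sub': 'user-42'}
print(validate(forged))  # None
```

The asymmetric version the comment describes has the extra property that API services hold only the public key, so compromising one of them doesn't let an attacker mint tokens.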
What's wrong with just imposing a DAG?
Just imagine how many clients services like auth, notifications and so on have.
Polytrees look good, but they don't work for orthogonal services.
tl;dr: HTTP/REST model isn't great for federated services.
There are other microservice strategies that are built around a more federated model where even having full-on recursion is not a problem.
Avoiding cyclic dependencies is good, sure. And they do name specific problems that can happen in counterexample #1.
However, the reasoning as to why it can't be a general DAG and has to be restricted to a polytree is really tenuous. They basically just say counterexample #2 has the same issues with no real explanation. I don't think it does, it seems fine to me.