
Ask HN: New Architecture for Legacy Spaghetti - simonsaidit
Let's say you have this legacy spaghetti of hundreds of services and different data sources, and you want to start thinking about a better architecture, as there is a need to expose a lot of this data via APIs and get away from point-to-point integrations. It's different databases, data warehouses, and transports. It's in a state where people hardly know where to find the different data, and there is a lot of duplication and inconsistency.

What would be worth looking into going forward, to slowly transition to a somewhat standard way of handling integrations and data management?

Are there any buzzwords that would make this easier?

My own thoughts would be an integration platform, some API management product, an event-sourcing architecture... perhaps Kafka or similar, and a GraphQL API.

There is no way to migrate the data to some master data format, but are there ways to add a frontend on top of the different databases, e.g. PostgreSQL with external sources and GraphQL on top of Postgres, or does this sound crazy?

Is there data lake software, or are there graph databases, that could load all this data in its own format and connect the dots, or is it still best to leave it as it is and expose it via their own APIs?

Any ideas are welcome.
======
sethammons
Step one: high-level acceptance tests. Does the system as a whole do what it
should do from the end user's perspective? Add your highest-priority cases
first: the functionality that, if it breaks, puts your business at risk. Keep
adding more cases.

As for the many services, are these separated out to keep teams from stepping
on each other? I firmly believe that if you have many teams running their own
services, each service should control its own data store, and any
communication that does not need an immediate response should happen through a
message bus of some sort. If the response is needed in real time, a network
call makes sense. However! If you have microservice hell and it is not for
scaling teams, you might benefit from going back to fewer services doing more
things with shared data stores (but well-defined data boundaries and
packages). If you have a tangled mess of services calling each other, services
dependent on other services, and cyclical dependencies, something is wrong.
Dependencies should go in one direction. If two services share many
dependencies, it is worth asking if they should be combined into one service.
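
To make the "dependencies should go in one direction" check concrete, here is a minimal sketch that looks for a cycle in a service dependency map using a depth-first search. The service names and the `deps` map are made up for illustration; in practice you would have to build that map from your own configs or traffic data.

```python
from typing import Dict, List, Optional

WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, fully explored


def find_cycle(deps: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one dependency cycle as a list of service names, or None."""
    state: Dict[str, int] = {}
    path: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        state[node] = GREY
        path.append(node)
        for nxt in deps.get(node, []):
            seen = state.get(nxt, WHITE)
            if seen == GREY:
                # nxt is already on the current path, so we closed a loop.
                return path[path.index(nxt):] + [nxt]
            if seen == WHITE:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        state[node] = BLACK
        return None

    for service in deps:
        if state.get(service, WHITE) == WHITE:
            cycle = visit(service)
            if cycle:
                return cycle
    return None


# Hypothetical services: "orders" calls "billing", and so on.
deps = {
    "orders": ["billing", "inventory"],
    "billing": ["customers"],
    "customers": ["orders"],
    "inventory": [],
}
cycle = find_cycle(deps)
print(" -> ".join(cycle) if cycle else "no cycles")
```

Running something like this in CI against a checked-in dependency map is one cheap way to stop new cycles from creeping in while you untangle the old ones.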

Is there an API gateway that allows one place for requests to go? That could
help with the confusion of where to find data.
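
As a rough sketch of what "one place for requests to go" means, the core of a gateway is a route table that maps a public path to whichever backend owns that data. The hostnames and paths below are invented, and a real gateway product adds auth, rate limiting and so on; this only shows the longest-prefix lookup.

```python
from typing import Dict, Optional

# Hypothetical route table: public path prefix -> internal backend base URL.
ROUTES: Dict[str, str] = {
    "/customers": "http://crm.internal:8080",
    "/orders": "http://orders.internal:8080",
    "/orders/archive": "http://warehouse.internal:9000",
}


def resolve_backend(path: str, routes: Dict[str, str] = ROUTES) -> Optional[str]:
    """Return the backend URL for a request path, or None if nothing matches.

    The longest matching prefix wins, so "/orders/archive/..." goes to the
    warehouse backend rather than the generic orders service.
    """
    matches = [p for p in routes if path == p or path.startswith(p + "/")]
    if not matches:
        return None
    prefix = max(matches, key=len)
    return routes[prefix] + path


print(resolve_backend("/orders/42"))
print(resolve_backend("/orders/archive/2019"))
print(resolve_backend("/unknown"))
```

Consumers then only need to know the gateway's address and the public paths, not which database or service sits behind each one.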

Whatever the desired architecture is, migrating to it requires refactoring and
that requires tests. You need to be confident that a change does not break the
system.

~~~
marktangotango
I'd add that an incremental approach is also important: determine the 20% that
will give the biggest bang for the buck. Then the next 20%, etc.

------
verdverm
With GraphQL, you could provide a unified access point, and the resolvers can
go to different backend data stores. This does depend on the consumer.
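
The resolver idea can be sketched without any GraphQL library: each field has its own function, and each function is free to hit a different backend. Here an in-memory SQLite table and a plain dict stand in for two legacy stores; the table, field and function names are all made up for illustration, and a real GraphQL server would add schema, typing and query parsing on top.

```python
import sqlite3

# Backend 1: a relational store (in-memory SQLite stands in for a legacy DB).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
db.execute("INSERT INTO customers VALUES (1, 'Ada')")

# Backend 2: a dict standing in for another service or key-value store.
ORDERS_BY_CUSTOMER = {1: [{"order_id": 100, "total": 25.0}]}


def resolve_name(customer_id):
    """Resolve the 'name' field from the relational backend."""
    row = db.execute(
        "SELECT name FROM customers WHERE id = ?", (customer_id,)
    ).fetchone()
    return row[0] if row else None


def resolve_orders(customer_id):
    """Resolve the 'orders' field from the second backend."""
    return ORDERS_BY_CUSTOMER.get(customer_id, [])


# One resolver per field; the consumer never sees which store answered.
RESOLVERS = {"name": resolve_name, "orders": resolve_orders}


def fetch_customer(customer_id, fields):
    """Return only the requested fields, each fetched by its own resolver."""
    return {field: RESOLVERS[field](customer_id) for field in fields}


result = fetch_customer(1, ["name", "orders"])
print(result)
```

Migrating a field to a new store later means swapping one resolver, while the shape the consumer asks for stays the same.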

Someone will surely try to convince you of a data lake product. I would leave
them separate and then migrate components as appropriate.

Kubernetes and, more so, Istio / a service mesh are things to consider. Search
for cloud native design patterns and the paper by Brendan Burns.

------
was_boring
What you're looking for is not new or revolutionary. It's just an abstraction.
Read about the strangler pattern and it will get you pretty far.
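
A minimal sketch of the strangler idea, with made-up operation names: callers go through one facade, everything is served by the legacy code at first, and operations are switched over to new implementations one at a time.

```python
def legacy_get_customer(customer_id):
    # Stand-in for a call into the existing system.
    return {"id": customer_id, "source": "legacy"}


def legacy_get_invoice(invoice_id):
    return {"id": invoice_id, "source": "legacy"}


def new_get_customer(customer_id):
    # Stand-in for the rewritten implementation.
    return {"id": customer_id, "source": "new"}


class StranglerFacade:
    """Single entry point that prefers migrated handlers over legacy ones."""

    def __init__(self, legacy_handlers):
        self._legacy = dict(legacy_handlers)
        self._migrated = {}

    def migrate(self, operation, handler):
        """Route one operation to its new implementation from now on."""
        self._migrated[operation] = handler

    def call(self, operation, *args):
        handler = self._migrated.get(operation) or self._legacy[operation]
        return handler(*args)


facade = StranglerFacade({
    "get_customer": legacy_get_customer,
    "get_invoice": legacy_get_invoice,
})
print(facade.call("get_customer", 7))   # served by legacy
facade.migrate("get_customer", new_get_customer)
print(facade.call("get_customer", 7))   # served by the new code
print(facade.call("get_invoice", 3))    # untouched, still legacy
```

The point is that consumers only ever talk to the facade, so each migration step is small and reversible.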

~~~
simonsaidit
A lot of the current services and data will stay as is for now, but new things
need to be built on top of this mess. So one thing that would be of benefit
is to create a uniform interface to a lot of different things while
adding access control, security and such in a consistent manner. GraphQL is
one of the things I'm looking at, but data virtualization solutions like
Denodo or Red Hat, and stuff like Neo4j + GraphQL, also seem interesting. On
top of that we would consume and build new APIs and services with a more fixed
schema, adding API management.

At least my initial thought is that it would shorten the development cycle if
all data could be found in one place and accessed the same way, rather than
now, where people spend months just analyzing where to find it and how to
connect to it. I'm hoping someone has experience with some of these techniques
and can say something that's more than the marketing you read on a vendor site.

