
Ask HN: How do you harmonize user data? - hackerews
If you have an app you probably have user-related data sitting in a bunch of different places.

For instance, I've seen companies with app-specific user data sitting in a database, user sales info in Salesforce, user support info in Zendesk, user chat in Intercom, user design feedback in a Google Sheet, and the list goes on...

Keeping this info separate is bad user experience. People want things personalized. The support team should know about a user's latest design interview, and sales should know about their recent in-app behavior.

What are some best practices for harmonizing user-related data? I'd love to hear some success stories.
======
aartur
Generally the best advice is to conform to the SPOT (Single Point of Truth)
rule as much as possible. For example, the place DEFINING users' addresses can
be your own database and the data in Salesforce is only a REPLICATION of that
data. And the DEFINITION of chats is in Intercom and you only have an API for
RETRIEVING the data in your own code.

It helps a lot when the "owners" of data are defined that way, because it's
easy to reason about the flows. You can then have a set of APIs in your code
for accessing the data (be it master data, replicated data, or data fetched
live from an external system).
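
A minimal sketch of that idea in Python, assuming two in-memory dictionaries stand in for your own database and for Intercom. The names `REGISTRY`, `FieldSource`, `get_field`, and `owner_of` are made up for illustration and are not part of any real library:

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical in-memory stand-ins for the real systems.
OWN_DB = {"u1": {"address": "1 Main St"}}
INTERCOM = {"u1": {"chats": ["Hi, I need help"]}}


@dataclass(frozen=True)
class FieldSource:
    owner: str                      # system that DEFINES the field
    fetch: Callable[[str], object]  # how our code retrieves it


# One entry per field: exactly one owner, so there is a single point of truth.
REGISTRY: Dict[str, FieldSource] = {
    "address": FieldSource("own_db", lambda uid: OWN_DB[uid]["address"]),
    "chats": FieldSource("intercom", lambda uid: INTERCOM[uid]["chats"]),
}


def get_field(user_id: str, field: str) -> object:
    """Read a field through the registry, whichever system owns it."""
    return REGISTRY[field].fetch(user_id)


def owner_of(field: str) -> str:
    """Answer 'which system do I change this in?'."""
    return REGISTRY[field].owner
```

Callers only ever use `get_field`, so swapping a live fetch for a replicated copy later means changing one registry entry rather than every call site.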

------
codingdave
It is called a data warehouse (and/or Business Intelligence). You can have
disparate systems that each create their own data... and each datum comes from
the system that "owns" that data. But it gets aggregated to a central
location, which owns none of it, but pulls it together for reporting and
analysis.
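
As a rough illustration only, here is a tiny Python sketch of that shape, with an in-memory SQLite database standing in for a real warehouse. The table name `user_facts`, the sample rows, and the function names are assumptions made up for the example:

```python
import sqlite3

# Hypothetical extracts: (user_id, key, value) rows from each owning system.
SOURCES = {
    "salesforce": [("u1", "plan", "enterprise")],
    "zendesk": [("u1", "open_tickets", "2"), ("u2", "open_tickets", "0")],
}


def build_warehouse(sources):
    """Load rows from each owning system into one central table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE user_facts "
        "(user_id TEXT, source TEXT, key TEXT, value TEXT)"
    )
    for source, rows in sources.items():
        # Every row remembers which system owns it.
        conn.executemany(
            "INSERT INTO user_facts VALUES (?, ?, ?, ?)",
            [(uid, source, key, value) for uid, key, value in rows],
        )
    conn.commit()
    return conn


def customer_view(conn, user_id):
    """Return everything known about one user, across all sources."""
    cur = conn.execute(
        "SELECT source, key, value FROM user_facts "
        "WHERE user_id = ? ORDER BY source, key",
        (user_id,),
    )
    return cur.fetchall()
```

The central table owns nothing: it can be dropped and rebuilt from the sources at any time, which is what keeps the "owner" of each datum unambiguous.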

There are comments here giving some examples of products intended to help with
this, but this is one of those areas where the concepts are easy, but execution
is challenging. I'd recommend looking less at products, and more at theory to
decide how you want to approach it... and then go back and see if any of the
products match your decisions.

~~~
hackerews
I'd like to empower business teams to have a single view of the customer. When
providing support, to see how other teams have interacted with this person. To
see info from all of our apps, as well as add their own info in when needed.

A well-designed data warehouse is a myth. As long as I'm the person the
business needs to call both to get data (e.g. via SQL) and to add new data
(e.g. a new ETL process), we'll continue to be f'd.

~~~
codingdave
That is quite a defeatist attitude. I've worked with well-designed data
warehouses. Sure, it is a skill to do it well, but isn't everything?

Even if you do need to develop new reports and inputs, so what? Those results
empower the business from then on out. And that is ignoring the fact that many
products exist to empower the users to develop their own reports.

------
seyz
A decade ago, SaaS companies didn't exist. People had all their data (support
tickets, payment details, analytics, …) in their own backoffice.

Then, SaaS came to remove tremendous pains…

• Paypal/Stripe/Braintree/… solve the payments nightmare.

• Zendesk/Intercom/Front/… solve the support nightmare.

• etc…

Nowadays, you have your data from your application in your backoffice and all
data from SaaS in their own web interface. As you perfectly said, keeping
this info separate is bad UX. It can be very difficult to make good business
decisions when data are split.

It is one of the main reasons I developed Forest
([http://www.forestadmin.com](http://www.forestadmin.com)). Forest helps web
businesses instantly have their own customizable admin interface
(backoffice). Forest connects your services (SaaS) to gather all the
intelligence in one place. You have the best of both worlds :-)

------
asimuvPR
In my case it's been a generic API that connects all services and feeds a
simple dashboard. Everything a user does is shown there in realtime. Data is
formatted for specific actions. For example: the user created a support
ticket. The generic API gets a signal with data from the ticket (user id, etc)
and adds some predefined tags to it and passes it to a dashboard widget. That
way the staff can see everything that's going on without excess information.
It works pretty well because the sales and support team can help each other
over the company chat. It was built in Python and uses websockets for the live
updates. Can't open source it due to it being a day job tool, but am working
on something similar (without the dashboard just yet).
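
Since the real tool can't be shared, here is only a hedged sketch of the flow described above (signal in, tags added, trimmed event pushed to widgets). Plain callbacks stand in for the websocket push, and `TAGS_BY_EVENT`, `format_event`, and `Hub` are invented names, not the actual implementation:

```python
from typing import Callable, Dict, List

# Hypothetical mapping from event type to the predefined tags staff should see.
TAGS_BY_EVENT: Dict[str, List[str]] = {
    "support_ticket_created": ["support", "needs-reply"],
    "purchase_completed": ["sales"],
}


def format_event(event_type: str, payload: dict) -> dict:
    """Keep only what the dashboard needs and attach predefined tags."""
    return {
        "type": event_type,
        "user_id": payload["user_id"],
        "tags": TAGS_BY_EVENT.get(event_type, ["untagged"]),
    }


class Hub:
    """Receives signals from services and pushes formatted events to widgets."""

    def __init__(self) -> None:
        self._widgets: List[Callable[[dict], None]] = []

    def subscribe(self, widget: Callable[[dict], None]) -> None:
        self._widgets.append(widget)

    def signal(self, event_type: str, payload: dict) -> dict:
        event = format_event(event_type, payload)
        for widget in self._widgets:
            widget(event)  # in a real setup this would be a websocket send
        return event
```

Dropping everything except the user id and tags in `format_event` is what keeps the dashboard free of excess information.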

~~~
hackerews
This sounds useful!

~~~
asimuvPR
It really is! I'm currently building a hacky version to get it out there and
see how it works out in the real world. Also doing client libraries in
multiple languages so people don't need to do a lot of boilerplate API code
just to get some JSON response.

------
tedmiston
The point you've raised about different types of data being stuck in different
services across teams is a real pain point, especially for startups as we tend
to embrace brand new tools.

Segment ([https://segment.com](https://segment.com)) is one attempt to be the
"customer data hub". It never quite did everything I wanted personally. If it
works for you, then you can hook it up to something like Keen
([https://keen.io](https://keen.io)) and get a dashboard with nice realtime
charts very easily.

My most common approach is to let the data sit in third party services, then
pull it together via APIs for reporting.

------
alexatkeplar
RJMetrics Pipeline -
[https://rjmetrics.com/product/pipeline/](https://rjmetrics.com/product/pipeline/)

Fivetran -
[https://www.fivetran.com/integrate](https://www.fivetran.com/integrate)

Segment Sources - [https://segment.com/sources](https://segment.com/sources)

Snowplow - [https://github.com/snowplow/snowplow/wiki/Setting-up-a-
Webho...](https://github.com/snowplow/snowplow/wiki/Setting-up-a-Webhook)

(Disclosure: Snowplow co-founder)

------
abengoam
Take a look at data virtualization. It's an alternative to data warehousing
without the burden of making multiple copies of the data; instead, it retrieves
the information from source systems in real time.

The "single view of customer" use case, which is the one you mention in your
post, is one of the oldest that this technology aimed to solve and it does
solve it very well.

Disclaimer: I work for Denodo ([http://denodo.com](http://denodo.com)).

~~~
hackerews
Does this actually work though? For instance, we may have user data in an API
that, thanks to arbitrary rate limits, isn't useful at all when querying in
real time.

~~~
abengoam
You mean not being able to access specific user data due to hitting the rate
limits of the API? In that case, you can always cache the data (using the
cache facilities of the data virtualization layer) and create a hybrid real
time/cached workflow that pulls data in real time when possible and from cache
for the APIs that have restrictions in place.
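
A minimal Python sketch of such a hybrid workflow, independent of any particular data virtualization product. `HybridSource`, `RateLimited`, and the `fetch_live` callable are hypothetical names; a real API client would signal rate limiting in its own way:

```python
import time
from typing import Callable, Dict, Tuple


class RateLimited(Exception):
    """Raised by a (hypothetical) API client when the rate limit is hit."""


class HybridSource:
    """Serve fresh cache hits, go live otherwise, fall back to stale data."""

    def __init__(
        self,
        fetch_live: Callable[[str], dict],
        ttl_seconds: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch_live = fetch_live
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache: Dict[str, Tuple[float, dict]] = {}

    def get(self, user_id: str) -> dict:
        cached = self._cache.get(user_id)
        now = self._clock()
        if cached and now - cached[0] < self._ttl:
            return cached[1]  # fresh enough: do not spend an API call
        try:
            data = self._fetch_live(user_id)
        except RateLimited:
            if cached:
                return cached[1]  # stale, but better than nothing
            raise  # nothing cached yet, so the caller must handle it
        self._cache[user_id] = (now, data)
        return data
```

The time-to-live is the knob that trades freshness against API calls: a short TTL behaves almost like real-time access, a long one like a periodically refreshed copy.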

------
tony-allan
There is no incentive for separate SaaS companies to use a shared source of
data. Each system will have data for its own needs and will manage that data
as it sees fit.

The best you could do is to have a web page for the customer that shows all of
the details from each app in the same place, using APIs from each SaaS
platform to fetch the user data from each.
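
Roughly, in Python, and assuming each SaaS API is wrapped in a function that takes a user id (the `fetchers` mapping and `customer_page` are illustrative names, not real client libraries):

```python
from typing import Callable, Dict


def customer_page(
    user_id: str, fetchers: Dict[str, Callable[[str], dict]]
) -> dict:
    """Collect one section per system; a failing system does not break the page."""
    page = {}
    for system, fetch in fetchers.items():
        try:
            page[system] = {"ok": True, "data": fetch(user_id)}
        except Exception as exc:  # broad on purpose: any one API may be down
            page[system] = {"ok": False, "error": str(exc)}
    return page
```

Each section stays labeled with the system it came from, so nothing pretends the data has been merged into a shared source.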

If you take an example of an email address, it seems like you could just
synchronise it across various systems but that ignores the different purposes
that each email address might have. For example, you might have a legitimate
reason to keep the email addresses used for financial apps separate from those
used for social networks and games.

------
zhte415
SAP is built for this, however:

Being in a very locked-down corporate environment, I used... Excel. A 'master
sheet' output various files to various directories that themselves were
permissioned for user access, updated once per minute. Which is a crux:

If you're trying to tie a range of systems you need to take into account:

* Permissions. If you have user-related data, you probably don't want everyone accessing and/or writing to this data;

* Data quality. You're lucky to get a unique ID across these systems, unless you've got a unique ID policy in place.
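
For the unique ID point, one common approach is an explicit mapping table from each system's local ID to a canonical one. A small Python sketch, where `IdCrosswalk` and the sample IDs are made up for illustration:

```python
from typing import Dict, Optional, Tuple


class IdCrosswalk:
    """Maps (system, local_id) pairs to one canonical user ID."""

    def __init__(self) -> None:
        self._to_canonical: Dict[Tuple[str, str], str] = {}

    def link(self, canonical_id: str, system: str, local_id: str) -> None:
        key = (system, local_id)
        existing = self._to_canonical.get(key)
        if existing is not None and existing != canonical_id:
            # Refuse silent re-linking: conflicts are a data quality problem
            # that a person should look at.
            raise ValueError(f"{key} already linked to {existing}")
        self._to_canonical[key] = canonical_id

    def resolve(self, system: str, local_id: str) -> Optional[str]:
        """Return the canonical ID, or None if this record is not linked yet."""
        return self._to_canonical.get((system, local_id))
```

Unlinked records resolve to `None` instead of being guessed at, which keeps bad matches out of the combined view.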

------
gbrits
Good question. I've seen some CRM-like systems such as Pipedrive come close to
capturing both traits (static features) and behavior/events of
prospects/users. For capturing and using slightly more structured data, e.g.
1-n relations such as user-survey or user-emailOpen, even Pipedrive (the most
flexible I know of, although that doesn't say much) falls short. I don't know
of any 'customer/contact/profile' hub-like tool that does all of this.

------
twunde
One way to deal with this is to do bulk data downloads into a data warehouse
or your main database. Alternatively, you can spend time integrating the third
party into your systems.

------
dfischer
Domo is working on this I think.

------
hga
While it doesn't cover all user data by any means, one of the secrets to
Goldman Sachs' success, and survival in 2008 plus or minus, is their SecDB
securities database implemented in their own Securities LANGuage, SLANG. It
allows them to know all of their positions, play what-if games relatively
quickly, etc., without requiring manual scraping of Excel spreadsheets, etc.

It is, at minimum and for all its present day technical deficits, a big
success story.

