The main reason is that Google Analytics took many years to build, and Google has the advantage of being able to provide extra information about users, such as demographic data (gender, age, etc.), because it already has all of that data in place.

Having said that, there is no need to create an exact copy of Google Analytics, because most people probably use only about 20% of its features anyway. Each business has its own use case and data sources, so it is much more convenient to ingest all the raw event data into your own data warehouse, using either third-party tools such as Segment or open-source tools such as Snowplow and Rakam. This is the only way to have full control over your data:

1. If you don't want to store sensitive user data, just don't send it to your servers.

2. Create the reports either with SQL or with something like Rakam, which gives you an interface similar to Amplitude / Mixpanel but runs on top of your data warehouse, so you don't need to share your data with a third-party service.
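
To make the two points above concrete, here is a rough sketch in Python. It is only an illustration under my own assumptions: an in-memory SQLite database stands in for the data warehouse, and the field names (`user_id`, `event_type`, `day`), the `SENSITIVE_FIELDS` list, and the helper functions are all hypothetical, not part of Segment, Snowplow, or Rakam.

```python
import sqlite3

# Hypothetical set of properties we never want to reach our servers.
SENSITIVE_FIELDS = {"email", "ip_address", "full_name"}


def scrub(event):
    """Return a copy of the event with the sensitive properties dropped."""
    return {k: v for k, v in event.items() if k not in SENSITIVE_FIELDS}


def load_events(conn, events):
    """Store scrubbed raw events in a simple events table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events "
        "(user_id TEXT, event_type TEXT, day TEXT)"
    )
    conn.executemany(
        "INSERT INTO events (user_id, event_type, day) "
        "VALUES (:user_id, :event_type, :day)",
        [scrub(e) for e in events],
    )


def daily_active_users(conn):
    """A typical report in plain SQL: distinct users per day."""
    return conn.execute(
        "SELECT day, COUNT(DISTINCT user_id) "
        "FROM events GROUP BY day ORDER BY day"
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # stand-in for a real warehouse
    load_events(conn, [
        {"user_id": "u1", "event_type": "pageview", "day": "2024-01-01",
         "email": "someone@example.com"},
        {"user_id": "u2", "event_type": "pageview", "day": "2024-01-01"},
    ])
    for day, users in daily_active_users(conn):
        print(day, users)
```

In a real setup the scrubbing would happen in the client or SDK before the event is sent, and the query would run against whatever warehouse you already use; the point is only that once the raw events are yours, a report is just a query.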

Shameless plug: I work for the company behind Rakam (https://rakam.io).

