Ask HN: Would you try an app to consolidate and search your digital footprint?
6 points by thatgurjot 7 months ago | 24 comments
The other day I spent a couple of hours trying to extract the data for my runs from Nike Run Club's servers. They have shut down their public API but thankfully some GitHub repos presented working solutions.

As I browsed through ten years of my run data, I had the idea of creating software that lets you extract all (or most) of your online data and then search it using natural language queries. It is intended to help you piece together the puzzle of your digital footprint/history.

For example, by integrating with cloud storage (Dropbox/Drive), browser history (Firefox/Chrome), fitness apps (NRC/Strava), blogging/note-taking software (Obsidian/Bear), bookmark services (Pocket/Omnivore) and entertainment apps (YouTube/Spotify), you should be able to get answers to questions like –

* "What was my average running pace in July 2019? Show me all of my trail runs from that time."
* "What all recipes did I save in 2020?"
* "When did I start writing about TypeScript? Which tech blogs was I reading at that time?"
* "What videos was I watching between 2015-2017?"
* "Show me the artists I discovered on Spotify during the pandemic."

My intention is to build a local-first and privacy-first solution with a simple SQLite database. The user should be able to do whatever they want with the extracted data. A savvy user might build their own GUI, while a less savvy one might just want to archive their data on a personal storage server.
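A minimal sketch of what such a local SQLite store could look like, using Python's built-in `sqlite3` module. The `events` table, its columns, and the helper names `open_store` and `save_event` are assumptions made up for illustration, not the schema the app actually uses.

```python
import json
import sqlite3

# Hypothetical schema: one generic "events" table shared by every connector.
SCHEMA = """
CREATE TABLE IF NOT EXISTS events (
    id          INTEGER PRIMARY KEY,
    source      TEXT NOT NULL,   -- e.g. 'strava', 'spotify', 'firefox'
    kind        TEXT NOT NULL,   -- e.g. 'run', 'track_saved', 'page_visit'
    occurred_at TEXT NOT NULL,   -- ISO 8601 timestamp
    title       TEXT,
    payload     TEXT NOT NULL    -- original record, kept as JSON
);
CREATE INDEX IF NOT EXISTS idx_events_time ON events (occurred_at);
"""


def open_store(path=":memory:"):
    """Open (or create) the local database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def save_event(conn, source, kind, occurred_at, title, record):
    """Store one extracted record; the raw data stays available in payload."""
    conn.execute(
        "INSERT INTO events (source, kind, occurred_at, title, payload) "
        "VALUES (?, ?, ?, ?, ?)",
        (source, kind, occurred_at, title, json.dumps(record)),
    )
    conn.commit()
```

Keeping the untouched record in `payload` means nothing is lost if the common columns turn out to be too narrow, and the file stays readable by any SQLite tool the user prefers.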

Does this sound like a good idea? Is it something that you would want to try? Do similar solutions already exist?

P.S. When I told my brother about it he called it "the god app" which I thought was pretty funny and accurate.




Would I use it? Probably once just out of curiosity.

Do I think I'll see it in my lifetime? Not a shot. The odds of someone gaining access to all the data in all the services I use/used have to be close to nil. Companies aren't going to give up their user data, that's their bread and butter. Each service will need to have custom extractors written, and likely rewritten in a never-ending game of cat and mouse. That's even if you don't get sued for accessing their systems to extract the data they have on you.

Then there's the storage required if you are able to get all the data. I requested my data from Apple a few years ago. It was something like 10GB of information. I assume shopping, social media, fitness, vehicle, mapping, etc. services have similar amounts of data on me. I wouldn't be surprised if the average digital identity has 1TB+ of data associated with it. Then, you have to normalize all the data. Each service is going to have the data in its own format with its own nuances, which will be a huge pain to get into a single searchable format.
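To make the normalization problem concrete, here is a rough sketch of mapping two differently shaped run records onto one common shape. Both input formats, the field names, and the `service_a` / `service_b` labels are invented for illustration; real exports will differ and need their own mapping.

```python
from datetime import datetime, timezone


def normalize_run_a(record):
    """Hypothetical service A: epoch milliseconds, distance in metres."""
    started = datetime.fromtimestamp(record["start_epoch_ms"] / 1000, tz=timezone.utc)
    return {
        "source": "service_a",
        "started_at": started.isoformat(),
        "distance_km": record["distance_m"] / 1000,
        "duration_s": record["duration_ms"] / 1000,
    }


def normalize_run_b(record):
    """Hypothetical service B: ISO 8601 timestamps with offset, distance in miles."""
    started = datetime.fromisoformat(record["start_time"]).astimezone(timezone.utc)
    return {
        "source": "service_b",
        "started_at": started.isoformat(),
        "distance_km": record["miles"] * 1.609344,  # exact miles-to-km factor
        "duration_s": record["elapsed_seconds"],
    }
```

Each service needs its own small function like these, which is the per-service maintenance cost described above, but everything downstream only has to understand one record shape.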


That’s a fair point! Exactly what my brother pointed out when he called it the “god app”.

However, this isn’t intended to be the “ultimate” app with your entire digital footprint logged to a database. It’s supposed to have a few meaningful connectors to services that people would actually be interested in. And a lot of these services offer public APIs as well (Spotify, Omnivore, Strava). Plus, browser history and local files (like markdown notes) are easy to access.

Do you think this reduced scope would be of any interest to you?


Sounds like a good idea with some hard challenges. Cloud storage, browser history, bookmarks, and blogging/notes will generally be easier. Fitness won't be unless you're using Apple Health or Garmin. It sounds a lot like IFTTT with some AI thrown in.

What have you done so far?


I would say that it is different from IFTTT and Zapier because it is not meant to “do something” when something happens in your cloud service. It’s only meant to periodically save your data in a local database on your computer. Plus there’s a GUI that can help you query that data (which is where the AI can come in, if necessary).

So far I have worked out the connectors for a handful of services (prioritising those with local data and public APIs) and a general-purpose database schema to store and query the data.

I have an existing app (made using the Tauri framework in Rust + Svelte) that acts as a local file search engine that I hope to use as a starting point.


Very cool. Consider posting screenshots to the thread!

I do a bunch of this, in a set of messy Python scripts that need refactoring, for personal use. It's definitely largely doable, but not without its challenges for some services. Thankfully, at a personal level, I can choose not to use those services and move to something that is more open.


Thanks for the vote of confidence! I’ll post screenshots when I have something more than haphazardly put together Python scripts :)

You are right about switching to an open service! I am prioritising such services, like Omnivore[1] for bookmarking.

[1]: https://omnivore.app


This is called prototyping and there's not a thing wrong with it. Good luck and keep us updated.


I have one ... it's called IFTTT.com

heck! I have a couple different accounts on it for different purposes :)


But IFTTT or Zapier don’t have the integrations for exporting your historical data, right? At least I didn’t find those plugins!


Nope, definitely not. Not going to give all my data to some app.


The app is only going to extract your data from those services and save it to a database on your local device. I am not proposing a rent-seeking SaaS solution. Just an open source solution for a problem, say like Calibre is for ebooks.


Yeah, still wouldn't do that. A vulnerability with just that app is enough to get my data from all those other places, too risky for me


Fair point. Thanks for sharing your perspective!

Would you consider it a risk even if the app was open source?


Isn't this what Microsoft was trying to build with Recall?


I’d differ. Recall is more akin to what Rewind AI[1] is building. I don’t care to know what happened in a chat box in my Zoom call last Friday. I just want to be able to explore structured data that I have generated across different services.

For example, Nike Run Club offers some stats on the app but I’d like to be able to do more. Spotify shows me I follow this artist but I’d like to know since when have I followed them. YouTube has my watch history but I’d like to be able to process what kind of content I was consuming during a certain time period.

[1]: https://www.rewind.ai/


So not a point in time snapshot to go back to, but natural language analysis of aggregate data? I can see how that'd be useful.


Exactly! Consider you have an SQLite database on your computer with your aggregated data from Strava, Spotify, and everything else. You do whatever you want with it. The app will come with a GUI that will allow some additional analysis or querying but you do your thing if you like!
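As a rough illustration of "do whatever you want with it", here is how the earlier "average running pace in July 2019" question could be answered with plain SQL from Python. The `runs` table, its columns, and the sample rows are all made up for this sketch and are not the app's real schema or anyone's real data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table of already-aggregated runs.
conn.execute(
    "CREATE TABLE runs (started_at TEXT, distance_km REAL, duration_s REAL, terrain TEXT)"
)
# Made-up sample rows, only here so the queries have something to read.
conn.executemany(
    "INSERT INTO runs VALUES (?, ?, ?, ?)",
    [
        ("2019-07-04T06:30:00", 10.0, 3000.0, "trail"),
        ("2019-07-20T07:00:00", 5.0, 1500.0, "road"),
        ("2019-08-01T06:45:00", 8.0, 2880.0, "trail"),
    ],
)


def average_pace_min_per_km(conn, month):
    """Total minutes divided by total km for a 'YYYY-MM' month; None if no runs."""
    row = conn.execute(
        "SELECT SUM(duration_s) / 60.0 / SUM(distance_km) "
        "FROM runs WHERE strftime('%Y-%m', started_at) = ?",
        (month,),
    ).fetchone()
    return row[0]


def trail_runs(conn, month):
    """Start times of all trail runs in the given 'YYYY-MM' month."""
    rows = conn.execute(
        "SELECT started_at FROM runs "
        "WHERE terrain = 'trail' AND strftime('%Y-%m', started_at) = ? "
        "ORDER BY started_at",
        (month,),
    ).fetchall()
    return [r[0] for r in rows]
```

A natural language layer would only need to translate the question into a query like these; the data itself stays in an ordinary SQLite file.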


Yeah, that'd be cool. I'd love to feed it my fitness, health, etc. data.

I feel like it was always a missed opportunity with Google... they had all my Google Fit, Google Health, 23andme, Gmail (and thus order confirmations, healthcare appointments), text messages, Google Maps locations, etc. data that I expressly opted-in for, hoping that one day they'd be able to use that to provide actionable insights. But nope, it was all just ads and spam :(


Yup! Are there any specific services that you use regularly? Ones that you would like to use in this “god app”?

For me it is: Nike Run Club, Firefox, Omnivore, YouTube, Spotify and Apple Health.


Yes, Google Fit and Google Maps (now that the timeline data is on-device). I also use RideWithGPS instead of Strava, but never tried to see if it has historical ride data.

But the big one for me is really Google Maps location history. It's tracked everywhere I've been for the last 15 or so years, down to the hour, and it's really useful for figuring out when exactly a certain trip was, or that car accident, or the last time I visited a friend or whatever.


Ah that’s quite an interesting use case. I have never used Google Maps as extensively - very point to point usage for me. But I’ll check their API out! Thanks for sharing!


FYI, before you go digging too far, the shift to on-device Timeline storage was recent (last month or so), so I don't know if they have a public API for interacting with it yet. Prior to that, it was a proprietary in-app thing that lived only in the cloud and could only be accessed via the web or mobile apps.

I am not sure what format that on-device data is, or if it might be accessible via Google Takeout (their data export platform) instead of an API.


Oh I see. Thanks for letting me know. If it’s obscured behind a cloud like Nike Run Club now, there might still be a way. But maybe I’ll start with simpler services with an existing public API like Strava :)


It looks like they're still figuring out the transition :(

I just tried a Google Takeout export of the Timeline data, but it was an empty skeleton of the previous Timeline that just has a text file saying "You have encrypted Timeline backups stored on Google servers."

I have no idea where that actual backup is, either on the phone or on the server. There seems to be no way to access it anymore, lol =/

Anyway, never mind my niche use case. Good luck with the app and excited to see how it comes out!



