Support ticket for de-duplication of files within (Google+) Takeout archives (fixato.org)
1 point by dredmorbius on Dec 17, 2018 | 1 comment


As the Google+ shutdown looms, users and communities are wrestling with whether, and how, to archive their data.

One problem we're encountering: Google's Data Take Out pads its archives with gigabytes of redundant data, mostly image files. Archives of up to half a terabyte have been reported.

People are trying to access this data over mobile links, metered links, slow residential broadband, dial-up, or unstable and slow links in Africa, Asia, and Indonesia, with little success. De-duplicating data would help tremendously.
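To make "de-duplicating" concrete, here is a minimal sketch of content-based duplicate detection, assuming an archive that has already been downloaded and extracted to a local directory. It only illustrates the technique and measures how much of an archive is redundant; it cannot shrink the download itself, which would need the same idea applied on Google's side before the archive is built. The function names (`file_digest`, `find_duplicates`, `redundant_bytes`) are illustrative and not part of any Takeout tooling.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root):
    """Group files under root by content; keep only groups of 2+ files."""
    # Cheap first pass: files with a unique size cannot be duplicates.
    by_size = defaultdict(list)
    for p in Path(root).rglob("*"):
        if p.is_file():
            by_size[p.stat().st_size].append(p)

    # Second pass: hash only the files that share a size with another file.
    by_hash = defaultdict(list)
    for paths in by_size.values():
        if len(paths) < 2:
            continue
        for p in paths:
            by_hash[file_digest(p)].append(p)

    return {d: sorted(ps) for d, ps in by_hash.items() if len(ps) > 1}


def redundant_bytes(groups):
    """Bytes that would be saved by keeping one copy per duplicate group."""
    return sum(ps[0].stat().st_size * (len(ps) - 1) for ps in groups.values())
```

Running `find_duplicates("Takeout")` (the directory name is an assumption) and passing the result to `redundant_bytes` gives a rough figure for how much of an extracted archive is repeated content.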

My own archive was 15 GB, of which over 95% was images, most of little value, though I've no way of excluding them from the collection. The textual content I'm interested in is under 500 MB (and probably a small fraction of that).

Google have been almost completely noncommunicative since the G+ shutdown was announced, with two exceptions of which I'm aware.

The first shortened the platform's time-to-live by 4 months, a 40% reduction from the initial notice.

The second ... slightly ... improved instructions for Data Take Out.

(I'm moderator of the Google+ Mass Migration community and am helping coordinate other activities in moving people off Google+, preserving and porting data, and keeping communities together.)

In the event the server is hugged to death: https://web.archive.org/web/20181217103934/https://fixato.or...



