
Support ticket for de-duplication of files within (Google+) Takeout archives - dredmorbius
https://fixato.org/temp/archived/gplus-feedback-takeout-duplicates-usage-analysis.html
======
dredmorbius
As the Google+ shutdown looms, users and communities are wrestling with
whether, and how, to archive their data.

One problem we're encountering: Google Takeout pads its archives with
_gigabytes_ of redundant data, mostly image files. Archives of up to half a
terabyte have been reported.

People are trying to access this data over mobile links, metered links, slow
residential broadband, dial-up, or unstable and slow links in Africa, Asia,
and Indonesia, with little success. De-duplicating data would help
tremendously.
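For anyone who has already downloaded and extracted an archive, duplicates can be detected locally by content hash. This is a minimal sketch, not part of Takeout itself; the function names are illustrative, and it assumes exact byte-for-byte duplicates (which is what Takeout's redundant copies are):

```python
import hashlib
import os
from collections import defaultdict

def find_duplicates(root):
    """Group every file under `root` by SHA-256 content hash.

    Returns a dict mapping hash -> list of paths; any list with more
    than one entry is a set of exact duplicates.
    """
    by_hash = defaultdict(list)
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            h = hashlib.sha256()
            with open(path, "rb") as f:
                # Hash in 1 MiB chunks so large media files don't
                # need to fit in memory.
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    h.update(chunk)
            by_hash[h.hexdigest()].append(path)
    return by_hash

def reclaimable_bytes(by_hash):
    """Bytes saved by keeping only one copy of each duplicated file."""
    return sum(
        (len(paths) - 1) * os.path.getsize(paths[0])
        for paths in by_hash.values()
        if len(paths) > 1
    )
```

Deleting all but the first path in each duplicate group (or replacing the extras with hardlinks) would then shrink the extracted archive without losing content, though it doesn't help with the download itself, which is where server-side de-duplication is needed.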

My own archive was 15 GB, of which over 95% was images, most of little value,
though I've no way of excluding them from the collection. The textual content
I'm interested in is under 500 MB (and probably a small fraction of that).

Google have been almost completely noncommunicative since the G+ shutdown was
announced, with two exceptions of which I'm aware.

The first shortened the time-to-live of the platform by another 4 months, a
40% reduction over the initial notice.

The second ... slightly ... improved the instructions for Takeout.

(I'm moderator of the Google+ Mass Migration community and am helping
coordinate other activities in moving people off Google+, preserving and
porting data, and keeping communities together.)

In the event the server is hugged to death:
[https://web.archive.org/web/20181217103934/https://fixato.org/temp/archived/gplus-feedback-takeout-duplicates-usage-analysis.html](https://web.archive.org/web/20181217103934/https://fixato.org/temp/archived/gplus-feedback-takeout-duplicates-usage-analysis.html)

