
This kind of replication, which was possible earlier, will not happen in the future, thanks to closed proprietary systems. YouTube is a wonderful library of knowledge that we are legally prevented from making a copy of.

There needs to be bottom-up and top-down pressure for open data standards, and also a re-thinking of digital ownership rather than digital licensing. We think people don't care, but the engineers who build these systems are a tiny minority; we only need to convince them to refuse to build walled gardens.




Nobody needs legal permission to grab all the YouTube content they want to keep. Just dedication, youtube-dlp, and a ginormous RAID array full of empty drives. This is surprisingly affordable as a hobby.


Are you aware of a single person anywhere who has mirrored the entirety of youtube? There is obvious value in having a mirror like that, so why hasn't it been done?

Downloading a handful of videos you personally care about is surprisingly affordable as a hobby. Mirroring and archiving the entirety of youtube is not.

Personally, although I couldn't say it was a hobby, I don't watch youtube on youtube anymore. Every youtube video I watch is downloaded first and viewed locally. I can't recommend it enough. Zero youtube comments, zero recommendations, VLC is a far better video player, Google has no idea how many times I've watched a video (or parts of it) or how I felt about it, and I never have to worry about videos I find valuable being removed. As long as I keep backing them up, I'll have them for as long as I care to.
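
For anyone who wants to try the same workflow, here is a minimal sketch using yt-dlp's Python API; the folder layout, the 1080p cap, and the example URL are my own placeholders, not anything the tool mandates:

    # Minimal sketch: pull a video into a local archive for offline viewing.
    # Requires: pip install yt-dlp
    from yt_dlp import YoutubeDL

    opts = {
        # hypothetical layout: one folder per uploader
        "outtmpl": "youtube-archive/%(uploader)s/%(title)s [%(id)s].%(ext)s",
        # cap at 1080p to keep the archive manageable
        "format": "bestvideo[height<=1080]+bestaudio/best",
        # remember what has already been grabbed so re-runs skip it
        "download_archive": "youtube-archive/downloaded.txt",
    }

    with YoutubeDL(opts) as ydl:
        ydl.download(["https://www.youtube.com/watch?v=EXAMPLE_ID"])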


I don’t think that there is any value in grabbing all of YouTube, and neither did I suggest doing that. I meant content that specifically interests you, or is likely to interest you (or is otherwise valuable as a historical record). “All videos” means including videos that have next to zero views, sensational clickbait, Elsagate/baby content, and a whole host of other unpleasant things. Most YT archives/hoarders are selective for that reason (and because space, while cheap, is not infinite).

I quite enjoy using NewPipe on Android. Once you build up a list of subscriptions, it’s by far the most peaceful way to consume YouTube on a smartphone.


  > VLC is a far better video player

Not for my use case, but maybe someone here has a solution. I watch lectures and lessons, and as I watch I change the playback speed constantly. I use a Firefox add-on for keyboard control of the YouTube video stream speed.

VLC also has keyboard control of the playback speed. However, when changing the speed VLC skips a split second of audio. This drawback negates all the benefits of playing the non-essential parts faster, because when we get to an essential part I'll lose some of it. This is on Kubuntu, across many versions over the years.


MPV is the gold standard video player on Linux. { and } halve and double playback speed, respectively.


Also [ and ] adjust the playback speed by 10%, and backspace resets the playback speed to 1x.


MPV is great on windows too. It'd be nice if the interface was more discoverable, but the responsiveness and format support are amazing.


I haven't run into that myself, but I think I'd just hit shift+left arrow (or just left arrow depending on what you've got the jump set to) before hitting + to speed the video back up. I'll take a minor inconvenience like pressing an extra key for all the other features I get.

You can probably also create a single macro to do both actions with one keypress, although not with VLC alone. That's fair enough, since you're already using an add-on for functionality you can't get from YouTube's player anyway.
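
If you end up in mpv for this (as the sibling comments suggest), a single binding can do both in one keypress. A hypothetical ~/.config/mpv/input.conf entry (the key choice and seek distance are arbitrary):

    # jump back 5 seconds and reset to 1x speed with one key
    # (mpv allows several commands per binding, separated by ";")
    F1 seek -5; set speed 1.0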


You don’t need everything but you can curate anything.

My neighbor has a collection of 12 years of city council meetings, for example.


r/datahoarders will kindly and rightly disagree with you in that regard.

For an enthusiast, a 720TB array is pretty reachable. A dedicated enthusiast can get a 1PB flash array in 2U.


According to this article [1], 500 hours of video are uploaded to YouTube every minute. Depending on the video size and framerate, YouTube recommends up to 240 Mbps for 8k@60FPS [2]. Of course most video isn't that high res. Let's take a conservative guess that it averages somewhere between 2K and 4K and pick a middle bitrate of 24 Mbps. That's:

      24 Mbps / 8 bit/byte * 60 seconds/minute * 60 minute/hour
    = 10800 megabytes per hour of footage
    = 10.8 gigabytes per hour of footage
At 500 hours of footage per minute, that means 5.4 terabytes are uploaded every minute. Your 720 TB array would be completely full after a little over two hours' worth of the content that is uploaded to YouTube every single day, day after day.

At the current upload rate, 2,838.24 petabytes are uploaded every year.
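
A quick sanity check of those figures in Python, under the same assumptions (24 Mbps average bitrate, 500 hours uploaded per minute):

    # Back-of-the-envelope check of the figures above.
    MBPS = 24                # assumed average bitrate, megabits per second
    HOURS_PER_MINUTE = 500   # hours of footage uploaded per real-time minute

    gb_per_hour = MBPS / 8 * 3600 / 1000                    # 10.8 GB per hour of footage
    tb_per_minute = gb_per_hour * HOURS_PER_MINUTE / 1000   # 5.4 TB uploaded per minute
    pb_per_year = tb_per_minute * 60 * 24 * 365 / 1000      # ~2838.24 PB per year
    hours_to_fill_720tb = 720 / tb_per_minute / 60          # ~2.2 hours to fill 720 TB

    print(gb_per_hour, tb_per_minute, pb_per_year, hours_to_fill_720tb)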

I don't think you'll see hobbyist archives of YouTube any time soon.

[1]: https://www.tubefilter.com/2019/05/07/number-hours-video-upl...

[2]: https://support.google.com/youtube/answer/1722171?hl=en#zipp...


For archival, 24 Mbps is an insane bitrate; you could get away with a third to a sixth of that.

You're also going to limit yourself to public videos (unlisted and private will make up some share of those uploads), and probably to those with non-zero views.

I suspect archiving only videos with >100 views would probably cut the amount you archive to 1/10th.


This is a distinction without a difference; you’re still talking about O(1 PB/day).


Some back-of-the-envelope math shows that YouTube would have to populate and rack a minimum of four 4U storage chassis (60 20TB drives each) per 8 hour shift to store that much. Roughly a little less than half a 42U rack. And that's before allowing for HDD drive parity, redundancy, and distribution across the globe.


1080p webm is around 450 kbps and audio is 65 kbps, so the estimate is off by roughly 50x for the purposes of a hobbyist archive.


YouTube offers a number of codecs and bitrates. IIRC Opus goes up to 160 kbps and m4a goes up to 128 kbps, with lower bitrates also available. I imagine video is similar.


I wonder how much of the uploaded content is public versus private or unlisted.


I don't think anybody on /r/datahoarders believes it's possible for a private individual to archive the entirety of Youtube. More than 200,000 hours of video are uploaded every day. Generously assuming something like 1 GB/hour for 1080p, that's 200TB per day that you have to add. No home array can handle that.


This is a weird Hacker News phenomenon where two sides of a discussion each present the technical aspects of the thing they want to do, and are correct in their descriptions of those technical aspects, without addressing the fact that they are talking about accomplishing totally unrelated objectives.

It is probably possible to hoard more YouTube videos than you could ever watch, probably including most of the ones that you might ever be interested in. And it is almost certainly impossible for any individual to capture every video which goes through YouTube.

Neither of these seem to address the issue of whether there exist videos which will retrospectively have archival value which are not captured.


> probably including most of the ones that you might ever be interested in.

That’s a really interesting question: how to determine videos that I might ever be interested in.

> whether there exist videos which will retrospectively have archival value which are not captured.

And that’s not really a question: there definitely exist videos with a certain historical value that were deleted from YouTube, most of them before I got around to archiving them, because I am lazy.

I would gladly pay for a personal archive.org - a solution that automatically archives each page I visited and video I watched. I guess the required storage amount will be pretty affordable.


> > whether there exist videos which will retrospectively have archival value which are not captured.

> And that’s not really a question: there definitely exist videos with a certain historical value that were deleted from YouTube, most of them before I got around to archiving them, because I am lazy.

Sure, but you aren't the only one backing up YouTube videos. It seems at least plausible that the aggregate storage capacity of the entire data hoarder community, and their propensity for backing up whatever they come across, could result in a situation where, if something is interesting, somebody ends up capturing it, right?


There are some communities with a collaborative video index of who has what backed up.


> Neither of these seem to address the issue of whether there exist videos which will retrospectively have archival value which are not captured.

Unless the entirety of youtube can be archived, it's safe to assume that there will be something of value which isn't being preserved. It's an unsolved problem, and not one Google wants to see solved.


Don't worry, only a very small subset is worth keeping


That’s what you think now.


So true. I look at photos I've taken in the past, and discovered that I took pictures of all the wrong things.


Even now, the most garbage youtube video out there is still probably useful as training data for some AI (maybe even to generate horrible youtube videos)


I have 4PB, but my understanding is that I would fill that mirroring a single day of YouTube at reduced quality. The Internet Archive could surely handle mirroring older YouTube content with a large grant. But the upload rate plus video quality in recent years is definitely cost prohibitive to replicate.


The point I was trying to make is that data hoarders collectively archive a lot of data. They have their own interests and tend to mirror content from these sites that matches those interests. As a result, they can mirror a substantial number of videos, and a lot of knowledge, from any gigantic site.

I'm aware that even a horde of data hoarders can't archive more than a drop in the YouTube ocean, but by archiving high-impact channels they can back up information that is both important and vast.


The size of YouTube is likely measured in exabytes. I think it would be hard for any entity that was not organized and well-funded to mirror all of it, let alone make it available in a reasonable fashion.


Note: I would encourage switching over from youtube-dlp to yt-dlp (https://github.com/yt-dlp/yt-dlp)

As I understand it, yt-dlp is considerably faster.


Just curious, why either of these over vanilla youtube-dl? Wondering if I should update my playlist downloader script


It’s far easier to grab an “archival” grade copy of a YouTube video (includes thumbnail, subtitles, metadata, etc) and ask the program to embed all the data within the video file itself. It can even remux all videos into a selected container format, which is really nice.
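
For illustration, a sketch of what that might look like through yt-dlp's Python API. The postprocessor keys below are my best understanding of how the --embed-subs, --embed-metadata, --embed-thumbnail, and --remux-video CLI flags map onto it, so treat them as an approximation rather than the canonical incantation:

    # Sketch: "archival grade" grab with subtitles, metadata, and thumbnail
    # embedded, remuxed into mkv. Assumes: pip install yt-dlp, ffmpeg on PATH.
    from yt_dlp import YoutubeDL

    opts = {
        "writesubtitles": True,    # fetch subtitles so they can be embedded
        "writethumbnail": True,    # fetch the thumbnail so it can be embedded
        "writeinfojson": True,     # keep the full metadata sidecar as well
        "postprocessors": [
            {"key": "FFmpegVideoRemuxer", "preferedformat": "mkv"},  # ~ --remux-video mkv
            {"key": "FFmpegEmbedSubtitle"},                          # ~ --embed-subs
            {"key": "FFmpegMetadata"},                               # ~ --embed-metadata
            {"key": "EmbedThumbnail"},                               # ~ --embed-thumbnail
        ],
    }

    with YoutubeDL(opts) as ydl:
        ydl.download(["https://www.youtube.com/watch?v=EXAMPLE_ID"])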


youtube-dl has gotten slow as hell; yt-dlp is a fork that solves that. My guess is YouTube is employing some kind of fingerprinting to throttle youtube-dl.


My bad, I meant yt-dlp


Maybe you can escape a lawsuit if you're downloading one or two videos for your own use, but you will be sued once you're large enough to matter.

e.g.

https://www.entrepreneur.com/en-in/technology/youtube-attemp...

https://www.makeuseof.com/tag/is-it-legal-to-download-youtub...

https://torrentfreak.com/major-record-labels-sue-youtube-dl-...


The problem is not just closed proprietary systems, but centralization. Usenet is distributed: a server uses store-and-forward and floods messages to its peers, eventually propagating every message to everyone. Today that's rarely the case; even decentralized networks are not as distributed as Usenet was. If one doesn't subscribe to a node, it effectively doesn't exist.
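
As a toy illustration of that store-and-forward flooding model (invented server names and topology, not actual NNTP):

    # Toy sketch of Usenet-style flooding: a server stores an article, then
    # forwards it to every peer that does not have it yet, so it eventually
    # propagates to all servers.
    peers = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"], "D": ["B", "C"]}
    stored = {name: set() for name in peers}   # articles held by each server

    def flood(server, article):
        if article in stored[server]:
            return                      # already have it, stop forwarding
        stored[server].add(article)     # store-and-forward: store first...
        for peer in peers[server]:
            flood(peer, article)        # ...then offer it to every peer

    flood("A", "<msg-id-1>")            # posted on server A
    print(stored)                       # every server now holds <msg-id-1>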



