Justin.tv ending archiving, deleting all archived videos after June 14 (justin.tv)
149 points by makomk on May 31, 2014 | 74 comments

If the destruction of all this unique user-generated content pisses you off -- and it should -- then feel free to join ArchiveTeam and help save this (and other) content for the Internet Archive. Check out the ArchiveTeam wiki at archiveteam.org, or come on over to #archiveteam and #archiveteam-bs on efnet throughout this week to watch a self-organizing data rescue mission happen in real time. (There will probably be a new dedicated channel spun off for this project later today or tomorrow.)

In the meantime, programs like youtube-dl and livestreamer are your friends.

http://rg3.github.io/youtube-dl/documentation.html http://livestreamer.readthedocs.org/en/latest/

To elaborate, the ArchiveTeam distributes a very simple-to-use virtual machine image that you can run. When you do so, your machine will help download and collect from disappearing websites. I don't think they've got scripts for justin.tv yet, but it sounds like they are going to, so now would be a good time to get your VM ready.

It's easy, and the more people who run the machine, the more power the ArchiveTeam has at their disposal. I encourage everyone to give it a look.

You can get the virtual machine here: http://tracker.archiveteam.org/

I downloaded the VM today and spent a fair while trying to diagnose a CheckIP error. It turns out, after asking on the IRC channel, that none of the projects on the VM are up to date, so the VM can't be used to help the project. Hence the error. If I see the website update to mention justin.tv I'll certainly fire it up. But it seems a shame that there is nothing I can do to help with their project as a whole; I couldn't even see any way to donate. I guess I should just donate to archive.org.

It's worth noting that you don't have to use a virtual machine. If you're familiar with compiling basic utilities and installing Python packages, then you can run it without the overhead of a virtual machine. It's generally all on the wiki/Github.

Right, the idea is that justin.tv could be formatted as a new project for the ArchiveTeam Warrior, which you can run on your laptop, a VM, a cloud server, a few Amazon instances, or whatever. But with only a week's notice before the shutdown, I don't know whether they can get the customized scripts up and running in time. They're working on extracting a list of user and video IDs right now.

It has stopped working for me. What about you, does it still work?

Everything seems ok for me. What is going wrong?

It honestly disturbs me that Myspace, Justin.tv, and these other companies don't ever seem to consider shipping their soon-to-be-deceased data to the Internet Archive. Why not help preservation a little? It would likely be simpler to handle than the "distributed preservation of service attack" that ArchiveTeam launches instead. Have there been any examples of companies doing this?

You might think that this is all useless information, but I guarantee we'll look back, likely in only a few years, and be horrified at all that we've destroyed.

When Myspace deleted all of their content, I lost everything my high school friend who had died had posted there (it turns out you need to back up everyone and everything you care about, not just yourself[1]), and @maxfenton mentioned: "The best art blog about a half decade including a kid who's dead now just disappeared." It's not an isolated problem.

I almost feel like we need a name-and-shame registry and/or public awareness campaign. I'd love to see this written into the user rights of a website when it launches.

"If the website is to be shut down, all publicly accessible content will be sent to the Internet Archive. If you would like to opt out, tick this box."

[1]: http://smerity.com/articles/2013/deleted_digital_tombstones....

As expected, the dedicated channel for the data rescue has now been set up: join #justouttv on efnet. Right now, coders with Python, Lua, and/or Redis knowledge would probably be most in demand, given the existing toolset. Hopefully very soon there will be a new Warrior project formally set up, and then anyone who can run a VM or an AMI or install the tools from the command line would be able to join in to distribute the downloading.

During the Hyves.nl rescue in late 2013, ArchiveTeam did accept donations to cover Amazon instances earmarked for that project, so it's possible they may do something like that again this time.

In case people are curious, the Warrior code and seesaw code are here:

https://github.com/archiveteam/warrior-code2 https://github.com/ArchiveTeam/seesaw-kit

"If the destruction of all this unique user-generated content pisses you off -- and it should"

I'm not sure why it should. When did we get this strange idea that when you make some piece of content, whether it's quality or not, it deserves to pollute cyberspace forever?

I realize this is an unpopular opinion, but one of the reasons nature "works" is that everything old is allowed to dissolve and become the base ground for the new that will come after it.

People naturally copy content that they find interesting, and content which isn't that interesting naturally gets lost. Maybe that lost content stuck around long enough to inspire someone to do something in that spirit later on (which also counts), but it doesn't survive itself.

Imagine if everyone gets Google Glass one day and starts streaming non-stop 1080p video of their lives. Are we going to be "pissed off" at that being deleted too?

If you do the stats you'll notice that over 99.99% of the content in archive.org is never accessed. Nobody cares.

And that's how it should be. We should really allow content to die. I have the firm belief that information doesn't just "want" to be free. It also wants to be allowed to die gracefully at some point.

Who's to say what we'll find important down the line? Quite often what seems mundane and not worth the effort to copy or maintain turns out, in hindsight, to have true value. Some of the most important insights, if not the most important, in studying previous cultures come from going through their trash heaps. Similarly, while it may seem pointless, a lot of this content could provide unique insight into our own culture down the line that the 'valuable' pieces we save and curate can't.

See, for example, some of the BBC shows that got deleted because magnetic tape was too valuable to just store popular culture on.

Recently the BBC has been asking people to search attics and sheds for home recordings (taped from TV or radio broadcasts) or "lost" tapes (probably meant for erasure and reuse, which got, ahem, 'lost' on the way and ended up in a BBC technician's collection).

I find it weird that Mega had petabytes of poorly de-duped content (very many copies of the same pirated movie in different rips), yet we're talking about deleting content because, well, because.

Having said that I'm not giving them any money to keep it archived let alone online so perhaps I need to shut up or put up.

People sometimes make copies of things they find interesting, but those copies are rarely accessible to the public. It can be pretty hard to get your hands on something after it disappears from the web. For example, forums.truecrypt.org went down three days ago, it's of interest to some people, and you're going to have a hell of a time finding some guy who happened to launch HTTrack on it before it went down.

Valid points, but you need some strategy for pruning the archive.

We can't just keep everything forever. First it was text files, then html files, then html files with images, and now videos.

Without pruning, the task of archive.org will become impossible, so the question of what's important and what isn't will have to be decided at some point.

I'm currently running through a Peter Norton assembly tutorial from the 1980s. It's already helped me understand the code better than I ever have before. Should that content have died just because it's old?

If you re-read my post, you'd notice my focus is on the content's quality, not its age. Age serves as a useful threshold for reviewing whether something should stay or not, but it's not the sole criterion.

Important things will be copied and preserved. That's the natural order of things.

Although I expressed myself very clearly, I expected two types of reactions: knee-jerk reactions like yours, which argue against something I didn't say, and getting down-modded, because the currently accepted wisdom of Hacker News is that we should hoard any type of information regardless of its merit.

Even though it's explicitly against the rules to down-mod to disagree with someone's opinion, every scoring system gets abused this way eventually. I find this sad.

>you'd notice my focus is on the content's quality,

But then you redefined it as 'popular' and 'interesting' content. While something that is popular and holds people's interest might also be quality content, the reverse is not necessarily true. How many quality lectures on maths or chemistry or history or engineering or some remote tribal language or what have you are popular? (Let's assume some arbitrary popularity metric like a million views.) Sure, you might be able to popularize a 1-2 minute nugget of incomplete or dumbed-down information, but I don't consider that quality. And I'm quite sure that you won't be able to actually do anything by knowing some random formula/effect/trivia without the multi-hour lectures on its fundamentals.

>Important things will be copied and preserved. That's the natural order of things.

I disagree. We've lost countless important historical documents because they were neglected/destroyed/etc. What you're defining as the natural order is just the 'winners' rewriting history. It's already easy to alter history by editing Wikipedia or buying a news publication and making certain articles uncrawlable. What if Google deleted its cache or Wikipedia did not keep a history of changes? "If you can't Google it, it doesn't exist" is pretty much how the future might unfold. I think it's important to preserve even unimportant things.

You might have a point or two in what you're trying to say, but you should really consider the broader argument. Oh and I did upvote you. :)

There's value in preserving the mundane. If we only preserve things which are noteworthy, the cultural artifacts, the special occasions, the celebrations and wedding dresses, it's easy to lose track of the fact that it's worth remembering what everyday life was like, too.

I'm not a Justin.tv user, but it frustrates me when organizations make big decisions like this without much time for their users to react. I can only assume that they're bleeding money and they're trying to quickly cut their operating costs.

Someone should contact Archive.org and see if they can get copies of all the public archives of these videos.

Justin.tv owns Twitch, which was recently rumored to be in acquisition talks with YouTube for $1 billion. I don't think they're bleeding money at this point.

Bleeding money and being in acquisition talks are not mutually exclusive.

But... maybe having a huge archive of user videos and being owned by YouTube might be mutually exclusive.

Maybe not, but it's bad strategy to give away that you're bleeding money while in acquisition talks.

As long as you don't end up like HP and Autonomy.


Oops. Yes, you're right - you almost certainly have to disclose it at some point. Don't take legal advice from me...

Perhaps they want to make this change before the acquisition to spare Google the bad press of "shutting down yet another service".

Yeah, 8 days' notice isn't much, though at least it's better than Yahoo shutting down Everyblock with no notice at all. I see Everyblock is back under new ownership, but I'm not sure they've got the old data.

IIRC, it's NBC who shut Everyblock down.

A few Defcons ago, Jason Scott gave an excellent talk about this failure mode[1]. Basically, companies host user-generated content for free. Then years later, they destroy it with little or even no notice. Sturgeon's law applies; most of the deleted content is crap. But a lot of it matters to the people who made it. More importantly, a small fraction of it is stuff that future historians would kill for. Collectively, companies that engage in this behavior are burning historical evidence.

1. Archive Team: A Distributed Preservation of Service Attack http://www.youtube.com/watch?v=-2ZTmuX3cog

I find it hard to get too wound up about losing the only copy of something that was handed over to a third party. I get that users not understanding the situation well can be a problem, but it isn't a very careful thing to do.

I like how they phrase this like it's a considerate product decision, while it's essentially them not wanting to shoulder the expense of storage. Without knowing more about their product internals, one immediately obvious and important feature getting the axe here is highlights. Example: I will stream, then go back the next day, cut the video down, and post the snippets to YouTube. These highlights then lead to interest in the channel, which leads to more viewers. Now I need to archive all my stuff locally and upload it over my mediocre connection. If they limited the archival time window to 4 hours after the end of the broadcast, they would get their drastic reduction in infrastructure cost without totally removing a valid and arguably important feature.

I think that's how Twitch works by default now. They archive the whole stream for like a week so you can Highlight it, and then it's auto-deleted (unless you've configured it to keep everything).

I believe the default now is that nothing is saved, not even for a week. However, each user can configure a setting that does cause Twitch to save the stream for about a week, before it's deleted.

In order to fully archive a stream "permanently", a user needs to view each archived recording individually and select an option from that page to save it forever.

Perhaps they should just keep the archive videos for a week or a couple of days after the broadcast. That gives a window for users to download archived copies and reduces the amount of storage they need on their end. It also means that users aren't forced to generate video archives themselves, and can still rely on the service for this function. Maybe for videos that accrue a certain amount of unique views during the grace period, keep them around in a sort of "Classics" library?
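The grace-period-plus-"Classics" idea above boils down to a simple per-video rule. Here is a minimal sketch of it, assuming a 7-day window and a 100-unique-view threshold (both numbers are hypothetical; the comment doesn't specify them), with made-up names like `Vod` and `retention_action`:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical parameters: nobody in the thread named actual values.
GRACE_PERIOD = timedelta(days=7)
CLASSICS_VIEW_THRESHOLD = 100


@dataclass
class Vod:
    vod_id: str
    broadcast_ended: datetime
    unique_views_in_grace: int  # unique views accrued during the grace period


def retention_action(vod: Vod, now: datetime) -> str:
    """Return 'keep', 'classic', or 'delete' for one archived video."""
    if now - vod.broadcast_ended < GRACE_PERIOD:
        # Still inside the window where users can download their own copy.
        return "keep"
    if vod.unique_views_in_grace >= CLASSICS_VIEW_THRESHOLD:
        # Popular enough during the grace period to move to a long-term library.
        return "classic"
    return "delete"
```

Run daily over the archive, this keeps storage bounded to roughly one week of broadcasts plus whatever crosses the popularity bar.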

I like the idea of something like 24-hour retention. If you just missed a broadcast by, say, an hour, or live in a time zone 12 hours away, then being able to watch it later within the 24-hour grace period would be good.

For comparison, Ustream announced a few months ago that they'd only retain archived video for a month for free accounts.

Twitch.tv uses a similar system.

Twitch is owned by Justin.tv

Technically it's Twitch Interactive, which kind of shows you the company's focus and growth.

So only one week's notice? A bit harsh. 30 days seems more standard when shutting down a service or feature.

It could be a cost elimination.

Likely - also, if true, basically the equivalent of putting a banner on their site saying “We don't value our users!”. Even a minimally competent management team should be able to look at the burn rate and make decisions more than a few days out.

I am sure it is and I can understand eliminating this feature but why not give users more warning?

Could be? I'm amazed they were keeping all that video this long.

This is reminiscent of Google's (and others', to be fair) build-first, monetise-later approach to products. Features get cut, entire products get slashed, and users end up losing their data and the time they invested.

At least Google acted somewhat responsibly by providing ample notice and allowing users to download their data before shutting down Google Reader.

In the case of Justin.tv, I would propose they at least take a measured approach: start by deleting videos that haven't received a single view in over 6 months, then work their way from there.
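The staged approach above amounts to picking a first deletion batch by staleness. A minimal sketch follows, assuming "6 months" means 182 days and that a never-viewed video counts as last active at upload time; `Video` and `first_deletion_batch` are illustrative names, not anything Justin.tv exposes:

```python
from datetime import datetime, timedelta
from typing import List, NamedTuple, Optional

# Assumption: "over 6 months" approximated as 182 days.
STALE_AFTER = timedelta(days=182)


class Video(NamedTuple):
    video_id: str
    uploaded: datetime
    last_viewed: Optional[datetime]  # None if the video was never viewed


def last_activity(v: Video) -> datetime:
    # A never-viewed video is treated as last active when it was uploaded.
    return v.last_viewed or v.uploaded


def first_deletion_batch(videos: List[Video], now: datetime) -> List[str]:
    """IDs of videos with no views in over STALE_AFTER, stalest first."""
    stale = [v for v in videos if now - last_activity(v) > STALE_AFTER]
    stale.sort(key=last_activity)
    return [v.video_id for v in stale]
```

Later batches could shorten `STALE_AFTER` step by step, which is the "work their way from there" part.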

In 2007, this was THE FUTURE, where UGC was going to be the democratizing content tide that would lift all of our boats. Turns out it all devolved into ad platforms anyway, just like 1997.

On the one hand, I have a fair bit of sympathy for the preservationist school of thought, and the Internet Archive does great work. I also consider it very fortunate that much (though not all) of Usenet was preserved when it may well not have been.


>We found that more than half of our VODs are unwatched (with 0 or 1 total views), while the vast majority are rarely watched (with 10 or less views).

The reality is that there is an indescribably vast and ever-growing volume of user-created "stuff" out there, and it's pretty impractical to preserve all of it. And when it's not on a well-known site that's shutting down (think Geocities), it mostly sinks beneath the waves without anyone really noticing. I could probably name any number of online magazines/sites which went away or restructured and whose content is no longer available. I'm not saying that's a good thing, but it's hard for me to get too worked up in most of these cases.

>We found that more than half of our VODs are unwatched (with 0 or 1 total views), while the vast majority are rarely watched (with 10 or less views).

Isn't this the standard for content? Does anyone know similar statistics for image hosts, URL shorteners, YouTube, app stores, etc.?

It seems like they asked the wrong question. It's not about how many people want to watch archived video, it's about how many people want to make an archive of a video. That said, I wonder if the amount of money they save is enough to offset the users they lose to other services.

I think the question they asked was "how much does it cost to host all this video?"

Then for their post, they answered the question they wished they were asked, like a politician does in an interview.

A little over a week doesn't seem like enough time for the news to get out and people to download all their content.

Of course this is financial.

Justin.tv is not a profit center for Twitch Interactive, so they need to cut costs. If 50% of VODs are viewed <= 1 time, and most viewed <=10 times, why not just leave them there? Obviously storing them costs money, but there probably isn't a lot of bandwidth costs associated with it.

I don't have anything against what they're doing so much as the sugarcoating of "focusing on live video."

This also lends more credence that they're getting bought out soon and want to look good on the cost sheets.

and want to look good on cost sheets

Ding, ding, ding. We have a winner!

Seriously, there can be no explanation for the ridiculously short notice other than YouTube said, "You look good, but your costs are just too high." (Or, if not YouTube, somebody else Twitch is trying to get into a bidding war with YouTube.)

Any other reason and management would have looked at the costs and said, "We need to transition out to save money," and provided a reasonable runway for their customers. As it is, this has nothing to do with making things better for their customers.

The page currently says "All VODs will be removed after June 8, 2014. We recommend downloading your recorded videos before the date."

Yeah, I'm not sure why the title got changed to say June 14th.

Oh, that's even weirder. I thought the page changed after you posted the link, not the other way around.

I wonder if it is related to copyright issues associated with the sale of twitch.tv.

As in Google saying they can't deal with these videos without proper acceptance of their terms of service.

The timing seems to indicate they are trying to offload the content ASAP.

From a cost perspective, they are already paying for it, and 1 week vs. 1 month would not make that much of a difference if it were a "product feature."

If they're cutting things this fine I fear for the future of justin.tv, this does not sound like a well thought out corporate move, it sounds like panic. Imagine coming back from your holiday in two weeks time and finding all your stuff wiped out. Ridiculous.

I know Justin.tv and Twitch.tv are related. Does anyone know if this policy will also affect Twitch?

The policy will only affect Justin.tv, not Twitch.

They should only keep content for a week if there are no views.

Having an additional recording is convenient; now people have to record locally at the same time as streaming.

It's interesting if you read their FAQ: they are eliminating their premium service, which in theory should have included archiving. My first thought was, why not charge people for permanent archiving? I guess not enough people are paying to even justify the consumer premium model.

I feel like justin.tv's peak was 5+ years ago, twitch seems to have been their real business in the last 3 years.

I don't see why the decision is so binary. Surely a 1-week hold coupled with the ability to move the video to YouTube (or simply the latter as an option at the end of a live stream) would eliminate the huge hosting costs but give content creators the chance to save certain videos if they wanted to.

Two week notice for storage policy changes smells like financial concerns to me.

well, to be fair, nearly any storage policy change will smell that way.

It's one thing to say "we're phasing this out over 12 months" (or even, say, six).

Two weeks is freaking desperation.

Agreed, 3-6 months notice is fine. This is not a classy way for them to address the issue.

I wonder what it would take to transload them all to youtube.

Twitch.tv has offered this service for ages, but maybe it started offering it after splitting from the Justin codebase?

Perhaps more than a week?

I understand people need to make business decisions but short notice on something like this isn't exactly a classy move.

That really isn't a lot of notice.

I only use JTV to nerd out in the Trek streaming channels and make fun of Riker's lack of a beard in chat during season 1 of TNG.

