
Archiveteam are backing up SoundCloud - hunglee2
http://archiveteam.org/index.php?title=SoundCloud&utm_content=bufferb94ff&utm_medium=social&utm_source=linkedin.com&utm_campaign=buffer
======
rnhmjoj
Someone has already backed it up:
[https://www.reddit.com/r/DataHoarder/comments/6n1pap/soundcl...](https://www.reddit.com/r/DataHoarder/comments/6n1pap/soundcloud_may_only_have_50_days_left_save_your/dk6pou7/)

~~~
ipsum2
This seems a bit ridiculous. 900TB costs about $22,500 in hard drives (assuming
$100/4TB HDD), without any redundancy. I wonder what their storage solution is
like.
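
As a rough check of that arithmetic, here is a minimal sketch using only the figures quoted above (900 TB, $100 per 4 TB drive). The `copies` parameter is an illustrative assumption for showing what redundancy would add; it is not something the comment specifies.

```python
import math

def drive_cost(total_tb, drive_tb=4, drive_price=100, copies=1):
    """Return (number of drives, total price in USD) to hold total_tb `copies` times."""
    drives = math.ceil(total_tb * copies / drive_tb)
    return drives, drives * drive_price

# Single copy, no redundancy: 225 drives
print(drive_cost(900))
# One full mirror (copies=2 is an assumption, not from the thread)
print(drive_cost(900, copies=2))
```

This counts raw drives only; enclosures, power and replacements would come on top.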

~~~
evook
Almost certainly Google Drive unlimited.

~~~
syshum
And this is how Google Drive Unlimited becomes Google Drive Limited...

Like every other Unlimited service, because people like this abuse the shit
out of it then when it inevitably dies they run around exclaiming "dont blame
me brah they said it was unlimited...., shouldn't have called it unlimited"

~~~
jackvalentine
> when it inevitably dies they run around exclaiming "dont blame me brah they
> said it was unlimited...., shouldn't have called it unlimited"

Well... yeah? If you let people advertise 'unlimited' but they can't actually
deliver, you end up with a kind of market for lemons situation.

[https://en.wikipedia.org/wiki/The_Market_for_Lemons](https://en.wikipedia.org/wiki/The_Market_for_Lemons)

~~~
syshum
Except for the fact that when Google, Amazon and the rest talk about
"unlimited" they are referring to unlimited _PERSONAL_ storage of data you
create as a person, this would include backups of your personal computer,
photos, important documents, etc.

Not backing up SoundCloud or the entire Internet for $5 a month

~~~
mynewtb
I would love to have my own personal backup of SoundCloud. It's like having a
huge library at home, only not limited by physical space.

~~~
delinka
>... only not limited by physical space.

Well, it still kinda is - the physical drives have to live somewhere... ;-)

------
radarsat1
Their website seems to be down: I'm just wondering, are they downloading
everything accessible through the player, or just songs marked "Download"?

Even given that they could restrict themselves to songs marked okay to
download, how much of that will be DJ mixes containing copyrighted songs?

I'm just wondering because Soundcloud actually has support to specify your
copyright terms, which does not default to "everyone can download this", so
it's an interesting case..

The website works now, it just says "selective content"

I'm sure SC serves up a lot of content per day, but how do you think they will
react to someone suddenly downloading all of their 900 TB or whatever it is in
one day? How much will Archiveteam be contributing to SC's downfall by
suddenly causing them a huge unexpected bill?

As someone who _really_ wants the SC content backed up properly, I nonetheless
see how this raises some interesting legal issues.

~~~
cyphar
I think in most cases ArchiveTeam's actions have been copyright infringement
on some level. They just don't care and find keeping user content safe from
unrepentant deletion to be more important.

If you've ever seen a Jason Scott talk, he isn't the sort of guy who gives a
shit if you DMCA him while he's sucking up all of your bandwidth archiving
your content two days before your servers shut down.

~~~
schoen
Also, DMCA takedown notices properly order certain kinds of intermediaries
(not people who have consciously chosen to publish something) to stop
facilitating access to allegedly infringing material. They're not properly
directed to an alleged copyright infringer himself or herself, and they don't
order people to stop _downloading_ things, even if their reason for
downloading is apparently to publish the information.

[https://en.wikipedia.org/wiki/Online_Copyright_Infringement_...](https://en.wikipedia.org/wiki/Online_Copyright_Infringement_Liability_Limitation_Act#Take_down_and_put_back_provisions)

This is not to say that people don't send "DMCA notices" for anything and
everything to anyone and everyone, but those notices are not following the
law. (Also, a lawyer can always send a demand letter demanding that someone
stop _any_ behavior, but a properly-constructed DMCA notice gives the
recipient an extra reason to follow it compared to a run-of-the-mill legal
demand—"[a]n OSP who complies with the requirements for a given safe harbor is
not liable for money damages", as Wikipedia puts it.)

~~~
cyphar
Yeah, more likely you'd get a Cease-and-Desist. But once the ArchiveTeam
publishes it, they'll likely get DMCA'd (though they've survived things like
that before).

Not to mention that dying websites or recent acquisitions are likely to have
the legal muster to start threatening archivists.

------
StavrosK
Hmm, I've been playing with IPFS lately, and just had an idea: Since IPFS is
perfect for archival, Archiveteam could put their files on IPFS, and users
could help out by pinning stuff on their local nodes. For example, I could ask
their website to give me a 10 GB list of files to pin (if I wanted to "donate"
10 GB to them), and I'd keep them available.

The only problem is that I don't know whether IPFS has any way to gauge
availability, so I'm not sure if the team could tell which files were only
hosted by a few people.
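
A minimal sketch of how such a "give me 10 GB to pin" list could be built, assuming the coordinating site already knows how many volunteers pin each file. That replica count is exactly the open question raised above, so treat it as a hypothetical input. The `ArchiveFile` type, the `assign_pins` helper and the placeholder CIDs are invented for illustration and are not part of IPFS or any Archiveteam tooling.

```python
from typing import NamedTuple

class ArchiveFile(NamedTuple):
    cid: str        # placeholder identifier, not a real IPFS hash
    size_gb: float
    replicas: int   # assumed: number of volunteers already pinning this file

def assign_pins(files, budget_gb):
    """Greedily pick the least-replicated files that still fit in the donated space."""
    chosen, used = [], 0.0
    for f in sorted(files, key=lambda f: (f.replicas, f.size_gb)):
        if used + f.size_gb <= budget_gb:
            chosen.append(f)
            used += f.size_gb
    return chosen

catalog = [
    ArchiveFile("cid-a", 6.0, 1),
    ArchiveFile("cid-b", 3.0, 5),
    ArchiveFile("cid-c", 4.0, 0),
    ArchiveFile("cid-d", 2.0, 2),
]
print([f.cid for f in assign_pins(catalog, 10)])
```

The volunteer would then pin each returned identifier on their own node. A greedy pass like this is simple rather than optimal.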

~~~
cyphar
ArchiveTeam is putting the files in the Internet Archive archives. While IPFS
is great, I'm not sure I agree it's good for archival because it depends on
the availability of the IPFS network. The Internet Archive does work to make
sure that there are sufficient backups that they can recreate their archive.

~~~
mmjaa
As long as there is a single party interested in content hosted on IPFS, the
"IPFS Network" will persist.

It would absolutely be the best move for IPFS to be used in this case - maybe
something like the AkashaApp guys, albeit for audio-media.

Edit: The Akasha App for those who aren't yet familiar with it -
[https://akasha.world](https://akasha.world) \- brings together IPFS and
Ethereum to make a truly distributed peer network for persistent content.

~~~
snsr
> _As long as there is a single party interested in content hosted on IPFS_ ,
> the "IPFS Network" will persist.

I imagine that's one of the reasons why it's not ideal for archival content.

~~~
roblabla
I don't get why? It's exactly the same today: you need at least one party to
host the data (in this case the archive team). With IPFS however, if more
parties wish to host it, it will lighten the load.

~~~
tscs37
There is no incentive to keep data in the network.

You can pay pinning services, but what's the point if you're just going to pay
someone hosting it?

I'd rather have this archived on Siacoin, Storj, Swarm or any other
distributed network _with actual incentives to keep things around_

~~~
mmjaa
IPFS+Ethereum = incentive. Please do not be so flippant as to reject something
until you've grok'ed it well enough to argue against it. If you
have looked at IPFS+Ethereum and found it wanting, I'd love to know what
exactly - because from my perspective this is precisely the kind of technology
that delivers your stated requirements.

~~~
tscs37
I've definitely "grok'ed" it.

Which is why I find Swarm a better solution. It is literally IPFS+Ethereum
with additional support for ENS lookups, deniable storage, redundant storage,
etc. This allows for far better privacy and the ability to compensate for the
loss of parts of the file, both features lacking in IPFS itself.

The current swarm testnet performs, as per my experience, better than IPFS in
terms of bandwidth and latency.

[http://swarm-gateways.net/bzz:/theswarm.eth/](http://swarm-gateways.net/bzz:/theswarm.eth/)

------
cetra3
I've got a tool to grab all of the songs from your feed. I use this to offline
sync mixes (not individual songs).

[https://github.com/cetra3/rustcloud](https://github.com/cetra3/rustcloud)

It would be an absolute shame if Soundcloud disappears. There has been so much
music I have discovered on this service.

------
fvdessen
Why doesn't SoundCloud want my money? There are no ads, and no paid plans for
listeners. A lot of songs in my library disappear once the artist gets big and
wants some cash from iTunes. I would have no problem paying for access to
these songs but it's just not possible. I would also like to buy some band
posters / t-shirts, vinyls, cds, show tickets etc, not possible either. It's
like they are actively avoiding revenue streams. I don't get it.

~~~
skymt
There is a paid plan: SoundCloud Go.

[https://soundcloud.com/go](https://soundcloud.com/go)

~~~
fvdessen
Unavailable in my country apparently. It must also be the reason why I don't
get ads.

------
kevinmannix
Shameless plug (has collected a bit of dust):
[https://github.com/krmannix/downcloud](https://github.com/krmannix/downcloud)

It's a node tool built a few years ago to download the playlists of users
through your command line. Might be helpful for a situation where you'd like
to back up your own playlists.

You'll need to get an API key - not sure how feasible that is at this moment.

~~~
mdaniel
No offense, but
[https://github.com/rg3/youtube-dl](https://github.com/rg3/youtube-dl) is
without question the best downloading experience I have ever encountered, and
for damn sure doesn't require an API key

~~~
kevinmannix
Yes, mine is built for SoundCloud, not YouTube or video sites, given the
context of the article

~~~
mdaniel
That leads me to believe you have not used youtube-dl, as it has an
unfortunate name but supports an _incredible_ amount of sources:
[https://github.com/rg3/youtube-dl/tree/master/youtube_dl/ext...](https://github.com/rg3/youtube-dl/tree/master/youtube_dl/extractor)
including, of course, soundcloud
[https://github.com/rg3/youtube-dl/blob/master/youtube_dl/ext...](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/soundcloud.py)

------
naturalgradient
Archiveteam seems like a really cool project, what I was wondering (and
couldn't find in the FAQ) is who is paying for all the storage? Is it donated
by big tech companies?

~~~
pas
Volunteers, no corps as far as I know.

[https://en.wikipedia.org/wiki/Archive_Team](https://en.wikipedia.org/wiki/Archive_Team)

~~~
rsync
rsync.net donated online storage for the original geocities backup efforts in
2009.

------
skeletonjelly
> Resource Limit Is Reached

> The website is temporarily unable to service your request as it exceeded
> resource limit. Please try again later.

I suppose I prefer an archive over the blog being unavailable

~~~
c8g
>I suppose I prefer an archive over the blog being unavailable

[http://web.archive.org/web/20170717083540/http://archiveteam...](http://web.archive.org/web/20170717083540/http://archiveteam.org/index.php?title=SoundCloud&utm_content=bufferb94ff&utm_medium=social&utm_source=linkedin.com&utm_campaign=buffer)

It is interesting that their blog is not good enough for an archive. The links
above and below are different captures (the first one is updated but the second
one is not, because the first link contains some extra info). They will be able
to save a lot of space by finding duplicates.

[http://web.archive.org/web/20170606104512/http://archiveteam...](http://web.archive.org/web/20170606104512/http://archiveteam.org/index.php?title=SoundCloud)
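
A minimal sketch of the kind of duplicate detection meant here: drop the `utm_*` tracking parameters so both archived URLs reduce to the same key. The `strip_tracking` helper is hypothetical and is not how the Wayback Machine actually canonicalizes URLs; it only illustrates the idea with Python's standard library.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def strip_tracking(url):
    """Remove utm_* query parameters so equivalent URLs compare equal."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if not k.startswith("utm_")]
    return urlunsplit(parts._replace(query=urlencode(kept)))

tracked = ("http://archiveteam.org/index.php?title=SoundCloud"
           "&utm_content=bufferb94ff&utm_medium=social"
           "&utm_source=linkedin.com&utm_campaign=buffer")
plain = "http://archiveteam.org/index.php?title=SoundCloud"

# Both links point at the same wiki page once the tracking parameters are gone.
print(strip_tracking(tracked) == plain)
```

Note that `urlencode` may re-escape unusual characters, so a real deduplicator would need a more careful canonical form.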

------
probably_wrong
The more I think about this, the more convinced I am that Archiveteam are
actually detrimental (in the long run) to the well-being of the Internet.

Don't get me wrong, I appreciate the work they do, and without them lots of
content would simply disappear. But solving this problem should be at the core
of the protocol itself (Xanadu, anyone?[1]), not depending on the resources
and goodwill of a single team.

Just like IPv6, I don't think the problem will be solved as long as there's a
patch that somehow works.

[1]
[https://en.wikipedia.org/wiki/Project_Xanadu](https://en.wikipedia.org/wiki/Project_Xanadu)

~~~
Buetol
I downvoted this. Maybe we will get to this point one day, but in the meantime
I think we should all appreciate the amount of conservation work they do for
the future generations. It's exactly how NGOs help feed people, hoping that one
day the system will be fixed.

~~~
lucb1e
I upvoted this. Regardless of whether I agree, I think it is a reasonable
question to ask / discussion to have, and the downvote button should be used
for comments that don't help the conversation in any way.

------
ipsum2
I'm interested in people's opinions on the legality of this. They mention
"Archive Team considers the SoundCloud service in danger and, as it hosts a
lot of original content, finds it important to prepare to save it selectively
(a full grab would be too big and would raise concerns of mass copyright
infringement).", but how is downloading any portion of an artist's music not
copyright infringement?

I've written my own Soundcloud offline audio player, but didn't distribute it
because it was against their TOS.

~~~
jacquesm
> but how is downloading any portion of an artist's music not copyright
> infringement?

I had the same issue with backing up Geocities when it went down. I figured
better safe than sorry, established a very easy deletion procedure for the
copyright holders and have received only a very small number of nastygrams
compared to an absolutely enormous number of messages from people that were
happy their content got saved.

So at a guess, yes it is copyright infringement, no, it will not lead to
trouble because most people are able to recognize a good faith effort when
they see it.

~~~
chinathrow
> So at a guess, yes it is copyright infringement, no, it will not lead to
> trouble because most people are able to recognize a good faith effort when
> they see it.

A takedown notice from a few large commercial soundcloud users would probably
be enough, no?

~~~
jacquesm
No, it wouldn't. You just take down _their_ content, as is their right. If
they sue they'll likely lose if they don't first send you a takedown notice
but you are going to have to take that risk.

------
_pmf_
I don't know why SoundCloud does not have a huge, Wikipediaesque donation
banner on every page illustrating the severity of their financial situation;
it's embarrassing for them, but think of it like this: the artists have a right
to know that the platform that manages their life's work needs their support.

~~~
imartin2k
"I don't know why SoundCloud does not have a huge, Wikipediaesque donation
banner on every page illustrating the severity of their financial situation"

Maybe because they are a profit-oriented company that raised hundreds of
millions of USD in venture capital. Asking for donations would be kinda
unethical, and the founders would know that.

However, if SoundCloud somehow would be transformed into a non-profit
organization...

~~~
sk0g
So you're saying Richard Hendricks' New Internet should be used to keep
Soundcloud afloat?

------
Sami_Lehtinen
Did anyone back up GrooveShark? I had some unique pieces stored there which
seem to be lost forever.

------
chinathrow
What's the current context here? Is SoundCloud going away anytime soon?

~~~
vasanthv
Soundcloud CEO says no, it's not going away anytime soon.
[https://www.digitalmusicnews.com/2017/07/14/soundcloud-ceo-r...](https://www.digitalmusicnews.com/2017/07/14/soundcloud-ceo-responds-shutdown/)

~~~
iagooar
Would they say it if they knew it, though?

------
Steeeve
If these kinds of entities actually want to preserve resources, they shouldn't
be generating a petabyte of bandwidth charges. Contact soundcloud and come to
an agreement that will be responsible.

~~~
avian
I imagine a company with its funds running out will not be particularly
interested in dedicating resources to the archive team. Not to mention that
doing that would be kind of signing their own death certificate. It's hard to
argue that you are "on path to profitability" when you're busy handing off
your assets to a museum.

How much does a petabyte of bandwidth go for these days anyway? It might well
be cheaper than paying engineering, legal and management to arrange for
some kind of off-line data transfer.

~~~
Steeeve
That could cost them more than $50K in bandwidth.

Or if you worked with them, and both of you happened to be in Amazon, that
could be a no-cost transfer.
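
For scale, a minimal sketch of that estimate. The $50 per TB rate (about $0.05 per GB) is an assumption back-derived from the "$50K per petabyte" figure above, not a quoted price from any provider, and 1 PB is taken as 1000 TB.

```python
def egress_cost(size_tb, usd_per_tb=50):
    """Estimated bandwidth bill in USD for transferring size_tb terabytes."""
    return size_tb * usd_per_tb

print(egress_cost(1000))  # one petabyte at the assumed rate
print(egress_cost(900))   # the ~900 TB figure mentioned elsewhere in the thread
```

Real pricing is usually tiered and negotiable at this volume, so this is only an order-of-magnitude guide.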

------
simonhfrost
Anyone interested in backing up their own personal (public) SoundCloud files
will find this tool useful:
[https://github.com/mafintosh/soundcloud-to-dat](https://github.com/mafintosh/soundcloud-to-dat)

~~~
rapnie
Cool!! Based on the Dat project
([http://datproject.org](http://datproject.org)). These guys might become a
good alternative to IPFS, especially if they reposition themselves as a
message-based system, instead of just file sharing.

------
eatbitseveryday
How can tracks be downloaded when many of them are pay-to-access or
stream-only?

~~~
voltagex_
Generally, if you "stream" (send) me bits, they're mine to do what I want with
(practically, maybe not legally).

Although, if anyone could tell me how to do that with AES-SAMPLE HLS video,
I'd be very happy.

~~~
mdaniel
Can you describe the ways you have attempted, and the bad outcome, to help
focus the answer/search a little more?

The 10 minutes I spent poking around to see what "AES HLS" was made it appear
that a MITM proxy would straighten that problem right out, unless you are in
an iTunes FairPlay-esque encrypted to your iPhone type deal, in which case I
believe the math is against you

~~~
voltagex_
One of the issues is that the stream I have as an example is geo-restricted to
Australia.

[https://au.tv.yahoo.com/plus7/screenplay/-/watch/36206022/sc...](https://au.tv.yahoo.com/plus7/screenplay/-/watch/36206022/screenplay-thu-29-jun-season-1-episode-1/)

youtube-dl seems to find the stream URL okay, but then prints ffmpeg errors:

[https://github.com/rg3/youtube-dl/issues/11636](https://github.com/rg3/youtube-dl/issues/11636)

[https://github.com/selsta/hlsdl](https://github.com/selsta/hlsdl) should be
able to do it in theory, but it looks like WideVine DRM is involved.

Someone has a Kodi plugin to make it work but Kodi doesn't support recording
of any kind (?)

------
random_calc
How would you find all files to download without scraping all Soundcloud
pages?

------
philfrasty
Just from a legal standpoint: isn't „...considers the SoundCloud service in
danger...“ slander? Especially since the Internet Archive isn't a nobody (with
maybe some inside information?).

Just remembered cases like Deutsche Bank vs Leo Kirch which are legal
nightmares.

~~~
nextlevelwizard
Good luck suing a not-for-profit volunteer group when you yourself are in
financial trouble to begin with. I'm not saying it's impossible, but there's
not much to gain.

------
imartin2k
I hadn't heard of Archiveteam before, but the fact that HN brings their site
down doesn't make me too confident in their backing up skills :)

~~~
dmacedo
You're thinking scalability. What they do is reliability in archives, and
"team work" (it's in the name) for retrieval. Cheers

~~~
imartin2k
Mostly, I saw the humour in it, which I tried to emphasize with the smiley.
Didn't land.

~~~
dmacedo
Ah, yes, another good example of Poe's law! :)

[https://en.wikipedia.org/wiki/Poe%27s_law](https://en.wikipedia.org/wiki/Poe%27s_law)

------
omarforgotpwd
That's probably not going to help with their huge bandwidth costs

~~~
anilakar
Ironically their web server is already over limit.

>508 Resource Limit Is Reached

------
majortennis
Aww I like soundcloud

------
CharlesDodgson
Does anyone feel that this is all a bit pointless? Like, is there a greater
social need to preserve SC?

I found a lot of the content too ephemeral, things like podcasts or DJ mixes. I
dunno, it just seems a bit silly to put resources on it.

~~~
nextlevelwizard
Someone might have poured all of their creativity into content on that platform
and to loosely quote Jason Scott "this might be the largest audience this
specific line of genes have/will ever reach". Is that not worth preserving?
Even if it just takes a thumb drive worth of space? Even if the content sucks,
it's still part of our history as a species. Think of some of the old
works/diaries/letters that are now thought to hold value. Like I've been
lately reading "letter from Seneca". Why are they any more worthy of
preservation than something on SoundCloud?

~~~
CharlesDodgson
I guess it's subjective, some people would think a bank of Vapourwave music on
SC is important. I'm pretty happy to have a 128kbps version of blink 182 albums
on my backup, because they mean something to me. I'm just really questioning
the need to archive for the sake of archiving.

------
streamspeek
I'd recommend transcoding to 96kb/s VBR Ogg Opus during crawling.

~~~
mynewtb
The goal of archiving is making a perfect copy. Transcoding ruins that.

~~~
colejohnson66
Not relevant in this situation, but to be fair, archiving analog media isn’t
about making a perfect copy, just a good one. Also, when old DRM encumbered
games (StarForce, etc.) are archived, the DRM is removed first. So not a
perfect copy there either, but just an as good as you can get copy.

~~~
mynewtb
_as perfect as possible_ then, no need to argue about the 'perfect' phrasing.
It's clear what I meant in the context.

Both of your examples _require_ transformation, while archiving unprotected
digital media is as easy as making a copy.

------
chazzeromus
Probably not an amazing and erudite source as others have posted but there's
this tweet:
[https://twitter.com/chancetherapper/status/88592122075935539...](https://twitter.com/chancetherapper/status/885921220759355397)

Soundcloud is here to stay

~~~
naturalgradient
I don't understand. Is this supposed to imply this rapper will somehow invest
and personally try to ensure soundcloud will survive?

Or did he call the CEO, and the CEO told him 'Yes we will stay'? I mean what is
he supposed to say?

'No, don't bother uploading new songs, we will run out of cash soon, thanks
for the call'?

~~~
chazzeromus
Hm that's a good point. I didn't consider the cynical approach to a CEO's
reassurance, as obvious as it seems.

