Pirate Bay founder thinks Parler’s inability to stay online is 'embarrassing' (vice.com)
615 points by weare138 4 days ago | 587 comments





Also embarrassing:

https://www.vice.com/en/article/n7vqew/the-hacker-who-archiv...

> donk_enby had originally intended to grab data only from the day of the Capitol takeover, but found that the poor construction and security of Parler allowed her to capture, essentially, the entire website. That ended up being 56.7 terabytes of data, which included every public post on Parler, 412 million files in all—including 150 million photos and more than 1 million videos. Each of these had embedded metadata like date, time and GPS coordinates—unlike most social media sites, Parler does not strip metadata from media its users upload, which, crucially, could be useful for law enforcement and open source investigators.


Someone put together an animated heatmap of Parler photo locations along the Mall throughout the day of 1/6: https://www.reddit.com/r/dataisbeautiful/comments/kvx88n/oc_...

Even better, here are the videos along with their locations:

https://www.tommycarstensen.com/terrorism/index.html


That's hilarious. Some trumpian complaining about furniture in offices while people are homeless on the streets

https://www.tommycarstensen.com/terrorism/pQf5uxtLtxH5.mp4

Neglecting to remember his president has been in charge for 4 years

The guy coughing at 1m34 too!

Rioter 1: "They just hit that dude"

Rioter 2: "Yeah because he was being a prick"

https://www.tommycarstensen.com/terrorism/4wIDySD7tKxo.mp4

18 seconds

Woman takes off her mask to tell the camera "It's amazing". Cameraman says "put your mask on I don't want anyone to see you"


Oh man that last one is right up there with the don’t tread on me lady getting treaded on in hilarity.

35% of the country looks at the videos and says "yeah, must be antifa"

>>Someone put together an animated heatmap of Parler photo locations along the Mall throughout the day of 1/6

Showing that people posted videos from the rally at the Monument and then went to the front of the Capitol buildings. Note that many on-site participants reported there was no cell or data service at the Capitol, so they were not coordinating with Parler, just reporting.

The heat map might generate hypotheses, but conclusions that Parler users or demonstrators as a whole did anything other than asserting rights under the 1st Amendment do not necessarily follow from the data[1].

[1]https://gist.github.com/kylemcdonald/8fdabd6526924012c1f5afe...


That graphic is interesting to me, as it illustrates what the view is like at the 3-letter-agencies control centers, who have been slurping up our data for years.

Debatable; this is low hanging fruit.

Turn off or strip EXIF data -- most sites do anyway -- and this wouldn't happen.
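To make "strip EXIF data" concrete, here is a rough sketch (not what any particular site actually runs) that drops the APP1 segments from a JPEG, which is where EXIF, including GPS tags, normally lives. It uses only the standard library and assumes a well-formed baseline JPEG; the function name `strip_app1_segments` is made up for this example, and other metadata carriers (comment segments, IPTC in APP13, anything hidden in the pixels) are not touched.

```python
import struct

def strip_app1_segments(jpeg: bytes) -> bytes:
    """Return a copy of a JPEG byte string with APP1 (EXIF/XMP) segments removed."""
    if jpeg[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG: missing SOI marker")
    out = bytearray(b"\xff\xd8")
    pos = 2
    while pos + 4 <= len(jpeg):
        if jpeg[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = jpeg[pos + 1]
        if marker == 0xDA:
            # Start of scan: compressed image data follows, copy the rest verbatim.
            out += jpeg[pos:]
            return bytes(out)
        # Segment length is big-endian and counts its own two bytes.
        (length,) = struct.unpack(">H", jpeg[pos + 2:pos + 4])
        if marker != 0xE1:
            # Keep every segment except APP1.
            out += jpeg[pos:pos + 2 + length]
        pos += 2 + length
    out += jpeg[pos:]
    return bytes(out)

# Tiny synthetic "JPEG" (not a viewable image) to show the effect.
fake = (
    b"\xff\xd8"                          # SOI
    + b"\xff\xe0\x00\x04JF"              # APP0 with a 2-byte payload
    + b"\xff\xe1\x00\x0bExif\x00\x00GPS"  # APP1 carrying pretend EXIF
    + b"\xff\xda\x00\x02\x12\x34"        # SOS plus pretend scan data
    + b"\xff\xd9"                        # EOI
)
cleaned = strip_app1_segments(fake)
print(b"Exif" in fake, b"Exif" in cleaned)
```

In practice most people would reach for an image library or a dedicated tool rather than hand-parsing segments; the point is only that the location data sits in a separable block that a server can discard on upload.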


What I find astonishing is that, at least according to that heat map, it appears that bordering on 100% of the people using Parler in DC that day were part of the riot / coup / insurrection.

This isn’t an app that’s in widespread general use but just so happens to also have a few bad apples using it too. It’s instead almost exclusively used by what would appear to be the most radical wing of the Trump party. Almost every single person using it during that period attended Trump’s speech and/or participated (in some way, shape, or form) in an assault on the Capitol that day.


> What I find astonishing is that, at least according to that heat map, it appears that bordering on 100% of the people using Parler in DC that day were part of the riot / coup / insurrection.

These are locations from pictures. Of course almost all pictures are of the riots instead of some boring random street in Washington DC.

Even people who are not part of it will take pictures simply because it is a major event and nowadays, every time something interesting happens, there are people to take pictures. You are probably going to find similar heat maps on more mainstream social networks.


> at least according to that heat map, it appears that bordering on 100% of the people using Parler in DC that day were part of the riot / coup / insurrection.

Are you sure it's not just a heat map of only those videos?


DC voted 93-5% for Biden. Not gonna be many native Parler users there.

OPSEC 101: blend in, don't look suspicious. Once these communities are banned from the regular media (Twitter, Reddit, Facebook, Instagram, ...), they need to gather via an "anything goes" path where 'freedom of speech' protects their hate speech, such as bulletproof hosting, which is expensive, and all traffic to and from it is suspicious by default. It's essentially akin to Bitcoin mixing, or avoiding Monero.

This was exactly my takeaway from the heat map. If that's all the GPS coords from video taken on Parler that day, then it looks to be exclusively used by those supporting/participating/sharing the riot on the US Capitol.

Some people stayed just outside of the Capitol and didn't necessarily do anything wrong.

I mean, look how DC voted in the last election.

The same way it has since I’ve been alive. What’s your point?

that a DC citizen is exceedingly unlikely to be a Parler user unless they're an external protestor or a congressperson (or congressional staff, I guess).


Wow this is a powerful way to word it.

"Authoritarians never believe they're authoritarians, no matter how much censorship, surveillance, jingoism, & imprisonment they demand.

They tell themselves their enemies are so uniquely evil and dangerous - terrorists - that anything done in the name of fighting them is noble."


Indeed. Greenwald helped Snowden; he has the chops to see what's going on. To wit, his subsequent tweet:

Glenn Greenwald @ggreenwald Jan 11

Do you know how many of the people arrested in connection with the Capitol invasion were active users of Parler?

Zero.

The planning was largely done on Facebook. This is all a bullshit pretext for silencing competitors on ideological grounds: just the start.


That tweet seems to be incorrect.

https://twitter.com/nickmartin/status/1349277932531847174

> Hi Glenn, I'm wondering if you would be willing to delete this tweet and issue a correction both on Twitter and your newsletter since a number of the people arrested last week, including Jacob Chansley and Nicholas Ochs, were active users of Parler. Thanks!


Afaik, Greenwald was told the above by Parler's CEO. Greenwald then proceeded to uncritically believe that.

They were downloading at 50 Gbps for a while

https://twitter.com/donk_enby/status/1348497204940595201

https://twitter.com/donk_enby/status/1348440720504401921
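For a sense of scale, a back-of-the-envelope sketch: if a 50 Gbps rate were sustained for the whole 56.7 TB quoted in the Vice article (it almost certainly was not constant, and decimal units are assumed here), the transfer would take a few hours. The function name is just for illustration.

```python
def transfer_hours(size_terabytes: float, rate_gbps: float) -> float:
    """Hours needed to move size_terabytes at a steady rate_gbps (decimal units)."""
    bits = size_terabytes * 1e12 * 8      # TB -> bytes -> bits
    seconds = bits / (rate_gbps * 1e9)    # Gbps -> bits per second
    return seconds / 3600

print(f"{transfer_hours(56.7, 50):.2f} hours")  # 2.52 hours
```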

Also, auth provider (twilio?) removed Parler as a client so for a short while it was possible to create accounts without a phone number (2FA).

https://twitter.com/donk_enby/status/1348298836930867204

edit: okta, free trial, thanks: https://news.ycombinator.com/item?id=25774943


Just a note here, it wasn't Twilio.

It was a free trial of Okta that they were using for their entire userbase.

https://twitter.com/okta/status/1348191370528256002?ref_src=...


How cheap can you be to run your whole site on a trial and fail open if it doesn't work anymore?

I read that they chose to fail open when Okta dropped them in order to stay online for a while longer.

She is probably liable for that data egress bill.

Also, how does some randomer have tens of terabytes of disk lying around?


Have you ever been to r/Datahoarder or r/Homelab?

10TB fits on one desktop drive, it's completely pedestrian


I have not. I have one 4TB HD in my machine and I've never come close to filling it!

Many people have <1TiB on their machines and are content with it.

Others, like me, have home NAS's which have 10-20TiB, and they're usually close to full.

Fewer people, but a non-zero amount, hoard data, they have 20-100TiB or even more, full on homelabs.

If you're a person collecting datadumps or running rainbow tables, you probably have such a unit. It's not even that expensive really, you can get a 24TiB Pegasus Thunderbolt raid array for <1,000 USD


It's really a matter of multimedia: video, raw format photos, music, ripped movies. That's the vast bulk of the probably 8TB or so I have.

The raw files off my D5600 are roughly 30MB each. If I took 100 photos a day, I would need +1TB every year to keep all of them. The battery is good for about 600 shots. Someone who does photography professionally can easily blow right past that on one job before accounting for their second shooter's files.
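The arithmetic behind that estimate, as a quick sketch using the figures from the comment (30 MB per raw file, 100 shots a day, 600 shots per battery charge) and decimal units:

```python
RAW_FILE_MB = 30          # approximate size of one raw file, per the comment
SHOTS_PER_DAY = 100
SHOTS_PER_BATTERY = 600

def yearly_storage_tb(file_mb: float, shots_per_day: int, days: int = 365) -> float:
    """Storage needed per year in terabytes (1 TB = 1,000,000 MB)."""
    return file_mb * shots_per_day * days / 1_000_000

def per_battery_gb(file_mb: float, shots: int) -> float:
    """Storage produced by one battery charge in gigabytes (1 GB = 1,000 MB)."""
    return file_mb * shots / 1_000

print(yearly_storage_tb(RAW_FILE_MB, SHOTS_PER_DAY))    # 1.095
print(per_battery_gb(RAW_FILE_MB, SHOTS_PER_BATTERY))   # 18.0
```

So a single full battery is about 18 GB of raw files, and a year at that daily rate is a little over 1 TB.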

Wasn't this an ArchiveTeam Warrior project?

https://archiveteam.org/index.php?title=ArchiveTeam_Warrior

If so it's distributed among many volunteers.

But the data still has to end up somewhere... Archive.org?


the data is being processed by archive team https://www.archiveteam.org/index.php?title=Parler

Grab status: https://tracker.archiveteam.org/parler/

and will be hosted by archive.org


> some randomer

They're not a randomer, they're a person interested in data dumps.


donk_enby wants to be called she/her according to Twitter, unless you're speaking of someone else

I'm not a huge data fiend but I've got maybe ~3-4 2TB drives in my house: a Synology NAS + a spare drive.

Including old desktops and a couple of random external HDDs, I could probably hit 8-10TB easily.

And it's not like it's hard to get more. If I hit a motherlode and I need to keep it, it's a 20 minute drive to Target / Best Buy / Walmart for a drive or three. Not as cheap as bulk orders off of Newegg but cost-effective enough to store these dumps.


Given the speed I’m guessing it was towards S3 so there’s plenty of terabytes.

Didn't Parler just get booted from AWS though? Seems odd it'd go right back to Amazon.

Yes probably. Wherever it is, that provider is in an interesting legal position by holding this data.

Also, I assume there is content from EU citizens in there, and so GDPR violations galore.


So people post it on Parler themselves, it is publicly available, but it would be illegal to download for me? That does not make much sense.

If you were to scrape someone's private messages from Facebook - or their private posts - and then post them online en masse ... that may or may not be illegal in Europe, but it ain't ethically white. Grey at best.

Nothing being discussed was private. It was all globally visible.

It might be illegal for you to possess such data after you were asked to destroy it, yes, at least in the EU.

The GDPR talks about "data controllers", and citizens have the right to demand such a controller remove their personal data. A "data controller" in this context means you knowingly possess the data and are in the position to make decisions about it. You're not a data controller, tho, if e.g. you run some service that allows users to upload data without your involvement and direction, and you also do not decide how to use such data. E.g. Amazon would not be liable if people put a data dump in their cloud (unless Amazon used the data themselves, instead of just storing/hosting the data at the behest of their customers).

Even before the GDPR there have been related laws and court cases, like the case that culminated in the "right to be forgotten" based on a decision by the European Court of Justice, which may well come into play here. I also remember a case in Germany, where a woman allowed her partner to take intimate pictures of her, then after the relationship ended had a court order him to destroy the material (not a revenge porn case, there was no allegation he ever shared any of those pictures), meaning it's not always about what's public.

I don't know how California's mini-GDPR compares.

Then of course there is still the avenue of copyright law if the stuff is put online. Just because a Parler user gave Parler the permission to distribute a certain piece of content doesn't mean that everybody else has the same permission. I'm pretty sure Parler didn't make people assign them the copyright (which isn't even possible in some jurisdictions), therefore the people who posted on Parler still retain the rights to their content. They can therefore use the DMCA or similar laws in other jurisdictions around the world to demand takedowns.


My understanding is that not all of it was public posts.

As an EU citizen, I can request that a company deletes my data. Unless this data dump is being treated as a crime scene or something, then the holders of this data will need some way to comply with these requests.

Also, now AWS (or whatever cloud provider it is) is holding content that contains racist and/or illegal content. Are they not effectively now just another Parler?


No, they're not. Holding a static, private archive is not the same thing as hosting a live, public site.

How so? Aren't people freely perusing all this data, extracting GPS metadata and reading both public and private conversations?

You cannot be more wrong with this claim. You might not have had to deal with this privately or professionally, I presume, in which case it is understandable. Parler is liable for taking absolutely no precaution with the data of their users, with whom they had a terms & conditions agreement (in short: users agreed to upload their data for a specific purpose).

Now, ripping that _user_ generated data from the website this way without justified (justified = platform user agreement + legitimate interest) purpose or intent or even agreement and storing / distributing / processing it is the epitome of a GDPR transgression and borderline criminal at least in the EU (saying this as an EU citizen). They are liable. I wouldn't touch that dataset with a 10 foot pole. And I would even less brag about it on Twitter, things we do for clout I guess... :)

I have no stake in this thing, it's just to emphasize that statements like this are what get people and businesses in big trouble. Stay safe! Archive only your own data or data you gathered legitimately. Take the rest up with a lawyer or ... read the laws.


The claim I'm making is "they are not, in fact, 'just another Parler.'" You are pointing out many correct things about their potential liability and problems with holding that data! However, those things are, at most, a subset of the problems with Parler and at worst disjoint problems from Parler's problems. I stand by the argument of "a static, private data archive is not the same as a live, public web service." I did not make the argument of "there are no problems with the static, private data archive."

Agree with the legal implications. I would not go anywhere near this data.

I don’t think this relates to GDPR though. There are some exemptions for personal use which I think there are arguments this could fall into (IANAL). But my opinion is this isn’t in the spirit of GDPR.

There are many other laws that are broad enough in most countries to cover gray area scraping sadly. CFAA in the US for example. This sounds similar to the AT&T weev case.


Maybe the data compresses well and/or a lot of it is redundant (so instead of storing it raw you store it in a database and use relationships to link related pieces of data)?

That's not how you usually measure a database size.

> GPS coordinates—unlike most social media sites, Parler does not strip metadata from media its users upload, which, crucially, could be useful for law enforcement

OH SHI-

This is a colossal mess up, of epic proportions.


Due to the people involved, Parler is almost certainly not a honeypot set up by the FBI, CIA, or some other government organization. However, some of the details that have leaked out over the last week made me wonder how little of the site would have changed if that was the intended purpose.

Sufficiently advanced incompetence is indistinguishable from malice.

Hah, this is great. If you don't get the references, it's Hanlon's Razor ("Never attribute to malice that which is adequately explained by stupidity") plus Arthur C. Clarke's "3rd law" ("Any sufficiently advanced technology is indistinguishable from magic.")

Apparently it's called Grey's Law: https://www.urbandictionary.com/define.php?term=Grey%27s%20L...


Does that make it a "quotemanteau"?

Wait... Is that a thing? Are there more of these?


I vote for "quotemanteau" to be officially sanctioned! But I would say the specific case is merely a snowclone[0] of "any sufficiently X is indistinguishable from Y"

[0] https://en.wikipedia.org/wiki/Snowclone


It appears to have the properties of both in this case.

That is to say it's a snowclone where the substitutions come from another quote.

I guess quotemanteaus are a special kind of unique snowclone.

You could take "with great power comes great responsibility" and form the snowclone "with great X comes great Y" and then take the quote "the medium is the message" and use those in your snowclone to make the quotemanteau "with a great medium comes a great message"

Hmm... I feel like I should try to concoct more of these. They're fun.


Thank you for the explanation.

More like, sufficiently advanced malice is indistinguishable from common incompetence :)

It's pretty well demonstrated that Three Letter Agencies really like enticing people into fantasy situations well above their competency in order to generate terrorism convictions, so it's always a possibility even if there's no demonstrable third-party malicious action.

And in this case there is demonstrable malicious action.

It would probably have been more secure.

this is true "female body inspector" shirt clientele though

they would have required authentication for their api calls if that was the case...

Besides that, probably not much. =)


Indeed. The level of security failure was pretty incredible. They named media serially (So, pics/1.jpg, pics/2.jpg, etc.) and did not have any validation that you were allowed to access what you were grabbing so it was literally as easy as possible to grab everything. Oh, and did I mention that private messages were also fully accessible?
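A toy sketch of the two problems described there and the usual fixes. Everything here is hypothetical (the `MediaStore` class, the user names, the URL pattern); it is not Parler's code, just an illustration of why serial names with no access check are trivially enumerable, while random identifiers plus a per-request ownership check are not.

```python
import secrets

def sequential_urls(count: int) -> list[str]:
    """With serial names, a scraper can simply count upwards."""
    return [f"pics/{n}.jpg" for n in range(1, count + 1)]

class MediaStore:
    """In-memory stand-in for a media backend (hypothetical)."""

    def __init__(self) -> None:
        self._items: dict[str, tuple[str, bool, bytes]] = {}

    def upload(self, owner: str, data: bytes, is_public: bool) -> str:
        # Unguessable, non-sequential identifier instead of 1, 2, 3, ...
        media_id = secrets.token_urlsafe(16)
        self._items[media_id] = (owner, is_public, data)
        return media_id

    def fetch(self, media_id: str, requester: str) -> bytes:
        item = self._items.get(media_id)
        if item is None:
            raise KeyError("no such media")
        owner, is_public, data = item
        # The check the comment says was missing: is this requester allowed?
        if not is_public and requester != owner:
            raise PermissionError("not allowed to view this media")
        return data

print(sequential_urls(3))  # ['pics/1.jpg', 'pics/2.jpg', 'pics/3.jpg']
```

Random IDs alone are not access control; the authorization check in `fetch` is the part that matters, and the unguessable ID just removes the easy enumeration path.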

So Gab's strategy (fork Mastodon) looks solid for security but they hit performance issues because Mastodon isn't made for such scale.

Mastodon scales horizontally until PostgreSQL becomes the bottleneck:

https://docs.joinmastodon.org/admin/scaling/


they needed a platform that would not need to scale before the database server becomes a bottleneck

I am convinced this was an inside job. There is simply no way someone can be this incompetent without willful intent.

Ever worked for a startup? This is what "move fast, break things" does.

This happens at established firms as often as startups.

I have but only at competent ones. Nothing this flagrantly bad.

I’ve seen not quite this bad but definitely in the same order of magnitude

I disagree, incompetence is rampant. I worked for a healthcare company who kept its data at a Dell security center. One of their people ran a SQL script that deleted millions of billing records. They informed us later that they could not recover the data because every 24 hours they were writing over the one backup they kept. We had missed the window by a few hours.

You take shortcuts. Saying you’ll fix it later. Which never happens because you’re busy on the next feature that is riddled with the next set of shortcuts.

It happens.


Let me introduce you to every early stage startup in the world. Plenty of mature companies also have completely abysmal security practices

I wish I could disclose some of the incompetence I’ve encountered to persuade you otherwise. The reason there aren’t breaches like this of nearly all systems isn’t because most systems are better protected, it’s because no one’s interested (or they’re not interested for the purposes of sharing).

You'd be surprised how incompetent people can be when it comes to security. Nothing I have heard so far would really surprise me for a small startup with very rapid growth.

A career of looking at the guts of companies later, I can assure you that is very much possible.

A mess up if you intended to protect your users...

Jesus fucking christ i thought it's just some users table but 56.7 terabytes you mofos that's some s3 egress bill!

I kind of doubt I'd pay my AWS bill if AWS banned me.

I think they’ll avoid paying whatever Amazon charges. And I don’t think Amazon will pursue it either. “How about you just let this lawsuit go and we’ll forget about that massive bill you have to pay?”

Considering Amazon built a replacement for mongodb to essentially give them the finger, I'd say it could go either way. If AWS feels like it's worth setting a precedent, they may well fight tooth and nail to make them pay up.

90$ per TB
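Taking the $90 per TB figure quoted in this thread at face value, and the 56.7 TB from the Vice article, the egress bill works out roughly as below. This is only a sketch: real pricing varies by provider, region and volume, so treat the flat rate as an assumption.

```python
from decimal import Decimal

def egress_cost_usd(terabytes: str, usd_per_tb: str = "90") -> Decimal:
    """Flat-rate egress estimate; Decimal avoids float rounding noise on money."""
    return Decimal(terabytes) * Decimal(usd_per_tb)

print(egress_cost_usd("56.7"))  # 5103.0
```

So on these numbers the scrape would have cost on the order of five thousand dollars in outbound transfer alone.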

Is there still no easy straightforward way in $current_year to put an absolute spend cap of say USD 0, USD 5, or USD 10 per month on Amazon.com web services or Google Cloud Platform?

I’d think I’d like to prepay a fixed dollar amount like USD 200 IF I anticipate some major event but really this is problematic for students. I just want to use the free tier. Why is this so hard?


The common excuse is "billing is heavily distributed and every product does it differently, so it's impossible to implement limits", but yeah, you're probably not the target audience.

A coworker runs a charity site (online monument for Market Garden), and because of its charity purpose, he got some free budget for Azure. He calculated that at the expected usage, the yearly budget was more than enough to last him a year. Until he activated some innocuous function, and suddenly it ate through his entire budget in a month, so he couldn't afford to stay on Azure anymore.

Yes. A hard cost limit isn't a hard cost limit unless they start deleting resources like databases which has its own problems. One can imagine a semi-hard cost limit that only cuts off stateless services like EC2 and data egress but AWS apparently just doesn't care enough about courting customers like the GP to implement.

I agree that there is some financial risk to using AWS although my understanding is that they're pretty forgiving of surprise bills--at least the first time.


A googler recently commented here that, in order to forgive "bigger" bills, you really need to know someone to pull some heavy strings for you - and that it doesn't happen in many cases.

This taught me to be very very careful with this stuff, and to not bet on Google or Amazon being nice to me when I am bound to mess something up at some point - because with the complexity these services carry, it's no wonder people fall into cost traps all the time.


I do use AWS S3 for storage and it's very cheap for my needs. But I do periodically think about switching to a VPS (or to Backblaze which does let you set a hard cap).

I also help out a small non-profit newspaper mostly operated by students. We're going to need some storage for an archiving project and there's no way in hell I'm going to use AWS for that.


There are also cloud storage providers that do not charge for transfer or egress, etc.

There’s one in particular I’m thinking of…


It's Wasabi S3 https://wasabi.com, and ridiculously cheap for storage.

I don't think so. It would be great though to encourage dabbling.

I am glad that my customers and company pay the bills because they can fluctuate very noticeably.

For private use I still host my own server just because the costs are fixed and I know what I am getting.


It used to be possible to do that with Google App Engine at least.

What's worse is that it costs something like $90 per TB of outbound bandwidth. That is absolute robbery and it is consistent across the cloud cartel.

You're not a customer they want.

I might be the future customer they would want, if I were able to learn with no risk of costly errors what their platform can do and how to operate it.

Free tier gives you a ton of usage across most of the services, but definitely not enough to fully host a side project. But e.g. you can use RDS (database hosting), S3, EC2, text to speech and do whatever you want on a small scale for a year. It's kind of enough to learn for certificates.

I forgot to remove NAT Gateway for a few weeks while learning bits and pieces of VPC usage and it ended up costing me 25$... what can you do.


You're probably still not that customer. You're still concerned about your own money, vs spending employers' money.

they have alerts that can email you if you are approaching a limit but i don't think you can tie that to some automated shutting down of your stuff (i might be wrong i haven't worked in aws heavily for about 2 years now... believe others with more recent exp if they comment otherwise) tbh i kind of get why it is like this, aws is a business and they are only looking for your money. responsibility for keeping your pants up at aws is on the admins.

I think you can tie alerts to automated actions that you write although I've never done so myself. Of course, that requires anticipating what could happen and what you're OK with taking down.

This is exactly why I strip all metadata AND the filename (photos from a camera especially tend to include a timestamp in the filename) before uploading anything to share on social media.

Camera vendors are perfectly capable of storing a lot of metadata in the JPEG-encoded image itself, in a steganographic way. If you want to really be sure, crop your image to a position that is not a multiple of 8 in any direction, then scale it by a factor very close to 1, add some noise, and re-compress it again. Some steganography is still robust to that, but only the really fancy stuff.

Anyhow, if your image shows recognizable landmarks with shadows, then it will be feasible to recover the exact point of view and the time of acquisition.


Use image pool to train a GAN, publish the GAN images instead of the real ones.

> Anyhow, if your image shows recognizable landmarks with shadows, then it will be feasible to recover the exact point of view and the time of acquisition.

With what level of accuracy? Are you claiming you can confidently assert 2020-12-21 15:12 over, say, 2020-12-22 15:15 via shadows


> confidently assert 2020-12-21 15:12 over, say, 2020-12-22 15:15 via shadows

Not at all. By visual inspection you can see whether it's the morning or the afternoon (if you know the orientation of the landmarks). I guess by image processing you could get a precision of +- 1h maybe?


How precise is a sundial? How high is the resolution of the camera? I'd imagine that on a bright sunny day with a high resolution camera showing clean sharp shadows, it'd be possible to get a precision much better than +/-1h.

Sure, I was thinking about a photo of human subjects in the foreground, with some distant buildings in the background. But if you photograph the shadow of the Eiffel tower on a clear sunny day you may be down to the precision of a couple of minutes.

But that's only if you can resolve the "which day" issue?

Oh, yes, that seems a hard problem. But with sufficient effort (cloud patterns, parked cars, other people in the image) there is still a lot of information to extract. Especially if other people were taking EXIF-tagged photos in the same time and place and putting them on Instagram.

The arc of the sun through the sky, along with observable weather and variability in air quality could let you pin the date down with surprising accuracy. Two neighboring dates with similar weather would obviously be very hard, but even variation in cloud patterns could be enough to make an id.

The problem with that idea is that there would have to be a steganography standard and then it becomes easy to defeat.

Of course, the idea is to keep it secret; a "steganographic standard" seems absurd. If they are embedding this information, it will not be readily detectable by most people.

iOS now has an option to do this natively. But I find it a bit convoluted. My go-to app for stripping out metadata from photos is Exif Viewer.

My camera apps never include GPS coordinates to begin with (which is enough stripping for me, but then again I'm not part of Qanon and all that).

I'm curious where someone just gets 56.7 tb of storage that quickly.

I'm more curious how someone pulls down 56 terabytes in a very short period of time without sysadmins at Parler noticing. I'm surprised they didn't unintentionally DoS them.

Parler used AWS and AWS is always happy to serve any request without problems and notify you about it hours later if you decided to create usage alerts.

> and notify you about it hours later if you decided to create usage alerts

and bill you at the end of the month.

This is the crucial part: AWS will serve whatever is requested. It brings them money.


Sorry for OT but what about a simple small time private AWS user like me who uses it for static hosting my blog and stuff.

Does that mean someone could request a lot of files, over and over, to increase my bill? Or is that served by some cache?

Let's say for example I host Mastodon media on S3, about 30G of unique data. Could someone use that to increase the cost of my bill?

I do of course have a budget alert set but I have to react to that alert too.


AWS can really sting you. I've completely stopped using it for private projects after receiving a $100 bill for something I accidentally provisioned through the command line.

Not only is it very easy to spend real money by accident, the web interface makes it incredibly hard to work out what you're spending money on (it took me a long time to even understand what I was paying for, and then longer still to turn it all off).

Even if you know what you're getting into, the pricing is very misleading. A few cents an hour adds up if it's running 24/7, and it's never obvious whether you're in the free tier or not.

If you don't have deep pockets, be very careful with AWS.


The first thing I do at companies I join is dig through common services and tag all resources in a consistent way (e.g. app=web-frontend). Then you can create resource groups and break down billing at an ‘application level’ through Cost Explorer, instead of relying on their default, very general filters. Not perfect, but it gets you 90% of the way towards understanding where your costs are.
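To illustrate why consistent tags help, here is a plain-Python sketch of the grouping step. It deliberately does not call any AWS API; the line items, resource names and the `cost_by_tag` helper are all invented for the example, standing in for whatever a billing export would give you.

```python
from collections import defaultdict

def cost_by_tag(line_items: list[dict], tag_key: str = "app") -> dict[str, float]:
    """Sum cost per value of tag_key; resources without the tag land in 'untagged'."""
    totals: dict[str, float] = defaultdict(float)
    for item in line_items:
        label = item.get("tags", {}).get(tag_key, "untagged")
        totals[label] += item["cost_usd"]
    return dict(totals)

# Made-up billing rows for illustration only.
items = [
    {"resource": "i-web-1", "tags": {"app": "web-frontend"}, "cost_usd": 40.0},
    {"resource": "i-web-2", "tags": {"app": "web-frontend"}, "cost_usd": 35.0},
    {"resource": "db-1", "tags": {"app": "billing"}, "cost_usd": 120.0},
    {"resource": "vol-orphan", "tags": {}, "cost_usd": 12.5},
]
breakdown = cost_by_tag(items)
print(breakdown)
```

The "untagged" bucket is often the interesting one, since it is where forgotten or accidentally provisioned resources tend to show up.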

>>tag all resources in a consistent way

It amazes me when I get pushback for that, but it's to be expected inside toxic culture corporations. I'm kinda disgusted with AWS now so I'm looking to re-tool. The dangerous defaults and concern about arbitrary or uncaring TOU enforcement should be an impetus to diversification of service provision.


AFAIK yes. This is why I never get people that use AWS. DigitalOcean or any other VPS provider for that matter gives you a flat monthly rate so you know there will never be any surprises. Why take the risk?

There are genuine architectural and cost benefits for some types of configurations. But you really need to be an expert (or team of experts) to identify those situations, then architect and configure appropriately. Where it bites people is the "AWS by default" mentality many folks have (after a decade or more of lots of positive press) without understanding what they're using or why they're using it. Many people who make these decisions are shielded from the direct impact of any cost overruns too, so there is less reason to be sensitive to that. Almost any time I've worked with orgs using AWS, any reference to cost is "an engineer is more expensive!". Which is sort of true, but there's also typically no way a company could just accidentally hire 27x more people than they budgeted for in a single day, or that a rival company could force-hire those engineers into your company without you knowing about it, sticking you with even just a day's cost for 50 engineers, for example.

This is always a possibility so you take steps to protect yourself. If it’s a few static assets, put a CDN in front of it. You won’t be charged extra by S3 because the CDN would cache it.

If you’re hosting Mastodon, then I assume you’d take steps to ensure that only an authenticated user can access any data. And that user would also need to be authorised to access only specific data. And that authenticated and authorised user would be rate limited so they couldn’t scrape everything they have access to easily.

If you do all these things, you’ll be fine.
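
For the rate-limiting step, one common approach is a per-user token bucket. The sketch below is only an illustration under assumed numbers (1 request per second, bursts of 5); the `TokenBucket` class and `allow_request` helper are hypothetical names, not taken from Mastodon or any particular framework.

```python
import time


class TokenBucket:
    """Allow `rate` requests per second on average, with bursts up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)
        self.last = clock()

    def allow(self, cost=1.0):
        # Refill according to elapsed time, capped at capacity.
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


# One bucket per authenticated user (in-memory only, for illustration).
buckets = {}


def allow_request(user_id, rate=1.0, capacity=5):
    if user_id not in buckets:
        buckets[user_id] = TokenBucket(rate, capacity)
    return buckets[user_id].allow()
```

A scraper working through sequential IDs exhausts its burst almost immediately and is then held to the steady rate, which turns "download everything" from hours into months.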


Most CDNs also charge for transfer.

If you search HN for aws bill https://hn.algolia.com/?query=aws+bill

there are a lot of interesting things that can go wrong.


I believe the person that pulled it down is a digital archivist. I’m sure she has plenty of storage laying around for such occasions.

> laying around

ITYM lying around, unless this is a quirk of US English. Sorry to be pedantic!


> unless this is a quirk of US English

It is indeed. Very common in colloquial speech around here.


Presumably just another s3 bucket?

Do all your transferring from an EC2 instance in the same region and it never needs to waste bandwidth going over the public internet anyway.


Or local storage. The DataHoarder subreddit where a lot of similar efforts are coordinated has a lot of info about building dense home storage on the cheap

You can get that in 5 drives from best buy these days. Not exactly a huge leap for cloud storage.

I got two 90 TB servers that I pay peanuts per month for at Hetzner to serve as backup servers. As long as you stay away from the cloud, storage is dirt cheap.

AWS.

If not them, because you're worried they'd also shut you down, then probably Backblaze.

Could also just buy a bunch of fairly cheap 100 mbit unmetered boxes off OVH/Kimsufi for a total cost probably of ~$300/m.


Modern HDDs store up to 18 TB.

I saw 6 TB HDDs for €114 on my local site, 16 TB HDDs for €370.

It's not exactly cheap, but if you're doing it for a serious project like archiving an entire politically relevant social media website, I'm sure you'll have 1-2 thousand eur lying around for a couple of hard disks
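
As a rough check of that estimate, here is the arithmetic for the 56.7 TB figure from the Vice article quoted upthread, using the two prices mentioned above. It assumes decimal terabytes and ignores filesystem overhead and any redundancy, so real numbers would be somewhat higher.

```python
import math

ARCHIVE_TB = 56.7  # archive size reported in the Vice article


def drives_needed(archive_tb, drive_tb):
    """Smallest number of drives of size drive_tb that holds archive_tb."""
    return math.ceil(archive_tb / drive_tb)


# (drive size in TB, price in EUR) as quoted in the comment above
for drive_tb, price_eur in [(6, 114), (16, 370)]:
    n = drives_needed(ARCHIVE_TB, drive_tb)
    print(f"{drive_tb} TB drives: {n} needed, about EUR {n * price_eur}")
```

That works out to ten 6 TB drives (about €1,140) or four 16 TB drives (about €1,480), which is consistent with the "1-2 thousand eur" ballpark.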


Crowdsourced. The crawling and downloading was able to be coordinated and performed by a bunch of people at the same time.

All the content was hosted on S3 and you just needed the URLs: security by obscurity.

The storage isn't that much of a deal, but I bet that was not on some cheap consumer Internet subscription as an ISP would have throttled her into oblivion after the first few TB.

work gave us google drive accounts with "unlimited" capacity

You'll find out what 'unlimited' means in SaaS speak very, very much sooner than you expect if you really try to utilize it.

You'd be surprised. Google has a lot of storage lying around, and it takes them a while to notice it being used up. I know of at least three people who have on the order of petabytes of data on Google Drive.

They took action against one organisation that had multiple 10PB+ users but Google really doesn't seem to care that much about the tertiary institutes giving unlimited GDrive accounts to every data hoarder who pretends to enrol.


Not at 50TB though.

except you can only upload 750GB a day without somewhat workarounds

You can. Create multiple service accounts, add them to the Shared drives and connect the service accounts using Rclone, a CLI tool that allows you to perform I/O operations on multiple cloud storage providers.

Once a service account reaches the limit, switch to another.
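
To put numbers on why the workaround matters, here is a small sketch of how long the 56.7 TB archive would take to upload under the 750 GB/day cap mentioned above. It assumes the cap applies per account, decimal units, and that the accounts can be used back to back without other throttling; `days_to_upload` is a hypothetical helper.

```python
import math

DAILY_CAP_GB = 750  # per-day upload cap quoted in the thread


def days_to_upload(total_tb, accounts=1, cap_gb=DAILY_CAP_GB):
    """Whole days needed if each account can upload cap_gb per day."""
    total_gb = total_tb * 1000
    return math.ceil(total_gb / (cap_gb * accounts))


print(days_to_upload(56.7))               # single account
print(days_to_upload(56.7, accounts=10))  # rotating through ten accounts
```

Under those assumptions a single account needs 76 days, while rotating through ten accounts brings it down to 8.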


"without somewhat workarounds" sometimes I wonder about people.

What exactly is the fee Amazon was getting from Parler for that much stuff? Like it needed just as much bandwidth if not more and dedicated boxes for software to run on. It wasn't cheap, or was it?

I don't have a good source on this, but I saw $300,000/month bandied about social media the other week.

> The Hacker Who Archived Parler Explains How She Did It (and What Comes Next)

I'm really glad she did that. I'm fine with all this stuff getting taken down, but it really needs to be archived somewhere for historical purposes, to help understand this moment.

Given the kind of media and political impact Trump's tweets from @theRealDonaldTrump have had, I really hope they're archived at NARA along with the @POTUS tweets. They're legit historical primary source documents.


> "I'm fine with all this stuff getting taken down"

I understand this on a visceral level, but I wish more people would look beyond that to the implications for communication on the web. This action by Amazon happens to correspond with what I think is right and just on first iteration, but what principle prevents Amazon from arsing some other group that we agree with?


> what principle prevents Amazon from arsing some other group that we agree with?

Honestly, none. It's their business and they can handle it however they want.

What you can do (and this is exactly what Kolmisoppi was suggesting) is build your platform to work without relying on other people's business.

I'm happy that companies like Amazon don't want to get associated with people who organized a failed coup. That should be the bare minimum. But there is no law which forces you to be hosted on Amazon if you want to be on the Internet. You can self-host. You can buy/rent servers in another country, where what you are doing doesn't have direct consequences which might lead people to want to get away from you. Use the blockchain, use torrent, develop your own P2P protocol. Those people just got locked out from the easy way, something they should have expected to happen (and plan for) since day one.


>build your platform to work without relying on other people's business

You can't although you can, of course, mitigate.

But are you OK with just a PWA on mobile w/o the Apple or Google stores?

And your platform is ultimately dependent on a network connection, probably CDN, domain registrar, DNS, etc. Those are a pretty high bar to get kicked off but you're not immune.

(And, yes, there are things like Tor and jumping around providers if you have a fairly lightweight web site--like most torrents--but that doesn't help you if you have a site with many TB of data catering to unsophisticated users.)


These huge Internet companies really should be regulated. In the last weeks we've seen these companies (Amazon, Google, Facebook, Twitter, Apple) act like governing bodies by banning Donald Trump from their platforms and essentially shutting down a business (Parler). This is problematic because these companies have monopolistic power, not only in the US, but also internationally.

Only POTUS can silence POTUS.

POTUS has the biggest bully pulpit in the world. whitehouse.gov, daily press briefing, C-SPAN, etc.

No third party should serve as alternate bully pulpit, allowing a democratically elected leader to speak directly to their audience, bypassing the fourth estate.

I'm not saying Twitter was right to shut down POTUS. I'm saying Twitter never should have hosted POTUS in the first place. Further, no leader should be allowed to speak as a private individual.

If Twitter wants to feature POTUS, then let Twitter attend the daily presser along with all the other reporters and journalists.

Everyone is responsible for allowing this undemocratic violation of norms. Social media companies are just the ones that profited most.


In my opinion the internet was already seriously flawed.

This makes people talk about how they think it should work, not just how it works right now, which I think is exactly what is needed.


> what principle prevents Amazon from arsing some other group that we agree with

Personally I found Amazon's response to Parler convincing: https://cdn.arstechnica.net/wp-content/uploads/2021/01/gov.u...

The question is, why should a company in Amazon's situation be unable to turn off that service? As in, your question assumes that Amazon turned off Parler out of nowhere for no reason other than "we don't like them". It assumes they did not have documented reasons, documented attempts to convince Parler to comply with the TOS etc.

Otherwise said, contract.


> what principle prevents Amazon from arsing some other group that we agree with?

The principle of Amazon not wanting to piss off all of their customers and several government organisations. If you're this tentative about something you explicitly agree is just, then clearly you (and millions of others) are going to react pretty harshly to Amazon unilaterally deciding, e.g., that all mentions of Belgium should get scrubbed from the platform.


>Belgium

I mean, it is the rudest word in the universe.


>> "I'm fine with all this stuff getting taken down"

> I understand this on a visceral level, but I wish more people would look beyond that to the implications for communication on the web. This action by Amazon happens to correspond with what I think is right and just on first iteration, but what principle prevents Amazon from arsing some other group that we agree with?

It's kind of predictable but still disappointing that this was the part of my comment people chose to discuss with a 43-comment thread. It was the least novel and interesting idea in it.

But to your point, there's a lot more "looking beyond" than just that. There also needs to be a lot more looking beyond rather limited fundamentalist views of free speech, which tend to abrogate other fundamental rights and be so short-sighted that they actually bring discredit to the values they try to protect.


The only principle that protects that kind of group is that we agree with it. It’s not much, but it’s something.

If only there were some principle or value that would somehow... allow us to express opinions even if others - even powerful people, or even the majority of people - disagreed with us. Some kind of inalienable right. Hmm. It is a mystery.

You can express opinions all you want, but you’re going to have to do a lot of work in order to justify compelling someone to do business with you after you say something they don’t like.

Like, amazon isn’t actually the basic infrastructure of the internet. You can build web sites and apps without involving them. So why would a law force them to do business with you?


I try not to do business with companies that censor on political grounds. I don't like politics being banned in a professional context, but I slowly see no other solution.

The political alignment of many Parler users isn't a secret and if they decide to measure skulls again, I might be in trouble. Still, I don't see them as a relevant threat at all. It is even more ridiculous than the terrorism scare.

However, the actions of SV social media sites and hosts like Amazon censoring unpopular opinions and content outweigh that danger from neo-nazis by orders of magnitude. These groups have absolutely no political power in the 21st century. They have more influence than a decade ago, but that is mainly due to their ability to reinforce their persecution narrative and martyrdom.

On the internet there are calls to violence in any political group, even vegans and cat lovers. We ignore that because they aren't relevant. But if you seriously crack down on cat lovers, you might need to expect real violence.

Either there are rules and principles that are valid for everyone or there are none.

Some say people get influenced by far-right propaganda. Far-right groups think in the same way in that they believe everyone not on their side is an "NPC".

I think Dorsey and Zuckerberg handled it relatively well in the grand scheme of things. Their latest ban attempts were over the top though. I know that some people might need help instead of a Twitter account, but that is beside the point. They set the precedent for countries to suppress their opposition and for state propaganda. I think Uganda is one of the latest examples.


These are Amazon's reasons for shutting down Parler, as given in its response to Parler: https://www.courtlistener.com/recap/gov.uscourts.wawd.294664...

The reasons are not "they were too much right wing for us". They had multiple reasons, but it is not that long to read.

> On the internet there are calls to violence in any political group, even vegans and cat lovers. We ignore that because they aren't relevant. But if you seriously crack down on cat lovers, you might need to expect real violence.

And vegans and cat lovers do seriously take them down in their forums when those cross the line. And when they discuss cats and food on reddit and reddit deletes the accounts of those who threaten violence, the vegans and cat lovers happily continue to discuss cats and food.

This is a bad analogy.


A vegan parler calling for the execution of meat eaters would be banned just as well, don't you think?

I have seen countless comments from vegans that said meat eaters should be butchered instead of animals. The comments didn't get deleted. And no, I didn't want to complain about them, venting is healthy and without victims on the internet.

> I have seen countless comments from vegans that said meat eaters should be butchered instead of animals. The comments didn't get deleted. And no, I didn't want to complain about them, venting is healthy and without victims on the internet.

That's the difference between merely saying something, and saying something with the likelihood of actually carrying it out.

Parler would still be around if there had not been a mob attack on the Capitol connected to it (and no legitimate fears of further mob attacks on inauguration day, etc.) If vegans actually started and organized butchering of meat eaters, I'm pretty sure the forums where they planned such things would get shut down quickly.


I didn't say anything about "forcing Amazon to do business" with me. If that's your solution, it's a non-starter.

I don't have a solution, but Amazon + Apple + Google coordinating to shut down a platform for communication - even a platform that contains expression that I strongly disagree with, I might add - is a problem that requires a solution

This is what I wish more people understood now rather than later, but at some point everyone will understand that it's a problem


Isn’t the solution to avoid cloud computing, stack overflow style?


If you come up with a good solution, let us know. Until then, I’m happy to support antitrust action against apple/google/amazon/facebook - but for more pressing reasons than the deplatforming of fascists attempting to overthrow a fair and free election.

> more pressing reasons than the deplatforming of fascists

Hint: it's the identical problem to deplatforming BLM or Antifa or Occupy or the Proud Boys or pick anyone you agree with that annoys powerful people. "Having the better politics" will not protect you. "Oh they're just fascists" will not protect your peoples.


But they have been doing that since forever and nobody was complaining. Here's a link to when Twitter deleted 125,000 ISIS accounts [1] (and they had been monitoring for Islamist extremist content for much longer already). So I ask why this uproar now that they are deplatforming fascists?

[1] https://www.theguardian.com/technology/2016/feb/05/twitter-d...


I suspect the difference is more about deplatforming americans than any political view.

Except you need to replace “annoys powerful people” with “storms the seat of our government with intent to overthrow a democratic election result”. I’m fine with that being the line for companies to ban a political entity. Amazon and Google aren’t randomly banning BLM for protesting in cities.

Essential Utilities are governed in many places by Universal Service Directives ensuring private companies can not willy nilly choose to deny them as they see fit for commercial or other reasons.

I am familiar with that concept. Good luck arguing that aws, google play, or the app store is an essential utility.

Personally I'm fine with Google and Amazon doing this. AWS isn't the only web host and you can sideload apps on Android.

Apple on the other hand is a lot more murky due to their lockdown of app installs. I don't believe they should be made to host things they don't agree with but people should not be prevented from installing whatever software they want on their own hardware.

For the record I am a staunch free speech advocate but fully believe that no-one (no-company either) owes you (the royal you) a place to use your free speech on their property.


> I don't have a solution, but Amazon + Apple + Google coordinating to shut down a platform for communication - even a platform that contains expression that I strongly disagree with, I might add - is a problem that requires a solution

Did they actually coordinate [1] or just come to similar conclusions based on similar facts, in the context of the same cultural zeitgeist?

[1] e.g. Bezos, Cook, and Pichai (or subordinates) on a conference call, deciding what to do


I wouldn't describe it as coordination, but I think it's relatively safe to assume that at the very least whichever companies were slowest to react would have had their decision influenced by the earlier actions of other large companies. So there probably is somewhat of a snowball effect there.

How long ago was it that the 'cloud' vendors proclaimed/advertised that 'In the old days everyone had their own generator, their own well. Now just like electricity or water, you have computing on tap'. Sounds pretty much like a public utility to me and thus should be regulated as one.

Just because I describe something as essential when trying to sell it to you, doesn’t make it actually essential.

I do believe some level of internet serving access is a public utility. But I doubt that level is cloud hosting.



After reading these beautiful articles I'm sadly wondering if we've already put the golden age behind us.

I will dig a little deeper to support archive.org[1] this year. I'd love to run a mirror.

[1] https://archive.org/donate/


> but what principle prevents Amazon from arsing some other group that we agree with?

Parler had people discussing murdering Congresspeople and didn't do anything about it. No matter which faction of "what level of free speech is acceptable" one subscribes to, this is never acceptable and it is no wonder that Parler got booted off.


> this is never acceptable

I want you to understand what I'm about to say. Understand it in your bones: it is unacceptable to me and to you and it is reprehensible; and beliefs that you hold dear will someday, by someone, be seen as unacceptable and reprehensible. To protect the speech of the reprehensible is to protect your own speech. That's what I would like you to understand.

David Goldberger didn't defend the rights of Nazis to march in Skokie because he is a Nazi, but to defend his (and all of our) rights.

https://www.aclu.org/issues/free-speech/rights-protesters/sk...

The people who planned actual murders and crimes must be caught and punished. But if "planning and executing crimes" is the standard by which platforms should be shut down, then Facebook also should be shut down.


> But if “planning and executing crimes” is the standard by which platforms should be shut down

Planning and executing crimes is the standard by which those doing so should be shut down. Ideally, by the actor as close as possible and able to do so with minimal collateral damage.

But a second-order platform that determines a first-order platform is systematically incapable or unwilling to do that does not act improperly in cutting service to the first-order platform.


> But if "planning and executing crimes" is the standard by which platforms should be shut down, then Facebook also should be shut down.

That's fine by me tbh


Not all crimes are created equal. It is illegal to use clotheslines to dry clothes in New York City. A protest to remove the law may involve using clotheslines. Would you deplatform for that?

Note, the reason why it is illegal goes back to preventing protesters from hanging up their banners / messages. An act to legally silence opponents.


> To protect the speech of the reprehensible is to protect your own speech. That's what I would like you to understand.

Actually, even in the US there are exceptions - namely when speech is likely to incite crimes: https://en.wikipedia.org/wiki/Imminent_lawless_action


Indeed! We weren't talking about the 1st Amendment, but let's go where you lead: speech likely to incite imminent lawless action is illegal. Courts decide if that applies. Someday you might say "Politician Y should be hung by his thumbs". Politician Y and friends try to convince a judge that you thereby incite imminent lawless action. Of course, you were just feeling passionate, and no one reasonable would believe that you meant it. Thankfully, due to strong 1A protections, the case would not proceed.

But if you wrote that on a site hosted on AWS, there is nothing in principle from Amazon taking the platform down. Politician Y calls his buddy Jeff Bezos, and fwoomp! Gone. This should be concerning.


Is that what we've learned here?

It seems more likely that we've learned that if I said that quote -- and then I personally tried to hang the politician by his thumbs and got caught because I'm a cartoonish moron. Then, when charged with attempting to hang the politician, I said "I was just joking" even though I had some hanging equipment and I was in a giant mob full of other people all chanting to hang the politician and many of us had guns and anyway I also broke into a locked building to do the hanging -- and then it came out that large swaths of the content on the same site was people making those kinds of threats, and then it came out that the site operators didn't care to remove the threats because they had a moral opposition to moderation, and then it came out that AWS had contacted the site owners many times to implore them to remove other illegal content, and then the site operators, rather than removing the content, gave press interviews where they boasted how they were invincible and didn't care if AWS took them down... then it's probable that AWS would take the content down. And this doesn't concern me at all. Lock me up, in this hypothetical, and lock up the people who enabled me.

I ran a legacy website that once got spammed. My host contacted me because one of the spam things was an ad for a website hosting stolen credit card numbers. They gave me 24 hours to take down the content. This isn't because they're censoring math and they're using their monopoly power to prevent numbers from being posted, it's because stolen credit card numbers, provided for the purposes of credit card fraud, are illegal and they didn't want to do business with me if I wasn't willing to remove the content.

And I also don't see a further problem with using the posture of the site operators + the site itself to make a judgment call about whether the content in question is an aberration or intentional. If someone posted a magnet link of pirated content in Hacker News, I wouldn't presume per se that Hacker News was a piracy website because I can facially see that the site is general purpose, and also because I can see that the site has a general moderation policy that signals it is willing to comply with legal requests. But that doesn't mean that ThePirateBay can credibly argue in court "We had no idea our site was used for piracy, and if you ban us, you have to ban Google, because they crawled us."


> But if you wrote that on a site hosted on AWS, there is nothing in principle from Amazon taking the platform down.

The 1st Amendment (and its European equivalents) usually only bind the government, not private entities.

There is nothing per se preventing you (or Parler) from building your own datacenter or using another hoster - there is no "human right" to be able to use AWS. However, what still remains is that every company has the right to refuse service to an entity that is suspected of criminal activity - and the onus is on Parler to prove they will not serve as a planning platform for criminals.


> But if you wrote that on a site hosted on AWS, there is nothing in principle from Amazon taking the platform down. Politician Y calls his buddy Jeff Bezos, and fwoomp! Gone. This should be concerning.

Why should it be concerning at all? There are a few thousand other hosting providers they can go to.

The slightly stronger argument seems to be that the people making these death threats on Parler should not have been banned from Twitter/FB in the first place. FB/Twitter have far less competition and most of their value is in the network effects they've established.

The tech giants should absolutely be broken up/more heavily regulated, but I think there are much better examples of why than Parler.


> The slightly stronger argument seems to be that the people making these death threats on Parler should not have been banned from Twitter/FB in the first place.

That's not a strong argument at all. Spouting death threats gets you arrested if you do it in public (and someone records it or calls the cops), so it should also lead to a time-limited or permanent ban from social networks.

Social networks are not a free-for-all zone.


> That's not a strong argument at all.

I don't think it's a strong argument either, I just think it's slightly stronger than the one people are trying to use for why AWS should be forced to host Parler.

> Spouting death threats gets you arrested if you do it in public (and someone records it or calls the cops), so it should also lead to a time-limited or permanent ban from social networks.

The argument from people coming out against the moderation of these threats seems to be that FB/Twitter etc. should only remove this content after receiving some sort of court order/government mandate to do so, that they shouldn't "play cop" as it were. Personally this seems like a pretty stupid take.

> Social networks are not a free-for-all zone.

Some people are arguing that they should be, that they should be treated as the equivalent of a modern town square. I don't necessarily agree with this take, but I can see why people might think that way.


> Some people are arguing that they should be, that they should be treated as the equivalent of a modern town square

A modern town square isn't a free-for-all zone either. Try to go and shout "fuck <n-word>" or raise the arm to the Nazi salute in an area where people of color live and you'll be lucky to escape with a minor beating.

Town squares are a form of societal self-preservation too - unruly elements get dealt with, either by the people themselves or by the police.


"Parler had people discussing murdering Congresspeople "

By that criterion we should immediately close down Twitter, Facebook, Reddit, ...


You missed one key difference: moderation. Parler straight up refused to remove these kinds of posts.

Parler just got ejected from the internet. Is it totally unthinkable that they would have adjusted their behaviour and moderated whatever Amazon had told them to if given, say, 2 weeks notice? They probably would have, then transitioned off AWS in an orderly fashion.

I mean, AWS basically took their business under false pretences. AWS wasn't trying to provide them a service, it was trying to kill their company.


Parler knew the terms when they signed the contract and didn't bother to adhere to them. They had months of notice. AWS did wait until Parler became completely toxic to enforce the terms, but they had lots of cause available.

Reddit has banned a lot of toxic communities over the years to avoid negative repercussions: https://en.wikipedia.org/wiki/Controversial_Reddit_communiti...

Same for Facebook and Twitter.

Parler, however, did nothing even as people went and publicly called out criminal acts happening on the platform.


A different take on Parler's connection (or not) to the events of 6 Jan 2021:

https://web.archive.org/web/20210112145206/https://greenwald...


[flagged]


I gotta say I read your comment and I immediately knew whose byline would be on the other side and I wish I could say I was wrong.

Would you mind expanding on this? Something in this article prompted you to write this, but I honestly don't see what. Can you explain it like I'm 5?

Which of the things he said are you asserting are inaccurate?

That's just, like, his opinion man.

Let's see what comes out of this data dump before getting too sanguine about loss of "free speech".


> "free speech".

On this topic, I've seen reports on Twitter and Reddit of people who don't fit Parler's "prescribed worldview" being banned.

Not so free after all.


How much free speech do you have on this site? Why don't you try agitating for violence here and see what happens.

After all, it's just speech right?


I would expect to be flagged, and then banned, and rightfully so.

I was just pointing out that Parler is nowhere near as "free" as it claims to be.


Free speech is a human right, but it's limited (in the declaration of human rights, maybe read it?!) to things that don't hurt public peace, that don't negatively impact someone else's rights, etc.

To be clear, I am not defending Parler, at all.

I was actually condemning them for not even living up to the standards they claim to profess, in addition to their other issues.


I got that now, I misunderstood you, apologies. I was just really excited to share this knowledge!

Fair enough, it's a pretty cool, not very well known piece of info!

Didn't they recently take down Trump affiliate Lin Wood's post about executing Pence?

I personally don't find their ability to remain online that surprising.

The Pirate Bay and other torrent networks were built by people with a passion for building, maintaining and hacking things. People who, even without a solid CS background, would spend hours a day learning new things, developing distributed protocols, evading DNS blocks and hosting their content wherever they could to make it accessible - including the small server in their own garage if needed. And they are used by people who don't mind learning a new protocol or how to use a new client to get the content they want.

I don't see the same amount of passion for technology and hacking among the Parler users, nor its maintainers. Those who believe in conspiracy content are people characterized by a psychological tendency to take shortcuts whenever they can in order to minimize their efforts in learning and understanding new things. So when the first blocker hits they usually can't see alternative solutions, because it's not the way their brains are wired. They always expect somebody else to come up with solutions for them, and they always blame somebody else when the solution won't come. And even if they decided to migrate their content to the dark web or on a Tor network, not many people will follow them - both because they don't have the skills, and because they don't want to acquire those skills. Plus, they'd lose the "viral network effect" that they get when posting click-bait content on public networks, the new censorship-proof network will only attract a small bunch of already radicalized people.

And even if they wanted to hire some smart engineers to do the job for them, we all know that engineers tend to swing on the other opposite of the ideological spectrum. Those who have built systems for escaping REAL authoritarian censorship would rightfully feel disgusted if asked to apply their knowledge to provide a safe harbour for rednecks to vomit their conspiracy-theories-fueled hate.


> The Pirate Bay and other torrent networks were built by people with a passion for building

Also by people who knew that what they were doing was straight-up illegal in a lot of countries, and grey-area in a lot of others. So this was a real risk.

Parler on the other hand, at its core was just a social network, and if you look at the founders/owners, they have a very disconnected interpretation of "free speech", so they were clearly thinking nothing bad could happen.


[flagged]


That saying “hey, let’s violently overthrow the government” isn’t a class of protected speech.

And also, they are being thrown off the platforms of private companies, not being censored by the government, so it's actually not a free speech issue at all.

> And even if they wanted to hire some smart engineers to do the job for them, we all know that engineers tend to lean toward the opposite end of the ideological spectrum. Those who have built systems for escaping REAL authoritarian censorship would rightfully feel disgusted if asked to apply their knowledge to provide a safe harbour for rednecks to vomit their conspiracy-theory-fueled hate.

I'm not sure this is true. This seems to imply that nations which have copyright law are imposing authoritarian censorship on their citizens. This doesn't seem to be a pervasive idea, at least in the US.

There are proponents of information freedom who oppose copyright law. It's not clear to me that this group would oppose Parler, and in fact many I've spoken to believe they should be free to exist without censorship.

But - I am not sure they want to be associated with Parler either, out of concern for their reputation.


>This seems to imply that nations which have copyright law are imposing authoritarian censorship on their citizens.

This is exactly the point of most anti-copyright parties.


I see no contradiction.

> Those who believe in conspiracy content are people characterized by a psychological tendency to take shortcuts whenever they can in order to minimize their efforts in learning and understanding new things

I don't think it is that simple. I remember reading a finding that highly intelligent people are more attracted to conspiracy theories. The complexity of those theories and the details they rely on attract them.

Also, I may be wrong here, but I remember reading that Parler was funded by some pretty rich people. If that is true, they should be able to pay for tech know-how.


There is definitely a correlation between lazy thinking and believing in conspiracy theories. Mainly because conspiracy theories do not lend themselves to rigorous inquiry, almost by definition.

This is different than "intelligence." It's more about effort and rigor in thinking. It's the quality of the thought, and the willingness to question your own assumptions. And a willingness to recognize the limits of your own knowledge and understanding.


A study recently published in Scientific American seems to show that left-leaning people tend to have more gray matter in the pre-frontal cortex (i.e. the area of the brain involved in complex planning, understanding of new things and pattern detection), while right-leaning people tend to have more gray matter in the amygdala (the area of the brain responsible for spotting potential danger and refusing something new if it may pose a risk to survival): https://www.scientificamerican.com/article/conservative-and-....

If that's true, and if conservatives are indeed much more likely to believe in conspiracy theories (http://www.scientificamerican.com/article/information-overlo...), then the opposite of what you state may be true.

Keep in mind that before a conspiracy theory turns into the perverse mind-twist of a complex theory like QAnon, it ALWAYS starts simple, and always simpler than reality actually is. It can always be summarized as "those guys want to harm you, so don't even bother to look further, the explanation is easy": pure and total amygdala stimulation. Then, when its proponents are contradicted by evidence, they put up more and more complex twists to mitigate the rise of cognitive dissonance in its followers ("I know that it looks like things don't make much sense, but you know, you have to follow the crumbs, or keep in mind that Trump is talking to you in Morse code" etc.)


I'd be interested in information on that and how the study was performed. I've found that many of the successful people who talk about conspiracies tend to be self-serving. Like that Texas lawyer who brought a case of election fraud, likely to catch the attention of Trump so he would pardon him for his own legal problems. Others use them as a scaremongering technique to influence politics.

The only ones that seem to believe in them are those clearly unhinged (McAfee comes straight to mind although his seems self serving too).


> And even if they wanted to hire some smart engineers to do the job for them, we all know that engineers tend to lean toward the opposite end of the ideological spectrum.

Do we all really know that? Some very good technical people don't have particularly strong political views or keep them separate from their job. Example: lots of ordinary devs helped build porn sites.


As a dev I feel that building a platform to share conspiracy-fueled hate is way more immoral and damaging than building a platform to host porn content. At least porn doesn't harm anybody - except maybe your hand :)

Did you read the recent NY Times pieces on PornHub?

The Children of Pornhub https://nyti.ms/33DMObR

An Uplifting Update, on the Terrible World of Pornhub https://nyti.ms/2W1aB1b


There is no political profile for CS engineers.

The founder's motto was literally "Hack the planet"...

Indeed, that's not to be compared with TPB enthusiasts' taste for hacking and passion for CS things, but don't underestimate "right wing" techies...


I got the sense while crawling data from their API that the engineering quality is poor at Parler. Dates were represented as strings in "YYYYMMDDhhmmss" format (so today would be "20210113053923") instead of UNIX timestamps, certain fields were duplicated for no reason (e.g. every object would have an identical "id" and "_id" key), counts of impressions/comments/etc. would be the display strings rather than raw numbers (so "2k" or "5m"), and various moderation flags were in place, like a boolean "sensitive" which was always false, even for posts that had been downvoted significantly.

> Dates were represented as strings in "YYYYMMDDhhmmss" format (so today would be "20210113053923") instead of UNIX timestamps

Such a representation naturally avoids the Y2K38 problem, and could go beyond Y10K. It's traditional in Windows and DOS (neither of which have the Y2K38 problem) to store timestamps as a structure of fields.

The other things you noted I agree with, however.
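For reference, the Y2K38 limit mentioned above can be worked out directly. This is only a minimal sketch: it assumes a signed 32-bit counter of seconds since the Unix epoch, and the names `EPOCH`, `INT32_MAX` and `last_moment` are made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# Unix time counts seconds from this instant.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Largest value a signed 32-bit seconds counter can hold.
INT32_MAX = 2**31 - 1

# The last instant such a counter can represent.
last_moment = EPOCH + timedelta(seconds=INT32_MAX)
print(last_moment.isoformat())  # 2038-01-19T03:14:07+00:00

# A field-based 14-digit string keeps working for any four-digit year.
far_future = datetime(9999, 12, 31, 23, 59, 59).strftime("%Y%m%d%H%M%S")
print(far_future)  # 99991231235959
```

The string form only runs out when the year needs a fifth digit, which is the Y10K point raised above.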


If they're using a javascript 53bit int representation for the seconds (or an int64_t cast down to a javascript big int) then it's a Y142711K problem, by which point the Imperium of Mankind will hopefully have settled on a more robust format.

The tech-priests will have lost the ability to fix it.

That's how we ended up with the 2038 problem!

I expect Slaanesh and friends will manage to sabotage that somehow.

You can also instantly read them which makes troubleshooting easier. I mean sure, if your shit is too slow maybe switch to less text in release mode but YAGNI.

Well, assuming they're storing the strings as ASCII, that's 98 bits - the y2k38 problem is for 32 bit integers, so a 64 bit integer would be way, way more than needed for human needs for foreseeable generations.

Doesn't seem to me like Parler will have to worry about Y2K38...

A timestamp is a timestamp. It isn't a date. If you need a date, use a proper date/time data type.

All timestamps have to start somewhere. If you want to avoid DST changes and leap seconds, you can use MJD, TAI or GPS time instead of UTC, but you might as well format it nicely so that you can see roughly at what (civil) date something happened.

ISO 8601 is a good one.

This.

Plus, it's unambiguously human readable, for users, bystanders, platform developers, everyone. There's a useful usability principle in there.


Nice, that makes sense. I was unaware and found it strange when I plugged it into JavaScript's Date constructor and got an "Invalid Date" error.
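For what it's worth, a string like that needs an explicit format before most date parsers will take it. A rough sketch in Python, assuming the 14 digits mean YYYYMMDDhhmmss and that the value is in UTC (the thread doesn't say which timezone the API used); `parse_compact_timestamp` is a hypothetical helper name:

```python
from datetime import datetime, timezone

def parse_compact_timestamp(value: str) -> datetime:
    """Parse a 14-digit YYYYMMDDhhmmss string.

    Treating the result as UTC is an assumption, not something
    confirmed about the API discussed in this thread.
    """
    parsed = datetime.strptime(value, "%Y%m%d%H%M%S")
    return parsed.replace(tzinfo=timezone.utc)

created = parse_compact_timestamp("20210113053923")
print(created.isoformat())  # 2021-01-13T05:39:23+00:00

# The same instant as a UNIX timestamp, for comparison.
print(int(created.timestamp()))
```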

Of all the things to criticise Parler's tech folks over, using ISO8601 (minus the non-digit characters) shouldn't be one.

Is ISO8601 without punctuation still ISO8601? Most log parsers I have seen would not pick up the Parler format, e.g.:

https://docs.python.org/3/library/datetime.html#datetime.dat...

https://github.com/elastic/logstash/blob/v1.4.2/patterns/gro...


Yes... kind of. Per https://en.wikipedia.org/wiki/ISO_8601, there is a "basic format" without separators and an "extended format" that includes them for readability. However, a T is still required to separate the date and time in the most recent version of the standard.

ISO 8601 is pretty absurd when you actually read it. `2021-W02` and `--01-14` are valid, as is `--1013` (quick! guess what that means! and beware that `-1013` is valid too!)

Please, everyone, use a single format at all times in your systems. I don't really care what it is, though I'm fond of `2021-01-14T06:28:08Z` because it's unambiguous. But don't just say "use ISO 8601", it's far too vague and you'll inevitably have variations.
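One way to hold a codebase to a single format is to route every timestamp through one pair of functions. A minimal sketch, assuming the `2021-01-14T06:28:08Z` shape above is the chosen wire format; `WIRE_FORMAT`, `to_wire` and `from_wire` are names invented for this example:

```python
from datetime import datetime, timezone

# The one and only timestamp shape allowed on the wire (an assumption
# for this sketch): UTC, second precision, literal "Z" suffix.
WIRE_FORMAT = "%Y-%m-%dT%H:%M:%SZ"

def to_wire(moment: datetime) -> str:
    """Serialize an aware datetime as UTC in the single wire format."""
    if moment.tzinfo is None:
        raise ValueError("refusing to serialize a naive datetime")
    return moment.astimezone(timezone.utc).strftime(WIRE_FORMAT)

def from_wire(text: str) -> datetime:
    """Parse the wire format; anything else raises ValueError."""
    return datetime.strptime(text, WIRE_FORMAT).replace(tzinfo=timezone.utc)

stamp = datetime(2021, 1, 14, 6, 28, 8, tzinfo=timezone.utc)
print(to_wire(stamp))  # 2021-01-14T06:28:08Z
```

Because `from_wire` uses a fixed pattern, other valid ISO 8601 spellings such as `2021-W02` are rejected instead of being silently accepted.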


Without having read the spec...

* `2021-W02` means the second (ISO) week of 2021. Perfectly valid and used in a lot of planning.

* `--01-14` - I'm assuming this is a recurring date: every 14 Jan for every year

* `--1013` - at 1PM every 10th of the month? Guessing here

I believe ISO 8601 is an ISO codification of a DIN standard, and based on other standards processes I'm guessing some German manufacturing companies were the only ones who bothered showing up, so their internal software practices were encoded into the spec because no-one else cared.


That is such a common problem when standardizing, I've started to force my clients to have at least one person of each entity in project teams.

Often the biggest entity will end up accidentally forcing their practices, sometimes sub-optimal, to entire organizations, simply by having the manpower to show up to meetings.


Edit: `--1013` is 13 Oct in any year: https://en.wikipedia.org/wiki/ISO_8601#Truncated_representat...

(`--01-14` is Jan 14 in any year, the last dash is "optional").

The "duration" (`P`) and "repetition" (`R`) syntax is also pretty wild.
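To make the week and month-day forms above concrete, here is a small sketch. `date.fromisocalendar` is in the standard library from Python 3.8; `parse_month_day` is a hypothetical helper written for this example, not a library function.

```python
from datetime import date
from typing import Tuple

# 2021-W02: ISO week 2 of 2021, Monday (day 1) through Sunday (day 7).
week_start = date.fromisocalendar(2021, 2, 1)
week_end = date.fromisocalendar(2021, 2, 7)
print(week_start, week_end)  # 2021-01-11 2021-01-17

def parse_month_day(text: str) -> Tuple[int, int]:
    """Parse '--MMDD' or '--MM-DD' (month and day with the year omitted)."""
    if not text.startswith("--"):
        raise ValueError(f"not a year-less month-day: {text!r}")
    digits = text[2:].replace("-", "")
    if len(digits) != 4 or not digits.isdigit():
        raise ValueError(f"expected four digits after '--': {text!r}")
    month, day = int(digits[:2]), int(digits[2:])
    # Validate against a leap year so that 29 February is accepted.
    date(2000, month, day)
    return month, day

print(parse_month_day("--1013"))   # (10, 13)
print(parse_month_day("--01-14"))  # (1, 14)
```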


RFC 3339 is a profile of ISO 8601 that is much more limited but still provides the timestamp format everybody expects when you say “ISO 8601”:

https://tools.ietf.org/html/rfc3339


Indeed, what you really want to say is "use RFC3339" (https://www.ietf.org/rfc/rfc3339.txt)

IDK the issue OP saw with using ISO over UNIX timestamps, but one reason why you might want accuracy down to the second for dates is with providing accurate relative time/date across timezones.

I think the display strings thing is because exact number of impressions etc is slightly sensitive information. The whole site was "gamed" from the start, but providing exact vote counts makes it easier for other people to game. I guess. Don't really know, but I do believe that the numbers given by reddit, for example, are exact, but fake. Fuzzed a bit. HN also hides some of this, or behaves misleadingly, your downvotes don't always count, I think.

They would display numbers less than 1000 as-is, and only start adding the "k" and "m" suffixes after the 4-digit and 7-digit thresholds were crossed.

But how could they maintain an accurate count? Maybe they were just persisting the user-friendly format alongside the actual count...

The endpoint of the API is probably just rounding the accurate number and returning a friendly number... or it's all bullshit anyway.
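A guess at what such an endpoint might do, purely as a sketch. Whether the real service truncated or rounded is unknown, so truncation is an assumption here, and `friendly_count` and `count_range` are invented names.

```python
from typing import Tuple

def friendly_count(n: int) -> str:
    """Format a raw count the way the display strings looked ("2k", "5m")."""
    if n < 1_000:
        return str(n)
    if n < 1_000_000:
        return f"{n // 1_000}k"      # truncation is an assumption
    return f"{n // 1_000_000}m"

_MULTIPLIERS = {"k": 1_000, "m": 1_000_000}

def count_range(label: str) -> Tuple[int, int]:
    """Return the inclusive range of raw counts a display string could mean."""
    suffix = label[-1]
    if suffix in _MULTIPLIERS:
        unit = _MULTIPLIERS[suffix]
        low = int(label[:-1]) * unit
        return low, low + unit - 1
    exact = int(label)
    return exact, exact

print(friendly_count(999))        # 999
print(friendly_count(2_400))      # 2k
print(friendly_count(5_000_000))  # 5m
print(count_range("2k"))          # (2000, 2999)
```

The second function shows why a crawler cares: once only the display string is returned, the exact count cannot be recovered, only a range.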

If I remember correctly mongo stores the id in “_id” and has a getter for “id” so maybe they just iterated all the keys of the model when they stringified their output

Elasticsearch, too. In either case, it looks like they're just piping raw backend responses to the API endpoint without removing unnecessary fields.

Yep, that's an indication of Elasticsearch being used (and not transforming documents to a standard representation that strips such fields).
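The usual fix is a small mapping step between the datastore and the response. A minimal sketch, assuming a plain dict comes back from the backend; the field names in `PUBLIC_FIELDS` and in the sample document are hypothetical, not taken from the real API.

```python
from typing import Any, Dict

# Allowlist of fields the public API is meant to expose (hypothetical).
PUBLIC_FIELDS = ("id", "body", "created_at")

def to_public(doc: Dict[str, Any]) -> Dict[str, Any]:
    """Copy only allowlisted fields, so backend internals never leak."""
    return {key: doc[key] for key in PUBLIC_FIELDS if key in doc}

# A made-up backend document with a duplicated id and an internal field.
raw = {
    "_id": "abc123",
    "id": "abc123",
    "body": "hello",
    "created_at": "20210113053923",
    "_score": 1.0,
}

print(to_public(raw))
# {'id': 'abc123', 'body': 'hello', 'created_at': '20210113053923'}
```

An allowlist is used rather than a blocklist so that a field added to the backend later stays private until someone deliberately exposes it.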

It seems like they basically just exposed a lot of data directly, as apparently most of their APIs didn’t enforce any authentication or hide records that had been soft deleted.

Apparently the records were strictly sequential, which I don’t believe is true for Mongo which IIRC includes the node ID in part of it.
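On the Mongo point: an ObjectId is 12 bytes, and the first 4 are a big-endian count of seconds since the Unix epoch, so the IDs sort roughly by creation time but are not a simple incrementing integer. A sketch that decodes that prefix; the ID below is synthetic, built in the code for illustration, and `objectid_timestamp` is a made-up helper name.

```python
from datetime import datetime, timezone

def objectid_timestamp(hex_id: str) -> datetime:
    """Decode the creation time embedded in a 24-hex-character ObjectId."""
    raw = bytes.fromhex(hex_id)
    if len(raw) != 12:
        raise ValueError("an ObjectId is exactly 12 bytes")
    seconds = int.from_bytes(raw[:4], "big")
    return datetime.fromtimestamp(seconds, tz=timezone.utc)

# Synthetic ID: a 4-byte timestamp followed by 8 zero bytes standing in
# for the remaining (non-timestamp) portion of a real ObjectId.
example_id = (1610516363).to_bytes(4, "big").hex() + "00" * 8

print(example_id[:8])                                # 5ffe878b
print(objectid_timestamp(example_id).isoformat())    # 2021-01-13T05:39:23+00:00
```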


One big advantage of using string representations of dates is avoiding misunderstood timezone calculations that may or may not occur at various layers of the backend stack. The downside of course is storage space.

I think most JSON libraries encode dates in something that's closer to what Parler is doing than what you think is correct (e.g., using ISO 8601 or something).

I could see the argument for representing impressions as a string (especially if it's updated asynchronously and denormalized like that). The major downside is localization.

