Malicious attack on Wikipedia – what we know and what we’re doing (wikimediafoundation.org)
610 points by app4soft 12 days ago | 293 comments





Remember: there are BitTorrent links that the Wikimedia Foundation gives out of SQL dumps of Wikipedia and the other projects. You can have a copy in case this happens in your country: https://en.wikipedia.org/wiki/Wikipedia:Database_download#Wh...

Also, the Kiwix project has a hotspot project that allows you to host ZIM files (dumps of Wikipedia and other CC licensed content, like TED talks and StackOverflow) on a Raspberry Pi, allowing you to share it with others. Setup info here: https://www.kiwix.org/en/downloads/kiwix-hotspot/


There's also a read-only IPFS mirror of Wikipedia in English: https://ipfs.io/ipfs/QmXoypizjW3WknFiJnKLwHCnL72vedxjQkDDP1m...

I love ipfs. Can this actually be ddos’d as well?

Yes. It's based on a bittorrent-like protocol with a DHT, so you can get a list of nodes hosting a particular file; then you can DoS them.

ipfs.io is just a web-based way to access IPFS, called a gateway. There are a bunch of different gateways. In addition, you can run an IPFS node locally, and then as long as at least one node holds the content you're looking for, you're good. There are also browser extensions to rewrite gateway URIs to localhost URIs.
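The gateway-to-localhost rewriting those extensions do is essentially the following (a minimal sketch, assuming the common path-based gateway layout of `/ipfs/<CID>/...`; the function name and default port are illustrative):

```python
from urllib.parse import urlparse

def rewrite_to_local_gateway(url, local="http://127.0.0.1:8080"):
    """Rewrite a public IPFS gateway URL to a local gateway URL.

    Sketch only: assumes the path-based layout /ipfs/<CID>/... or
    /ipns/<name>/...; anything else (including query strings) is
    returned unchanged.
    """
    parsed = urlparse(url)
    if parsed.path.startswith(("/ipfs/", "/ipns/")):
        return local + parsed.path
    return url

print(rewrite_to_local_gateway("https://ipfs.io/ipfs/Qm123/wiki/"))
# http://127.0.0.1:8080/ipfs/Qm123/wiki/
```

With a local node running, the content is then fetched peer-to-peer rather than through the public gateway, which is the whole point of the fallback.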

Does this actually answer the question? If there's a node online that means I can reach the content, but would it help with DDOS? Not so sure.

A popular IPFS file might be available on thousands of nodes, similar to how popular torrents have thousands of seeders. A DDoS attack against thousands of servers across multiple countries and networks would be nearly impossible to perform.

It depends on the kind of node storing the data and how many there are. It's likely easier to DDoS 100 people on DSL than a single Wikimedia endpoint.

IPFS rarely stores 100% of the content on one node.

So there's no central point like a BitTorrent tracker that, if taken down, breaks the network?

Or is it like a DHT, which doesn't need a central tracker?


IPFS uses a DHT at a fairly fundamental level.

Caveat: the last full Kiwix English Wikipedia archive was made in 2018. They could use some help with automating their build process if anyone here has the time.

From a cursory glance at the site and source code, it's really hard to see who/what is involved with building an archive. There are automated builds set up for the Pi image itself.

The last time I checked, it was more a problem of lacking servers with sufficient resources: https://phabricator.wikimedia.org/T124960 https://phabricator.wikimedia.org/T219078

It certainly doesn't hurt if someone creates their own ZIM files and reports on their results (and/or shares the resulting files).


Agreed. I can see that other Wikipedia languages are crawled - https://wiki.kiwix.org/wiki/Content_in_all_languages shows dozens of updates this week - but the best leads I have involve poking around the openZIM Github org, https://github.com/openzim . There might be a running "zimfarm" somewhere?

You can build your own ZIMs from any MediaWiki instance using this tool: https://github.com/openzim/mwoffliner.
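A minimal invocation might look like the sketch below. The flag names are from memory of the mwoffliner README, so verify them against `mwoffliner --help`; the tool also needs Node.js and a local Redis instance:

```shell
# Sketch of building an English Wikipedia ZIM with mwoffliner.
# Flag names are illustrative; check `mwoffliner --help` first.
npm install -g mwoffliner

mwoffliner \
  --mwUrl="https://en.wikipedia.org/" \
  --adminEmail="you@example.com" \
  --outputDirectory="./zim/"
```

Be warned that a full English Wikipedia run takes substantial disk, bandwidth, and time, which is presumably why the zimfarm exists.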

Maybe it would be worth putting together another zimfarm that is constantly updating.


Looks like you're right; you may be able to join the farm to help (I haven't tested it, as I'm away from my computer at the moment):

https://github.com/openzim/zimfarm/blob/master/worker/README...


Someone should ask drone.io or packet.net for an Epyc machine to do this.

I'd actually love to see a fully working IPFS fallback for Wikipedia when regular hosting doesn't work. Would it even be possible with IPFS?

IPFS has a Wikipedia mirror but it is fairly out of date since it is dependent on the Kiwix archive.

https://github.com/ipfs/distributed-wikipedia-mirror



Assuming that all formats contain the exact same data, i.e. they were generated at the exact same time, which is the (1) most useful for offline viewing (2) most future proof for archival and backup? Is there another, more viable/useful format?

The XML dumps are the most compact and sustainable format in the mid term (let's say decades). https://meta.wikimedia.org/wiki/Mirroring_Wikimedia_project_...

ZIM might be able to survive longer (centuries?) as probably the future will still need some HTML parser, while wikitext parsers or PHP might be long dead, who knows.
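To illustrate why the XML dumps are considered sustainable: they can be streamed with any standard XML parser, no wikitext engine required. A minimal sketch in Python (the sample document and namespace version are illustrative; real dumps use a versioned namespace that changes between dump generations):

```python
import io
import xml.etree.ElementTree as ET

# Tiny stand-in for a pages-articles dump. Real dumps are multi-GB
# and use whatever export-N.NN namespace was current at dump time.
NS = "{http://www.mediawiki.org/xml/export-0.10/}"
SAMPLE = """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page>
    <title>Example</title>
    <revision><text>'''Example''' is a page.</text></revision>
  </page>
</mediawiki>"""

def iter_pages(fileobj):
    """Stream (title, wikitext) pairs without loading the whole dump."""
    for _, elem in ET.iterparse(fileobj, events=("end",)):
        if elem.tag == NS + "page":
            title = elem.findtext(NS + "title")
            text = elem.findtext(f"{NS}revision/{NS}text")
            yield title, text
            elem.clear()  # free memory; essential for multi-GB dumps

pages = list(iter_pages(io.StringIO(SAMPLE)))
print(pages)  # [('Example', "'''Example''' is a page.")]
```

The wikitext itself still needs a parser to render, which is the commenter's point about ZIM (pre-rendered HTML) possibly outliving wikitext tooling.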


Thank you for the answer. So for personal use I am better off hoarding the ZIM version, especially considering there is a dedicated Kiwix reader, while I am not aware of a similar tool for the XML dumps.

Only 85GB uncompressed for current revisions, excluding “user” pages. Not so bad!

Someone claimed the attack on Twitter with some details (DDoS), and proved it later by stopping the attack for x minutes, then restarting it at a specific time. https://twitter.com/fs0c131y/status/1170093562878472194?s=20 - the attacker also went on to DDoS the Twitch ingest servers (not twitch.tv itself), knocking some big streamers offline.

It looks like a volumetric attack from this tweet. Wikipedia needs to use Verisign BGP mitigation. They create GRE tunnels to your routers and are capable of handling 2Tbps. During an attack, you make a BGP announcement and the traffic goes via Verisign scrubbing/tunnels. No application changes are required, no Matthew Prince selectively and benevolently enforcing CF neutrality. It's used by large banks.
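In BGP-based mitigation, "turning on" the scrubbing amounts to announcing the prefix under attack toward the provider over the pre-established GRE session. A rough sketch in BIRD 1.x-style syntax (all ASNs, addresses, and prefixes here are invented for illustration):

```
# Hypothetical BIRD fragment: export only the /24 under attack to the
# scrubbing provider; cleaned traffic returns over the GRE tunnel.
protocol bgp scrubber {
    local as 64500;                      # your ASN (private range here)
    neighbor 192.0.2.1 as 64501;         # provider's tunnel endpoint
    export where net = 198.51.100.0/24;  # announce only during attacks
}
```

Withdrawing the announcement after the attack restores the normal traffic path; no application changes are involved, which is the appeal of the approach.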

After working with a few large corporations and their DDoS protection solutions, I did not have a good experience with Verisign; they were not able to handle attacks or get things working. However, I have had great experiences with Akamai and Cloudflare. I trust the people at Wikimedia will choose wisely. I have learned that Verisign has one of the worst BGP mitigation/scrubbing solutions out there; there are a few alternatives with more experience and much better uptime, including solutions from Cloudflare and Akamai.

Any serious mitigation solution must be BGP based, not proxy. Besides its technical merits and convenience, it also minimizes the risk of a benevolent controller (e.g. Matthew Prince of Cloudflare) ruining your company, because it becomes your upstream provider only during the attacks. Otherwise the GRE tunnels are not in use. The IP addresses are still yours always.

We used Verisign for mitigation of a 44Gbps volumetric attack and it worked very well. We also evaluated Neustar, but Verisign's infrastructure seemed to be more robust.


That's your requirement, but it might not be Wikipedia's requirement. Ownership of IPs is really a technical detail invisible to most people; ownership of eyeballs by way of the domain name and top Google result is probably more important. Cloudflare doesn't impact that ownership other than being able to temporarily take you offline if they choose to terminate your site.

Still, large proxy-based CDNs do have the ability to completely bypass all the same-origin protections in the browser. Even if they are angels and don't abuse this trust for identity theft and surveillance, it makes them a juicy target for bad actors, state sponsored and otherwise.


A proxy is a perfectly acceptable “serious” solution for this type of problem, as well as nearly all of the rest. Wikipedia is not the kind of website that would warrant being removed from Cloudflare. What’s wrong with having an upstream provider for caching close to the user and other features when you’re not under attack?

> What’s wrong with having an upstream provider for caching close to the user and other features when you’re not under attack?

The problem is that you are basically MITM'd all the time.


That’s not what MITM means. I get that you don’t like Cloudflare but voluntary use of a CDN isn’t a MITM any more than, say, Amazon is a MITM because you host on EC2.

Cloudflare is in between the client and the server, decrypting, rewriting and (if set up right) re-encrypting the request/response. It masquerades as the server by presenting a proper certificate for the domain even though it is not the entity that is actually controlling the domain.

That to me sounds very much like MITM, although not a MITM attack, since the entity controlling the domain opted into it; basically it is voluntary MITM.

Using a VPS like EC2 is a different story, since the decryption happens within a layer that you control. Of course you need to choose a vendor you trust for that layer, but on EC2 the traffic Amazon sees is encrypted with keys they don't have and decrypted with keys stored on a layer that I control. Amazon could read out the memory of my EC2 instance to get the keys, but their business depends on not doing so. So either I have a vendor that will always decrypt and read traffic (Cloudflare), or a vendor whose business depends on hypothetically being able to but not doing it. There is a clear difference to me.

That is the same for most CDNs (including CloudFront and all the other major offerings), so I'm not trying to single out Cloudflare.


If you don’t trust Cloudflare, don’t use them but there’s no meaningful security distinction between what they do and what AWS does: in both cases you have a vendor with the capability of violating your security and a promise that they won’t abuse that access.

This is why having a threat model is so important: it keeps you from wasting effort on things which sound like security but aren’t actually changing anything meaningful.


There is a security distinction, and this has been shown by, for example, Cloudbleed. Every step that has access to plaintext data is a potential attack vector and might be logging/leaking information.

There have also been times when Cloudflare (when set up improperly, as I mentioned in the previous comment) has misrepresented the security of a connection, as shown by https://www.theregister.co.uk/2016/07/14/cloudflare_investig...


The MITM can be avoided by using Signed Exchanges. https://developers.google.com/web/updates/2018/11/signed-exc...

That only works for static content, right?

No, they can be created on the fly. That basically makes it a TLS signing oracle.

Cloudflare’s business also depends on not messing with your traffic, right? It would certainly be easier for them to get your users’ content than for Amazon to do the same, but I think you still have to accept that risk with either. “Hypothetically being able to but not doing it” isn’t a whole lot of confidence if I were hosting some kind of shady website.

Sure, but since Cloudflare’s business is actively "messing" with all your traffic, all the time it's a smaller technical step to do it some more, and can also lead to accidents like cloudbleed. Every step that has access to unencrypted data is a potential attack vector or might be logging/leaking data.

You upload your private SSL key to Cloudflare for example. And I was talking about hosting on your own hardware/colos like most large sites do (7x cheaper than AWS list prices on avg)

Please specify in detail how you believe that’s an MITM using the standard industry definition. In particular, consider whether “attack” and “voluntary business agreement” are synonyms.

MITM is not an uncommon term to use for things like installing corporate SSL certs on laptops so you can monitor people's activities.

Breaking open encryption to monitor activity between users and other sites is a completely different thing than having a provider handle hosting for your site.

A better comparison would be Cloudfront and Application Load Balancers since you can expose your own ec2 server or load balancer and be e2e encrypted (unless AWS wanted to run commands on your instance, which they could do, but that's a different threat vector entirely).

That was the model I had in mind but it’s not really a meaningful distinction since the host could almost certainly compromise those servers as well. In any case, you’re trusting a third party rather than having their involvement maliciously imposed.

Akamai has a BGP based DDoS mitigation service via their prolexic acquisition.

[flagged]


The original content was posted on IG. 8ch took the reposts down when it became known that it was connected to the real shooting. Watch the video of the 8ch founder explaining (unless YouTube took it down too). Matt was preparing for the IPO.

You appear to be extremely mad that anyone questions the power of political pressure and an angry mob.

Look, you can feel however you like about whether the high-profile takedowns are right or wrong, whether the CEO's promises after the Daily Stormer are hypocritical — but let's be clear-eyed about placing a site in a position where one outside person can do it real harm. The question you should look at is whether the risk is actually acceptable for your organization.


How did 8chan "encourage" large gun massacres exactly? By allowing users to post content?

By largely not moderating content; it was no secret what the site was letting slide.

By your statement then reddit was complicit with the Russian trolls during election season because the bitcoin trolls who evolved into trump trolls were not punished in the slightest (I have a list of 300+ usernames that are still active today)

Reddit is actively moderated by both paid Admins (site wide rules) and volunteer Mods (per subreddit rules). So no, I disagree.

The chans are also actively moderated; they remove CP and did remove other content after events happened.

The point is that Reddit tries to moderate, which is good enough for their providers (AWS/Fastly).

The 8ch takedown wasn't actually due to issues with moderation, since (at least based on the owner's video) 8ch removed the post, actively responds to real law enforcement requests, and the original post was actually posted to IG. The issue was that CF was getting enough bad press, and more importantly enough calls/concerns from real Enterprise clients (this is speculation on my part), to take down the website.


Alternately: The fact that Prince was super okay with hosting those websites until the moment it made him look bad

That's a valid stance but they didn't host the website; they only provided DDOS protection for the actual host (which proceeded to drop 8ch once CF stopped providing the protection).

> It looks like a volumetric attack from this tweet. Wikipedia needs to use Verisign BGP mitigation. They create GRE tunnels to your routers and are capable of handling 2Tbps.

Great way for a state actor to intercept your traffic. A little bit of volumetric DoS and the target themselves responds by tunneling through your partner(s).


>no Matthew Prince selectively and benevolently enforcing CF neutrality.

What's the logic behind this? It's still a single point of failure and relying on a corporation. If the Daily Stormer or 8chan tried to use them, they would probably be kicked off as well.


If you are not a political undesirable, it does help, though. I think Wikipedia is fine in this regard; it's not something for a big corp to shun.

Wikipedia is blocked in China. It's politically undesirable for 1/8 of the human population...

I think undesirable here describes something like white nationalists. They have a problem getting web hosting.

Cloudflare has a strategic business partnership with Baidu [1]. They are very likely to cooperate with the Chinese government in implementing the Great Firewall.

Additionally, helping to block Wikipedia because China says so is much easier to excuse than blocking 4chan - they would just be complying with local regulations after all.

[1] https://www.cloudflare.com/press-releases/2015/cloudflare-an...


Because all of them don't want it?

Unlike a DDoS attack, this is not a technological problem.

There's always something "undesirable" for someone in a big crowdsourced website.

The Cloudflare 8chan action was based on a direct link with multiple actual mass shootings. Moreover, as they took the decision they went to great pains to explain this was an exceptional case.

Going from that to 'undesired political speech will be censored' requires more of a slippery cliff than a slippery slope.


>The cloudfare 8chan action was based on a direct link with multiple actual mass-shootings

What is this "direct link" you speak of? Did the shooters plan/recruit/organize their attacks on 8chan?


> What is this "direct link" you speak of? Did the shooters plan/recruit/organize their attacks on 8chan?

Legally, a "direct link" is irrelevant, you can rarely find a "direct link" between two of anything. What matters legally is whether 8chan was a "proximate cause" in creating the mass shootings. Whether one thing is the "proximate cause" of another is often pretty difficult to discern.

However, as a helpful guide towards determining proximate cause, lawyers ask whether one thing was the "but for" cause of another, i.e., would the mass shootings occur "but for" 8Chan? Put another way, if 8Chan did not exist, would these shootings occur?

Unfortunately, we do not have an alternative reality to play out events without 8Chan, so we cannot know for certain, but we can use evidence (e.g., 8Chan chats, how the shooter interacted with 8Chan and others on the service, etc) to try to simulate that alternative reality. All of this analysis also needs to consider related issues like freedom of speech on public forums and any commercial interests.

I'm not saying 8Chan is guilty or innocent, just that the existence (or lack thereof) of a "direct link" is pretty meaningless.


There are multiple instances of them announcing them and implying they are follow-ups of previous discussions on 8chan.

These include the Christchurch shootings, the Poway synagogue shooting, and the El Paso Walmart shooting.

The Christchurch shooter shared his Facebook stream to 8chan before the shooting started, and it was spread from there.

The Poway shooter blamed/thanked 8chan for his views.


So FB's internet peers should then de-peer Facebook in their routers, since the original material (the stream) was on FB? Or do you prefer your justice selective?

I'm sure you already realize this, but to make it clear: FB has enormous utility for billions of people outside that and that is worth defending.

You are expending a lot of effort defending 8chan here. Perhaps consider that it might not be worth defending.


8ch had a lot of very interesting and non-violent stuff. Have you been reading it regularly? I did.

I lived in a socialist country and you did not. Perhaps consider that you might not know where these current trends are pointing to.


[flagged]


You're not really engaging with his point. Effectively banning 8chan by removing network protection does not just restrict extremists; it restricts anyone who used that forum.

Ultimately, such matters should be prosecuted by courts. It is inappropriate for organisations like cloudflare to leverage their position within essential network infrastructure to start editorialising what passes through their network.


It is inappropriate for organisations like cloudflare to leverage their position within essential network infrastructure to start editorialising what passes through their network.

No, I think it's entirely appropriate.

"Don't troll" and methods for dealing with trolls has been a thing all sites have done since the internet was invented. I don't see any difference here at all.


the difference is their position in the stack.

Cloudflare blocking people that abuse the network is legitimate (e.g. spam, denial-of-service), just like it is legitimate for forum admins to block people that abuse the forum (trolling, explicit posts).

But cloudflare, or any other network infrastructure provider, shouldn't be determining permissible content for websites because they are not hosts/administrators for that content.

It is like a postal service reading your letters and then saying "we don't like what is being said, so you can't send letters anymore." They can and should stop people sending dangerous materials by post, but they should not be determining permissible content of letters.


See, I think 8-chan itself is a troll, and it is entirely reasonable to deal with it by refusing to provide service.

It is like a postal service reading your letters and then saying "we don't like what is being said, so you can't send letters anymore." They can and should stop people sending dangerous materials by post, but they should not be determining permissible content of letters.

No it's not. It's like FedEx declining to deliver for a company which continues to cause it problems, or refusing to service Amazon[1]. Or like Visa refusing to service businesses which have lots of charge-backs.

[1] https://www.nytimes.com/2019/06/07/business/fedex-amazon-exp...


Actually, it is illegal to mail obscene materials or crime inciting matter through the postal service.

https://www.law.cornell.edu/uscode/text/18/1461


yes, but these are investigated and prosecuted by police, public prosecution services, and courts; not by couriers discontinuing their services.

if 8chan was cut off because they were subject to extensive network attacks and cloudflare did not see any profit or value in serving them then I am ok with that. I just don't think that's the reason.

I expect that a different site with the same contract and payment terms, subject to the same attacks would have continued to be protected. maybe I'm wrong but it looked like a political decision, not a business decision.


The direct link is that they announced these attacks there.

Beyond that, given the announcement there, it stands to reason they were convinced to do it there.


I think it’s somewhat misleading to refer to those who support genocide and child abuse as simply “political undesirables.”

It's not just supporting. Taking a neutral stance on censoring these things, or not being adequately proactive on hate speech, is now seen as condoning. You either censor your user base, or upstream will censor you. Gone are the days of "The net interprets censorship as damage and routes around it." The new policy is "The net interprets wrongthink as noise and filters it out."

It’s not censorship: they are not suppressing information, they just aren’t allowing their resources to be used to spread it.

It would be “censorship” if they actively antagonized any attempt to spread the information, such as by lawsuit or DMCA notice. They are just refusing to participate.

And given that the “information” is definitively known to be child pornography and violent white supremacy propaganda presented as news, I would personally say refusing to participate is the only responsible action.


> Gone are the days of "The net interprets censorship as damage and routes around it."

But it's clear that it matters just what's being censored. Surely you wouldn't say the same trite clever-sounding hackerspeak if we're talking about censorship of threats, assault and child pornography, would you?


They are beyond a certain line; some very-very far past it, some just crossed it. It makes them unsupportable by any corporation that aims to look decent.

Genocide has been and still is a political tool. It is extreme, but ultimately something that people consider and carry out as part of political processes, not a special category of its own. And realpolitik is to continue dealing with countries that practice genocide. Consider Burma or China.

Cloudflare simply has the luxury of choosing which politically disagreeable parties they do not want to associate with because they are insignificant customers.

Pretending that this is not due to differences in politics and moral judgment is semantic smoke and mirrors.

Anyway, the point is that they are not a neutral carrier/provider, unlike banks or telecoms, which are required by regulation to accept any legal business. CF styles itself as neutral infrastructure, until it decides it isn't.

The risk of getting deplatformed due to someone's moral judgment is quite real, even for an entity such as Wikipedia. For example they were blocked in the UK because the Virgin Killer album cover landed it on a block list used by major ISP.


I didn’t say it wasn’t political, but it’s not just undesirable for immediate political reasons — it’s undesirable for nearly universally-agreed moral and ethical reasons. So implying it’s only inconvenient for politics is, in my opinion, misleading.

The political tends to encompass or at least subsume the moral and ethical aspects, as I tried to allude to with the realpolitik aspect.

But again, this is just a tangent. The core argument is that it is best not to rely on providers that have the freedom to make political/moral decisions who they deal with because that freedom makes them susceptible to moral denial of service attacks. You are one moral outrage away from being deplatformed.


I can see that mentality, but what I’m saying is that, personally, if I become a Nazi, I think I should be deplatformed.

Then who decides what is a Nazi? Deplatforming someone for their speech makes them one in my book. How far down do we go?

> no Matthew Prince selectively and benevolently enforcing CF neutrality.

Is this a slippery slope argument?

Because there is a world in difference from discontinuing a few extremists customers, to discontinuing service for something akin to Wikipedia.

I'm not sure every slight compromise of principles is a slippery slope. It seems to me that CF generally aims at being neutral.


The argument made here is that there is a chance (however minute) that the same can happen to something like Wikipedia because of some misplaced sense of morality, like, say, "we don't agree with Wikipedia's edits and editing process, which we see as offending certain sections of X population." It does not matter how right their reason is. The fact that providers like Cloudflare are in a position to take such a moral high stance is not right.

I don't disagree, but has it ever been any different?

Anyone that suggests that there is One True Solution TM is either biased or ignorant.

You also don't get to claim it supports 2Tbps if you've only weathered 44Gbps.


This recent CF product announcement might be the same thing (not sure, sounds similar): https://blog.cloudflare.com/magic-transit/

There's plenty of specialized providers which provide this service, Verisign is one of many.

The issue with on-demand BGP mitigation is that an attacker can do short attacks on and off over a long period of time. Each time the mitigation kicks in, BGP propagation takes at least ~1 minute and will cause some downtime. Proper protection is always-on without requiring redirection.


What's with the username? Are you trying to equate dang with Deng Xiaoping?

The attacker also attacked Blizzard's game servers and is actively taking down WoW Classic and Overwatch.

https://www.reddit.com/r/classicwow/comments/d10x4f/servers_...

The attacker was posting updates to Twitter, but their account has since been suspended.


Did they say anywhere what their motive was?

They are advertising their services.

Power tripping most likely.

It's not cheap; advertising their services seems more likely to me.

Do these kinds of attacks usually have a motive?

I am pretty certain that when China used the Great Firewall to attack github, this was a test of their capabilities.

Source?


What a world we live in... Government sponsored DDoS.

What a world to live in if you are among the 1.x million in some northern part of China, the 1.x billion living inside the Great Firewall, the 7M in Hong Kong, or the 2x million in Taiwan.

You live in a world where a totalitarian communist state is welcomed, controls a significant portion of the world economy, and even speaks at internet summits.

Welcome to the brave new internet and the international world of China.


They can be used by blackhats selling e.g. DDoS botnets to prove the "quality of the merchandise".

I definitely get the feeling that’s what they’re going for. They mentioned they’re just testing out a new botnet made from IoT devices.

Ah, Mirai Mark N. So what, light bulbs? Cameras?

Where did you read this? I am curious...

From the attacker's twitter account, just read through his/her twitter replies (though it sounds like it's actually a group of them).

And so the IoT wars begin...

Who are they?

Some clown named ukdrillas. That's about all anyone knows.

Just want to mention, WMF has a very small but elite team of engineers. Amazed they maintain an Alexa top 5 site with many orders of magnitude less engineering staff than Facebook or Reddit. I think they must count ~100 engineers?

I can't imagine what such a small team must be going through with a major DDOS - wish them well in their efforts!


It's because they're just serving a big site, not running the world's most sophisticated surveillance and ad serving machine. Serving giant websites isn't all that hard if you're just spewing out SQL queries into html templates. It all scales in all directions with a properly thought through architecture.

> Serving giant websites isn't all that hard if you're just spewing out SQL queries into html templates. It all scales in all directions with a properly thought through architecture.

No.

1. Your comment makes it sound like Wikipedia is just, or mostly, serving read-only content, which is far from true. Yes, static read-only content is significantly easier to serve than dynamic, editable content, but Wikipedia is the latter.

2. Claiming that building something at this scale "isn't all that hard" just makes me think you've never done anything similar. It reminds me of devs saying they could re-build MS Office over a weekend. It's just ignorant of the software's actual complexity.

I'm not associated with Wikimedia in any way, but have worked on large-scale software projects before, and things are quite different from, say, websites only serving 100k monthly active users.


I have, actually, worked on very large and interactive websites at the very core. Notably: betfair.com which has a very busy API and website and used to be something like a 1:10 write:read ratio with multiple clusters and layers of fancy caching to keep it all coherent down to millisecond scales.

Wikipedia does not need to be globally consistent like Betfair does and the ratio of writes to reads is nothing like 10%, I'd guess at one write per million reads or less. There are several pretty obvious ways to architect a site like Wikipedia for effectively unlimited scalability. The main trick is that it doesn't matter if a page is slightly stale and you can queue edits in the backend for quite some time (many seconds) without severely harming end users. Given those constraints it really isn't rocket science given the plethora of amazing tools we have to hand.
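The "stale reads are fine, queue the edits" pattern described above can be sketched like this (an illustrative toy, not Wikipedia's or Betfair's actual architecture; all class and method names are invented):

```python
import queue
import threading

class StalePageCache:
    """Serve possibly-stale rendered pages; apply edits asynchronously.

    Reads hit a render cache that tolerates staleness; edits go into
    a queue drained by a background worker, which invalidates the
    cached rendering so the next read re-renders lazily.
    """

    def __init__(self):
        self._store = {}          # title -> wikitext (the "database")
        self._rendered = {}       # title -> rendered page (the cache)
        self._edits = queue.Queue()
        threading.Thread(target=self._drain_edits, daemon=True).start()

    def read(self, title):
        # Serving a slightly stale page is acceptable here, so reads
        # never wait on pending edits.
        if title not in self._rendered:
            self._rendered[title] = self._render(title)
        return self._rendered[title]

    def edit(self, title, wikitext):
        self._edits.put((title, wikitext))  # returns immediately

    def _drain_edits(self):
        while True:
            title, wikitext = self._edits.get()
            self._store[title] = wikitext
            self._rendered.pop(title, None)  # invalidate; re-render lazily
            self._edits.task_done()

    def _render(self, title):
        return f"<html>{self._store.get(title, '')}</html>"

cache = StalePageCache()
cache.edit("Example", "'''Example''' page")
cache._edits.join()           # wait for the worker (for demonstration)
print(cache.read("Example"))  # <html>'''Example''' page</html>
```

At a write:read ratio of roughly one in a million, the queue stays tiny and the cache absorbs essentially all traffic, which is why this shape scales so well for an encyclopedia.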

What I'm NOT saying is that I could build it in a weekend. It would clearly require a few teams of skilled engineers to put it all together and, crucially, operate it. My initial comment was in the context of Wikipedia having 100 engineers, and I think it's reasonable to say that a team that size is easily capable of such a feat.


Hi, I am interested in this project. Could you please provide some minor detail about the architecture, like what framework was used for serving that many requests?

Oracle and Java and a whole lotta optimisations.

I've never heard anyone in my life say they could rebuild MS office in a weekend.

What, in your opinion, would be the work needed to go from a 100k monthly active user site to a Wikipedia-scale site that would be comparable to rebuilding MS Office?


I've never heard anyone in my life say they could rebuild MS office in a weekend.

The saying usually uses Facebook or Twitter.


That seems indeed comparable given the scope and functionality of Wikimedia products.

The core parts of Office could be done on a weekend, but in order to get the same complexity and incompatibility it would take several "codemonkeys" several years to achieve.

Silly Microsoft wasted hundreds of people and decades of time. Why didn’t you tell them?

Software projects are usually 90% done in 1% of the total time taken. And if you just solve the problem with duct tape, e.g. a shell script piping stdin to a file, or contenteditable=true in HTML, you have a very basic word program; take that route and you could probably have the essential features done over a weekend. But going from that to a full Office clone would take years. The real challenge in development, though, is to solve real problems, i.e. not to build solutions looking for a problem, and to implement only features that solve real problems.

You're countering a point they didn't make. SQL-to-HTML templates indicate a dynamic site, not a static one. From there, they describe surveillance and ad networks that both increase the browser workload and make it rely on third-party dependencies.

I thought it was a good, but snarky, point. Especially given my browsing sped up after I installed extensions that turn all that crap off.


Please be careful of logical tautologies:

"It all scales in all directions with a properly thought through architecture" sounds dangerously like, "Programming isn't that hard if you just do it right."


> Programming isn't that hard if you just do it right.

That's not a tautology. In fact, it's actually worth pointing out, especially to junior engineers who get frustrated by how hard everything is, that it actually doesn't need to be that hard if you, well, do it right. Obviously that's not productive feedback without actually helping them be better, but it's far from a tautology.

For anyone wondering, a tautology is a statement that is logically true by construction, rather than contingently true because of the way the world is. For example, "Programming isn't that hard if it's easy" would be a tautology. Constructing a counterexample by changing programming to something else shows that this was not a tautology to begin with: "Sending a man to the moon isn't that hard if you just do it right," which is obviously false, because even if you do it right that's objectively difficult.

Programming is hard, but we make it much harder than it has to be by doing it spectacularly wrong in many ways, both individually and collectively.


I think that you are speaking of a logical tautology, while I am speaking of a linguistic tautology. If I am correct about this, we are both right.

A logical tautology is, "A statement that is true by necessity or by virtue of its logical form."

A linguistic tautology is, "A phrase or expression in which the same thing is said twice in different words."

In formal debating, for example, you can call someone out for either type of tautology.


Because we express logical statements in English, these two kinds of tautologies overlap. (If we were using a formal language we could express tautologies like !(A && !A) without using English.)

If you say "All bachelors are unmarried", this is true both because of the meaning of the words, and because of the logical structure implied by the words.

In either case, if you state a tautology, you state something which is true in all possible worlds, given the definitions of the words, at least. Someone can then call you out for stating a tautology, which is to state something that is vacuously true, that is, you've made a statement about the nature of reasoning itself, in any possible world, but you haven't said anything at all about the world we're actually in. So you're wasting your breath even though what you say is unassailably true.

(Note that in mathematics, the tautologies are precisely the theorems with their premises or axioms! So this is by no means always useless.)

The problem with calling out tautologies in common life, however, is that the danger of identifying a false tautology is very high. When someone says "Either X happened, or it didn't." you may be tempted to say "tautology!" but in fact they are probably making some kind of oblique point or highlighting a flaw in someone else's argument, etc. In other words, tautologies may be vacuous as statements about the world within a logical framework, but as speech acts in the real world, they always come with a motivation and that can usually be expressed in a non-tautologous way. For example, "A or not A" can be expanded charitably to "A or not A, and this is relevant to the topic at hand", which is not a tautology anymore.

In this case, if you do something right, it's not extra hard, which is kind of a tautology. But there's a point in saying it, which is that it doesn't have to be that hard... if you do it right. And that's not true of everything or in every possible world, hence not a vacuous statement.


> That's not a tautology. In fact, it's actually worth pointing out, especially to junior engineers who get frustrated by how hard everything is, that it actually doesn't need to be that hard if you, well, do it right

But this boils down to "If you build systems using a high level of skill and foresight, it's easy to do."

This is of course not a tautology, but a contradiction. I agree that inexperienced developers can, as it were, 'make life hard for themselves', but that's (trivially) due to their inexperience. I don't think there's a silver bullet for inexperience.

Over-engineering is bad, as is under-engineering. Fuzzy principles like 'YAGNI' can't be applied without skilled discernment, which means experience.

> Programming is hard, but we make it much harder than it has to be by doing it spectacularly wrong in many ways, both individually and collectively.

I think I agree with this, but it depends on specifics. What sorts of things are you thinking of?


> If you build systems using a high level of skill and foresight, it's easy to do.

The point being made here is not necessarily a flippant 'git gud'. Instead, it is a statement that problems are tractable, and that getting some things right up-front can have good pay-offs down the road.

In other words, don't give up and try to figure out what is good and bad practice.


Yes, "don't give up" is the main thrust. However, if an individual exists in an environment where bad practice is rewarded and good practice is scorned, the advice needs to go beyond individual practice. We need to not give up on the environment, and that in turn requires hope that a better environment is possible and within our reach, as an individual, a team, and a discipline/craft/practice. This is a hard problem.

> But this boils down to If you build systems using a high level of skill and foresight, it's easy to do.

Yes.

> This is of course not a tautology, but a contradiction.

It's not quite a contradiction! If you bill $1 for changing the bolt, and $9,999 for knowing which bolt to replace, this shows that the work is easy, but the experience required to make the work easy is not easy. If the master can draw it in seven strokes, but you don't see the seventy thousand strokes they did before, it looks easy, and in fact it is easy, for the master but not for the novice.

> I agree that inexperienced developers can, as it were, 'make life hard for themselves', but that's (trivially) due to their inexperience. I don't think there's a silver bullet for inexperience.

That's right. However, we can also make life hard for each other, and there are some solutions for that that are better than doing nothing.

> Over-engineering is bad, as is under-engineering. Fuzzy principles like 'YAGNI' can't be applied without skilled discernment, which means experience.

Yes. This is why we have code reviews, design reviews, pair programming, and so on, but these aren't silver bullets either and there is no silver bullet, but if these things lead to increased awareness of why and not just what and how, then we can accelerate the process of acquiring that discernment. As Dijkstra said, if it goes to the grave with you and you didn't pass it on, you didn't really do your job as a senior engineer (paraphrasing).

>> Programming is hard, but we make it much harder than it has to be by doing it spectacularly wrong in many ways, both individually and collectively.

> I think I agree with this, but it depends on specifics. What sorts of things are you thinking of?

Using the wrong tool for the job. Using too many tools for the job. Using tools that do not afford mastery, because they are too complex for anything built on top of them to be comprehensible.

This is all quite abstract. A specific example: we (the JavaScript community) had a good thing in that JS was a small, human-scale, useful and commercially valuable language with applications beyond its initial environment on the web. We got excited and built the npm package archive, and filled it up, and now we have unmaintainable, incomprehensible piles of trash piled upon trash that can't possibly be used as a foundation for anything reliable, performant, or maintainable. This is unfortunate. What's even more unfortunate is that this piled-up-trash approach still has momentum, still lets people get useful work done, and still has some value to the community. So we keep using it, and even keep piling more on. It takes considerable effort to step back from all this, take a collective mulligan, and start over with a principle of taking things away to make things better, rather than adding more hacks to hide existing hacks.

This is one example of many, but I mention this one because I was there when JS was simpler and better and I watched as we made it markedly worse. I would have recommended JS as a first language to beginners when node.js and npm were new, and I did, but I cannot now recommend them in good faith, because they have become antagonistic to quality and to mastery of the craft.


> If you bill $1 for changing the bolt, and $9,999 for knowing which bolt to replace, this shows that the work is easy, but the experience required to make the work easy is not easy. If the master can draw it in seven strokes, but you don't see the seventy thousand strokes they did before, it looks easy, and in fact it is easy, for the master but not for the novice.

If it takes years to be able to do it well, it's not easy.

> So we keep using it, and even keep piling more on. It takes considerable effort to step back from all this, take a collective mulligan, and start over with a principle of taking things away to make things better, rather than adding more hacks to hide existing hacks.

True, but it can be done. The community moved away from Bower, for instance.


I appreciate what you're saying, but I don't think it quite applies. What I meant was that it's easy to create an architecture for an application that doesn't scale well at all, e.g. poorly sharded data, lots of cross-dependencies, etc. However, if you properly think through your data model, data flows, and use cases, it's generally possible to create a system that is extremely scalable in all directions. This is certainly not easy, but it's a hell of a lot easier than creating some huge AI-driven data-slurping ad empire.

I totally agree that you make an excellent point about the relative ease/difficulty of various approaches.

>>"Programming isn't that hard if you just do it right."

Is this like saying, programming isn't hard if you choose easy enough problems to solve? Or should we ask for a link to see a demo of an AGI implementation?

I guess math is not hard either if you're "doing is right", as long as it's all arithmetic...

>>That's not a tautology.

I would agree tautology is not the best description, probably fallacy would do fine.


> Is this like saying, programming isn't hard if you choose easy enough problems to solve?

No, this is saying that things don't have to be as hard as we make them. You don't need more than a hundred people to run a top-ten website, and that shouldn't be surprising. It is surprising only because we are so good at making things overcomplicated.


But also, perhaps it's because they didn't allow a team to endlessly iterate on tech minutiae until it required many teams just to keep it running.

> I think they must count ~100 engineers?

https://wikimediafoundation.org/role/staff-contractors/ has the names of 379 employees. I believe (perhaps astonishingly) that is all - engineers and non-engineers combined. Their engineers spread across departments, but judging by the 141 instances of the string 'engineer' in that page, I'd be surprised if the number exceeds 200.


Speaking as someone listed on that page, and having attended all-hands meetings, yeah, we're not huge.

Though it is worth bearing in mind that everything's open source, and there's a hefty community component. So there's a more vaguely specified number of people who might provide patches, and individual wikis are mainly run by volunteers.


Just dropping by to say thank you. There aren't many worldwide charities for human knowledge. Add oil.

That’s what happens I guess when you’re running a charity, you can recruit top talent (I assume many 10x folks wouldn’t mind working for wikimedia!) and every dollar counts. Pretty incredible.

They seem to be at the leading edge of hiring remotely and they don't pay anywhere near facebook salaries. The culture must be attracting some strong developers.

And they're hiring! https://wikimediafoundation.org/about/jobs/#section-8

I worked there for four years and I miss it every day.


> I worked there for four years and I miss it every day.

Sorry but now I'm curious, why did you leave?


Wikipedia has a huge impact in people's lives, particularly in non-English languages, and there's so much work to do, and so much of it feels urgent and necessary. I really responded to that, and I wasn't careful, and burnt myself out. (This was not the fault of the org; Wikimedia is largely a do-ocracy, and if you're intent on working through the small hours of the night, there is very little anyone can do to stop you. Co-workers who saw what I was doing did urge me to pace myself and exercise self-care.) By the time I realized what I was doing, I was in a pretty bad way, and felt like I needed a complete change of scenery to get back on my feet.

Interesting! I see that happen to people in the non-tech non-profit world a lot. Well, thank you for your service, and hope you are starting to enjoy a rest well deserved!

What kind of role were you in at Wikimedia if you don't mind answering?

He was all over the place! It's all public under the same username so I hope he forgives me spoilering it: https://www.mediawiki.org/?search=Ori+livneh, https://gerrit.wikimedia.org/r/#/q/owner:Ori.livneh .

Not to diminish Wikipedia engineers talent, of course...

But, I'd consider Wikipedia traffic to skew heavily towards anonymous read-only, with very little logged-in write traffic.

This allows for tons of caching opportunities: Varnish, Memcache, etc. And these techniques are well known.
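The cache-aside pattern those tools enable is conceptually simple. Here's a toy Python sketch with hypothetical names, standing in for what Varnish/Memcache do at scale:

```python
import time

CACHE = {}   # stand-in for memcached: key -> (value, expires_at)
TTL = 300    # five minutes of acceptable staleness for anonymous readers

def render_article(title):
    # stand-in for the expensive part: hit the DB, parse wikitext, expand templates
    return "<html>%s</html>" % title

def get_page(title):
    entry = CACHE.get(title)
    if entry and entry[1] > time.time():
        return entry[0]                      # hot path: no database touched at all
    html = render_article(title)
    CACHE[title] = (html, time.time() + TTL)
    return html

def purge(title):
    # on edit, explicitly invalidate instead of waiting for the TTL to expire
    CACHE.pop(title, None)
```

With a read-heavy workload, nearly every anonymous request takes the hot path, so the expensive rendering work happens only once per TTL window (or once per edit, via the purge).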


The proportion of read/write may skew towards reads, but Wikipedia still is an application where any user can create state visible to all other users. It's not as simple as this comment makes it out to be.

But how quickly must those writes be reflected in the reads of others? If you can accept a few minutes of latency there, I imagine things would get easier

In order for Wikipedia's anyone-can-edit model to work, it's really important that when someone makes a bad edit to a popular article it can be removed immediately. This is important both to get things fixed quickly and to make it less of a juicy target, so fewer people vandalize (it's no fun to vandalize if it doesn't stay up).

I suspect latency in the minutes for cache updates would be unacceptable to Wikipedia users


Power users use very different workflows than read-only users. You can serve pages from a 30-minute-old cache to the 99.9% of passive readers and it doesn't hurt that much. Editors use "Recent Changes" to monitor edits, and that's much easier to render in real time because the audience is comparatively minuscule.

Yes, but if someone replaces the picture on the Trump article with goatse, and non-power users get that version for 30 minutes until the cache clears, they are going to be pretty pissed and start yelling at power users, generally causing a PR disaster.

Additionally, if vandals know their vandalism will stay up for 30 minutes, they are much more likely to do it, which is a vicious cycle


Aren't articles like Trump's write-protected? You shouldn't be able to edit them unless you have an account that isn't brand new. And you will be banned very quickly as soon as you start putting goatse on the most-visited pages.

Also, the White House PR team is actively watching and editing political figures' articles. They will sort it out too.


First off, keep in mind that even scaling a broadcast publication can be complex. Sure, one can bolt on Fastly or S3, but cache invalidation is never a simple problem.

Next "power users" as others put it are not a single set of editors. It's more of a social network with multiple levels of trust. The idea of a wiki is that all users have write access, even if those changes are moderated to have different levels of latency.

Of course there are ways to engineer the system, but at that point one is, well, engineering a system. And WMF is doing so on a shoestring compared to other sites with comparable levels of traffic.

Is WMF creating new paradigms of computing? Probably not. But they are doing a good job, IMHO.


It must be immediate, because Wikimedia detects edit conflicts (when someone updates the article you are in the middle of editing)
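For the curious, edit-conflict detection is typically optimistic concurrency: the edit form remembers which revision it was loaded from, and the save is rejected if the article has moved on since. A toy sketch with made-up names, not MediaWiki's actual code:

```python
class EditConflict(Exception):
    pass

class Article:
    """Toy optimistic-concurrency sketch of edit-conflict detection."""

    def __init__(self, text=""):
        self.text = text
        self.revision = 0

    def begin_edit(self):
        # the edit form remembers which revision the editor loaded
        return self.revision

    def save(self, base_revision, new_text):
        if base_revision != self.revision:
            # someone else saved since this editor loaded the page
            raise EditConflict("article changed since revision %d" % base_revision)
        self.text = new_text
        self.revision += 1
```

Note that only the write path needs this immediacy; anonymous reads can still come from a cache without breaking conflict detection.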

It must be immediate for logged-in users, but not others. It's an excellent thing that Wikipedia doesn't nudge people to log in all the time, and I suspect 95% of users are not logged in.

That doesn’t mean all reads have to be immediate, only some.

If you consider the amount of money they are burning in comparison to 5 years ago, are the results really that impressive? See https://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/2...

I found this updated version: https://en.wikipedia.org/wiki/User:Guy_Macon/Wikipedia_has_C...

Their expenses have doubled in less than 5 years...

Even if their expenses-to-assets ratio has decreased compared to 3-5 years ago (though it has now stalled), their goal of financial independence is still very far away and they still rely heavily on a huge amount of donations.


I buy that yes it takes time to be financially independent of donations while offering a global information service for free.

But that essay is clearly pure hyperbole. The expenses aren’t exponential, they’ve been roughly linear for a decade. Notice how the word exponential was removed in the second version. The graph is showing increasing savings along with increasing growth, and the expenses appear to have slowed slightly in the last five years compared to the five prior years. It’s completely failing to demonstrate the stated claim of runaway spending, the numbers practically prove the opposite.

Plus it’s not outlining what the money is used for, so there’s no concept of efficiency here, no reason to doubt that increased service came with increased expenses. There’s zero meat in this argument.

Whatever; last year's total expenses seem very small to me compared to web sites of similar size; there are startups smaller than Wikipedia's team that have raised more money than Wikipedia's yearly expenses without managing to deliver anything. Wikipedia's value to the world is currently larger than its expenses, IMO, and I think it's impressive what this non-profit has done.


It’s sorta interesting how the replies to this digressed into linguistics about systems architecture, and nobody called out the “elite engineers” statement.

I’ve worked with a couple of engineers who are now on Wikipedia’s SRE team. They’re good engineers, but not elite by any means. Not “10x” developers or wizards in castles or whatever. Good solid engineers who I would work with again and fight to hire. But they’re not savants or even the top 10% of folks I’ve worked with. Solid mid to sr level engineers I’d be happy to hand a project off to with ambiguous goals and little oversight, and I’d expect them to get a team of 4 or so other engineers to be more productive.

These are the engineers who meet the job requirements for SRE positions.


I used to work at FB and now work at Reddit. The engineering staff count at Reddit is within the same order of magnitude as the number you cite above. :)

Yeah, but unlike on Reddit, I never see "something went wrong" on Wikipedia.

No offense to you nor your team, but to me, as a consumer, Reddit's product doesn't appear anywhere near as polished as Wikimedia's projects.


No offense taken, I don’t work on the product side of things there.

Also, with the caveat that I don't know enough about the implementation details of the product at Reddit: I'd argue that Reddit's workload is more write-heavy than Wikipedia's workload, which makes caching and scaling a bit harder for Reddit, relatively speaking.


That could be true, but do you have some numbers? Wikimedia wikis are on the order of a few hundred edits per minute, sometimes a thousand or more. https://tools.wmflabs.org/wmcounter/ https://wikipulse.herokuapp.com/

To add the Reddit figures to the discussion. There are something like ~2,100-2,200 comments posted to Reddit per minute on average across a year at this point (around or slightly over three million comments per day). That's not the peak minute figure of course, which is no doubt several times higher.

So it's actually comparable, I'd say, in terms of frequency. In terms of how much "writing" this actually means, it varies: an edit to a large article, even if it's just a comma, requires several seconds of parsing the wikitext, a lot of events propagated in various places, etc.

OTOH, shouldn't it be easier to scale Reddit horizontally by sharding the subreddits into separate DBs?

Certainly not trivially - the subreddits are not wholly independent entities. User accounts are shared between them, for instance, and every post a user makes is (searchably) linked to their account, regardless of which subreddit they posted it in. Users can also send each other private messages, and this does not take place in the context of any particular subreddit.

Not to mention that a huge portion of Reddit traffic goes to aggregation pages like users' frontpages, /r/all, and to a lesser extent multireddits.
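A toy sketch of why that's painful: posts shard cleanly by subreddit, but any per-user view has to scatter-gather across every shard. Hypothetical names, purely illustrative:

```python
class ShardedPosts:
    """Toy sketch: shard posts by subreddit; per-user views must scatter-gather."""

    def __init__(self, n_shards=4):
        self.shards = [[] for _ in range(n_shards)]

    def _shard_for(self, subreddit):
        return self.shards[hash(subreddit) % len(self.shards)]

    def post(self, subreddit, author, title):
        self._shard_for(subreddit).append((subreddit, author, title))

    def subreddit_page(self, subreddit):
        # cheap: one shard holds everything for this subreddit
        return [p for p in self._shard_for(subreddit) if p[0] == subreddit]

    def user_page(self, author):
        # expensive: a user's posts span every shard, so query them all
        return [p for shard in self.shards for p in shard if p[1] == author]
```

Aggregations like /r/all or a frontpage have the same scatter-gather shape as `user_page`, which is why sharding by subreddit alone doesn't make Reddit trivially horizontal.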

Wasn't Instagram famous for having a very small team of engineers responsible for the availability of the entire platform before Facebook acquired them?

Not sure how big IG was at acquisition, but I always remember that story being about WhatsApp, with only 35 engineers

Business Insider ran a story about them at the point of acquisition.

April 9, 2012 "Instagram was acquired by Facebook today for $1 billion in cash and stock. It only has 13 employees and a handful of investors. ... Meet 11 of the lucky employees and 9 investors behind Instagram. ... Two other employees were hired during South by Southwest last month and their information wasn't available for this story. "

https://www.businessinsider.com/instagram-employees-and-inve...


> I can't imagine what such a small team must be going through with a major DDOS - wish them well in their efforts!

Not only that. They do all this with amazing openness. Their records of incidents and deployments, who's in charge of what, and rotation schedules are all public, shared on MediaWiki (although they're not that well organized). I can trace this back to circa 2005. This may be the largest public knowledge base of devops anywhere.

cf. https://wikitech.wikimedia.org/wiki/Category:Incident_docume... https://wikitech.wikimedia.org/wiki/Deployments/Archive/2019...


I was motivated to donate a small amount of BCH to WMF after reading this announcement.

Just like trying to set your local public library on fire. There are always crazies in the world.

Except that mental states are perhaps less stable than physical ones. Ultimately, using a 'web site' is largely a mental model on the part of the user, while being a technical model on the part of the provider. Dysfunctional mental drivers + lots of access + lots of time... versus a door that locks each night and an alert attendant or three.

This is dismaying but not shocking. The first time I saw a newly planted tree on an otherwise bleak urban block, vandalized and broken, I realized that a drive towards "better" is not to be taken for granted, and needs protection.


You can't grow a non-profit that large without angering someone. There will always be folks who believe they are more deserving of the "free money."

Some people just like to break things. It's that simple.

That doesn't appear to be the case here. This was done for the attention, not just for the thrill of breaking things.

Probably proving their skills to potential clients.

There was a string of arson attacks on little free libraries in Metro Vancouver; eventually a pair of teenage boys were arrested.

I suspect that the sharing of knowledge and encouragement of developing wisdom is, to some, a threatening prospect. Perhaps they have experienced learning difficulties and are struggling with shame and frustration, or perhaps they disagree strongly with the concept of an intellectually liberated population. Libraries are, after all, a pillar of liberalism.


That's sad to hear, I always love coming across the little free libraries. I would guess they were just bored and angry teens, likely not making any deeper statement but just expressing their anger and frustration and willingness to break the rules. Also, it's fun to watch things burn, and they come with built-in kindling. Hopefully some judge will make them rebuild what they burned, that would be fair and give them more appreciation for the work of others that they destroyed thoughtlessly.

More likely just trying to have fun. I'm sure they would do the same to local schools.

> or perhaps they disagree strongly with the concept of an intellectually liberated population. Libraries are, after all, a pillar of liberalism.

Thoughtful comment. I would agree with you if the attack were somehow organized on 4chan /b/, Kiwi Farms, or some underground IRC for mysterious or unexplained reasons. If that happened, its philosophical implications would be deep. And I won't be surprised if it occurs one day.

But so far there's no evidence to suggest the attack has any ideological motivation beyond making the attacker famous.


> Just like trying to set your local public library on fire.

No, more like superglueing the doors shut for a few hours.

I don't really see the damage, especially since they moved on to Twitch and WoW servers. If anything, I'd say more work got done and a few early nights' sleep were had in the end.


There’s lots of dirty politics on WP. Not sure about other countries, but the Russian government is very involved in whitewashing its activities.

How interesting. Could you post some evidence? Thanks!

Extraordinary claims require extraordinary evidence, so by that symmetry this shouldn’t require much evidence.

If something is repeated a lot, that doesn't make it true. Though it does make it more believable to common folk, according to Goebbels: https://www.azquotes.com/author/5626-Joseph_Goebbels

Russian fake news is widespread and easy to find. How can this be controversial, and even downvoted on HN of all places? Russian military shot down a plane full of politicians over Ukraine, and boldly lied about it in the international press. This is just one of literally hundreds of such instances. Russia has been caught numerous times trying to sway elections across the world. How on earth is “Russia whitewashes its name on wikipedia” needing substantiation? It seems obvious, even.

A plane full of civilians, you mean? A plane full of politicians crashed over Russian territory in 2010.

Also, Napoleon. Repetition is really effective.

Apparently this group is behind it. They also attacked WoW and Twitch servers.

https://twitter.com/ukdrillas


Part of the liability should be shared with the people owning the compromised machines these crazies are using for their attacks, otherwise attacks like these will never stop as long as enough free “ammunition” is being left around by incompetent people who can’t be bothered to secure & monitor their systems properly.

Edit: in reply to some of the (valid) counter-arguments, I'd like to say that there are indeed many issues that will need to be considered before passing such a law - this is just an overall idea. In addition, my intent isn't to punish the occasional kid doing something stupid and leaving a misconfigured device, it's to punish companies selling/deploying obviously insecure devices at a large scale, like ISPs deploying cheap shitty outdated network hardware or the countless resellers white-labelling insecure network cameras. Currently there is no penalty for manufacturing insecure hardware and this situation is the consequence of that - I'd like to fix this problem. We have regulations that (mostly successfully) prevent companies from selling hardware that blows up and destroys your house, why can't we have the same for networked hardware?


I don't see why you'd come up with something that so misaligns the interests of everyone except lawyers.

Instead, imagine the DDoS landscape if we had to pay a small price for bandwidth. There would be a natural disincentive to having a toaster saturating your bandwidth as part of a botnet, because it would quickly show up on your bill. And something as simple as shipping an IoT product or Raspberry Pi with a bad default username/password might earn it bad reviews like "1/5 stars, this product immediately raised my internet bill."

I know it's not perfect, and most of us have a bad taste in our mouths from paying out the wazoo when bandwidth is priced per GB, but it can be a fair system if priced well, one that fights against our botnet reality where we have basically zero insight when our networked devices are compromised.

I can also imagine better tooling provided by our ISPs in this world where they help us track down and itemize our bandwidth costs. "Honey, why is SmartToaster89 costing us $24 in network fees?"
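Such itemization is trivial to compute once per-device counters exist. A purely hypothetical sketch of what that ISP tooling might do (made-up names and a made-up per-GB price):

```python
def itemize(bytes_by_device, price_per_gb=0.05):
    """Turn per-device byte counters into a line-itemed bandwidth bill."""
    gb = 1024 ** 3
    # one line per device, rounded to cents
    lines = {dev: round(b / gb * price_per_gb, 2)
             for dev, b in bytes_by_device.items()}
    return lines, round(sum(lines.values()), 2)
```

At $0.05/GB, a toaster that pushed 480 GB of botnet traffic shows up as a $24 line item, which is exactly the kind of signal that would make "SmartToaster89" hard to ignore.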

It's impressive how poorly our current system equips everyone except malicious actors. How many ISPs don't even filter spoofed outbound packets?

It's hard to complain about everyone centralizing around Cloudflare with the state of cheap DDoS muscle.


I don't see why you'd come up with something that so misaligns the interests of everyone except Comcast :)

We don't need to be priced by the bandwidth, we just need better access to metering. Something my mother could look at and say "huh, the toaster's sent 8 GB of data today..."


If you’re not charged for it, and it’s not enough for you to feel reduced performance on your other devices, why would you even bother looking?

It would consume more electricity, though let's assume by a marginal amount.

I suspect many people are uncomfortable holding a compromised device like that. The unpredictability of a toaster helping to take down Wikipedia is wild and potentially seen as a sign of chaos, especially for less technical users.

Who knows what else this crazy toaster will do next? Will it do the same thing again?


Can't wait to tell Gran she's legally liable for a DDoS because the unsecured IoT washing machine Best Buy sold her caused the internet to cave in ;)

Skip Gran and sue Best Buy and the IoT washing machine manufacturer.

There are exactly zero big box retailers or lobbyists that will abide that.

Big box retailers seem to be able to comply with regulations mandating physical safety. Digital security requirements could be enforced by a similar system.

Because "physical safety regulations" is something that the majority understands, so it's hard to argue against it in public. With digital security, most people lack the mental models to follow the discussion, so it's really easy for lobbyists to tell them flat-out lies about how those damn dems are out to take their smart lightbulbs away from them.

[flagged]


I explicitly formulated very carefully that this is not an issue of "people are dumb", but an issue of lack of understanding. I wish I could downvote your strawman.

Are you claiming that most people do understand computer security? My experience is that even many computer-savvy people (already a small fraction of overall population) are completely baffled by its intricacies.

Is liable? AFAIK strict liability only applies in specific cases; in general, negligence applies. The claimant would have to prove that there was a breach of duty and that a reasonable person would have done something to prevent the damage.

You can't expect everyone, kids and elderly included, to be able to identify when their machine is running a rootkit from the result of exploiting a 0-day, for example.

People also have a very limited view of what's happening on their phones, too. What if the rights to the source and distribution of a free closed-source app are purchased by someone who's going to modify it to include all users in their botnet? It's not like you can monitor what kind of traffic your phone apps send out.


You can't expect everyone to be able to identify when their car is not running as expected. Wait, you can and you must under the law. Also there are liabilities.

Very much not the same thing. Your car can't interact with the cars of others without physical contact, and when it does interact, via a crash or otherwise, it would be very obvious to anyone. Even if you were constantly watching the traffic of your phone and other devices, you'll probably miss malicious packets that are sent among the thousands of packets each device sends per minute. It's also not as obvious to recognize what constitutes malicious behavior in your internet device compared to your car.

Also, the operation of the car is simple enough that you can take it to a mechanic for an inspection and they can reliably inspect everything that the car does. There are no hidden behaviors under complex conditions, like crashing into others when there is a full moon or the sky is cloudy. Your devices can do that. If you bring me your phone/laptop/etc and ask me if it's going to send malicious packets to someone at some point, I can't reliably tell you that it won't. I'm not sure that even if you gathered all the software and electronics engineers who were involved in the construction of your device, they'd be able to provide a reliable answer. I can tell you that it seems like it wouldn't based on initialization files and services, but I can't tell if the function is hidden somehow, like obfuscated in the machine code of the kernel or something. Finding that would require auditing all assembly code running on the machine, which would not be a task for mortals.


I'm liable if my car spontaneously catches fire while parked. Car is just an analogy.

What I'm saying is that it's very simple to inspect a car and make sure it's not going to spontaneously catch fire while parked. You can get a reliable answer in a day from a mechanic.

You can't get a reliable answer on whether a computing device is programmed to send malicious packets. There's too much code, most is compiled, there's too many ways to hide it. You can probably gather the smartest people in the world and leave them to die of old age before they can arrive at a reliable answer.


We are in the area of probability in both cases. Oftentimes it's obvious when a PC/device is infected. Sometimes it's really hard to find out: https://en.wikipedia.org/wiki/Stuxnet

The question is whether you feel good enough about that probability to be held liable if the malicious code was hidden well enough. Also, some cases might be obvious on Windows PCs, but I don't think that's necessarily the case with phones. Take note that websites can also send malicious packets. When you load a webpage, the code is downloaded and immediately executed. Are you OK with being held liable for visiting a webpage that decided to send malicious packets?

The liability boundary is an important question. There is no simple answer. A couple of years back, the owner of an abandoned building was found liable for the death of a kid who entered the fenced building despite all the no-trespassing signs and so on. Neither your example nor mine negates that there should be liability for various voluntary and involuntary acts, one's own or a third party's.

> Couple of years back owner of the abandoned building was declared liable of the death of the kid who entered to the fenced building despite all the signs no trespassing and so on.

Honestly, that doesn't sound right. I hope I'm misjudging because of lack of details.


In the US the term is "attractive nuisance". It's not a new thing either; case law in the US dates back to the 1870s.

I feel like you're trying very hard to establish a strawman. No one expects 100% perfection. Establishing and following industry standards is a good first step.

Like no unencrypted local passwords. Individual default passwords for every individual device. Not using outdated versions, especially once vulnerabilities are known. Including an update mechanism and providing updates for at least X years.

And yes, trained specialists will be able to work through such checklists for many commonly used software, just like your car mechanic.

And by the way, no one expects your car mechanic to [a] be perfect (you really never heard a story of a car breaking again just after leaving the shop?) or [b] be able to handle any kind of vehicle unknown to him.

The goal of rules like that is to punish the worst tier, thereby raising the bar. But this will probably be harder to implement in the US with their everyone-sues-everyone mindset. Reminds me a lot of the great GDPR scare, but now, IMO, quite reasonable actual cases are happening.


> I feel like you try very hard to establish a strawman.

This is the second time someone's told me that. Looking into what a strawman is again and reviewing my comments, I'm not sure I'm doing that. The examples I see on Wikipedia[1], at least, don't seem to have a strong relationship of implication. That is, the strawmen aren't directly implied from the proposals.

In this case, I do think that making one liable for damages their machine is causing to other people's machines does directly mean what I said, that one would be liable for behavior they cannot control as well as they can control the behavior of their car.

My intentions are to provide not strawmen, but counterexamples where the proposal fails.

> No one expects 100% perfection.

I do. I'm not really OK with laws where I don't have reasonable control of whether I break them or not. In this case, the only effective control I'd have is to not have an internet device, and that seems unreasonable.

I think we'd all like to think otherwise, but the traffic sent by our phones is very much out of our control because of the reasons I stated, and nobody reviews the JavaScript code received from an HTTP server before executing it. It seems crazy to be liable for whatever it does.

> And by the way, no one expects your car mechanic to [a] be perfect (you really never heard a story of a car breaking again just after leaving the shop?) or [b] be able to handle any kind of vehicle unknown to him.

I think the analogy isn't that strong. Visiting webpages is like changing car parts every second as the car is running. Malicious behavior of these car parts is not noticeable at all and they're not easy to spot from inspection either.

> The goal of rules like that is to punish the worst tier, thereby raising the bar. But this will probably be more hard to implement in the US with their everyone-sues-everyone mindset. Reminds me a lot of the great GDPR scare but now imo quite reasonable actual cases happening.

Well, there were a lot of things that scared people about GDPR, but I think I can assume your point is that a law can be broad and technically applicable to many people unfairly, but only applied to just cases in practice. I'm not sure I like that kind of law, though. Even if it works well in practice for the majority of cases, it seems like the kind of thing that lends itself well to abuse: the kind of law that everybody is guilty of breaking, even if they're not all actively prosecuted.

[1] https://en.wikipedia.org/wiki/Straw_man#Examples


> What I'm saying is that it's very simple to inspect a car and make sure it's not going to spontaneously catch fire while parked.

With electric cars on the rise, it's only a matter of time until the equivalent of the Samsung Galaxy Note 7, but for cars.


Brakes squeal when they are wearing down. If I put a penny in the tread of my tires and see Abe Lincoln’s head, I know they are bald. If my headlamps go out I’ll notice; if it’s a brake light, I get a red indicator lamp on my dash that says I have a problem.

Let’s talk about liability when home routers make a revving engine sound when they push too many packets per second, or start playing a “buckle up” warning chime every 6 seconds if they see packets heading to a C2 server.


I'm liable if water/sewage breaks in my condo and there won't be any 'squeal'. Analogies work but are not equal. My point is that there should be liability for the malfunctioning internet equipment, definitely so for businesses.

Maybe. Or maybe liability exists for outside parties in that case: the plumber who was drunk when they put the pipes in, the architect who designed the wall in such a way to force a ton of joints in one weak spot, the building inspector who signed off on it... heck, maybe the water company is causing a water hammer to form because their pumps are busted.

Now if my home owners insurance finds that I flooded the downstairs condo because I fell asleep with the bath running, you bet I’ll pay.

But no matter what, both of your examples have a robust regulatory structure around them in terms of licensing and inspections. That is why liability works; without those structures you can’t say “you fucked up, therefore you pay”.

I’m all for adding liability into the system but if we do we must do it in a way that spreads the burden to the right places (IoT manufacturers, negligent ISPs) and doesn’t push it straight to the consumer.


So the same should apply to IoT? The programmer 'who was drunk when they put the' code in should be liable.

>> I’m all for adding liability into the system but if we do we must do it in a way that spreads the burden to the right places (IoT manufacturers, negligent ISPs) and doesn’t push it straight to the consumer.

Now, having said that, when the limb of my tree knocks the power line off my house I have to pay to fix it, but the electric company is on the hook to send someone to turn the line off so my electrician can work on it.

ISPs have to be in the liability chain too: if one of their customers is talking to a C&C server and participating in a DDoS they have to switch off the customer until repairs can be made.


Nothing stopping you from capturing packets at the router or using tcpdump on your phone. Not convenient but theoretically possible.
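As a sketch of what you might do with such a capture: summarize tcpdump's text output (e.g. from `tcpdump -n` saved to a file) by destination address, to spot a device hammering one host. The sample lines below are invented, not real capture data:

```python
import re
from collections import Counter

# Invented tcpdump-style lines standing in for a real capture file.
sample = """\
12:00:01.000000 IP 192.168.1.50.44321 > 198.51.100.7.80: Flags [S], length 0
12:00:01.000100 IP 192.168.1.50.44322 > 198.51.100.7.80: Flags [S], length 0
12:00:01.000200 IP 192.168.1.23.5353 > 224.0.0.251.5353: UDP, length 80
"""

# Count packets per destination IP (the address after '>', minus the port).
dest = Counter()
for line in sample.splitlines():
    m = re.search(r"> (\d+\.\d+\.\d+\.\d+)\.\d+", line)
    if m:
        dest[m.group(1)] += 1

for host, count in dest.most_common():
    print(f"{host}: {count} packets")
```

Which, of course, rather proves the grandma point below: the capture is the easy part, and interpreting thousands of such lines per minute is not.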

Gee, lemme just call up my grandma and teach her how to set up kismet and snort.

Seriously though, this is like holding someone liable if his car is stolen and used as a getaway car in a crime. It's also not really possible to get a shell on most of these devices, even with serious effort, so apart from turning one off, I'm not sure how anyone is supposed to mitigate this. They're too locked down to do any kind of disinfection, in most cases. I guess now I have to teach granny to use a UART cable, too.


I didn't say it was easy or a solution to the problem. I was adding an additional point of context to a discussion in order to clarify the parent comment, which was insinuating impossibility of said action.

It's not easy, but it's not impossible. I could envision a software-based solution that approaches simplicity in installation.

What I'm not arguing is that someone should be held liable for their devices being used in a botnet.

I believe the manufacturers and retailers have the liability. But I am definitely not a lawyer.


> Nothing stopping you from capturing packets at the router or using tcpdump on your phone. Not convenient but theoretically possible.

You don't interact with people outside your bubble much, do you? It's time to start writing better code, not blaming users for programmers' shortcomings.


How many of those systems are owned by private people who have no idea what to do about it? Do you plan on suing half the planet?

If these systems are owned by private people then the company who designed it/deployed it is liable. If I have root on the device then it's my fault if I screw up, if I don't have root and it's just a plug and play appliance then whoever designed it/sold it can be liable. This solves the issue of "grandma buying an IoT washing machine" mentioned in another comment as the manufacturer of the machines can be sued directly without bothering grandma (besides a recall program and/or firmware update to patch the vulnerabilities).

You seem familiar with hardware and software hacking, but not the creativity of bad-faith legal hacking ;-)

If you pass that law on Day Zero, I claim that on Day One, manufacturers provide some horribly arcane command-line interface for rooting lightbulbs and washing machines, and add some boilerplate to their shrink-wrap licenses forcing customers to acknowledge that they have admin privileges on their devices.

Problem solved for them, Granny is liable again according to your system.


Does the license auto-root the device? If yes, then it's an obviously dishonest circumvention of the law and judges will see right through it. If not, then the manufacturer has to prove the device was rooted if they want to pass liability to someone else.

If that still doesn't solve the problem, the media will take care of it. "Buying this smart lightbulb puts you at risk of being sued for thousands of $$$" can't be good for manufacturers and they'd want to avoid the bad press.


It seems simpler and more direct for the media to say,

"Selling this insecure IoT-device/phone/router/tv that requires every consumer to become a security expert, and taking no responsibility for OTA patches and so forth, puts you at risk for paying hundreds of millions of dollars in fines and/or damages."


You'll also have to prove the IOT device DDoSing from my IP isn't a rogue device. I swear it's not mine.

If someone hacks my WPA password and torrents child porn from my IP I am liable (in Germany) - no need to prove it was me or my device.


I, too, wish to punish people for daring to own something that might have a zero-day.

The negative externalities of owning vulnerable devices need to come home to roost at some point, yeah.

I've chosen to own a "dumb" (read: "reliable") washing machine, and it cannot be used in such an attack. I have to endure the indignity of peeking downstairs to see if I left clothes in it, which is a cost of sorts, but it's nowhere near the cost I'd expect to bear if I bought a vulnerable washing machine and it provided resources to knock Wikipedia off the internet.

What other disincentive to putting vulnerable devices on the internet do you propose?


Are we talking about zero days here or obvious vulnerabilities known for decades but nothing gets done because there’s no cost associated with leaving it insecure? I strongly suspect the latter.

How are you supposed know what chipsets are in your smart lightbulb?

This stands out to me as a problem that mandatory liability insurance would be well suited to. Make everyone liable for damages to people and organizations harmed by illegal use of their unsecured computers. Then everyone who has an internet-connected device has to purchase liability insurance. They would be able to choose between expensive policies that impose no surveillance or restrictions, or purchase cheap policies that require running surveillance software, or submitting to regular security audits. Most individual consumers would probably end up with cheap policies that just require them to use devices that receive timely security updates from vetted manufacturers and run behind typical home internet firewall rules.

I proposed regulation as an interim step in the past. Here's a reprint of it:

A combo of per-customer authentication at the packet level, DDoS monitoring, and rate limiting (or termination) of specific connections upon DDoS or malicious activity. That by itself would stop a lot of these right at the Tier 3 ISP level. Trickle those suckers down to dialup speeds with a notice telling them their computer is being used in a crime, with a link to helpful ways of dealing with it (or a support number).

As far as design, they could put a cheap knockoff of an INFOSEC guard in their modems, with CPUs resistant to code injection. Include accelerators for networking functions and/or some DDoS detection (esp. low-layer flooding) right at that device.

https://en.wikipedia.org/wiki/Guard_(information_security)

Here's an old one from the high-assurance field, albeit with a medium rating, that did what I'm describing in an Ethernet card computer:

https://web.archive.org/web/20040623100328/http://www.crypte...

Modern implementation could probably be done in a cheap clone and security-enhanced mod of this product:

https://www.cavium.com/OCTEON-II_CN68XX.html
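The "trickle them down to dialup speeds" step in the proposal above is conventionally done with a token bucket. A hedged sketch, with made-up numbers (~56 kbit/s is roughly 7000 bytes/s), not a description of any particular ISP's implementation:

```python
import time

class TokenBucket:
    """Token-bucket rate limiter: a flagged customer's connection is
    trickled down to a dialup-like rate instead of being cut off."""

    def __init__(self, rate_bytes_per_s: float, burst_bytes: float):
        self.rate = rate_bytes_per_s      # sustained rate
        self.capacity = burst_bytes       # max burst size
        self.tokens = burst_bytes         # start with a full bucket
        self.last = time.monotonic()

    def allow(self, packet_bytes: int) -> bool:
        # Refill tokens for the elapsed time, capped at capacity.
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= packet_bytes:
            self.tokens -= packet_bytes
            return True
        return False  # drop (or queue) the packet

# A customer flagged for DDoS traffic gets ~56 kbit/s instead of full speed.
throttled = TokenBucket(rate_bytes_per_s=7000, burst_bytes=1500)
print(throttled.allow(1500))  # first MTU-sized packet fits the burst -> True
print(throttled.allow(1500))  # immediate second packet is dropped -> False
```

In practice this lives in the modem's or ISP's forwarding path in hardware, but the accounting is the same.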


> Part of the liability should be shared with the people owning the compromised machines

You’re voluntarily signing up to get fined when someone hacks the computer in your house, because you connected it to the internet, right?

> Currently there is no penalty for manufacturing insecure hardware

I’m glad you see some validity in the counter-points, but doubling down on this idea of punishing manufacturers for things people do with their hardware seems misguided at best.

You can’t prove any hardware is secure; if there were such penalties, there would be no hardware. This is a total and complete non-starter. Moreover, there are lots of other bad things you can do with hardware; this would open the door to holding manufacturers accountable for everything. Do you think Intel or Dell will accept fines for every successful hack into machines they made?

This isn’t unlike suggesting that ISPs should be held liable for people doing illegal things on the internet, or suggesting that it should be illegal to pay ransoms. It’s hurting the wrong people, and failing to punish the people doing wrong.

> this situation is a consequence of that

That’s a purely subjective opinion that ignores multiple causes, and ignores the single most direct cause: people who wish to do bad things. It would be just as valid to blame this on a failure of the education system & social civics as to blame hardware manufacturers. Maybe we should fine teachers who have students that later do bad things?

> We have regulations that (mostly successfully) prevent companies from selling hardware that blows up and destroys your house, why can’t we have the same for networked hardware?

First, the analogy is bad because there are zero good uses for consumer bombs in houses, while there are plenty of non-harmful uses for IoT devices.

Second, because there is a market for simple hardware that can be deployed inside of secure networks, and doesn’t require a team of security experts to run. Secure hardware is more expensive to produce than simply-connected hardware.


That's such a terrible idea, but it would make a fantastic satirical sci-fi short story premise.

Technology that cannot be used by anyone but the snobbish tech elite because anyone else just gets sued to oblivion.


That's exactly how over-regulation starts.

Could you ELI5 how these sorts of attacks are possible and what the average Joe can do to mitigate them?

Award for worst hot take of the day goes to....

I heard he used the same email on Twitch as his personal Facebook and has already been identified.

How can it be that a single group can take down Wikipedia? The internet is broken.

It's a bit sad. With a bit of money and some connections, you have access to a botnet large enough to cause some serious downtime.
