All: large threads are paginated, especially today when our server is steaming. Click More at the bottom of the thread for more comments, or like this:
(Yes, these comments are an annoying workaround. Their hidden agenda is to goad me into finishing some performance improvements we're badly in need of.)
When I was at Uber, we noticed that most incidents are directly caused by human actions that modify the state of the system. Therefore, a large "backlog" of human actions that modify the system state has a much higher chance of causing an incident.
My bet is that this incident is caused by a big release after a post-holiday "code freeze".
To elaborate a bit more on this point, you have to think about it like any complex system failure - it's almost never one thing, but rather a combination of many different factors. The factors around post-NYE releases:
- high-risk changes that weren't released pre-holidays get released. Depending on the company, this could mean a 1-week to 1-month delay between implementation and release. The greater that interval, the higher the divergence between the world of production and the world of the new feature
- lots of new hires (new year = new hiring budget). New hires are missing some tribal knowledge about the system and are more likely to make a production-breaking release.
I tried to think of other reasons, but these two overwhelmingly stand out as the biggest. Would love to hear from others.
If new hires tend to break production, it's not on the first business day of the calendar year. December gets really quiet for recruiting, typically, as candidates get busy with their social lives, and scheduling interviews gets harder.
January is busy for recruiting, but given a week or two of interviewing and negotiating, two weeks notice, it's probably February before new employees are starting, and they're not making big, production-damaging deploys for a week or two after that.
You will also get a pause in new hires in late December for the same reason. I've certainly accepted an offer late in the year and then didn't start until the new year.
Probably not as big of a rush as the end of school year rush in summer though.
I also doubt that new people will be breaking production on day one. Even at a fast moving startup I'd expect it to take a bit to go through the onboarding paperwork, get a laptop and actually try pushing a change to production.
I think some big company (maybe Facebook) has this rule that you have to deploy something to production on your first day. They seemed pretty confident in their processes and devops teams. A company trying to imitate that policy without doing the work necessary to make it possible would probably have outages on days when lots of new people joined :-P
Could be Facebook, as I think production releases are always rolled out in phases, e.g. first to 10 users, then 100, then 1,000 and so on. That means there's much less chance of even the worst mistake having a serious effect.
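For anyone curious what that kind of staged rollout looks like mechanically, here's a minimal sketch (not Facebook's or Slack's actual system, just the common pattern): hash each user ID into a stable bucket and only enable the feature for the first N buckets, then keep raising N as the phases widen.

```python
import hashlib

# Hypothetical sketch of a phased-rollout gate: users are assigned a stable
# bucket from a hash of their ID, and a feature is enabled only for the first
# N buckets. Growing N (10 -> 100 -> 1000 ...) widens the cohort without ever
# flip-flopping which users are included.
TOTAL_BUCKETS = 1_000_000

def rollout_bucket(user_id: str, feature: str) -> int:
    """Deterministically map (user, feature) to a bucket in [0, TOTAL_BUCKETS)."""
    digest = hashlib.sha256(f"{feature}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % TOTAL_BUCKETS

def feature_enabled(user_id: str, feature: str, enabled_buckets: int) -> bool:
    """Enable the feature for roughly enabled_buckets / TOTAL_BUCKETS of users."""
    return rollout_bucket(user_id, feature) < enabled_buckets

# Phase 1: ~10 buckets out of a million; later phases just raise the number.
print(feature_enabled("U12345", "new-message-renderer", enabled_buckets=10))
```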
Wow, onboarding new hires here is going well if they can access Slack, O365, LDAP, VPN and clone the repo by the end of the first day. Though we do have the initiation ritual of installing the OS on your laptop.
Doubtful. It's not impossible that a company the size of Slack relies on a specific engineer logging on in the morning before a traffic spike so the service can handle the extra load, but that would be a misuse of modern distributed cloud computing.
Hate on the cloud all you want, but AWS has (several flavors of) load balancers and various ways to automatically scale up and down resources (and if you're conservative, you can disable the 'down' part). If you're operating a major SaaS company like Slack and not taking advantage of them, something's gone wrong.
It's easy to fall behind on bumping up the high watermark for your max autoscaling or for new traffic patterns to cause emergent instability. New code paths are taking unprecedented amounts of traffic all the time.
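As a concrete (hypothetical) illustration of the high-watermark problem: even a periodic check that an autoscaling group's desired capacity isn't pinned against its MaxSize can catch this before a traffic spike does. A sketch using boto3; the group name and threshold are made up, and in a real setup this would page someone rather than print.

```python
import boto3

# Hedged sketch: periodically check whether an Auto Scaling group is sitting
# near its MaxSize "high watermark" and flag it, so the ceiling gets revisited
# before a post-holiday traffic spike. The group name below is hypothetical.
asg = boto3.client("autoscaling")

def check_high_watermark(group_name: str, headroom: float = 0.8) -> None:
    resp = asg.describe_auto_scaling_groups(AutoScalingGroupNames=[group_name])
    group = resp["AutoScalingGroups"][0]
    desired, max_size = group["DesiredCapacity"], group["MaxSize"]
    if desired >= headroom * max_size:
        # In practice: page the on-call or open a ticket, not print.
        print(f"{group_name}: desired={desired} is within 20% of MaxSize={max_size}")

check_high_watermark("web-frontend-asg")  # hypothetical group name
```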
In 2021, how does one keep track of resource starvation at the process, container, OS, service, pod, cluster, availability zone and region levels?
I would add the potential scaling issue - the holidays were a dry season, with fewer meetings. So if they have some automation for scaling down to reduce cost, it may have bitten them in the arse now.
People came back to work, and most of them started around the same time (US-wise at least).
Hence, kids, a vital lesson for all of us: don't start the call on the full hour; give it 3-7 minutes to confuse your coworkers and give the systems some time to auto-scale ;)
I think you're right on the first bullet, but not the second. If it was mid-Feb, then maybe, but the next FY hasn't even started yet for a ton of companies, let alone onboarding newbies to production.
Yeah, makes sense. A system typically optimized for performance and real time delivery is suddenly asked to perform multiple batch retrievals in large chunks. Ouch!
I would bet it's just the influx of traffic post-holiday. With systems that haven't been updated in so long, maybe some annoying memory leaks have crept up and gone unnoticed, or some other bad state was exacerbated by return-to-work day for most NA folks. Code freezes were good at identifying bugs that only show up after long periods.
Doubt anyone is releasing big changes Monday morning.
I haven't worked at Slack, so I can't speak with high confidence. A traffic spike is a possible reason, but I'm willing to bet that it's not the reason:
> Doubt anyone is releasing big changes Monday morning.
This is definitely an engineering best practice, and by best practice, I mean something that Uber's, I mean Slack's SRE team strongly pushed for, and got politely overruled on. After a code freeze is lifted, it's quite common for lots of promotion-eager engineers to release big changes.
IMO it really doesn't have to be promotion-eager engineers or antsy product managers. I'm fairly satisfied with my role, comp, and type of work for where I am in my career/life stage. I just did a code release first thing this morning, not because I'm promotion-eager, but just because I'm picking back up where I left off, like any normal day. Granted, I work at a much smaller company than Slack with orders of magnitude less traffic.
I'm not sure about that. I feel like I get more upvotes from sarcasm and jokes than from insight. In this instance, I think it's because when people hear something dumb said seriously in real life, they're not going to readily recognize online that it's a joke.
Why? I had a rewrite of some core logic ready the last day before Christmas that I didn't deploy, as it wasn't time critical to get out and I didn't want to be disturbed during the holidays. Today was perfect for deploying it, as I can watch it the whole week if needed.
Well, I think it probably depends on where you work. At my work, people just took 2-3 weeks of time off. It takes a moment to get your head back in the game.
Everywhere I've worked there has often been a massive backlog of things that get released after a moratorium or extended holiday week. Those are usually the worst weeks to be on call, since things are under so much churn.
Interesting, I've never worked anywhere where engineers decide when to release changes. That's a product decision, and there is a process of review and approval at both the code level and the functional/end-user-experience level that has to happen first.
Did you mean that literally? E.g. is it common at Uber that engineers can release changes to production on their own?
At Cisco (Webex team), the engineers decide when to release code, and most features are enabled by configs or feature flags independently of the deploys.
The engineering team is responsible for the mess caused by a bad deploy, so it's appropriate that those engineers should also choose the timing.
Our team typically deploys between 10am and 4ish, local time, since that's when we're at our desks and ready to click through the approvals and monitor the changes as they go through our pipelines.
The feature enablement happens through an EFT / beta process, and the final timing of GA enablement is a PM decision. But features are widely used by customers ahead of that time, as part of the rollout process.
Our team usually rolls out non-feature changes to services via dynamic configuration switches, so that we can get new bits in place, and then enable new behavior without a redeploy. This also enables us to roll back the dynamic config quickly if something unexpected happens.
(We generally don't do this for net new functionality; there's lower risk in adding a new REST endpoint etc. than in changing an existing query's behavior or implementation.)
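If it helps, here's a rough sketch of that deploy-then-flip pattern, assuming a simple file-backed config store; the path, key names, and query functions are all illustrative, not anyone's real setup.

```python
import json
import threading
import time

# Illustrative sketch of "deploy the bits, flip behavior via dynamic config".
# CONFIG_PATH and the flag name are made up; in practice this would usually be
# a config service rather than a local file.
CONFIG_PATH = "/etc/myservice/dynamic-config.json"
_config = {"use_new_query_path": False}

def _reload_config_forever(interval_s: int = 30) -> None:
    global _config
    while True:
        try:
            with open(CONFIG_PATH) as f:
                _config = json.load(f)
        except (OSError, ValueError):
            pass  # keep the last good config if the reload fails
        time.sleep(interval_s)

threading.Thread(target=_reload_config_forever, daemon=True).start()

def old_query_implementation(channel_id: str) -> list:
    return [f"messages from the old path for {channel_id}"]

def new_query_implementation(channel_id: str) -> list:
    return [f"messages from the new path for {channel_id}"]

def fetch_messages(channel_id: str) -> list:
    if _config.get("use_new_query_path", False):
        return new_query_implementation(channel_id)  # new code, already deployed
    return old_query_implementation(channel_id)      # rollback = flip the flag back

print(fetch_messages("C024BE91L"))  # hypothetical channel ID
```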
Does Uber/Slack not release in CI/CD? At least in backend?
I don't see any need to deploy a big change at once in the software world today. At worst feature gate the thing you want to do and run it in a beta environment, but still push the actual code down the pipeline.
I'm actually more confused after reading that. I assumed that you meant they tested in production on purpose, but it sounds, at a skim, like they do have non-prod testing environments - in fact, it looks like they've gone to having multiple beta environments for every service?
My understanding is that they have a "tenancy" variable in every service call which can take a different code path. They seem to only have one environment for everything and do tests/experiments at code level based on this variable.
That might be true, but when you take into account the global usage of Slack and the respective time zones, more than half the world would have signed into Slack this morning before SV had, and I certainly didn't notice any downtime this morning in my time zone.
What would make that strange? Where I work it is frowned upon to do releases on weekends, and so bad changes due to build-ups happen on Mondays.
Although we also don't close the pipeline for just any holiday break. In fact, low holiday traffic is a good time to keep pipelines open, since changes will impact fewer people.
I have definitely worked in places where the times right before and right after a change freeze were the most unstable, so that could be it. However, as others have mentioned, it's pretty early on the west coast of the US. Unless some engineer was up extra early (perhaps at the behest of an anxious project manager) it seems unlikely to be a release.
What it could be is some engineer somewhere coming in after the holiday, noticing a slightly flaky thing, and thinking, "I'll reboot/redeploy/refresh this thing so the flakiness doesn't get worse". Only it turns out the flaky thing was a signal of something else falling over. Or maybe the redeploy was the wrong version because of bad CI/CD, or maybe the person just fat-fingered it.
It varies a lot by team... I think it's common to have a single click "start" button to press. It's a good sanity check that a release isn't going to happen during a fire drill, outage, or strike...
Very possible. I don't know what Slack's workforce distribution is. In places I've worked there have definitely been some incidents in US off-hours triggered by someone on the other side of the world.
Another common cause is resource exhaustion as a result of poorly monitored resources (or bugged monitoring). For example, Google's authentication was down because their system wrongly reported an available quota of 0. The last two incidents at my company were also related to resource exhaustion.
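A tiny illustrative guard against that "reported quota is 0" failure mode; the functions here are stand-ins for whatever your quota system actually exposes, and the point is only to alert on implausible readings instead of trusting them blindly.

```python
# Sketch of a sanity check for "monitoring lied about quota". The two getters
# are hypothetical stand-ins for your quota/usage APIs.
def get_reported_quota(resource: str) -> int:
    return 0          # stand-in: pretend the quota system reports zero

def get_current_usage(resource: str) -> int:
    return 12_000     # stand-in: actual observed usage

def quota_reading_is_plausible(resource: str) -> bool:
    quota, usage = get_reported_quota(resource), get_current_usage(resource)
    # A quota of 0 while there is real usage is almost certainly a reporting
    # bug, not a real limit - page a human rather than start rejecting traffic.
    return not (quota == 0 and usage > 0)

if not quota_reading_is_plausible("auth-tokens"):
    print("ALERT: implausible quota reading for auth-tokens; check the quota system")
```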
This is one of the original arguments for going capital-A Agile. Make smaller releases more often, so that if something breaks, it's (hopefully) something small, and at least it's easier to trace.
(I'm not making a statement about whether that's good or bad or whether it works or whatever. Please don't read an opinion into it.)
This. If you roll many changes into a single deployment, you don’t know which change broke what. But if you have two or three weeks of commits waiting, it’s hard to do otherwise.
That's why good regression tests and CI are so important; in an ideal world (which we were close to in one of my projects), every change is pending in a pull request; the CI rebases the change on top of its upstream (e.g. master/main), simulating the state the codebase will be in once merged, and runs the full suite of tests. The build is invalidated and has to be re-run if either the branch or upstream is changed.
Now, caveats etc, this was a collection of single applications in a big microservices architecture, and as the project grows it becomes more and more difficult to manage something like this, especially if you get more pull requests in the time it takes to do a build. But it is the way to go, I think.
Anyway, since tests and CI are not definitive, you also need a gradual rollout - 1%, 5%, etc - AND you need a similar process for any infrastructure change, which gets more and more tricky as you go down to the hardware level.
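A sketch of that merge-simulation step, under the assumption that the CI runner just shells out to git and the project's test runner; the branch name and the pytest command are placeholders, not any particular CI system's API.

```python
import subprocess

# Rough sketch of the merge-simulation CI step described above: rebase the
# pull-request branch onto the current upstream tip and run the full test
# suite against that simulated post-merge state.
def run(cmd):
    subprocess.run(cmd, check=True)

def validate_pull_request(pr_branch, upstream="origin/main"):
    run(["git", "fetch", "origin"])
    run(["git", "checkout", pr_branch])
    # Record what we validated against; if either side moves afterwards, the
    # result is stale and the pipeline has to re-run.
    upstream_sha = subprocess.check_output(
        ["git", "rev-parse", upstream], text=True
    ).strip()
    run(["git", "rebase", upstream])
    run(["pytest", "-q"])  # placeholder for the project's full test suite
    print(f"validated {pr_branch} against {upstream}@{upstream_sha}")

validate_pull_request("feature/faster-unreads")  # hypothetical branch name
```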
An incognito browser would ignore all client-side cookies, so the Slack web client would not try to - say - resume a previous user's session or re-use any previously saved data.
Likewise, incognito mode will also ignore most cached web content, meaning all assets on the Slack web app will get loaded again from scratch. This "clean state" start could, theoretically, get around issues with old - potentially incorrect/outdated - assets being loaded, even though that really shouldn't happen under most circumstances.
It means that one is not sending a session cookie of any kind, and thus should be sent to a 100% cached version. No "Are you XYZ and want to log into ABC's Slack again?" box.
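If you want to check this from outside a browser, a cookie-less request roughly approximates what incognito does at the HTTP level. Whether you actually see cache/CDN headers depends entirely on Slack's (undocumented) setup; the header names below are just common CDN conventions.

```python
import requests

# Send the request with no cookies at all and look at cache-related response
# headers. x-cache and age are typical CDN headers, not anything Slack
# specifically documents.
resp = requests.get("https://slack.com/", cookies={}, allow_redirects=True)
print(resp.status_code)
for header in ("cache-control", "age", "x-cache"):
    print(header, "=", resp.headers.get(header))
```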
I'd like to take this moment to mention self-hosted, open source, and federated alternatives like XMPP and Matrix.
I'd like to, but unfortunately I don't feel like I can in good faith. Matrix is woefully immature, and suffers from a lot of issues, but I think it is closer to being a functional Slack/Discord alternative. XMPP is much more mature, and works very well for chat, but doesn't have a nice package that does all the Slack stuff--at least not that I'm aware of. I'd love to be proven wrong there. I know it can be done, but if it can't be deployed quickly by an already overstressed team member, what chance does it have?
The problem is that XMPP and Matrix are protocols, not products.
Element (the primary Matrix software) definitely has Slack and Discord in its sights.
I don't think there are any serious "self-hosted Slack-like" contenders that are XMPP-based right now. You can piece components together (yay, standards!) and I did exactly this for the IETF's XMPP deployment recently. But it's far from being a cohesive easy-to-deploy product. Simply because nobody is building that right now. It takes time and resources and there's no money in it.[1]
People who do set out to build Slack clones (projects like Mattermost and Rocket Chat) and earn money don't have features such as federation on their priority list and don't build on top of Matrix/XMPP. They roll their own custom protocols and as far as I can see they are fairly content with that decision.
[1] There's even less money in it, but nevertheless I am currently working on such a self-hostable "package" for XMPP. However, rather than focusing on the team chat use case (Slack/etc.) I'm focusing on personal messaging (WhatsApp/etc.): https://snikket.org/ if you're interested. It's possible I will broaden the scope one day.
It's largely overlooked that the success of Slack & MS Teams is partly due to the cybercrime portal that email has become. IOW, you don't get phished in your org's Slack chats. To prevent phishing, any chat service will suffice; an open protocol isn't necessary, as you don't intend to engage with ppl outside your org.
The essential problem IMO is how to replace SMTP. No one has proposed and implemented an alternative, to my knowledge. So I decided to[1]. The current draft omits federation (although I wouldn't rule it out in all cases yet).
No, email has fundamentally bad UX for a lot of the use cases Slack and similar tools are used for.
> problem IMO is how to replace SMTP.
Sadly, SMTP is probably one of the parts of mail which have aged best. Enforce the usage of some (currently optional by design) features around authentication and the like, at the cost of backwards compatibility, and you have all you need from the delivery protocol.
BUT:
- IMAP and similar protocols are much worse.
- Mail bodies are a big mess; it's always fascinating to me that mail interoperability works at all in practice (again, you could clean it up a lot in theory, but backwards compatibility would be gone).
- DMARC, DKIM and SPF, which handle mail authenticity, have a lot of rough corners and, again for backwards compatibility, are optional. Again, it's not too hard to improve on, but it would break backwards compatibility.
The main reason mail still matters is its backwards compatibility, not just with older software but also with new software still using old patterns because of the (relative to the gain) insane amount of work you need to put into all kinds of mail-related components. But then exactly that backwards compatibility is what prevents any of those improvements.
(Yes, I have read the "Why TMTP?" link, and I have written software for many parts around mail, including SMTP and mail encoding. The idea that SMTP is at the root of the problem seems very strange to me, especially given that, as I mentioned, literally every other part of mail is worse than SMTP by multiple degrees...)
EDIT: Just to prevent misunderstandings one core feature of mail is the separation of mail delivery and mail authenticity, in the sense that you don't need the mailman to prove the authenticity of a mail. At most the legal/correct/authentic delivery.
By "replace SMTP" I mean the whole email protocol stack, not only SMTP. I'm not proposing to replace it for all situations overnight; of course SMTP etc will be used for decades.
TMTP also covers most IMAP/POP use cases. And it allows short, plain-text messages (see Ping) to make first contact with others -- necessary when that server has less restrictive membership requirements.
Authenticity is a double-edged sword. For certain confidential content, you want the recipient to know that it originated with the sender, but you don't want anyone else to know that in the event the content is leaked or stolen.
I believe the extinction of email for person-to-person & app-to-person correspondence is a foregone conclusion, due principally to phishing. The question is what should we do now, and the answer is clearly not chatrooms (which are of course useful in certain circumstances).
Email is not a chat system, and chat systems are unsuitable for asynchronous long-form threadful discussions. There is some overlap, but combined they form a spectrum of communication modes so wide that it can't be covered by a single UI.
I would argue that email is not suitable for asynchronous long-form threadful discussions. The limitation that email has is that if you're not part of that conversation from the beginning, you'll have to piece it together from previous quoted material.
One email-like protocol that properly handles this is NNTP.
True regarding the late-comer aspect, although it is less of an issue when using mailing lists with an archive. In the past, when lacking an archive I also just asked another participant to send me the earlier discussion in mbox format, which was easily accomplished with the unix MUAs of the time.
Regarding the actual modes of discussion I was thinking of though, usenet and email are mostly the same.
> Regarding the actual modes of discussion I was thinking of though, usenet and email are mostly the same.
For the most part, they are and many readers support both protocols (or at least they did in the past). The nice thing about NNTP is that it doesn't require maintaining a separate archive or having someone send you an mbox file to import. Just subscribing to the appropriate groups was sufficient (depending on the article retention policy).
Why would you replace it? Won't disabling all public unauthenticated submissions on your mail server suffice? You can also prevent delivery to the outside world (and error out on submission so that users are notified) if you really like. The result will be your own private mail server.
And you can keep using all the normal MUAs on desktop and mobile.
Changing your SMTP server configuration that way would break things, so the question is whether to set up a new, company-internal SMTP server, and give your employees new addresses there. But that won't quickly stop the phishing, because your ppl still need to get email via the public network from clients and suppliers.
Setting up a new server isn't easy unless you hire an outside service provider, and if you're willing to do that, Slack et al offer a nicer UX than the well known email/webmail clients.
Orgs with sufficient IT resources commonly do run internal SMTP servers.
Yes I'm old enough to remember when organizations had email but it was internal-only. Probably less for security reasons at the time than that they simply didn't have an internet provider. There were also mainframe-based email systems that were internal to that network.
You're making some fundamental assumptions about federation that I think are completely wrong. Are you telling me that you never need to communicate with anyone outside of your organization? How do you intend to receive invoices? How will you communicate with outside vendors? Sorry, but you need some text-based way of communicating with people and email is the best way, that's why it's survived so long despite being problematic. If you have internal, asynchronous chat, why would you need internal email?
Sorry my dude, but business runs on email. Saying lets get rid of it is as naïve as saying lets get rid of Excel. It's just not going to happen.
> To prevent phishing, any chat service will suffice; an open protocol isn't necessary, as you don't intend to engage with ppl outside your org.
The same could be accomplished with email if you only allow connections to the SMTP and IMAP server from within the corporate network. That is, nothing external can connect to those servers, which is fine if it's only used for internal communication.
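As a sketch of that restriction, expressed as the allowlist you'd enforce in whatever sits in front of the SMTP/IMAP ports; the CIDR ranges are just typical RFC 1918 examples, not anyone's real network.

```python
import ipaddress

# Sketch of "internal-only mail": only allow connections to the mail ports
# from addresses inside the corporate networks. Example ranges only.
INTERNAL_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("192.168.0.0/16"),
]

def allow_connection(client_ip: str) -> bool:
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in INTERNAL_NETWORKS)

print(allow_connection("10.12.34.56"))   # True: internal client may reach 25/993
print(allow_connection("203.0.113.7"))   # False: external host is rejected
```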
XMPP is supported by a large number of clients, but running a server and getting everyone on clients with comparable featuresets is a nightmare. It’s a cluster of disparate standards, and it’s overwhelming. I’m sure it’s doable if you have the time to invest, but it’s not straightforward if you’ve never done it before.
Matrix is pretty straightforward on the server side of things, but the client UX is invariably mediocre. Vector—the official client—exemplifies everything that is wrong with Electron apps. Slow, clunky, poor UI, poor platform integration. With the default home server, it can take seconds for a message to go through. At least it’s far more customizable than Slack; it has an option for everything, which, as a power user, I quite like.
I haven’t tried Mattermost, but it looks like some of the important features aren’t FOSS, at which point it’s just another Slack as far as I’m concerned. I’ll gladly pay for support, but for SSO? Meh, might as well stick with Slack; at least everyone and their dog knows how to use it. (This is, of course, an opinion that stems partially from ignorance; I haven’t actually tried Mattermost, and if I do, I might fall in love with it. But my time is limited, and I can only evaluate so many products in a day.)
Not that Slack is much better here: their threading system has so many UI/UX issues. Ever had a thread with hundreds of messages? For your own sanity, I hope you haven’t. Ever tried to send an image to a thread from iOS? It’s possible, but only by pasting the image into the text field; the normal attachment button isn’t available, and Share buttons in other apps can’t send to threads. And, of course, the recent uptime issues.
Element (formerly Riot/Vector), has improved loads over the years, and the default matrix.org average send time is around 100ms these days rather than multiple seconds: https://matrix.org/blog/2020/11/03/how-we-fixed-synapses-sca... has details. I suspect you (and the parent) may be running off stale data.
That said, Element could certainly use less RAM, irrespective of Electron - and http://hydrogen.element.io is our project to experiment with minimum-footprint Matrix clients (it uses ~100x less RAM than Element).
Rather than storing state from the server in the JS heap, new state gets stored immediately in indexeddb transactionally and is pulled out strictly on demand. So, my account (which is admittedly large, with around 3000 rooms and 350K users visible) uses 1.4GB of JS heap on Element/Web, and 14MB on Hydrogen. It's also lightning fast, as you might expect given it's not having to wade around shuffling gigabytes of javascript heap around the place.
I've wanted to try something like this (on a smaller scale), but haven't had time. It's good to hear of an implementation that reflects my expectations. How long did it take you to migrate over?
It has, and I’ve been using it since its early days. I still use it. It’s still terrible, just slightly less terrible. And, no, messages don’t consistently send in 100ms on the default home server; there are regularly disruptions that cause significant delays, sometimes as much as 10-20sec. That’s a big problem for a federated chat platform.
Edit 1: I want to love it; the design is everything I could ever hope for in a chat platform. I even tried to contribute to Vector, but it was such a mess that I eventually gave up.
Edit 2:
> That said, Element could certainly use less RAM, irrespective of Electron - and http://hydrogen.element.io is our project to experiment with minimum-footprint Matrix clients (it uses ~100x less RAM than Element).
I'm not sure why this is a priority. Techies complain about RAM usage a lot, but if we have to choose between performance+power and a small memory footprint, we're going to choose the former almost every time. Take Telegram, for example: they have a bunch of native clients that perform amazingly well, although they do gobble RAM. Most of my technical friends use it as their primary social platform. It's not without issues, but it's really hard to go from something like Telegram Desktop or the Swift-based macOS Telegram client to Vector. And those clients aren't made by large teams--most (all?) first-party Telegram clients are each maintained by a single developer, if I'm not mistaken.
The constant rebranding and confusion over Matrix/Vector/Riot/Element is another point of pain for me. It’s incredibly difficult to communicate unambiguously about Matrix with people who haven’t been following it for years.
Does Element refer to the ecosystem as a whole, including EMS? The primary client? The core federation? It’s not obvious from a casual visit to element.io. I suppose if I said “Element web app,” that would be fairly clear, but I’m still in the habit of saying “Vector” from the days of Riot.
Everything related to the company formerly known as New Vector is now called Element. The company is Element, the official clients are called Element (with suffixes Web, Desktop, Android, iOS) and yes, EMS is Element Matrix Services. This rebranding was done specifically due to the confusion brought on by the many previous names. More info here: https://element.io/previously-riot
Yes, I actually like the change—I think they finally got it right this time. (The Riot rebranding was a mess.) However, it’s still frustrating when trying to communicate with people who aren’t following Matrix-related news. In my circle of friends, “Vector” remains more widely understood than “Element Web,” so that’s what I’ve been using.
Anyway, my point stands: Element Web/Desktop feels fairly unresponsive compared to something like Telegram Desktop. It looks so much nicer now, the UI layout is great, and it’s far more powerful than Telegram—yet, I can’t help but feel like I’m swimming through molasses even when dealing with moderately-sized groups. Try clicking around on different groups rapidly; you’ll likely find that you have to wait several seconds for the UI to update.
>XMPP is supported by a large number of clients, but running a server and getting everyone on clients with comparable featuresets is a nightmare.
XMPP is, well, extensible. If things don't match in the clients then that particular feature just doesn't work. These days all the clients pretty much try to match the feature set of Conversations. That applies to the servers as well. There is a server tester for that:
I recently finally found someone to try Element with and the experience has been great so far. (Except for the need to go through Google's captcha at some point.)
It even sent a 20 MB MP4 like a champ, while Conversations sometimes chokes on photographs that aren't even that high resolution...
I can only offer my own personal experience: Matrix has been working well for me for a couple years now. However, I probably have a more narrow use case than you're thinking of.
I run a small homeserver and use it to communicate with a group of about 20 friends. Most of them aren't "technical" people. We use it mostly for chatting and image/video sharing. We never use live calling (audio or video).
There have been a few bugs in the mobile apps, but for the most part, everything has been working fine.
The biggest issue is the UX. It's not as polished as the big players.
This is actually the use case I've been trying to get to for some time. Unfortunately, I need it to "just work" to get my non-techy friends interested, otherwise they'll go right back to Discord.
Like I said, it's close, I just don't think it's there yet.
I'd say it's almost in "just works" territory for everyone except the person who has to actually administer the homeserver (me). I absorb a lot of the complexity for my friends.
The only thing that's a little cumbersome is requiring them to enter a custom server URL when they register or log in for the first time.
For competing with Discord, it seems like it would benefit from a more robust free offering. Being able to create a free Discord server is great, and it is incredibly capable for most communities that don't need the fancy perks of Discord Nitro etc.
> The free Discord plan provides virtually all the core functionality of the platform with very few limitations. Free users get unlimited message history, screen sharing, unlimited server storage, up to eight users in a video call, and as many as 5,000 concurrent (i.e., online at the same time) users.
For a lot of small communities that aren't focused around commerce of any kind, Discord's free offering blows Element Matrix Services out of the water. It's a non-starter. If I could create a server with feature parity to Discord's free offering, I would jump on EMS in a heartbeat for any new community I create, and I'd start trying to move communities currently within Discord over to EMS.
A very normal progression for Discord servers is that some niche sub-community wants to gather, so they create a free server; people join, all kinds of rich content gets posted and curated, great discussions happen, and then, as it gets bigger, the people running the community or people who want to support it will boost the server with Discord Nitro for additional features like more slots for custom emojis (I can't communicate enough how important this feature is to Discord's success, even though it seems like minor window dressing).
That kind of model is what would justify a server starting to shell out money every month for EMS. I would note that Discord's pricing for this kind of level of community is tiered and not a per-user thing. You unlock more features based on how many users are paying for Nitro, going up a tier based on breakpoints of 2/15/30 Nitro Boosts per month. It doesn't cost more to have a tier 3 server if you gain more users. This is a big deal for fostering growth and unseating incumbent social networks (which is what Discord and Slack are).
Just some thoughts. I really want stuff like Element/Matrix to succeed!
> and use it to communicate with a group of about 20 friends. Most of them aren't "technical" people.
I'm insanely curious about the human side of things here. How did you get them to buy into this idea in the first place? That sounds like quite an achievement.
The non-technical folks in my life generally struggle with paths of least resistance (iMessage, etc) and it's hard to imagine getting them onto some alternative platform/protocol.
It did take some persuading. I think the main reason I was able to pull it off was ironically because they're not that technical. I bet most of my friends don't even know what Slack or Discord is. That's not to say they're dumb or anything - they just don't spend as much time online as one would think.
Previously, we were mostly using group texts or Snapchat/Instagram to communicate, so the biggest selling point was the fact that we can share full quality pictures and videos between iOS and Android people.
This is awesome. I have always wanted to self-host a Matrix instance as well, but I imagine it's going to be very hard to convince them to move over, from Telegram. Is there a blog post that I can read about homeserver setup? I am keen on seeing how easy it was, and keen on seeing what level of technical and financial resources you had to invest to get going.
For my part, I don't have buy in yet (Because I'm not convinced Matrix is ready) but I think I could get it. I have 7 or 8 friends who do not use Discord except to talk with me and a few other friends that I know can be convinced to at least start using Element next to Discord. Once I feel like my homeserver is in a state that I can invite these non-technical people in, I'll be in the same place.
You bring up a good point, however, which is that we _could_ use open source, non-centralized alternatives for many of the online products we consume, but we choose not to, and so we increasingly become slaves to corporations that actively seek to narrow our choices. Another example of this is the push from big sites like Reddit to use their apps rather than just use a browser - it’s not about functionality, it’s about destroying the free and open web.
> You bring up a good point, however, which is that we _could_ use open source, non-centralized alternatives for many of the online products we consume, but we choose not to, and so we increasingly become slaves to corporations that actively seek to narrow our choices.
That doesn't happen for no reason. The vast majority of open source products I've used have terrible usability. I simply don't want to use them. I don't want to be beholden to corporations and walled gardens, but for me, the existing alternatives are worse in too many ways.
Or, or... and bear with me here... or, packaged click-button solutions with paid (contractually obligated) dedicated product support is a better use of our short time, more often than not.
That only works if you only need to use Slack alone or whatever. The moment you have to use more of these annoying services at once and manage N different stupid client apps for Y different platforms (desktop/mobile), the lack of open/shared protocol becomes a major issue. Let alone if you want to use them on emerging mobile OSes that are not a hellhole of data thievery.
Everything goes down. But it looks like huge complicated distributed services shared by huge amounts of people, that are continuously updated and developed, and are constantly trying to attract more users/load, seem to go down more than a simple service on a simple server.
No hard data though. My mail server only ever went down when I upgraded the server and didn't check that everything was still working right away, or similar maintenance induced incidents. It never went down by itself.
Such systems only ever go down unpredictably on HW issues, or when overloaded/out of resources. Neither is very likely, because you're not trying to grow your service in any sense similar to VC backed enterprises. Most of the time it has constant very low load and resource use. And you can simply stop introducing changes to the system if you need more stability for some time. (stop updating, for example)
XMPP killed XMPP. It's just not very good. It doesn't work well between different clients and servers. The protocol is a horribly overcomplicated mess of overlapping, partially supported extensions for basic functionality. And it doesn't work at all with low power mobile delivery. (It was invented before the iPhone.)
There might have been political reasons why google dropped XMPP, but it would also make sense as a purely technical decision.
> And it doesn't work at all with low power mobile delivery. (It was invented before the iPhone.)
This is plain untrue. Yes it was invented a long time ago, but thanks to the extensibility it has evolved over time just as the way people use it has changed. This evolution is a healthy and necessary part of an open ecosystem.
I know it frustrates people that modern features don't work in stagnated clients such as Pidgin and Adium, but modern clients support all the things you would expect.
Servers and mobile clients have supported mobile-friendly traffic and connection optimisations for many many years now.
> There might have been political reasons why google dropped XMPP, but it would also make sense as a purely technical decision.
Google contributed extensions to XMPP, the same way they contribute to other internet standards. I think they were quite comfortable with this. The XMPP-based Google Talk was their longest-running messaging solution after all...
> And it doesn't work at all with low power mobile delivery.
What makes you think so? If Conversations was draining my battery, I would have noticed by now, I'm pretty sure that Facebook Messenger is worse in this aspect...
Maybe things have changed - certainly when I looked at it a few years ago (around the time that google stopped supporting it) my understanding was that xmpp had no push notification support. The app in the phone had to either poll or explicitly hold open a TCP connection. (Which is problematic when the app is backgrounded.)
I was recently forced to use Facebook Messenger (thank God it's soon over), and I'm hating it: it's slow on mobile, and even worse on PC, where it regularly makes my whole OS hang, requiring a reboot.
Scrolling back is atrociously slow, and it doesn't even seem to have a search feature!
I'd take XMPP alternatives like Conversations, Jitsi, Pidgin any day! (And Element of course.)
XMPP is hardly killed. There are tens of thousands of XMPP servers out there with over a hundred public servers. There are lots of client implementations. Even the really bad implementations manage basic messaging.
Matrix with Element (Riot) as the front-end is pretty close. It does what slack does, it's just not very good. XMPP is arguable. It can be a Slack alternative, if you stitch enough other servers on top of it. Personally, I don't think XMPP will ever be more than chat, but some of its adherents believe differently.
Mattermost is certainly not what I meant. That's just trading one Slack for another.
This thread is about a Slack outage, which you have no control over. Mattermost and similar software is self-hosted, which of course doesn't mean you're getting 100% uptime, but you have (more) control over it.
In practice self hosted usually translates to more downtime and slower performance when it works. Unless your org has more expertise running a chat service than Microsoft or slack, your self hosted alternative is always going to suck more.
Did you try self-hosting, and did it lead to more downtime and slower performance?
From my experience, when I self-host stuff it's a lot faster (more server resources) and I've never had any downtime (a server doesn't simply go down for no reason).
I haven't tried it personally. But my employer hosts an on-prem Github instance and it is just terrible. So many downtimes, long times before anybody gets around to repair it, general performance issues, maintenance windows for upgrades, etc. Just a huge pain. I've seen this sort of problem with the old Exchange on-prem services too.
IRC may be out of fashion these days, but at least deploying a small IRC server for your own team is really not that much effort anymore, and it doesn't incur that much ongoing maintenance work either.
I suggested RocketChat when the outage was announced and HN community downvoted it quite heavily. I'm not sure why. [0]
We ended up making the switch and committed to Discord. We're now looking at Rocket.chat as a backup in case Discord goes down. But Slack is now completely out of the picture for our team.
Just curious - why not use Mattermost as a backup? (disclosure: I work at Mattermost, but really just want to know what you think)
I've advocated for an idea where Mattermost is hosted on a Raspberry Pi (or somewhere else) and acts as a digital "bunker" if your critical infrastructure (Slack, Teams, Exchange?) is somehow compromised.
Not OP. Good idea. I thought it was integrated into GitLab (on-premise Omnibus), but I still haven't really fiddled with it; I enabled something in the config file, but nothing happened.
I know it's a tough spot, but if it were usable from GitLab with zero config that would be great for fallback.
Thanks for your reply! I gave it another try, and it works now beautifully :o (Possibly last time I tried it, there was no LetsEncrypt integration and the external URL setup was more involved?)
My experience with RocketChat is that it works quite well on the surface, but after using it for some time, some very annoying bugs emerge:
* You get notifications for channels for which you have suppressed those notifications
* Some channels are marked as having new notifications when they don't
* Notifications for new messages in threads you are involved in are quite hard to find (horrible UX)
* Some UX choices are very confusing (you get a column of options related to notifications, and for some, the left option is the one leading to more notifications, for some the right option)
* There are some overlapping features that lead to inconsistent usage (channels vs. discussions vs. threads)
* Threads are hard to read, because follow-ups in threads are shown in a smaller font size. You cannot increase the font size at all in the desktop application
.... and so on.
Also, I tried to submit some bugs, but for that I'd need some information that only the admins who run this instance have, and in the end it was too much effort to get all that information together, so I didn't even bother.
I agree, it's still got a long way to go. I'm saying it's still a perfectly viable alternative to Slack. Fast, simple, works. (At least this is my perception/impression. I wanted to evaluate Slack alternatives for some time, but haven't got the time for it yet. So I was surprised when I got an invite to one of our client's rocket.chat instance and things worked pretty well.)
I'm in a Slack workspace that is constantly notifying me of a thread, but I can't mark it as read. Maybe it has something to do with the free message limit. So the message is there, but cannot be accessed. Annoying as hell. I thought about submitting the bug to Slack, but then just let it go, and probably we'll just move to Signal or something.
If the benefit we are looking for is better uptime, that will not happen.
The main benefit is going to be knowing why the system is down, and the ETA for it being up again.
That takes care of the software and protocol side of things, true, but does it give more reliable and predictable uptime? That's the main thing here; while there are plenty of software alternatives to Slack, their product is not just the software but also the hardware, servers, and scaling. You can get a Slack instance from 10 to >10K members without ever having to worry about your hardware, or how many hours your staff needs to spend on maintaining said hardware. And when there is inevitably downtime, you and your staff don't have to scramble to get it back up - with this outage, it's a shrug, it's down, it'll be back soon probably, I'm going to do some other work or do something else. Extended toilet / lunch break.
How often is Slack/Discord down? I mean it's not perfect, but I really honestly don't think I could match their uptime by self-hosting, as well as more on-call rotations for something that's not core product.
I very much prefer that for something that isn't core product, if it goes down I need to do exactly nothing for it to come back up, and that the engineers at Slack will be starting to work on it likely before I even realize it's down.
This is a tale SaaS vendors (which have a strong presence in online tech communities like HN because they are software companies) sold very well, and it's probably true for many small startups, but for medium-sized companies, managing their own platform for something like Slack is completely doable, and you will not have big downtimes like Slack's. Sure, you have to dedicate time and resources to it, and obviously it's not "core business", although a chat platform is a pretty important component in an online company.
I would be surprised if you couldn't match or exceed Slack's uptime running whatever alternative you want (IRC, Mattermost, Rocket.Chat, etc.) on a random dedicated server.
Hardware is quite reliable these days.
And updates can be scheduled to be at a convenient time for the team.
If you are the only technical person on your team then it's of course not ideal and would require some further thought into making things redundant.
But even that is easy enough to do with IRC (set up two servers, link the IRC servers together, have a single DNS record that points to both servers - job done).
If there are other people on the team that have _some_ technical skills then they can fix it..
IRC lacks quite a few features compared to other solutions, but the reduced complexity does bring very low operational complexity.
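That two-servers-plus-one-DNS-record setup mentioned above works because clients can simply walk the resolved addresses until one answers. A rough sketch of the client side, with a made-up hostname and the standard TLS IRC port:

```python
import socket

# Sketch of the fallback you get "for free" from a DNS name with two A
# records: resolve the name, then try each returned address in turn until one
# accepts the connection. Hostname and port are examples.
def connect_irc(host="irc.example.internal", port=6697):
    last_error = None
    for family, socktype, proto, _, sockaddr in socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM
    ):
        try:
            sock = socket.socket(family, socktype, proto)
            sock.settimeout(5)
            sock.connect(sockaddr)  # first reachable server wins
            return sock
        except OSError as err:
            last_error = err        # that server is down; try the next address
    raise ConnectionError(f"no IRC server reachable for {host}") from last_error
```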
IRC will be incredibly hard to use for non-technical people on your team. Mobile clients for IRC look like crap, and have horrible-looking ad bars. No integration with Google Drive, Github, or other things.
It's just not a business-friendly tool.
I'm an engineer and personally I'm fine with IRC, I'm just trying to be realistic here.
Is it really though? If I take a look at a random modern IRC desktop client - how is it more difficult to set up than, say, your email program?
The amount of information needed on setup is about the same: server, username, password (in fact email can get a bit more confusing in big corporate email setups with differing imap and smtp servers, etc.)
Reality check: Most people don't use email programs anymore.
Also how do you get IRC to sync all conversation data, history, between your several desktops and phones, how do you send files, make calls, and thread conversations?
They are moving the goalposts because there are several and ultimately very many reasons why IRC won't work, they just didn't bother to think of all the reasons and list them at once.
Ultimately there is only one reason that matters: The person in charge of deciding what communication channel to use likes Slack/Teams/IRC/whatever.
Add to that the SaaS propaganda that hosting literally anything yourself is just too hard (it really isn't).
Or this notion people are just too stupid to deal with anything more than the simplest possible web interface - Really? what do those people even do? Stare at Notepad all day? Of course not. They stare at various complicated software packages ranging from CAD, $spreadsheet abominations, SAP to various Adobe software packages.
Sprinkle in a bit of hype for the latest new thing and presto..
</rant>
I'd bet hard money that within epsilon of anyone using a desktop email client in 2020, and thus having one to set up in the first place...is in an organization with access to Microsoft Teams.
Who deals with the downtime if any other on-premises system goes down?
If you are running networks and software on site, and they are business-critical, you have people and a plan for this. Or you don't, and suffer the consequences.
There will always be more downtime on Slack/Discord. There are more users, more updates. Slack/Discord is a giant distributed system with nodes all around the world. An IRC/XMPP server on one machine that 100 people use is not going to crash unless someone does it intentionally.
People really overestimate the difficulty of running self-hosted systems with great uptime.
When self-hosting you can get away with simpler systems that end up being more stable and having higher uptime for lower effort.
The reason you see cloud providers having issues is not because the thing is difficult, but because doing anything at huge scale ends up being difficult.
> it's not perfect, but I really honestly don't think I could match their uptime by self-hosting
This is such a common misconception. The services I self-host were configured by me; if anything goes down (which very rarely happens), I know the exact cause and have it fixed in minutes. When some company's cloud service goes down I'm completely at their mercy. I also spend very little time on maintaining these services, just security updates, which are mostly automated.
Bottom line, maintaining and self-hosting services that have one or a few users is much less complex than running services with millions of users. Hence, my uptime is better than Google's, Amazon's, Azure's, etc.
1. There is virtually zero user-facing documentation. Need to know how to backup keys, verify another user, or what E2EE means? Ask your server operator. Basically the onus is on operators to document this stuff for their users. Except the stuff we're documenting is hard even for server operators, and especially challenging to document in a way that both nontechnical and technical users can understand.
2. Because this stuff is challenging even for more technically minded users to understand, it leads to a kind of burnout for interested non-technical users: they learn all they can about some feature and how it works at a high level from out-of-date random blogs, try to use the (complex, multi-step) feature, but then something won't work, and it isn't clear whether it was because the user did something wrong or because the client or server implementations are broken
3. Issues where core functionality is broken (e.g. two mutually verified users on my homeserver haven't been able to talk to each other in months -- see [1], [2], [3]) languish for months with zero response from maintainers.
4. While core functionality is both broken and undocumented, the maintainers announce rabbit hole features that no one asked for and seem very much like distractions, like their recently-announced microblogging view/client[4]
In short the Element maintainers have shown little interest in making the platform accessible to the people who need its differentiating features the most, and have prioritized the "mad science"/technical aspect of their platform at the expense of the human element (end-users and operators).
It'd be cool if Element used their resources to hire some UX folks and community advocates whose sole focus is addressing the horrid accessibility of their platform. I think most users would rather see that than further "mad science".
Yep, I have, although funnily enough it turns out that the rage shake feature was the only way to submit a bug report with diagnostics from a client (as of a couple of months ago anyway) and that feature itself was broken for one of my users (who has since churned).
That FAQ is a great start, but it's not sufficient for non-technical users. It's not easily searchable, it doesn't provide screenshots, and it doesn't go into enough detail for each item (e.g. describing what can go wrong + troubleshooting).
I am genuinely surprised that Slack wasn't ready for people to come back from holiday, to view increased queues of unread messages, to have to manually login vs. having auth tokens or cookies, etc. Either that, or they had a cosmically coincidental outage on a really bad Monday to have it.
It's bad enough team comms go over Slack so much now, at least we have email fallback. What scares me is for the teams that use Slack for system alerting.
My coworker's theory was someone was waiting for the holiday's end to deploy something risky.
And I'm in that boat of depending on Slack for alerting... in fact my team was also waiting over the holidays to deploy more robust non-Slack-based alerting (in our defense the product is only a few months old and only now starting to scale to any real volume).
I wouldn't be surprised if it's actually a combination of a new feature being recently rolled out, along with the sudden spike in load this morning.
The holidays are actually the perfect time for Slack to roll out a risky deployment, as it has to be their lowest usage time. So it would make sense if something was pushed out last week or the week before. And everything probably seemed fine.
And then this morning they suddenly realize this new feature does not perform under load. And to make matters worse, the new feature has been out long enough to make any sort of rollback very tricky, if not impossible. Which means they'd need engineers to desperately hack out, test and deploy a code fix.
If this is the scenario, I do not envy them at all.
Holidays are a good time for a company to do a risky deployment, but a bad time for an individual employee to do a risky deployment, assuming one doesn't want to work overtime over the holiday fixing things.
Depends on how well compensated holiday overtime is. There are some employees happy to work overtime if their hourly pay is doubled or tripled. However, there are also those who wouldn't do that for any price.
Depends how badly it goes wrong. My org is a 24/7 one, but one Christmas back in the 90s (way before my time) some work was done on Christmas Eve, I think on the phone system, in the days before widespread mobile phones.
It broke, which was a major problem: it meant that senior management were being phoned (ho), and relatively senior middle managers were on site to deal with the fallout. Of course most suppliers were also closed, so everything was harder to fix.
There are good reasons not to do changes when places are closed, or at least skeleton-staffed, for 2 weeks.
I used to work for a place that had a FY that ended in summer. We had a lot fewer problems with stuff being shoveled out the door at Thanksgiving and Christmas, because nobody was trying to finish their year-end performance goals over the holidays.
I think what I'm implying is that management creates this issue, but we are complicit.
Yeah, I think it's this rather than load. Slack should be able to handle load fine (probably), but since this is the first weekday post-holidays I imagine some deployment broke something.
Slack has been in business for several years and has survived several December to January transitions, including several people stopping using their product before Christmas and then returning early January.
It seems a bit presumptuous to assume that's at fault here, given their age.
The two cliched sources of this problem are 1) someone pushed something out over the holidays that could have waited until January, or 2) peak capacity was negatively affected since the last time a spike happened, nobody had a way to monitor it, and so this has been broken since the end of May. On further reflection, someone will admit that they noticed a notch-up in response times and did not connect the dots.
We have an alert channel in Slack, but it's mostly ignored. Our primary alerts come via SMS/VictorOps.
At one of my old jobs, we had SMS via two physical/hardware devices in our data center. One had a Telstra SIM card and the other had an Optus SIM card. (They were plugged into the same machine, but we had plans to put a second one in another data center before I left).
If you really care about alerts, you should have physical hardware doing your SMS messages via two different points of presence.
Now is a good time to recommend to your engineering org that they should have multiple alerting methods, e.g. Slack plus Pagerduty, or Slack plus email.
Hopefully email won't be your backup. I've seen that done. Alerts get filtered and ignored, often by accident.
Having been in this situation before, with a totally-down-and-not-coming-back-up outage of a payments system, I really feel for their incident response team.
I'll take this moment to remind everyone of their human tendency to read meaning into random events. There's no evidence to suggest New Year traffic has caused this, and outages like this can happen in spite of professional and competent preparation.
Hugops for their team, I hope they get it back soon.
> I'll take this moment to remind everyone of their human tendency to read meaning into random events. There's no evidence to suggest New Year traffic has caused this, and outages like this can happen in spite of professional and competent preparation.
On the one hand, sure we don't specifically know what's going on. On the other hand, it's the first Monday in the new year and they went down shortly after the start of the business day Eastern time; it could be coincidence, but it would be a remarkable coincidence.
There are a load of ways NY might have contributed to this, but it may not be a direct cause. What's more likely, Slack forgetting to scale their deployment back up after too much mulled wine, or a number of people on holiday meaning a simple failure has developed into something more serious?
It could be anything really- my post was more about how situations like this can happen to even the most prepared. The assumption it has something to do with NY tends to assume very trivial, silly mistakes. Especially with no information, that seems a bit uncharitable.
It seemingly worked OK in UTC+2 in the morning and early afternoon, then started having issues and is now a bit intermittent (or fixed; there's not much traffic on my channels, as it's evening already). Do they have that much more traffic on the US east coast than in Europe?
Probably, but it was only 2-3pm UK time when it started falling over so there would be all the Europe traffic plus the East Coast traffic starting to sign in.
I'm posting this because I found a lot of people don't know that Zoom includes a complete chat client that includes channels.
And #HugOps to the engineers at Slack working on this. I appreciate that they posted a periodic update even when there was no news to report: "There are no changes to report as of yet. We're still all hands on deck and continuing to dig in on our side. We'll continue to share updates every 30 minutes until the incident has been downgraded."
> Like many companies in the last year we've switched to using Slack to improve internal communication. [...] Since Slack doesn't offer an on-premises version, we searched for other options. We found Mattermost to be the leading open source Slack-alternative and suggested a collaboration to the Mattermost team.
I'm not really sure why it's 'GitLab Mattermost' and not (at your link) 'GitLab Nginx' et al. though.
They posted a giant list of the services they use recently.
They use a ton of services.
Likely you don't want your backup to be one of your systems and another part of the company probably uses Zoom already so it is probably easy to fail over to that.
This includes many proprietary ones; we generally choose the product that will work best for us, considering the benefits of open source but not excluding proprietary software.
Mattermost is not part of the single application that GitLab is. It integrates well with GitLab, and our Omnibus installer lets you install it easily, but it is a separate application from a separate company.
It's just due diligence. Think of what you have access to if you have "god mode" on corporate chat: HR, the CFO's DMs, private messages between other coworkers, and so on. Most won't fall for this temptation, but even those with strong anti-spying morals can be weakened by circumstances. Best to remove the temptation by design.
Because someone else's devops can't use it against you institutionally. Nor is going to insist on having an opinion on things that they're unaffected by.
This isn't a slam at devops, it's about the need for institutional information hiding; not everyone needs to know about and weigh in on every decision being made.
We structure our company similarly. In effect, DevOps is god on everything except HR, Sales, Finance, Chat, and C-level management, which are operated with 3rd-party services controlled by the individual departments and "owned/managed" by the C-suite.
DevOps at a lot of small companies also manage the internal IT stack and sometimes even take on most of the IT duties. Once you get larger you start having "IT" as something separate from DevOps but with the actual infrastructure managed by operations. Once you're really big the teams are truly separate and IT owns their own infra.
As someone who has to use Zoom Chat to interact with a client on a daily basis, please, do not recommend Zoom Chat to anyone except as an example of how not to do chat software.
--
Though, I do agree wholeheartedly with your sentiment that the Slack team needs all the positive vibes they can get right now.
As someone who has to use Zoom Chat every day, this a thousand times. (We still run an XMPP server on the side just to avoid the horror that is Zoom chat.)
I continue to be impressed by GitLab's operations and documentation! Yes, others may have similar backup plans, but as an outsider, GitLab's handbook feels cooler if only because they publish their practices and processes and make them public. I'll caveat that I'm not really a fan of Zoom/Slack/Hangouts (I'm an unashamed fanboy of Matrix and its numerous clients), but GitLab's approach is still really neat! Kudos to GitLab!
Unpopular opinion, but WebEx beats the pants off Zoom. Of course, it's neither free nor open. But it does support strong end to end encryption and authentication and has regulatory compliance to a bunch of things, if that's important to you. I get that there is WebEx hate because "enterprise" etc, but we use it around here and it works quite well.
"Business takes the easy and ethically questionable route to continue making money" news at 11.
I'm not condoning Zoom's actions but this is hardly a problem unique to Zoom. Few if any businesses will stand up for consumers and citizens unless it's directly aligned with their profit motive. In this case, the business choice is to operate or not in mainland China. If they choose to stand up against the Chinese government they're going to have difficulty continuing to operate in China and risk losing that entire market.
Google played this PR game many years ago in China (rejecting some of the governmental policies) and ultimately caved to Chinese policies to do business there.
Businesses are not the organizations we should look to for empowering people; that's simply not their goal, no matter how much their marketing team may want to sell that idea by latching onto trending (popular) social movements they've already done market studies on to assess potential fallout.
I think it's a pretty bold claim to state that Zoom's actions aren't unique.
What other business in this space has given China unfettered access to US users and data? I'm not aware of it occurring with WebEx, Teams, or GoToMeeting. The "one rogue employee" thing falls flat pretty quickly when they're the only ones that had this issue.
This feels like their encryption thing all over again, there's an "oversight" that is equivalent to a backdoor that only gets fixed when they get caught.
I didn't realize they shared any user data outside China (misread the WP portion). It appears they did share 10 users' data which is a bit questionable but I'd hardly call that unfettered access to US data.
The fact is all of the US businesses operating in China give surveillance ability to the Chinese government for the Chinese users and are operating in an ethically questionable space being primarily based outside of China, at least in my opinion.
It's really not too different than the businesses sharing US citizen data to the US government, much of which Snowden and others before him exposed. I suspect there's a lot more surveillance going on everywhere than the general public know about and the businesses best positioned to do the surveillance are probably doing it.
Elaine Chao’s sister is married to Xi, while Elaine, as transportation secretary under Trump, was busted inviting family with business ties to the CCP to official US government meetings.
The fear on this forum is imagined political thriller more than realistic.
Every technologist is grifting off the military industrial complex.
The lesser of two evils and the product just works. They might have a few governance issues they need to fix. But at the end of the day, they signed a BAA with us and will take the liability and fallout of a breach.
One nation is currently operating concentration camps and arrests and seizes the property of prominent citizens who criticize the government. Are you sure that's an equivalence you want to draw?
Like Guantanamo Bay or prosecution of Assange for his journalistic work to expose wrongdoing of government? Or maybe you’re talking about for-profit prison system and mass incarceration practices? But you’re probably talking about China, right?
We have thousands of brown people in camps along the border, in brutal conditions, without access to healthcare (unless you count forced sterilizations as healthcare). Do you consider those to be apples as well?
Why are they in camps along the border? Why are the Uighur? Did the "brown people" break any laws? Did the Uighurs?
Are the "brown people" in camps along the border a single, ethnic minority? Are all "brown people" in the country subject to arrest and under surveillance just for being "brown"?
> No one imprisoned in Guantanamo Bay is a US Citizen and neither is Assange.
That's a glib retort.
A takeaway from your position is that it's ok so long as you do it to citizens of other countries.
> it is not the same as ethnic cleansing.
See the above.
That's always been the difference between the US and China and why so many countries have hatred for us and yet little to none for China. They don't fuck with other countries on the level that we do.
Yea, but you live here and so you should think about the implications of this for yourself and your countrymen and not through the lens of international competition. That is a distraction.
Essentially, the China case proved Zoom is willing to cooperate with a nation state. The US is the nation state we live in, Zoom is HQ'd here. Therefore, the risk to us is high.
As an aside, the organ harvesting idea comes from Falun Gong, who are similar to Chinese Scientologists. It is not clear to me that their claims are accurate.
Sorry, but an executive is not just "an employee" and any alarms are rightfully justified. Took a little bit of cajoling in my company but we've successfully moved to self-hosted tools for the most part (Jitsi and Rocket.chat) with just a couple of projects with outside contractors using Slack.
It's weird that you describe the headline as "overwrought" and call the person an "employee" when the headline is more accurate than you.
This was an executive, not just an employee. That's a huge distinction, and I can't help but think you intentionally downgraded his position to cover up his behavior. "Just an employee." "Not a big deal."
But when you read the allegations, they seem like a very big deal that an executive was spying on users, giving their information to the Chinese government explicitly for oppressive purposes, including folks who are not in China, and went out of his way to personally censor non-Chinese groups meeting to discuss the Massacre-Which-Cannot-Be-Mentioned.
I would say the headline understates the gravity (it's very much a 'by-the-books' headline that you KNOW went through ten levels of Legal), and that your hand waving here feels much more dishonest than the headline.
Regardless of intent, it's undeniable that at some point there were insufficient controls to prevent this executive, or any executive in the future, from gaining this level of surveillance access.
And it's also undeniable that the consequences for Zoom (really, just needing to fire a few people, and not even the people who designed those controls if there were any) are so minimal that they have no incentive to strengthen those controls.
For some organizations (mine included) the benefits of Zoom outweigh the risks of Zoom having proven itself to not have those controls, namely the possibility of both political and corporate espionage. As with all things, YMMV.
It was an executive purposefully brought in for legal compliance with that country's requirements. That he was fired is a huge signal of how seriously aggressive Zoom is about protecting data, in that they would even be willing to go up against national governments. I feel like the firing is a huge part of the story.
There are remarkably few organisations I somewhat trust (even then on a sliding scale), but on that spectrum Zoom sits at the "wouldn't touch them with someone else's bargepole" end.
While Slack is down, let's remind ourselves that it is not the end of the world. To their ops team, good luck in sorting out the root cause(s), to mitigating their re-occurrence, and to emerging the other side a stronger team. You've got this.
I have always had this fantasy of wondering what happens when one of these major services goes down and never comes back online, i.e. in this outage Slack loses all the accounts, users, messages, etc.
How would people react? What would engineers do to recover? I always found that idea fascinating.
Imagine Google saying tomorrow that they lost all the accounts and emails. What kind of impact would that have on the world?
That scenario is what Disaster Recovery plans are for. Every large company I've worked for has had recovery plans in place, including scenarios as disturbing as "All data centers and offices explode simultaneously, and all staff who know how it all works are killed in the blasts."
You not only have backups in place, you have documentation in place, including a back-up vendor who has copies of the documentation and can staff up workers to get it up and running again without any help from existing staff.
And we tested those scenarios. I'm not sure which dry runs were less fun: when you got paged at 3 AM to go to the DR site and restore the entire infrastructure from scratch... or when you got paged at 3 AM and were instructed to stay home and not communicate with anyone for 24 hours to prove it could be done without you. (OK, so staying home was definitely more fun, but disturbing.)
This scenario isn't as far-fetched as people think. I was running a global deployment in 2012 when Hurricane Sandy hit the east coast. The entire eastern seaboard went offline and stayed off for several days. Some data centers were down for weeks. Our plan had covered that contingency, and we failed all of our US traffic over to the two west coast regions of Amazon. Our downtime on the east coast was around two minutes. Yet a sister company had only one data center, in downtown New York, and they were offline for weeks, scrambling to get a backup loaded and online.
I worked for a regional company in the oil and gas industry and the HQ and both datacenters were in the same earthquake zone. A twice per century earthquake had a real risk of taking down both DCs and the HQ. The plan would have been for every gas station in the vertical to switch to a contingency plan distributing critical emergency supplies and selling non-essential supplies using off-grid procedures.
Those are some really good thoughts on DR planning. I had never thought of DR being taken to such an extent.
How many companies really plan for an event where their entire infrastructure goes offline and their entire team gets killed? Do even companies like Google plan for this kind of event?
> Some of the temporary locations, such as the W Hotel, required significant upgrades to their network infrastructure, Klepper said. "We're running a Gigabit Ethernet now here in the W Hotel,'' Klepper said, with a network connected to four T1 (1.54M bit/sec) circuits. That network supports the code development for a Web-based interface to the company's systems, which Klepper called "critical" to Empire's efforts to serve its customers. Despite the lost time and the lost code in the collapse of the World Trade Center towers, Klepper said, "we're going to get this done by the end of the year."
> Shevin Conway, Empire's chief technology officer, said that while the company lost about "10 days' worth" of source code, the entire object-oriented executable code survived, as it had been electronically transferred to the Staten Island data center.
The two I've worked for that took it that far were a Federal bank, and an energy company. I have no idea how far Google or other large software companies take their plans.
But based on my experience, the initial recovery planning is the hard part. The documentation to tell a new team how to do it isn't so painful once the base plan exists, although you do need to think ahead to make sure somebody at your back-up vendor has an account with enough access to set up all the other accounts that will need to be created, including authorization to spend money to make it happen.
The last company I worked for where I was (de facto) in charge of IT (small company, so I wore lots of hats) could have recovered if both sites burnt down and I got hit by a bus, since I made sure that all code, data, and instructions to re-up everything existed off site, and that both of the most senior managers understood how to access everything and knew enough to hand it to a competent firm with a memory stick and a password.
In some ways losing your ERP and its backups would be harder to recover from than both sites burning down; insurance would at least cover the latter.
Yes, Google plans extensively and runs regular drills.
It's hearsay, but I was once told that achieving "black start" capability was a program that took many years and about a billion dollars. But they (probably) have it now.
"black start" for GCP would be something to see. Since the global root keys for Cloud KMS are kept on physical encrypted keys locked safes, accessible to only a few core personnel, that would be interesting, akin to a missile silo launch.
"Black start" is a term that refers to bringing up services when literally everything is down.
It's most often referred to in the electricity sector, where bringing power up after a major regional blackout (think 2003 NE blackout) is extremely nontrivial, since the normal steps to turn on a power plant usually requires power: for example, operating valves in a hydro plant or blowers in a coal/gas/oil plant, synchronizing your generation with grid frequency, having something to consume the power; even operating the relays and circuit breakers to connect to the grid may require grid power.
The idea here is presumably that Google services have so many mutual dependencies that if everything were to go down, restarting would be nontrivial because every service would be blocked on starting up due to some other service not being available.
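To illustrate why mutual dependencies make that hard, here's a toy dependency-ordered boot in Python (the service names and edges are invented; a real black-start plan exists precisely to break the kind of cycle shown below by hand):

    from graphlib import TopologicalSorter, CycleError   # Python 3.9+

    # Hypothetical boot-time dependencies: service -> what it needs before it can start.
    deps = {
        "dns": set(),
        "auth": {"dns", "storage"},      # auth keeps its keys in storage...
        "storage": {"dns", "auth"},      # ...and storage needs auth for its ACLs.
        "frontend": {"auth", "storage"},
    }

    try:
        print("boot order:", list(TopologicalSorter(deps).static_order()))
    except CycleError as err:
        # This is the black-start problem in miniature: something has to be
        # bootstrapped from outside the graph (e.g. keys pulled from a safe).
        print("circular dependency, manual bootstrap required:", err.args[1])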
I work for a bank. We have to do a full DR test for our regulator every six months. That means failing all real production systems and running customer workloads in DR, for realsies, twice a year. We also have to do periodic financial stress tests - things like "$OTHER_BANK collapsed. What do you do?" - and be able to demonstrate what we'll do if our vendors choose to sever links with us or go out of business.
It's pretty much part of the basic day-to-day life in some industries.
The company I work for plans for that and it's definitely not FAANG. In fact, DR planning and testing is far more important than stuff like continuous integration, build pipelines, etc.
> Every large company I've worked for has had recovery plans in place, including scenarios as disturbing as "All data centers and offices explode simultaneously, and all staff who know how it all works are killed in the blasts."
I sat in on a DR test where the moment one of the Auckland based ops team tried asking the Wellington lead, the boss stepped in and said "Wellington has been levelled by an earthquake. Everyone is dead or trying to get back to their family. They will not be helping you during the exercise."
Thanks for sharing, for some reason I think about this story a lot. It must have been such an emotionally difficult time for everyone involved in piecing back together their processes.
>Thanks for sharing, for some reason I think about this story a lot. It must have been such an emotionally difficult time for everyone involved in piecing back together their processes.
I was there as a consultant and didn't know anyone there when I went.
I won't provide any details out of respect for those fine people, but the grief was so thick, you could have cut it with a knife. As I said, I didn't know anyone who was there (or wasn't there) but after a day, I wanted to cry.
My tangential thought in that regard is: what if this is a really bad outage that causes Slack to tank (i.e. a large number of companies switch to Microsoft, Zulip, etc.)? Equally interesting a thought.
In 2011 a small amount (0.02%) of Gmail users had all their emails deleted due to a bug: https://gmail.googleblog.com/2011/02/gmail-back-soon-for-eve... They ended up having to restore them from tape backup, which took several days. Affected users also had all their incoming mail bounce for 20 hours.
Google would be catastrophic because so much is stored there.
Slack is mostly real time communication, at least for me. There are a few bits and bobs that really should be documented that are in the messages though.
Yeah, Google would easily top the list of companies which can have catastrophic impact. Microsoft, Apple, Salesforce, Dropbox would be the next in the list I guess if we leave out the utility companies and internet providers etc.
Just look at the impact a 40 minute outage of Google Auth had last month, I wouldn't be surprised if the global productivity hit during that outage was in the billions of dollars, and that was for a relatively short outage without any data loss.
AWS outages have basically crippled a few businesses. The longest I know of was 8-10 hours the day before Thanksgiving. Some Bay Area food company got hit by it and couldn’t deliver thanksgiving dinners.
Being in DR, I live my life wondering about that too. I spend a lot of extra time checking accounts and making sure that I print out important data (yes, sneakernet) as well as keep manual copies of passwords. It's old school, but it removes the risk to my business in case of a total loss of a global service, and it lowers the risk of a heart attack and related stress.
The rest of the world may not be so energetic about their accounts and data, so it would be painful for many; it depends on how much risk they are willing to accept.
Working in DR, I find it is very difficult for businesses to allocate the time and resources for good planning; for many, DR is an insurance policy. Engineering and development staff are focused on putting out fires. However, a real disaster is more than most companies can handle if they have not planned accordingly or practiced by testing failover/normalization processes and performing component-level testing.
This should actually be part of your Disaster Recovery plan. You should have at least some plan for the loss of all of your service providers. Even if that plan is to sit in the corner and cry (j/k).
We might start to see actual legislation around implied SLAs in the US which would cause Google to rethink everyone's 20% project being rolled out for 2 years.
Services like Slack are replaceable to a large extent. How does one even replace a service like Google easily? There are like-for-like services available for Google, but the data is where it becomes tricky. Almost 1bn people losing their email addresses could cause massive issues.
These events seem to be happening almost on a monthly basis now. IRC was never this unreliable and at least with netsplits it was obvious what had happened because you'd see the clients disconnect.
IME messages just fail to send with Slack; then you can retry, but retries aren't properly idempotent and you end up sending the message twice.
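The usual client-side fix is to attach an idempotency key to the message and reuse it on every retry so the server can dedupe; a rough sketch (the endpoint and field names here are made up, not Slack's actual API):

    import uuid
    import requests

    def send_message(channel, text, retries=3):
        # One key per logical message; retries reuse it so the server can dedupe.
        key = str(uuid.uuid4())
        payload = {"channel": channel, "text": text, "client_msg_id": key}
        for _ in range(retries):
            try:
                resp = requests.post("https://chat.example.com/api/send",   # hypothetical
                                     json=payload, timeout=5)
                if resp.ok:
                    return
            except requests.RequestException:
                pass   # retry with the SAME key, never a fresh one
        raise RuntimeError("send unconfirmed; retry later with the same key")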
It's especially strange when you think about how unoriginal Slack's product domain is, and how comparable, and in some cases small, their userbase is by comparison:
* iMessage, which likely handles something in the range of 750M-1B monthly actives.
* WhatsApp, 2B users [1], though no clarity on "active" users.
* Telegram, 400M monthly actives [2]
* Discord, 100M monthly actives [3]
* Slack, 12M daily actives [4]
* Teams, which is certainly more popular than Slack, but I shudder to list it because its stability may actually be worse.
The old piece of wisdom that "real-time chat is hard" is something I've always taken at face-value as being true, because it is hard, but some of the most stable, highest scale services I've ever interfaced with are chat services. iMessage NEVER goes down. I have to conclude that Slack's unacceptable instability, even relative to more static services like Jira, is less the product of the difficulty of their product domain, and moreso something far deeper and more unfixable.
I would not assume that this will improve after they are fully integrated with Salesforce. If your company is on Slack, it's time to investigate an alternative, and I'm fearful of the fact that there are very few strong ones in the enterprise world.
I didn't realize that Discord has way more active users than Slack. I'm glad; Discord is a fantastic service in my experience. It's a shame they got shoehorned into a mostly gaming-oriented service. I've never had a class or worked somewhere where Discord was a considered solution instead of Slack, but I can't think of anything that Slack does better (in my experience). In general, I think Discord has the best audio and video service that I've used; it kicks Zoom to the curb.
Discord is definitely in the same realm of scale as Slack, and probably bigger (they publish different metrics, so it's hard to say for sure).
The really impressive thing about Discord's scale is the size of their subscriber pools in the pub-sub model. Discord is slightly different from Slack in the sense that every User on a Server receives every message from every Channel; you don't opt in to Channels as in Slack, and you can't opt out (some channels can be restricted to only certain roles within the Server, but these are the minority of Channels).
Some of the largest Discord servers have over 1 million ONLINE users actively receiving messages; this is mostly the official servers for major games, like Fortnite, Minecraft, and League of Legends.
In other words, while the MAU/DAU counts may be within the same order of magnitude, Discord's DAUs are more concentrated in larger servers, and also tend to be members of more servers than the average Slack DAU. It's a far harder problem.
The chat rooms are oftentimes unusable, but most of these users only lurk. Nonetheless, think about that scale for a second; when a user sends a message, it is delivered (very quickly!) to a million people. That's insane. Then combine that with insanely good, low latency audio, and best-in-class stability; Discord is a very impressive product, possibly one of the most impressive, and does not get nearly enough credit for what they've accomplished.
For comparison; a "Team" in Microsoft Teams (roughly equivalent to a Discord Server or Slack Workspace) is still limited to 5,000 people.
I really agree Discord is amazing and wish I could use it for work instead of Slack.
I think the big things that prevent it from being adopted more for professional use are the lack of a threading model (even though I hate it when people use threads in Slack) and the everyone-in-every-channel model, with only role-based privacy settings as the escape hatch. The second one especially is a big deal, because you can't do things like team-only channels without a prohibitive amount of overhead.
That said (with zero knowledge of their architecture), I have to feel like both of those missing features aren't too terribly hard to build. It's very likely Discord is growing fast enough as a business in the gaming and community spaces that they don't feel the added overhead of expanding into enterprise (read: support, SLAs, SOC, etc.) makes sense, and they are waiting until they need a boost to play that card.
> I think the big things that prevent it from being adopted more for professional use is the lack of a threading model
They do have a threading model now (if you are talking about replying to a message in a channel and having your reply clearly show what you are responding to). If you are talking about 1-on-1 chats with other people in your same server, then yes, that is still lacking IMHO in Discord. The whole "you have to be friends" requirement to start a chat (or maybe that's just for an on-the-fly group) is annoying.
Discord gives every user an identity that is persistent beyond the server; you have a Discord account, not a server account. Slack does the opposite. Enterprises would hate Discord's model, as they prefer to control the entire identity of every user in their systems, such that when they leave the company they can destroy any notion of that identity ever existing.
Absolutely agree. I like the 1 main discord account but I wish I could have 1 "identity" per-server as well. I don't love that I am in some discords that I don't want tied to my real name and others where I've known these people for over a decade and would see in person multiple times a week (before the pandemic). I know you can set your name per-server but you can't hide your discord username (or make it per-server) which sucks.
Agreed completely. Discord has always been much smoother for me than Slack, and the voice/video chat quality is literally the best I've ever seen anywhere.
If they made their branding a bit more professional and changed the permission model from the (accurate) garbage you described to something closer to Slack then I think Slack would be doomed.
>I didn't realize that Discord has way more active users than Slack
Keep in mind you're comparing daily active users vs monthly active users. I'd guess most slack users are online weekday for pretty much the entire day (because it's for work and your boss expects you to be online), whereas a good chunk of discord users are only logging in a few hours a week when they're gaming.
* Minecraft official server: 190k online users
* Fortnite official server: 180k online users
* Valorant official server: 170k online users
* Jet's Dream World (community): 130k online users
* CallMeCarson server (YouTuber): 100k online users
* Call of Duty official server: 90k online users
* Rust (the game) official Discord: 80k online users
* League of Legends official server: 60k online users
* Among Us official server: 50k online users
Their scale is insane. Even with their usage spiking during after-hours gaming in major countries, their baseline usage at every hour of the day, globally, makes it one of the most used web services ever created.
Slack's DAU and MAU numbers are probably pretty close to one another. Discord's MAU/DAU ratio is probably bigger than Slack's. That just means that Discord is, again, solving a harder problem; they have much bigger (and more unpredictable) spikes in usage than Slack. Yet it's a far more stable and pleasant product.
Well for the real time side, I can't tell you how big a boon it's been to build our platform on top of Elixir/BEAM. Hands down the best runtime / VM for the job - and a big big secret to our success. Where we couldn't get BEAM fast enough - we lean on rust and embed it into the VM via NIFs.
2021 is the year of rust - with the async ecosystem continuing to mature (tokio 1.0 release) we will be investing heavily in moving a lot of our workloads from Python to Rust - and using Rust in more places, for example, as backend data services that sit in front of our DBs. We have already piloted this last year for our messages data store and have implemented such things as concurrency throttles and query coalescing to keep the upstream data layer stable. It has helped tremendously but we still have a lot of work to do!
To help scale those super large servers, in 2020 we invested heavily in making sure our distributed system can handle the load.
Did you know that all those mega servers you listed run within our distribution on the same hardware and clusters as every other Discord server, with no special tenancy within our distribution? The largest servers are scheduled amongst the smallest servers and don't get any special treatment. As a server grows, it of course is able to consume a larger share of resources within our distribution, and it automatically transitions to a mode built for large servers (we call this "relays" internally).
At any hour, over a hundred million BEAM processes are concurrently scheduled within our distributed system, each with specific jobs within their respective clusters. A process may run your presence, websocket connection, session on Discord, voice chat server, Go Live stream, your 1:1/group DM call, etc. We schedule/reschedule/terminate processes at a rate of a few hundred thousand per minute.
We are able to scale by adding more nodes to each cluster, and processes are live-migrated to the new nodes. This is an operation we perform regularly, and it is actually how we deploy updates to our real-time system.
I was responsible for building and architecting much of these systems. It's been super cool to work on - and - it's cool to see people acknowledge the scale we now run at! Thank you!! It's been a wild ride haha.
As for scale, our last public number perhaps comparable to Slack is ~650 billion messages sent in 2020, and a few trillion minutes of voice/video chat activity. However, given the crazy growth that happened last year due to COVID, daily message send volumes are now well over the 2 billion/day average.
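Not Discord's actual code, obviously, but the "query coalescing" idea mentioned above is easy to sketch: concurrent requests for the same hot key share a single in-flight backend call. A toy Python/asyncio version, purely illustrative:

    import asyncio

    class Coalescer:
        """Single-flight cache: concurrent callers for one key share one fetch."""
        def __init__(self, fetch):
            self._fetch = fetch
            self._inflight = {}

        async def get(self, key):
            task = self._inflight.get(key)
            if task is None:
                task = asyncio.create_task(self._fetch(key))
                self._inflight[key] = task
                task.add_done_callback(lambda _t: self._inflight.pop(key, None))
            return await task

    call_count = {"n": 0}

    async def fetch_from_db(key):
        call_count["n"] += 1
        await asyncio.sleep(0.1)              # stand-in for an expensive query
        return f"row-for-{key}"

    async def main():
        coalescer = Coalescer(fetch_from_db)
        rows = await asyncio.gather(*(coalescer.get("hot-key") for _ in range(1000)))
        print(f"{len(rows)} callers, {call_count['n']} backend call(s)")   # 1000 callers, 1 call

    asyncio.run(main())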
Just anecdotal, but as someone who has used Teams continuously for 1.5 years, I can say that it has never been down for me.
That being said, individual instances of the app are notoriously unstable causing random annoyances. But, I am on a very early build of Teams, which is buggy by definition.
Slack and the others have different contractual guarantees and different regulatory environments. Comparing them is not really fair because the reality is that these other services probably just lose tons of messages and slack/teams can't do that! They have to have better guarantees.
That's kind of the definition of a service being up. :) I've experienced numerous "soft" outages which result in messages not sending and getting lost - and even more double sends, sometimes very distant from where the message was originally sent.
It isn't just # of users, though - SlackOps is probably unique to Slack in that list (minus Teams, I guess) - so # of messages per month is a better metric. Not that I'm letting Slack off the hook, it still may be that their codebase and/or dev process is just nasty.
I'm the opposite. Back in my early teens, friends and I would attempt to hijack opposing groups' channels via takeovers during net-splits (and of course having the same done to us). What a time to be alive.
In the early battle.net days competing clans would split and steal channels. It was tons of fun, and it taught me lots about bots, proxies, and simple scripting in the process too.
I do miss them, terribly. Lightweight, fast, brutally simple. Even with splits, it was better, and ever since IRC bouncers like ZNC have existed, it has been rock solid.
I'm sure you know this already, but that status page isn't worth the cycles on your CPU; you would be better served asking the toaster whether AWS is functioning properly than checking that status page.
Our prod systems seem to be working, but our lower environments don't seem to be. I don't know enough about where these things come from. I wonder if the real problem is regional. Some connections work and some don't.
I never knew this, but I think it makes sense. Is there any documentation that explains why this is the case? I suspect it is to distribute bias to the first option, but I'd love to read about it.
I'm still dreaming of a world where everyone uses IRC through an interface identical to Slack or Discord or whatever, and features like these are implemented.
I agree in principle, but IRC is a poor way to do this. I love IRC for its simplicity, but that makes it hard to do more advanced features. It's a text-only protocol (other than DCC), so if you want to do something like allow users to click phone numbers to dial them, you have to regex it and hope for the best. Any kind of link is the same way. If you want to show images inline, you'll have to search for links, then either do another regex to see if the link is an image or prefetch the page to see if it's an image. Most servers still implement user authentication as a secondary service (i.e. it isn't part of the IRC server itself) afaik. I think the newer IRC specs include those, but support for them is missing in many servers.
Really a huge part of IRC's difficulty and beauty is in not having a markup language, but most of that beauty is for the eyes of the developer, not the user.
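For a sense of what "regex it and hope for the best" looks like in practice, here's a toy Python version of the kind of heuristics a fancier IRC client ends up carrying around (the patterns are deliberately naive):

    import re

    IMAGE_RE = re.compile(r"https?://\S+\.(?:png|jpe?g|gif|webp)(?:\?\S*)?", re.IGNORECASE)
    PHONE_RE = re.compile(r"\+?\d[\d\s().-]{6,}\d")

    def decorate(line):
        """Best-effort guess at inline-able images and clickable numbers in a plain IRC line."""
        return {
            "images": IMAGE_RE.findall(line),
            "phones": PHONE_RE.findall(line),
        }

    print(decorate("see https://example.com/cat.png or ring +1 (555) 123-4567"))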
I like the concept of Matrix. That's kind of what they're trying to do by creating an open protocol, but when I looked at implementing a client it was non-trivial. For IRC, you can usually send someone a telnet log of you joining an IRC server and they could implement a client. I don't get the impression that that's true for Matrix.
https://news.ycombinator.com/item?id=20948530 is my attempt to demonstrate that implementing a Matrix client is almost as trivial as telnetting to port 6667 on an IRC server, fwiw :)
You might like IRCCloud; it's a web client (similar to Slack) and bouncer with support for image uploads; it has a decent app, preserves history, and I think it supports search too.
Not really a fan of the Slack or Discord user interface myself, but there are modern looking web clients for IRC such as thelounge[0] or kiwiirc[1] that might be what you are after.
Several IRC servers do have support for authentication and access control (and audit trails as well I suppose).
Only centralized history/logging and search would need to be bolted on if needed.
In the non-centralized case your IRC client takes care of all of that.
For business users, there are regulatory requirements. You need to keep information around for some period of time, but not forever. History and searching is useful for spreading tribal knowledge throughout an organization.
Does that actually extend to Slack/slack-like things though?
Since I would see Slack more of a replacement for phone calls or hallway discussions.
Neither of which typically has any logs or recordings (and I wouldn't want to work somewhere that did keep such logs).
In what areas would you find such requirements? And shouldn't the default position be that it is illegal to keep those logs? Especially those involving direct messages between employees.
Our company uses Cliq. I wouldn't say that it's as good as Slack, but it's probably 80-90%, and even has a few unique features (integration into Zoho's suite, remote work checkin, integrated bot development environment, etc)
I find it amazing that we can be about an hour and a half into a service being completely unusable (i.e. Slack telling me it 'cannot connect'), yet it's still marked as an 'incident' instead of an 'outage' on their own status page.
Every time this kind of thing happens, HNers love to gripe about how the status pages aren't correct yet. It's so weird -- like the people freaking out about the outage are going to be updating their uptime trackers right now or something. Who cares? It'll be fixed later.
I think the point is that a "Status Page" should show the accurate, current status of the system. Not a place holder for "we'll fix it later". People look at a status page to know what's happening now.
This is entirely in line with my experience dealing with outages. 85% of the time to fix consists of fielding requests for status updates.
It's like when people push the elevator button repeatedly if it's taking a while to arrive, only pushing the elevator button doesn't cause it to take even longer.
It doesn't. The status page is currently showing information about the outage. And the 100% uptime number is probably still correct, since it's only been out for a couple of hours.
> And the 100% uptime number is probably still correct, since it's only been out for a couple of hours.
It's listed as "Uptime for the current quarter"; if they mean that as "calendar quarter", i.e. since the start of the year, then we aren't even 100 hours into the quarter so we should be well below 100% by now.
You might be correct, but why would anyone care about quarter-to-date as opposed to a rolling quarter ending now? The latter would mean that an outage of X duration will always reduce this statistic by the same amount regardless of how close the nearest calendar quarter boundary is, which seems like a superior quality for such a statistic to have.
That would be a completely fair metric to publish, but it doesn't look like what Slack is publishing. Of course, it's possible that it is and it's just phrased somewhat poorly.
Interestingly their uptime for the quarter is still 100% despite a full-red dashboard. I wonder if that's something that is calculated only after an outage is resolved
Building out the infrastructure to automatically give real-time updates to your uptime figure sounds like a terrible use of company resources. Who knows how many person hours to spend on implementing and maintaining a feature that would remove maybe a few minutes of manual work from the incident post-mortem checklist, just for the sake of delighting people who need something else to look at for a workplace distraction now that Slack is down.
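For what it's worth, neither version of the number is hard to produce; a throwaway calculation (the two-hour outage figure is purely illustrative):

    from datetime import datetime, timedelta

    now = datetime(2021, 1, 4, 14, 0)        # afternoon of the outage
    quarter_start = datetime(2021, 1, 1)
    outage = timedelta(hours=2)              # illustrative, not a real figure

    qtd = 100 * (1 - outage / (now - quarter_start))       # quarter-to-date
    rolling = 100 * (1 - outage / timedelta(days=90))      # rolling quarter

    print(f"quarter-to-date: {qtd:.2f}%")      # ~97.7%: small denominator, big hit
    print(f"rolling 90 days: {rolling:.2f}%")  # ~99.91%: same outage, barely visible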
Do you happen to know of any desktop clients that support encryption/cross-signing?
I'd like to get off of Element desktop/web for a couple of reasons, but I need those features. I'd help implement them myself, but that's beyond my skill level.
Edit: For anyone else wondering, matrix-commander [0] looks like it may be workable if a cli tool is acceptable for your usecase.
I'm a Carl. I'm also looking for a coworker who was trying to contact me. If it's about last saturday, I promise nothing really happened between me and her, but I'm sure she already told you.
I have tried to sell my organisation on a shared Google Chat doc for 90s style realtime ICQ chat in times like these, but there has been little uptake.
G Suite actually has an entire Slack clone, chat.google.com. I've been on G Suite (now annoyingly renamed to Google Workspace) for years and actually just recently found it existed from another comment on HN.
Yeah, this is what we actually use as a fallback, and I did push for this as a full-time alternative given we'd get it for free, but people dislike it for all sorts of frivolous reasons.
I say this every time Slack is down, but they just seem so shady to me. Nobody can connect right now, and their status site says "100% uptime in the last quarter". Maybe it's close to 100%, but it ain't 100%.
I think we should push for a metric where "up" means 100% of people that want to use the service are able to use the service. If 1% of users can't send messages, then that should count as a full-blown outage and should start counting against whatever SLA they advertise.
The underlying problem here is that apparently everyone lies about uptime, so if you don't, that looks bad to potential customers. I fear that we will have to push for some legal regulation if we want accurate data, and ... people will probably be opposed to that.
Seems silly to worry about quarterly stats several hours into an outage. The most obvious explanation is quarterly stats aren't generated in real-time -- which isn't "shady" to me.
> I think we should push for a metric where "up" means 100% of people that want to use the service are able to use the service.
I mean, that’s nice to say, but how do you measure/prove it?
Certainly, having the SLAed party check themselves is silly. But what are the other options? If it was up to the customer, customers could make up faults to get free service. (Since it’d be up to the customer to prove, and customers are generally less technical than vendors, you’d have to expect/accept very non-technical — and thus non-evidentiary! — forms of “proof”, e.g. “I dunno, we weren’t able to reach it today.” Things that could have just as well been their own ISP, or even operator error on their side.)
IMHO, contractual SLAs should be based on the checks of some agreed-upon neutral-third-party auditor (e.g. any of the many status/uptime monitoring services.) If the third party says the service is up, it’s up in SLA terms; if the third party says the service is down, it’s down in SLA terms.
(And, of course, if the third party themselves go down, or experience connectivity issues that cause them to see false correlated failures among many services, that should be explicitly written into the SLA as a condition where the customer isn’t going to get a remedial award against the SLA, even if the SLAed service does go down during that time. If the Internet backbone falls over, that’s the equivalent of what insurance providers call an “act of God.”)
But in a neutral-third-party observer setup, you aren’t going to get 100% coverage for customer-seen problems. An uptime service isn’t going to see the service the way every single customer does. Only the way one particular customer would. So it’s not going to notice these spurious some-customers-see-it-some-don’t faults.
So, again: what kind of input would feed this hypothetical “100% of customers are being served successfully” metric?
ETA: maybe you could get closer to this ideal by ensuring that the monitoring service 1. is effectively running a full integration test suite, not just hitting trivial APIs; and 2. if gradual-rollout experiments ala “hash the user’s ID to land them in an experiment hash-ring position, and assign feature flags to sections of the hash ring” are in use by the SLAed service, then the monitoring service should be given N different “probe users” that together cover the complete hash-ring of possible generated-feature-flag combinations. Or given special keys that get randomly assigned a different combination of feature-flags every time they’re used.
The idea is to define availability as "the probability that the site 'appeared' to be down for a random user, averaged over a time window of size w". You can choose a particular value of w and look at trends over time, or you can plot availability as a function of w to understand patterns of downtime.
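One naive way to turn that definition into code (assuming you already have per-user probe or request-log results; window handling kept deliberately crude, and "appeared up" here means at least one success per window):

    from collections import defaultdict

    # (time_bucket, user, request_succeeded) rows from logs or synthetic probes.
    observations = [
        (0, "alice", True), (0, "bob", False),
        (1, "alice", True), (1, "bob", True),
    ]

    def availability(observations, window):
        """Fraction of (window, user) pairs in which the user saw at least one success."""
        saw_up = defaultdict(bool)
        for t, user, ok in observations:
            key = (t // window, user)
            saw_up[key] = saw_up[key] or ok
        return sum(saw_up.values()) / len(saw_up)

    print(availability(observations, window=1))   # 0.75: bob saw it "down" in one window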
They should at least update the status site to reflect issues currently happening.
I was wondering why a link from a Jira ticket wasn't opening in Slack; the page eventually timed out and gave me a link to status.slack.com, where it told me everything was peachy. Cue me wasting time trying it again, because apparently there was no issue with Slack...
Some companies do this, though probably not publishing data. Any customer downtime is treated the same - for one, for many, for all (in theory, ha ha). But they take it pretty seriously.
You'll just end up with no SLA, or paying a hefty amount to use services, because that's an impossible standard to support for any service of this size.
Isn't this the problem? Companies like Slack set SLAs that they only meet by lying about their uptime. It's as good as having no SLA, except you're likely paying a premium based on the SLA they set.
I'm not demanding 100% uptime, I'm asking that they say "99.94% uptime" when there has been an outage.
Honestly, I could live with a 99.50% SLA, if that's what it really was. After today's probably full-day outage, they'd just have to be extra careful for the rest of the year (or pay me money). Kind of sucks when it's January 4th that you blow your year's SLA budget, though.
If you're asking genuinely then I can tell you my experience when I was part of a SaaS shop, though the times have changed a lot and "my metric is not necessarily your metric".
But it was roughly "one large impact a month, for six months", with large caveats that upper management for whatever company had to be working with the product during that month.
Large companies don't care if X service went out during the night and impacted someone not in their timezone.
If the CTO notices that he can't use something with the same regularity that he gets paid, then it doesn't take long for it to stick in their mind. But migrating everything is _so painful_ that the majority of large companies will do anything they can to avoid moving away.
This is a key point in the popularity amongst VCs of investing in B2B SaaS. I take their (and your) word for it. But honestly, I don't actually understand this.
Medium sized team on Slack. We'd need to move ~60 full time in-house employees, ~10 remote contractors who aren't on other comms channels, ~20 infrequent freelance contributors who may not check messages often, ~5 custom bots and apps, and ~15 3rd party integrations (of which some won't support any given choice of alternative).
This is not to mention the fact that half our staff aren't hugely technical, so they have actively _learnt_ how to use Slack and its features around notification control (things that may come "naturally" to the tech-savvy crowd on HN), @-things, bots, etc., and they would need to re-learn a new tool that is going to work in a different way.
This would be a substantial effort for us, and we're a small company. Are there ways to materially minimise this cost?
Training, integration with proprietary internal systems, sheer momentum in the employee base, justifying or even creating a metric to show cost savings of a migration effort, business processes that rely on a specific feature of existing infrastructure needing to be met, the uncertainty of new vs the certain and known instability of something you have....
If you had a small shop with a dozen tech-savvy people and Slack became a problem which was used exclusively for quick business chats, you could probably push a change to another chat platform the next day. You might struggle when you have thousands of employees, some that needed training to use Slack and still aren't that proficient.
Getting workflows re-established, any integrations you had developed or otherwise come to depend on may not work, you will probably lose history, etc.
Plus, it will just take a long time to get everyone on board and using the replacement system. My department is slowly plodding towards using Teams over Slack, but there are enough hold-outs (my sub-department being one of them) that it still doesn't have wide-spread adoption.
Many reasons, almost none of them technical. Off the top of my head, a few:
* Getting out of the Enterprise Contract, or waiting for the year to end.
* Training people on new software.
* Loss of productivity. (1) Learning a new UI, processes, workflows -- both individually and organizationally. A feature or concept in "Tool A" may exist in a completely different form in "Tool B". Or not exist, and then people need to adapt to and work around the missing feature. (2) Missing out on needed information due to the above. Ultimately, software exists to move and transform data, and when you change the software people have to adjust. Sometimes that doesn't go great. "Oh, I didn't realize I needed to check this checkbox".
Another way to say this is "organizational inertia", which is a fancy term that means "it's hard for people to adjust to change".
And you might think developers and other technical people would have an easier time of it. They (we) do, but not to the extent you may expect. I've been on the front lines of a handful of migrations that affected only the IT staff, and it was a long and arduous process each time.
Man it bothers me so much when applications change their UIs on updates for no apparent reason other than "it looks better".
IntelliJ changed the way the build and debug buttons looked in some update, and it took me days to get used to it before I could find them in a snap again. Slack did a couple of no-reason changes as well.
There are plenty of UX reasons (learning new interfaces, etc). The burden here is generally distributed and diffuse.
The really big one, for companies of a certain size / cash flow, is compliance. Companies spend a lot of time developing compliant work flows around a service like Slack.
Migrating to another service requires rewriting the compliance narrative. The current compliance people might not have the confidence or willpower to do that effectively, and can raise legal objections to any such migration indefinitely.
IRC is easy to migrate from since there is nothing to migrate other than chat history. IRC is also missing many features that Slack provides out of the box. And a law like that would not work, since you would need to write complicated transformation scripts to translate between services, and not all services have a 1-1 mapping. I like IRC, but it has its limitations. That is why Slack succeeded where IRC did not.
I have less of an excuse not to be more personally productive, but I can't help anyone else (easily) if my primary method of communication is down. Not only because it's harder to contact you, but also because it's impossible for you to just ask in a channel and have me notice you.
There's also this perverse incentive to Slack all the things. Lots of CI notifications are sent through it. Some org processes are implemented as workflows. There's been talk of how wonderful it would be to hook up tasking and work tracking to slash commands. I and others often use Slack instead of the 'official' tool to video call each other.
An outage like this is still really disruptive. It's not like everyone realizes what's going on immediately or at the same time; we have backup tools, but our turn radius is pretty wide. Some of us can't even communicate effectively without memes, too, and backup tools don't have a giphy integration.
EDIT: Do your CI integrations fail if Slack can't be contacted? Do those failures fail your pipeline? Whoops!
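If your pipeline does post to Slack, the notification step at least ought to be best-effort; a minimal sketch (the env var name is just a convention I'm assuming here):

    import os
    import requests

    def notify_chat(text):
        """Best-effort build notification; never fail the pipeline because chat is down."""
        url = os.environ.get("SLACK_WEBHOOK_URL")   # assumed convention
        if not url:
            return
        try:
            requests.post(url, json={"text": text}, timeout=5)
        except requests.RequestException as exc:
            print("chat notification skipped:", exc)   # log and move on

    notify_chat("build #1234 passed")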
Particularly on a Monday morning after a holiday, there are tasks that I know I need to be working on but cannot, because relevant details were never transposed from Slack to our actual work-scheduling tools like Google Docs, Jira, etc., and I cannot access Slack history.
If something went awry, and it caused more pain because Slack was down, how would you feel?
If you’re missing comms/observability then waiting to deploy seems prudent.
Not all - many workflows these days rely on Slack or its ilk. Benderbot, Jira/etc. connectors, calendar connectors, remote communication/standups, alerting…
If you use slack primarily as a water cooler then yes.
However, I drive everything through Slack: GitHub, Linear, calendars, Notion, support emails, etc. I have notifications turned off for every service we use except for Slack. This allows me to effectively ignore everything except Slack. These types of outages destroy that workflow for me.
Absolutely! Before the holiday shutdown, I Slacked myself a huge reminder list of things to jump on as soon as we started up again, so that I could hit the ground running in the new year. Oh, wait....
It is easier to cache stuff for users who are not logged in, as it is the same for everyone. And everyone is looking at Hacker News at the moment to see what is wrong with Slack, which is probably the cause of the slowness.
The point count for most articles is consistently lower on a view of the non-logged-in homepage. I assume that means they are cached more aggressively for non-logged-in. There's also the username and karma count in the top-right.
It has made coming back from a long Christmas vacation a lot easier. Once I got my emails taken care of, I was able to get to work without distractions. It's been nice.
Just a note, if your company uses G Suite, chat.google.com exists and is basically an entire Slack clone. We use it as a backup when Slack goes down (obviously doesn't help for bots and ChatOps we've set up, but works well for realtime work chat).
This is an excellent reminder of the danger of being locked into closed systems.
I wonder how many companies (like mine) have literally ground to a halt because of this? Do other companies have a risk-documented backup plan B for times like this? Presumably the default is for everyone to resort to email?
More worryingly is the number of ChatOps processes and alerting/observability systems that are in place around Slack.
Not being able to chat with co-workers for an hour or two is fine, but not being able to safely manage CI/CD/deployments is a big risk.
When application engineers say stuff like this, they're also implying that there's a giant infra/ops team who will be willing and able to do all the work for them. Nobody actually wants to be responsible for this stuff.
Not at all, I think closed private systems are far better (better products, support, service) but when an entire company runs its operations on a single system like Slack, there is a big risk when it goes away and you need contingency.
I’d still rather be on Slack and suffer a day of lost productivity than force people to use only email or IRC.
I agree 100%! Though I think it might be dangerous to prepare only for the "last disaster". It'll be some other system breaking next time, so instead we should identify which systems don't have any kind of redundancy and work out the blast radius if they crash.
I'm good at not panicking about things I can't change, but I worry about some of my colleagues who find it difficult not to have control in these situations.
I can't do anything to help them at the moment, so for now I'm heading to my couch with my analogue book :)
A lot of organizations essentially took the last two weeks off from work, which is long enough for a 10-day autoscale window to spin down servers, and then this morning they got confronted by a load spike that nothing was pre-warmed for.
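If that theory is right, one mitigation is to pre-warm capacity ahead of a predictable spike. A minimal sketch with boto3, assuming an EC2 Auto Scaling group; the group name, sizes, and timestamp below are made up for illustration:

    # Hypothetical pre-warm: raise the scaling floor before the first post-holiday
    # Monday so autoscaling isn't chasing the 9am spike from a wound-down baseline.
    import datetime
    import boto3

    autoscaling = boto3.client("autoscaling", region_name="us-east-1")

    autoscaling.put_scheduled_update_group_action(
        AutoScalingGroupName="web-frontend",        # hypothetical ASG name
        ScheduledActionName="post-holiday-prewarm",
        StartTime=datetime.datetime(2021, 1, 4, 12, 0,
                                    tzinfo=datetime.timezone.utc),
        MinSize=40,            # temporary floor ahead of the expected load
        DesiredCapacity=40,
    )
    # A second scheduled action later in the day can drop MinSize back down
    # once the normal scaling policies have caught up with real traffic.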
What does this mean? What do cloud providers do when customers scale down their services? Do the providers literally power down servers? Do they sell the capacity to new customers?
Relax. GP is clearly referring to an increase in people signing on due to the holidays ending and everyone coming back to work.
Also, Slack has significantly more users in the US than in any other country[1], and it really isn't even close. So the offense you're taking is unwarranted anyway.
Slack makes ~61% of its revenue from US customers, who sit in only 4 time zones, while the remainder of its revenue is spread across ~20 time zones. It's not an unreasonable hypothesis.
- Revenue is not the same as users. Slack has tons of free users, and some countries also have lower-priced plans.
- Many companies like Amazon are probably counted as US revenue for Slack even though more than 30% of their employees are outside the US. Not huge numbers, but significant.
Using our team's backup chatroom in a competing service. One of these days P2P Matrix will reach GA; then I plan to make a backup for my backups, Starfleet style.
GILORA: Starfleet code requires a second backup?
O'BRIEN: In case the first backup fails.
GILORA: What are the chances that both a primary system and its backup would fail at the same time?
O'BRIEN: It's very unlikely, but in a crunch I wouldn't like to be caught without a second backup.
Makes perfect sense for O'Brien; DS9 had serious backup issues in its first couple of years:
The Forsaken (season 1 episode 17)
LOJAL: I've been reading the reports of your Chief of Operations, Doctor. They gave me the impression that he was a competent engineer.
BASHIR: Chief O'Brien? One of the best in Starfleet.
LOJAL: Then why aren't the backup systems functioning?
BASHIR: Well, you know, out here on the edge of the frontier, it's one adventure after another. Why don't I escort you back to your quarters where I'm sure we can all wait this out.
Rivals (season 2 episode 11)
KIRA: My terminal just self-destructed.
DAX: What?
KIRA: I lost an evaluation report I've been working on for weeks.
DAX: Even the backups?
KIRA: Even the backups.
So there's a reason O'Brien wants a backup to the backup by Destiny (season 3, episode 15).
> Customers may have trouble connecting or using Slack
I can't stand how marketing speak pervades every sphere of the world. Their entire system is offline (inconvenient, certainly, but it happens) and they can't bring themselves to say "Slack is down. We're working on it and will be back ASAP," or something similar. Instead, we "may have trouble."
The funniest part to me is that their status page still says "Uptime for the current quarter: 100%". These uptime numbers are such BS. Heroku reports six 9s of uptime for this month, even though their own status page shows multiple days with incidents lasting >6 hours.
How do you know it's down completely? Maybe it's down for you and maybe even down for a majority but still up for some subset. Happens with many products.
It's not entirely offline though. I was connected via my phone ~90 minutes ago when I first got online today and never had any issues and was able to tell folks at work my PC connectivity may be spotty for a while. When I signed in via my Mac laptop I wasn't able to connect for about 20 minutes, and was redirected to the status page. I've been online for about an hour now.
Why do you consider that to be "marketing speak?" It appears to be concise, direct, and accurate. The phrase "Slack is down," even if true by some interpretations (it hasn't been "completely down" from what I have seen), is imprecise and informal.
There's a wide gulf between "some customers may have trouble using Slack" and "most/all customers are completely unable to use Slack". Putting aside formality, I'd say "Slack is down" is in fact more accurate here (assuming that it is true that most users can't use it, which is true for our company at least).
But 1) it has apparently not been the case that the service was "absolutely inaccessible" and 2) "Slack is down" is still very imprecise and not a great alternative even if the service had been "absolutely inaccessible."
As someone in marketing, it's a little bit of this, and a little bit of determining what the most default, catch-all statement could be well ahead of time to make "crisis comms" that much smoother.
I find it hilarious that the status page is still saying the uptime for the current quarter is 100%. I'd think it'd have lost at least one 9 by any obvious definition of "current quarter".
I'm still logged in on mobile and can communicate with people from my team, but cannot log in from desktop. With so few people able to connect, it's also unclear whether Slack is eating my messages or there's just no one to respond. So I'd certainly rank that as "trouble using slack" rather than "the system is completely down".
I agree with you in principle, but I have had no problem connecting to Slack today (I have a free one I use with friends, not a business account), so to say they are down would also be inaccurate.
This is probably for legal reasons, i.e. Service Level Agreements. "May" leaves the door open to other interpretations and reporting from other systems.
For the record, I am logged in and have exchanged messages with at least one other person. The rest of my team does seem to be unable to get in though. Maybe it's because I have just had the Slack tab left open in my browser since before I left for Christmas?
Chat infrastructure at this level of scale is not easy to build and maintain, I appreciate all the hard work that the engineers at Slack are putting in to resolve this.
My business coworkers are freaking out over Slack being down. But all my technical coworkers are nonplussed. It's interesting how those of us with a technical background are not too disturbed by things breaking.
I've never lived in the mid-west or the New England region of the USA. Maybe it's a regional usage (I've lived in Florida, Texas, California, Colorado, Utah, Oregon, and Washington). I'm not sure where I picked up my usage from. My dad is from Colorado and my mom from California. Maybe I picked it up from one of them ;-)
I'm "plussed," because an app that I manage uses slackclient, and some people depend on it to get paid. Obviously it's my fault for not handling the error, and I hotfixed it, but still, wah.
I'd be a little nervous if I'd recently bought Slack for $20B.
It's not like there aren't alternatives. You could even imagine someone has a live bridge between Mattermost and their Slack team, making the switchover seamless.
Why be nervous? Outages happen. If this were a string of major issues over a few weeks or months, that might be cause for concern, but a single incident is not.
Notion is sluggish as well. That, combined with reports of HN being slow, makes me wonder: is there some larger network issue at play affecting a whole region of servers?
My feeling is some common infrastructure is failing or flailing, like some part of AWS, or some backbone provider. Too many flaky things going on at the same time to be independent failures.
My company monitors EC2 performance and availability across North America, and EC2 has been fine this morning, according to our data (that said, they had some intermittent issues the last 3 days).
Maybe another internet routing issue, where a bunch of traffic is going through some guy's router in Albania. Or even someone actively interfering with a root server.
Does Google use Slack? Wanted to start my year with some extra-strength tinfoil, and it would be just great if, on the day a unionizing initiative started, the main way workers could talk about said initiative went down.
EDIT: according to a random quora post they do, so keep the tinfoil out!
People actually argue this? Slack is a great comms tool, and a great BACKUP if you can't find something in a real documentation/knowledge/etc. repository.
Where I work, yes. Small, scrappy devs who, when asked to move knowledge into Confluence or a wiki page, argue that it's too much work to find everything they need. They can just search the channel they want with a term and pull up the conversation they need.
My response is if it’s important long term, it needs to be somewhere visible and exportable should the platform change. As it is now, Slack exports are horrible and large.
Unfortunately, Down Detector doesn't actually monitor these services, so we don't know if they are truly down. Down detector relies on human behavior, and we all know humans don't act rationally.
Which is great for detecting common issues across many companies. For example, clicking on the cards shows that many of them are related to "network connection".
Funny how status.slack.com has reported Incidents and Outages for a while now, but still the "Uptime for the current quarter" is reported at 100% on the bottom right of the status table.
(And if you're saying that according to the legal blah blah blah of the SLA that this isn't technically "down", then there might as well not be an SLA.)
> And if you're saying that according to the legal blah blah blah of the SLA that this isn't technically "down", then there might as well not be an SLA.
I am, because I've had these exact conversations with cloud-hosted providers/products. Never once have we been refunded according to the SLA in our contracts. It's never really "down" (according to legal).
It may depend on how they define the "quarter". If they take the quarter as the last 91 days and round the number to the nearest percent, you won't see it change unless outages exceed 91 x 24 x 0.5% = 10.92 hours. It's just a guess, though.
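A quick back-of-the-envelope check of that rounding guess, assuming a 91-day quarter:

    hours_in_quarter = 91 * 24              # 2184 hours
    print(hours_in_quarter * 0.005)         # 10.92 -- downtime hidden by rounding

    # A 3-hour outage in the same window:
    uptime = 1 - 3 / hours_in_quarter
    print(round(uptime * 100))              # 100 -- still "100%" after rounding
    print(f"{uptime * 100:.3f}%")           # 99.863%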
Could this be some sort of data corruption? I find it hard to believe that Slack could be down for this long without something that is exceedingly hard to roll back. Even if some services are completely overwhelmed with traffic, they could block a percentage of traffic to decrease load, bring servers up across their datacenters, and then unblock traffic. To me it has the hallmarks of some datastore being down, but obviously that's just a random guess.
When it went down fully and I had the Windows client open, it went to a page that basically said "Slack is down, we don't know why, try restarting and see if that fixes it. Here's the status page."
It would be nice if they could fix it so that a fresh start also goes to that page, at the very least.
How do you have the Slack app installed? I currently have it installed via the Windows/Microsoft Store, and I suspect that is a significant part of the problem.
> Customers may have trouble loading channels or connecting to Slack at this time. Our team is investigating and we will follow up with more information as soon as we have it. We apologize for any disruption caused.
- Jan 4, 10:14 AM EST
The status for messaging and connection services has been marked as [incident]
> We're continuing to investigate connection issues for customers, and have upgraded the incident on our side to reflect an outage in service. All hands are on deck on our end to further investigate. We'll be back in a half hour to keep you posted.
- Jan 4, 11:20 AM EST
> There are no changes to report as of yet. We're still all hands on deck and continuing to dig in on our side. We'll continue to share updates every 30 minutes until the incident has been downgraded
HN becomes slow because people notice a service is down, and go to HN to check for more info. When Google was down for an hour a couple weeks ago, HN became almost unusable.
This is actually impressive, in a bad way. I've just become so used to being able to run highly resilient, cross-region infrastructure for millions of users with just a handful of people that I forget what real downtime looks like.
For their app to just go completely offline is unacceptable. Bugs and degraded services I get. But this is catastrophic.
>I can't even begin to guess what went wrong. What are your guesses? How many screaming executives are there at Slack saying "just roll it back"?
Doubtful it's a code issue causing a total system outage. I'm assuming they have a bunch of auto scaling infrastructure that wound down over the holidays and couldn't take the spike this morning.
Assuming this is a bad deployment, not hardware/network issues: it will be interesting to read their post-mortem on why a rollback still hasn't happened after 2 hours of outage. You would hope that a service of Slack's scale and popularity would plan for deployment-related outages and be able to roll back a deployment.
And there we have it: relying on big companies sucks. It's great as long as it works, but once a system breaks, thousands or even millions of businesses suffer. (Of course they are also beneficial, and a private server can crash at any time too; I don't want to blame Slack, but we always have to keep this in mind.)
If a big company has a million customers and the big company experiences one outage per quarter, then a million businesses suffer every quarter.
If a thousand small companies have a thousand customers each, and each of these small companies experiences one outage per quarter, then a million businesses still suffer every quarter.
As the end-user-business, is it better to suffer the outage at the same time as other businesses? Is it worse?
Surely there are valid arguments against relying on big companies, but I don't think this is one of them.
> If a thousand small companies have a thousand customers each, and each of these small companies experiences one outage per quarter, then a million businesses still suffer every quarter.
Not all companies are created the same. Microsoft, Google and Facebook have had their outages, but IME much fewer than Slack.
If there are a thousand small companies, none of them have a network effect, and those that experience more outages per quarter will lose customers to those that have fewer. So they have much more incentive to improve.
Whereas network-effect beneficiaries like Facebook (and to a lesser extent, Google, Microsoft and Slack) have much less of an incentive to improve. Who else would the customers go to?
Just a note to say "thanks" to the Slack team for the uptime when Slack is not down, it's been incredibly useful as a tool to me when other enterprise systems (Teams, Outlook & co.) have been down over the last couple of years, and especially throughout 2020.
Somehow Slack is very resilient in general. I also appreciate its UX/UI being far superior to Teams.
Ultimately, the cloud is often a single point of failure that companies become over-dependent on. So I'd favour a free (as in freedom) and open-source, self-hosted/deployed alternative if there were one (even if it came from Slack and was paid).
I agree with most on here that there isn't such a thing yet - but it's well worth building! So those of you out there who are considering implementing "yet another text editor", maybe this is something to work on.
So many large scale downtimes across multiple large companies in the past month or so. Is this for a bugfix deployment for the SolarWinds hack, or downtime caused by the hack itself ? Or some state-sponsored orgs installing upgraded eavesdropping stuff ?
I've had good experiences with ngircd. It's an IRC server that is very easy to self-host, and it can be installed via APT on any debian/ubuntu/raspbian etc system, and I'm sure on many others.
https://cabal.chat/ is a good program. It doesn't support all of Slack's features, but it is truly peer-to-peer, so there's no central point of failure or server that can go down. (Well, I suppose if they released a buggy version of the software and you updated, that's a central source, but that's true of most software.)
Todoist was having issues and iOS app launching from Xcode started taking a lot of time in the middle of the day (which reminds me of the app online check fiasco not so long ago).
If anyone is looking for an alternative way for fast and seamless chat with colleagues, friends, or strangers, you're welcome to check out Sqwok (https://sqwok.im)
Although it's built as a live news discussion site versus a team messaging app, the topics can be about anything, are public, and inviting others is as simple as sharing the url of the post (mobile/desktop web).
I have stopped using Down Detector as an accurate measure because a lot of "outages" are just people having issues with a service unrelated to the service they are reporting as down. Ex: AT&T outage in Nashville caused people to report Xbox Live as down, when it wasn't actually down, etc.
I'm having issues reaching a lot of sites, especially American ones.
Downdetector, Hacker News, and others load extremely slowly or not at all. Downdetector had a bunch of failed resources for me.
Slack has been failing - hard - the past few months. Yeah, I get it, lots of remote workers - but Slack has had months now to prepare for an onslaught given the trends with COVID. Simply not acceptable.
> We're continuing to investigate connection issues for customers, and have upgraded the incident on our side to reflect an outage in service. All hands are on deck on our end to further investigate. We'll be back in a half hour to keep you posted.
> Jan 4, 5:20 PM GMT+1
I'm positive that they have internal monitoring, and probably knew about the issues well before they decided to manually update their status page to reflect the issue. Manually updating the status page does not equal no monitoring, after all.
For a product that is so simple, there are no good self-hosted alternatives. Mattermost and RocketChat are written very poorly; reliability is bad, and getting your data out is practically impossible.
Slack goes down so often we're thinking of writing a very boring clone that uses ActiveMQ and MySQL, just because chat should be boring and needs to "just work".
I was just considering setting up a Mattermost instance for our company since I used it for a year at a previous job without any issues (I was just a user though, I didn't deploy or maintain it). Just curious, why do you think it's poorly written or unreliable?
We tried running it, so we have a lot of experience with it, and it wasn't great. It barely stayed online.
For something so simple, you have to run a massive server, with gigs of RAM and multiple cores, even under a very modest user load. Take a look at the codebase; it's a mess, and it's nearly impossible to fix any bugs. Finally, if you want to get your data out or report on message activity, good luck; you'd be better off passing paper notes around. The open-source version is nerfed a bit too, with no LDAP authentication for instance, which creates a lot of problems as well.
Seems that it is now. It was originally just Messaging and Connections that had an "incident", so I wonder if something else happened or they manually changed the status to at least own that all their services went FUBAR.
They always have been, since they clearly don't fit the guidelines for what makes a good submission and usually leave little room for interesting discussion (unlike postmortems of past outages, which are often good).
Agreed. When a major service goes down, HN is the most accurate overview, and often a useful sanity check when it's an AWS- or Slack-sized org, before I open an incident with whichever party.
HN is where we all go when the Internet (or large portions of it) are down. It's more reliable than all the 'downforeveryoneorjustme' or 'downtime monitor' services.
I am so glad that at least today I do not hear that annoying Slack sound.
I really do think Slack is not helping me, at all, to concentrate on my job (system administrator): synchronous messages are the worst thing to deal with while working; email is much, much better.
Honestly I'm fairly sure the vast majority of "technology" we've deployed, as an industry, in the past 10-15 years has actively made life worse. I don't know about anyone else, but that's the opposite of why I got into technology.
> which effectively IS email for many, many people
Doesn't have to be, though. With email, you don't even have to tie your address to a single provider, and reading previously received messages doesn't even require internet connectivity.
I'm able to connect to Slack at the moment. My company doesn't use it, but a hobby group I belong to uses it for discussion forums and their instance is up and functional. So it isn't down completely as I write this.
Nothing like a reminder of how dependent you've become on Slack for communication (and archival of conversations) like an outage on the Monday after the holidays when you're not on your A-Game yourself.
"Let's see, I'll look up so and so's name with Sla.... shoot"
"Okay, I'll just find that thing I .... nevermind"
> We’re still investigating the ongoing connectivity issues with Slack. There's no additional information to share just yet, but we’ll follow up in 30 minutes. Thanks for bearing with us.
I think this is due to AWS. Slack isn't the only thing down (Notion, for example). The AWS status page doesn't show anything yet, but it wouldn't be the first time; during the last Kinesis crisis, nothing showed up for hours.
Can't handle the post-holiday surge, or did people want to justify their long holiday and push something, only to watch their holiday optimism head-crash onto the surface of reality?
We have a Discord server as a backup for when Teams is down. Teams seems to have gotten worse lately, with entire days of downtime, and we have to resort to Discord voice, which always seems to be up.
If you're using G Suite already, it's a usable failover. I already send all my alert notifications there as a fallback. Dragging people in was trivial, and it's better than the group SMS that one person tried to use.
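A minimal sketch of that fallback wiring, assuming incoming-webhook URLs for both services (both Slack and Google Chat webhooks accept a simple {"text": ...} JSON payload); the URLs below are placeholders, not real endpoints:

    # Hypothetical dual-channel alerting: Slack first, Google Chat as fallback.
    import json
    import urllib.request

    SLACK_WEBHOOK = "https://hooks.slack.com/services/..."                    # placeholder
    GCHAT_WEBHOOK = "https://chat.googleapis.com/v1/spaces/.../messages?..."  # placeholder

    def _post(url: str, text: str) -> None:
        body = json.dumps({"text": text}).encode("utf-8")
        req = urllib.request.Request(
            url, data=body, headers={"Content-Type": "application/json"})
        urllib.request.urlopen(req, timeout=5)

    def alert(text: str) -> None:
        try:
            _post(SLACK_WEBHOOK, text)
        except Exception:
            _post(GCHAT_WEBHOOK, text)  # fall back when Slack is unreachable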
Subjective... I've found Slack and co's interspersed conversations far too chaotic, and temporal; threading is a great way of organising many different concurrent topics.
And to be clear, I don't mean Slack's implementation of threads, which hides them away in a separate panel and which doesn't get used by everyone either.
This is my SignalR alternative with end-to-end encryption. Choose a password, and the file and message will be encrypted client-side using that password.
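Conceptually that's password-based client-side encryption: derive a key from the password and encrypt with an AEAD cipher, so the server only ever sees ciphertext. A generic sketch of that scheme (not the linked project's actual code), using the Python cryptography package:

    # Generic password-based encryption sketch; parameters are illustrative.
    import os
    from cryptography.hazmat.primitives.hashes import SHA256
    from cryptography.hazmat.primitives.kdf.pbkdf2 import PBKDF2HMAC
    from cryptography.hazmat.primitives.ciphers.aead import AESGCM

    def encrypt(password: str, plaintext: bytes) -> bytes:
        salt, nonce = os.urandom(16), os.urandom(12)
        key = PBKDF2HMAC(algorithm=SHA256(), length=32, salt=salt,
                         iterations=600_000).derive(password.encode())
        # Prepend salt and nonce so the recipient can re-derive the key.
        return salt + nonce + AESGCM(key).encrypt(nonce, plaintext, None)

    def decrypt(password: str, blob: bytes) -> bytes:
        salt, nonce, ciphertext = blob[:16], blob[16:28], blob[28:]
        key = PBKDF2HMAC(algorithm=SHA256(), length=32, salt=salt,
                         iterations=600_000).derive(password.encode())
        return AESGCM(key).decrypt(nonce, ciphertext, None)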
It seems weird to say there are issues with connections but everything else is working fine. Is the API technically fine according to their system metrics, while no one can actually connect to use it, so it stays green? Keeping everything green doesn't help much in practice if connections are broken and the whole product is unusable.
Would be similar if auth were down: you can connect to us, you just can't authenticate, so you can't actually do anything.
Edit: Looks like they updated the status to properly show an across the board outage
Considering how ubiquitous slack use seems to be in a lot of major tech companies, I wonder if it's reasonable to ask whether or not the stock market's performance this morning is somehow correlated?
I personally have found that one of Discord's major shortcomings is the lack of support for threaded message chains. When you have 2+ parallel conversations going in a channel, it dramatically reduces your ability to communicate effectively.
It's not based only on that. Slack costs a lot of money, and moving off of it is something that has continually come up over the last year or two. We even had a RocketChat server up and running for a while.
My old company used a mix of Slack and RocketChat. Functionally, it's fine but I was never a big fan of the UI and how attachments were handled. Also, cross-channel search was kinda bad. Mind you, this was well over a year ago so I'm sure things have improved.
It isn’t just you, but due to Hacker News’ right-sized[0] infrastructure, you should sign out unless you need to comment. That way you hit the caches instead of getting the server to make you a new page.
That’s the wrong way to look at it. If HN struggles in certain situations, then it is not right-sized. You don’t beg users to walk an unintuitive happy path (i.e. log out when not commenting).
Other than it being days of slow news, with top stories seemingly pinned for days now and getting boring, no ;) You know, the kind of stretch where Slack being down counts as newsworthy (yawn).
We were just joking with the workmates -- Salesforce bought Tableau two years ago and hasn't ruined it yet, only because it takes them that long to do anything ;)