Twilio releases open source video conferencing apps for iOS, Android and web (twilio.com)
583 points by devinrader on March 13, 2020 | 102 comments

Disclaimer: I used to work at Twilio

I'm seeing discussion below about fully open-source solutions (Jitsi Meet) vs. fully proprietary solutions (Zoom). What's interesting about using Twilio is that it is a programmable solution.

Case in point: I am building a webpage where people can log in to a virtual office, so that people working from home don't feel so lonely. I have certain requirements, such as:

    - All audio is muted
    - Participants are arranged in a grid
    - Room is persistent (no need for an "owner")

Accomplishing this with either Jitsi or Zoom may have been impossible, because while they are configurable they are not programmable.

Now, Twilio isn't the only Video platform on the market (see TokBox, etc), but I think comparing this to Jitsi/Zoom/Hangouts is like comparing apples to oranges.

Maybe keep the Kurento project on the radar [1]. It provides you with "building blocks" to create an application. I think that's what you mean by programmable: you use it to build the application you need.

There are blocks for WebRTC, RTP, RTSP input (not output), and others. You can also apply computer vision to the video, with OpenCV filters. There is a "Composite" filter that places N streams in a grid. And users are expected to write their own fancy "blocks" if needed, so the option is there too.

The only important question is whether the existing blocks already provide the features you need, or whether new ones would have to be created.

Some technology from the Kurento project was acquired by Twilio [2], so it's possible (but not sure, I don't really know, maybe the parent commenter knows) that their media services might be based on or inspired by Kurento.

Full disclosure: I currently work for the Kurento project as the main developer, at Universidad Rey Juan Carlos.

[1]: https://www.kurento.org/

[2]: https://www.kurento.org/blog/whats-next-kurento-and-elasticr...

EDIT -- I wrote "under the radar" which is exactly the opposite of what I meant to say :-) now fixed

It definitely has a lot of potential but nobody is going to re-assign a bunch of engineers to code up a solution to their immediate need for massive teleconferencing capabilities today - as I imagine the timing of this launch is heavily influenced by the coronavirus and the current demand for these tools.

Where I think it does play in is that while Zoom is probably making a lot of sales right now, they are probably starting to regret their start-up friendly pricing model now that they have to fund large engineering efforts to keep up with demand. I imagine Zoom prices will go up while Twilio APIs will stay fairly consistent. There may be some cool start-ups that get built on top of this as well.

We're a startup that makes a video API that competes with Twilio. I can tell you that we're seeing a big increase in interest this week from both big companies and startups, across a big range of use cases. (Lots of telehealth and online education. Also lots of cool "distributed teams" experiments.)

What's your website? And what's your advantage over Twilio?

> It definitely has a lot of potential but nobody is going to re-assign a bunch of engineers to code up a solution to their immediate need for massive teleconferencing capabilities today

No, but I bet a lot of places are (like the big enterprise shop I'm in) both scaling out their existing solutions for things like conferencing and VDI that were initially chosen to support more limited use and starting to consider long-term solutions to support more comprehensive at-need remote work.

Cisco Webex also has fully programmable SDKs[0], with the flexibility to either manage video windows and controls directly[1] or embed fully functional widgets. Zoom also has extensive APIs and SDKs[2], though I'm not familiar with how flexible they actually are.

[0] https://developer.webex.com/docs/platform-introduction

[1] https://github.com/webex/webex-ios-sdk-example-buddies

[2] https://marketplace.zoom.us/docs/sdk/native-sdks/preface/int...

Hell is trying to deal with an API / SDK that isn't the vendor's core product.

At best, they have no incentive to fix your issues / provide missing documentation.

At worst, you're seen as sabotaging revenue from their productized offerings.

I've seen very few companies that can successfully be a product and a platform company at the same time.

Here is a link to my fork of the repo shared by the OP:


You can try it out at:


Just enter your name and the room "HN".

On mobile Safari, the mic & camera permissions pop up, but after clicking Join Room the page goes blank. Any ideas? It works perfectly using Desktop Chrome.

Same on iPad Pro, iOS 13

Adding text chat to this would also be quite handy


Media connection failed or Media activity ceased

Error Code: 53405

Hate to say it, but: Works for me ™

Was that an error on the webpage or in the JS console?

Yep..had to tweak Matrix, Origin, and NoScript before it worked, asked for javascript, then errored 5300 for a bit as I worked the extensions.

> it asked for JavaScript

I'm about 300% unsurprised that this uses JavaScript. Correct me if I'm wrong, but I don't think there's really a way to do this without it, if we aren't supposed to be using Flash anymore either.

So Twilio's SDK counts, but Jitsi's doesn't because ...? You can build custom interfaces with all your requested features with that too.

The difference in the ability of a developer to modify an open source software product (think: Gimp or Jitsi Meet) to suit their needs versus using a platform SDK that is designed to provide flexibility and to be used by devs is massive.

I spent some time looking at the Jitsi Meet source to confirm this assertion:


It was quite clear that trying to use that code to execute my use case above would have been a very large waste of time, and quite possibly (given the lack of docs) a futile exercise.

This is the react example - https://meetrix.io/blog/webrtc/integrate-jitsi-meet-to-react...

These are the React features supported, including chat, conference, etc.: https://github.com/jitsi/jitsi-meet/tree/master/react/featur...

I read and re-read this "examples" docs page several times:


Hats off to anyone who can rapidly go from those docs to a deployed and working app. I was able to modify the repo linked in the parent to suit my needs in roughly 2-3 hours, much of which was install/setup.

Nice points. Twilio is certainly well supported for developers.

Still it feels like a bit of a letdown to see protocols meant to open up communication (RTC) being wrapped in a per minute charge.

Kickstarting innovation would be even more possible if charges only kicked in after a high enough threshold.

Again, big fan of Twilio and what they did to abstract away telephone hardware.

Please share when you're done. We've started using Zoom and it's unusable for any non-trivial feature on my Linux machine.

You're running an LTS with latest packages & latest Zoom client?

Has worked out of the box on Ubuntu 18.04 for the last four months with screen recording, as host, as attendee, background replacement, chat, seminar mode, etc.

Yesterday I found one issue, the whiteboard share collab mode doesn't work but the app continues to function fine.

FYI: Get rid of Unity if you're using that. It'll crash on its own, not because of Zoom. I'm using i3wm as a tiling window manager and everything just works as expected.

AWS recently announced an SDK for its Chime conferencing app, too: https://aws.amazon.com/chime/chime-sdk/

Also Tokbox is horrible. Twilio is modern and pretty good.

Tokbox has better pricing. What's so horrible about it? I've found their sdk trivial to implement.

These are open source clients for a proprietary backend, which are no doubt still useful in a number of circumstances.

If you are comfortable with a more proprietary stack, then Google Hangouts / Zoom / others are likely to provide a better supported user experience.

If you would prefer a more libre end-to-end experience then Jitsi Meet may be an attractive alternative.

Even RMS has no problems with using open source frontends to proprietary services. The reasoning is: if you can audit and modify the way you talk to the service to ensure you aren't revealing any information you'd rather not, why do you care if the service itself is OSS, proprietary, or some dude in Quebec handcrafting TLS packets?

That doesn't sound like him :)

Do you have a reference/quote for that, out of interest?

In the past it seems like he's been skeptical[0] of providing user data to third-party services.

[0] - https://www.theguardian.com/technology/2008/sep/29/cloud.com...

This was from a private email exchange, but you can look at how he's soliciting a FOSS Google Docs client as evidence that he'd be fine with a FOSS interface to a proprietary service.

Additionally, https://www.gnu.org/philosophy/javascript-trap.html makes no mention of proprietary backends being harmful to user freedom, only obfuscated (unauditable) frontend code.

> https://www.gnu.org/philosophy/javascript-trap.html makes no mention of proprietary backends being harmful to user freedom

OK, but the later essay https://www.gnu.org/philosophy/who-does-that-server-really-s... does.

The proprietary backend is providing a network service, the question is: to what degree is that service being used as a software substitute?


My own opinions on RMS' opinions:

The essay I linked is from 2010; don't get too hung up on specific examples, things have changed since then. The Facebook of 2010 was fairly usable as a dumb service without being a software substitute. It was possible to consume your news feed via RSS, and perform your own sorting of it (for example).

That said, I do think that RMS has taken a rather naive view of which services are pure services, and which services are software substitutes. I can see how many things can reasonably be conceptualized as a pure service... but also, they could be replaced by software. If you could use software as a service substitute, then surely the service is a software substitute. And part of this changes over time, as people figure out how to replace more things with software. (Based on personal (mailing list?) communication regarding DDG vs YaCy vs Searx as the default search engines in FSF-endorsed distros)

It should also be said that RMS refines his views over time, and that he often doesn't do a good job of acknowledging that they've changed, instead doing a sort of "I've always had this wisdom" thing. Years ago, he had a bit about how people would ask him about "free hardware", and that his answer was that the concept of free hardware didn't make much sense, as you can't copy a piece of hardware the way that you can a piece of software, and that having the source files for hardware wasn't very important. Then a few years later, consumer 3D printing and home manufacturing started to take off, but he held on to his old position. Then a few years later, he came around, and he had a bit in a talk at LibrePlanet about how free hardware was obviously important. So don't give old statements by him too much weight--his views change in response to new information and developments (as they should).

That's cool - do you think his opinions have changed since writing that 'to identify yourself to a Google service is a grave error'?[0]

It'd be great to learn more about that call for a FOSS Google Docs client if you can share a hyperlink.

[0] - https://www.stallman.org/google.html

I guess the logic there being, if you're running it, you should be able to change it / fix it.

It doesn't seem to me that this is end-to-end encrypted, I would care where my conversation, video, and voice data go.

You quite explicitly don't need to audit the backend to verify something is end to end encrypted. I can be confident nobody is snooping on my TLS communications without needing to audit every router along the way, this is in some ways the entire point.

The point of e2e encryption is that the server can't decrypt it.

Yes, but I don't need to audit the server to be sure it can't decrypt it. That's the real point. Otherwise a ROT-13 that I've manually verified isn't decoded by the backend would be considered "e2e encryption".
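To make the ROT-13 point concrete, here is a minimal Python sketch using the standard library's `rot_13` codec. The function names are mine and purely illustrative. The idea it shows: a scheme with no secret key can be undone by whoever relays the message, so "I checked that the server doesn't decode it" is not the same as "the server cannot decode it".

```python
import codecs


def rot13(text: str) -> str:
    """Apply ROT-13, a fixed letter substitution that uses no secret key."""
    return codecs.encode(text, "rot_13")


def intermediary_reads(ciphertext: str) -> str:
    """Any relay, the server included, can undo ROT-13.

    No key material is needed: applying the same substitution a second
    time restores the original text.
    """
    return rot13(ciphertext)


message = "meet at noon"
on_the_wire = rot13(message)

# The payload looks scrambled in transit...
assert on_the_wire != message
# ...but an intermediary recovers it without knowing any secret.
assert intermediary_reads(on_the_wire) == message
```

With real end-to-end encryption the guarantee comes from the keys living only on the endpoints, which is something you can check from the client side alone.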

> Google Hangouts

> likely to provide a better supported user experience.

Yeah, about that: https://arstechnica.com/gadgets/2019/01/the-great-google-han...

According to a more recent (2020-03-03) blog post[0] by Google, they'll be rolling out free access to Hangouts Meet for GSuite and GSuite Education users.

[0] https://cloud.google.com/blog/products/g-suite/helping-busin...

> Google Hangouts / Zoom / others are likely to provide a better supported user experience.

My experience with Google software is the exact opposite. The never-ending parade of new products, SDK versions, grand rewrites, non-existent support, and cancelations has served to remind me of one thing: that I am the product.

Last year I worked on an open source soft phone where the proprietary backends are abstracted away, and you can switch them depending on your desired infrastructure.

Twilio, PortSIP, and Abto are currently implemented as demo backends.


Zoom is owned indirectly by the Chinese government, not sure what that means to your company but FYI.

They’re a public company as of March 2019, so unless the Chinese government is somehow a majority shareholder, I believe you’re mistaken.

The CEO is from China, yes, and its major shareholders include VCs from China, yes, but that does not prove it's "indirectly" owned by the Chinese government. It's likely, but we need proof for claims like this.

Do you have a reference for this, or is this just a major assumption?

Don't really like the misleading projects Twilio releases as "open source". They are basically releasing demo code of how to use their commercial, paid for backend service and while that demo code is open-source, I don't think that's what people are thinking of when they read that title. I think most people think they are releasing some kind of self-hosted solution. At least that's what I think when I read this.

With that said, Twilio has a very easy to use API and if usage cost isn't as big of a deal to you as development time is, it's an easy decision to utilize what they offer.

Knowing Twilio's business model, I knew exactly what was meant by the title.

Our company, Daily.co (YC W16), makes APIs that compete with these new Twilio offerings. A couple of lines of js code embeds a video call in a web page:

Our focus has been on doing as much as possible to make video calls reliable, and helping developers get started with video as quickly as possible.

Twilio is a great company. Jitsi is a great open source project. As always, there's room for different approaches to solving hard problems, for different customers. We've found that the biggest pain points for lots of developers are that 1) building out a full video call UI is non-trivial, and 2) handling all the failure cases possible in real-world calls is non-trivial. So we try to solve those two problems.

I love this stuff (this is the third time I've built a video tech stack, in my career), and I'm happy to answer any questions about WebRTC, building live video into applications, and give you (relatively) non-biased advice about tools/approaches for your particular goals.

How does Daily.co do ICE, STUN/TURN? Do you use a third party service, host coturn yourself, something else?

I'm currently a Twilio customer using their Network Traversal Service.

Ok. Why are there so few people doing Screenhero-esque dual-driver, pair-programming-friendly remote control with this tech? Are the APIs just not there?

At least on iOS the heavy lifting seems to be done by the TwilioVideo library: https://github.com/twilio/twilio-video-app-ios/blob/c9d5410b... ... which is not Open Source: https://cocoapods.org/pods/TwilioVideo

Open source client for a closed source backend.

Businesses aren't charities but man does this feel cynical to anyone else? Twilio is not a bad investment for any business, in my opinion -- they make very complicated things very simple and offer a lot of value. This feels like OSS-washing though, and I feel like I'm a mark -- "use our awesome open source stuff so we can pull you into our ecosystem".

The advantage of the open source here is so you can configure the endpoints exactly as you wish. Not vendor independence etc.


Does Twilio still use Zoom for internal voice & video employee collaboration?

I worked there for a few years and left Twilio not long ago. I always thought it was super strange how Twilio doesn’t dog food any of its own service.

Some examples:

    - Twilio Flex isn’t used for support desk
    - Slack is massively used at Twilio vs Twilio Chat
    - Zoom is massively used vs Twilio video
    - employees don’t text yet SMS is your core business

And more

Twilio is building the building blocks for solutions like Slack and Zoom. Their core competency isn’t to deliver a full-fledged enterprise chat platform.

I felt the same way about Slack's CEO saying the company is using Zoom for interviews https://twitter.com/stewart/status/1237815465126354945

If this is backed by Twilio's cloud WebRTC API, I believe that does not support more than 50 people at a time watching the video.

I have been using the Janus[1] signaling server to broadcast live drone video on my website and it has worked flawlessly with multiple spectators. The drone video is captured through a Janus WebRTC library on the Android phone connected to the drone.[3]

With a TURN server from Xirsys[2], I have had live drone video recording as well as barely a second of latency. I disabled peer2peer video because I do not want a 4G connection to broadcast to multiple viewers.

The reason I have not used Twilio (even though I evaluated its offerings) is that they do not provide on-premises hosting, and my potential customers are normally not interested in a solution without on-premises hosting.

[1] https://janus.conf.meetecho.com/

[2] https://xirsys.com

[3] https://github.com/ptsneves/janus-gateway-android

I was asked today to help on exactly this for an artistic project. Do you have a write-up to share? I'm also a fan of Xirsys and Janus, good to see them plugged.

Hello telesilla, I do not have a write-up, as this is part of a project I am hoping to release commercially. If you have specific questions, let me know.

The basic thing is that if you are doing a basic demo without business logic you can easily pick up a Janus video room demo and set Xirsys as your ICE server in the JavaScript file. After that, start the front-end work to get to your demo goal. Of course, if you know Janus you are probably already doing this. :)

For the Android part, if any, you can pick the library that I linked before.

My biggest difficulty, which almost made me give up, was realizing that my internet-facing reverse proxy machine needed to have the ports necessary for WebRTC open. Otherwise you get extremely weird results in the ICE candidate gathering stage, or sessions that take very long to start. Even weirder, the TURN server does not work without it, which my limited understanding at the time did not even consider. It may sound obvious in retrospect, but the debugging results were just misleading.

With that problem solved, I just programmed a small library in C# that sets up private video rooms per user group. I also programmed a system that gathers the mjr files after a live video session and makes them available in a gallery as WebM files (I tried MP4 for iOS Safari but never got it working).

The system design is glued together by a Docker Compose file which has containers for: ASP.NET Core server, Janus, mjr transcoder, nginx reverse proxy, and MySQL database.

Sorry if this is not much detail, but I am a one-man show, the time I have apart from my family goes to this project and my day job, and I am really trying to stay laser focused on releasing it commercially. Also, I work in an Eastern European country, so my resources to hire outside help are more constrained. When I have a minimum viable product I will try to get it onto Show HN and hopefully work on the growth and scaling.

Wow thanks for the comprehensive reply, I wasn't expecting that! I can't wait to see more from you. Best of luck - it seems a good time to be working with drones.

Wow @ptsneves, that sounds really cool. Could you give me a consultation on video streaming? I want to implement this into my product. If so, please write to my username at gmail. Thank you!

Hello @aaronlifshin. Thank you for your interest and compliments. I answered you by email.

You use a phone on the drone as the camera? How about its zoom functions? Can it record distant subjects? As I understand it, a phone camera is only good for relatively close objects, which is not the use case for drones. Thanks.

Hello auskje, maybe my post was not clear. The phone acts as a connecting point to the drone, through a manufacturer's SDK.

The camera of the phone is not used nor is the phone flying. Actually the phone is connected to a remote control. See [1].

The role of the phone is not only as a viewer of the live video being captured by the drone, but also as a 4G internet point and a hardware video encoder, that pumps the live video to my platform.

[1] https://developer.parrot.com/

got it. Thanks for replying.

Thanks, that let me find the pricing. It looks like it's $0.48/hr for a two-person meeting, or $30/hr for a 50-person meeting: https://www.twilio.com/video/pricing

The bottom mentions that peer-to-peer room pricing is even lower, $0.18/hr for a two-person meeting, but I'm not sure when that's used.
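For anyone wanting to plug in their own meeting sizes, here is a rough Python sketch of the arithmetic. The per-participant-minute rates are back-computed by me from the three hourly figures quoted above, so treat them as assumptions taken from this thread rather than a current price list. The constant and function names are made up for the example.

```python
from decimal import Decimal

# Assumed rates, derived from the hourly figures quoted in this thread:
#   $0.48/hr for 2 people  -> 0.48 / (2 * 60)  = $0.004  per participant-minute
#   $30/hr for 50 people   -> 30 / (50 * 60)   = $0.01   per participant-minute
#   $0.18/hr for 2 people  -> 0.18 / (2 * 60)  = $0.0015 per participant-minute
RATE_TWO_PERSON = Decimal("0.004")
RATE_FIFTY_PERSON = Decimal("0.01")
RATE_PEER_TO_PEER = Decimal("0.0015")


def hourly_cost(participants: int, rate_per_participant_minute: Decimal) -> Decimal:
    """Cost of one hour of meeting time under per-participant-minute billing."""
    return participants * 60 * rate_per_participant_minute


print(hourly_cost(2, RATE_TWO_PERSON))
print(hourly_cost(50, RATE_FIFTY_PERSON))
print(hourly_cost(2, RATE_PEER_TO_PEER))
```

Note that the two-person and 50-person figures imply different per-minute rates, so they presumably belong to different room types; the pricing page is the place to confirm which one applies to your case.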

I am gonna open source my video conferencing application which will redirect you to zoom.us


With Jitsi I can self-host. In my view, that makes Jitsi better from a privacy perspective.

The problem with Jitsi is that the video streams have to pass through the videobridge. That means paying for ingress and egress network cost if you are on AWS.

It detects if two participants are in the same network and then uses p2p.

I've recently tried Jitsi but could not figure out how to do screen / desktop sharing. Is this possible at all, do you happen to know? (there were some sketchy-looking 3 star review plugins for Chrome, but they didn't really inspire confidence)

There is a button in the bottom left.

I tried Jitsi and I love the UI but it uses so much CPU... Still looking for a self-hosted alternative.

If I may ask, what did you not like about Jitsi? The backend server / videobridge is pretty cool. When you say UI CPU usage, do you mean the browser / JavaScript UI?

Yes, browser/javascript UI. I was testing it with a colleague and having 2 people in a meeting CPU usage already hit 50% and with 4 it was 100% all the time and the CPU was boosting to its maximum. Tests done on latest Chrome.

Ooooo, Jitsi has some competition?

Except with Jitsi you can self-host the entire back-end. These apps seem to depend on Twilio's infrastructure, as the READMEs mention needing a backend to generate Twilio tokens.
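For context on what a "backend to generate tokens" usually amounts to: as I understand the general pattern, it is a small service that keeps a secret and hands out short-lived signed tokens scoped to a user and a room. Here is a rough Python sketch of that pattern using only the standard library. To be clear, the claim names (`identity`, `room`, `exp`), the secret, and the token layout are all invented for illustration. This is not Twilio's actual token format, and a real deployment would use the vendor's own helper library.

```python
import base64
import hashlib
import hmac
import json
import time

# Hypothetical secret; in practice it lives only on the server.
SECRET = b"replace-with-a-real-secret"


def _b64(data: bytes) -> str:
    """URL-safe base64 without padding."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _sign(payload: str) -> str:
    """HMAC-SHA256 over the encoded payload, keyed with the server secret."""
    return _b64(hmac.new(SECRET, payload.encode("ascii"), hashlib.sha256).digest())


def issue_token(identity: str, room: str, ttl_seconds: int = 3600) -> str:
    """Return 'payload.signature' granting `identity` access to `room`."""
    claims = {"identity": identity, "room": room, "exp": int(time.time()) + ttl_seconds}
    payload = _b64(json.dumps(claims, sort_keys=True).encode("utf-8"))
    return f"{payload}.{_sign(payload)}"


def verify_token(token: str) -> dict:
    """Check signature and expiry; raise ValueError if either check fails."""
    payload, _, signature = token.partition(".")
    if not hmac.compare_digest(signature, _sign(payload)):
        raise ValueError("bad signature")
    padded = payload + "=" * (-len(payload) % 4)  # restore stripped padding
    claims = json.loads(base64.urlsafe_b64decode(padded))
    if claims["exp"] < time.time():
        raise ValueError("token expired")
    return claims


token = issue_token("alice", "HN")
print(verify_token(token)["room"])
```

The client never sees the secret, only the token, which is why these apps cannot run purely as static pages.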

But they are open source (Apache licensed), so that can be changed.

Potentially, one could make a fork that can use either one’s own backend or Twilio’s, while Twilio’s own can only use Twilio’s.

Is it trivial to write a backend that will replace the functionality in Twilio's backend?

It may not be entirely trivial but if enough people are willing to contribute it could be trivial on an individual basis.

I'd be interested in giving it a go. If anyone else is, just leave a comment on this issue. https://github.com/ethanwillis/oss-twilio-video-backend/issu...

Right, but you're talking about starting an open source project whose complexity is similar to that of Jitsi's backend. It may be hard to attract contributors until there's a proof of concept.

That's fine. What I'm really looking for is just an expression of intent to contribute once a plan is in place. I don't mind taking the first whack at a proof of concept.

However if people do want to contribute early on I'd welcome that :)

Probably not, but it might be possible to write a simpler "shim" which would go between this frontend and the Jitsi backend.

Yes, maybe. For that to work, I guess you'd need the concepts (rooms etc.) to be similar enough, so that you can convert the control messages in a way that makes sense, and you'd need the media formats/chunking to be close enough.

It's not going to work 100%, as the Twilio API can do stuff that Jitsi doesn't (e.g. the choice of video being transferred p2p or via the server).

But if you're going to do that, then you're using Jitsi mainly for media transcoding/multiplexing. So maybe it would be easier to use that code in Jitsi directly and write a new API layer, rather than writing a shim above the existing API.

I wasn't aware of Jitsi - I really like Twilio but Jitsi's ability to self host elevates it to another level.

Also BigBlueButton, which is geared towards education, but also does video conferencing well.


Thank you for sharing this. I had never heard of this project before -- and it looks pretty good!

Can someone explain why we need a Twilio backend to do WebRTC?

I've never done it so I don't know, but I thought the whole point was that it's peer-to-peer. I know there is some kind of "discovery" layer required, but this doesn't seem complex enough to warrant a SaaS.

From what I learned, for production use of WebRTC you need specialized servers (signalling, plus STUN/TURN) to reliably coordinate communication among peers - in particular, to deal with network address translators (NATs) and firewalls.

Most open-source WebRTC implementations I've seen use a list of semi-public STUN/TURN servers, notably including Google's, which I believe are explicitly for dev/testing purposes.
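To give an idea of what pointing a client at your own STUN/TURN server looks like, here is a small Python sketch that builds the JSON a backend might serve to the page. The `iceServers`, `urls`, `username`, and `credential` fields follow the shape of the browser's `RTCConfiguration`; the hostname, credentials, and the `build_ice_config` helper are hypothetical, and port 3478 is simply the conventional STUN/TURN port.

```python
import json


def build_ice_config(turn_host: str, username: str, credential: str) -> dict:
    """Build a dict shaped like the browser's RTCConfiguration.

    STUN lets a peer discover its public address; TURN relays media when
    no direct path through NATs or firewalls can be found.
    """
    return {
        "iceServers": [
            {"urls": [f"stun:{turn_host}:3478"]},
            {
                "urls": [
                    f"turn:{turn_host}:3478?transport=udp",
                    f"turn:{turn_host}:3478?transport=tcp",
                ],
                "username": username,
                "credential": credential,
            },
        ]
    }


# Hypothetical values. The page would pass the parsed JSON to
# `new RTCPeerConnection(config)`.
config = build_ice_config("turn.example.org", "demo-user", "demo-pass")
print(json.dumps(config, indent=2))
```

Running the TURN relay itself (and paying for its bandwidth) is the part that makes self-hosting more work than the config suggests.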

In addition to this "discovery layer" - and possibly handling reconnections - I imagine it's non-trivial to provide video/audio transport, compression, etc.

I'd love it if it were simpler to use WebRTC, truly peer-to-peer, in the vein of how early Skype used to work. Unfortunately, from what I've learned so far, there are still technical hurdles to self-host such a service.




Let me try to take a stab at this:

If you have a video chat with 10 people in a full mesh, each person holds a peer connection to each of the other 9. That is 10x9=90 connection endpoints, or 45 distinct links, and the count grows roughly with the square of the group size.

With a streaming server, you would just need 1 connection to a server that would give you 10 media streams.

With a streaming server, you can also do recording and stuff.
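Counting this a bit more carefully: in a full mesh each of the n participants connects to the other n-1, which gives n(n-1)/2 distinct links, and every client has to upload its own video n-1 times. With a forwarding server each client uploads once. A small Python sketch of that comparison (the function names are mine, and the server model is simplified: it assumes every participant receives everyone else's stream):

```python
from typing import Tuple


def mesh_links(n: int) -> int:
    """Distinct peer-to-peer links in a full mesh of n participants."""
    return n * (n - 1) // 2


def mesh_streams_per_client(n: int) -> Tuple[int, int]:
    """(uploads, downloads) each client handles in a full mesh."""
    return (n - 1, n - 1)


def server_streams_per_client(n: int) -> Tuple[int, int]:
    """(uploads, downloads) per client when a server forwards the media.

    Each client sends its stream once to the server and receives one
    stream per other participant.
    """
    return (1, n - 1)


for n in (2, 4, 10):
    print(n, mesh_links(n), mesh_streams_per_client(n), server_streams_per_client(n))
```

The upload side is what usually hurts first, since home and mobile uplinks are much narrower than downlinks.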

I see. I didn't realize the Twilio backend did more than a simple p2p webrtc system.

Wire provides:

  - OSS client and server
  - E2E encryption
  - migration to IETF MLS protocol for E2EE messaging

MLS is a stepping stone to moving away from siloed messengers to interoperable services.

Is the iPhone screenshot accurate? Because the way it looks right now, with the border around the entire screen, is pretty awful :(

Has anyone tried AWS Kinesis webstream? I think the downside is that it is limited to 5 participants in a video chat. But the upside is that it does use STUN and TURN, and the video data packets are not billable by AWS.

Shoot me for splitting hairs, but I think Twilio mean they build open source video apps, not that they "built" them.

> We built open source video apps so you don't have to

^^^ sounds like they did a one-off.

> We build open source video apps so you don't have to

^^^ sounds like they're now in the business of doing this for steady revenue, which I don't think is true.

Depends on if they're done or not.

These are interesting accusations to come from a throwaway account.

Not sure what a throwaway account has to do with it. Interesting accusations regardless.
