Youtube.js – full-featured wrapper around YouTube's private API (github.com/luanrt)
413 points by mahnouel on April 14, 2022 | 107 comments

This is really cool, but maybe the README's disclaimer should also warn that using YouTube's private APIs is against their Terms of Service[0], specifically this section:

  The following restrictions apply to your use of the Service. You are not allowed to:
  3. access the Service using any automated means (such as robots, botnets or scrapers) except (a) in the case of public search engines, in accordance with YouTube’s robots.txt file; or (b) with YouTube’s prior written permission; 
It would be great if YouTube updated their ToS to permit this because that could unlock some really interesting innovation. Until then, devs should at least be aware that building a product with these APIs is risky.

[0]: https://www.youtube.com/static?template=terms

I don't see why YouTube would want to allow any of this: the APIs probably change regularly, have documentation that is only available internally, and can't be attributed to specific Client IDs for abuse handling (i.e. people could bypass rate limits by using them). That's before mentioning that there's $0 to gain from doing this, and that it could actively cause them to lose money, since the RIAA has DMCA'd even the mention of downloading music videos from YT[0]. If they want to introduce more functionality, they'll simply add it to the official API and extend the docs[1].

0: https://news.ycombinator.com/item?id=24872911

1: https://developers.google.com/youtube/v3/docs

> the APIs probably change regularly

I'd go as far as to say routinely. It's a massive pain point for third-party YouTube apps like NewPipe, which break every few months because of it.

As a regular user of NewPipe, I disagree that it's a "massive" pain point. NewPipe works very well almost all the time.

Coincidentally, today is maybe the third time I've gone to watch something and it's been broken. It's annoying, yes, but soooo much less annoying than constant ads. It's a minor pain point that is easily solved by temporarily switching back to regular YouTube. By the time I've seen a few ads I'm more annoyed at the regular experience, and NewPipe probably has a fix out by then anyway.

YouTube's APIs change frequently, but rarely in backwards-incompatible ways.

After all, YouTube runs on millions of client devices which can't or won't update the client application, e.g. Android devices with Google Play broken or logged out. Forcing those users to update will in many cases lose a user. A 5+ year old phone's copy of YouTube still runs and plays videos.

Now the APIs that the website uses... those can be updated with nearly zero notice...

as it happens, newpipe is currently broken

How so, exactly?

I have been playing around with NewPipe SponsorBlock. It has its flaws but generally seems to work. I have not seen it mentioned but NewPipe SB also supports Soundcloud, Bandcamp and PeerTube instances. I think it includes FramaSoft's and CCC's PeerTube instances as presets. It is interesting how smoothly PeerTube seems to work.

I would not rely on NewPipe for downloading or playing YouTube videos, even when it is "fixed", but the app does much more than just those two things.

0.20.2 is working for me (at least it's working enough that videos can be played and downloaded). Even in newpipe I prefer to download videos rather than stream them. VLC is a far better player, fewer ads, fewer distractions, zero comments, and what videos I re-watch and how often isn't being logged by google.

"Until then, devs should at least be aware that building a product with these APIs is risky."

To be fair, building products with Google APIs is also risky. Google used to offer a YouTube API for free. People like me used it for personal use. Then one day Google unilaterally decided to discontinue it.^1 There are other examples of APIs that Google has changed or deprecated then discontinued.

1. To me, this is the great disadvantage of "web APIs". A "web API" is like a non-public version of a public website that the operator can easily limit access to or shut down at any time, without affecting the public website. Arguably the public-facing website is more difficult and less likely to suddenly change or shut down than a "web API". Given a choice between retrieving public data/information from a "web API" versus from the pages of a public website (or wherever the pages source their contents), I prefer the latter.

Building any sort of web-based business dependent on some irreplaceable third party, such as Google, is risky.^2 When the third party makes a decision that affects someone else's dependent business, there is usually nothing the business owner can do. We see stories about this on HN from time to time. This is a control issue, folks. It is what happens when people voluntarily cede control to a third party intermediary.

2. Is it worth the risk? That is another question.

One difference between something like this free JS library on Github and an "official" Google API is that we can edit and adapt the JS library if the structure or content of YouTube's video page template changes, whereas if a YouTube API is changed or discontinued, we can do nothing. (Except complain, which rarely ever results in an API being restored.)

There used to be a free YouTube API. I had scripts that used it. Today I use the video page's HTML and JSON, not an API. (I more or less make an API for myself.) Unlike the script I once used with YouTube's API, which is now useless, the script I use today keeps working for the long term. I have only had to change it twice in the last decade. These were very small changes that only took a short time to fix. (It seems like folks wait much longer for fixes to youtube-dl.)

If YouTube exposed a real public API, it'd be one week before someone made a far better front end for it. That's why it's not exposed.

Awesome!! A custom client for YouTube that bypasses all their front end code. I wish there was something like this for every single web site!

I wish there was one for Facebook

The reason I stopped integrating any API based system (FB, Twitter, etc...) into my code bases for services I don't pay for (or my customers) is because they all changed willy-nilly and broke on a regular basis.

This is more likely to break (be broken by google) than an official API, and those are bad enough. (hard pass on even trying this out, especially if it's good/nice I'll want to use it and kick myself later for being an idiot.)

Unfortunately, it’s rare that these kinds of decisions are based on technical merit; instead they’re part of the business proposition, or requested by other parts of the business (sales/marketing).

Other than that, I fully agree that you should try to minimize your dependence on them; it’s not a good position to be in. What’s in the best interest of YouTube today may not be the case in one year.

In my experience, private APIs are expected to be refactored, and you should not depend on them. I don't think sales/marketing are involved at all. For a public API, sure, but don't expect private, undocumented APIs not to break you, in fact, you SHOULD expect them to change, especially in any codebase that is actively being maintained.

Cool project, but I wonder if the name will catch flak from the lawyers. Trademarks and all that. At the very least, a big "This is not affiliated with Google/YouTube" seems like a wise precaution.

They should rename it to Innertube.js (which is the official name of the private API, and as far as I know not trademarked nor used in any user-facing resources).

Valve.js would also work, since it allows access to the innertube.

I would advise against this on pure UX grounds due to the existence of Valve the game software company.

They could name it after a type of valve. Stopcheck.js has a nice ring to it and seems kind of apt here.

Presta.js if you want to please the cyclists :D

Bikeshedding.js if we want to continue this conversation

Red.js for the best color coordination

And that would totally not infringe on anything else, right.

It probably wouldn't, because it would be clear that it's got nothing to do with Valve the gaming software company. Trademarks aren't blanket coverage of any possible uses of a name.

Not really, there is a rigid categorisation of trademarks (the Nice Classification). All software is in Class 42 (https://www.wipo.int/classifications/nice/nclpub/en/fr/?basi...) so Valve would actually have standing here.

PS: Nice here refers to the French town (compare to Berne Convention, Geneva Convention, Treaty of Paris), not the common meaning in English.

Thanks for pointing that out


They should immediately rename it so as not to be caught the way YouTube Vanced was. A disclaimer is not a sufficient response to trademark issues.

Or do this:

1. Release it as youtube.js for the name recognition

2. Wait for the certified letter

3. Announce you're renaming your project and get another 24 hours of exposure in the news

This guy is playing the game.

Something like this lives or dies based on how much you annoy the people with lawyers to spare. Doesn't seem like the best plan.

Is there any better plan in an era where the real currency is attention?

I doubt having the project named after youtube is that important for attention.

The attention comes from the stories that happen after they leak the C&D that Alphabet is going to send them for naming it after youtube, then quickly roll out a rename.

Well ... good luck to them I guess. That does not sound like a plan I'd personally enjoy.

they have a disclaimer below


edit: why am i getting downvoted?

IANAL but I've worked with some of them and I'd guess that isn't sufficient.

My advice to avoid any negative attention from YT/Google (but again IANAL) would be to do two things:

1. Move the "this is an unofficial" disclaimer to the top and make it bold and/or italic so it's totally unmissable. I would use the same approach as MarshallOfSound: https://github.com/MarshallOfSound/Google-Play-Music-Desktop...

2. Praise Youtube/Google for having such a great API that empowers and enables their users to get the most out of their subscription. Make sure to point out the (true) benefits that YT gets from having a library like this that is available but they don't have to maintain and promise nothing to.

The disclaimer doesn't really matter, as 1) we wouldn't really expect that many people to read it and 2) it is going to get used in a ton of contexts without that disclaimer... such as in the headline of this Hacker News post. I can't make a company "Apple Computer Backstage" and just put a little note on my website that says "Apple Computer Backstage has no involvement with Apple Computer". This project simply should not have "YouTube" in its name... or, if it absolutely must, it at least needs to be on the other side of some kind of preposition. This insistence upon using other peoples' trademarks in the names of third-party clients based on adversarial interoperability--something that the company often has every reason to "throw the book at" as it is inherently hostile--is what ends up killing the vast majority of these projects over time. This isn't "in for a penny, in for a pound": you need to carefully choose your battles, and flagrantly violating someone's trademarks is going to be your weakest link as it is such low hanging legal fruit and doesn't actually buy you anything.

are you people seriously under the impression that trademarks and such make any difference if a big US corporation wants to fuck you? they can sue you out of existence for violating laws you aren't subject to and/or EULA you've never agreed to

Well it certainly makes it easier for them if you've violated their trademarks.

So why give them an easy out?

If you’re YouTube or any site, and want to stop these sort of wrappers - what’s the easiest way to do so without breaking your own site?

I find this task to be an interesting engineering problem.

A related question is if there’s an unspoofable way to detect a client.

It's probably impossible to stop it, but you could make it pretty painful by aggressively breaking/changing the API contract. You could maximally engage human ID and fingerprinting techniques and CAPTCHAs, or do machine learning on usage patterns to find likely bots and ban them if you're willing to accept that false positives will hurt innocent customers. Whatever you decide to do, it won't be free because it will also hurt your own devs (such as making it harder and more complex to work on). Obviously your own devs can have advance notice for changes, but it's still a pain to deal with.

Overall though, I would seriously ask why? Anti-cheat for a game maybe? It will cost you time/money to prevent, and it will hurt people that like your product enough to hack on it. As a user who rejoices in having APIs I can use to automate products I like, I'd be far more likely to pay you if you have an API I can use.

Youtube in particular sort of famously had content protection code, which took in part the form of a VM implemented in Javascript that probed its runtime to detect non-browser or otherwise headless clients. I think Mike Hearn worked on it.

Found it:


There doesn't seem to be a whole lot in there on YouTube js obfuscation techniques?

The demos have Node.JS examples. If that's the case, it doesn't seem possible to block.

If it's running in a browser, Google can simply disallow those domains to make API calls.

Also, at least Apple will block apps that make unauthorized API calls to third parties.

The agent can be faked.

There are CORS rules, but those are enforced by the browser; a backend cannot prevent you from calling it, except by requiring an access token or something similar.

So a form of DRM on the APIs themselves? It wouldn't be easy. How would you determine if the request is originating from an official YT developed app (iOS, Android, ...) or client (YouTube.com); or a non-approved client?

I suppose you can have a system which works like TOTP except for machine-to-machine. Although it would probably be broken since anything on the client side can be disassembled. The UX would likely suffer as a result as well.

I think the best you could do is attempt detection/fingerprinting, but I don't think it's possible to stop it since you never have full control of the client.
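For what it's worth, here is a minimal sketch of the TOTP-like machine-to-machine idea mentioned above. It assumes a shared secret baked into the official client and a 30-second time step; both are hypothetical, and nothing here reflects how YouTube actually authenticates its clients. The client derives an HMAC from the current time window, and the server accepts the current or the previous window to tolerate clock skew.

```javascript
const crypto = require("crypto");

// Hypothetical shared secret. In a real client it would ship inside the app,
// which is exactly why it could be recovered by disassembly.
const SHARED_SECRET = "demo-secret-not-a-real-key";
const STEP_SECONDS = 30;

// Client side: derive a token from the secret and the current time window.
function requestToken(secret, nowMs = Date.now()) {
  const counter = Math.floor(nowMs / 1000 / STEP_SECONDS);
  return crypto.createHmac("sha256", secret).update(String(counter)).digest("hex");
}

// Server side: accept the current window and the one before it.
function isValidToken(token, secret, nowMs = Date.now()) {
  const given = Buffer.from(String(token));
  return [0, 1].some((stepsBack) => {
    const windowMs = nowMs - stepsBack * STEP_SECONDS * 1000;
    const expected = Buffer.from(requestToken(secret, windowMs));
    // timingSafeEqual throws on length mismatch, so compare lengths first.
    return given.length === expected.length && crypto.timingSafeEqual(given, expected);
  });
}
```

This only raises the bar: anyone who extracts the secret from the client can mint the same tokens, which is the weakness the comment already points out.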

I once saw a certificate-based implementation where the server issued a temporary x509 cert to the client and then used mTLS[1]. It did reduce "unauthorized" clients by raising the bar, but it ended up making life way harder for their devs, and the people they really wanted to stop just implemented the cert strategy and moved on.

[1]: https://freedomben.medium.com/what-is-mtls-and-how-does-it-w...

There's no perfect solution, but you can make it painful. One solution I've seen, which only works in a server-side rendered site, is for the server to generate a random name for each form field being rendered. The mapping of random id to real field name is kept in the user's session information server-side, so the translation is then done server-side as well whenever the user performs an action.

At that point anyone writing a library like this would need to actually pull in the rendered page on which the user is supposed to be navigated, scrape the field names off of that (which won't be easy), and only _then_ could they perform the form action.

But if you're a big enough site, someone will likely still take the time to do it.
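A rough sketch of the randomized field name scheme described above, assuming an in-memory session store. The function names and the `f_` prefix are made up for illustration and are not taken from any particular framework.

```javascript
const crypto = require("crypto");

// Hypothetical in-memory session store: sessionId -> Map(randomName -> realName).
const sessions = new Map();

// When rendering a form, hand out a fresh random name for every real field.
function renderFieldNames(sessionId, realNames) {
  const mapping = new Map();
  const rendered = {};
  for (const real of realNames) {
    const random = "f_" + crypto.randomBytes(8).toString("hex");
    mapping.set(random, real);
    rendered[real] = random;
  }
  sessions.set(sessionId, mapping);
  return rendered; // the template would emit these as the input names
}

// When the form comes back, translate random names to real ones
// and reject anything the server did not hand out for this session.
function translateSubmission(sessionId, submitted) {
  const mapping = sessions.get(sessionId);
  if (!mapping) throw new Error("unknown session");
  const result = {};
  for (const [name, value] of Object.entries(submitted)) {
    const real = mapping.get(name);
    if (real === undefined) throw new Error("unexpected field: " + name);
    result[real] = value;
  }
  return result;
}
```

A client that posts the real field names directly is rejected, so a wrapper library would first have to fetch and scrape the rendered page to learn the current names.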

Put up a CAPTCHA every few requests. That is how Google did it for their search.


Wouldn't there be all sorts of human detection that they could do, similar to how game cheat engines work? A human is going to move the mouse across elements, drag, poke the screen, be slow, etc, and all in fairly predictable ways. Some API calls almost certainly require human interaction, where some interaction graph could be fed as a key to the API. It's cat and mouse, but at some point the mouse is going to get tired.

This is close to how reCAPTCHA v3 works. It can look at the user's behavior on the site and classify normal users vs bot users. You have to do some setup to feed your own set of user action data into reCAPTCHA though.

Given the power of Google they may be able to force you to sign in after watching a few videos.

Or only serve videos if they detect some kind of physical input (mouse/keyboard).

But they don't seem to care that much for non-mainstream tools, or this would be blocked already

For small scale sites it's definitely not easy to block

There may be some paid libraries that promise to do it, but I've never seen one that seemed really unbreakable.

Even if you go all the way and block EVERY possible way of doing this, you can make a puppeteer script that could watch and record sound/video directly on the screen in like 15 lines of code, even if it would be really slow to get long videos

The only bulletproof way is to change your business model to make these useless, irrelevant or no longer threatening. Yours and the user’s incentives should be aligned.

If you serve content, put it behind a paywall and rate-limit based on the maximum amount a human can reasonably consume and stop caring whether the user uses your own client or something like this - after all you’re getting paid either way.

The only businesses that are threatened by unofficial API clients are cancerous “growth and engagement” crap where the “value” is the wasting of the user’s time. Don’t be such a business and you’ll be fine.
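To make the rate limit suggested above concrete, here is a small sketch that caps each paying user at what a human could plausibly watch in a day. The 24-hour daily budget, the class name, and the in-memory map are all assumptions for illustration.

```javascript
// Assumption: a human cannot watch more than 24 hours of video per day.
const DAILY_BUDGET_SECONDS = 24 * 60 * 60;

class WatchBudget {
  constructor(budgetSeconds = DAILY_BUDGET_SECONDS) {
    this.budgetSeconds = budgetSeconds;
    this.used = new Map(); // "userId|YYYY-MM-DD" -> seconds already served
  }

  // Returns true and records the usage if the video fits in today's budget.
  // The client that made the request is deliberately not inspected.
  tryServe(userId, videoSeconds, now = new Date()) {
    const key = userId + "|" + now.toISOString().slice(0, 10); // UTC day
    const usedSoFar = this.used.get(key) || 0;
    if (usedSoFar + videoSeconds > this.budgetSeconds) return false;
    this.used.set(key, usedSoFar + videoSeconds);
    return true;
  }
}
```

The limiter never asks which client sent the request, which is the point being made: once usage is bounded per paying account, official and unofficial clients look the same to the business.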

What about something like Craigslist? I would rather that not be behind a paywall. I would think they would also enjoy not having their site scraped and re-created.

Keep changing the implementation, keep changing names, keep changing the API formats.

I'm definitely curious if there's a way to do a rotation that resists easy automatic code analysis.

Facebook does something similar to combat adblockers. They mangle the names of div elements to make sponsored posts indistinguishable from friends/group posts. I'm not aware of any browser plugins which are effective at blocking FB ads.

Anyone know if other websites put as much effort into anti-adblock engineering?

A vision (ML)-based blocker should easily blow rigid rule-based cosmetic blockers out of the water and render div soup obfuscation completely useless. Not sure anyone’s investing in that space though.

Once you’ve defeated obfuscation you’ll be dealing with DOM integrity-based defenses.

I think Workday does the same. Trying to scrape data from there was a nightmare (team didn't have api access).

Could you not base a filter on whether a div contains text matching your list of groups and friends?

FB Purity seems to be keeping up. But it is not a general purpose adblocker.

you don't even have to look at the script, just at the network requests

you'd basically have to make your own stealthy video format, otherwise you can just catch network requests

Require request tokens. Authorized clients would have some sort of generator, anything else would fail to execute.

Provide a public API (charge for it if you have to). The videos on YouTube are the property of the creators, not YouTube.

YouTube already has a public API

Firebase app check does this and I wouldn’t be surprised if YouTube adopts it eventually.

I mean, offering your own API gateway is pretty effective.

Seems like a use case for remote attestation.

Indeed. Yet another reason why we must rally against such efforts.

Yeah, it's not a future that I'm looking forward to.

I suppose it will break more often than youtube-dl.

use yt-dlp - seems to be more actively maintained, at least for the youtube downloader which runs at full speed for me, while youtube-dl downloads videos at <50kbps.

It would be great if this had a CLI tool, so that it could be used as an alternative to yt-dlp. Or a web frontend as an alternative to Invidious, which breaks more often than not.

That said, I wouldn't be surprised if Google issues a C&D, or just inevitably breaks it, especially if it uses undocumented APIs.

It's VERY easy to make one.

  1. Install NodeJS and NPM (if you don't already have it)
  2. Create a new folder
  3. Run "npm install youtubei.js@latest" in that folder
  4. Create a new file in this folder called ytdl.js (or whatever you like)
  5. See section "Downloading videos:" on the github page. Make the contents of the file exactly like that. i.e. just cut/paste that example.
  6. Replace the string 'Looking for life on Mars - documentary' with the name of the YouTube video you want to download (ideally this should come from args)
  7. Run "node ytdl.js" and it should download

Well, sure, but I'd rather have this as part of the official project, instead of me messing with the JS ecosystem, or trusting a 3rd party to do it. It should be trivial for the project maintainers to do this, as you say.

Love this Q&A:

    Do I need an API key to use this?

    No, YouTube.js does not use any official API so no API keys are required.

I'm surprised they still support video dislike via API, but have removed it from the user interface... I understand they are likely worried about backwards compatibility with the abundance of client-devices, and not inadvertently breaking some app somewhere, but why not just make it a noop...

Unrelated: My treadmill has the absolute worst YT client I've ever used.

YouTube hasn’t removed the dislike button. They have removed the dislike count.

My mistake

My understanding was that they still want to hoover up that sweet sweet preferences data (of which dislike is a useful metric). It's just not going to be exposed externally.

They do, and you can install extensions to see the dislike count, though sadly they slow down my experience every time I've tried them (at least on Windows Chrome).

You can no longer see the true like/dislike ratio. They removed the dislike count even from the API as of December 13th [0]. The extension does some guesswork to calculate that for you [1].

This library also doesn't give you the dislikes anymore (just tested it).

[0] https://support.google.com/youtube/thread/134791097/update-t...

[1] https://github.com/Anarios/return-youtube-dislike#what-it-do...

Ah, I wasn't aware. For what it's worth, the guesswork does seem accurate enough to be useful on the videos I tested last time.

I've been wanting to use the Youtube Music API to automate some personal chores (like building/cleaning playlists, etc) and was very discouraged. I actually switched to Spotify (trial) partially over it but there were a couple of other (off-topic) reasons I didn't want to stay with Spotify.

This looks like a wonderful tool! And it's not in Python :-D (sorry python people). Like others I'm a little concerned about breakage as YouTube APIs churn. Does anyone know what YouTube's approach to backwards compatibility is for internal APIs? Some companies just wait until 98% of users are on the new clients and then rip stuff out, but others I've worked with basically don't allow breaking the API except in important circumstances, and deprecated endpoints stick around for a while.

This drove me nuts because Google Music had an actual public API. The service was deprecated in favor of YT Music, which has fewer features and no equivalent API.

That deprecation was a catalyst to move my music collection elsewhere, so it's not threatened by the whims of Google anymore.

Makes a ton of sense. I almost bailed for the same reason. It was such a ridiculous downgrade. I just barely (today actually) got back a super important feature to me that Play Music had years ago: Save a queue as a playlist. If this private API gets a C&D or gets aggressively broken, I absolutely will bail too.

Fwiw I've played with the Spotify api before for managing playlists in my account and that was not complicated. The GDPR export is also computer-readable (json) though it takes a literal month (sleep(rand(10,20)*days) for the initial set, then contact support to also get the "diagnostic" data associated with your account and wait some more). But it's probably more work for you to switch than it's worth.

+1 Social Credit Points for posting "SPOTIFY BAD" in a public forum


Worth noting that if you automate too much with this API, you'd be smart to not do it with a Google account you care about, or it'll get banned.

And they ban any accounts with matching recovery or verification phone numbers and email addresses too, or part of the same gsuite domain.

Sadly, I don't think it will let users view videos that require login.

For example, some 6-minute show that airs live on French TV every day got flagged and requires login to be viewed, for age reasons.

Maybe there were curse words, or some butt-shaped thing in it?

Does the OAuth login give you full access like a cookie login?

Personally I'm using Firebase's private API because the public API is missing a couple of features. Currently I just regularly extract my Google cookies from Firefox and use those. But an easy login that grants access to the private API would be cleaner of course.

Very interesting. Does anyone know how stable the InnerTube API is?

I'm always nervous about 3rd party API wrappers. It's basically saying "Here's the keys to my (users) Google account, please don't do anything bad". Even if it's open source, there's no guarantee there's no malicious change in version x.x.N+1

You could just pin to a version like a good dev and rid yourself of the problem.

Just do a quick diff check when you upgrade. If you're too lazy to do that then you never really cared about security all that much in the first place.

> just pin to a version like a good dev

Good devs don't keep up with dependencies' security updates or was this meant ironically?

Also, you'd have to pin to a hash, I imagine. Changing the version tag and pushing with -f doesn't sound like rocket science for someone with malice in mind. Next time you deploy it, you'll still get that code.
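As a sketch of what pinning to a hash means in practice: compute a digest over the exact bytes you reviewed and refuse anything that does not match. The `sha512-<base64>` format below is the one npm records in the `integrity` field of `package-lock.json`; the helper names are made up for illustration.

```javascript
const crypto = require("crypto");

// Compute an SRI-style digest ("sha512-<base64>") over the artifact bytes.
function integrityOf(bytes) {
  return "sha512-" + crypto.createHash("sha512").update(bytes).digest("base64");
}

// Refuse to use an artifact whose bytes do not match the pinned digest.
function verifyPinned(bytes, pinnedIntegrity) {
  const actual = integrityOf(bytes);
  if (actual !== pinnedIntegrity) {
    throw new Error(`integrity mismatch: expected ${pinnedIntegrity}, got ${actual}`);
  }
  return true;
}
```

A moved version tag changes the bytes but not the digest you recorded, so the check fails instead of silently deploying the new code. For a Git dependency, pinning the full commit hash plays the same role.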

WHAT? How am I supposed to make a social media calculator flashlight app without pulling in 200 random libraries and their 5000 dependencies?

Any software you use might steal your YouTube credentials.

It might be this API wrapper or it might be any other dependency. It might even be the scientific calculator you installed that had nothing to do with that project.

What makes this especially scary?

Technically nothing can be trusted e.g. can anyone trust their silicon, wires, device drivers, compilers, OS, routers, SSL CAs, etc.? Trust has to happen at some point.

The difference is that it's trivial for this developer to insert a backdoor to steal Google credentials since they know exactly how and where the oauth tokens are located. It's significantly harder for e.g. a webpack developer to insert a backdoor to steal Google credentials since they would have to first determine the code it's processing is handling oauth tokens and then figure out where they are stored.

The barrier to entry is the key in assessing your threat model. 3 letter agencies may not care about the cost if the target is valuable but a bored kid on the other side of the world will give up pretty quickly.

> The difference is that it's trivial for this developer to insert a backdoor to steal Google credentials since they know exactly how and where the oauth tokens are located

So does all the malware: your browser's cookie store

Not every application runs in the browser i.e. not SPA and any injected frontend js would have to bypass browser's sandboxing to steal another domain's cookies i.e. a zero day which is beyond your threat model

> Not every application runs in the browser

We're talking about YouTube in this thread.

> any injected frontend js would have to bypass browser's sandboxing to steal another domain's cookies i.e. a zero day which is beyond your threat model

Where did this random unrelated attack vector come from? We're talking about running untrusted software on your computer, remember? That's the attack vector we're discussing.

Your point was "malware in random unrelated software won't know where to look for my YouTube session key", my response was "it will".

> We're talking about running untrusted software on your computer, remember?

You are talking about running untrusted software on your computer. This thread is about the "youtubei.js" npm package that is acting as a wrapper around YouTube's API.

My point is it's trivial for this developer to add additional code to `youtube.signIn(creds)` that I'm calling vs. any developer of one of my dependencies to inject code to steal the same creds. If it's frontend code, their injected code cannot execute beyond mydomain.com. If it's backend code, their injected code would have to guess how my application works. I guess they can trivially dump my env variables but that's about it.

After giving this topic a bit more thought, you are partially right. Any malicious dependency developer can simply pollute the XMLHttpRequest prototype to steal the results of oauth calls. If this is the threat model, then we can simply Object.freeze(XMLHttpRequest). But at this point, we're in a game of cat and mouse.

Ultimately, it's still (somewhat) harder for you to steal credentials from injecting code into my app vs I explicitly calling your code with the credentials.
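To illustrate the prototype-pollution point, here is a small sketch using stand-in classes, since XMLHttpRequest does not exist in Node. The class and function names are hypothetical. One detail worth noting: `Object.freeze` is shallow, so freezing only the constructor leaves its prototype writable; the sketch freezes both.

```javascript
// Stand-ins for XMLHttpRequest; both names are hypothetical.
class Unprotected {
  send(body) {
    return "sent:" + body;
  }
}

class Protected {
  send(body) {
    return "sent:" + body;
  }
}

// Freeze the constructor and, crucially, the prototype that holds send().
Object.freeze(Protected);
Object.freeze(Protected.prototype);

// What a malicious dependency might try: wrap send() to copy data out.
// Reflect.set returns false instead of throwing when the target is frozen.
function tryInstallSpy(klass, leaked) {
  const original = klass.prototype.send;
  const wrapper = function (body) {
    leaked.push(body);
    return original.call(this, body);
  };
  return Reflect.set(klass.prototype, "send", wrapper);
}
```

The freeze has to run before any untrusted code is loaded, and it only closes this one avenue, which is the cat-and-mouse game mentioned above.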

> If it's frontend code

Since you cannot use youtube.js in a browser (because it's not supported and because CORS checks would fail), we can only be talking about 'backend' code here.

Thus the comparison has to be with other backend dependencies and software running on your computer/servers.

Nice, thanks for sharing!
