Tell HN: Toptal's HTML minification API occasionally injects tracking JavaScript
97 points by cududa on Sept 28, 2022 | 25 comments
Just a heads up for anyone using their API - about 1 in 5 requests will return with Cloudflare Insights tracking JS. It's not mentioned anywhere in the API documentation, Privacy Policy, or ToS.

Fairly certain this is the package they based their service on https://github.com/tdewolff/minify

Edit: Here's a tweet I posted with a screenshot of the code: https://twitter.com/cullend/status/1575243757624360960?s=20&t=JVhqXDJExBnrEVOXeFH4Jg




My guess as to what is happening:

1. Toptal uses Cloudflare for DNS

2. Toptal uses Cloudflare Insights, which automagically inserts the tracking snippet from `https://static.cloudflareinsights.com/beacon.min.js/*`

3. You send the request to Toptal and somewhere along the line, Cloudflare misinterprets the request and injects the tracking code

It's not anything nefarious by Toptal. It's not their minify script. It's a misconfigured page rule, or similar. It's a bug.
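For anyone who wants to check their own minified output for the snippet described above, here is a minimal sketch. It assumes the injected tag is a `<script>` element whose attributes mention the `static.cloudflareinsights.com` host quoted in step 2; the function name is made up for illustration.

```python
import re

# Assumption: the injected tag loads its script from this host,
# as quoted in the guess above.
BEACON_HOST = "static.cloudflareinsights.com"

# Match any opening <script ...> tag whose attributes mention the beacon host.
_BEACON_TAG = re.compile(
    r"<script\b[^>]*" + re.escape(BEACON_HOST) + r"[^>]*>",
    re.IGNORECASE,
)


def has_cloudflare_beacon(html: str) -> bool:
    """Return True if the HTML contains a script tag pointing at the beacon host."""
    return _BEACON_TAG.search(html) is not None
```

Running this over every response before writing it to disk would flag the affected files without relying on eyeballing the output.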


Sounds pretty accurate! Hopefully they get it configured properly!


I really would like to know why anyone would use an API to minify javascript/css/html instead of using a local library, built-in webserver features, or relying on some front-end service like Cloudflare.

Feels like we've jumped the shark if you need to send un-minified stuff to another server to get minified, sort of defeating the point in the first place...


Using some node library you pull down is still vulnerable to exactly this same kind of injection attack vector.


Except you're not then pounding on some pet side-project "API" stood up by a poaching agency...

Seems insane anyone would even entertain the thought... it's trivially easy to minify whatever you want right within your app.

And now here's OP bagging on this company for not disclosing this in some ToS or whatever, even though OP is the one abusing a free, obviously not-serious service.


Well when you google "CSS Minify" or "JS Minify" it's the top result - so it clearly isn't some low use tool. I just decided to use it for HTML as well. Wasn't "pounding it". The API limits to 30 requests per minute. I made around 50 requests in 3 minutes.

Most local minification packages I tried spit out garbage, so just went with their API. Took a few hours to get a proper package stood up (the one I mentioned above).

I'd generally just assume that if you're injecting tracking/analytics into an output you'd tell folks. As they didn't mention it, thought I'd just alert folks. Not sure why you're so outraged I mentioned this is happening.


I'm not outraged, I'm flabbergasted.

Flabbergasted this idea made its way into a real app, and flabbergasted this post made its way onto the front page of HN.

You went to great lengths here to make sure everyone knows a random for-funsies "API" from a random google result doesn't behave the way you expected, complete with the multi-tweet write-up and everything.

Looking at what Toptal does, it seems glaringly obvious this "API" was set up for interview coding questions - nothing more.

You've dragged this company's name and reputation through the mud, right onto the front page of HN. All over a misunderstanding on your part, as we all now know (Cloudflare doing what Cloudflare does and all...), and a misconfiguration on the web admin's part (missing page rule apparently).

I'll leave it at that... but perhaps your post/thread should be updated to indicate this company is in fact not doing what you have accused them of.


It's like left-pad.io but real. Incredible


Closure compiler has been a thing since forever. Some people don't want to fiddle and don't care about owning their tools.


Well the API was easy to implement, took minutes. Was more just out of convenience for producing a lot of assets that then get served to a lot of people. None of it ever went into production because we caught the requests pretty quickly this morning. As well, most of the python packages would mangle the minification - the one I linked to in the top post works great - documentation isn't up to date for python, but not horribly hard to set up.


I think it is part of Cloudflare's browser insights, which is now part of their "Web Analytics". You will see this with almost any site using Cloudflare that has web analytics enabled.

https://community.cloudflare.com/t/beacon-min-js-as-malware/...

https://developers.cloudflare.com/analytics/web-analytics/

The site owner needs to configure Cloudflare correctly to add analytics on certain pages only; otherwise Cloudflare injects it by default.


Are you sure that's not Cloudflare injecting the JS on behalf of TopTal because they have CF analytics enabled?

But I agree with your point, this should have been mentioned in their ToS and/or Privacy Policy.


Can't be sure, but that would be rather frightening if cloudflare is just injecting JS willynilly


If CF is acting as a reverse proxy as it so often does — it's one of their core features — it's not exactly willynilly.


This is 90% in similarity to one of the top reasons we are pushing for HTTPS and DoH! The supposed "ISP" inserting "stuff" into our requests. Would be 100% except it's 5% less because the server "agreed" to it, and another 5% because it's not ads, just "tracking js". Maybe knock off an extra 80% because it's "Cloudflare" and they're the new "do no harm" giant like Google.


"about 1 in 5 requests" maybe makes it a little willy-nilly. It's not yet clear what's happening.


I wasn’t able to reproduce. The error pages do return the code, though, but I was expecting that. I’m hoping that OP can provide a request that does reproduce it.


Sure thing - there's a link at the top of the post with an example of code. HTML is about 200 lines, each one slightly different, did a 3 second wait before each post request (25 posts, 5 with the offending tracking code), used their python API
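The test described above (25 posts, a 3 second wait before each, count the responses carrying the tracking code) can be sketched roughly like this. The `minify` argument is a placeholder for whatever client call you use to hit the API; its name and signature are assumptions, not Toptal's actual Python API.

```python
import time
from typing import Callable, Iterable, Tuple

# Assumption: responses with the tracking code contain this substring.
BEACON_MARKER = "static.cloudflareinsights.com/beacon.min.js"


def count_injections(
    pages: Iterable[str],
    minify: Callable[[str], str],  # hypothetical wrapper around the API call
    delay_seconds: float = 3.0,
) -> Tuple[int, int]:
    """Minify each page and return (responses_with_beacon, total_responses)."""
    hits = 0
    total = 0
    for page in pages:
        # Wait before every request to stay under the rate limit.
        time.sleep(delay_seconds)
        output = minify(page)
        total += 1
        # Only count the marker when it was absent from the input.
        if BEACON_MARKER in output and BEACON_MARKER not in page:
            hits += 1
    return hits, total
```

Passing the client as a callable keeps the loop independent of any particular endpoint, so the same harness can be pointed at a local minifier as a control.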


Tried with all my might and no dice: https://gist.github.com/joshmn/c87a52b5328aa4f452be7970af4ba...

Now, are _you_ serving from Cloudflare and you're seeing this in production? Or is this coming from the output?


Ah - guess I can reply now here, but HN seems to be rate limiting me - let's perhaps DM on twitter to find repro steps?

But to your main question - the resultant HTML is occurring after running a python script fully locally, no hosting, prod, dev server, or anything related to Cloudflare coming into play.


My email's in my profile :)


That's sampling. Don't inject every time, just inject 20% of the time.


Cloudflare doesn't have any sampling setting anywhere close to 20%.


If you proxy your site through Cloudflare and turn on Cloudflare analytics, its default behavior appears to be to automatically add the JS tag for you in HTML responses [1] (you can also disable that and manually add the tag to your HTML)

They provide a lot of this type of thing - rewriting HTML responses with optimizations "at the edge" instead of you doing it at your origin.

[1] https://developers.cloudflare.com/analytics/web-analytics/ge...
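If you consume HTML from a service that sits behind this kind of automatic tag insertion and can't change its configuration, one defensive option is to strip the tag on your side. A rough sketch, assuming the tag is an empty `<script>` element whose attributes reference `static.cloudflareinsights.com`; the function name is made up, and regex-based HTML editing is brittle, so treat this as a stopgap.

```python
import re

# Assumption: the injected element is an empty <script> whose attributes
# reference this host.
_BEACON_ELEMENT = re.compile(
    r"<script\b[^>]*static\.cloudflareinsights\.com[^>]*>\s*</script>",
    re.IGNORECASE,
)


def strip_cloudflare_beacon(html: str) -> str:
    """Remove beacon script elements and leave every other tag untouched."""
    return _BEACON_ELEMENT.sub("", html)
```

Other `<script>` elements are left alone because the pattern cannot cross a `>` before it reaches the beacon host.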


Good lookin out. Appreciate the heads-up.



