Hacker News

Use Selenium or Puppeteer with stealth plugins and a real user agent, and I suspect it will be a lot harder for them to block you.



Selenium is still relatively easy to detect via JS, but on the other hand detecting it does require some dedicated[0] effort, even if only the user agent is overridden, and that override by itself might be enough to keep working for a relatively long time. The other important thing is rotating over different IPs that appear realistic[1].

[0] Dedicated in the sense of "let's include some script specifically designed to detect Selenium".

[1] For example, it's kind of unlikely that a non-bot visitor is using an IP from a range used by Amazon instances. I'm not sure how often this is used, but I assume that most bot-detection systems would use that information as at least one of their metrics.
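To make [1] a bit more concrete, here is a minimal sketch of how a detection system might check whether a visitor's IP falls inside a known hosting range. This is only my guess at the general idea, not how any particular system does it. The name `HYPOTHETICAL_DATACENTER_RANGES` and the CIDR blocks in it are placeholders (they are reserved documentation ranges, not real Amazon ranges); a real check would load the provider's published ranges instead.

```python
import ipaddress

# Placeholder CIDR blocks (reserved documentation ranges from RFC 5737).
# These are NOT real cloud-provider ranges; they only stand in for a list
# that a real system would load from the provider's published data.
HYPOTHETICAL_DATACENTER_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def looks_like_datacenter(ip: str) -> bool:
    """Return True if the address falls inside any listed range."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in HYPOTHETICAL_DATACENTER_RANGES)
```

A result like this would presumably be just one signal among several rather than a verdict on its own, which is also why rotating through realistic-looking IPs matters.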


I have written a browser extension.


How sophisticated is it? A browser extension sounds pretty high-level and limited to me. Does it just use the host browser's user agent? Or is it able to spoof multiple legitimate user agents, connect to FB via proxy servers, handle queueing, rotation, retirement, etc.? In which case, how is a browser extension more useful than [scripting language of choice]?


There are a lot of JS-based detection techniques that rely on certain things being or not being available. By using an actual browser you get a JS/WASM environment that is identical to what is expected from a real user. By using [scripting language of choice] you would need to emulate everything that a given bot-detection system tries to check.

From my very limited experience there are two main categories of websites:

1. Using curl and/or [library of your choice] in a [scripting language of choice] is enough.

2. Forget about not being detected without a full-blown browser, unless you want to spend endless hours trying to emulate whatever needs to be emulated and are also willing to burn some accounts and IPs in the process.
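For the first category, something like the sketch below is roughly all it takes, assuming Python as the scripting language and only the standard library. The `BROWSER_UA` string, the `Accept-Language` value, and the helper names `build_request` and `fetch` are illustrative choices of mine, not anything a specific site requires.

```python
import urllib.request

# Hypothetical browser-like user agent, chosen only for illustration.
BROWSER_UA = (
    "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0"
)


def build_request(url: str) -> urllib.request.Request:
    """Build a GET request that carries browser-like headers."""
    return urllib.request.Request(
        url,
        headers={
            "User-Agent": BROWSER_UA,
            "Accept-Language": "en-US,en;q=0.5",
        },
    )


def fetch(url: str, timeout: float = 10.0) -> bytes:
    """Send the request and return the raw response body."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        return resp.read()
```

For the second category this gets you nowhere, since no JS ever runs and the checks described above have nothing to inspect.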


A lot of questions here. In short, I have tried to imitate how I browse Instagram myself, but not in a very sophisticated way. I think that higher sophistication might have helped a little bit. Scripting-based solutions were detected within minutes, while the browser-based solution was blocked only after 20+ hours. I have not used any proxies; I think it can be done, but it does not solve the problem.

To answer the last question, it depends on what you mean by scripting language. Let's assume it is Python: then you have two choices, imitate a browser or risk being detected as a bot quite fast. Writing a browser extension is quite easy, and you can imitate a real user quite easily. The only problem you will have is imitating a real human being in a way that does not match Instagram's bot detection algorithm.



