
I tried to build a scraper in Rust just a few days ago and got stuck trying to limit the concurrency of my calls (the website I was scraping, appropriately, had a rate limit). I couldn't figure out how to get `tokio::stream`/`tokio_stream` to work. Does this fix that problem?


I assume the website has something like a "requests per second" quota, in which case you'd want the `governor` crate [0].

It was recently used to implement rate-limiting middleware for both tide [1] and actix [2].

[0] https://docs.rs/governor/0.3.1/governor/_guide/index.html

[1] https://github.com/ohmree/tide-governor

[2] https://github.com/AaronErhardt/actix-governor
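governor is built on the Generic Cell Rate Algorithm (GCRA). To show what a "requests per second" check actually decides, here is a hypothetical std-only sketch of GCRA; the `Gcra` type and its methods are illustrative names, not governor's API, and in a real scraper you'd use governor's `RateLimiter` instead.

```rust
use std::time::{Duration, Instant};

/// Hypothetical std-only GCRA limiter (NOT governor's API, just the same idea).
/// Allows bursts of up to `burst` requests, then one request per `period`.
struct Gcra {
    period: Duration,     // emission interval = 1s / per_second
    tolerance: Duration,  // how far ahead of schedule we may run = period * (burst - 1)
    tat: Option<Instant>, // "theoretical arrival time" of the next request
}

impl Gcra {
    fn new(per_second: u32, burst: u32) -> Self {
        assert!(per_second > 0 && burst > 0, "rate and burst must be positive");
        let period = Duration::from_secs(1) / per_second;
        Gcra { period, tolerance: period * (burst - 1), tat: None }
    }

    /// Ok(()) if a request at `now` fits the quota (and records it);
    /// otherwise Err(wait) with how long until it would fit. Rejections are not recorded.
    fn check_at(&mut self, now: Instant) -> Result<(), Duration> {
        // The schedule never lags behind the present, so idle time can't bank unlimited burst.
        let tat = match self.tat {
            Some(t) if t > now => t,
            _ => now,
        };
        let ahead = tat.duration_since(now);
        if ahead > self.tolerance {
            return Err(ahead - self.tolerance);
        }
        self.tat = Some(tat + self.period);
        Ok(())
    }

    /// Convenience wrapper using the current time.
    fn check(&mut self) -> Result<(), Duration> {
        self.check_at(Instant::now())
    }
}
```

On `Err(wait)` an async scraper would sleep for `wait` and try again; governor's direct limiter provides async waiting (e.g. `until_ready()`) so you don't have to write that loop yourself.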


Yes, you can either limit how many requests can be sent concurrently or enforce a fixed or random delay between consecutive requests.

Basically, this is just a `futures_timer::Delay` [0] that is reset after each request, so it doesn't block.

[0] https://docs.rs/futures-timer/3.0.2/futures_timer/
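The "delay reset after each request" idea boils down to remembering when the last request went out and waiting until `last + gap`. Below is a hypothetical std-only `Pacer` (not the framework's code) for the fixed-delay case; the random variant would add jitter, e.g. from the `rand` crate, which this sketch leaves out. The blocking `pace` helper is for threaded code only; in async code you would await `futures_timer::Delay::new(wait)` or `tokio::time::sleep(wait)` instead.

```rust
use std::time::{Duration, Instant};

/// Hypothetical helper: enforces a minimum gap between consecutive requests.
struct Pacer {
    gap: Duration,
    last: Option<Instant>, // when the previous request was sent
}

impl Pacer {
    fn new(gap: Duration) -> Self {
        Pacer { gap, last: None }
    }

    /// How long a request issued at `now` must wait before going out.
    fn wait_before(&self, now: Instant) -> Duration {
        match self.last {
            Some(last) => (last + self.gap).saturating_duration_since(now),
            None => Duration::ZERO,
        }
    }

    /// Record a send, which is the "reset the timer" step.
    fn mark_sent(&mut self, sent_at: Instant) {
        self.last = Some(sent_at);
    }

    /// Blocking convenience for threaded code: wait out the gap, then record the send.
    fn pace(&mut self) {
        std::thread::sleep(self.wait_before(Instant::now()));
        self.mark_sent(Instant::now());
    }
}
```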


How do you set the concurrency limit? I'm down to use your framework (thanks!); I'm just curious how you implement it. I couldn't get `Stream::buffered` to work correctly.


You can use `futures::StreamExt::buffer_unordered` for this. [1] is an example where I used it in a benchmark that creates a certain number of QUIC connections at a time.
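`buffer_unordered(n)` keeps at most `n` futures in flight and yields results as they complete (unlike `buffered`, which preserves input order). As a dependency-free illustration of that shape, here is a hypothetical thread-based analogue; `run_bounded` is an illustrative name, not a futures or tokio API, and real async code would use `stream::iter(urls).map(...).buffer_unordered(n)` instead of OS threads.

```rust
use std::sync::mpsc;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical std-only analogue of `buffer_unordered(limit)`:
/// at most `limit` jobs run at once, and results arrive in completion order.
fn run_bounded<T, R, F>(jobs: Vec<T>, limit: usize, f: F) -> Vec<R>
where
    T: Send + 'static,
    R: Send + 'static,
    F: Fn(T) -> R + Send + Sync + 'static,
{
    assert!(limit > 0, "limit must be at least 1");
    // Shared work queue: each idle worker pulls the next job from here.
    let queue = Arc::new(Mutex::new(jobs.into_iter()));
    let f = Arc::new(f);
    let (tx, rx) = mpsc::channel::<R>();
    let workers: Vec<_> = (0..limit)
        .map(|_| {
            let (queue, f, tx) = (Arc::clone(&queue), Arc::clone(&f), tx.clone());
            thread::spawn(move || loop {
                // The lock guard is a temporary, so it is released before the job runs.
                let job = queue.lock().unwrap().next();
                match job {
                    Some(job) => tx.send((*f)(job)).unwrap(),
                    None => break,
                }
            })
        })
        .collect();
    // Drop our own sender so the receiver ends once every worker has exited.
    drop(tx);
    let results: Vec<R> = rx.into_iter().collect();
    for w in workers {
        w.join().unwrap();
    }
    results
}
```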

But you can also spawn all tasks upfront via `tokio::spawn` and let them wait on a `tokio::sync::Semaphore` before making the request. The drawback is that you might allocate more memory for tasks upfront, but unless you have an extremely high number of tasks it might not matter.
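The spawn-everything-then-wait pattern can be sketched without tokio using OS threads and a small counting semaphore built from `Mutex` + `Condvar`. The `Semaphore`, `Permit`, and `fetch_all` names here are hypothetical stand-ins (std has no semaphore type), and `fetch` is a placeholder for the real HTTP call. With tokio the shape is the same: share an `Arc<tokio::sync::Semaphore>` and call `acquire_owned().await` at the top of each spawned task.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

/// Hypothetical std-only counting semaphore (stand-in for tokio::sync::Semaphore).
struct Semaphore {
    permits: Mutex<usize>,
    cv: Condvar,
}

/// RAII permit: gives its slot back to the semaphore when dropped.
struct Permit<'a>(&'a Semaphore);

impl Semaphore {
    fn new(permits: usize) -> Self {
        Semaphore { permits: Mutex::new(permits), cv: Condvar::new() }
    }

    /// Block until a permit is free, then take it.
    fn acquire(&self) -> Permit<'_> {
        let guard = self.permits.lock().unwrap();
        let mut free = self.cv.wait_while(guard, |n| *n == 0).unwrap();
        *free -= 1;
        Permit(self)
    }
}

impl Drop for Permit<'_> {
    fn drop(&mut self) {
        *self.0.permits.lock().unwrap() += 1;
        self.0.cv.notify_one();
    }
}

/// Spawn one thread per URL up front; each waits for a permit before its request.
/// `fetch` is a hypothetical placeholder for the real HTTP call.
fn fetch_all(urls: Vec<String>, limit: usize, fetch: fn(&str) -> usize) -> Vec<usize> {
    let sem = Arc::new(Semaphore::new(limit));
    let handles: Vec<_> = urls
        .into_iter()
        .map(|url| {
            let sem = Arc::clone(&sem);
            thread::spawn(move || {
                let _permit = sem.acquire(); // held until this closure returns
                fetch(&url)
            })
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}
```

Every URL gets its own thread immediately, which is exactly the upfront memory cost mentioned above; only `limit` of them are ever past `acquire` at the same time.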

[1] https://github.com/quinn-rs/quinn/blob/de627437bc7d836564c36...


This is a late comment, but this helped me out a lot. Thank you for the detailed explanation and code example, I appreciate it.


You can limit the number of concurrent calls using a semaphore, acquiring a permit before making each HTTP request.



