Show HN: Crawlab: Open-Source Web Crawler Admin Platform That Runs Any Language (github.com)
116 points by tikazyq 63 days ago | 18 comments



Hi,

Thanks for the upvotes.

Crawlab is a Golang-based distributed web crawler management platform. It supports various languages, including Python, Node.js, Go, Java, and PHP, and various web crawler frameworks, including Scrapy, Puppeteer, and Selenium. Technically, you can run any spider on it. It supports both English and Chinese.

GitHub repo: https://github.com/tikazyq/crawlab
Demo: http://crawlab.cn/demo

Since its launch in March, Crawlab has received a lot of positive feedback, especially about its flexibility and appealing web UI. Crawlab is also evolving fast: we have developed many features through continuous iterations.


A good project. How does it compare to the Python-based Scrapy (scrapy.org)?

For admin, you can use scrapyhub open source, or another project: https://github.com/Gerapy/Gerapy


Scrapy is a web crawler. This project is a web crawler management UI/platform, so it presumably manages your scrapy crawlers/instances and schedules them.


So if I understand correctly, Crawlab is another simple, easy-to-use admin for managing web crawlers; one still needs to use Scrapy or write their own crawlers. It should be similar to the admin tool I mentioned in my earlier comment and to those at:

https://github.com/topics/scrapy-ui

https://github.com/topics/scrapyd-ui


There are a couple of crawler management projects: scrapydweb, spiderkeeper, gerapy, crawlab. The first three are based on scrapyd.


Great to see an awesome product - primarily in Chinese!!! That will teach me to not take English language domination for granted!!


Certainly interesting to see English's domination increasingly challenged on open source tech projects. However, this makes contributing harder for non-Chinese speakers. I had a look at the GitHub issues page, and all the discussion is in Chinese. Google Translate can help, but I'm not sure it would be enough for some subtle problems. I'm also not sure how communication would go with PRs if part of the team is strictly sinophone.

Great project nonetheless. Will likely give it a try. Keep up the excellent work!


We'll have to get used to this and I think it's actually a good thing.

Of course, a lot of folks speak English, but Chinese is also very important and will become more so worldwide.


I really appreciate having a "lingua franca" of programming. Projects in other languages are certainly interesting to see, but I also appreciate that most authors use English, since it contributes to a larger worldwide community.


do you really believe that?


Thanks for the feedback. Actually, I saw a lot of great Chinese projects on GitHub trending, and sadly they are Chinese-only. I would definitely agree they can do better by translating into English!


Thanks all for the upvotes and positive feedback for Crawlab. Crawlab is mainly focused on Chinese because it was initially promoted on mainland China tech sites (Juejin, V2ex, etc.). Due to the Great Firewall (GFW), we cannot access information outside China, so it is difficult for us to hear feedback from non-Chinese developers.

We would definitely be happy if more contributors joined Crawlab development, so we will be working on improving multi-language support, including English documentation, a Code of Conduct, Contributing.md, and English communities. Our team is small (please check out the Contributors section), but its members come from top companies in China, and we would be happy to share knowledge between Chinese and non-Chinese developers.

Btw, what is the best tech community? (In China, we have WeChat groups.)


Looks like a cool project, however I can't seem to get into the demo (it seems to indicate using admin/admin but that doesn't work).

Would be great to have an English language option on the demo login :)


Thanks @atymic for the feedback. The initial password for admin has been changed so that no harmful actions can be performed on the demo. You can still sign up to check out the demo.

And we do have an English version, but not on the login page. We will definitely add it there.


Sounds like you're using redis as a message broker for tasks here. Are you using redis streams?


No, we are using Pub/Sub for message communication between nodes. For tasks, we are using a hashed list. English documentation is missing, but we will add it later.
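To make the distinction concrete, here is a rough sketch of how a broadcast-style node message differs from a queued task. This is not Crawlab's actual code: the key names (`tasks:<node>`, `nodes:<node>`) and the helper functions are hypothetical, and "hashed list" is read here simply as one Redis list per node, which is only one possible interpretation. The `FakeRedis` class is an in-memory stand-in so the sketch runs without a server; it mimics only the `rpush`, `lpop`, and `publish` calls that a redis-py client also provides.

```python
import json
from collections import defaultdict, deque


class FakeRedis:
    """In-memory stand-in exposing the few redis-py-style methods used below."""

    def __init__(self):
        self.lists = defaultdict(deque)
        self.published = []  # (channel, message) pairs, kept for inspection

    def rpush(self, key, value):
        self.lists[key].append(value)
        return len(self.lists[key])

    def lpop(self, key):
        return self.lists[key].popleft() if self.lists[key] else None

    def publish(self, channel, message):
        self.published.append((channel, message))
        return 0  # number of subscribers reached; none in this stand-in


def task_key(node_id):
    # Hypothetical key layout: one list of pending tasks per node.
    return f"tasks:{node_id}"


def enqueue_task(client, node_id, task):
    """Queue a task for a node. It stays in the list until a worker pops it."""
    client.rpush(task_key(node_id), json.dumps(task))


def next_task(client, node_id):
    """Pop the oldest pending task for a node, or return None if there is none."""
    raw = client.lpop(task_key(node_id))
    return json.loads(raw) if raw is not None else None


def notify_node(client, node_id, message):
    """Send a fire-and-forget message. Pub/Sub does not store it for later."""
    client.publish(f"nodes:{node_id}", json.dumps(message))
```

The practical difference: a published message is lost if no node is subscribed at that moment, while a task pushed onto a list waits until some worker pops it, which is why a list is the more natural fit for tasks.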


Cool stuff. Does it really run any language, or only languages that have had integrations written?


Crawlab is based on shell execution, so basically anything that can be run in a shell can be run on Crawlab, i.e. any language.
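As an illustration of the idea (not Crawlab's implementation, which is written in Go), a minimal sketch of shell-based execution might look like the following. The function name `run_spider` and its parameters are made up for this example; the point is only that the platform needs a command string, not a language-specific integration.

```python
import subprocess


def run_spider(command, workdir=None, timeout=60):
    """Run a spider's start command through the system shell.

    Returns (exit_code, stdout, stderr). The command can start a program
    written in any language, as long as the shell can launch it.
    """
    result = subprocess.run(
        command,
        shell=True,           # hand the whole string to the shell
        cwd=workdir,          # directory containing the spider's files
        capture_output=True,  # collect stdout/stderr, e.g. for task logs
        text=True,            # decode output as str instead of bytes
        timeout=timeout,      # raises subprocess.TimeoutExpired if exceeded
    )
    return result.returncode, result.stdout, result.stderr
```

With this shape, `run_spider("scrapy crawl quotes")`, `run_spider("node spider.js")`, and `run_spider("go run main.go")` would all be handled the same way, assuming the corresponding runtime is installed on the node.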



