Common Crawl Foundation | REMOTE | Full and part-time | https://commoncrawl.org/... | Hacker News

ccgreg 7 months ago | on: Ask HN: Who is hiring? (May 2024)

Common Crawl Foundation | REMOTE | Full and part-time | https://commoncrawl.org/ | web datasets

I'm the CTO at the Common Crawl Foundation, which has a 17-year-old, 8-petabyte crawl and archive of the web. Our open dataset has been cited in nearly 10,000 research papers and is the most-used dataset in the AWS Open Data program. Our organization is also very active in the open source community.

We are expanding our engineering team. We're looking for someone who is:

* Excited about our non-profit, open data mission

* Proficient with Python, and hopefully also some Java

* Proficient with cloud systems such as Spark/PySpark

* Willing to learn the rest: crawling, parsing, indexing, etc.

Contact me at jobs zat commoncrawl zot org.

Gloria_Kambua 7 months ago

This is interesting. I just sent my application. It's a little late, but I'm hopeful.

yusufgp 7 months ago

This looks like an exciting opportunity. I sent you an email just now. Thanks.

rcshubhadeep 7 months ago

Very excited. Just applied.

ccgreg 7 months ago

Thanks!
