Show HN: I added public H-1B data to my automated job board (techjobs.xyz)
3 points by james_dev_123 5 months ago | 2 comments
I maintain https://techjobs.xyz, an automated tech job board. Every morning I scrape the websites of ~20k tech companies and feed the data through GPT, which labels each job's category, location, seniority, etc.

I recently downloaded the publicly available H-1B dataset from USCIS and ran a set of scripts to sort and categorize the data.

I merged this data with each company's listing on techjobs.xyz, and now you can see the number of petitions made by each company.
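
For anyone curious what a merge like this could look like, here is a minimal Python sketch. It is not the actual techjobs.xyz code: the record shapes, the field names (`employer`, `fiscal_year`, `petitions`, `h1b_by_year`), and the name-normalization rules are all assumptions for illustration. Employer names in public filings rarely match a company's display name exactly, so the sketch joins on a normalized key.

```python
import re
from collections import defaultdict

# Hypothetical list of legal suffixes to strip; a real list would be longer.
_SUFFIXES = {"inc", "llc", "corp", "corporation", "co", "ltd"}


def normalize_name(name: str) -> str:
    """Lowercase, drop punctuation, and strip trailing legal suffixes."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    while tokens and tokens[-1] in _SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def petitions_by_company(h1b_rows):
    """Sum petitions per normalized employer name and fiscal year."""
    totals = defaultdict(lambda: defaultdict(int))
    for row in h1b_rows:
        key = normalize_name(row["employer"])
        totals[key][row["fiscal_year"]] += row["petitions"]
    return totals


def merge_h1b(companies, h1b_rows):
    """Return copies of the company listings with a {year: petitions} dict attached."""
    totals = petitions_by_company(h1b_rows)
    merged = []
    for company in companies:
        by_year = dict(totals.get(normalize_name(company["name"]), {}))
        merged.append({**company, "h1b_by_year": by_year})
    return merged
```

Exact matching on a normalized name is the simplest possible join. It will miss subsidiaries and renamed companies, so a real pipeline would likely need fuzzy matching or a manual alias table on top of it.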

You can also filter for companies that have sponsored H-1B visas in the past (through the "Advanced Filters" tab) and see exactly how many petitions they've filed each year.
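
A filter like this can be sketched in a few lines of Python. This is only an illustration, not the site's implementation: it assumes each company record carries a hypothetical `h1b_by_year` dict mapping a year to a petition count, and the `since_year` option is an invented extra.

```python
def has_sponsored(company, since_year=None):
    """True if the company filed at least one petition, optionally since a given year."""
    by_year = company.get("h1b_by_year", {})  # hypothetical field name
    return any(
        count > 0 and (since_year is None or year >= since_year)
        for year, count in by_year.items()
    )


def filter_sponsors(companies, since_year=None):
    """Keep only companies with past H-1B filings."""
    return [c for c in companies if has_sponsored(c, since_year)]
```

Calling `filter_sponsors(companies, since_year=2022)` would keep only companies with at least one filing in 2022 or later, which matters because a company that sponsored years ago may no longer do so.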

As a Canadian working in the US, I initially built this feature for myself.

But I know that a lot of Hacker News users are also working in the US on some sort of visa and need to know whether a company can provide visa sponsorship before applying, so I wanted to share my work with the wider community.

Please leave any feedback here, or email me at dorfmanjames@gmail.com




How long does it take to run your scraping pipeline?


End to end, it takes ~10 hours. The biggest bottleneck is the scraping itself; the GPT part is relatively quick.



