Last month (https://news.ycombinator.com/item?id=13080505) I started a small campaign to update the thread format and make it more parser-friendly for whoishiring.io and other websites. I know of at least a few websites that do a similar thing.
As a result of this call, many posters actually complied, which resulted in more accurate map positions, better tagging (REMOTE, VISA, INTERNSHIP, …) and, for some, I was even able to get logos. Thanks!
Here is the format.
1) {company} | {job title} | {locations} | {attrs: REMOTE, INTERNS, VISA, company url}
Google | Software Developer | SF | VISA https://google.com
DuckDuckGo | Software Developer | Paoli PA | REMOTE, VISA
or
2) {company} | {job title} | {locations}
Google | Site Reliability Engineer | London, Zurich, Sydney
Facebook | Web-developer | London, Zurich
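For anyone writing their own parser, here is a minimal sketch of how a first line in either format could be split into fields. It is an illustration, not the code whoishiring.io runs. The function name `parse_first_line`, the `KNOWN_FLAGS` set and the choice to split locations on commas are all assumptions of mine.

```python
import re

# Assumed flag names, taken from the attrs shown in the format above.
KNOWN_FLAGS = {"REMOTE", "INTERNS", "VISA"}


def parse_first_line(line):
    """Split a 'company | title | locations [| attrs]' line into a dict.

    Returns None unless the line has three or four non-empty leading fields.
    """
    fields = [part.strip() for part in line.split("|")]
    if len(fields) not in (3, 4) or not all(fields[:3]):
        return None

    company, title, locations = fields[:3]
    flags, url = [], None
    if len(fields) == 4:
        # Attributes may be separated by commas or by plain whitespace.
        for token in re.split(r"[,\s]+", fields[3]):
            if token.upper() in KNOWN_FLAGS:
                flags.append(token.upper())
            elif token.startswith(("http://", "https://")):
                url = token

    return {
        "company": company,
        "title": title,
        # Assumption: multiple locations are comma-separated.
        "locations": [loc.strip() for loc in locations.split(",") if loc.strip()],
        "flags": flags,
        "url": url,
    }


print(parse_first_line("Google | Software Developer | SF | VISA https://google.com"))
print(parse_first_line("Google | Site Reliability Engineer | London, Zurich, Sydney"))
```

Splitting on the pipe first keeps the parser tolerant of commas inside the location list, which is why the pipe works well as the outer separator.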
* I have limited character allocation to wax poetic, and my listing is for a company (Pivotal) with 19 established offices and more that aren't really publicised yet.
* The requirement for positions in given locations changes constantly. One month I went through our list and posted those locations. By the end of the month, it was out-of-date.
* Then there's the problem that I'm listing for multiple disciplines. Adding an ad for every single role seems like it would be detrimental.
* Last but not least: I can earn a referral bonus from the ads I post. I'm not sure how my ad being slurped into a different site helps me, given that I expect my link might not be very prominent. (edit: except you seem to reserve a spot for URLs, so let's drop this one and chalk it up to "Jacques speaks before he reads, episode 20 kajillion")
Still, I'd be interested in making it work better.
I modified it a bit, for when you're downloading the HTML job files (if that's what you're doing):
e.g.
import re
m = re.match(r"\s*(?P<company>[^|]+?)\s*\|\s*(?P<title>[^|]+?)\s*\|\s*(?P<locations>[^|]+?)\s*(?:\|\s*(?P<attrs>[^|]+?)\s*)?$", "spencertechconsulting.com | web developer | New York | REMOTE, INTERNS")
EDIT:
You missed the last \| pipe symbol for separating the location group from the attribute group
https://whoishiring.io
If you post here:
I’m using this regex to test the first line. You can test it in Python or here: https://regex101.com/r/relwQD/3 (for the match, look on the right).
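If you would rather check your first line locally in Python, something like the sketch below should work. The pattern is an illustrative one I wrote for the format described in this thread, not necessarily the exact regex behind the link above, and `check_first_lines` is a hypothetical helper name.

```python
import re

# Illustrative pattern only; an assumption, not the regex from the link.
FIRST_LINE = re.compile(
    r"^\s*(?P<company>[^|]+?)\s*\|"
    r"\s*(?P<title>[^|]+?)\s*\|"
    r"\s*(?P<locations>[^|]+?)\s*"
    r"(?:\|\s*(?P<attrs>[^|]+?)\s*)?$"  # the attrs field is optional
)


def check_first_lines(lines):
    """Return (line, groupdict or None) pairs, one per candidate first line."""
    results = []
    for line in lines:
        match = FIRST_LINE.match(line)
        results.append((line, match.groupdict() if match else None))
    return results


samples = [
    "DuckDuckGo | Software Developer | Paoli PA | REMOTE, VISA",
    "Facebook | Web-developer | London, Zurich",
    "We are hiring engineers in London!",
]
for line, groups in check_first_lines(samples):
    print("OK  " if groups else "FAIL", line)
```

A line that fails here would most likely be skipped or mis-tagged by a parser built on the same idea, so it is a cheap check before posting.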