Hello everyone. I quickly built this little tool based on public records provided by the government.
I did the same a couple of weeks ago with H-1Bs, and I just added over 1.1 million green card records.
The data provided is not perfect, so I'm still working on cleaning it, but it should give you an idea of what is going on.
So you just dump the data in and process it via SQL? What about pre-processing, i.e. different data sources (PDF, CSV), incomplete or overlapping data? I imagine some code had to be written to do this?
All my data sources were CSV and MDB files, so I imported them directly into PostgreSQL. Very little code was written to clean the data. 99% of the cleaning was done with SQL queries.
The code is mostly used to display the data. Cleaning the data with SQL queries is much faster than writing code.
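For anyone curious what SQL-first cleaning can look like, here is a minimal sketch. It is not the site's actual code: the table `perm_raw`, its columns, and the sample rows are made up for illustration, and it uses Python's built-in `sqlite3` instead of PostgreSQL only so that it runs on its own.

```python
import sqlite3

# Hypothetical raw table; the names and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE perm_raw (employer TEXT, state TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO perm_raw VALUES (?, ?, ?)",
    [
        ("  Example Corp ", "ca", "Certified"),
        ("EXAMPLE CORP", "CA", "certified"),
        ("Sample LLC", "", "DENIED"),
    ],
)

# Cleaning is done with set-based UPDATE statements, not row-by-row code.
conn.execute("UPDATE perm_raw SET employer = UPPER(TRIM(employer))")
# Empty state strings become NULL so they are not counted as a real state.
conn.execute("UPDATE perm_raw SET state = NULLIF(UPPER(TRIM(state)), '')")
conn.execute("UPDATE perm_raw SET status = LOWER(TRIM(status))")

cleaned = conn.execute(
    "SELECT employer, state, status FROM perm_raw ORDER BY rowid"
).fetchall()
```

Each `UPDATE` touches the whole column at once, which is presumably why this kind of cleanup tends to be quicker to write in SQL than in application code.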
Great tool! There seems to be a bug though when I drill down to year plus country of citizenship and then try to filter by job description and state or city. The job description search term ends up in the state input field.
This looks great!
Can the green card data be shown by category? For example, how many green card applications were filed in each category, like EB1, EB2, etc.
Kinda weird that all the time series charts have time going backwards. Going from lower to higher is certainly the standard way to do it. For example: http://data.jobsintech.io/companies/google-inc
Just as an FYI, citizenship is not the same as country of chargeability (usually country of birth), which is what the USCIS looks at for placing you in EB-ROW/India/China/Mexico/Philippines. So you could have an Indian passport, but if you were born in, let's say, Saudi Arabia, you are placed in EB-ROW.
That's correct. Also, if your spouse was born in a different country than you, you could charge your application to that country. For example, if your spouse was born in Kenya but you were born in India, you can charge your GC app to Kenya, which is ROW (Rest of World).
What is this?
This website indexes all available LCAs from 2001 to 2015.
Where does the information come from?
LCAs are public records and provided by the "Office of Foreign Labor Certification".
IMHO, this is misleading. LCAs don't have a 1-1 correlation to 'Green Cards'. This data is based on the PERM process, which is just one stage in the green card process. LCAs aren't green card applications, but a labour market test based on which green cards are applied for.
This is roughly consistent with what I know: Indian engineers took nearly half of all H1Bs, while their major peer, China, is taking less than 10%.
I could never figure out what's going on to make such a big gap. I would think each taking roughly 15-20% would make more sense.
One theory is that Indian IT giants are applying for lots of H1Bs and then filling them when they're approved; they know the system too well. Meanwhile, Chinese IT workers/students don't have that kind of group effort.
Also, Indian managers like to hire Indians, while Chinese managers do the opposite. Over time that also makes a big difference.
I work at Microsoft and the number of Indians there seems disproportionate to me. I too wonder why so many of them come to the US. Maybe IT education in India is really strong?
Two funny things I observe regularly:
1) Sometimes I take an MS shuttle to work. The shuttle is full, and I'm the only non-Indian in it.
2) Sometimes I'm in my building's lobby and when I look around, I'm the only (or one of 2-3 people) non-Indian there.
I pity those guys because they have to wait for 10+ years to get their Green Cards, even if they're EB-2.
More like systemic visa sponsorship fraud for reasons of cost saving. The visa candidates are often as not frauds as well: http://www.dawn.com/news/1080040
The Indian IT community is much more entrenched in the US market. If a Chinese engineer wants to find H-1B work, he will most likely have to go at it by himself, and you know how hard that is.
Language is another barrier. While Chinese engineers most likely read English, few speak it fluently.
Lastly, and this is subjective, finding work in China is easy for good engineers. They also enjoy upper middle class pay AND social status. So while moving to the States means a higher salary, it does not necessarily translate to a better life.
In general, the culture in China encourages individuals to be their best and to care less about teamwork/group leadership. It's slowly changing, but this will take some time.
The 'salary' feature is pretty telling. I just looked up a former coworker by searching for the company and narrowing down by the hire year. I knew what country he was from so I was able to see his (starting) salary.
But I work for a private company, and I would bet the majority of HN readers do too. Imagine if your starting salary was known to your coworkers on the day you started, but not vice versa.
Can somebody explain to me why Americans need to have a green card to work in the US? According to the data, there are 1659 applicants with a success rate of 68%.
Maybe it refers to US nationals as opposed to US citizens. For example, the people of American Samoa and Guam are US nationals but not US citizens, at least not automatically. But I don't know what all the differences between the two categories are.
Ah, I should have made it more clear. It's the decision date, not the application date. So probably those applicants applied when the USSR was still a thing.
Yes, and that's why I'd love to have another column to figure out if there are any countries that have statistically significantly higher/lower green cards per capita :)
It's probably not a super useful metric, but it would be interesting to see
Is there some filter on this data? Green Card for a certain type of job? India has 280K greencards and Mexico only 40K greencards in 16 years? That doesn't seem right.
I assume this is only employment green cards. Most Mexican Americans probably come in on family based green cards, or just got citizenship by birth in the US.
Even China has only 45K green cards. They seem to follow the same pattern as India for immigration, so what category are they coming in under? Last I checked, both countries (India, China) have a similar number of immigrants in the US.
I don't know, but the EB2 and EB3 backlogs for India are longer than for China so there must be some difference.
Assuming those numbers are from PERM applications though, they don't correspond one-to-one to immigrants. For example Indians have a multi year wait to get a green card after their PERM is approved, and anytime they change employers while waiting they will likely apply for a new PERM, so by the time they get their green card one Indian could have gone through many PERM approvals (which this website will probably count separately).
As mentioned in one of the other comments, you're getting NaN% for some success rates. I'm guessing it's because you're using the number of certified as the denominator, which will return NaN if it's 0.
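A rough sketch of the guard I mean, in Python. This is not the site's code, and the function names are hypothetical. Whichever count ends up as the denominator, the zero case needs its own branch: in JavaScript 0/0 evaluates to NaN, while in Python it raises `ZeroDivisionError`.

```python
from typing import Optional


def success_rate(certified: int, total: int) -> Optional[float]:
    """Return certified/total as a percentage, or None if there is nothing to divide by."""
    if total <= 0:
        return None
    return 100.0 * certified / total


def format_rate(rate: Optional[float]) -> str:
    """Render a rate for display, showing 'n/a' instead of NaN% when it is undefined."""
    return "n/a" if rate is None else f"{rate:.0f}%"
```

Returning `None` rather than 0 keeps "no decisions yet" distinct from "every application was denied".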
Also, maybe this is a dumb question, but is this all green cards? For example, your site says there were 261 green card recipients from Nigeria, which seems quite low.
> Also, maybe this is a dumb question, but is this all green cards? For example, your site says there were 261 green card recipients from Nigeria, which seems quite low.
I wondered the same thing. Fiji is listed with 2 recipients (last I checked Fiji's annual quota was 600).
Excellent way of drawing attention to the subject! Kudos.
Might be interesting to explore the option of adding a beautiful data vis on the home page to capture the user's attention from the get-go. Looking at the sub pages you're obviously skilled enough at creating fascinating visualizations.
Are these across all LPR categories—i.e. employment based, family based, DV, asylum, etc.? The numbers make me inclined to think not, but I can't find any explanation on the website of what it covers.
If you ever get the chance, I would consider redoing the graphs using https://dc-js.github.io/dc.js/ since being able to interact with data by clicking on graphs can make a massive difference to the way you can interrogate data. Kudos though.
Search fields don't appear to work properly (Safari on OS X) - if you key in a full search (title, city, state) and hit 'refine' it puts the title in all three fields while showing you everything, not just search results. Interesting site though - I want to find me :)
So reading into India's stats, there was a dip in 2013 followed by a rise to the maximum number of GC applications by Indians ever? Is there a way to tell how many of those were upgrades? That would be interesting, because then you could tell how many might just be repetitions.
The dates in this dataset are "decision" date, not the application date.
In that year, there was a rapid forward movement in the backlog of India's GC applications... Green cards have a per-country cap; at the end of the fiscal year, USCIS can use the "unused" GC quota of other countries for the backlogged countries. This is why you see that the "priority date" for India stays at a fixed place (e.g., 2005 for EB2 India) for most of the year, and then suddenly moves forward at the end of the fiscal year.
If you have any questions, feel free to ask.