When people say to have a github repository, I often worry that people think they have to have some huge project. Like a fork of Node or something.
This analysis would be more than enough for me to give someone an interview.
The math isn't complex, the analysis is pretty shallow but it shows that the author knows their way around the basics of:
- finding data, this is often the hardest part of data analysis
- working with data, unpacking, storing, retrieving, etc
- basic analysis with R or python
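The kind of "basic analysis" described above can be just a few lines of pandas. This is a minimal sketch with made-up records; the column names are illustrative only, not the actual fields of any real loan-level dataset:

```python
import pandas as pd

# Toy loan-level records. Real datasets (e.g. Freddie Mac's) carry
# similar fields, but these columns and values are invented.
loans = pd.DataFrame({
    "vintage":   [2006, 2006, 2007, 2007, 2007, 2008],
    "fico":      [620, 710, 580, 650, 700, 690],
    "defaulted": [1, 0, 1, 1, 0, 0],
})

# Default rate by origination year -- the kind of one-liner that
# shows you can find, load, and summarize data.
default_rate = loans.groupby("vintage")["defaulted"].mean()
print(default_rate)
```

Nothing fancy, and that's the point: a short, readable groupby is already enough to demonstrate the three skills on the list.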
As a first introduction to a potential employee, this is more valuable than a good resume, and it's well within reach of most people, regardless of how busy you are!
TL;DR: don't overthink having a public github account. Basic analysis like this will put you above most other candidates. Oh, and good job to the author!
Would love to have a project, I just don't have the time. But since we use open-source libraries, the better the library, the less work I have to do patching it :)
Indeed. The OP author used to perform precisely this kind of data analysis for a hedge fund that invested in residential mortgage securities.
The other thing is that securities, mortgage loans themselves, securitizations and their derivatives will not trade in the market at fundamental value. This type of fundamental analysis is great, and you can make a lot of money by understanding value better than your competitors. When the market goes crazy, your portfolio market value can fall far below values calculated with these models. If that happens and you've leveraged your portfolio, you will go out of business. I know it seems like stating the obvious right now, but I saw a lot of very smart people who were caught in this trap.
Not being from the US, unaware of these institutions, and boggled at how the concept of the state backing fixed-rate mortgages could be sensible, I wrote my 3 pages and somehow got the job.
> It should not be overlooked that in the not-so-distant past, i.e. when I worked as a mortgage analyst, an analysis of loan-level mortgage data would have cost a lot of money. Between licensing data and paying for expensive computers to analyze it, you could have easily incurred costs north of a million dollars per year.
If it existed. It did not. Computers were not needed to analyse a nice big data set, because a nice, big, transparent data set did not exist. Those who did quite nicely were the ones who realized that a big data set didn't exist, and they got there by digging themselves, being confused, and realizing everyone else was confused / delusional too.
Splitting things by state and making data available is a step up in transparency. But it is like tuning an organ based on where the horn is, without understanding what notes are being played.
Provision of this type of data is a bad stitch on a bad gash. It confirms what has been known for years. A better question would be "If you're issuing bonds based on loans to people with a FICO 'thin file' score of 600, on whom you've not done basic background checks, and who are seeking to borrow 10 times their annual income, don't you see something wrong?"
Basic questions and understanding underlying data are more important than optimization of headline metrics.
I'm from the US, and I can't figure it out myself, without resorting to either stupidity or unethical intent. The corporate charter for Fannie Mae reads like the encyclopedia entry for moral hazard.
All gains go to private investors. All losses are implicitly insured by federal government bailout.
For those completely unfamiliar with the system, this is how Fannie Mae works:
We start with the prospective home buyer. This person, being American, wants a big house to hold all of his fancy consumer goods, and doesn't have quite enough in savings to pay cash for it. But never fear, there is money at the bank, and they will loan it out, at a price. The buyer takes out a loan. The bank also makes the buyer sign a security agreement that makes the purchased home collateral for the loan. The buyer moves in, and the bank takes his payments for the next 30 years. The seller, in all likelihood, uses the proceeds to pay off his own loan, and deposits the rest in a bank.
This seems like a workable arrangement all by itself, so far.
But once upon a time, the US Congress, in its infinitesimal wisdom, determined that this resulted in too many fragmented and inconsistent housing markets across the nation. And also investors wanted to be able to achieve the low-risk gains of long term lending secured by property without actually having to set foot in the nasty barbarian backwaters that surrounded the civilized cities. They established Fannie Mae and Freddie Mac, for the sole purpose of buying the individual negotiable instruments from the local banks and selling them as risk-pooled investment bundles. If a bank has made a "conforming" loan, FNMA will buy it with few questions asked. This returns otherwise inaccessible cash to the bank, so it can make more loans, which can then be resold to FNMA. Fannie Mae doesn't actually care much about the loan itself. It might just contract it right back to the originating bank as the "servicer", which basically just means they do all the work, and get paid a fee from FNMA for doing it.
FNMA takes the income streams from those loan payments, collects them into handfuls, and ties a ribbon around them. Big institutional investors can then buy pieces of the action in more manageable chunks. The stated purpose of the organization is to "ensure uniform access to housing loans". The actual purpose is to make it easier for those big investors to suck money out of every small town in the US with almost zero risk or effort.
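The pooling step described above can be sketched in a few lines: collect each loan's monthly payment, skim a servicing fee, and split the remainder pro rata among investors. All the numbers and the fee level here are illustrative placeholders, not actual FNMA terms:

```python
# Toy pass-through pool: collect monthly payments, skim a servicing
# fee, and split what remains pro rata among the pool's investors.
# Payment amounts, fee, and shares are all invented for illustration.

monthly_payments = [1200.0, 950.0, 1500.0, 800.0]    # four toy loans
servicing_fee = 0.005   # servicer keeps 0.5% of collections (assumed)
investor_shares = {"A": 0.50, "B": 0.30, "C": 0.20}  # pool ownership

collected = sum(monthly_payments)
passed_through = collected * (1 - servicing_fee)
payouts = {inv: passed_through * share
           for inv, share in investor_shares.items()}
print(payouts)
```

The servicer's cut is why originating banks are happy to keep doing "all the work" after selling the loan: they earn fee income with none of the credit risk.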
And its very existence distorts the market, such that looser conforming loan requirements increase housing prices across the board, and stricter requirements decrease those prices. When FNMA opens up the throttle for a few years, then hits the brakes hard, the housing market inflates, then crashes, and defaults spike. It turns what would otherwise be a local phenomenon into a country-wide catastrophe.
The loan-servicers have the same data and many have provided it to Black-Knight (LPS/McDash) for years. http://www.bkfs.com/Data-and-Analytics/CorporateInformation/...
While you mention that Freddie provided data from 2006, the loan-level performance data is even more recent. The original releases were just origination info and thus worthless by themselves for risk assessment.
I did OK on most of the above. It came back with piles of red ink, mainly focused on how the argument could be made better vis-à-vis grammar and phrasing. The main criterion (for this interview) seemed to be power of argument, read that as you may.
Freddie is a bit behind in their dataset, only offering data through 2013. IMO, this kind of defeats their effort to increase transparency. If 2014 vintage loans are performing much worse (or better), it won't be known in time for many investors/modelers to react.
I also wish GinnieMae would release loan level data like this for FHA/VA/USDA loans, which are a huge part of the market. I could only find MBS pool aggregated data on their website: http://www.ginniemae.gov/doing_business_with_ginniemae/inves...
Check out LendingHome (http://lendinghome.com) if you're looking for an awesome company in SF that's doing some really cool work in the space.
How exactly anyone expects to make money by lending out money for thirty years at 75 basis points above the risk free rate, with a zero premium call option, levered 4:1 or greater, and with low recovery percentages if the security needs to be seized is beyond me. That's even before getting into the high overhead to deal with servicing and regulatory compliance.
> How exactly anyone expects to make money ... and with low recovery percentages if the security needs to be seized is beyond me
which is probably why you're not a mortgage banker :).
If you take a look at just how well mortgage bankers have done over the past 50 years, I'm not convinced they create any value over the course of a cycle for their employers. Indeed they seem especially prone to blowing up said employers every 15 years or so.
Picking up nickels in front of a steamroller doesn't seem like a great business model to me. In fact, it looks like a pretty reliable indicator of a principal-agent problem.
At Blend (https://blendlabs.com/), we're working on a modern mortgage lending platform. For anyone who is interested in this, I'd love to chat!
It's like being the Amtrak bathroom cleaner. Important, but shitty.
I had no idea it was that high, and I just assumed it meant 0.4% until I saw the numbers later.
Can anyone unpack this a bit? By my (fuzzy) understanding, this was something a lot of people thought in the '80s with neural networks, but there wasn't a lot of theory to back it up. Later, applied math people introduced the kernel SVM, which could solve non-linear problems with power equivalent to neural networks. RNNs are back in style now (and a lot more theory has been developed), but is this the type of agent-based model that would be useful for this problem, and why so?
In this case, as cool as neural nets and SVM and all the rest can be, I'd rather write some code that I really, really understand than have a more-or-less opaquely-trained AI. (I am aware of various efforts to read out "meaning" from our various trainable AIs, but it's still even easier to directly put the meaning there from the start.) Then if I see something surprising, I pretty much know it's either a bug, or an unexpected interaction (the thing I'm looking for), and not merely some form of training error.