Hacker News
Open-Source Loan-Level Analysis of Fannie and Freddie (toddwschneider.com)
280 points by lil_tee on June 9, 2015 | 35 comments

This is pretty cool.

When people say you should have a GitHub repository, I often worry that people think they need some huge project, like a fork of Node or something.

This analysis would be more than enough for me to give someone an interview.

The math isn't complex and the analysis is pretty shallow, but it shows that the author knows their way around the basics of:

    - finding data (often the hardest part of data analysis)
    - working with data: unpacking, storing, retrieving, etc.
    - basic analysis with R or Python
and to be honest, this counts for a lot!

As a first introduction to a potential employee, this is more valuable than having a good resume, and it's well within reach of most people, regardless of how busy you are!

TL;DR: don't overthink having a public GitHub account. Basic analysis like this will put you above most other candidates. Oh, and good job to the author!

I think the analysis is not actually that shallow. There may be more sophisticated algorithms and analyses that could be run, but the basic regressions are very effective and commonly used in practice. I also appreciate the use of a SQL database with indexing.
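For anyone who hasn't set this kind of thing up before, the indexing point is easy to demo. Here's a minimal sqlite3 sketch with invented table and column names (not the repo's actual PostgreSQL schema), just to show the pattern of loading loan-level rows and aggregating by a indexed grouping column:

```python
import sqlite3

# Toy loan-level table; the column names are made up for illustration,
# not the actual Fannie/Freddie file layout.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE loans (
        loan_id INTEGER PRIMARY KEY,
        state TEXT,
        fico INTEGER,
        defaulted INTEGER  -- 0/1 flag
    )
""")
conn.executemany(
    "INSERT INTO loans (state, fico, defaulted) VALUES (?, ?, ?)",
    [("CA", 720, 0), ("CA", 640, 1), ("FL", 610, 1), ("FL", 700, 0), ("TX", 680, 0)],
)

# An index on the grouping column lets aggregate queries avoid a full scan,
# which matters when the table holds millions of loans instead of five.
conn.execute("CREATE INDEX idx_loans_state ON loans (state)")

rows = conn.execute("""
    SELECT state, AVG(defaulted) AS default_rate
    FROM loans
    GROUP BY state
    ORDER BY state
""").fetchall()
print(rows)  # [('CA', 0.5), ('FL', 0.5), ('TX', 0.0)]
```

The same shape of query (group by state or vintage, average a default flag) is what drives most of the summary tables in the post.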

I was actually most impressed by the interactive maps. I'm assuming there are some readily available tools for making those, but it's outside my normal experience so seems more impressive. I totally wanted to do geo-data stuff when I was a kid, but the data and tools just didn't exist at the time.

They seem to be made with Highcharts [1], a good tool for this kind of thing with a very shallow and short learning curve. Not particularly flexible, though, compared to what D3 has to offer [2].

[1] http://www.highcharts.com/maps/demo

[2] http://d3js.org/

The toolchain isn't the point. That's the difference between someone with business judgment and a plain ol' geek.

All I've got in my GitHub is bug fixes to other open-source code.

I'd love to have a project, but I just don't have the time. Since we use open-source libraries, though, the better the library, the less work I have to do patching it :)

In statistics, elegance in the math usually means that you've framed the problem well. That said, while the Cox model is the most popular survival regression model, I wouldn't say it's _simple_ by any means. It involves understanding the relationships among the hazard rate, cumulative distribution function, probability density function, survival function, and cumulative hazard function, plus maximum likelihood estimation techniques, censoring... I could go on.
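To make those relationships concrete, here's a small sketch (not from the post) of the standard identities, using a constant-hazard exponential model so the numbers are easy to check; `lam` is a made-up hazard rate:

```python
import math

# For a survival model with constant hazard rate lam:
#   cumulative hazard  H(t) = lam * t
#   survival function  S(t) = exp(-H(t))
#   density            f(t) = h(t) * S(t)
# The identities S(t) = exp(-H(t)) and h(t) = f(t) / S(t) hold for any
# hazard function; constant hazard just keeps the example checkable.

lam = 0.05  # hypothetical monthly default hazard

def cumulative_hazard(t):
    return lam * t

def survival(t):
    return math.exp(-cumulative_hazard(t))

def density(t):
    return lam * survival(t)

# Dividing density by survival recovers the hazard rate at any t
print(density(12.0) / survival(12.0))
```

The Cox model's trick is estimating how covariates scale that hazard without ever specifying its baseline shape, which is part of why it's less simple than it looks.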

"shows that the author knows their way around..."

Indeed. The OP author used to perform precisely this kind of data analysis for a hedge fund that invested in residential mortgage securities.

Agreed. This kind of thing would definitely get an interview from me as well. Stands out very well against a sea of indistinguishable resumes from big-name schools. There's simply no substitute for actually showing that you can do quality work!

The challenge in MBS analysis isn't the complexity of the math involved (it's regressions and stats); it's understanding the business drivers, and wrangling data at that scale.

This is a really good write up and analysis. I'd like to point out a couple of things: the subprime market has moved to FHA and Ginnie Mae securities. I'm not sure if they have the same detail of loan level data available online, but it would be interesting to analyze.

The other thing is that securities, mortgage loans themselves, securitizations and their derivatives will not trade in the market at fundamental value. This type of fundamental analysis is great, and you can make a lot of money by understanding value better than your competitors. When the market goes crazy, your portfolio market value can fall far below values calculated with these models. If that happens and you've leveraged your portfolio, you will go out of business. I know it seems like stating the obvious right now, but I saw a lot of very smart people who were caught in this trap.

I was asked to do an analysis of Fannie and Freddie during a job interview in 2001: a 3-page report on 2 institutions I'd never heard of, working from a stack of papers around 3 feet high consisting of financial statements, promotional material, and news clippings, to be completed in pen within 3 hours.

Not being from the US, unaware of these institutions, and boggled at how the concept of the state backing fixed-rate mortgages could be sensible, I wrote my 3 pages and somehow got the job.

> It should not be overlooked that in the not-so-distant past, i.e. when I worked as a mortgage analyst, an analysis of loan-level mortgage data would have cost a lot of money. Between licensing data and paying for expensive computers to analyze it, you could have easily incurred costs north of a million dollars per year.

If it existed. It did not. Computers were not needed to analyse a nice big data set, because a nice, big, transparent data set did not exist. Those who did quite nicely were the ones who realized no such data set existed; they got there by digging themselves, being confused, and realizing everyone else was confused or delusional too.

Splitting things by state and making data available is a step up in transparency. But it is like tuning an organ based on where the horn sits, without understanding what notes are being played.

Provision of this type of data is a bad stitch on a bad gash. It confirms what has been known for years. A better question would be: "If you're issuing bonds based on loans to people with a FICO 'thin file' score of 600, whom you've not done basic background checks on, and who are seeking to borrow 10 times their annual income, don't you see something wrong?"

Basic questions and understanding underlying data are more important than optimization of headline metrics.

> Not being from the US, unaware of these institutions, and boggled how the concept of the state backing fixed rate mortgages was sensible

I'm from the US, and I can't figure it out myself, without resorting to either stupidity or unethical intent. The corporate charter for Fannie Mae reads like the encyclopedia entry for moral hazard.

All gains go to private investors. All losses are implicitly insured by federal government bailout.

For those completely unfamiliar with the system, this is how Fannie Mae works:

We start with the prospective home buyer. This person, being American, wants a big house to hold all of his fancy consumer goods, and doesn't have quite enough in savings to pay cash for it. But never fear, there is money at the bank, and they will loan it out, at a price. The buyer takes out a loan. The bank also makes the buyer sign a security agreement that makes the purchased home collateral for the loan. The buyer moves in, and the bank takes his payments for the next 30 years. The seller, in all likelihood, uses the proceeds to pay off his own mortgage, and deposits the rest in a bank.

This seems like a workable arrangement all by itself, so far.

But once upon a time, the US Congress, in its infinitesimal wisdom, determined that this resulted in too many fragmented and inconsistent housing markets across the nation. And also investors wanted to be able to achieve the low-risk gains of long term lending secured by property without actually having to set foot in the nasty barbarian backwaters that surrounded the civilized cities. They established Fannie Mae and Freddie Mac, for the sole purpose of buying the individual negotiable instruments from the local banks and selling them as risk-pooled investment bundles. If a bank has made a "conforming" loan, FNMA will buy it with few questions asked. This returns otherwise inaccessible cash to the bank, so it can make more loans, which can then be resold to FNMA. Fannie Mae doesn't actually care much about the loan itself. It might just contract it right back to the originating bank as the "servicer", which basically just means they do all the work, and get paid a fee from FNMA for doing it.

FNMA takes the income streams from those loan payments, collects them into handfuls, and ties a ribbon around them. Big institutional investors can then buy pieces of the action in more manageable chunks. The stated purpose of the organization is to "ensure uniform access to housing loans". The actual purpose is to make it easier for those big investors to suck money out of every small town in the US with almost zero risk or effort.

And its very existence distorts the market, such that looser conforming loan requirements increase housing prices across the board, and stricter requirements decrease those prices. When FNMA opens up the throttle for a few years, then hits the brakes hard, the housing market inflates, then crashes, and defaults spike. It turns what would otherwise be a local phenomenon into a country-wide catastrophe.

The data has certainly been available for a fee since earlier than 2001:


Not for Fannie or Freddie. Fannie only started releasing it in 2013, and Freddie was earlier (2006 or 2007 I believe).

The data has been available; acquiring it would have cost six figures until recently.

Details: The loan-servicers have the same data and many have provided it to Black-Knight (LPS/McDash) for years. http://www.bkfs.com/Data-and-Analytics/CorporateInformation/...

While you mention that Freddie provided data from 2006, the loan-level performance data is even more recent. The original releases were just origination info and thus worthless by themselves for risk assessment.

Do you happen to still have a copy of the analysis you wrote? I'd be interested to read it.

Unfortunately not, and I can't remember exactly what I wrote. I believe it was along these lines: that swapping floating rates for guaranteed fixed rates was a subsidy based on the term risk premium, which might not work out, especially in 2001 given flat yield curves (which got flatter); that volatility risk (of something, I'm not sure what, but I'm pretty sure I mentioned it) was not compensated; and that the data presented was insufficient to make sense of this in the "long run", which was long enough to know that 20 years was not that long.

I did OK on most of the above. It came back with piles of red ink, mainly focused on how an argument could be made better via grammar and phrasing. The main judging criterion (for this interview) seemed to be power of argument, read that as one may.

I work in the mortgage industry and have analyzed large datasets of subprime and alt-A mortgages. These findings are very consistent with mine, although the default and severity rates are (obviously) even worse than for conventional loans.

Freddie is a bit behind in their dataset, only offering data through 2013. IMO, this kind of defeats their effort to increase transparency. If 2014 vintage loans are performing much worse (or better), it won't be known in time for many investors/modelers to react.

I also wish Ginnie Mae would release loan-level data like this for FHA/VA/USDA loans, which are a huge part of the market. I could only find MBS pool-aggregated data on their website: http://www.ginniemae.gov/doing_business_with_ginniemae/inves...

Freddie does release data monthly, on the fourth business day: both loan and pool data.

Mortgages get disproportionately little airtime in the startup world, which I've always thought was strange, especially considering how significant they are to the US (and global) economy.

Check out LendingHome (http://lendinghome.com) if you're looking for an awesome company in SF that's doing some really cool work in the space.

The federal government dominates the mortgage market -- through direct insurance, guarantees, and purchases. It's tough to compete with them. The only comparable market I can think of is student loans, where some companies have found a way to pick off the most creditworthy borrowers (e.g. SoFi). But in student loans the government does no underwriting and offers uniform rates, some of which are quite high in the current interest rate environment, whereas in mortgages the government underwrites and offers very low rates.

How exactly anyone expects to make money by lending out money for thirty years at 75 basis points above the risk free rate, with a zero premium call option, levered 4:1 or greater, and with low recovery percentages if the security needs to be seized is beyond me. That's even before getting into the high overhead to deal with servicing and regulatory compliance.
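A quick back-of-envelope on that claim, with illustrative numbers only (the 75 bp spread is from the comment; the leverage and cost figures are assumptions, not market data):

```python
# Rough economics of a levered mortgage lender: equity earns roughly
# spread * leverage over the risk-free return on the equity itself,
# before credit losses and the cost of the borrower's prepayment option.

spread = 0.0075          # 75 basis points over the risk-free rate
leverage = 4             # assets / equity, per the 4:1 figure above
servicing_cost = 0.0025  # hypothetical annual overhead, as fraction of assets

gross_roe = spread * leverage                     # excess return on equity, pre-cost
net_roe = (spread - servicing_cost) * leverage    # after hypothetical overhead
print(gross_roe, net_roe)
```

Roughly 3% gross and 2% net of overhead, and that's before any credit losses or the zero-premium call option the borrower holds, which is the commenter's point.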

Not sure what you mean by "dominate", but there's $13 trillion in mortgage debt outstanding as of Q4 2014, $5T of that is held by federal agencies, and $4.5T is held by financial institutions [1]. So it's actually a much bigger business than you imagine.

> How exactly anyone expects to make money ... and with low recovery percentages if the security needs to be seized is beyond me

which is probably why you're not a mortgage banker :).

[1] http://www.federalreserve.gov/econresdata/releases/mortoutst...

>which is probably why you're not a mortgage banker :).

If you take a look at just how well mortgage bankers have done over the past 50 years, I'm not convinced they create any value over the course of a cycle for their employers. Indeed they seem especially prone to blowing up said employers every 15 years or so.

Picking up nickels in front of a steamroller doesn't seem like a great business model to me. In fact, it looks like a pretty reliable indicator of a principal-agent problem.

I wonder if it would be legal/feasible to underwrite student loans based on major and GPA and how they correlate with loan risk rates.

Yeah, this is the first time I've seen anything about mortgages on HN. It's surprising considering the enormous amount of technical opportunity in the mortgage world right now.

At Blend (https://blendlabs.com/), we're working on a modern mortgage lending platform. For anyone who is interested in this, I'd love to chat!

Although they are significant, they are mind-numbingly boring. Compensation being remotely equal, would you rather work at a techy company like Tesla or Google... or in the most boring part of the finance world?

It's like being the Amtrak bathroom cleaner. Important, but shitty.

It was unclear to me at first that a default rate of 0.4 on the map is actually 40% !!!

I had no idea it was that high, and I just assumed it meant 0.4% until I saw the numbers later.

> So-called agent-based models attempt to model the behavior of individual borrowers at the micro-level, then simulate many agents interacting and making individual decisions, before aggregating into a final prediction. The agent-based approach can be computationally much more complicated, but at least in my opinion it seems like a model based on traditional statistical techniques will never explain phenomena like the housing bubble and financial crisis, whereas a well-formulated agent-based model at least has a fighting chance.

Can anyone unpack this a bit? By my (fuzzy) understanding, this was something a lot of people thought in the 80's with neural networks but there wasn't a lot of theory to back it up. Later, applied math people introduced the kernel SVM which could solve non-linear problems with power equivalent to neural networks [0]. RNNs are back in style now (and a lot more theory has been developed), but is this the type of agent-based model that would be useful for this problem and why so?

[0]: http://www.scm.keele.ac.uk/staff/p_andras/PAnpl2002.pdf

I suspect they're talking about something much more brutish: you build some agents that, by any mechanism, read some state out of the environment and take some actions, then put lots of them in the same environment and see what happens. This is less "neural nets" or "SVM" and more "game AI run at scale", probably via heuristics and brute-force coding like in a game, except the game is a model of the real world.

In this case, as cool as neural nets and SVM and all the rest can be, I'd rather write some code that I really, really understand than have a more-or-less opaquely-trained AI. (I am aware of various efforts to read out "meaning" from our various trainable AIs, but it's still even easier to directly put the meaning there from the start.) Then if I see something surprising, I pretty much know it's either a bug, or an unexpected interaction (the thing I'm looking for), and not merely some form of training error.
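A toy version of that kind of brute-force agent loop might look like this. Every parameter and rule here is invented purely to show the shape of the simulation (hand-coded agent rules, a shared environment, an emergent aggregate), not calibrated to anything:

```python
import random

# Minimal "brutish" agent-based sketch: each borrower reads simple state
# (their home equity, a crude contagion signal from prior defaults) and
# applies a hand-coded default rule. All numbers are illustrative.

random.seed(42)

N = 1000
equity = [random.gauss(0.10, 0.15) for _ in range(N)]  # equity as fraction of home value
defaulted = [False] * N

def step(price_shock):
    """One period: apply a market-wide price shock, then let agents decide."""
    contagion = sum(defaulted) / N  # fraction already defaulted, read by every agent
    for i in range(N):
        if defaulted[i]:
            continue
        equity[i] += price_shock
        # Rule: only underwater borrowers default, more readily as contagion rises
        if equity[i] < -0.05 and random.random() < 0.3 + contagion:
            defaulted[i] = True

for shock in [0.01, 0.01, -0.10, -0.10, -0.05]:  # a mild boom, then a bust
    step(shock)

default_rate = sum(defaulted) / N
print(f"{default_rate:.1%}")
```

The point of the exercise is that the aggregate default rate is an emergent output of individually legible rules, so a surprise in the output points at an interaction rather than an opaque training artifact.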

Direct link to Github repository: https://github.com/toddwschneider/agency-loan-level

It would be interesting to cross-reference the records from Freddie and Fannie with the HMDA data, which has additional fields about each mortgage application: https://www.ffiec.gov/hmda/hmdaflat.htm Would the HMDA "loan amount" field match the "ORIGINAL UNPAID PRINCIPAL BALANCE" field in the Fannie data? Since HMDA data is geo-located to the Census tract, it could then be linked to Census and other public data sets.
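One wrinkle with that match: the HMDA flat files report loan amounts rounded to thousands of dollars (if I recall the layout correctly), so an exact equality join against the agency files' original UPB is unlikely to work. A hypothetical tolerance-match sketch, with all field names invented for illustration:

```python
# Fuzzy cross-reference of agency loan-level records against HMDA
# applications by year, geography, and rounded loan amount. These field
# names and records are made up, not the actual file layouts.

hmda = [
    {"year": 2005, "county": "06037", "loan_amount_000s": 320},
    {"year": 2005, "county": "06037", "loan_amount_000s": 198},
]
fannie = [
    {"year": 2005, "county": "06037", "orig_upb": 319500.0},
    {"year": 2005, "county": "06037", "orig_upb": 450000.0},
]

def candidate_matches(agency_loan, hmda_rows, tol_000s=1):
    """HMDA applications whose rounded amount is within tol of the loan's UPB."""
    upb_000s = round(agency_loan["orig_upb"] / 1000)
    return [
        h for h in hmda_rows
        if h["year"] == agency_loan["year"]
        and h["county"] == agency_loan["county"]
        and abs(h["loan_amount_000s"] - upb_000s) <= tol_000s
    ]

print(candidate_matches(fannie[0], hmda))  # matches the 320 application
```

Even with a tolerance, matches wouldn't be unique in dense markets, so any linked Census-tract analysis would need to treat the join as probabilistic.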

Thanks for the detailed analysis! It's very thorough and really interesting, and it's awesome that people are making intelligent use of this data now that it's available.

Private securitizations were inflating the subprime bubble well before Fannie and Freddie jumped in. If there's any data on, e.g., Countrywide/IndyMac it'd be valuable to add.

I like the source code.
