

Open-Source Loan-Level Analysis of Fannie and Freddie - lil_tee
http://toddwschneider.com/posts/mortgages-are-about-math-open-source-loan-level-analysis-of-fannie-and-freddie/

======
chollida1
This is pretty cool.

When people say to have a github repository, I often worry that people think
they have to have some huge project. Like a fork of Node or something.

This analysis would be more than enough for me to give someone an interview.

The math isn't complex, the analysis is pretty shallow but it shows that the
author knows their way around the basics of:

    
    
        - finding data, which is often the hardest part of data analysis
        - working with data: unpacking, storing, retrieving, etc.
        - basic analysis with R or Python
    

and to be honest, this counts for a lot!

As a first introduction to a potential employee, this is more valuable than
having a good resume, and it's well within reach of most people, regardless of
how busy you are!

TL;DR: don't overthink having a public GitHub account. Basic analysis like
this will put you above most other candidates. Oh, and good job to the author!

~~~
phkahler
I was actually most impressed by the interactive maps. I'm assuming there are
some readily available tools for making those, but it's outside my normal
experience so seems more impressive. I totally wanted to do geo-data stuff
when I was a kid, but the data and tools just didn't exist at the time.

~~~
tomgp
They seem to be made with Highcharts [1], a good tool for this kind of thing
with a short, shallow learning curve. It's not particularly flexible, though,
compared to what D3 has to offer [2].

[1] [http://www.highcharts.com/maps/demo](http://www.highcharts.com/maps/demo)

[2] [http://d3js.org/](http://d3js.org/)

~~~
joncooper
The toolchain isn't the point. That's the difference between someone with
business judgment and a plain ol' geek.

------
nissimk
This is a really good write up and analysis. I'd like to point out a couple of
things: the subprime market has moved to FHA and Ginnie Mae securities. I'm
not sure if they have the same detail of loan level data available online, but
it would be interesting to analyze.

The other thing is that securities, mortgage loans themselves, securitizations
and their derivatives will not trade in the market at fundamental value. This
type of fundamental analysis is great, and you can make a lot of money by
understanding value better than your competitors. When the market goes crazy,
your portfolio market value can fall far below values calculated with these
models. If that happens and you've leveraged your portfolio, you will go out
of business. I know it seems like stating the obvious right now, but I saw a
lot of very smart people who were caught in this trap.
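
To put rough numbers on the leverage point above, here is a minimal sketch of how a mark-to-market drop hits the equity of a leveraged portfolio. The function name and all figures are made up for illustration; nothing here comes from the article or its data.

```python
def equity_after_markdown(assets: float, leverage: float, markdown: float) -> float:
    """Equity left after the assets' market value falls by `markdown` (a fraction).

    `leverage` is assets / equity, so debt = assets - assets / leverage.
    The debt does not shrink with the markdown; equity absorbs the whole loss.
    """
    equity = assets / leverage
    debt = assets - equity
    return assets * (1.0 - markdown) - debt


# Hypothetical portfolio: $100 of assets funded 10:1, i.e. $10 of equity.
for markdown in (0.05, 0.10, 0.15):
    remaining = equity_after_markdown(100.0, 10.0, markdown)
    print(f"{markdown:.0%} markdown -> equity {remaining:+.1f}")
```

Under these assumptions equity is wiped out once the markdown reaches 1 / leverage, however sound the model's estimate of fundamental value may be.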

------
zhte415
I was asked to do an analysis of Fannie and Freddie during a job interview in
2001. A 3 page report on 2 institutions I'd never heard of before, with a
stack of papers around 3 feet high consisting of a variety of financial
statements, promotional material, and news clippings, to be completed in pen
within 3 hours.

Not being from the US, unaware of these institutions, and boggled by how the
concept of the state backing fixed-rate mortgages could be sensible, I wrote
my 3 pages and somehow got the job.

> It should not be overlooked that in the not-so-distant past, i.e. when I
> worked as a mortgage analyst, an analysis of loan-level mortgage data would
> have cost a lot of money. Between licensing data and paying for expensive
> computers to analyze it, you could have easily incurred costs north of a
> million dollars per year.

If it existed. It did not. Computers were not needed to analyse a nice big
data set, because a nice, big, transparent data set did not exist. Those who
did quite nicely out of realizing that a big data set didn't exist did so by
digging themselves, being confused, and realizing everyone else was confused /
delusional too.

Splitting things by state and making data available is a step up in
transparency. But it is fine-tuning an organ based on where the horn is, and
not understanding what notes are being played.

Provision of this type of data is badly stitching a bad gash. It confirms
what has been known for years. A better question would be "If you're issuing
bonds based on loans to people you have a FICO 'thin file' score of 600 for,
that you've not done basic background checks for, and they're seeking to
borrow 10 times their annual income, don't you see something wrong?"

Basic questions and understanding underlying data are more important than
optimization of headline metrics.

~~~
nissimk
The data has certainly been available for a fee since earlier than 2001:

[https://www.corelogic.com/solutions/loan-performance-seconda...](https://www.corelogic.com/solutions/loan-performance-secondary-market-analytics-for-capital-markets.aspx)

~~~
phdp
Not for Fannie or Freddie. Fannie only started releasing it in 2013, and
Freddie was earlier (2006 or 2007 I believe).

~~~
absherwin
The data has been available; acquiring it would have cost six figures until
recently.

Details: The loan-servicers have the same data and many have provided it to
Black-Knight (LPS/McDash) for years.
[http://www.bkfs.com/Data-and-Analytics/CorporateInformation/...](http://www.bkfs.com/Data-and-Analytics/CorporateInformation/AboutUs/Pages/default.aspx)

While you mention that Freddie provided data from 2006, the loan-level
performance data is even more recent. The original releases were just
origination info and thus worthless by themselves for risk assessment.

------
joeriel
I work in the mortgage industry and have analyzed large datasets of subprime
and alt-a mortgages. These findings are very consistent with mine, although
the default and severity rates are (obviously) even worse than conventional.

Freddie is a bit behind in their dataset, only offering data through 2013.
IMO, this kind of defeats their effort to increase transparency. If 2014
vintage loans are performing much worse (or better), it won't be known in time
for many investors/modelers to react.

I also wish GinnieMae would release loan level data like this for FHA/VA/USDA
loans, which are a huge part of the market. I could only find MBS pool
aggregated data on their web-site:
[http://www.ginniemae.gov/doing_business_with_ginniemae/inves...](http://www.ginniemae.gov/doing_business_with_ginniemae/investor_resources/mbs_disclosure_data/Pages/monthly_consolidated_data.aspx)

~~~
phdp
Freddie does release data monthly, on the fourth business day. Both loan and
pool data.

------
omarish
Mortgages get disproportionately low airtime in the startup world, which I've
always thought was strange, especially considering how significant they are to
the US (and global) economy.

Check out LendingHome ([http://lendinghome.com](http://lendinghome.com)) if
you're looking for an awesome company in SF that's doing some really cool work
in the space.

~~~
bradleyjg
The federal government dominates the mortgage market -- through direct
insurance, guarantees, and purchases. It's tough to compete with them. The
only comparable market I can think of is student loans, where some companies
have found a way to pick off the most creditworthy borrowers (e.g. SoFi). But
in student loans the government does no underwriting and offers uniform
rates, some of which are quite high in the current interest rate environment.
Whereas in mortgages the government underwrites and offers very low rates.

How exactly anyone expects to make money by lending out money for thirty years
at 75 basis points above the risk free rate, with a zero premium call option,
levered 4:1 or greater, and with low recovery percentages if the security
needs to be seized is beyond me. That's even before getting into the high
overhead to deal with servicing and regulatory compliance.

~~~
omarish
Not sure what you mean by "dominate", but there's $13 _trillion_ in mortgage
debt outstanding as of Q4 2014, $5T of that is held by federal agencies, and
$4.5T is held by financial institutions [1]. So it's actually a much bigger
business than you imagine.

> How exactly anyone expects to make money ... and with low recovery
> percentages if the security needs to be seized is beyond me

which is probably why you're not a mortgage banker :).

[1]
[http://www.federalreserve.gov/econresdata/releases/mortoutst...](http://www.federalreserve.gov/econresdata/releases/mortoutstand/current.htm)

~~~
bradleyjg
>which is probably why you're not a mortgage banker :).

If you take a look at just how well mortgage bankers have done over the past
50 years, I'm not convinced they create any value over the course of a cycle
for their employers. Indeed they seem especially prone to blowing up said
employers every 15 years or so.

Picking up nickels in front of a steamroller doesn't seem like a great
business model to me. In fact, it looks like a pretty reliable indicator of a
principal-agent problem.

------
chrisBob
It was unclear to me at first that a default rate of 0.4 on the map is
actually 40% !!!

I had no idea it was that high, and I just assumed it meant 0.4% until I saw
the numbers later.

------
rgbrgb
> So-called agent-based models attempt to model the behavior of individual
> borrowers at the micro-level, then simulate many agents interacting and
> making individual decisions, before aggregating into a final prediction. The
> agent-based approach can be computationally much more complicated, but at
> least in my opinion it seems like a model based on traditional statistical
> techniques will never explain phenomena like the housing bubble and
> financial crisis, whereas a well-formulated agent-based model at least has a
> fighting chance.

Can anyone unpack this a bit? By my (fuzzy) understanding, this was something
a lot of people thought in the 80's with neural networks but there wasn't a
lot of theory to back it up. Later, applied math people introduced the kernel
SVM which could solve non-linear problems with power equivalent to neural
networks [0]. RNNs are back in style now (and a lot more theory has been
developed), but is this the type of agent-based model that would be useful for
this problem and why so?

[0]:
[http://www.scm.keele.ac.uk/staff/p_andras/PAnpl2002.pdf](http://www.scm.keele.ac.uk/staff/p_andras/PAnpl2002.pdf)

~~~
jerf
I suspect they're talking about something much more brutish, where you build
some agents that by any mechanism read some stuff out of the environment and
take some actions, then put lots of them in the same environment and see what
happens. This is less "neural nets" or "SVM" and more "game AI being run at
scale", probably via heuristics and brute-force coding like in a game, except
the game is a model of the real world.

In this case, as cool as neural nets and SVM and all the rest can be, I'd
rather write some code that I really, really understand than have a
more-or-less opaquely-trained AI. (I am aware of various efforts to read out "meaning"
from our various trainable AIs, but it's still even easier to directly put the
meaning there from the start.) Then if I see something surprising, I pretty
much know it's either a bug, or an unexpected interaction (the thing I'm
looking for), and not merely some form of training error.
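
A minimal sketch of the "brutish" approach described above, as a toy rather than a calibrated model: each agent reads one thing from the environment (the current home price), applies a hand-written rule (default when too far underwater), and the aggregate result feeds back into the environment. The class name, the parameters, and the feedback rule are all assumptions made up for illustration; none of it is taken from the article.

```python
import random


class Borrower:
    """One agent: owes a fixed balance and walks away if too far underwater."""

    def __init__(self, balance: float, tolerance: float):
        self.balance = balance      # loan balance as a fraction of the initial home price
        self.tolerance = tolerance  # how far underwater (LTV - 1) before defaulting
        self.defaulted = False

    def step(self, price: float) -> bool:
        """Read the current price and maybe default. Returns True on a new default."""
        if self.defaulted:
            return False
        ltv = self.balance / price
        if ltv - 1.0 > self.tolerance:
            self.defaulted = True
            return True
        return False


def simulate(n_agents=1000, shock=0.10, feedback=0.5, periods=20, seed=0):
    """Run the toy model; return the cumulative default rate after each period."""
    rng = random.Random(seed)
    agents = [
        Borrower(balance=rng.uniform(0.6, 1.0), tolerance=rng.uniform(0.0, 0.3))
        for _ in range(n_agents)
    ]
    price = 1.0 - shock  # exogenous initial price drop
    history = []
    for _ in range(periods):
        new_defaults = sum(agent.step(price) for agent in agents)
        # Feedback: this period's foreclosures push prices down for everyone else.
        price *= 1.0 - feedback * new_defaults / n_agents
        history.append(sum(a.defaulted for a in agents) / n_agents)
    return history


if __name__ == "__main__":
    print(simulate()[-1])
```

Every rule is written out by hand, so a surprising default path has to be either a bug or an interaction between agents, which is the property the comment above is after.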

------
stadeschuldt
Direct link to GitHub repository:
[https://github.com/toddwschneider/agency-loan-level](https://github.com/toddwschneider/agency-loan-level)

------
danny8000
It would be interesting to cross-reference the records from Freddie and Fannie
with the HMDA data, which has additional fields about each mortgage
application:
[https://www.ffiec.gov/hmda/hmdaflat.htm](https://www.ffiec.gov/hmda/hmdaflat.htm)
Would the HMDA "loan amount" field match the "ORIGINAL UNPAID PRINCIPAL
BALANCE" field in the Fannie data? Since HMDA data is geo-located to the
Census tract, it could then be linked to Census and other public data sets.
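
One way the cross-reference above might be attempted, sketched under loud assumptions: that the HMDA file reports loan amount in thousands of dollars, that both sources carry a state and an origination year, and that every field name below is hypothetical rather than either file's actual column name. With no shared loan identifier, a match on (state, year, rounded amount) can only produce candidates, so the function returns a list per loan.

```python
from collections import defaultdict


def match_key(state: str, year: int, amount_thousands: int) -> tuple:
    """Blocking key shared by both sources (an assumed, lossy join key)."""
    return (state, year, amount_thousands)


def candidate_matches(fannie_loans, hmda_records):
    """Map each Fannie loan id to the HMDA record ids that share its key.

    Hypothetical fields: fannie_loans have 'loan_id', 'state', 'orig_year',
    and 'orig_upb' in dollars; hmda_records have 'record_id', 'state', 'year',
    and 'loan_amount_000s' in thousands of dollars.
    """
    index = defaultdict(list)
    for rec in hmda_records:
        key = match_key(rec["state"], rec["year"], rec["loan_amount_000s"])
        index[key].append(rec["record_id"])

    matches = {}
    for loan in fannie_loans:
        # Round the dollar balance to the nearest thousand to compare with HMDA.
        amount = round(loan["orig_upb"] / 1000)
        key = match_key(loan["state"], loan["orig_year"], amount)
        matches[loan["loan_id"]] = list(index.get(key, []))
    return matches


# Made-up records, purely to show the shape of the inputs.
fannie = [{"loan_id": "F1", "state": "NY", "orig_year": 2005, "orig_upb": 152000}]
hmda = [
    {"record_id": "H1", "state": "NY", "year": 2005, "loan_amount_000s": 152},
    {"record_id": "H2", "state": "NY", "year": 2006, "loan_amount_000s": 152},
]
print(candidate_matches(fannie, hmda))
```

A key this coarse would likely return many candidates per loan in real data, so a unique link to a Census tract would need further fields to narrow them down.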

------
magicmu
Thanks for the detailed analysis! It's very thorough and really interesting,
and it's awesome that people are making intelligent use of this data now that
it's available.

------
bbanyc
Private securitizations were inflating the subprime bubble well before Fannie
and Freddie jumped in. If there's any data on, e.g., Countrywide/IndyMac it'd
be valuable to add.

------
GutenYe
I like the source code.

