Thanks. I really appreciate you taking the time to write so much. Honestly, this seems to be the best description of potential careers that I have ever gotten and I feel so much more assured about my choice to study Maths now. It is also really interesting to hear about how colourful such a career could be! It sounds like, despite the politics, you have had a really interesting career and have done exactly the kind of things that I would like to do!
Again, thanks so much. One last question: what are you doing with your life right now? It seems like you have had a pretty broad career and I am interested in the end game. If you are retired, what is your lifestyle like and, in particular, how has your career impacted your life as a retiree?
Well, in part we aimed too high: My wife
was brilliant, quite broadly -- yes,
Valedictorian, Summa Cum Laude, Woodrow
Wilson, PBK, piano, clarinet, voice,
prizes in cooking, sewing, raising
chickens. But her family had her try to
be perfect and to dedicate herself to
saving the world.
She wanted a Ph.D. in mathematical
sociology to do social engineering of
social change to save the world and,
thus, get her praise, acceptance, and
emotional and financial security. And
those concerns filled her plate.
Well, at the roots some anxiety,
from nature and/or nurture, was involved,
then some perfectionism and some fear of
criticism from powerful, influential
people -- net, call it a special case of
social phobia. That brought stress
which can bring depression. That slows
the work and in her case caused more
stress and more depression, then clinical
depression. She was in a clinical
depression the day she got her Ph.D. She
never recovered and, net, didn't make it.
Took me a while to reinvent and learn the
basics of clinical psychology to
understand what was going on and what to
do about it. I did learn but nearly
always too late.
In trying to have a stable job so that I
could take care of her, I took a job at
IBM's Watson lab, in what we called
artificial intelligence (AI) to do
monitoring and management of large server
farms and networks.
Then IBM got sick and the lab phone book
went from 4500 full time names down to
1000 or so. The guy who hired me, a big
star who was deliberately ignored by the
higher ups, left for greener pastures --
eventually ended up with a nice place in
Malibu.
IBM Watson Research was run by a clique
of people who stuck together and blocked
out everyone else. At one point it
appeared that the company HQ in Armonk
tried to correct that situation.
Instead of or better than the AI, I had
some ideas: Detecting a problem in a
server farm or network is necessarily
essentially a statistical hypothesis test
with Type I (false alarm) and Type II
(missed detection) errors. So, want
hypothesis tests that keep down the
rates of the errors.
Since there's a lot of data readily
available and want to exploit it to keep
down the rates of the errors, also want
multi-variate inputs -- nearly all
hypothesis tests have only univariate
inputs. Also for such multi-variate data
can't hope to know any of the probability
distributions so need tests that are
distribution-free.
I don't think there were any such tests,
so I dreamed up some. I used some group
theory, used the classic result of S.
Ulam that LeCam called tightness (see P.
Billingsley, Convergence of Probability
Measures), and was able to permit
selecting false alarm rate in advance and
then getting that rate exactly. Asking
for all of the Neyman-Pearson result would
take more data than we had any chance of
having, but there was a somewhat useful
sense in which my work gave
asymptotically, for any selected false
alarm rate, the highest possible detection
rate with that false alarm rate.
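To make the "pick the false alarm rate in advance" idea concrete, here is a minimal sketch in Python. It is NOT the method from the paper described above; it swaps in a well-known rank-based (split conformal) construction that is also multi-variate and distribution-free. The names `knn_score`, `make_detector`, and `healthy_point`, and the two-metric "healthy" data, are all assumptions for illustration. The idea: score each point by its distance to its k-th nearest neighbor in a reference set of healthy data, then rank the new point's score among the scores of a separate healthy calibration set. If the healthy points are exchangeable and scores have no ties, that rank is uniform, so with n calibration points and alpha*(n+1) an integer the false alarm rate is exactly alpha, averaged over draws of the calibration set.

```python
import math
import random

def knn_score(x, reference, k=3):
    """Distance from x to its k-th nearest neighbor in the reference set."""
    dists = sorted(math.dist(x, r) for r in reference)
    return dists[k - 1]

def make_detector(reference, calibration, alpha, k=3):
    """Build an alarm function with target false alarm rate alpha."""
    cal_scores = [knn_score(c, reference, k) for c in calibration]
    n = len(cal_scores)

    def is_alarm(x):
        s = knn_score(x, reference, k)
        # Rank-based p-value: fraction of healthy scores at least as extreme.
        p = (1 + sum(1 for c in cal_scores if c >= s)) / (n + 1)
        return p <= alpha

    return is_alarm

def healthy_point(rng):
    """Hypothetical healthy behavior: two correlated server metrics."""
    load = rng.gauss(0, 1)
    return (load, 0.8 * load + rng.gauss(0, 0.3))

rng = random.Random(0)
reference = [healthy_point(rng) for _ in range(200)]
calibration = [healthy_point(rng) for _ in range(99)]  # n + 1 = 100
is_alarm = make_detector(reference, calibration, alpha=0.05)

# Empirical false alarm rate on fresh healthy data.
trials = 2000
false_alarm_rate = sum(is_alarm(healthy_point(rng)) for _ in range(trials)) / trials

# A point whose two metrics disagree with the usual correlation.
detected = is_alarm((2.0, -2.0))
```

Note that each coordinate of the test point (2.0, -2.0) is unremarkable on its own; only the pair is unusual, which is the reason to want multi-variate inputs. For one fixed calibration set the realized false alarm rate will wander around alpha, so `false_alarm_rate` here should be near 0.05 but not equal to it.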
I cooked up some synthetic data that was
challenging -- the critical region was
something like the red squares on a
checkerboard in several dimensions. My
work did fine.
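For readers who want to try something similar, here is a small Python sketch of checkerboard-style synthetic data. The details are my assumptions, not the original data: unit cells, a cell counted as "red" when the sum of its integer cell coordinates is odd, healthy data uniform on the other cells, and the names `is_red` and `sample`.

```python
import math
import random

def is_red(point):
    """True when the point lies in a 'red' cell of a unit checkerboard."""
    return sum(math.floor(c) for c in point) % 2 == 1

def sample(rng, dims, size, want_red):
    """Rejection-sample a point from [0, size)^dims on the requested color."""
    while True:
        p = tuple(rng.uniform(0, size) for _ in range(dims))
        if is_red(p) == want_red:
            return p

rng = random.Random(1)
healthy = [sample(rng, dims=3, size=4, want_red=False) for _ in range(500)]
anomalies = [sample(rng, dims=3, size=4, want_red=True) for _ in range(500)]
```

With an even board size, each single coordinate is uniform for both colors, so a test that looks at one variable at a time cannot tell healthy points from anomalies. That is what makes this shape a hard case.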
I got some data from a cluster of
computers at Allstate, wrote some
prototype software, and confirmed the
false alarm rate empirically. And I
cooked up an algorithm to make the
computations nicely fast (in part I used
k-D trees -- reinvented those -- a few
years earlier and k-D trees would have
been mine).
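For anyone who has not met k-D trees, a textbook version fits in a few lines of Python. This is only the standard data structure, not the algorithm from the prototype; `build` and `nearest` are illustrative names. The tree splits the points on one coordinate per level, and the search skips a whole subtree whenever the splitting plane is farther away than the best point found so far.

```python
import math

def build(points, depth=0):
    """Build a k-d tree as nested tuples (point, left, right)."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return (
        points[mid],
        build(points[:mid], depth + 1),
        build(points[mid + 1:], depth + 1),
    )

def nearest(node, target, depth=0, best=None):
    """Return the stored point closest to target."""
    if node is None:
        return best
    point, left, right = node
    if best is None or math.dist(point, target) < math.dist(best, target):
        best = point
    axis = depth % len(target)
    diff = target[axis] - point[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    best = nearest(near, target, depth + 1, best)
    # Cross the splitting plane only if the far side could hold a closer point.
    if abs(diff) < math.dist(best, target):
        best = nearest(far, target, depth + 1, best)
    return best

points = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
tree = build(points)
closest = nearest(tree, (9, 2))
```

Here `closest` comes out as `(8, 1)`. The pruning helps most in low dimensions; as the number of dimensions grows, fewer subtrees can be skipped.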
Politics: A guy in the clique up there
didn't like me. But it took two levels of
management to fire someone, so he
reorganized to put me under a wuss who
would go along with firing me. They
claimed my research was not publishable
(reviewed by a guy in the clique who
admitted he couldn't read my math but
claimed to have found someone who could
but found nothing wrong with my paper) and
walked me out the door.
The next day the wuss was demoted out of
management. Two weeks later the main
nasty guy was moved down to have him under
an additional level of management and
given a six month performance plan which
he failed. He was demoted out of
management -- lost his corner office,
budget, secretary, and 55 subordinates.
I got a PC and Knuth's TeX and submitted a
paper on my research. Since the paper had
some measure theory in it, e.g., Ulam's
result, much of the computer science
community couldn't read it. But the
journal that offered to review the paper
kept at it; apparently the editor in chief
walked the paper around his campus, to a
CS department to see if the problem was
important and to a math department to see
if the math was correct, and accepted the
paper. He invited me to present at a
conference he was running, but I didn't
want to bother going. The paper was
published. IBM was wrong.
So it appears that I have the world's only
collection, and it's large, of statistical
hypothesis tests that are both
multi-variate and distribution-free, with
some nice properties, with a fast
algorithm, with some confirmation of the
false alarm rate calculations from some
real data, etc.
Asymptotically the critical region can
be a multi-dimensional fractal. Nice.
People should use my work. I did give a talk
on the work at the main NASDAQ site in
Trumbull, CT.
IBM didn't pay very well, and cost of
living was high -- I'd always saved money,
even in grad school, but I lost quite a
lot of money working at IBM.
When I joined IBM, it'd just won the Nobel
prize in physics two years in a row and
had a long string of being "the most
admired company in the world". Now I'd
advise anyone just to stay the heck away,
a long way away.
But being pushed out of IBM and age left
me 100% permanently unemployable. I sent
1000+ resumes. Zip, zilch, zero.
If in computing, be sure by age 40,
hopefully by age 35, to have a rock solid
stable career and/or be wealthy. So,
really about have to own your own business
and make it successful.
For now, sure, I know some math and can
stir up some more, and I know some
computing and can learn more.
So, I've got a 1.8 GHz AMD single core
processor, Windows XP Professional SP3 (I
have an official copy of Windows 7
Professional on DVD but have seen no great
reason to go to all the trouble to install
it and rebuild all my software
environment yet), three hard disks, 100
million files, .NET Framework 4.0, a good
text editor (KEdit), Visual Basic .NET
(comes with the .NET Framework), ASP.NET,
ADO.NET, IIS (low level Web server), SQL
Server Express, etc.
I thought, why not go after about 2/3rds
of Internet search, the safe for work
part served at best poorly by looking for
keywords/phrases?
So, how to do that? Not with just routine
software! And not with anything commonly
talked about for search!
I stirred up some math, typed the theorems
and proofs into Knuth's TeX, worked up a
scalable architecture, and typed in the
software.
Maybe I should have used Redis, but
instead I thought that writing my own
session state server would be faster than
even understanding Redis. So, my session
state server is single threaded ("Look,
Ma, no concurrency problems"), for faster
lookup rates is trivial to run as several
instances as in sharding, and is just
some TCP/IP sockets, some de/serialization
of instances of my session state class,
and uses two instances of a .NET
collection class. Simple.
It's fast! A server for less than $1500
should be able to keep session state for
an hour of inactivity for each user and do
the session state work for sending 5000+
Web pages a second. Two standard racks
should be able to handle session state for
the world.
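A rough Python sketch of the core of such a session state store, under stated assumptions: the original is in Visual Basic .NET and its internals are not given here, so the class name `SessionStore`, the use of JSON for de/serialization, and the roles of the two collections (one for the state, one for last-access times) are my guesses. The TCP/IP socket layer is left out; a single-threaded loop would read a request, call `get` or `put`, and write the reply.

```python
import json
import time
from collections import OrderedDict

class SessionStore:
    """Single-threaded in-memory session state with an inactivity timeout."""

    def __init__(self, timeout_seconds=3600, clock=time.monotonic):
        self.timeout = timeout_seconds
        self.clock = clock
        self.state = {}                 # session id -> serialized state
        self.last_seen = OrderedDict()  # session id -> last access, oldest first

    def _expire(self):
        """Drop sessions idle for the timeout or longer."""
        now = self.clock()
        while self.last_seen:
            sid, seen = next(iter(self.last_seen.items()))
            if now - seen < self.timeout:
                break
            del self.last_seen[sid]
            del self.state[sid]

    def _touch(self, sid):
        """Record an access and move the session to the newest position."""
        self.last_seen[sid] = self.clock()
        self.last_seen.move_to_end(sid)

    def put(self, sid, value):
        self._expire()
        self.state[sid] = json.dumps(value)
        self._touch(sid)

    def get(self, sid):
        self._expire()
        if sid not in self.state:
            return None
        self._touch(sid)
        return json.loads(self.state[sid])

# Usage with a fake clock so the one hour timeout can be shown instantly.
now = [0.0]
store = SessionStore(timeout_seconds=3600, clock=lambda: now[0])
store.put("abc", {"query": "k-d trees", "page": 2})
now[0] = 1800.0
fresh = store.get("abc")   # still active, and this access resets the timer
now[0] = 1800.0 + 3600.0
stale = store.get("abc")   # an hour of inactivity has passed
```

Because only one thread ever touches the two collections, no locks are needed, which is the "no concurrency problems" point. Keeping the access times in oldest-first order means expiry only ever looks at the front of the collection.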
The rest of the software is also readily
scalable, again from just simple sharding.
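Simple sharding can be as little as hashing a key to choose which instance to talk to. A minimal Python sketch, where the function name `shard_for` and the server addresses are hypothetical and nothing here is taken from the actual site:

```python
import hashlib

def shard_for(session_id, shard_count):
    """Map a session id to a shard index that is stable across processes."""
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count

# Hypothetical addresses of three session state server instances.
servers = ["state-0:9000", "state-1:9000", "state-2:9000"]
chosen = servers[shard_for("abc", len(servers))]
```

A cryptographic hash is used here only because Python's built-in `hash()` for strings changes from one process to the next, and every Web server has to agree on where a session lives. One known limit of this plain modulo scheme is that changing the number of shards moves most keys to a different instance.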
Currently I have one bug in one Web page
-- it's not handling session state just
right! But I have the fix in code in
another Web page and should copy it over
today!
My interactive development environment
is just KEdit with about 100 macros and
some careful use of file system
directories.
I'm using SQL Server only to record the
data from the users and for the results of
some batch computations; at one point I
actually do make use of a transaction;
the data for the searches is drawn from
SQL Server with a batch program (run it
maybe once a day); some solid state drives
(write rarely, read thousands of times a
second) should do wonders for the data for
the searches.
I was about to fix the bug in the Web page
but took some opportunities to gather some
good initial data. The site will start
focused and only slowly grow to be
comprehensive -- right, at first do some
things that don't scale and please some
niche group of users a lot instead of
trying to please 2+ billion users a
little.
Currently the database has only some
meaningless data I put in for first
testing of the software. It's about time
to load in some of the good initial data I
have. Then give a critical review, go
live, etc.
It's getting there.
All the work uniquely mine has been fast,
fun, and easy, but the whole project has
taken far, far, far too long. Why? I
worked through about a cubic foot of books
and 6000+ Web pages of documentation of
Windows, .NET, Visual Basic .NET, ASP.NET,
ADO.NET, SQL Server, etc.
The main problems: (1) Badly written,
obscure documentation (worst bottleneck in
the future of computing); (2) computer
viruses; (3) SQL Server installation bugs
destroying my boot partition requiring
rebuilding starting with the XP DVD
(barbed wire enema with an unanesthetized
upper molar root canal procedure); (4) SQL
Server management and administration
(e.g., a week of throwing stuff against a
wall to see what sticks just to get a SQL
Server connection string that will let
code for a server side Web page connect
with SQL Server); (5) clean, smooth means
of system backup and recovery (including
for both user data and bootable
partitions); (6) Sony DVD drives that quit
for no good reason (and inability to buy
more IDE DVD drives).
Good stuff: (1) KEdit and its macro
language; (2) the scripting language Rexx
(Microsoft's PowerShell may also be
terrific but have yet to move to it); (3)
NTFS (fantastic); (4) Visual Basic .NET
design, functionality, speed of
compilation, compiler error messages,
minimal bugs (sweetheart language); (5)
what ASP.NET does when it compiles a
Visual Basic program (enough for a really
nice IDE); (6) NTBACKUP (once understand
how to use it, e.g., do have to ask to
save "system state", whatever the heck
that is, or the saved copy, restored,
won't boot -- learn this and how to get
around it the hard way, weeks of work);
(7) XCOPY; (8) the tools to have server
side Web page code write to a log file;
(9) Firefox (except for virus
vulnerabilities); (10) the classes in the
.NET Framework (once learn how to learn
about them and use them); (11) Adobe
Acrobat (except for virus
vulnerabilities); (12) the ability of XP
to find device drivers and recognize new
devices; (13) Microsoft's anti-virus
Safety Scanner (if only from CP67/CMS and
Multics, there should be no virus
vulnerabilities, but since there are the
safety scanner is terrific to have); (14)
Knuth's TeX; (15) the Western Digital
Passport Ultra 2 TB USB drive!
I'm not retired or retiring! Likely I'm
still 100% unemployable at anything that
would pay enough to let me keep a car
going to commute to the job. Y Combinator
and VCs want nothing to do with me.
But if I can get my Web site up to a
search a second, on a server for about
$2000, I will be in decent shape
financially and on the way to organic
growth for my business and much more.
Then I'll get a nice house, a building for
some cars, at least one Corvette, visit
the rest of the family still alive, take
off two weeks to pig out on lobster in
Maine, get some good grape juice from
between Beaune and Dijon, do some cooking,
give some dinner parties, go to concerts
and operas, continue with the business, go
to seminars on mathematical physics, etc.
I'll implement and deploy my server farm
monitoring techniques and maybe spin it
off as a separate business.
I have some guesses for some approaches to
real AI and might try to implement
those.
And I will get a kitty cat! We'll see!
Good luck on your work with math. Maybe
what I typed in here will help; I wish I'd
known all that when I started.
Writing this as a reply rather than an email because there's no contact info on graycat's profile.
Hi graycat,
I'm an applied math & CS undergrad at Caltech and I love all of your posts, especially the ones about how data center scale computing really needs to embrace statistics. The FedEx stories are also excellent. I'm personally interested in high performance computing and machine learning, but I'm also interested in solving "real problems" and like how you seem to focus on the actual value of the applications. I love the feeling of engineering solutions to mathematical problems, and this seems to be something that you also enjoy.
I'd really like to hear more about your career, research, and also the startup that you're working on. I'd be very happy if you shoot me an email at eric@ericmart.in
I really enjoy your posts and your writing style. Thanks much. I am currently working on some math problems as a part of a business that makes accessible to Indians who do not have access to structured banking services. Would love to discuss it with you, if you are inclined. My email is takenottie at google's mail. Thanks.