
Thanks. I really appreciate you taking the time to write so much. Honestly, this seems to be the best description of potential careers that I have ever gotten, and I feel so much more assured about my choice to study Maths now. It is also really interesting to hear how colourful such a career could be! It sounds like, despite the politics, you have had a really interesting career and have done exactly the kind of things that I would like to do!

Again, thanks so much. One last question: what are you doing with your life right now? It seems like you have had a pretty broad career, and I am interested in the end game. If you are retired, what is your lifestyle like, and in particular, how has your career impacted your life as a retiree?



Part I

Well, in part we aimed too high: My wife was brilliant, quite broadly -- yes, Valedictorian, Summa Cum Laude, Woodrow Wilson, PBK, piano, clarinet, voice, prizes in cooking, sewing, raising chickens. But her family had her try to be perfect and to dedicate herself to saving the world.

She wanted a Ph.D. in mathematical sociology to do social engineering of social change to save the world and, thus, get her praise, acceptance, and emotional and financial security. And those concerns filled her plate.

Well, at the root there was some anxiety, from nature and/or nurture, then some perfectionism and some fear of criticism from powerful, influential people -- net, call it a special case of social phobia. That brought stress, which can bring depression. Depression slows the work, and in her case that caused more stress and more depression, and then clinical depression. She was in a clinical depression the day she got her Ph.D. She never recovered and, net, didn't make it.

Took me a while to reinvent and learn the basics of clinical psychology to understand what was going on and what to do about it. I did learn but nearly always too late.

In trying to have a stable job so that I could take care of her, I took a job at IBM's Watson lab, in what we called artificial intelligence (AI) to do monitoring and management of large server farms and networks.

Then IBM got sick and the lab phone book went from 4500 full time names down to 1000 or so. The guy who hired me, a big star who was deliberately ignored by the higher ups, left for greener pastures -- eventually ended up with a nice place in Malibu.

IBM Watson Research was run by a clique of people who stuck together and blocked out everyone else. At one point it appeared that the company HQ in Armonk tried to correct that situation.

Instead of or better than the AI, I had some ideas: Detecting a problem in a server farm or network is necessarily essentially a statistical hypothesis test with Type I (false alarm) and Type II (missed detection) errors. So, want hypothesis tests that keep down the rates of the errors.
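To make the two error types concrete, here is a minimal Python sketch that measures both rates for a detector, given a record of which intervals raised an alarm and which intervals actually had a problem. The function name `error_rates` and the labeled-history setup are assumptions for illustration, not part of the original work.

```python
from typing import Sequence, Tuple


def error_rates(alarms: Sequence[bool], truly_sick: Sequence[bool]) -> Tuple[float, float]:
    """Return (false_alarm_rate, missed_detection_rate) from labeled history.

    false_alarm_rate      = P(alarm | healthy)    -- Type I error rate
    missed_detection_rate = P(no alarm | sick)    -- Type II error rate
    """
    if len(alarms) != len(truly_sick):
        raise ValueError("alarms and labels must have the same length")
    healthy = [a for a, s in zip(alarms, truly_sick) if not s]
    sick = [a for a, s in zip(alarms, truly_sick) if s]
    false_alarm = sum(healthy) / len(healthy) if healthy else 0.0
    missed = sum(1 for a in sick if not a) / len(sick) if sick else 0.0
    return false_alarm, missed
```

For example, `error_rates([True, False, False, True], [False, False, True, True])` gives one false alarm out of two healthy intervals and one miss out of two sick ones, i.e. `(0.5, 0.5)`.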

Since there's a lot of data readily available and we want to exploit it to keep down the error rates, we also want multi-variate inputs -- nearly all hypothesis tests have only univariate inputs. Also, for such multi-variate data we can't hope to know any of the probability distributions, so we need tests that are distribution-free.

I don't think there were any such tests, so I dreamed up some. I used some group theory and a classic result of S. Ulam, the guy on the left in

http://www-history.mcs.st-and.ac.uk/BigPictures/Ulam_Feynman...

That result is what Le Cam called tightness (see P. Billingsley, Convergence of Probability Measures). With it I was able to select the false alarm rate in advance and then get that rate exactly. Asking for all of the Neyman-Pearson result would take more data than we had any chance of having, but there was a somewhat useful sense in which my work gave, asymptotically, for any selected false alarm rate, the highest possible detection rate with that false alarm rate.
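The group theory and the use of Ulam's result aren't spelled out here, so what follows is NOT that method. It is a different, standard distribution-free construction (a rank test in the split-conformal style) that shows how a multi-variate test can have its false alarm rate chosen in advance. Each observation is scored by its nearest-neighbor distance to healthy reference data, and that score is ranked against scores from held-out healthy calibration data. Assuming the calibration points and the new point are exchangeable (e.g., i.i.d. from the healthy distribution), the false alarm rate is at most `alpha`, and exactly `alpha` when `alpha * (n + 1)` is an integer and scores have no ties. The names `RankDetector` and `nn_distance` are hypothetical.

```python
import math
import random
from typing import List, Sequence

Point = Sequence[float]


def nn_distance(x: Point, reference: List[Point]) -> float:
    """Anomaly score: Euclidean distance from x to its nearest reference point."""
    return min(math.dist(x, r) for r in reference)


class RankDetector:
    """Distribution-free multivariate detector (split-conformal style sketch).

    Healthy baseline data is shuffled and split in half: a reference set used
    for scoring, and a calibration set whose scores define the null ranks.
    """

    def __init__(self, healthy: List[Point], alpha: float, seed: int = 0):
        data = list(healthy)
        random.Random(seed).shuffle(data)
        half = len(data) // 2
        self.reference = data[:half]
        self.cal_scores = [nn_distance(x, self.reference) for x in data[half:]]
        self.alpha = alpha

    def p_value(self, x: Point) -> float:
        """Fraction of (calibration + new) scores at least as extreme as x's."""
        s = nn_distance(x, self.reference)
        n = len(self.cal_scores)
        at_least = sum(1 for c in self.cal_scores if c >= s)
        return (1 + at_least) / (n + 1)

    def alarm(self, x: Point) -> bool:
        """Raise an alarm when the p-value is at or below the chosen alpha."""
        return self.p_value(x) <= self.alpha
```

With 398 healthy points the calibration set has 199 scores, so `n + 1 = 200` and `alpha = 0.05` is exactly attainable. Nothing about the distribution of the data was assumed beyond exchangeability.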

I cooked up some synthetic data that was challenging -- the critical region was something like the red squares on a checkerboard in several dimensions. My work did fine.
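The exact construction isn't given, so here is a hypothetical generator for checkerboard-like test data: points are uniform in a cube of unit cells, and a point is "red" when the sum of its integer cell indices is even. In two or more dimensions, with an even number of cells per axis, every single coordinate has the same distribution for red and non-red points. So no univariate test can tell them apart; only the joint, multi-variate position carries the signal. The function names are illustrative only.

```python
import math
import random
from typing import List, Sequence, Tuple


def checkerboard_label(x: Sequence[float], cell: float = 1.0) -> bool:
    """True if x lies in a 'red' cell: the sum of its cell indices is even."""
    return sum(math.floor(xi / cell) for xi in x) % 2 == 0


def checkerboard_sample(n: int, dim: int, cells_per_axis: int = 4,
                        seed: int = 0) -> List[Tuple[List[float], bool]]:
    """Draw n uniform points in [0, cells_per_axis]^dim with checkerboard labels."""
    rng = random.Random(seed)
    points = []
    for _ in range(n):
        x = [rng.uniform(0, cells_per_axis) for _ in range(dim)]
        points.append((x, checkerboard_label(x)))
    return points
```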

I got some data from a cluster of computers at Allstate, wrote some prototype software, and confirmed the false alarm rate empirically. And I cooked up an algorithm to make the computations nicely fast (in part I used k-D trees -- I reinvented those; had I done it a few years earlier, k-D trees would have been mine).
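The fast algorithm itself isn't described, so this is only a generic sketch of the textbook k-D tree idea, not a reconstruction of that code. Points are split on the median along cycling axes, and a nearest-neighbor query skips any subtree whose splitting plane is farther away than the best match found so far. Nearest-neighbor distances are the kind of computation a multi-variate detector may have to repeat for every observation, which is where such a tree pays off.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Sequence[float]


class KDNode:
    __slots__ = ("point", "axis", "left", "right")

    def __init__(self, point: Point, axis: int,
                 left: "Optional[KDNode]", right: "Optional[KDNode]"):
        self.point = point
        self.axis = axis
        self.left = left    # points with coordinate <= point[axis]
        self.right = right  # points with coordinate >= point[axis]


def build(points: List[Point], depth: int = 0) -> Optional[KDNode]:
    """Build a balanced k-D tree by splitting on the median along cycling axes."""
    if not points:
        return None
    axis = depth % len(points[0])
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return KDNode(pts[mid], axis,
                  build(pts[:mid], depth + 1),
                  build(pts[mid + 1:], depth + 1))


def nearest(node: Optional[KDNode], target: Point,
            best: Optional[Tuple[float, Point]] = None) -> Optional[Tuple[float, Point]]:
    """Return (distance, point) for the nearest neighbor of target."""
    if node is None:
        return best
    d = math.dist(node.point, target)
    if best is None or d < best[0]:
        best = (d, node.point)
    diff = target[node.axis] - node.point[node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = nearest(near, target, best)
    # The far side can only help if the splitting plane is closer than the best match.
    if abs(diff) < best[0]:
        best = nearest(far, target, best)
    return best
```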

Politics: A guy in the clique up there didn't like me. But it took two levels of management to fire someone, so he reorganized to put me under a wuss who would go along with firing me. They claimed my research was not publishable (it was reviewed by a guy in the clique who admitted he couldn't read my math; he claimed to have found someone who could, but found nothing wrong with my paper) and walked me out the door.

The next day the wuss was demoted out of management. Two weeks later the main nasty guy was moved down to have him under an additional level of management and given a six month performance plan which he failed. He was demoted out of management -- lost his corner office, budget, secretary, and 55 subordinates.

I got a PC and Knuth's TeX and submitted a paper on my research. Since the paper had some measure theory in it, e.g., Ulam's result, much of the computer science community couldn't read it. But the journal that offered to review the paper kept at it; apparently the editor in chief walked the paper around his campus, to a CS department to see if the problem was important and to a math department to see if the math was correct, and accepted the paper. He invited me to present at a conference he was running, but I didn't want to bother going. The paper was published. IBM was wrong.

So it appears that I have the world's only collection, and it's large, of statistical hypothesis tests that are both multi-variate and distribution-free, with some nice properties, with a fast algorithm, with some confirmation of the false alarm rate calculations from some real data, etc.

Asymptotically the critical region can be a multi-dimensional fractal. Nice.

People should use my work. I did give a talk on it at the main NASDAQ site in Trumbull, CT.

IBM didn't pay very well, and cost of living was high -- I'd always saved money, even in grad school, but I lost quite a lot of money working at IBM.

When I joined IBM, it'd just won the Nobel prize in physics two years in a row and had a long string of being "the most admired company in the world". Now I'd advise anyone just to stay the heck away, a long way away.

But being pushed out of IBM and age left me 100% permanently unemployable. I sent 1000+ resumes. Zip, zilch, zero.

If you're in computing, be sure by age 40, hopefully by age 35, to have a rock-solid, stable career and/or be wealthy. So, really, you just about have to own your own business and make it successful.


Part II

For now, sure, I know some math and can stir up some more, and I know some computing and can learn more.

So, I've got a 1.8 GHz AMD single core processor, Windows XP Professional SP3 (I have an official copy of Windows 7 Professional on DVD but have seen no great reason to go to all the trouble to install it and rebuild all my software environment yet), three hard disks, 100 million files, .NET Framework 4.0, a good text editor (KEdit), Visual Basic .NET (comes with the .NET Framework), ASP.NET, ADO.NET, IIS (low level Web server), SQL Server Express, etc.

I thought, why not go after about 2/3rds of Internet search, the safe-for-work part that is served at best poorly by looking for keywords/phrases?

So, how to do that? Not with just routine software! And not with anything commonly talked about for search!

I stirred up some math, typed the theorems and proofs into Knuth's TeX, worked up a scalable architecture, and typed in the software.

Maybe I should have used Redis, but instead I thought that writing my own session state server would be faster than even understanding Redis. So, my session state server is single threaded ("Look, Ma, no concurrency problems"), for faster lookup rates is trivial to run as several instances as in sharding, and is just some TCP/IP sockets, some de/serialization of instances of my session state class, and uses two instances of a .NET collection class. Simple.
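The actual server is Visual Basic .NET, so the following Python sketch only illustrates the design as described, under some assumptions. One map holds serialized session state. A second, recency-ordered map tracks last access so idle sessions (default one hour) can be expired from the front. `pickle` stands in for the de/serialization. The TCP/IP socket layer is omitted, and the class name `SessionStore` is hypothetical. Because it is single threaded, it needs no locks.

```python
import pickle
import time
from collections import OrderedDict
from typing import Any, Callable, Dict, Optional


class SessionStore:
    """Single-threaded in-memory session store with idle expiry (sketch)."""

    def __init__(self, idle_seconds: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic):
        self.idle_seconds = idle_seconds
        self.clock = clock
        self._state: Dict[str, bytes] = {}                  # session ID -> serialized state
        self._last_seen: "OrderedDict[str, float]" = OrderedDict()  # oldest access first

    def __len__(self) -> int:
        return len(self._state)

    def _touch(self, session_id: str) -> None:
        # Record the access time and move the session to the "newest" end.
        self._last_seen[session_id] = self.clock()
        self._last_seen.move_to_end(session_id)

    def _expire(self) -> None:
        # Idle sessions are always at the front, so stop at the first live one.
        cutoff = self.clock() - self.idle_seconds
        while self._last_seen:
            sid, seen = next(iter(self._last_seen.items()))
            if seen >= cutoff:
                break
            self._last_seen.popitem(last=False)
            del self._state[sid]

    def put(self, session_id: str, state: Any) -> None:
        self._expire()
        self._state[session_id] = pickle.dumps(state)
        self._touch(session_id)

    def get(self, session_id: str) -> Optional[Any]:
        self._expire()
        blob = self._state.get(session_id)
        if blob is None:
            return None
        self._touch(session_id)
        return pickle.loads(blob)
```

Passing a fake `clock` makes the expiry logic testable without waiting an hour.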

It's fast! A server for less than $1500 should be able to keep session state for an hour of inactivity for each user and do the session state work for sending 5000+ Web pages a second. Two standard racks should be able to handle session state for the world.

The rest of the software is also readily scalable, just from simple sharding.
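Simple sharding needs a stable way to route a key, such as a session ID, to one of several instances. A minimal sketch, assuming hash-modulo routing (the function `shard_for` is hypothetical): it hashes with SHA-256 rather than Python's built-in `hash()`, which is randomized per process for strings and so would route the same key differently on different front-end servers.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index in [0, num_shards), stable across processes."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards
```

One design caveat: plain modulo reassigns most keys when the number of shards changes; consistent hashing limits that, at the cost of a bit more code.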

Currently I have one bug in one Web page -- it's not handling session state just right! But I have the fix in code in another Web page and should copy it over today!

My interactive development environment is just KEdit with about 100 macros and some careful use of file system directories.

I'm using SQL Server only to record the data from the users and for the results of some batch computations; at one point I actually do make use of a transaction; the data for the searches is drawn from SQL Server with a batch program (run it maybe once a day); some solid state drives (write rarely, read thousands of times a second) should do wonders for the data for the searches.

I was about to fix the bug in the Web page but took some opportunities to gather some good initial data. The site will start focused and only slowly grow to be comprehensive -- right, at first do some things that don't scale and please some niche group of users a lot instead of trying to please 2+ billion users a little.

Currently the database has only some meaningless data I put in for first testing of the software. It's about time to load in some of the good initial data I have. Then give a critical review, go live, etc.

It's getting there.

All the work uniquely mine has been fast, fun, and easy, but the whole project has taken far, far, far too long. Why? I worked through about a cubic foot of books and 6000+ Web pages of documentation of Windows, .NET, Visual Basic .NET, ASP.NET, ADO.NET, SQL Server, etc.

The main problems: (1) Badly written, obscure documentation (worst bottleneck in the future of computing); (2) computer viruses; (3) SQL Server installation bugs destroying my boot partition requiring rebuilding starting with the XP DVD (barbed wire enema with an unanesthetized upper molar root canal procedure); (4) SQL Server management and administration (e.g., a week of throwing stuff against a wall to see what sticks just to get a SQL Server connection string that will let code for a server side Web page connect with SQL Server); (5) clean, smooth means of system backup and recovery (including for both user data and bootable partitions); (6) Sony DVD drives that quit for no good reason (and inability to buy more IDE DVD drives).

Good stuff: (1) KEdit and its macro language; (2) the scripting language Rexx (Microsoft's PowerShell may also be terrific but have yet to move to it); (3) NTFS (fantastic); (4) Visual Basic .NET design, functionality, speed of compilation, compiler error messages, minimal bugs (sweetheart language); (5) what ASP.NET does when it compiles a Visual Basic program (enough for a really nice IDE); (6) NTBACKUP (once understand how to use it, e.g., do have to ask to save "system state", whatever the heck that is, or the saved copy, restored, won't boot -- learn this and how to get around it the hard way, weeks of work); (7) XCOPY; (8) the tools to have server side Web page code write to a log file; (9) Firefox (except for virus vulnerabilities); (10) the classes in the .NET Framework (once learn how to learn about them and use them); (11) Adobe Acrobat (except for virus vulnerabilities); (12) the ability of XP to find device drivers and recognize new devices; (13) Microsoft's anti-virus Safety Scanner (if only from CP67/CMS and Multics, there should be no virus vulnerabilities, but since there are the safety scanner is terrific to have); (14) Knuth's TeX; (15) the Western Digital Passport Ultra 2 TB USB drive!

I'm not retired or retiring! Likely I'm still 100% unemployable at anything that would pay enough to let me keep a car going to commute to the job. Y Combinator and VCs want nothing to do with me.

But if I can get my Web site up to a search a second, on a server for about $2000, I will be in decent shape financially and on the way to organic growth for my business and much more.

Then I'll get a nice house, a building for some cars, at least one Corvette, visit the rest of the family still alive, take off two weeks to pig out on lobster in Maine, get some good grape juice from between Beaune and Dijon, do some cooking, give some dinner parties, go to concerts and operas, continue with the business, go to seminars on mathematical physics, etc.

I'll implement and deploy my server farm monitoring techniques and maybe spin it off as a separate business.

I have some guesses for some approaches to real AI and might try to implement those.

And I will get a kitty cat! We'll see!

Good luck on your work with math. Maybe what I typed in here will help; I wish I'd known all that when I started.


Writing this as a reply rather than an email because there's no contact info on graycat's profile.

Hi graycat,

I'm an applied math & CS undergrad at Caltech and I love all of your posts, especially the ones about how data center scale computing really needs to embrace statistics. The FedEx stories are also excellent. I'm personally interested in high performance computing and machine learning, but I'm also interested in solving "real problems" and like how you seem to focus on the actual value of the applications. I love the feeling of engineering solutions to mathematical problems, and this seems to be something that you also enjoy.

I'd really like to hear more about your career, research, and also the startup that you're working on. I'd be very happy if you shoot me an email at eric@ericmart.in

Best, Eric


Dear Graycat,

I really enjoy your posts and your writing style. Thanks much. I am currently working on some math problems as a part of a business that makes accessible to Indians who do not have access to structured banking services. Would love to discuss it with you, if you are inclined. My email is takenottie at google's mail. Thanks.



