I had a colleague who told me that his most productive period was when he was stuck in the hospital for a couple of weeks but was able to do some coding.
Here are some tips:
- Find the best cruises most easily on cruisesheet.com (disclosure: it's Tynan's project)
- Royal Caribbean has the best internet at sea, via O3b. In the Caribbean or Mediterranean it's about 70ms latency; in the middle of the Atlantic it's about 220ms. So Skype may work, but the delays are annoying. There's plenty of bandwidth now, though you may only get "one 9" service on average, so scheduled conference calls are always a gamble.
- Repositioning cruises (many ships move from the Mediterranean to the Caribbean and back seasonally) in April/May and October/November are the cheapest cruises you'll ever find. 5-8 days at sea means plenty of time to get some work done, as well as goof off a bit during the evenings. For the second week (Europe or Caribbean), I find it easy enough to get an hour or two of email catch-up in on port days after getting back from a shore excursion, but it's easier to just say port day = vacation, sea day = work and great food.
A 2-week repositioning cruise may cost as little as $600 per person including taxes and fees. Add another $200pp for gratuities, a few hundred dollars for shore excursions, and a few hundred more for airfare to get back home, so roughly $1,300 to $1,600 per person all-in. Depending on where you go, you may get the benefits of a 2-week vacation for the price of one week on land.
Similarly, if you want to do a one-week offsite with your startup (particularly if you're all remote most of the time anyway), this is probably cheaper than flying your crew to any big city.
The seminars-at-sea model was also well proven out by geekcruises.com, since renamed insightcruises.com.
My wife and I have been renting an off-season beach house near Boston for 8 months out of the year for less than the price of a basement studio in Cambridge, then traveling during the summer. This means our repositioning cruises are further discounted by the fact that we aren't paying rent or mortgage on an empty house. It's hard to beat if you both have good schedule flexibility.
People throw around words like "revolution" for the current deep-learning push. But it's worth remembering that the fundamental concepts of neural networks have been around for decades. The current explosion is due to breakthroughs in scalability and implementation through GPUs and the like, not any sort of fundamental algorithmic paradigm shift.
This is similar to how the integrated circuit enabled the personal computing "revolution" but down at the transistor level, it's still using the same principles of digital logic since the 1940s.
Yes, neural networks have been around for a while, gradually improving, but they were simply non-existent in many fields where they are now the favored solution.
There WAS a big fundamental paradigm shift in the algorithms. Many people argue that it should not be called "neural networks" but rather "differentiable function networks". DL is not your dad's neural network, even if it looks superficially similar.
The shift is that now, if you can express your problem in terms of minimization of a continuous function, there is a whole new zoo of generic algorithms that are likely to perform well and that benefit from throwing more CPU resources at them.
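To make that concrete, here is a minimal toy sketch of the idea (the function and all names are mine, just for illustration): pick any differentiable function, compute its gradient, and step downhill. Everything from logistic regression to a deep net is this same loop with a fancier f.

    #include <cstdio>

    // Toy illustration: minimize f(x) = (x - 3)^2 by plain gradient descent.
    // Any problem phrased as "minimize a differentiable f" gets the same
    // generic treatment; only f and its gradient change.
    int main() {
        double x = 0.0;                     // initial guess
        const double lr = 0.1;              // step size
        for (int i = 0; i < 100; ++i) {
            double grad = 2.0 * (x - 3.0);  // f'(x)
            x -= lr * grad;                 // step downhill
        }
        std::printf("x converged to %f (minimum is at 3.0)\n", x);
        return 0;
    }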
Sure, it uses transistors in the end, but revolutions do not necessarily mean a shift in hardware technology. And, by the way, if we one day switch from transistors to things like opto-thingies, and it brings a measly 10x performance boost, it won't be on par with the DL revolution we are witnessing.
You could have started your PhD in 2006, made professor in 2012, and nearly everything you had learned would have been _completely_ different.
These were at the root of many detectors. They still are for some applications, but for most of them, a few layers of CNN manage to train far better, and very counter-intuitive, detectors.
Facial detection/recognition was based on features. This is not my specialty, so I don't know if DL got better there too, as the classical features were pretty advanced, but if it isn't there yet, I am sure it is just a matter of time.
I can see image stitching benefiting from a deep-learned filter pass too.
Camera calibration is pretty much a solved problem by now, I don't think DL adds a lot to it.
Like I said, not everything became obsolete, but around 50% of the field was taken over by DL algorithms where, before that, hand-crafted algorithms usually had vastly superior performance.
Traditionally, neural networks were trained by backpropagation, but many implementations had only one or two hidden layers, because training a neural network with many layers (it wasn't called a "deep" NN back then) was not only hard but often led to poorer results. Hochreiter identified the reason in his 1991 thesis: the vanishing gradient problem. The culprit was identified, but the solution had yet to be found.
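For anyone who hasn't seen the problem stated concretely: the sigmoid's derivative s(1-s) peaks at 0.25, and the chain rule multiplies one such factor per layer, so the gradient shrinks geometrically as it propagates back. A toy sketch (best case, with every unit at its steepest point):

    #include <cstdio>

    // Toy illustration of the vanishing gradient: backprop multiplies one
    // sigmoid-derivative factor per layer, and sigmoid'(z) = s*(1-s) <= 0.25.
    int main() {
        double grad = 1.0;
        for (int layer = 1; layer <= 10; ++layer) {
            double s = 0.5;            // sigmoid output at its steepest point
            grad *= s * (1.0 - s);     // best case: multiply by exactly 0.25
            std::printf("after layer %2d: gradient factor = %g\n", layer, grad);
        }
        return 0;  // ~1e-6 after 10 layers; the early layers barely learn
    }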
My impression is that there weren't any breakthroughs until several years later. Since I'd left the field, I don't know exactly what these breakthroughs were. Apparently, the invention of LSTM networks, CNNs, and the replacement of sigmoids by ReLUs were some of the important contributions. But anyway, the revolution was more about algorithmic improvement than the use of GPUs.
The GPUs and dataset size were definitely very important though.
I don't think I can agree with this. There have been a lot of improvements to the algorithms, and the pace has sped up thanks to GPUs. You cannot just take a neural network from 15 years ago, make it bigger, and expect it to work on modern GPUs; it is not going to work at all. Moreover, new techniques have appeared to solve other types of problems.
I am talking about things like batch normalization, ReLUs, LSTMs, and GANs. Yes, neural networks still use gradient descent, but there are people working on other algorithms now, and they seem to work; they are just less efficient.
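To illustrate why something as small as swapping the activation function mattered: ReLU's derivative is exactly 1 wherever the unit is active, so, unlike the sigmoid's factor of at most 0.25 per layer, a deep chain of ReLUs does not force the gradient to decay geometrically. A toy sketch (the names are mine):

    #include <algorithm>
    #include <cstdio>

    // Sketch: ReLU and its derivative. For active units the derivative is 1,
    // so the backpropagated gradient factor is preserved layer after layer.
    static double relu(double z)     { return std::max(0.0, z); }
    static double reluGrad(double z) { return z > 0.0 ? 1.0 : 0.0; }

    int main() {
        double z = 1.0, grad = 1.0;
        for (int layer = 0; layer < 10; ++layer) {
            z = relu(z);            // an active unit passes through unchanged
            grad *= reluGrad(z);    // gradient factor stays exactly 1
        }
        std::printf("gradient factor after 10 active ReLU layers: %g\n", grad);
        return 0;
    }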
> This is similar to how the integrated circuit enabled the personal computing "revolution" but down at the transistor level, it's still using the same principles of digital logic since the 1940s.
This claim has exactly the same problem as before. You could also say evolution has done nothing, because the same principles at work in people were already there in the dinosaurs and even in the first cells. We are just a lot more cells than before.
[Rumelhart, David E.; Hinton, Geoffrey E.; Williams, Ronald J. (8 October 1986). "Learning representations by back-propagating errors". Nature]
"Back-propagation allowed researchers to train supervised deep artificial neural networks from scratch, initially with little success. Hochreiter's diploma thesis of 1991 formally identified the reason for this failure in the "vanishing gradient problem", which not only affects many-layered feedforward networks, but also recurrent networks."
With how fast things are evolving, developing something like an ASIC ($$$) for this might be outdated before it even hits release, no?
ASICs will definitely be very helpful, and there's currently a bit of a rush to develop them. Google's TPUs might be one of the first efforts, but several other companies and startups are looking to have offerings too.
NVidia's Volta series includes tensor cores as well. So far, I think they've only released the datacenter version, which is available on EC2 p3 instances.
Backprop is a very simple algorithm, nothing to fear there. The tricky part is calculating the derivatives if you want to be flexible in building your model. But for feedforward networks with sigmoid activations, the weight-update equations are a joke.
So it's essentially as simple as
newWeight = oldWeight - (stepValue * gradient)
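To make that concrete, here is a toy single-neuron version (all names and the toy numbers are mine, assuming a sigmoid activation and squared-error loss); a full network just chains this layer by layer:

    #include <cmath>
    #include <cstdio>

    // Minimal sketch: one sigmoid neuron, squared-error loss, plain gradient
    // descent. Illustrative only; not from any particular library or post.
    int main() {
        double w = 0.5, b = 0.0;            // weight and bias
        const double lr = 0.5;              // learning rate ("stepValue")
        const double x = 1.0, target = 0.0; // one toy training example

        for (int step = 0; step < 100; ++step) {
            double z = w * x + b;
            double y = 1.0 / (1.0 + std::exp(-z)); // sigmoid activation
            double dL_dy = y - target;             // from loss 0.5*(y-target)^2
            double dy_dz = y * (1.0 - y);          // sigmoid derivative
            w -= lr * dL_dy * dy_dz * x;           // newWeight = old - step*grad
            b -= lr * dL_dy * dy_dz;
        }
        std::printf("trained: w=%f b=%f\n", w, b);
        return 0;
    }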
You won't regret it. One of the best explanations of backprop on the internet.
But I meant that if you look at the equations and the steps without fully understanding the insights behind them, it is a joke of an algorithm: it just does some multiplications, applies the new gradients, moves to the previous layer, and repeats.
I wrote about my specific recommendations here: https://robots.thoughtbot.com/you-should-take-a-codecation.
(That being said, passion / perseverance / intelligence can often lead to focus)
Aren't those more like prerequisites for focus? Focus is merely a side effect, not something you can aim for.
In the case of our painter, if we're talking about Michelangelo or something, he'd likely come out with an excellent IQ.
As an aside, IQ tests don't try to make you do math problems and the like; it's all pattern-matching kinds of questions where you learn all of the necessary context in the test itself. Which makes sense, right? It's trying to measure your ability to learn and generalize, not what you happen to know before you take the test.
One I recall vividly was basically a page full of weird symbols with a "key" at the top mapping symbols to numbers. I had to convert the entire page, and various aspects of my performance (variance of speed, attention, etc.) were measured during the task. This doesn't really correlate with what I'd consider focus or "flow" during typical programming tasks, however, but it did try my patience! :-)
You can probably go through the whole thing including assignments in under a week full-time.
I found myself knowing how to create CNNs, but the why of the entire process still feels under-developed. But I'll admit it could be because my understanding of calculus and linear algebra was far more under-developed back when I was studying the course than it is now.
Pity that you pay so much money to attend Stanford only to be taught by your peers. Not knocking Stanford, as this is how it's done pretty much everywhere at the undergrad level now.
So these grad students are exactly the people you want to learn from, because they have done the dirty work of fiddling with parameters and know which tricks work and which don't. It's probably preferable to a more theory-heavy course, because very few people (not even the more experienced professors) understand why those tricks work.
Note: I took an older version of the course which was started by Andrej Karpathy who was a grad student at the time but is now the Director of Artificial Intelligence at Tesla.
It takes time though and one has to combat the "how come you're reinventing the wheel" comments from co-workers, spouses, bosses, etc., which can be a challenge.
Also, if it is intimidating to do a whole project from scratch, break it down into parts and tackle those first.
Small successes in the beginning can be the boost you need to finish the project.
That is the bane of writing probabilistic code. Errors show up not as clear-cut wrong values or crashes but as subtle biases. You are always wondering, even when it is kinda working: is it REALLY working, or did I miss a crucial variable initialization somewhere?
Both activities very much resemble chaotic systems and they are both very challenging to debug.
I am the same kind of person. But when John Carmack approaches this with scepticism and concludes that it is indeed not over-hyped, I guess it's worth learning after all!
CS231N, here I come.
This is a lesson for me and probably many others. Don't get hung up on tools, ship!!!
What matters is making a cool game with whatever tech is available, and shipping it.
This perspective makes it supremely frustrating anytime I try to get a 'normal' programming job, and run into the 'but do you have .NET4, ASP.NET Core, and JS6??' mentality.
makes note: not normal programming material
Maybe I'm reading you wrong. But if I am right, it's good to take a look outside of the Unix bubble. Visual Studio is literally the world's most sophisticated developer tool. More human hours of engineering have been poured into it than likely any other piece of software we use on a daily basis.
Windows isn't my jam, but VS is incredible.
He took a week off to focus on a problem set, and ended up spending a number of hours working around the limitations of his setup instead.
I mean, I don't want to sound judgmental. Perhaps that was part of his plan. It just stuck out to me as seeming orthogonal to his expressed goals.
I ended up writing the most basic feed-forward network in C, although I didn't stick to base libs like Carmack :(
It’s pretty cool that so much complexity can come from a few small rules or equations.
It looks a little something like this: I'll be reading a manpage and notice another manpage referenced at the bottom. So obviously I keep crawling this tree of suggestions until I bump into a utility whose purpose is unclear. Then I'll go searching online to try to figure out what kinds of problems or use cases it's meant to help with.
The question becomes: if it's not C++, then what is the first-class citizen on OpenBSD? And if it is C++, then how do we improve support? Just pointing to ports seems like a poor answer.
Being a UNIX derivative, the answer is quite easy: C.
He would have had a much better time using C99 than C++11, especially since he didn't want to install a newer gdb from ports.
I was just speaking from language point of view, without regard to specific implementations.
John has been experimenting with a lot of stuff -- Racket, Haskell, Computer Vision and now Neural Networks. I guess there is no professional intent, but the spirit of hacking lives on.
After a several year gap, I finally took another week-long programming retreat, where I could work in hermit mode, away from the normal press of work. My wife has been generously offering it to me the last few years, but I’m generally bad at taking vacations from work.
As a change of pace from my current Oculus work, I wanted to write some from-scratch-in-C++ neural network implementations, and I wanted to do it with a strictly base OpenBSD system. Someone remarked that is a pretty random pairing, but it worked out ok.
Despite not having actually used it, I have always been fond of the idea of OpenBSD — a relatively minimal and opinionated system with a cohesive vision and an emphasis on quality and craftsmanship. Linux is a lot of things, but cohesive isn’t one of them.
I’m not a Unix geek. I get around ok, but I am most comfortable developing in Visual Studio on Windows. I thought a week of full immersion work in the old school Unix style would be interesting, even if it meant working at a slower pace. It was sort of an adventure in retro computing — this was fvwm and vi. Not vim, actual BSD vi.
In the end, I didn’t really explore the system all that much, with 95% of my time in just the basic vi / make / gdb operations. I appreciated the good man pages, as I tried to do everything within the self contained system, without resorting to internet searches. Seeing references to 30+ year old things like Tektronix terminals was amusing.
I was a little surprised that the C++ support wasn’t very good. G++ didn’t support C++11, and LLVM C++ didn’t play nicely with gdb. Gdb crashed on me a lot as well, I suspect due to C++ issues. I know you can get more recent versions through ports, but I stuck with using the base system.
In hindsight, I should have just gone full retro and done everything in ANSI C. I do have plenty of days where, like many older programmers, I think “Maybe C++ isn’t as much of a net positive as we assume...”. There is still much that I like, but it isn’t a hardship for me to build small projects in plain C.
Maybe next time I do this I will try to go full emacs, another major culture that I don’t have much exposure to.
I have a decent overview understanding of most machine learning algorithms, and I have done some linear classifier and decision tree work, but for some reason I have avoided neural networks. On some level, I suspect that Deep Learning being so trendy tweaked a little bit of contrarian in me, and I still have a little bit of a reflexive bias against “throw everything at the NN and let it sort it out!”
In the spirit of my retro theme, I had printed out several of Yann LeCun’s old papers and was considering doing everything completely off line, as if I was actually in a mountain cabin somewhere, but I wound up watching a lot of the Stanford CS231N lectures on YouTube, and found them really valuable. Watching lecture videos is something that I very rarely do — it is normally hard for me to feel the time is justified, but on retreat it was great!
I don’t think I have anything particularly insightful to add about neural networks, but it was a very productive week for me, solidifying “book knowledge” into real experience.
I used a common pattern for me: get first results with hacky code, then write a brand new and clean implementation with the lessons learned, so they both exist and can be cross checked.
I initially got backprop wrong both times; comparison with numerical differentiation was critical! It is interesting that things still train even when various parts are pretty wrong — as long as the sign is right most of the time, progress is often made.
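For anyone who wants to replicate that check: compare the analytic gradient against a centered finite difference, (f(w+h) - f(w-h)) / 2h. A minimal sketch (the toy loss and all names are mine):

    #include <cmath>
    #include <cstdio>
    #include <functional>

    // Gradient check: analytic gradient vs. centered finite difference.
    double numericalGrad(const std::function<double(double)>& f, double w,
                         double h = 1e-5) {
        return (f(w + h) - f(w - h)) / (2.0 * h);
    }

    int main() {
        // Toy loss L(w) = (w*x - target)^2 with x = 2 and target = 1.
        auto loss = [](double w) { double e = w * 2.0 - 1.0; return e * e; };
        double w = 0.7;
        double analytic = 2.0 * (w * 2.0 - 1.0) * 2.0; // dL/dw by hand
        double numeric  = numericalGrad(loss, w);
        std::printf("analytic=%f numeric=%f diff=%g\n",
                    analytic, numeric, std::fabs(analytic - numeric));
        return 0;  // the two should agree to several decimal places
    }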
I was pretty happy with my multi-layer neural net code; it wound up in a form that I can just drop into future efforts. Yes, for anything serious I should use an established library, but there are a lot of times when just having a single .cpp and .h file that you wrote every line of is convenient.
My conv net code just got to the hacky but working phase, I could have used another day or two to make a clean and flexible implementation.
One thing I found interesting was that when testing on MNIST with my initial NN before adding any convolutions, I was getting significantly better results than the non-convolutional NN reported for comparison in LeCun '98 — right around 2% error on the test set with a single 100 node hidden layer, versus 3% for both wider and deeper nets back then. I attribute this to the modern best practices: ReLU, Softmax, and better initialization.
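For reference, "better initialization" here likely means something like He initialization (He et al. 2015), which pairs well with ReLU: draw each weight from a Gaussian with variance 2/fan_in, so activations neither vanish nor blow up from layer to layer. A hedged sketch; the function name and layer sizes are illustrative, not from the post:

    #include <cmath>
    #include <cstdio>
    #include <random>
    #include <vector>

    // Sketch of He initialization: Gaussian weights with variance 2 / fan_in.
    std::vector<float> heInit(int fanIn, int fanOut, std::mt19937& rng) {
        std::normal_distribution<float> dist(0.0f, std::sqrt(2.0f / fanIn));
        std::vector<float> w(static_cast<size_t>(fanIn) * fanOut);
        for (float& v : w) v = dist(rng);
        return w;
    }

    int main() {
        std::mt19937 rng(42);
        auto w = heInit(784, 100, rng);  // e.g. MNIST inputs -> 100 hidden units
        std::printf("initialized %zu weights\n", w.size());
        return 0;
    }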
This is one of the most fascinating things about NN work — it is all so simple, and the breakthrough advances are often things that can be expressed with just a few lines of code. It feels like there are some similarities with ray tracing in the graphics world, where you can implement a physically based light transport ray tracer quite quickly, and produce state of the art images if you have the data and enough runtime patience.
I got a much better gut-level understanding of overtraining / generalization / regularization by exploring a bunch of training parameters. On the last night before I had to head home, I froze the architecture and just played with hyperparameters. “Training!” is definitely worse than “Compiling!” for staying focused.
Now I get to keep my eyes open for a work opportunity to use the new skills!
I am dreading what my email and workspace are going to look like when I get into the office tomorrow.
That said, I chuckled at his note about being contrarian, as he writes a post on FB.
During the Zenimax/Oculus case:
He claimed that he never wiped his hard drive; an independent court expert found that most of his hard drive was wiped after Carmack heard about the lawsuit. So Carmack lied in his affidavit.
He claimed that no source code from Zenimax ever got transferred over to Oculus, then later admitted that the emails he had taken from his Zenimax laptop on his last day there did contain source code. He denies that the source code in the emails benefited him, and says he "rewrote" all the code anyway. But this runs counter to the testimony of Oculus programmers, who admitted they copied Zenimax code straight into the Oculus SDK.
He also has not outright denied the copying claims from David Dobkin's testimony, in which Dobkin testified about the similarity between the source code at Zenimax and the source code in Oculus's SDK. Carmack instead accuses him of doing it for money and argues that the methodology wasn't very robust. But no denial, just ad hominem attacks.
So why do people still fawn over him when it seems like his ethics are dubious?
(Granted it could be possible that the technical aspects of the case went over the heads of jurors. And I will admit to being wrong if the Oculus appeal ends up revealing more information.
But his HD was discovered to be wiped, and even Oculus programmers admitted to copying code. Why does he get a free pass on this?)
I understand why John doesn't own the fruit of his labor. I understand why John is willing to lie to, cheat, and steal from the people benefiting from the system that takes the fruit of his labor from him.
If you don't like it, build a better system that prevents it.
Just come to Seattle.
Also, it's a false equivalency, stealing IP is far more sinister than smoking weed.
But is there any point in bringing it up now? I mean, whether or not he did something illegal/unethical, does this now mean that literally every single time he writes something, someone feels the need to chime in with "but remember that bad thing he did once?".
I'm not saying this isn't a legit conversation or that you're unethical or something for bringing it up... I just really think this isn't the time or place for this.
But this seems to be a trend right now. Our tolerance for any sort of moral ambiguity or uncertainty has sunk to such low levels that we can no longer appreciate someone's work or insight without establishing that this person is flawless in every respect.
> Late one night Carmack and his friends snuck up to a nearby school where they knew there were Apple II machines. Carmack had read about how a thermite paste could be used to melt through glass, but he needed some kind of adhesive material, like Vaseline. He mixed the concoction and applied it to the window, dissolving the glass so they could pop out holes to crawl through. A fat friend, however, had more than a little trouble squeezing inside; he reached through the hole instead and opened the window to let himself in. Doing so, he triggered the silent alarm. The cops came in no time.
> The fourteen-year-old Carmack was sent for psychiatric evaluation to help determine his sentence. He came into the room with a sizable chip on his shoulder. The interview didn’t go well. Carmack was later told the contents of his evaluation: “Boy behaves like a walking brain with legs ... no empathy for other human beings.” At one point the man twiddled his pencil and asked Carmack, “If you hadn’t been caught, do you think you would have done something like this again?”
> “If I hadn’t been caught,” Carmack replied honestly, “yes, I probably would have done that again.”
> Later he ran into the psychiatrist, who told him, “You know, it’s not very smart to tell someone you’re going to go do a crime again.”
> “I said, ‘if I hadn’t been caught,’ goddamn it!” Carmack replied. He was sentenced to one year in a small juvenile detention home in town. Most of the kids were in for drugs. Carmack was in for an Apple II.
Bill Gates: Steals a bulldozer to race with his buddies at age 19(?), is responsible for Microsoft's corporate culture and history of predatory behavior. Hackernews loves him and can't get enough of Microsoft. Boy, they're a fair sight better than Google, eh lads?
I pity the person who learns ethics from Hackernews.
There is a big difference between wrapping up a source control tree and having fragments in emails.
Also, let's not pretend that remembering something and re-typing it is fine, but copying and pasting it from an email is terrible. The line is pretty blurry.
People fawn over John because he shows a great spirit of the hacker. A hacker should be judged by his/her hacking.
On the other hand, I've found that many people consider me wrong for judging a hacker by his/her hacking when he/she is accused of sexual abuse. But apparently they are okay with someone accused of IP crime.
If you judge the person as such (that entity which you're calling "the hacker"), then, though both are part of the person, the abuse is more relevant than the hacking. At least, I would guess, in most people's estimation.
Referring to that person using the noun phrase "the hacker" rather than "the abuser" doesn't make the hacking a more relevant measure of the person.
Yup! Why do you suppose that might be?
The history of this business is written by IP thieves who, once they get big, turn around and lobby Congress for tougher IP laws, to criminalize the very methods they used to get big in the first place, so that the next gang of thieves will not be able to steal so effectively.
What Carmack did is not egregiously unethical for this business. If anything he embodied the hacker spirit of "fuck the rules". It's on dubious ethical ground and certainly not the cautious approach used by Stallman, who colors assiduously within the lines so no one could ever accuse him of anything untoward. But again, look at the history of this business.
Anyway, I will probably always read or listen to anything he has to say on the subject of programming.
> Why does he get a free pass on this?
Because many people draw a distinction between something being "unethical" vs being "illegal". What exactly do you think is the problem with his ethics? How serious do you think it is?
Why should I care?
The guy is an absolute legend.
While I don't believe one could argue that FreeBSD isn't focused on quality or craftsmanship, it has always struck me as lacking consistency. FreeBSD seems focused on the surface, because the base system is bundled, like with the other BSDs. They don't, however, seem to have a consistent syntax for tools and configuration files. The worst offender, and this is just my personal opinion, is ZFS. ZFS makes no attempt to hide that it's bolted on. In terms of configuration, there's no doubt that it was lifted from Solaris.
OpenBSD has fewer features than FreeBSD or Linux, but the features that are available are clearly made by one team with a shared focus and direction.
That being said, I don't think we should put too much emphasis on Carmack's choice of operating system in this case.