[1] StackOverflow has VERY FEW tests. He says that StackOverflow doesn't use many unit tests because of their active community and heavy usage of static code.
[2] Most StackOverflow employees work remotely. This is very different from a lot of companies that are now trying to force employees back into an office.
[3] Heavy usage of Static classes and methods. His main argument is that this gives them better performance than a more standard OO approach.
[4] Caching even simple pages in order to avoid performance issues caused by garbage collection.
[5] They don't worry about making a "Square Wheel". If their developers can write something more lightweight than an already developed alternative, they do! This is very different from the normal mindset of "don't reinvent the wheel".
[6] Always using multiple monitors. I love this. I feel like my productivity is nearly halved when I am working on one tiny screen.
Overall, I was surprised at how few of the "norms" they follow. Either way, it seems like it could be a pretty cool place to work.
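Point [4] above can be sketched very simply. This is an illustrative toy (Java rather than the site's C#, and `renderPage` is a made-up stand-in for the real rendering path), showing how caching even a cheap page means repeat hits skip the per-request allocations that feed the garbage collector:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Toy page cache: render once, then serve the cached instance on repeat hits.
public class PageCache {
    static final Map<String, String> CACHE = new ConcurrentHashMap<>();

    // Stand-in for the real (allocation-heavy) rendering path.
    static String renderPage(String path) {
        return "<html>" + path + "</html>";
    }

    static String serve(String path) {
        // computeIfAbsent renders at most once per path.
        return CACHE.computeIfAbsent(path, PageCache::renderPage);
    }

    public static void main(String[] args) {
        String a = serve("/questions");
        String b = serve("/questions");
        // Same cached instance both times: no second render, no new garbage.
        System.out.println(a.equals(b) && (a == b)); // true
    }
}
```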
It's not that no one follows the norms or tests. On the Careers team we do much more automated testing because there's money and literally people's jobs at stake. We have unit tests, integration tests and UI tests that all run on every push. All the tests must succeed before a production build run is even possible.
I've seen more and more static methods and classes over the last two years or so. It's probably more about stateless design and fewer side effects, but it definitely helps garbage collection if you avoid classes at session scope or smaller. In OOP there is another pattern that helps - object pools - but it's a lot of work to get them right and it's not as efficient.
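The object-pool pattern mentioned above looks roughly like this (a minimal sketch with made-up names; real pools also need bounds, thread-safety decisions, and leak detection, which is exactly the "lot of work" part):

```java
import java.util.ArrayDeque;

// Minimal object pool: reuse instances instead of allocating per request.
public class BufferPool {
    private final ArrayDeque<StringBuilder> free = new ArrayDeque<>();

    StringBuilder acquire() {
        StringBuilder b = free.poll();          // null if pool is empty
        return (b != null) ? b : new StringBuilder();
    }

    void release(StringBuilder b) {
        b.setLength(0);   // reset state before reuse -- the error-prone part
        free.push(b);
    }

    public static void main(String[] args) {
        BufferPool pool = new BufferPool();
        StringBuilder a = pool.acquire();
        a.append("request data");
        pool.release(a);
        // The next acquire hands back the same instance instead of allocating.
        System.out.println(pool.acquire() == a); // true
    }
}
```

Forgetting the reset in `release` is the classic bug: state from one request leaks into the next, which is why pools are harder to get right than plain allocation.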
If most of your classes are small, I don't see why people have to resort to static methods.
If your methods are static, there is a temptation to use static member variables (hence becoming stateful), which will cause side effects.
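The hazard above, in a nutshell: a static method that leans on a static field quietly becomes stateful and call-order dependent (toy example, names made up):

```java
public class StaticState {
    static int total = 0;                       // hidden shared state

    // Stateful: result depends on every previous call.
    static int addStateful(int x) { total += x; return total; }

    // Stateless: result depends only on the arguments.
    static int addPure(int a, int b) { return a + b; }

    public static void main(String[] args) {
        System.out.println(addPure(2, 3));      // 5, always
        System.out.println(addStateful(2));     // 2 on the first call...
        System.out.println(addStateful(2));     // ...but 4 on the second
    }
}
```

Static methods like `addPure` are the benign kind; the trouble starts only once a `total`-style field sneaks in.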
Don't forget the following points too:
1) You still have pooled objects somewhere (stateless business-logic classes like XYZServices, repository classes that may be backed by pooled DB connections and Transaction Managers), provided/managed by your Application Server or by a 3rd-party framework (Spring does this).
2) Your Application Server tends to have beefy hardware, good enough that you don't need to worry about GC hiccups.
There are other reasons to use static methods but I don't think they're strong enough in this case.
Small classes are actually worse GC-wise, because they fill up the GC graph with many small nodes as opposed to fewer large nodes, which are released in bulk with little fragmentation. Small and large nodes have essentially the same GC overhead. In general you want your objects to be large. When they are small, once the GC realizes what's going on (usually at some high threshold, 90% or so), it has to run some O(n^x) graph-reduction or defragmentation algorithm. Special tuning is required for such cases. Beefy hardware doesn't help in many cases due to locks; there are very few production-ready lock-free GCs.
Object lifetime is a much more important factor than class size for most server request / response style processing.
Typically there are three lifetimes for objects in server processes. Those that are allocated around startup and are never deallocated; those that are allocated per-request and become garbage once the response goes out; and lifetimes that span multiple requests, like objects in caches.
The first are normally ultra-cheap to "collect": with a generational GC, you simply don't scan them at all, because they haven't changed.
The second group, per-request, are also fairly cheap to collect. Every so often, you GC the youngest generation, and you only need to keep track of references in registers and on the stack. Ideally many requests will have occurred between collections, and the only objects that get kept alive are objects that are in-flight for the current request. And this is why you need at least three generations; you really don't want to have to scan the oldest generation to collect these ephemeral objects after they've built up over a number of youngest generation collections.
It's the third group that kills you. You can save on the cost of scanning the whole heap, using write barriers to track new roots buried in the oldest generation; but that adds accounting costs, and eventually overtakes the cost of a whole heap GC. These guys can also cause the fragmentation you're worried about - they need to be compacted down, copied possibly multiple times. On the CLR, last time I checked, you need a full gen2 GC in order to get rid of them, as they've likely survived a gen1 collection.
With these guys, it's worthwhile doing the big object thing. In fact, it may be worthwhile not having any GC heap storage for them at all, and refer to them using different techniques, like ephemeral keys that look up in Redis, or native pointers stored in statically allocated arrays.
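The "native pointers stored in statically allocated arrays" idea can be sketched like this (Java for illustration, all names made up): pack long-lived records into primitive arrays so they contribute nothing to the GC graph, and hand out integer keys instead of object references:

```java
// Long-lived records kept out of the GC graph: two primitive arrays hold
// the data, and callers get back an int "key" instead of an object reference.
public class OffHeapish {
    static final int FIELDS = 2;                     // e.g. userId, score
    static final long[] SLOTS = new long[1024 * FIELDS];
    static int next = 0;

    static int store(long userId, long score) {
        int key = next++;
        SLOTS[key * FIELDS]     = userId;
        SLOTS[key * FIELDS + 1] = score;
        return key;          // an ephemeral key, not a pointer the GC must trace
    }

    static long scoreOf(int key) { return SLOTS[key * FIELDS + 1]; }

    public static void main(String[] args) {
        int k = store(42L, 9001L);
        System.out.println(scoreOf(k)); // 9001
    }
}
```

To the collector this is one big, never-moving array allocated at startup, so the cache contributes no promoted objects and no gen2 fragmentation.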
In app servers I've designed, I've never seen GC CPU usage over 5% or so, even with heavy usage of tiny short-lived objects. But you need to care about lifetime.
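The three lifetime groups above can be sketched in one toy server (Java here, since the same generational reasoning applies on the JVM; all names are illustrative):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// The three object lifetimes in a typical server process.
public class Lifetimes {
    // Group 1: allocated at startup, never freed. A generational GC
    // essentially stops scanning these.
    static final Map<String, String> CONFIG = new ConcurrentHashMap<>();

    // Group 3: spans multiple requests. This is the expensive group:
    // survivors get promoted and only die in a full (gen2) collection.
    static final Map<String, String> CACHE = new ConcurrentHashMap<>();

    // Group 2: per-request garbage. Everything allocated in here dies
    // young and is cheap to reclaim in a nursery collection.
    static String handleRequest(String key) {
        StringBuilder response = new StringBuilder();                 // ephemeral
        String value = CACHE.computeIfAbsent(key, k -> k.toUpperCase()); // promoted
        return response.append("value=").append(value).toString();
    }

    public static void main(String[] args) {
        System.out.println(handleRequest("hello")); // value=HELLO
    }
}
```

The point of the taxonomy: groups 1 and 2 are nearly free under a generational collector, so the design effort goes into keeping group 3 (the `CACHE`) small, chunky, or out of the GC heap entirely.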
We're talking in the context of stateless Request <-> Response web applications here.
When a Request comes in, the App-Server will allocate (or take from the pool) a thread to serve that Request (in the .NET/JVM world; Ruby/Python use processes unless you use a different App Server).
If you create small objects within the scope of that Request (which usually live inside a method), and those objects are self-contained and don't hold references to any long-lived objects, they will be GC-ed quickly (and potentially much more quickly) once the method finishes.
The thread is reclaimed as well once it's finished (unless you wish to release it back to the 'unused' pool).
My feeling is that their use of static methods has nothing to do with GC at all.
The Stack Overflow Q&A dev team has 2 people in New York, out of a team of 10. The Careers dev team is more New York-heavy: 3 remote and 5 in New York. The sysadmin team is also quite remote, though I don't know the breakdown offhand.
I believe at this point most new technical hires are remote.
Our offices are mostly sales, Denver and London exclusively so.
I saw that Jason went remote recently. Any particular reason so many devs are going remote? Is it people making individual decisions or the company providing new incentives to do so? My impression when you were at 55 was that most devs worked at the office (I've been at Fog Creek since a little before you guys moved. Hi!).
People making individual decisions. All else equal, we'd slightly prefer to have people in NYC, because we think the in-person time is a plus for the casual interaction that happens in between "getting things done". But we've set ourselves up to make real work and official team collaboration happen almost entirely online. We've learned that the in-person benefit is more than outweighed by how much you get from being able to hire the best talent that loves the product anywhere, not just the people willing to live in the city you happen to be in.
The most common reason for someone going remote (that I'm aware of) is starting a family. New York's great, but spacious it is not.
I can think of 3 devs who have gone remote, and 2 devs (including myself) who have moved to NYC since I've been here. Most people stay wherever they were hired. The only location-specific policy I'm aware of is a cost-of-living adjustment in NYC (though that may also apply to London/SF/etc., I don't honestly know).
The most important thing, technically, is having great developers who ship.
For pith's sake, I want to say "Everything else is noise," but that isn't true. Everything else can help or hurt, depending on the application, how doctrinaire the application of a given approach/methodology is, the organizational knock-on effects (e.g. "Mr Tough Guy Testalot" holds up the release train or nukes your architecture to make it 'testable'), etc. But, seriously, "great developers who ship" is really what moves the needle.
Having a great Ops staff also helps ;) Of note: Thomas Limoncelli, who wrote "The Practice of System and Network Administration" [1] and "Time Management for System Administrators" [2], works for Stack Exchange (formerly at Google). The Practice of System and Network Administration is basically the bible for most sysadmins, myself included.
ps. I singled Thomas Limoncelli out just to highlight the caliber of their Ops staff.
Every feature/user story has to go through a workflow: selected for development -> UX design (if required) -> development -> unit tests (or the other way around) -> staging -> load test -> acceptance test -> production -> analytics (to see if people actually use it) -> learn from the analytics -> back to the start if required.
The goal is to get as many issues through the workflow as fast and as rigorously (no shortcuts) as possible, at a sustainable pace: a continuous flow of features rolling out through this process, ideally with continuous delivery to automate the majority of it.
Everything else is noise. If you have great developers who ship, then by definition you don't have doctrinaire methodology or "Mr Tough Guy Testalot" (I generally find "Mr No Test" to be a much bigger problem anyway). You might hit the situation where you have great devs but bad management, but that's next to impossible in the real world.
There's really only two steps to great software development.
He mentioned that they use the servicestack.text library. I've looked into ServiceStack recently (using the NuGet packages), but found the library to be pay-to-play. There's an older version (v3) that is BSD licensed and is still being maintained. Do any of you have experience with it? I have grown tired of Microsoft pushing new solutions to the same problem (REST services with WCF, and then ASP.NET Web API).
Technically we use Newtonsoft and Jil, with Jil replacing Newtonsoft as we become increasingly confident in it.
I wouldn't suggest anyone use Jil in a production role unless you're at Stack Overflow. It's too untested at the moment, and the typical person can't get me on the horn to fix whatever just broke.
You wouldn't right now (Kevin doesn't recommend it). But in the end you'll want to use it if JSON serialization is a performance bottleneck for you.
ServiceStack is just plain awesome when it comes to developing web services, though it's gone commercial for v4 onward. Nancy is another popular alternative - it's basically Sinatra for .Net. Every time I go back to WCF I want to stab myself in the face.
It sounds like they are all SQL Server instances. However, he made it seem like they are reproducing the schema once per site? I.e., a separate database per site rather than sharding the shared data to multiple hosts per site. Did I hear this right in the question/answer portion?
Stack Exchange has one database per site, so Stack Overflow gets one, Super User gets one, Server Fault gets one, and so on. The schema for these is the same.
There are a few wrinkles. There is one "network wide" database which has things like login credentials and aggregated data (mostly exposed through stackexchange.com user profiles, or APIs). Careers, stackexchange.com, and Area 51 all have their own unique database schemas.
All the schema changes are applied to all site databases at the same time. They need to be backwards compatible, so, for example, if you need to rename a column - a worst-case scenario - it's a multi-step process: add a new column, add code which works with both columns, backfill the new column, change the code so it works with the new column only, remove the old column.
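The backwards-compatible rename described above can be walked through step by step. This sketch models the "table" as an in-memory map purely for illustration (a real rollout is SQL DDL with a deploy between each step); the column names are made up:

```java
import java.util.HashMap;
import java.util.Map;

// Walking through a backwards-compatible column rename:
// "DisplayName" -> "Name", one deployable step at a time.
public class RenameMigration {
    static Map<Integer, Map<String, Object>> table = new HashMap<>();

    public static void main(String[] args) {
        table.put(1, new HashMap<>(Map.of("DisplayName", "alice")));

        // Step 1: add the new column (nullable, so old code keeps working).
        table.values().forEach(row -> row.putIfAbsent("Name", null));

        // Step 2: new code writes both columns.
        Map<String, Object> row = table.get(1);
        row.put("DisplayName", "alice2");
        row.put("Name", "alice2");

        // Step 3: backfill rows the dual-writing code hasn't touched yet.
        table.values().forEach(r -> {
            if (r.get("Name") == null) r.put("Name", r.get("DisplayName"));
        });

        // Step 4: deploy code that reads/writes only "Name".
        // Step 5: drop the old column.
        table.values().forEach(r -> r.remove("DisplayName"));

        System.out.println(table.get(1).get("Name")); // alice2
    }
}
```

At every intermediate step both the old and the new application code can run against the same schema, which is what lets all site databases be migrated at the same time.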
tl;dw: He says he doesn't advocate it, but they get away with it by having the community test it out for them on their meta site. Then the community writes up the bugs.
User community as testers presents some interesting pros and cons.
Pros:
* Tests are self-updating. Add a new feature: tests come in for free. Change a feature: tests automatically update. Fail to document a change: tests fail.
* Tests are unusually thorough
* Eventually consistent testing. If nobody ever complains, it probably wasn't a bug worth fixing.
Cons:
* Tests cannot be run offline. Feature must be committed and deployed before tests can be run.
* Potentially large quantity of false positives (bad bug reports)
* Potentially large quantity of false negatives (nobody notices particular bug, release considered good)
* Does not work for non-user-visible features
So basically you trade the reliability of your tests for a substantial build/release speedup. Some users experience each bug, but they are the users who are actively using the meta-community and have signed up to experience more bugs. Still, lack of pre-release unit testing must radically increase the importance of VERY careful code reviews.
Not the decision I would have made, but it definitely has the sorts of advantages that a small team of engineers would drool over.
Remember that our community writes bug reports but also vets bug reports. We rarely have to deal with bad reports. Interestingly, large quantities of false negatives are a non-issue.
Presumably the same reason why they don't have a ton of bad questions on stack overflow: their community scoring would apply just as much to bug reports
To be clear: there is no version of the actual Stack Overflow code that is publicly available. There are, however, numerous open-source reimplementations of portions of the site code.
I wouldn't trust Joel Spolsky's code expertise -- just look at Excel internals!
Nevertheless, Stack Overflow is super cool. But that tells nothing about its architectural quality.