
The Surprising Path to a Faster NYTimes.com - danso
https://speakerdeck.com/nytdevs/the-surprising-path-to-a-faster-nytimes-dot-com
======
saturdaysaint
Improvements like that at Nytimes.com have had me happily paying for my daily
news for the first time. The user experience is just orders of magnitude
better than anything I've found that's available for free. Almost reminds me
of when Google maps came out and completely changed expectations for a map
site. The iOS app also impressed me: it's great at caching stories, so even in
crappy network areas it feels like a "broadband" newsreading experience (kind
of amazing how long it took for a news app to accomplish this).

~~~
Curmudgel
I find the user experience on the new site to be much worse than the old site.
When I'm reading an article I really don't care about navigating to the rest
of the site. I have a small widescreen laptop, which means that the fixed
headers at the top give me less room to read the article. I don't need gobs of
white space to read the news article in print, so why is white space so
important on the web that navigation has to be hidden in one of those awful
hamburger buttons? The text is harder to read because it has lower contrast.
The new comment system is totally unusable. I don't really appreciate large,
useless images interrupting the flow of the article. And I don't need
Javascript to read a newspaper article (the site loads like a pig with
Javascript) because I'm going to be navigating to a different page when I read
another article. It's faster to just open the home page and open each article
in a new tab (with JS disabled).

------
nanoscopic
The slides say there are 1 million pages, and "republishing" them would take
90 days. Maths -> 7.8 seconds on average to "republish" a page. Modern
template systems can convert a page constructed out of structured data in
less than 1/4 of a second (and that is a high estimate). That is ~30 times
faster, meaning all pages could be "republished" in 3 days instead of 90 had a
more efficient system been used from the start.
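
For reference, the arithmetic as a quick script (the 90-day and 1/4-second
figures are the assumptions above, not measurements):

    // back-of-envelope check of the numbers above
    const pages = 1_000_000;
    const days = 90;
    const secondsPerPage = (days * 24 * 60 * 60) / pages;     // ~7.8 s per page
    const templatedSecondsPerPage = 0.25;                     // assumed upper bound
    const speedup = secondsPerPage / templatedSecondsPerPage; // ~31x
    const daysWithTemplates = days / speedup;                 // ~2.9 days
    console.log({ secondsPerPage, speedup, daysWithTemplates });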

Focusing on shifting all "rendering" into front-end JS seems like it will lead
to more difficulty in the long run than a more efficient structured
page-creation mechanism would.

I am curious how the static pages were created. Others here are speculating
that templating was not done. If not, what does "republishing" mean exactly?

~~~
bonaldi
Their source material here is 1 million pages of HTML; they don't have (on my
reading) some separate source of "structured data" for the modern template
system to use.

A 90-day estimate seems reasonable (and possibly low) for extracting the
content from the many versions of static pages, structuring it, and then
publishing it in a more modern fashion.

It's all very well to say they should have used a more efficient system from
the start, but "the start" in this case is 1996, which is the wild west in
terms of best practices.
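
To give a sense of what that extraction step might look like, a minimal sketch
(the selectors and field names are invented; the real markup presumably varies
a lot across eras of the site):

    // hypothetical sketch of turning one legacy HTML page into structured data;
    // selectors are made up for illustration
    import { readFileSync } from "node:fs";
    import * as cheerio from "cheerio";

    interface Article {
      headline: string;
      byline: string;
      body: string[];
    }

    function extract(path: string): Article {
      const $ = cheerio.load(readFileSync(path, "utf8"));
      return {
        headline: $("h1").first().text().trim(),
        byline: $(".byline").first().text().trim(),
        body: $("p.story-text")
          .map((_, el) => $(el).text().trim())
          .get(),
      };
    }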

~~~
thezilch
In 1996 (or as late as the early 2000s), a lot of publishing was done via tools
like Dreamweaver, where the publisher would check out / lock a file, make static
changes, and save the file directly on prod. Like you said, it's not
surprising at all they have a large portion of their articles in static HTML
files.

------
hyperpape
The point about elements on the page shifting around is huge. I've personally
dreamed of a browser change that would make reflows that are out of the
current browser viewport not change your position on the page. If you have a
slow connection or view certain types of content (liveblogs, etc) this can
become a huge pain.

Just thinking about it raises all sorts of questions about whether the
browser/rendering engine can actually reliably know that information, but it
doesn't mean I can't dream.

~~~
malyk
If you have fixed ad sizes and known locations you could make those spaces
empty boxes of the correct size and then asynchronously fill them with data. I
haven't tried it, but it seems like it should work.
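
Something like this, roughly (slot sizes and the ad endpoint are made up;
again, untested):

    // reserve fixed-size slots up front so later ad loads don't reflow the page;
    // slot IDs, sizes, and the /ads/ endpoint are hypothetical
    const AD_SLOTS: Record<string, { width: number; height: number }> = {
      "ad-top": { width: 728, height: 90 },
      "ad-rail": { width: 300, height: 250 },
    };

    async function fetchAdMarkup(slotId: string): Promise<string> {
      const res = await fetch(`/ads/${slotId}`); // stand-in for the real ad call
      return res.text();
    }

    for (const [slotId, size] of Object.entries(AD_SLOTS)) {
      const el = document.getElementById(slotId);
      if (!el) continue;
      el.style.width = `${size.width}px`;   // space is reserved immediately,
      el.style.height = `${size.height}px`; // so layout stays stable
      fetchAdMarkup(slotId).then((html) => { el.innerHTML = html; });
    }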

------
stdbrouw
I can imagine static pages getting really annoying at this scale, and it also
seems like a no-brainer to have your content in a database... but the nerd in
me did think "page rendering can be trivially parallelized – why not throw
some map/reduce at it?"
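
Roughly what I have in mind, as a toy sketch (listPageIds and renderPage are
stand-ins, obviously not their actual pipeline):

    // "map" step: split the archive into chunks and re-render them concurrently
    async function listPageIds(): Promise<string[]> {
      return Array.from({ length: 10_000 }, (_, i) => `article-${i}`);
    }

    async function renderPage(id: string): Promise<void> {
      // stand-in: load structured article data, run it through a template,
      // write the resulting HTML to disk or a CDN origin
    }

    async function republishAll(workers = 16): Promise<void> {
      const ids = await listPageIds();
      const chunkSize = Math.ceil(ids.length / workers);
      const chunks = Array.from({ length: workers }, (_, i) =>
        ids.slice(i * chunkSize, (i + 1) * chunkSize)
      );
      await Promise.all(
        chunks.map(async (chunk) => {
          for (const id of chunk) await renderPage(id);
        })
      );
    }

    republishAll().then(() => console.log("republished"));

In practice you'd fan the chunks out across machines rather than promises in
one process, but the shape is the same.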

------
jamessantiago
I've been really impressed with the quality of new york times posts as of
late. The post "Norway the Slow Way" posted here a few days back was
impressive in its use of a variety of frontend display techniques to tell a
single story. Even their web console output had some neat ascii art and a
hiring call to interested developers.

~~~
gizzlon
http://www.nytimes.com/interactive/2014/09/19/travel/reif-larsen-norway.html

------
eitanmk
Hello Hacker News.

I'd first like to say that this is the deck from a presentation at Velocity NY
last week. As with most talks, separating the slides from the presenter can
make interpreting the context difficult. I did make an effort to have
my slides provide useful information without me presenting them, but I
acknowledge that I may not have done enough in that regard. I also received
feedback from people present that there were too many bullet points and my
font was too small. Can't please everyone, I guess. But if you have a link to
what you consider the "perfect" slide deck where unambiguous context is
maintained without video of the talk, I'd love to study it in order to
improve.

Other replies will be directed at the specific comment thread.

~~~
x110dc
Is there a video of your presentation? I'm interested in watching it.

------
DanielBMarkham
Static pages are a barrier to scaling _if you have a bunch of other stuff tied
in with them_: stuff like HTML macros, CSS, and so forth.

My physical NYT copy from 1980 is fine. It was "published" this way, and it
stays this way.

What we're really saying is that if you want to go the static route, you can't
go half-way: everything that _is_ the page gets deployed in one file. I doubt
very many people who think they have static pages actually do.
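
As a trivial illustration of going all the way (file names are made up):

    // freeze a page: inline the external stylesheet so the archived HTML no
    // longer depends on assets that can change underneath it
    import { readFileSync, writeFileSync } from "node:fs";

    const css = readFileSync("site.css", "utf8");
    const html = readFileSync("article.html", "utf8");

    const frozen = html.replace(
      /<link[^>]*href="site\.css"[^>]*>/,
      `<style>${css}</style>`
    );

    writeFileSync("article.frozen.html", frozen);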

------
andreasvc
What is WPO?

~~~
hawtshot
Web Performance Optimization

------
vkb
What really strikes me here, aside from the technical aspects, is the note on
p. 21 about how the project was supported from the top because SEO was lagging
as a result of site load time, and this line especially: "NYT became an
e-commerce site since the last redesign."

Once you are focusing on e-commerce and SEO as an executive team, are you
still committed to journalism?

~~~
acdha
If you're selling subscriptions, wouldn't that leave you more committed to the
journalistic quality readers want rather than letting advertisers dominate
that discussion?

~~~
vkb
Yes, if you're selling subscriptions. SEO is not meant to maximize
subscription revenue, but to get people to click for free, aka ads. Which will
it be for them?

Two recent pieces are leading me to believe that NYT is floundering around,
without a real cohesive online strategy, still:

[1] http://www.cjr.org/the_audit/the_new_york_timess_digital_li.php?page=all

[2] http://www.cjr.org/the_audit/the_new_york_times_cant_abando.php?page=all

------
akgerber
Perhaps off-topic, but recently it's appeared that NYT pages have had some
sort of JS memory leak when left open for a long time in Chrome.

------
ck2
They were keeping a million static pages on disk without any templating?

Whoa.

------
untilHellbanned
What was so surprising about the path? Wasn't clear from the deck.

~~~
sp332
Really? Each surprise gets a whole slide in bold text all to itself. #1. A lot
of static pages are a barrier to optimization. #2. Performance increase
demanded as part of redesign. #3. Sometimes you have to slow down to seem
faster.

#1 really did surprise me, because I had always assumed that serving static
pages would be really fast. I guess I never thought about sites with millions
of pages.

~~~
nsfmc
for #1, it _could_ be fast, but the problem they're running up against is that
the data appears to be held up by a single bottleneck, similar to pre-sharded
database setups. in this case, their filesystemdb is hitting the limits of too
many connections saturating much of their disk's i/o.

a possible way of approaching that problem is a divide and conquer approach
with a reverse proxy that assigned manageable chunks of their content across
numerous machines each serving less than millions of pages. the nyt already
has a /<yyyy>/<mm>/<dd>/<section>/<subsection>/<slug> url scheme which would
make this less painful.

I'm not sure how inefficient this would be, though; it's certainly a time
investment, but it ends up offloading your disk i/o issues by creating more
and more s3 buckets (or what have you) and routing via a proxy. i'd be curious
to see when s3+cloudfront-as-host becomes too slow simply because of disk i/o
limitations, although s3 almost certainly has its own abstraction above the
bucket i'm not aware of which mitigates that.
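
something like this, as a sketch (shard boundaries and hostnames are invented,
and a real proxy would stream the upstream response rather than redirect):

    // toy router: use the date already in the url to pick an archive shard,
    // so no single filesystem serves all million pages
    import { createServer } from "node:http";

    const SHARDS = [
      { through: 2003, origin: "http://archive-a.internal" },
      { through: 2009, origin: "http://archive-b.internal" },
      { through: Infinity, origin: "http://archive-c.internal" },
    ];

    function shardFor(path: string): string {
      // paths look like /<yyyy>/<mm>/<dd>/<section>/<subsection>/<slug>
      const year = Number(path.split("/")[1]) || 9999;
      return SHARDS.find((s) => year <= s.through)!.origin;
    }

    createServer((req, res) => {
      const upstream = shardFor(req.url ?? "/");
      res.writeHead(302, { Location: upstream + (req.url ?? "/") });
      res.end();
    }).listen(8080);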

it still doesn't address the serious and complex frontend issues they were
facing, which seemed _much_ more onerous to be honest. their server rendering
seems to be pretty lean, though; it looks like dom processing and client
rendering make up easily 80% of their 3.6s pageload time.

