NYTimes Opensources Their Deep Linking JS (open.blogs.nytimes.com)
169 points by joeybaker on Jan 11, 2011 | 50 comments

I'm surprised to find this posted here. I developed this and given the community here I'd appreciate any feedback for future iterations...

Many thanks! This is very nifty.

Since you asked…

1) Change tracking. Technical hurdles (and they're huge) aside, it would be great not to have these links break when text is added/removed.

2) Better onsite instructions. I knew how it was supposed to work, and it took me a bit to figure out that clicking a sentence highlighted it. Perhaps a tooltip?

3) A WordPress plugin seems easy/obvious :)

Many, many thanks for the good work!

I would add a convenient way for this to interact with already existing hashed links that are used for navigation.

Rather than the #p[...],h[...] how about #p=...&h=...?

EDIT: Or even #deeplink=p[...],h[...]
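For readers curious how a fragment in the bracketed shape might be pulled apart, here is a rough sketch. The grammar is assumed from the examples in this thread, not taken from the library itself:

```javascript
// Parse a "#p[...],h[...]"-style fragment into keyed lists of values.
// The single-letter keys and comma-separated values are assumptions
// based on the example URLs quoted in this thread.
function parseDeepLink(hash) {
  var result = {};
  var re = /([ph])\[([^\]]*)\]/g; // each letter-keyed bracket group
  var m;
  while ((m = re.exec(hash)) !== null) {
    result[m[1]] = m[2].split(','); // e.g. "TArTWw,1" -> ["TArTWw", "1"]
  }
  return result;
}
```

The same loop would work for a `#deeplink=p[...],h[...]` prefix, since the regex only looks for the bracket groups.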

It's a fair point, and there were a number of ways it could have been done.

Given that there could be a large number of parameters included for highlighting, I felt the [] gave a sense of belonging.

I like the #deeplink suggestion too, but in the end being as concise as possible was a factor.

I think the deeplink suggestion is good. If I stumbled across these urls, I might just think it was some evil tracking code or something and strip it off, deeplink is meaningful and gives me a clue that it does something clever. Great work btw.

Nice library. It looks like a great candidate for one of our upcoming projects.

Awesome, they just invented a small part of Ted Nelson's Xanadu project. Say what you will about Xanadu being vapour-ware but at least Mr Nelson designed it properly to handle problems like deep-linking and track-backs which the Web has to employ workarounds for.

Speaking of NYTimes, Safari user stylesheet has a line to disable the annoying word definition popup when selecting text.

.nytd_selection_button { display:none; }

That's why NYT is never getting on my NoScript whitelist. I have a nervous habit of highlighting random text while I read.

Me too, and I would like to know if anyone has ever studied the trend so I can better develop my application for these users.

Wikipedia claims it does not exist:


On Github, they say they'll eventually remove the dependency on PrototypeJS. The library is only ~10k now, hopefully that change won't increase the size too much.

You know what I would love to have the HTML5 guys add to the spec? Some way of keeping libraries like PrototypeJS and jQuery in the browser cache at all times, so that pages could just use them without worrying about the size.

Perhaps some kind of alternate src attribute on script tags, so you could list a local copy (for reliability) as well as Google and Microsoft CDN URLs, and the browser would go with the first one it had in its cache. And if individual browsers wanted to distribute jQuery (et al.) along with the browser itself, and define an URN for it, so much the better.

This is just something I thought up in the past few minutes, so take it with a grain of salt, but as a web developer, I would love this.

Wouldn't you be able to do this with the cache manifest? http://diveintohtml5.org/offline.html

Why isn't it sufficient to use the Google-hosted jQuery? It's likely to be cached.

1. a boatload of page view info you are sending off to google

2. it's another dependency that you don't control

To your first point, is this a critique of the speed (pushing info up to Google) or to Google sucking in yet more information? If it's the latter I'm already in trouble because almost everything I do uses Google Analytics, but I can see the point if you're doing something else.

WRT your second point, there's a middle ground of pointing to Google's hosted version for speed and falling back to a local copy if it is not found.

You can see the technique in use in the HTML5 Boilerplate template - http://html5boilerplate.com/ - scroll down to the index.html file, line 58.
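The middle-ground pattern described above is usually a one-liner after the CDN script tag: if the CDN load failed, `window.jQuery` is undefined, and a local copy is written in its place. A sketch (the local path is hypothetical):

```html
<!-- Try Google's CDN first; if window.jQuery is still undefined afterwards,
     the CDN load failed, so fall back to a local copy (path is made up). -->
<script src="http://ajax.googleapis.com/ajax/libs/jquery/1.4.4/jquery.min.js"></script>
<script>window.jQuery || document.write('<script src="/js/jquery-1.4.4.min.js"><\/script>')</script>
```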

1) They set a far-future Expires header, so they should see very, very little of your page view info.

In HTML5 you can do this by keeping a script in LocalStorage, then executing it on-demand. It could also have a fallback mechanism for network loading (and lazy initialization).

The implementation of this is left as an exercise for the reader.
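Taking up that exercise, a minimal sketch of the idea. The storage and network access are passed in as parameters so the flow is visible outside a browser; in practice `storage` would be `window.localStorage`, `fetchSource` an XHR, and `execute` something like `eval` or injecting a script element. All names here are made up:

```javascript
// Cache a library's source in storage; fall back to the network on a miss.
function loadLibrary(name, url, storage, fetchSource, execute) {
  var src = storage.getItem(name);
  if (src === null) {           // cache miss: load lazily from the network
    src = fetchSource(url);
    storage.setItem(name, src); // populate the cache for the next page view
  }
  execute(src);                 // run the (cached or fresh) source
}
```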

The only reason I didn't rip out PrototypeJS was the CSS selectors and the add/remove classname helpers. That's what remains, and hopefully not for much longer.

It looks like they are only using a small portion of it -- selectors in init and a few event handlers. They could use one of the many pre-existing selector engines and write a simple cross-browser addEvent function.

That is the plan. The CSS selectors are only required (for now) as NYT has some specific criteria to cover the markup in various Article and Blog Post pages.

For most cases, I imagine 'querySelectorAll' or 'getElementsByTagName' would suffice.

Grats donohoe, well done. What is of particular interest to me here is the use of the Levenshtein distance algorithm. The reason this works well here is because you are comparing your supplied key against a constrained set. Applying the Levenshtein distance algorithm (or its variants) against a constrained set of small size in this fashion has virtually no performance impact as the time to complete is entirely based on the size of the set you are matching against. On the other hand, matching against a set of millions of records does get costly.
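For reference, here is the standard two-row dynamic-programming version of the edit-distance algorithm mentioned above (not the library's actual code):

```javascript
// Classic Levenshtein distance: minimum number of single-character
// insertions, deletions, and substitutions to turn string a into string b.
// Uses two rolling rows, so memory is O(|b|) rather than O(|a|*|b|).
function levenshtein(a, b) {
  var prev = [], curr = [], i, j;
  for (j = 0; j <= b.length; j++) prev[j] = j; // distance from "" to b[0..j]
  for (i = 1; i <= a.length; i++) {
    curr[0] = i; // distance from a[0..i] to ""
    for (j = 1; j <= b.length; j++) {
      var cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1,        // deletion
                         curr[j - 1] + 1,    // insertion
                         prev[j - 1] + cost); // substitution (or match)
    }
    var tmp = prev; prev = curr; curr = tmp; // roll the rows
  }
  return prev[b.length];
}
```

Each candidate paragraph key is compared against the supplied key this way, and the closest one wins, which is exactly why the small, constrained set keeps it cheap.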

I think v2 is trying to solve a problem it shouldn't have to solve. If the NYT made available previous (published!) revisions of their content, there would be no need to assign paragraphs with special IDs or the like. You'd simply say, "get me p2 for the story as it was on Jan 11 15:33". When you link to a specific piece of content, it's in the hopes that people will read or see the same thing you saw when you made the link. You don't want to be talking about different things, so revision-awareness would actually make more sense overall.


But is that going to happen? No. Why? Many reasons come from an editorial perspective (which I'm not knowledgeable enough to get into - but a simple one is: corrections), however the big one is also technical:

Providing previous revisions is not a trivial feature. It would be a huge effort. There is no way to justify it from the perspective of deep-linking.

Except that would be another hack for the Web. As I said in another comment here, Ted Nelson's Xanadu project already figured this stuff out, we just have to employ some hacks on the Web to make it work ;/

Pretty good. Did they do an analysis of the NYT archives to ensure that their First Three Words Last Three Words technique is "good enough"?

Yes. Also the 12+ page long articles you'd see in the Magazine section from time to time proved very helpful in this.
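The "first three words, last three words" idea can be sketched roughly like this. The function name and the exact reduction to a key are made up for illustration; the real library's key derivation differs in detail:

```javascript
// Derive a short, position-independent key for a paragraph from the
// first letters of its first three and last three words (hypothetical
// reduction; the shipped library's key format is not reproduced here).
function paragraphKey(text) {
  var words = text.trim().split(/\s+/);
  var head = words.slice(0, 3);
  var tail = words.length > 3 ? words.slice(-3) : [];
  return head.concat(tail)
    .map(function (w) { return w.charAt(0); }) // first letter of each word
    .join('');
}
```

Because the key is derived from the paragraph's own text rather than its position, the paragraph can move around the page, and light edits in the middle leave the key intact.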

I played around with this and got it so you can both highlight and have it move the view to the paragraph of choice. Seems pretty interesting.

One issue, though: when you send the link to someone and they click it, browsers (Chrome here) turn the #h[TArTWw,1] into #h%5BTArTWw,1%5D, which then seems to be ignored by the script.

I noticed that just now too. I'll investigate and see if I can get a fix in the next update.

Fixed it for now.
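One way a script might guard against that encoding, sketched here (not necessarily the actual fix that went in):

```javascript
// Some browsers percent-encode the square brackets when a link is followed,
// turning "#h[TArTWw,1]" into "#h%5BTArTWw,1%5D". Decode the fragment
// before parsing it, leaving malformed escape sequences untouched.
function normalizeFragment(hash) {
  try {
    return decodeURIComponent(hash);
  } catch (e) {
    return hash; // a bad "%" sequence would throw URIError; keep as-is
  }
}
```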

I made something similar to this NYTimes.com feature, a few days ago: https://github.com/alexbarredo/insidelink

It's simpler, of course. It comes in plain JS and as a jQuery plugin.

It looks like your version uses a numeric value as determined by the order of the P tag. This is what I had before.

The second version generates a Key for the paragraph so it can be moved anywhere in the page, and survive slight modifications if the text changes too.

Nice work though.

Enterprise JS is 400 lines of code... 0 lines of tests. Kudos, NYT.

I'm suspicious of anyone who spends their time critiquing others' work so harshly instead of innovating on their own.


Hardly. The attitude you're demonstrating is the real anathema. You're just confusing people recoiling from your abrasive disposition with disagreement over the importance of tests (as others have done in this thread).

> Enterprise JS is 400 lines of code... 0 lines of tests. Kudos, NYT.

See how your comment carries across a real difficulty working with other people? This would have said the same thing, albeit without the slam to the author:

> This project has no tests. Maybe I'll fork it and add some, in order to make it more robust.

Another example from your history:

> adding position:relative without knowing what it does... great advice

Someone made a good point here which is actually grounded in reality, and you responded with a smartass remark which might have discouraged him from contributing in the future. One of the guidelines for Hacker News is that you shouldn't write a comment that you wouldn't say to someone's face. If you go around quipping like that to peoples' faces, I pity your acquaintances.

You could have worded it this way:

> That isn't what position: relative is meant for. There is another way to accomplish that: <blah>

Just be positive to your fellow human being. It's not fucking difficult. That's why you're getting downvoted.

> I'm suspicious of anyone who spends their time critiquing others' work so harshly instead of innovating on their own.

How in any way is that 'positive'? Also, read the OP's original post. Is it really that offensive to you? I can almost see it as some poke of a joke--humor. Something that seems to escape the boring minds of those who long to be someone special, DEFENDING THE INTERNET! Stop being a superhero, read the comment and move on. There is no reason for you to regulate anyone's internet experience.

I'm not trying to take sides, but from an unbiased point of view, your argument is invalid.

Who said anything about the Internet?

I'm suspicious of anyone who obviously doesn't think innovation is innovation without making sure your innovation actually works (with tests).

Since you and goldenthunder completely missed this, allow me to clarify that OP could have been critiquing the project's choice of variable names and I would have said the exact same thing.

My statement wasn't about tests in the slightest.

You clearly don't understand the importance of TDD like many others (count the downvotes).

If you cannot stay with the curve because of your lack of knowledge, maybe you should spend less time flaming on Hackernews and more time studying up, sir.

I'm not seeing how my statement has anything to do with TDD. OP was very harsh in his critique, whether it be valid or not (which I never said anything about), and that's all I was making a point about.

Going from there to my supposed 'lack of knowledge' is a poor representation of yourself, too.

Downvote this if you hate writing unit tests for code you release publicly.


It's because you are adding nothing to the discussion.

I don't understand what's so awesome about this? Browsers have supported deep linking since forever. You can link to any specific id on a page and the browser will scroll to it when you open the page.

For example, I could give every paragraph an id, say id="p5", and then link to example.com/story#p5 and voila, deep linking.

Hoorah for reinventing the wheel :)

RTFM? :)

The difference is no 'href' tags. The 'tag' is automatically created based on the words in the paragraph, via Javascript, and decoded appropriately.

It is also slightly neat in that you can highlight a specific sentence (multiple sentences actually, see the little tutorial at the bottom).

I actually kind of like it, it would be a neat way to really highlight what you think is interesting in an article when sending someone a link. But doing it as a per site thing is crazy.. seems like it could be a good browser extension though. People who have it installed would instantly get a more functional linking experience. Imagine linking off to some documentation in a blog post for a coding problem, say to a Django documentation page, and when someone clicks the link they are not only taken to the specific part of the page that you are talking about, but the relevant stuff is actually highlighted. That'd be Neat (tm).

I thought about a plugin, but then you are maintaining several and handling issues from readers who hit walls installing or uninstalling it...

My hope is that this approach is equally unhelpful to everyone :)

Seriously, my hope is that if an approach like this is going to happen that we can keep the usage (syntax) consistent.

Further down the road I'd like the view to also show you what people in your network have highlighted, or on a more aggregate and subtle level, what everyone has...

That reminds me of http://www.tynt.com/

Highlighting is neat; but concerning deep linking, the parent has a point: since the NYT controls the source, why not simply generate anchors for each paragraph?

The upside would be that there would be much less work needed to find edited paragraphs: a paragraph would be identified by its anchor and it could be changed completely, as long as the anchor is still there the link's fine.

This approach has certainly been envisioned: it would be interesting to know why it was put aside.

It seems like an easy solution - and it should be but the reality is that it is not.

There would have been deep-linking on the site years ago if it were not such a big undertaking. It had been on my own wish-list for many years, but not something I decided to actively pursue myself until I met Kellan at OSCON in 2006(?) and he brought it up too.

Since then, a colleague of mine, Eitan, and I started digging into the CMS side, as well as looking at some highly optimized code that outputs the article body. A combination of development time, risk, resources, and testing didn't justify the result - especially since we were trying this on our own time.

There's more to it than that, but that's the main point.

The problem with this approach is that as paragraphs are moved around, the linear ids become stale. Say the article is edited to move content: your deep link may break, still pointing at the right page, but at the wrong place within it.

This addresses that (as stated in the article) without some elaborate server-side content management solution that tracks paragraphs and their HTML ids.
