
Show HN: SEO for JavaScript, HTML5 and Single Page Applications - CMCDragonkai
https://snapsearch.io/
======
programminggeek
Am I crazy for thinking that SEO is a huge reason to NOT do a Single Page App
for a public facing website that you want Google to index?

It seems like the wrong business choice from a marketing perspective that we
are now looking for a tech solution to solve a problem we created ourselves.
Single Page Apps feel like building Flash web apps 5 years ago. It's a valid
choice, but with clear drawbacks.

~~~
integraton
You are not crazy. The single page app trend is solidly in the "peak of
inflated expectations" phase of the hype cycle (
[http://en.wikipedia.org/wiki/Hype_cycle](http://en.wikipedia.org/wiki/Hype_cycle)
), which is why there exists a subculture of developers who are trying to use
the approach for everything, even where it isn't appropriate.

~~~
CMCDragonkai
I think the single page app trend is coinciding with the growth of JavaScript and
HTML5. It shows that the websites of the future will be more and more like web
applications, and they need dynamic UIs. The thing that's stopping us is the SEO
problem, which I hope is no longer a problem with SnapSearch.

~~~
hippee-lee
Why not just render the public pages of your app out to HTML, or use one of the
PhantomJS services to do it? Then when spiders come up the waterspout, have
your web server tell them where to go for your site's links.

If you don't want to do it yourself there are quite a few companies that can
help: [http://scotch.io/tutorials/javascript/angularjs-seo-with-
pre...](http://scotch.io/tutorials/javascript/angularjs-seo-with-prerender-io)
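
For reference, the serving side of that approach is only a few lines. A minimal
sketch in Express, where the bot list and the snapshot store are illustrative
assumptions rather than anyone's actual setup:

    // Serve pre-rendered HTML snapshots to known crawler user agents;
    // everyone else gets the normal single page app.
    var express = require('express');
    var app = express();

    var BOT_UA = /googlebot|bingbot|yandex|baiduspider|facebookexternalhit/i;

    // Hypothetical snapshot store; in practice it would be filled by
    // PhantomJS or a prerendering service such as prerender.io.
    var snapshots = { '/': '<html><body>Rendered home page</body></html>' };

    app.use(function (req, res, next) {
      if (BOT_UA.test(req.headers['user-agent'] || '') && snapshots[req.path]) {
        return res.type('html').send(snapshots[req.path]);
      }
      next();
    });

    app.use(express.static('public')); // the normal SPA assets
    app.listen(3000);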

~~~
CMCDragonkai
There are some important technical and operational differences between SnapSearch
and using PhantomJS; I answered this in a comment below:
[https://news.ycombinator.com/item?id=7765731](https://news.ycombinator.com/item?id=7765731)

------
philbo
If you're building JavaScript-heavy/single-page applications, you should be
applying the principle of progressive enhancement anyway; there is no reason
not to do so.

Progressive enhancement brings a number of benefits, of which friendliness to
search engines is just one. User has JS turned off? Page still works. Request
to CDN fails? Page still works. Silly lint error slips through to production,
breaking all your JS? Page still works. Weird cross-browser issue breaks your
JS in some clients? Page still works.

The most frequent arguments against progressive enhancement, which have been
prevalent amongst a number of back-end and product people I've worked with, are
that it is a waste of time and that it adds too much extra effort. I disagree
with both. It is not always necessary to reach full feature parity between
your JS and non-JS implementations; for instance, an auto-suggest input in your
JS code can happily be a vanilla input in the non-JS version. But if you do just
enough to ensure that your application is functional without JavaScript
enabled, my (subjective, biased) experience has been that it doesn't take too
long.
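
To make the auto-suggest example concrete, here's a rough sketch (the element id
and the /suggest endpoint are made up): the page ships a plain input inside a
normal form that posts to the server, and the script only layers suggestions on
top if it actually runs.

    // Progressive enhancement sketch: assumes <input id="q"> already in the HTML.
    // Without JS the input is still a working form field that posts normally.
    (function () {
      var input = document.getElementById('q');
      if (!input) return; // nothing to enhance

      var list = document.createElement('datalist');
      list.id = 'q-suggestions';
      input.setAttribute('list', list.id);
      document.body.appendChild(list);

      input.addEventListener('input', function () {
        var xhr = new XMLHttpRequest();
        // Hypothetical suggestion endpoint; swap in whatever the server exposes.
        xhr.open('GET', '/suggest?q=' + encodeURIComponent(input.value));
        xhr.onload = function () {
          list.innerHTML = '';
          JSON.parse(xhr.responseText).forEach(function (word) {
            var option = document.createElement('option');
            option.value = word;
            list.appendChild(option);
          });
        };
        xhr.send();
      });
    })();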

~~~
Kiro
Easier said than done if you're using something like Angular or Ember.

~~~
mercer
Don't you generally use those for web apps where SEO and search engine
indexation is less important? I'm not very experienced in this area, so I find
it hard to think of an example where you need indexation/SEO-optimization for
the kinds of things you do with Angular or Ember.

I'm very curious to hear of some examples.

~~~
CMCDragonkai
Content sites should be able to take advantage of JavaScript technologies more
fully. It's all about user experience in the end. There are all sorts of new
ways of navigating or presenting content that aren't just one static document
after another. In these cases, SEO is a requirement.

~~~
mercer
Ah, good point.

And come to think of it, I'm working on a basic blog and I'm using javascript
(react.js, specifically) to make the client-side experiences smoother... I had
completely forgotten about that.

Thanks for the insight.

------
nnq
Am I crazy thinking that we as web developers should NOT HAVE to care
about this? It's Google's and Bing's and Yahoo's problem to go build decent
spiders that run multiple versions of the most recent web browsers inside
them, and they _do have_ the computing power that's needed.

All spiders/crawlers should see webpages just as human users see them; that's
the point! They should do whatever it takes to get this working, regardless of
the effort, because this is _why they get paid (indirectly, through ads, but
still...)._ And this is still not enough: it's any respectable search engine's
duty to try and develop near-human-level AIs to be able to also extract
meaningful information from visual structure, from pictures, and from hard to
understand prose. And longer term, it will be their duty to develop above-
human-level AIs to be even better at this than humans so they can provide even
better search results.

Since when are we, web developers, expected to do the search engines' work for
them?!

I expect this is just a transition phase and things will get on the right
track soon.

(Note: and no, I'm not _against_ helping search engines. Providing semantic
markup, microformats and all this is great. But there is a difference between
_helping them_ and _doing the work for them_, on our dime!)

EDIT+: @OP: I'm not being critical; what you and prerender.io are doing is awesome
and immensely helpful for web developers! But you do realize that you're doing
what search engines should be doing themselves, right? And when they wake up
and realize they should get into the game, they will have basically put you
out of business... or bought your business, which I think is what you expect :)
Anyway, good luck with it!

~~~
CMCDragonkai
Hey, I understand where you're coming from. I had the same impression before I
began on SnapSearch. Right now SPAs are still a minority of the websites in the
world, but eventually everybody will start making web applications.

But do note, as I said in the other comments, it's not just search engines.
There'll be a day when curling sites will require a JS VM. There's a big
difference between a standard curl and a full-blown Chrome/Firefox-level VM.
Considering the number of sites Google crawls per second, their computing
costs would explode compared to what they're doing now.

And this doesn't even address all the social network bots, and pretty much all
the other bots doing stuff other than search indexing.

So I think we've still got quite some time before we're redundant.

In any case, the question for you is: are you going to wait for Google to
implement this before you start using single page application technologies,
which can give you an edge when it comes to user experience?

~~~
tracker1
Why would curl need a full VM? Just curl the API/service endpoints for the
data you want. If you need more, that's what PhantomJS is for.
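
And the PhantomJS route really is only a few lines; a minimal sketch (the
example URL and the one-second wait are arbitrary):

    // save as render.js, run with: phantomjs render.js http://example.com/#!/page
    var page = require('webpage').create();
    var url = require('system').args[1];

    page.open(url, function (status) {
      if (status !== 'success') {
        console.log('Failed to load ' + url);
        phantom.exit(1);
        return;
      }
      // Give the SPA a moment to finish rendering before snapshotting.
      window.setTimeout(function () {
        console.log(page.content); // the fully rendered HTML
        phantom.exit();
      }, 1000);
    });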

------
mattlenz
Will Google eventually begin javascript-enabled scraping themselves?

~~~
Encosia
I don't know the specifics, but they already have been for years (at least to
some extent). Several of my clients have found content deep inside public-
facing client-side apps indexed in situations where we didn't care if it was
indexed, but assumed it wouldn't/couldn't be.

~~~
CMCDragonkai
Do you have any examples? Because I've been building single page apps for 2
years, and none of them are indexed. Also, indexing for search is not the only
thing; sometimes clients need Google adverts, which require the Google media
bot to check your site's content. One of my clients could not get any data out,
and the site could not get approved by Google for the adverts. But with
SnapSearch, the Google media bot was able to extract the necessary
information.

Check your Google Webmaster Tools: you can force a scrape from the Google
bots, and you will see that SPA sites are not discoverable.

~~~
Encosia
I can't share the specific examples, unfortunately.

It's true that if you force a googlebot request from Webmaster Tools, it's
essentially just how the basic googlebot would wget a page on your site. We
know they're doing more than that though, because it would be trivial to hide
cloaking and malware from Google if that were the only tool they used for
scraping.

What Google actually does for crawling content seems to be a bit more nuanced.
If you watch the server-side logs on a site that Google crawls often, you'll
sometimes see a pattern of googlebot crawling a page and then an immediate
subsequent request from a Google IP from a Chrome UA. I'm sure that's
partially just to assess malware/cloaking, but it's probably also related to
developments like this one:
[https://twitter.com/mattcutts/status/131425949597179904](https://twitter.com/mattcutts/status/131425949597179904)

They're definitely doing more than just the ?_escaped_fragment_ rigamarole
that almost no one uses.
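
(For anyone unfamiliar with that scheme: a crawler that supports it rewrites
example.com/#!/about into example.com/?_escaped_fragment_=/about, and your
server is expected to answer with a rendered snapshot. A minimal sketch of the
server side in Express, with the snapshot renderer stubbed out as an assumption:)

    // Handle Google's AJAX crawling scheme: ?_escaped_fragment_=/about
    // stands in for #!/about.
    var express = require('express');
    var app = express();

    // Stand-in for PhantomJS or a prerendering service.
    function renderSnapshot(url, callback) {
      callback(null, '<html><body>Snapshot of ' + url + '</body></html>');
    }

    app.use(function (req, res, next) {
      var fragment = req.query._escaped_fragment_;
      if (fragment === undefined) return next();
      // Reconstruct the #! URL the crawler originally saw and render it.
      renderSnapshot(req.path + '#!' + fragment, function (err, html) {
        if (err) return next(err);
        res.type('html').send(html);
      });
    });

    app.listen(3000);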

Fair point about the AdSense crawler though. It seems to be just about as dumb
as a rock in my experience.

------
fidlefodl
I feel like applications and tooling to pre-render your HTML (e.g. React.js) are
more interesting. It's definitely more work currently (when done right), but
it seems like the most natural flow. Always render HTML, let the client side
link it up, everybody wins.

------
ant_sz
I think this should be more a problem for Google than for web developers. Web
developers have the right to choose the new techniques they like; it's Google's
responsibility to scrape your content well.

But now things are reversed. It's kind of a monopoly, isn't it?

~~~
erichurkman
Developers have always been free to choose techniques. Search engines owe you,
the developer, nothing. If Google et al can't scrape your content, they'll
scrape a competitor who better serves parse-able content. If you want to play
ball with a particular search engine, you have to serve content that they can
parse at the time.

------
andrenotgiant
When you scrape the HTML from a single page app (let's say the default app
homepage) do you re-write the javascript links to be HTML links with unique
URLs?

If not, how does Google find anything more than the app homepage?

~~~
albemuth
Sitemap works

~~~
CMCDragonkai
Yep, a sitemap can also work. But it's not needed, since Google will follow
your anchor links anyway. In the future, SnapSearch will also be able to create
sitemaps dynamically.
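
For what it's worth, generating a sitemap yourself is trivial once you know your
routes; a sketch with a made-up route list and domain:

    // Dynamically generated sitemap for an SPA's crawlable URLs.
    var express = require('express');
    var app = express();

    var pages = ['/', '/about', '/articles/1', '/articles/2']; // illustrative

    app.get('/sitemap.xml', function (req, res) {
      var urls = pages.map(function (path) {
        return '<url><loc>http://example.com' + path + '</loc></url>';
      }).join('');
      res.type('application/xml').send(
        '<?xml version="1.0" encoding="UTF-8"?>' +
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' +
        urls +
        '</urlset>');
    });

    app.listen(3000);
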

------
timdorr
I think the logo could use a little work. The lightning bolt doesn't come off
as an S, so it looks more like "napSearch".

~~~
CMCDragonkai
Thanks for the feedback :). I'll look into it. Did you know the bolt is also a
tiger stripe? That's why I got the tiger theme.

------
CMCDragonkai
Hey HN, Roger here, happy to answer any questions.

~~~
ilaksh
This seems exactly like prerender.io but with more code to insert (harder to
use) and more expensive.

~~~
CMCDragonkai
I calculated the average price per usage versus Prerender.io for average
sites. And it comes out to be cheaper or equivalent to the standard price of
Prerender.io.

Basically, take the number of Prerender.io usages you have per month, and
compare the pricing yourself.

What do you mean by more code to insert? I do know of Prerender.io, but how much
code you need depends on which framework/language you're using. If it's Node,
it's pretty much one line.

The extra code is just options that allow you to customise the way the
interceptor works; these options include things like blacklists, whitelists,
regexes, extension ignoring, client-side caching, etc. But basic usage is just
your key + email, and you're good to go.

Also note that unlike Prerender, our cache storage is free, so we don't track
how many pages you have.

~~~
ilaksh
I see three lines of code whereas prerender.io is one line. I guess that's
because they don't need you to configure the client with the API key or email.

That's not really a huge difference, although it is still a little bit more
effort.

It really comes down to price I think.

I can't really tell for sure what calculation to use to compare the pricing.
Looks like you might be cheaper.

You might consider doing a blog post or something comparing the pricing, though,
because I think other people know about Prerender too, and having a simple
explanation that shows your pricing is better would make it easier for people
to make that decision.

~~~
CMCDragonkai
Ah, well Prerender does use a single file, which means they haven't broken
down their middleware into 3 components. All SnapSearch middleware is broken
down (OOP style) into Detector, Client and Interceptor.

I recognise that this OOP strictness leads to a bit more verbosity.

Therefore, for PHP I built a Stack interceptor. For Ruby, there's a Rack
alternative. NodeJS has the expressConnector. Python doesn't have a single
thing that brings them together atm.

These integration points make the entire thing a one-liner to integrate.

If you're referring to Node, here's the connectInterceptor. Usage is on the
README: [https://github.com/SnapSearch/SnapSearch-Client-
Node/blob/ma...](https://github.com/SnapSearch/SnapSearch-Client-
Node/blob/master/src/connectInterceptor.js)

Also sure, I'm going to get a blog up soon. Thanks for the advice. I'll do one
on comparing pricing.

But do note that because we use Firefox instead of QtWebKit, we get 6-week
release cycles, so our scrapers keep up with the latest HTML5 developments,
unlike the slower QtWebKit in PhantomJS. So we do have a
technical advantage :)
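
To give a sense of what the interceptor does, here's the general shape of such a
middleware (illustrative only; the endpoint, auth scheme and response fields are
assumptions, not the actual snapsearch-client-nodejs API, which is what the
linked README documents):

    var express = require('express');
    var request = require('request'); // any HTTP client works
    var app = express();

    var BOT_UA = /googlebot|bingbot|facebookexternalhit|twitterbot/i;

    app.use(function (req, res, next) {
      // Detector: decide whether this request comes from a robot.
      if (!BOT_UA.test(req.headers['user-agent'] || '')) return next();

      // Client: ask the remote rendering service (authenticated with
      // email + key) for a snapshot of the requested URL.
      request.post({
        url: 'https://snapsearch.io/api/v1/robot',           // assumed endpoint
        auth: { user: 'you@example.com', pass: 'YOUR_KEY' },  // assumed auth
        json: { url: req.protocol + '://' + req.get('host') + req.originalUrl }
      }, function (err, response, body) {
        // Interceptor: hand the rendered snapshot back to the robot.
        if (err || !body || !body.content) return next(err);
        res.status(body.status || 200).type('html').send(body.content);
      });
    });

    app.listen(3000);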

------
puppetmaster3
Dear N00bs, SPA works great w/ SEO.

Hashtags! (ex:
[https://github.com/puppetMaster3/ModuleBU/blob/master/latest...](https://github.com/puppetMaster3/ModuleBU/blob/master/latestCDN/lib/ModuleBU.js)
in AppBu section, github has examples).

It's just that Twitter or Facebook programmers don't know how to do it... but
then, they do hire people with 0 experience out of school.

Cheers.

~~~
puppetmaster3
Down votes?

It upsets you that I show you how to SEO w/ SPA?

Says a bit about you I think.

~~~
puppetmaster3
Dear downvoter: [http://theunboundedspirit.com/wp-
content/uploads/2014/05/tru...](http://theunboundedspirit.com/wp-
content/uploads/2014/05/truth-quote-nietzsche.jpg)

