Hacker News
Show HN: Page Replica – Tool for Web Scraping, Prerendering, and SEO Boost (github.com/html5-ninja)
135 points by nirvanist 9 months ago | 54 comments



It seems that in the AI era, SEO is becoming irrelevant, a relic of the 2000-2020 era.

Why is SEO still needed when AI / LLMs can just conjure up answers with references to valid links, bypassing search engines?

Even privacy-based search engines like DuckDuckGo, Brave and Kagi don't prioritise 'SEO'.



> Why is SEO still needed when AI / LLMs can just conjure up answers with references to valid links, bypassing search engines?

In short: money. LLMs will no doubt change the implementation, but the commercial dynamics are fundamentally the same. It's expensive to build and run a search engine, whether conventional or LLM-based. Someone has to pay for that - and it's not search users. Advertising and its derivatives have become that revenue source, with all the good and bad that brings with it. As long as that commercial dynamic remains, there'll be SEO or some derivative thereof.

--

Other than Kagi - but that's a tiny niche.


Long term you may be right.

SEO isn't dead but it will slowly die off. Being a reference link is second place, and by that point you only get visited if the AI wasn't trusted or didn't solve the problem.

Therefore I think viral/word-of-mouth reach or links from other engaged sources will become more relevant.

Right now though why lose out on free SEO traffic just because you used JS to render most of your site?


How will AI tools know which sources are "valid"? It's likely SEO will transform into ways of tricking bots scraping training data into considering their information as being more "valid".

Alternative search engines must rely on AI themselves to filter out good results, or some form of manual curation by humans, like Kagi's boost/block/pin feature.


Because AI doesn't provide accurate information and you need to validate it yourself? Has anyone who cared about SEO stopped recently?


I think we must distinguish between on-page and off-page SEO. This proposal is only relevant for on-page SEO, for which I would mostly agree with your comment. However, inbound links are and will remain the most important signal for search. What else would be left for ranking?


I still use Google (and ddg and kagi), so people who want to sell me stuff try to get better rankings in these search machines. I'd also wager that people who primarily use LLMs to answer their questions are only a rounding error.


In the AI era, which services provide up to date information about local queries — like which dentists near me are open today?


Any reasonable AI product that does this would use RAG under the hood.

I hope nobody is using ChatGPT to query for information. This is how you get hallucinations.
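
To sketch what "RAG under the hood" means here (an illustration, not any particular product's implementation): the model never has to remember which dentists are open, it only summarizes whatever a fresh retrieval step returns. The listings data and the helper functions below are made up for the example.

  // Minimal RAG sketch (illustrative only): answer a local query by retrieving
  // fresh structured data first, then asking the model to answer from that data.

  // Hypothetical up-to-date index, e.g. refreshed from a business directory.
  const listings = [
    { name: "Smile Dental", distanceKm: 1.2, openToday: true },
    { name: "Bright Teeth Clinic", distanceKm: 0.8, openToday: false },
    { name: "Downtown Dentistry", distanceKm: 2.5, openToday: true },
  ];

  // "Retrieval": filter and rank against the query instead of relying on model memory.
  function retrieve(query) {
    const wantsOpen = /open today/i.test(query);
    return listings
      .filter((l) => !wantsOpen || l.openToday)
      .sort((a, b) => a.distanceKm - b.distanceKm);
  }

  // "Augmentation": retrieved facts are pasted into the prompt, so the LLM only
  // has to summarize them, which is what keeps hallucinations in check.
  function buildPrompt(query, docs) {
    const context = docs
      .map((d) => `- ${d.name}, ${d.distanceKm} km away, open today: ${d.openToday}`)
      .join("\n");
    return `Answer using ONLY the listings below.\n${context}\n\nQuestion: ${query}`;
  }

  const query = "Which dentists near me are open today?";
  console.log(buildPrompt(query, retrieve(query)));
  // The prompt is then sent to whatever LLM the product uses ("generation").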


What is the reasonable AI product that actually does this?


Yep, AI will not return a listing.


SEO has been “dead” since the late 90s.


Google’s PageRank is why SEO became a thing, so I respectfully disagree with your timeline.


SEO was a thing in the 90s, I know people who were doing it then.


But there's still people doing it today.


Yes, that’s the point of my original comment.


https://github.com/html5-ninja/page-replica/blob/main/api.js...

This code base has the most useful comments ever. Are these normally accepted? Enforced? Adding stuff that has no value, but that needs to be maintained and updated when the code changes, without any way for it to be validated by compilers or parsers?


These comments look a lot like GPT to me


I tend to write that kind of comment in my code, too. That's part of my thinking process, and I'm not an LLM.


Copilot


JSDoc, yep, generated by Copilot.


Curious, how would caching the pages and serving them over NGINX help with SEO? Are there any benefits over serving a static site?


Yes, Google can render content handled by JavaScript, but Googlebot allows only a limited time budget for rendering. If your page isn't rendered within that time frame, it may be penalized.

For this reason, many news and broadcasting outlets still use prerendering services. I speak from experience, having worked at a large Canadian media company.

Another important factor to consider is that the SEO world isn't limited to Google. Various bots, including those from other search engines and platforms like Facebook, require correctly rendered pages for optimal sharing and visibility.

Lastly, the choice between client-side rendering (CSR) and server-side rendering (SSR) depends on your specific needs. Google Search Console provides valuable metrics and information about your app, so it might be worth considering SSR if that better aligns with your requirements.
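
To make that concrete, here is a minimal sketch of the usual dynamic-rendering setup, written as a tiny Node server for illustration (the same routing is commonly done in NGINX with a map on the User-Agent header, as the question above suggests). The bot list and the ./cache and ./dist paths are illustrative assumptions, not the project's actual configuration.

  // Minimal dynamic-rendering sketch (illustrative): bots get the prerendered
  // HTML snapshot, regular browsers get the normal JS app shell.
  const http = require("http");
  const fs = require("fs");
  const path = require("path");

  // Illustrative bot list; real setups match many more crawlers and unfurlers.
  const BOT_UA = /googlebot|bingbot|facebookexternalhit|twitterbot|linkedinbot/i;

  http.createServer((req, res) => {
    const isBot = BOT_UA.test(req.headers["user-agent"] || "");
    // Snapshots are assumed to be saved as <route>/index.html by the prerender step.
    const root = isBot ? "./cache" : "./dist";
    const file = path.join(root, req.url.replace(/\/$/, ""), "index.html");

    fs.readFile(fs.existsSync(file) ? file : path.join(root, "index.html"), (err, html) => {
      if (err) {
        res.writeHead(404);
        return res.end("Not found");
      }
      res.writeHead(200, { "Content-Type": "text/html" });
      res.end(html); // Same content either way; bots just skip the client-side rendering step.
    });
  }).listen(3000);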


A "static site" implies HTML rather than a JavaScript app.

With respect to JavaScript apps (React, Angular, etc.):

It's not clear these days, because the major search engines don't explicitly say whether they render JavaScript apps (or whether they only render high-ranking JS apps/sites). But 10 years ago pre-rendering was a must-have to be indexed.

One theory on pre-rendering is that it reduces cost for the crawlers, since they don't need to spend 1-3s of CPU time rendering your site. By reducing their costs, it may increase your chances of being indexed or ranking higher.

My hunch is that long-term, pre-rendering is not necessary for getting indexed. But it is typically still necessary for URL unfurls (link previews) for various social media and chat apps.

disclosure: I operate https://headless-render-api.com
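
To illustrate the unfurl point: a link-preview bot effectively does one plain HTTP fetch and scrapes the og: tags, with no JavaScript execution, so those tags have to be present in the initial HTML. A rough sketch of that behaviour, assuming Node 18+ for the global fetch; the example URL and regexes are illustrative only:

  // Sketch of how unfurl bots read a page: fetch the raw HTML once, no JS executed.
  // If og:title / og:description only appear after client-side rendering,
  // this function sees nothing, which is why prerendered snapshots still matter.
  async function unfurl(url) {
    const res = await fetch(url, { headers: { "User-Agent": "facebookexternalhit/1.1" } });
    const html = await res.text(); // raw server response, JS never runs

    const pick = (property) => {
      const m = html.match(
        new RegExp(`<meta[^>]+property=["']og:${property}["'][^>]+content=["']([^"']*)["']`, "i")
      );
      return m ? m[1] : null;
    };

    return { title: pick("title"), description: pick("description"), image: pick("image") };
  }

  // Example (hypothetical URL): an SPA without prerendering typically returns all nulls here.
  unfurl("https://example.com/some/route").then(console.log);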


Absolutely, bots still need to access content as fast as possible.


If your web app serves dynamic routes (i.e. client-only), this helps with SEO because those routes are now directly visible to most crawlers.


Routes yes, but maybe not content. Most of the time Google Search Console will show that you have no content. Also, the Google Ads bot needs access to your content, which is useful for getting your AdSense account validated quickly.


What's the use case? Scrape someone else's dynamic site and serve it statically as your own?


I read it as being able to build your site in whatever you want, then scrape it and publish it as a static, SEO-optimized site.


That sounds sort of like an inefficient static site generator.


It sounds like a hack, which is after all what this website is dedicated to.


Sounds like an inside-out NextJS!

What you can do with this is design your app as an SPA and have this give a quicker loading experience to any route that is “logged out” so to speak.

The problem is that in reality you are logged in, and what NextJS can do is let you define the subset of the page that can be static.


For me that's not the purpose, but you can do it if you want.

The use case for me was that MeteorJS apps are poorly SEO-friendly, and I needed prerendered HTML to serve to bots.


I have done something similar when archiving a dynamic site, serving it as a static snapshot for free.


Yep, that's the use case for me too.


Curious


And Reddit too, thank you for your feedback.


Cool! I thought about this idea as an IaaS: a CDN that can figure out your rendered page and serve it, regardless of tech stack.


Thanks, and that's the idea.


I wonder where this is actually needed, since most React frameworks support metadata with server-side rendering.


Thank you for the comment. While it's true that not all web applications use the React library, it's important to note that Next.js inherently supports React. However, the choice of technology stack depends on the specific use case and requirements of your project.


It's not about React specifically, but about whether pre-rendering will make any difference from an SEO perspective.

Not only do most frameworks do SSR, but Google is able to crawl dynamic content just fine. Here's an article from 2015 on the topic: https://searchengineland.com/tested-googlebot-crawls-javascr...


Companies still struggle with SEO. I've worked at two tech companies so far, and both used third-party services for SSR. The reason is that bots allow only a few seconds to render your HTML, and your content may be missed if it isn't available within that time frame. You can use Next.js, it's a good alternative. In any case, this remains useful for my own projects and it's free, maybe it can be useful for other devs.


Absolutely. We have a NextJS website and we faced some SEO problems due to the Google bot not being able to get to some pages. To be fair, we used too many sliders.


Google renders all sites in Chrome according to their documentation. So why would you render it? AFAIK they even penalize serving different content to their crawlers.


Sure, but you also want your web app available to other bots. And you are not serving different content, it's the same content, you just avoid the JS layer so it gets served quickly. So far it has worked very well for my projects, Google Search Console showed better rankings. It's not a new technique, and other services like prerender.io do the same thing, but not for free.


[flagged]


> In places where JS is truly useful, that is when the UI is more than a text document, SEO is not a concern or possibility.

I don't think you're thinking this through. E-commerce is more than just a document, and SEO is massively important there, but JavaScript makes the user experience miles better; think product variation selectors, bulk pricing selectors, product filtering, real-time carts, etc. It's insane to say we shouldn't use new tech so that search engines can index us. Do we just forever stick with what we had 15 years ago and never progress? Madness.


I should have been clearer: I'm not against JS, I'm a huge fan actually, but I don't see the promised land of better user experience in these overly complex architectures where the UI is JS-heavy and has to be pre-rendered on the server side so that search engines can make sense of it. Larger and larger, heavier and heavier JS never delivered those perfected UI experiences that web technologists promised.

The web technology folks still circle around the same problems as 10 years ago, while the actually popular and successful websites like HN and others deliver superb UX with dinosaur tech.

Anything that actually uses the advanced web technologies is not in the domain of SEO: true web apps like Figma or Google Docs, say. At best, you can index a list of content for those, but that list could have been JS-free HTML rendered by PHP.


Even with PHP you need to use caching; the deal is to provide bots with content quickly :)


Sure, my complaint wasn't targeting your project but the need for it in the first place. In the current state of the web, this is a nice tool for doing some useful things.


A lot of people on HN are very against JavaScript, and I often get the feeling they're academics or don't work in the part of IT that builds web software. The reality is that huge swathes of the web benefit from it whether people like it or not; this project is just one example.


As I'm the subject here: no, I'm not an academic, but I used to work with web technologies. I quit working with them at the time when a new JS framework was hitting the home page with a promise to make everything better. I quit because I lost my productivity: I got carried away with all this JS stuff and found myself in infinite complexity eating away my time and energy for less-than-marginal gains on the final product. It was extremely cool but utterly useless.

Now that I'm on the receiving end of web-based products, I assure you the experience is horrific. I don't even use search engines as much since AI went mainstream, and I couldn't be happier.

Good riddance.

Of course I do enjoy things made with advanced web technologies like Wasm, Figma for example. Every now and then I'm impressed by some web app that does something I thought wasn't possible before.

However, I'm still disgusted by documents that are just some images and text, yet load tons of stuff that does nothing beyond degrading my experience, slowing down the response time or acting weird when I interact with them.


> As I'm the subject here

That's not the subject of Show HN's, though, which is a great reason not to post generic fighting comments in people's Show HN's.




