Show HN: Open-source script to get your site indexed on Google (github.com/goenning)
131 points by goenning 10 months ago | 62 comments



This script abuses the Indexing API, which is intended for job postings and other specific purposes. https://developers.google.com/search/apis/indexing-api/v3/qu...

Use at your own risk.
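
For context, the API surface being called here is small: you POST a URL notification to the Indexing API with an OAuth2 access token for a service account that has the indexing scope. A rough sketch (not the linked script's actual code; names and error handling are illustrative):

    // Minimal sketch, assuming a Node 18+ runtime with global fetch and an access
    // token for a service account granted https://www.googleapis.com/auth/indexing.
    async function publishUrlNotification(url: string, accessToken: string): Promise<void> {
      const res = await fetch("https://indexing.googleapis.com/v3/urlNotifications:publish", {
        method: "POST",
        headers: {
          Authorization: `Bearer ${accessToken}`,
          "Content-Type": "application/json",
        },
        // type is URL_UPDATED for new/changed pages, URL_DELETED for removals
        body: JSON.stringify({ url, type: "URL_UPDATED" }),
      });
      if (!res.ok) {
        throw new Error(`Indexing API request failed: ${res.status}`);
      }
    }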


Might simply not work for your website:

> Currently, the Indexing API can only be used to crawl pages with either JobPosting or BroadcastEvent embedded in a VideoObject.

I wanted to highlight (in addition to your statement) that JobPosting is a specific type of structured data.

If the target site doesn't have these elements, it may or may not work... or it may work for now, but not once they realize it's being used incorrectly.

JobPosting structured data: https://developers.google.com/search/docs/appearance/structu...
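
In case it helps, JobPosting markup is JSON-LD embedded in the page. A rough sketch of building such a payload (all job fields are made-up examples):

    // Illustrative only: the shape of a schema.org JobPosting JSON-LD object.
    const jobPosting = {
      "@context": "https://schema.org/",
      "@type": "JobPosting",
      title: "Software Engineer",
      description: "<p>Build and maintain our web platform.</p>",
      datePosted: "2024-01-15",
      validThrough: "2024-03-15T00:00",
      hiringOrganization: {
        "@type": "Organization",
        name: "Example Corp",
        sameAs: "https://example.com",
      },
      jobLocation: {
        "@type": "Place",
        address: {
          "@type": "PostalAddress",
          addressLocality: "Berlin",
          addressCountry: "DE",
        },
      },
    };

    // Serialized into the page <head> so Googlebot picks it up when crawling.
    const jsonLdTag = `<script type="application/ld+json">${JSON.stringify(jobPosting)}</script>`;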


> If you don't have content that falls into those categories, the API won't help you there.

https://www.youtube.com/watch?v=kvYb2bdtT7A&t=422s

IME it will silently drop and ignore anything that is not relevant.


The annoying thing about this is that it will ruin this feature for everyone else. I, and many others, use this for requesting to index time sensitive content.


Yes and no. I mean, just because something gets indexed doesn't mean Google values it and is willing to expose its customers to it.

The consistent problem with SEO is that most SEOs don't understand Google's business model. They don't understand that Google is going to best serve its customers (i.e., those doing the search). SEOs (and their clients) need to understand that getting Google to index a turd isn't going to change the fact that the content, and the experience it's wrapped in, is still a turd. Google is not interested in pointing its customers to turds.


For a company not interested in pointing their customers to turds, they sure do point them to a lot of turds.


That's not what it wants to do. Yes, that is what's happening, for a number of reasons. Without people searching, there are no eyeballs. Put another way, the sites being indexed and ranked are *not* the customer(s).


> Google is not interested in pointing its customers to turds.

We must have been using a different Google over the past 3 years. It does this almost exclusively now.


Google is not interested in pointing its customers to a turd that hasn't paid for that right.


I'm not sure I agree that the people doing the search are the customers here


I’m really curious to know who exactly you think are Google Search’s customers in the context of this thread about SEO.


Google Search's customers are companies who advertise on the SERPs.

Google Search's users are people who do the searching.

If a service is free to use, you are the product.


People paying for ads to show up in Google users' search results are Google's customers.

People using Google's free services to see those search results, which have gone to shit over the past 10 years or so and are a full page of ads without ad blockers, are Google's product. That product is acquired by offering free products and services that keep people hooked even through the enshittification that has proceeded since the web 2.0 golden era.


The advertisers


Advertisers?


But think logically for just a second: why would advertisers pay to advertise their turd if they could just have it show up in the search results for free?

For a turd sandwich to work, you have to wrap your turd (ads) with high quality results so people actually search on Google and then you can show them the turd along with the good stuff.


Plenty of turds do show up in Google's search results: AI-generated, copy-pasted mumbo jumbo full of more Google ads on the page.

It seems Google is happy to serve you turds it can double dip on.


SEO died many years ago, but some companies are still trying to sell their naive clients some magical "SEO optimisation", which is plainly a scam at this point.


There are a ton of SEO optimizations that are extremely significant:

* performance/SSR
* interlinking/dead links
* keyword cannibalization

to name a few


Definitely. In general, most parts of technical SEO remain important (one h1 tag, etc)


Is there a trustworthy guide on this?


Eyeballs are not Google's customers, paying advertisers are Google's customers.

If a paying customer gives Google money to point eyeballs to turds, it points eyeballs to turds (this is how Google makes money today; it is the business model for search). The problem with SEO isn't that it degrades search, it's that SEO users aren't paying customers and don't make Google any money (and they compromise Google's ability to direct eyeballs to paying customers).

This is classic "enshittification": offer a service for free to capture eyeball share, then offer a paid service to companies that capitalizes on that eyeball share but compromises the "eyeball experience" (and then, in the endgame, squeeze the companies that have become dependent on the eyeball platform to serve shareholders).


I wrote this a month or so ago. What I do in my specific case is keep an MD5 hash, so it's technically not abuse. It lets Google know when something has been updated. Supports Bing as well.

https://github.com/niemal/seo-auto-index


Thank you for sharing. Can you quickly explain why keeping an MD5 hash avoids the abuse issue?


You don't post the same URL twice with the same content; therefore you maintain the logic of the API itself, so there's no spam or abuse of any kind.
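
A rough sketch of that idea (hypothetical names; `publishUrlNotification` is the kind of helper sketched near the top of the thread, and a real script would persist the hashes instead of keeping them in memory):

    import { createHash } from "node:crypto";

    // url -> MD5 of the content that was last submitted for that URL
    const lastSubmitted = new Map<string, string>();

    async function notifyIfChanged(url: string, html: string, accessToken: string): Promise<void> {
      const hash = createHash("md5").update(html).digest("hex");
      if (lastSubmitted.get(url) === hash) return; // unchanged content: no resubmission
      lastSubmitted.set(url, hash);
      await publishUrlNotification(url, accessToken); // hypothetical Indexing API helper
    }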


Another easy way is to just tweet it, which works for me - they usually get indexed < 1 hour later. Google has access to tweets and the URLs in those tweets.


Paid API access?


Probably some sort of crawling. When Twitter briefly became logged-in only, Google could not display tweets in search anymore.


Not paid, just tweet it from a regular Twitter account.


What happened to the good ol' sitemap.xml?

You'll probably find an npm package with lots of dependencies that'll generate that sitemap for you if that's what you need...
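
Though a basic sitemap needs no dependencies at all; a rough sketch (URLs and output path are made up):

    import { writeFileSync } from "node:fs";

    // Hypothetical URL list; a real site would enumerate its own routes.
    const urls = ["https://example.com/", "https://example.com/about"];

    const sitemap =
      `<?xml version="1.0" encoding="UTF-8"?>\n` +
      `<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n` +
      urls
        .map((u) => `  <url><loc>${u}</loc><lastmod>${new Date().toISOString()}</lastmod></url>`)
        .join("\n") +
      `\n</urlset>\n`;

    // Serve this from the site root and submit it in Search Console (or reference it in robots.txt).
    writeFileSync("public/sitemap.xml", sitemap);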


I’m failing to see how this isn’t just “hey look at my sitemap”!


Per Google's docs:

> For websites with many short-lived pages like job postings or livestream videos, we recommend using the Indexing API instead of sitemaps because the Indexing API prompts Googlebot to crawl your pages sooner than updating the sitemap


Is this any different from logging into the Google search console and submitting your sitemap/index request?


I submitted 1,900 pages in September and it has yet to look at 600. It did 4 this month.

I wish I had been more picky with my sitemap but I thought including all URLs was the goal. I at least properly weighted them but that doesn't seem to do much.


If the organic results are too good someone might click on one of them instead of a sponsored link or advert. De-indexing/non-indexing helps to avoid that outcome.


Same here.


A little reading would get you the answer: https://seogets.com/blog/google-indexing-script


The info about a 24- to 48-hour waiting time is wrong. I just submitted a few new pages (not yet indexed), and they got indexed in less than 4 hours.


Submitting manually takes sooooo long.


Same result from what I've seen, but not scalable for a larger number of pages.


Is there something for the opposite? I don't want Google (or any other scraper) to index my website. AFAIK, robots.txt is not authoritative.


    # Drop all inbound HTTP/HTTPS traffic (IPv4 and IPv6); nothing can index a site nobody can reach.
    sudo iptables -A INPUT -p tcp --dport 80 -j DROP
    sudo iptables -A INPUT -p tcp --dport 443 -j DROP
    sudo ip6tables -A INPUT -p tcp --dport 80 -j DROP
    sudo ip6tables -A INPUT -p tcp --dport 443 -j DROP
That should do.


From Indexing API documentation:

> Currently, the Indexing API can only be used to crawl pages with either `JobPosting` or `BroadcastEvent` embedded in a `VideoObject`.

So this might come with the risk of seeing the site you want to boost get penalized by Google instead.


This is not a boost, Index != Ranking


Either way, this is clearly abusing the API by using it for things that it wasn't intended for.

The only outcome I can see from this is a) contributing to the rise of spam and b) harming people managing sites and apps for which this API is vital.


I recently launched a mini project and was shocked at how difficult it was, and how long it took, to get any of its pages properly indexed on Google.

It's almost as if Google is actively trying -not- to index anything as a way to reduce spam, by forcing the people who really care to jump through 100 hoops.

A great way for the dark web to remain dark.


It just takes time. Getting people to link to it by sharing it in other channels will help to shorten the timeframe.


It just takes time


How long is long in your case?


I’ve seen a lot of indie startups lately that are basically selling faster Google indexing than you can get for free using Google Search Console. I guess they are probably using this feature under the hood.


I just submit a sitemap URL to Google Search Console Tools. Is this any different?


I've seen some people even wrapping and re-selling this as SaaS.


"to get your site indexed" => a nonsense claim

+ this technique might make engines aware of your content, but doesn't guarantee indexation whatsoever.


Or just submit a sitemap.xml via Google Search Console.


? "what I've noticed"...Google only indexing if a site has backlinks or is submitted by owner. Uh..yeah, how else would google know about a new URL? C'mon. This just seems like the usual SEO obsession/grift with some 'secret' way to get things done. It's straightfwd these days. Are you saying none of the pages you're queuing up are linked to each other? Most cases they would be in some way right? So the spider will start indexing them all based on a top url submission or a few key urls. Do event/job board sites really need all of their pages to be indexed immediately?


So Google stopped automating indexation because of spam, and humanity finds a new way to resume automation and propagate spam again. It seems Google is trapped in its own toxic game of search engine optimization.


Google no longer finds new sites automatically? That might explain why it's been so trash the past few years.

I remember running a few websites back in the day, and with zero interaction with Google, all of the pages showed up in the search index a day or two after publishing at most.



lol


The only possible outcome is for them to shut down this API or make it work as documented. There are already at least 10+ SaaS products offering this as a service.


Eh, you can get Google to index your site by just submitting a sitemap; it just takes a little longer.


If "a little longer" is six months+, you're absolutely right.



