SEO for Web Engineers (johnwdefeo.com)
207 points by midef 14 days ago | 54 comments

The million dollar question is how to get good rankings for good keywords without having to spend too much money. This article does a good job describing how to keep a successful site optimized but actually getting the rankings is not covered much.

Never, ever, let marketers send newsletters and promotional e-mails from the same IPs that the websites are hosted on. A rogue employee who violates the CAN-SPAM act may result in the entire website being blacklisted.

But the problem is that all someone needs to do is get the website's IP from the link in the email and blacklist that too, so a different IP will not help much. The best defense against blacklisting is cloud hosting with many IPs, so the odds of all of them being blacklisted are slim. Amazon S3 servers have something like 100+ IPs.

The key to getting good rankings for good keywords is really the same as it's always been: writing stuff people actually want to read/find useful and getting it referenced by sites and services with a large audience and a decent reputation.

That's why so many startups and agencies fail miserably with their blogs: their content is third rate crap that no one really wants to read. They wonder why, say, Moz or Sitepoint gets all the traffic, and you want to say "have you actually looked at your own work?" given that the latter two companies offer useful, actionable advice and they don't. It's like they expect the magic Google fairy to take their uninteresting work and, hey presto, it suddenly appears on page 1 and gets millions of views.

Okay, it's not quite that simple. Sites with a bigger marketing budget tend to do better, as do sites and companies willing to put a lot more time into getting content out on a regular basis. And yes, some people will get temporary (or sometimes semi-permanent) advantages through dodgy things like private blog networks and paid links.

But actually trying and putting in the effort/treating the SEO side as a full time job is really the minimum here, and many don't seem to even bother.

This. It's all about supply and demand; stuffing keywords to game the SERP is a short-term tactic.

If you have content that people dig, Google will automatically send you more traffic.

If you have mediocre run of the mill startup content blog, you are shit out of luck because you are doing the same thing everyone else who failed is doing

Keyword stuffing - not any more :-)

I don’t understand, a mail server sends an email and it has an ip address. There’s no website associated with that. Sure, there’s an email address, and a load of rules about how to help avoid spam by setting up SPF and DKIM and DMARC to prove that the mail server is allowed to send on behalf of the domain. And if the specific IP address of the mail server ends up on a spam list, you can try to get it removed or start sending from another ip.

I don’t get what it means for the entire website to be blocked from an SEO perspective. Google doesn’t remove you from the index because an IP address ends up on a spam list and that’s also where your website is hosted, right?

Also, you’re only hosting on s3 if you’re hosting a totally static website, which very few people do.

You could potentially host an SPA on S3 which hits an API. In that situation the SPA would still be indexable. And I don't think Google removes you from the index, but spam coming from the same IP seems to be a negative criterion.

Is it a negative criterion though? It seems to me that with shared hosting the situation would be so common that the metric would produce a lot of false positives. Granted, it’s from a couple of years ago and things may have changed, but here’s Matt Cutts saying that it doesn’t (for that reason) https://youtu.be/4peSUa2FKvk

I think this is the danger of the SEO industry. On the one hand you have bad actors trying to game the system, but on the other, people selling you legitimate SEO are often claiming rules as fact with little or no evidence, because it’s difficult to verify.

That's what I was hoping this link was about. I'd love to see a high-level list of how to do keyword analytics and other things to get a better idea of what wording to use to try to rank better.

Another one is 'use the correct status codes for page requests'. I've seen far too many sites return a 200 OK status for a nonexistent page, which can cause havoc with search engines indexing 404 messages and whatnot.
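To make the status code point concrete, here is a minimal Python sketch. The `KNOWN_PAGES` table and the `resolve` helper are hypothetical names made up for illustration, not anything from the article. The idea is only that a missing page is reported in the status line, not just in the HTML.

```python
from http import HTTPStatus

# Hypothetical routing table, for illustration only.
KNOWN_PAGES = {
    "/": "<h1>Home</h1>",
    "/about": "<h1>About</h1>",
}


def resolve(path):
    """Return (status_code, body) for a request path."""
    body = KNOWN_PAGES.get(path)
    if body is None:
        # A missing page gets a real 404, not a 200 wrapping an error message.
        return HTTPStatus.NOT_FOUND.value, "<h1>Page not found</h1>"
    return HTTPStatus.OK.value, body
```

A "soft 404" is the opposite case: the same error body sent with a 200, which is what crawlers may end up indexing.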

Oh, and make sure you actually use heading tags properly. God, so many sites and apps seem to be designed with no thought put into heading structure, with h2s and h3s strewn about at random. Especially don't forget the h1 tag, and try to make sure it (and the title tag for the page) are both unique on every page.

Basic I know, but come on. Too many SPAs seem to be built with no thought put into even the basics of HTML structure or on page SEO.
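A rough way to spot the heading problems described above, using only Python's standard `html.parser`. The `audit_headings` function and its two rules (at least one h1, no skipped levels) are my own assumptions about what "properly" means here; it does not check uniqueness across pages.

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Record heading levels (1-6) in document order."""

    def __init__(self):
        super().__init__()
        self.levels = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this method.
        if tag in ("h1", "h2", "h3", "h4", "h5", "h6"):
            self.levels.append(int(tag[1]))


def audit_headings(html):
    """Return a list of human-readable problems found in the heading outline."""
    collector = HeadingCollector()
    collector.feed(html)
    problems = []
    if 1 not in collector.levels:
        problems.append("no h1 found")
    # Flag jumps such as h1 -> h3 that skip a level.
    for prev, cur in zip(collector.levels, collector.levels[1:]):
        if cur > prev + 1:
            problems.append(f"heading level jumps from h{prev} to h{cur}")
    return problems
```

For example, `audit_headings("<h1>A</h1><h3>B</h3>")` reports the jump from h1 to h3, while a clean h1/h2 outline returns an empty list.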

A brisk but thorough checklist that fulfills the promise of the title.

My only objection would be this bit (below). Specifically, "demands." How many engineers are in a position to demand anything? The truth is, for the most part, it's not engineers who are ruining the internet with bloat and privacy infringements. It's designers and marketing. They're more likely to forget the end user and opt for some nasty excess.

> "Demand that someone take ownership of each tracking pixel and tag that's added to a page, then, make these stakeholders justify their tags every six months. If you don't, people will ask you to add junk to the page until your team gets blamed for a slow site."

If a designer or marketer is forgetting the end user then they don't deserve the title "designer" or "marketer".

And to counter your point, I often see sites built by engineers that ignore the end user by not employing any of the best practices the UX/UI and marketing communities have uncovered over the years.

re: "If a designer..."

No argument from me. There's plenty of title inflation, and turds wearing lipstick.

re: "And to counter..."

I have to push back. Unless there's a glaring technical omission, then whatever you see was likely not the engineer's decision. That all happened prior to construction. And even a technical faux pas might not be the engineer's. More than once I've stuck my neck out and said, "...that's not ideal..." and was overruled and marginalized.

A designer or marketer's focus isn't ux/ui nor should it be.

A marketer is brand building. Could ux/ui be part of it? Sure, but as long as feeling x is associated with product y they have done their job.

A designer's job is to express that feeling through design.

UX/UI could be part of it depending on the group targeted. But it's like a red tie: it could help, but it isn't required for that important business meeting.

Depends on context. There are many different types of design. A branding or marketing designer needs to trigger emotions, yes. A product designer needs to make it easy for a user to accomplish a task.

My main objection is medium grey text on a white background.

"but it looks fine on my $4000 MacBook with my perfect 25-year-old eyes"

It is funny because your comment is slightly grey (I assume because a trusted community member downvoted you).

> trusted 25-year-old community member


> Never, ever, let marketers send newsletters and promotional e-mails from the same IPs that the websites are hosted on. A rogue employee who violates the CAN-SPAM act may result in the entire website being blacklisted.

Can someone expand on that one? The blacklist would only apply to the IP address being sent from and not the domain?

Both a domain and an IP can get blacklisted; however, it's typically IP blacklists that you find yourself dealing with if you're sending emails from your webserver(s).

You can sidestep these issues by using something like Mailgun or Sendgrid as your delivery mechanism for server-generated emails (password resets, account registration confirmations, etc). And always set up your SPF records to include these services as permitted senders.

You can use the same services to send marketing emails or there are of course services like MailChimp / Constant Contact / Campaign Monitor / etc that provide all the UIs for marketers. Same thing applies though, you gotta set up SPF records.
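As a small illustration of the "include these services as permitted senders" step, here is a sketch that lists the `include:` mechanisms in an SPF TXT record. The `spf_includes` helper and the `spf.esp.example` domain are placeholders I made up. Use whatever include value your provider actually documents. It only reads the string. It does no DNS lookups and is nowhere near a full SPF evaluator.

```python
def spf_includes(spf_record):
    """Return the set of domains named by include: mechanisms in an SPF record."""
    terms = spf_record.split()
    if not terms or terms[0].lower() != "v=spf1":
        raise ValueError("not an SPF record")
    includes = set()
    for term in terms[1:]:
        # Drop an optional qualifier (+, -, ~, ?) in front of the mechanism.
        mechanism = term.lstrip("+-~?")
        if mechanism.lower().startswith("include:"):
            includes.add(mechanism[len("include:"):].lower())
    return includes


# Placeholder record: the web server's own IP plus one third-party sender.
record = "v=spf1 ip4:192.0.2.10 include:spf.esp.example ~all"
permitted = spf_includes(record)
```

If the delivery service's domain is missing from `permitted`, mail sent through it is likely to fail SPF checks at the receiving end.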

> it's typically IP blacklists that you find yourself dealing with if you're sending emails from your webserver(s).

Why wouldn't they blacklist the domain if they know that IP is associated with your domain?

Would the domain blacklist only happen after you've had several IP addresses blacklisted?

All I can do is speculate as I don't know the answer to either question conclusively.

So...to speculate: I know that spam email is coming from an IP; I don't necessarily know that the IP is officially associated with a given domain. Sure, I could do a dig to see if the A record matches the IP, but blacklisting an entire domain is a pretty draconian step. I need my blacklist to be accurate, with a minimum of false positives, or its market value diminishes. Adding domains based on IP association is going to be more likely to induce false positives and increase the administrative overhead involved with my blacklist, even if that overhead is mostly automated through "get me off this blacklist" forms and the like. /speculation

As an aside, I've noticed that if a cloud provider recycles IPs (DigitalOcean), the chances of pulling a blacklisted IP off the heap are pretty good. At this point, even if all the server is doing is sending the occasional password reset email to a handful of internal staffers, we run the email through Mailgun. It just isn't worth the hassle of trying to get an IP off a blacklist.

Why does the name 'Web Engineer' bother me in this context? It had taken me a while just getting used to software engineer, even having a degree. Software dev, especially application building is mostly combining pre-built libraries. A web engineer is someone I would think drafted and refined the protocols and technologies used by app devs. Anyone else have thoughts on this? Is engineer == dev, STEM == STE*M now? UX engineer, marketing/growth engineer?

The interesting one for me was measuring error rates. If your JS-heavy web page calls the server 100 times per load, and two users load it and one of them gets a single 500, that should count as a 50% error rate, but measured per request it looks like <1%.

So it feels like a good idea to have a piece of JS that confirms everything that should load has loaded OK, then reports home with the result for the page.

Probably quite doable - can JS see the network load like web developer tools ?
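The arithmetic behind that gap, as a quick Python sketch. The `error_rates` function and the sample data are hypothetical. I'm assuming 100 requests per page load and counting any 5xx response as a failure.

```python
def error_rates(page_loads):
    """page_loads holds one list of HTTP status codes per page load.

    Returns (per_request_rate, per_page_load_rate).
    """
    total_requests = sum(len(load) for load in page_loads)
    failed_requests = sum(1 for load in page_loads for s in load if s >= 500)
    # A page load counts as failed if any one of its requests failed.
    failed_loads = sum(1 for load in page_loads if any(s >= 500 for s in load))
    return failed_requests / total_requests, failed_loads / len(page_loads)


ok_load = [200] * 100            # first user: every request succeeds
bad_load = [200] * 99 + [500]    # second user: one request fails
per_request, per_load = error_rates([ok_load, bad_load])
# per_request is 1/200 (0.5%), per_load is 1/2 (50%)
```

Same traffic, two very different numbers, depending on whether the denominator is requests or page loads.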

JS errors, page size, HTML errors, bad web design, CSS problems, SSL and broken layout have nothing to do with organic results or SEO, and it has not been proven that they affect your organic result position.

Even if they do, in-page problems don't affect your result, not even by 1%. 99% is backlinks and their quality (or course).

Example: check techcrunch.com and debug/inspect their homepage. 15 JS errors, 6 MB in size, no Description/Keywords, awful <title>, bad HTML elements. Can you even remotely beat them for the keyword "startup news"? No way. No matter how hard you try to "SEO" your website.

I'll say it again. "SEO" was always a gimmicky marketing thing. It's crazy that even now people don't understand that the only thing that matters is quality backlinks, as a result of quality content.

Exactly! And I think the problem with an SEO guide to engineers in particular, is that it ends up being a list of technical things. Things that are easy to quantify and that are black or white. Those are marginal. Yet an engineer can increase the score in some tool from X to 1.257X so they’re happy.

It’s much harder to write a guide on how to engage and delight your audience, and how to get the attention of other sites and the authority and trust to get linked to.

of course

There can’t be a generic solution to “am I done loading everything” (see the halting problem). And tightly coupling every request to a list feels awfully fragile. I think the real problem is making 100 api calls to load a page...

Amen to that. Too many pages are complex for no real reason.

>A bot in Russia isn't automatically bad and a bot in the United States isn't automatically good. Plenty of bad actors deploy bots from within Amazon's U.S.-based AWS servers.

"Checked his post history. Russian bot confirmed."

Don't waste your time reading this article; it just mentions random points about how to have a functional web app. Plus, in a perfect world SEO should not exist: it's the search engine's job, not the developer's, to find the content and show the user what they are looking for. I shouldn't have to add "hacks" in my code for the Google crawler to understand my site the same way users already do.

Eh, not sure I'd say SEO shouldn't exist. At least, the techniques used for it are all things that should be the expected default on every site, like having a logical HTML structure with heading tags, having a useful unique title for every page, having a description shown in listings, logically linking to relevant content, rendering the page on the server rather than the client, etc.

Maybe black hat/spammy SEO shouldn't exist, but at a certain level, good SEO is basically just about designing a good product and marketing it well.

Agreed. A lot of the "good" stuff you can do for SEO nowadays is pretty close to "stuff you should be doing anyway" for accessibility, usability, etc.

I agree with doing SEO in the sense of implementing stuff the way it should be implemented (use tags as they were meant to be used, add rich snippets for ratings and prices), but not with stuff done purely to optimize for a search engine. As I said, the article doesn't mention any real SEO techniques, only good practices that should be known by any junior web dev (site should load fast, use a CDN, mind the cache, etc.)

Agreed, but it’s not a perfect world, and those who take into account that headings, structured data, ordered lists, internal linking structure, etc. have positive benefits and are rewarded by Google will prosper... until the day that Google’s algo is sophisticated enough to not need those anymore.

Get off your high horse, mate.

Yes, comes across as a lazy developer, and not in a good way.

I'm hoping my free SEO monitoring app in Cloudflare can at least help with the technical dirty work. It is powered by the Cloudflare Workers platform which allows for quick fixing from within the app https://www.cloudflare.com/apps/ranksense

This is an interesting use of Cloudflare Workers.

My unsolicited advice is to list some specific SEO fixes and improvements in the app description. It's not clear to me what it does and the screenshots are hard to parse.

Thanks. This is actually great feedback. I will list the issues we fix so far in the next update.

Solid list overall.

"Consider using the "304-Not Modified" response code on large websites with lots of pages that don't change very often."

This is the only thing that, from an SEO perspective, I'd challenge. I can't even begin to see where the supposed benefit of this would be.

The spirit of the idea is that Google will see the 304 status and move onto the next page more quickly than if it received a 200 status and reconciled that version of the page with the version that was previously crawled.

Thanks for clarifying. I guess that makes sense from a crawl efficiency/budget standpoint, and in helping preserve server resources.

For context, I've only come across an HTTP 304 status once in 9 years of SEO and crawling websites on a daily basis. I've no first-hand experience of them being deployed in this way at scale on a live website, and so haven't seen any server log analysis that demonstrates the efficacy of their usage. But it's an interesting idea nonetheless.

304 has a specific use case, which is that if the crawler says, I have version X of the page already, give me a 304 if that’s still current or a 200 with content otherwise.

With dynamically generated content you’re more likely to just see 200s, but I think Nginx sets ETags automatically on static content so it’s common to see 304s there.

I’m pretty surprised you haven’t seen it often, but I’d guess it’s more to do with whatever crawlers you’re using (they’d need to be caching content and headers), rather than the scarcity of the status code.
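Roughly, the exchange described above looks like this on the server side. This is a simplified Python sketch, not how Nginx or any particular crawler does it. `make_etag` and `respond` are hypothetical helpers, the ETag is just a truncated content hash, and weak validators (`W/"..."`) are not handled.

```python
import hashlib


def make_etag(body):
    # Validator derived from the content; HTTP requires the surrounding quotes.
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond(body, if_none_match=None):
    """Return (status, headers, payload) for a GET with an optional If-None-Match."""
    etag = make_etag(body)
    # If-None-Match may carry several comma-separated validators, or "*".
    candidates = [tag.strip() for tag in (if_none_match or "").split(",")]
    if etag in candidates or "*" in candidates:
        # The client's copy is still current: send headers only, no body.
        return 304, {"ETag": etag}, b""
    return 200, {"ETag": etag}, body


# First crawl: no validator yet, so the full page comes back.
status_first, headers, _ = respond(b"<h1>Hello</h1>")
# Re-crawl with the stored ETag: unchanged content yields a 304.
status_again, _, payload = respond(b"<h1>Hello</h1>", headers["ETag"])
```

So a 304 only ever shows up when the client sends a validator it saved earlier, which fits the guess that a crawler without a header cache would rarely see one.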

> A complex web page might call the server 150 times as it loads to completion.

That irked me a little. I'm assuming this includes fonts & images (not a bunch of ajax requests), but it still seems high.

I would add: don't just rely on robots.txt. Use noindex and/or password protection.

robots.txt stops crawling, not indexing.
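A quick sketch of that distinction using Python's standard `urllib.robotparser`. The bot name, URLs, and rules are made up for illustration. The point is that robots.txt only answers "may I fetch this URL?", while noindex is a separate signal sent with the page itself.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules, parsed from memory (no network access).
robots = RobotFileParser()
robots.parse([
    "User-agent: *",
    "Disallow: /private/",
])

# Crawling: is this bot allowed to request the URL at all?
can_crawl_private = robots.can_fetch("ExampleBot", "https://example.com/private/report.html")
can_crawl_public = robots.can_fetch("ExampleBot", "https://example.com/public/")

# Indexing: a separate instruction carried by the response (header or meta tag).
NOINDEX_HEADER = {"X-Robots-Tag": "noindex"}
NOINDEX_META = '<meta name="robots" content="noindex">'
```

One catch worth knowing: a crawler can only see a noindex signal on pages it is allowed to fetch, so blocking a URL in robots.txt and marking it noindex at the same time can work against you.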

SEO is extremely simple. This is all you need to know to rank top page on Google: Get lots of backlinks from sites with good Google ranking to blog articles on your site about the subject you want to rank for.

In other words, publish content people want to link to, and get big players to link to it.

That’s it. Other SEO advice is usually a distraction or snake oil.

There is no such thing as "SEO". Also, nothing on the list has anything to do with Search Engine Optimization.


How come taking care of your server stability is SEO?

I'm writing a little more about that because it's sad to read an article like that that makes no sense.

SEO doesn't exist because the ONLY way to optimize your website is to write good content and get quality backlinks. To be precise, 99% of "SEO" is backlinks and 1% is your HTML quality (in-page SEO).

To give you an example, you can have the worst HTML, no internal linking, no sitemap, no nothing, but have 10-20 quality backlinks and still be #1 in Google.

I've been doing internet marketing for more than 15 years now and the proof is out there. Quoting something else from the article:

"Google will try to follow relative paths inside of Javascript, even when they don't exist. This can result in polluted crawl error reports."

So the author claims that a relative path to a URL (inside a JS call) will affect your organic results. I wonder if anyone understands that this makes no sense, especially SEO-wise.

"SEO doesn't exist" yet Google hires SEOs to optimize their own sites AND they are not focused on getting backlinks. Go figure! https://www.thinkwithgoogle.com/advertising-channels/search/...

Polluted crawl error reports won't affect organic rankings, but they will make it harder to discover a legitimate page that is broken but shouldn't be.

With too many broken links, the quality score could go down, which can lower rankings.

Google has at least 50-100 variables it checks and assigns different weights to.

quality score has nothing to do with organic results.

Bollocks. Links are still important, yes, but get the basics wrong and you won't rank on page 1.
