Search engine optimization is far from an exact science. Search results are a constantly moving target. Google says there are over 200 factors that go into their current algorithm.
Today, let’s address one question that comes up from time to time.
Does valid HTML really affect SEO?
The Truth About Validation and SEO
Search engines are how most of your customers find services. If you can convert the leads coming to your website at a high rate, then getting more leads to your website means more revenue.
Some people have heard that having a site that validates is one of the 200 factors Google uses to rank sites. A validating site is one that conforms to the coding standards laid out by the World Wide Web Consortium (W3C), the group responsible for web standards. You can test your own site for free with the online validator on their website. Go ahead and run a test, I’ll wait.
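For a baseline, a document that the validator accepts with zero errors can be quite short. Something like this sketch is all it takes:

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <!-- The charset declaration and a title are the only required head content -->
  <meta charset="utf-8">
  <title>A minimal valid page</title>
</head>
<body>
  <p>This document passes the W3C validator.</p>
</body>
</html>
```

Real pages will have far more in the head, of course; the point is that validity itself is cheap to achieve on a fresh document. The errors pile up as a site accumulates plugins, ad scripts, and hand-edited templates.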
What you’ll probably see is that your site returned some errors. This might freak you out. You may think your site is broken, but usually these errors are not a big deal. Modern browsers render pages with these small errors just fine. Google has said several times that they have to return results for all pages, not just ones that validate.
Google’s own homepage does not validate according to the W3C (they intentionally omit optional markup to save bytes). But there are a few classes of errors that you should pay attention to and get fixed.
Validation Errors That You Should Fix
One type of error occurs when your HTML contains so many problems that the search crawler cannot parse the page and gives up prematurely.
Web crawlers like Googlebot move through the internet, following links, and indexing the pages they come in contact with. Because there are so many sites in the world, these crawlers only spend so much time on each site. The more important the site, the more time they spend crawling and indexing there. Overly malformed code can cause the crawler to choke and depart early.
If you see this error when running a validation check, it may mean that search crawlers are encountering the same issue:
Cannot recover after last error. Any further errors will be ignored.
Another type of HTML validation error to pay attention to is unclosed tags or stray tags. These occur when there are extra or missing HTML elements in the website code. Browsers will still render the page correctly in most cases, but these can affect the tag nesting for the rest of the page.
In other words, every HTML tag that is opened, should be closed, and done in the correct order.
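As a simple illustration (hypothetical markup, not taken from any real site), here is a snippet with the nesting problems described above, followed by the corrected version:

```html
<!-- Broken: <strong> closes after its parent <p>, and the second <p> never closes -->
<p>Validation is <strong>not a ranking factor.</p></strong>
<p>But broken nesting can still cause trouble

<!-- Fixed: each tag closes inside its parent, in reverse order of opening -->
<p>Validation is <strong>not a ranking factor.</strong></p>
<p>But broken nesting can still cause trouble</p>
```

Browsers will quietly repair the first version, but the repaired document tree may not match what your CSS and scripts expect, and every element after the mistake inherits the confusion.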
An Interesting Coincidence
Matt Cutts of Google said in this 2013 video, at the 1:14 mark, that Google returns the best information, not the most cleanly coded sites, though he wouldn’t be surprised if there is a strong correlation between the two. He said it was not a ranking factor at the time, but might be in the future.
Marketing blogger Shaun Anderson ran a test of his own in 2014 to see whether Google preferred a page with valid HTML and CSS to an identical page without valid markup. Interestingly, Google chose to index the page with valid markup above the non-valid one, but Anderson thought this could be because the valid page was the last one edited.
Update: February 2016
In late January 2016, Google updated their Webmaster Guidelines. One phrasing in these guidelines made it sound as though Google’s prior stance, that valid HTML gets no ranking boost, was no longer true. This particular guideline, under the heading Help visitors use your pages, was changed to read “Use valid HTML.”
Google representative John Mueller said in a January 2016 video hangout that valid HTML is not a direct ranking factor, but invalid HTML can have some influence on how effectively Google crawls your site. Here’s the quote:
This came up recently with the change made in the webmaster guidelines. We mentioned “use valid HTML.” The question here is: is W3C validation (broken HTML) a ranking factor, or should we care about it?
It is not directly a ranking factor. It is not the case that if your site is not using valid HTML we will remove it from the index, because I think we would have pretty empty search results.
But there are a few aspects that do come into play. On the one hand, with a site with really broken HTML, something that we see really rarely, it is really hard for us to crawl it and index the content, because we can’t pick it up.
The other two aspects are kind of more in regards to structured data. Sometimes it is really hard to pick up the structured data when the HTML is completely broken, so you can’t easily use a validator for the structured data.
The other thing is in regards to mobile devices and cross-browser support: if you have broken HTML, then that is sometimes really hard to render on newer devices.
John Mueller — January 29th, 2016
My Own Observations on Valid Markup and SEO
I did some competitive analysis of local sites in my industry, checking for valid code. Only one competitor site validated. The other sites I looked at all had errors. Some had 50+ errors, but none that affected page rendering. Some had fewer errors, but made the validator shut down with the aforementioned “Cannot recover” error.
None of these were low ranking sites in local search. It appears that validation is a non-factor in SEO, as long as the page renders correctly in the browser.
Errors I Found And Fixed In My Own Code
I had a few errors that I didn’t realize I had. Here’s what they were and how I fixed them.
IE Conditional Classes Won’t Validate
When I first built this theme, I was still supporting Internet Explorer 7, 8, and 9. I was using a boilerplate code block plus other source code I had seen on the web. My old header file looked something like this:
<!DOCTYPE html>
<!--[if IE 7]><html class="no-js ie7" lang="en"><![endif]-->
<!--[if IE 8]><html class="no-js ie8" lang="en"><![endif]-->
<!--[if IE 9]><html class="no-js ie9" lang="en"><![endif]-->
<!--[if gt IE 9 | !(IE)]><!--><html class="no-js" lang="en-US"><!--<![endif]-->
The error I was getting was that the <head> element was already open. This was probably because I had moved the <meta http-equiv="X-UA-Compatible" content="IE=edge" /> tag above the <head> when editing files recently. Apparently, this created a “shadow” element in the document tree.
Because I only have special IE8 code in my CSS, I decided to ditch the IE conditionals for IE7 and IE9. I also learned that putting a persistent <html> element above the IE conditionals will make each page validate while still supporting Internet Explorer. This is what the new code looks like:
<!DOCTYPE html>
<html lang="en-US">
<!--[if IE 8]><html lang="en" class="ie8"><![endif]-->
Validating Google Fonts In W3C Validator
Many WordPress themes use Google Fonts to supply a selectable variety of fonts. These are usually called in through the <head> via a separate style sheet for each font. Each style sheet is an HTTP request to an external server, and while these don’t cost a lot of time, I do what I can to improve page speed performance and shave milliseconds from the page rendering time.
My site uses a custom-built WordPress theme, and the link to Google Fonts is combined into a single link, like so:
<link href="http://fonts.googleapis.com/css?family=Source+Sans+Pro:400,600,700,400italic,700italic|Roboto+Condensed:400,700" rel="stylesheet" type="text/css" />
The W3C Validator said there was an error on this line.
http://fonts.googleapis.com/css…" for attribute href on element link: Illegal character in query: not a URL code point.
You’ll notice that when you use more than one font, Google gives you a link in which the font families are separated by a pipe (“|”) character.
What I didn’t realize is that I needed to percent-encode the pipe character as %7C, since the pipe is not a valid URL code point. At first, I thought the capital letters might be throwing the error. The error message also makes it sound like spaces throw an error, but there were none in this link. The fixed link now looks like this:
<link href="http://fonts.googleapis.com/css?family=Source+Sans+Pro:400,600,700,400italic,700italic%7CRoboto+Condensed:400,700" rel="stylesheet" type="text/css" />
I edit pages and posts in the WordPress text editor, not in the visual editor. This gives me more granular control over the markup.
But WordPress sometimes tries to “help you out” by adding in <p> tags when you hit the Return/Enter key. I had some stray <p> tags on my homepage because of this, where I was adding <div> elements and hitting Return, which accounted for about six errors. Normally, this doesn’t occur, but for some reason, it did on this page.
To solve this, I simply backspaced the <div> elements to butt up against the closing <p> tags. The errors disappeared.
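As a simplified sketch (not my exact homepage markup), the problem and the fix look something like this:

```html
<!-- Before: hitting Return after the div leaves a blank line that
     WordPress's auto-paragraph filter wraps in a stray, empty <p> -->
<div class="callout">Some feature text</div>
<p></p>
<div class="callout">More feature text</div>

<!-- After: with the elements butted up against each other,
     there is no gap for WordPress to wrap -->
<div class="callout">Some feature text</div><div class="callout">More feature text</div>
```

The class name here is hypothetical; the pattern is what matters. If you work in the text editor and mix block-level elements with blank lines, it pays to re-run the validator after editing.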
The day after I cleaned up my code to validate, I moved up one spot in the search rankings for one of my main keywords. It’s important to realize that search rankings fluctuate frequently. The Google search algorithm is tweaked constantly, and I see local search rankings change about once a week. The valid code likely had nothing to do with any move up or down.
As long as you keep your minor errors to a manageable number, don’t break your tag nesting structure, and don’t prevent the page from rendering correctly, there are scores of things that deserve your attention before code validation does.
But, if you can fix most of your validation errors with minimal effort, there’s certainly no harm in aiming for valid code, either.