In general, developers tend to recommend relative URLs when linking to other pages on your site, while SEOs tend to recommend absolute URLs. I never recommend or use absolute URLs for internal linking, regardless of who recommends them.
Note that this blog post is not an attack on SEOs and what they do, nor does it suggest that all SEOs recommend absolute URLs. But in my experience it is mostly SEOs who recommend this practice, and both examples I cite below are from prominent SEO blogs, hence the title of the post.
Before going any further, let me define the three types of URL I will refer to in this post. Say you are on the page www.domain.com/folder/page-1 and you wish to link to www.domain.com/folder/page-2. You have three choices:
- Relative URL — page-2
- Root relative URL — /folder/page-2
- Absolute URL — https://www.domain.com/folder/page-2
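All three forms resolve to the same destination from that page. As a quick illustration (using the hypothetical www.domain.com pages above), Python's `urllib.parse.urljoin` applies the same resolution rules a browser does:

```python
from urllib.parse import urljoin

# The page the link appears on (hypothetical example from above)
base = "https://www.domain.com/folder/page-1"

# Relative URL: resolved against the current folder
print(urljoin(base, "page-2"))
# Root relative URL: resolved against the domain root
print(urljoin(base, "/folder/page-2"))
# Absolute URL: used as-is
print(urljoin(base, "https://www.domain.com/folder/page-2"))
# All three print: https://www.domain.com/folder/page-2
```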
Who is right, the developers or the SEOs? I would like to counter the claim of some SEOs that you should always use absolute URLs. This is my opinion, but I think the SEO stance is largely born of ignorance, and that by educating SEOs we can move away from absolute URLs.
Relative or root relative?
In general I would recommend root relative links. If you add a root relative link in a template, for example, the link works regardless of how “deep” into the website you are. Conversely, a non-root relative link only works relative to the folder it was created for. Root relative links actually work more like absolute links but they don’t tie you to a particular domain. Some SEOs see this as a bad thing but I will explain later why I don't think that is so.
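The difference is easy to demonstrate (again with hypothetical URLs): resolve the same template link from two pages at different depths, and only the root relative form stays stable:

```python
from urllib.parse import urljoin

# The same template link, resolved from two pages at different depths
# (hypothetical URLs for illustration):
for page in ("https://www.domain.com/folder/page-1",
             "https://www.domain.com/folder/sub/page-3"):
    print(urljoin(page, "page-2"))          # varies with the current folder
    print(urljoin(page, "/folder/page-2"))  # always the same target
```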
Root relative links are easy to read, unambiguous and highly portable and should be used most of the time. In certain situations though — those that are beyond the scope of this blog — pure relative links are better.
The argument for absolute links
In her article “Should I Use Relative or Absolute URLs?”, Ruth Burr Reedy at Moz makes the case for absolute links using reasons that I think epitomise the position of many SEOs.
She gives three reasons:
The first reason she gives is “scrapers”: people who use software to copy your site and publish it elsewhere. Having your site scraped is bad, but absolute URLs offer no protection against a determined scraper. It is not difficult to parse the HTML and rewrite the links to use a different domain.
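To illustrate how little protection this offers, here is a minimal sketch (the domains are made up) of how trivially a scraper could rewrite absolute internal links to point at its own domain:

```python
import re

# One line is enough to repoint absolute internal links at the
# scraper's own (hypothetical) domain:
html = '<a href="https://www.domain.com/folder/page-2">Page 2</a>'
rewritten = re.sub(r'https?://(?:www\.)?domain\.com',
                   'https://scraper-site.example', html)
print(rewritten)
# <a href="https://scraper-site.example/folder/page-2">Page 2</a>
```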
The second reason, and the biggest and most common one, is “preventing duplicate content issues”. Preventing duplicate content is good practice and not something exclusive to SEO. Ruth’s argument is that absolute URLs mitigate duplicate content.
In her example there are four ways to get at the same content: the four combinations of protocol (HTTP or HTTPS) and hostname (with or without the www prefix).
The argument goes that if you want Google to crawl only one version of your site (and you do), then absolute links prevent the other versions (four in total, in the example above) from being indexed. Absolute links do prevent this, but as I will explain later, they are not a very good way to do it.
The third reason given is “crawl budget”: the maximum number of pages a search engine will crawl on your site. This varies from site to site depending on various factors, but Ruth’s reasoning effectively goes alongside her second point about duplicate content: if you have duplicate content, part of your budget is wasted crawling the same page multiple times. Is this a legitimate concern for most sites? Maybe, but fixing your duplicate content issues using my suggestions below makes it irrelevant anyway.
In his article “Why relative URLs should be forbidden for web developers”, Joost de Valk cites some additional reasons for using absolute URLs.
Firstly, he says that it can lead to your test site being indexed. Yes, it can, but only if you don’t set things up correctly in the first place.
He also mentions the phenomenon of “spider traps”, where a relative URL ends up pointing at the wrong page. Relative URLs can cause problems in certain situations, which is why I said above that they should not be the default choice, but root relative URLs are immune to this.
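A classic trap looks like this (a contrived sketch, assuming a misconfigured server that returns the same page at every depth): a relative link keeps resolving one level deeper on each crawl step, while a root relative link always resolves to the same URL:

```python
from urllib.parse import urljoin

# If every one of these URLs serves the same page containing the
# relative link "page/", a crawler descends forever:
url = "https://www.domain.com/folder/"
for _ in range(3):
    url = urljoin(url, "page/")
    print(url)
# https://www.domain.com/folder/page/
# https://www.domain.com/folder/page/page/
# https://www.domain.com/folder/page/page/page/
# A root relative link ("/folder/page/") would resolve identically
# at every depth, so the trap never forms.
```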
Then he says canonical tags should use absolute URLs. I agree with this, and will discuss canonical tags later, but the point of contention here is the use of relative versus absolute links within the body of the page.
SEOs identify the right problems but offer the wrong solution
Some of the problems SEOs try to avoid are legitimate concerns. But why recommend absolute URLs to fix them?
An absolute URL is a simple thing to understand, and because SEOs know it fixes the problem, they recommend it. But fixing duplicate content problems with absolute URLs is a non-technical fix to a technical problem.
Absolute URLs cause serious portability issues: when you move your site from a test environment to a live one, you have to update every URL, every time. And what if you forget? Then you have the exact same problem Ruth and Joost claim absolute URLs fix. SEOs are rarely involved in moving sites around, so I think most of them don't understand how big an issue this is.
Relative URLs, on the other hand, are fully portable.
How to implement relative URLs to ensure no duplicate content
So, use relative URLs, but to prevent duplicate content you need to replicate the canonical benefit that absolute URLs offer, only at the server or application level. This typically means you end up with a single place where you control which domains are used to access which sites.
In addition to relative URLs:
- Add a site-wide 301 redirect that ensures the correct protocol (HTTP or HTTPS) and the correct domain name are used, in a single redirect. This can be done by your application or, even better, by your web server; the latter means less load on your application
- Get your application to add a canonical tag to every page (without hard-coding the domain name so it's easy to change if you need to); the tag should contain an absolute URL to your chosen “live” domain name
- Create a dynamic robots.txt file to prevent your dev site(s) getting indexed
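The three steps above can be sketched as plain functions (a framework-agnostic sketch: the live scheme, hostname, and helper names are my own assumptions, and in practice this logic would be wired into your server or application):

```python
# Single place where the "live" protocol and domain are configured;
# change these when moving between environments (hypothetical values).
LIVE_SCHEME = "https"
LIVE_HOST = "www.domain.com"

def canonical_redirect(scheme: str, host: str, path: str):
    """Return the 301 target if the request used the wrong scheme or
    host, or None if the request is already canonical. Fixing both
    scheme and host at once keeps it to a single redirect."""
    if scheme != LIVE_SCHEME or host != LIVE_HOST:
        return f"{LIVE_SCHEME}://{LIVE_HOST}{path}"
    return None

def canonical_tag(path: str) -> str:
    """Build the canonical tag for a page, with the domain taken from
    configuration rather than hard-coded into templates."""
    return f'<link rel="canonical" href="{LIVE_SCHEME}://{LIVE_HOST}{path}">'

def robots_txt(host: str) -> str:
    """Serve a blocking robots.txt on any host other than the live one,
    so dev/test sites never get indexed."""
    if host != LIVE_HOST:
        return "User-agent: *\nDisallow: /"
    return "User-agent: *\nDisallow:"

print(canonical_redirect("http", "domain.com", "/folder/page-2"))
# https://www.domain.com/folder/page-2
```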
And that’s it. It’s really simple and not even that time-consuming to achieve. But what if you’re not technical and don’t know how to do the above?
That’s my point! Non-developers should not be making technical decisions, and I believe this is part of the reason absolute URLs are as popular as they are. Tell the developer the challenge (“we don’t want duplicate content”) and let them decide the best way to do it. Let each person work to their strengths.
As a community, let’s stop using absolute URLs for links to pages on the same domain.