Someone Could Steal Your Link Juice Without You Knowing!
Late last month (March 2018), Tom Anthony published his discovery of an exploit in one of Google's sitemap submission methods that allows a site to quickly get indexed and start ranking for highly competitive keywords.
Anthony, who is head of Product Research and Development over at Distilled, discovered a loophole in Google Search Console's sitemap submission ping URL feature that allowed him to steal the ranking power of some websites to make a brand new site with zero links rank for highly competitive terms.
You can read his full explanation on his blog. In this post we'll give a simplified explanation of what he uncovered, how it worked for SEO purposes and what it means for you, as a marketer.
Submitting Sitemaps Via Ping with Open Redirects
In order to understand the nature of the exploit, you have to understand 2 main components:
- GSC's sitemap submission ping URL
- Open redirects
1. Sitemap submission ping URL
There are 3 ways to submit your site's sitemap to help Google crawl and index your URLs: directly in Search Console, in your robots.txt file and via Google's ping URL: https://www.google.com/ping?sitemap=.
By adding your XML sitemap's URL at the end of the ping URL, like this: https://www.google.com/ping?sitemap=https://www.example.com/sitemap.xml, you can submit your sitemap to Google without logging in to Search Console.
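As a rough illustration, the ping URL can be assembled in a few lines of Python. This is only a sketch: the function name build_ping_url is made up for this example, and percent-encoding the sitemap URL is our own choice (the examples in this post show it unencoded). The code only builds the string and does not send any request.

```python
from urllib.parse import quote

# Ping endpoint as quoted in this post.
PING_ENDPOINT = "https://www.google.com/ping?sitemap="


def build_ping_url(sitemap_url: str) -> str:
    """Return the ping URL for a sitemap (hypothetical helper)."""
    # safe="" also encodes ':' and '/', so the whole sitemap URL
    # travels as a single query-string value.
    return PING_ENDPOINT + quote(sitemap_url, safe="")


ping_url = build_ping_url("https://www.example.com/sitemap.xml")
print(ping_url)
```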
2. Open redirects
An "open redirect" is when a website redirects visitors to a URL on an external domain based on a parameter in the page's URL. The "open" part refers to the fact that the redirect works with any link a user supplies. For example, if example.com redirected users based on the url= parameter, someone could link to example.com/?url=https://www.website.com.
When someone clicks a link to that URL, example.com will redirect them to website.com.
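To make the mechanics concrete, here is a minimal sketch of what a naive open redirect does with the incoming URL. The function name open_redirect_target is hypothetical, and the url parameter name simply follows the example above; a real site would run this logic inside its web framework and then issue an HTTP redirect.

```python
from urllib.parse import parse_qs, urlsplit


def open_redirect_target(request_url: str, param: str = "url"):
    """Return where a naive open redirect would send the visitor, or None."""
    query = urlsplit(request_url).query
    values = parse_qs(query).get(param)
    # No validation at all: whatever the link supplies becomes the destination.
    return values[0] if values else None


target = open_redirect_target("https://example.com/?url=https://www.website.com")
print(target)
```

The missing validation step is exactly what makes the redirect "open".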
For a really technical explanation of how open redirects work, see here.
Open redirects are often abused in blackhat link building campaigns, as well as in phishing and other scam emails.
For example, a scammer can create a copy of a reputable website's login page on their own site, send out a link on the reputable site's domain that contains an open redirect to the copy, and capture the passwords people enter there.
Ping sitemaps with open redirects
The problem, uncovered by Tom, is that Search Console's sitemap ping URL allowed you to submit a URL with an open redirect, even if it pointed to another domain.
Pinging https://www.google.com/ping?sitemap=https://www.example.com/?url=https://www.website.com/sitemap.xml would cause Google to treat the sitemap hosted on website.com as the sitemap for example.com.
This gives the owner of website.com control over a sitemap for example.com, a site they don't own.
Even worse, sitemaps submitted via ping don't show up in Search Console.
So you'd never know if someone did this to your site.
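A quick way to see the problem is to compare the host of the submitted sitemap URL with the host of any full URL tucked into its query string. The sketch below does that with Python's standard library. The function names are invented for this example, and this is only a heuristic, not a description of how Google validates submissions.

```python
from urllib.parse import parse_qsl, urlsplit


def nested_hosts(sitemap_url: str):
    """Return (host of the submitted URL, hosts of URLs passed as query values)."""
    parts = urlsplit(sitemap_url)
    nested = []
    for _name, value in parse_qsl(parts.query):
        inner = urlsplit(value)
        if inner.scheme in ("http", "https") and inner.hostname:
            nested.append(inner.hostname)
    return parts.hostname, nested


def looks_cross_domain(sitemap_url: str) -> bool:
    """True if a query value points at a different host than the URL itself."""
    host, nested = nested_hosts(sitemap_url)
    return any(h != host for h in nested)


suspicious = "https://www.example.com/?url=https://www.website.com/sitemap.xml"
print(looks_cross_domain(suspicious))
print(looks_cross_domain("https://www.example.com/sitemap.xml"))
```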
How Would Hijacking Sitemaps Affect SEO?
The ability to use this for SEO benefit comes from the hreflang component in a sitemap.
Essentially, hreflang points Google to alternate versions of a page meant for users in other languages and/or countries. When added to a sitemap for a URL, hreflang looks like this:
<url>
  <loc>https://www.example.com</loc>
  <lastmod>2017-10-06</lastmod>
  <changefreq>weekly</changefreq>
  <priority>0.9</priority>
  <xhtml:link rel="alternate" hreflang="en" href="https://www.example.com"/>
  <xhtml:link rel="alternate" hreflang="fr" href="https://www.example.com/fr"/>
</url>
Since the alternate links are meant to be the same content, just targeted for a different country or language, Google will "share" the link juice between the original URL and the hreflang links.
Even when those links aren't on the same domain.
For a full explanation of XML sitemap syntax and best practices, see our guide here.
In his experiment, Tom was able to take the link juice of a highly authoritative UK retailer by adding hreflang links to his website in the sitemap for the original site.
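If you ever get hold of a sitemap that claims to describe your site, you can check it for hreflang alternates that point to someone else's domain. The sketch below uses Python's built-in XML parser. The function name cross_domain_alternates and the sample sitemap are made up for illustration, and website.com stands in for the other site.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

# Standard sitemap and XHTML namespaces used by hreflang entries.
NS = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "xhtml": "http://www.w3.org/1999/xhtml",
}

SAMPLE = """\
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:xhtml="http://www.w3.org/1999/xhtml">
  <url>
    <loc>https://www.example.com/</loc>
    <xhtml:link rel="alternate" hreflang="en" href="https://www.example.com/"/>
    <xhtml:link rel="alternate" hreflang="fr" href="https://www.website.com/fr"/>
  </url>
</urlset>
"""


def cross_domain_alternates(sitemap_xml: str):
    """List (loc, hreflang, href) for alternates hosted on a different host than loc."""
    root = ET.fromstring(sitemap_xml)
    found = []
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", default="", namespaces=NS).strip()
        loc_host = urlsplit(loc).hostname
        for link in url.findall("xhtml:link", NS):
            href = link.get("href", "")
            if urlsplit(href).hostname != loc_host:
                found.append((loc, link.get("hreflang"), href))
    return found


flagged = cross_domain_alternates(SAMPLE)
print(flagged)
```

Cross-domain hreflang is legitimate for businesses that run separate country domains, so a flagged entry is a prompt to investigate rather than proof of abuse.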
The Results
You can read the full details in Tom's blog post, linked at the start of this piece, but to summarize: he started getting traffic within 48 hours of submitting his sitemap.
Within a couple of days, his site was competing with the likes of Amazon and Walmart.
At this point, he was seeing more than a million search impressions and more than 10,000 clicks in Search Console, having done nothing more than submit a sitemap.
The step-by-step process
Here's how the exploit worked, step by step:
- Create a sitemap for website1.com and host it at website2.com/sitemap.xml.
- Add hreflang links to the website1.com entries pointing to URLs on website2.com.
- Submit the sitemap to Google's ping URL using an open redirect on website1.com that points to the sitemap on website2.com: https://www.google.com/ping?sitemap=https://www.website1.com/?url=https://www.website2.com/sitemap.xml.
- Start ranking for website1.com's keywords.
The owner of website2.com can now choose what to do with this domain that has generated a lot of traffic for some potentially very valuable and competitive keywords.
Since this is clearly a blackhat SEO technique, the implications are not good.
What Can You Do About This Issue?
Fortunately, Tom published his blog post detailing this exploit only after Google had closed it, so spammers can no longer target your site by pinging a sitemap through an open redirect.
Unfortunately, since Google doesn't show sitemaps submitted via ping in GSC, Search Console can't tell you whether someone used this technique on your website before the fix.
Instead, you'll have to do some digging in Google search results.
Use the site: and inurl: operators to search your own domain for URLs containing common redirect parameters. Any results are indexed URLs that may be open redirects.
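If you have several domains to check, a short script can generate the search queries for you. This is a sketch under assumptions: the list of parameter names is our own guess at common ones (use whatever your site actually uses), the helper names are hypothetical, and how reliably Google matches the "=" inside an inurl: term is not something we can guarantee.

```python
from urllib.parse import urlencode

# Assumed list of typical redirect parameter names; adjust for your own site.
COMMON_REDIRECT_PARAMS = ["url", "redirect", "next", "goto", "return"]


def redirect_search_queries(domain: str, params=COMMON_REDIRECT_PARAMS):
    """Build one site:/inurl: query per redirect parameter name."""
    return [f"site:{domain} inurl:{p}=" for p in params]


def google_search_url(query: str) -> str:
    """Turn a query string into a Google search URL to open in a browser."""
    return "https://www.google.com/search?" + urlencode({"q": query})


queries = redirect_search_queries("example.com")
for q in queries:
    print(q, "->", google_search_url(q))
```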
Preventing open redirect abuse
The simplest and most effective thing you can do is not use open redirects on your site. If you must use open redirects, there are a few things you can do to prevent spammers from abusing your site:
- Use robots.txt to disallow any URL with your redirect parameter. This will limit the blackhat SEO value of your website, but won't prevent email scammers from targeting you.
- Disallow any off-site redirects or any redirect pointing to an external domain, or any domain that's not whitelisted.
- Hash the destination URL and include the signature as a URL parameter. This prevents the public from adding redirects to your URLs.
- Add a redirect page that requires a user to click on the link, or a button confirming the destination URL, rather than simply completing the redirect. Add the nofollow attribute to this link to discourage blackhat link building and negative SEO using this page. Or, even better, noindex this page. We even found a number of examples where people tried to use open redirects on WooRank as part of a shady link building scheme.
- Invalidate redirect URLs that don't start with "http" or "https" to prevent malicious code execution. This won't prevent open redirects that are used in phishing scams, so it should be done along with the second measure above (disallowing external or non-whitelisted domains).
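Three of the measures above (the whitelist, the signed destination and the http/https check) can be combined in one small validation step. The sketch below is one possible way to do it, not a drop-in implementation: the secret key, the allowed hosts and all function names are placeholders, and we use an HMAC-SHA256 signature as our interpretation of "hash the destination URL".

```python
import hashlib
import hmac
from urllib.parse import urlsplit

# Placeholder values: use a real secret and your own list of trusted hosts.
SECRET_KEY = b"replace-with-a-real-secret"
ALLOWED_HOSTS = {"www.example.com", "partner.example.org"}


def is_allowed_destination(dest: str) -> bool:
    """Accept only http(s) URLs whose host is on the whitelist."""
    parts = urlsplit(dest)
    return parts.scheme in ("http", "https") and parts.hostname in ALLOWED_HOSTS


def sign_destination(dest: str) -> str:
    """Signature your own site adds as a URL parameter when it builds the link."""
    return hmac.new(SECRET_KEY, dest.encode("utf-8"), hashlib.sha256).hexdigest()


def is_valid_redirect(dest: str, signature: str) -> bool:
    """Redirect only if the signature matches and the destination is allowed."""
    expected = sign_destination(dest)
    return hmac.compare_digest(expected, signature) and is_allowed_destination(dest)


good = "https://partner.example.org/welcome"
print(is_valid_redirect(good, sign_destination(good)))  # signed by us, allowed host
print(is_valid_redirect(good, "forged-signature"))      # signature does not match
print(is_allowed_destination("javascript:alert(1)"))    # not http or https
```

Because only your server knows the secret key, outsiders can't produce a valid signature for a destination of their choosing, which is what stops the public from adding redirects to your URLs.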