Update (May 29, 5:44 p.m. ET): Google released a statement, warning against assumptions based on “incomplete information.”
What you need to know
- Rand Fishkin of SparkToro obtained and published documents detailing Google Search’s internal APIs, search ranking factors, and Google’s data collection practices.
- Some of the leaked information contradicts Google’s public statements about its search algorithms and ranking factors.
- The documents were accidentally made public on GitHub between March 27 and May 7 and were later indexed by a third-party service.
A massive leak of what appears to be thousands of internal documents offers a rare glimpse into the inner workings of Google Search, suggesting that Google may have misled the public about its search engine’s operations for years.
The documents were handed over to Rand Fishkin of the software company SparkToro, who then made them public. Fishkin, a seasoned SEO expert with over a decade of experience, says a source gave him 2,500 pages of documents, hoping to debunk “lies” Googlers had been telling about how the search algorithm actually works (via The Verge).
The documents describe internal APIs and break down what affects search results. From them, you can get a general sense of what works and what doesn’t for Google rankings.
These leaks cover a wide range of topics, such as Google’s data collection, which sites get a boost on sensitive issues like elections, and how Google treats small websites.
Interestingly, some of the information contradicts what Google has said publicly. For example, Google has denied treating subdomains differently in rankings and has claimed it doesn’t use click-centric signals to index content, but the leaked documents suggest otherwise, according to Fishkin.
Other surprises include using a sandbox for new pages, giving pages an “authority score” to bump them up in search results, and more.
Google has yet to respond to Android Central’s request for comment, but we’ll update this article when it does.
It appears that Google accidentally made these documents public on GitHub around March 27th, and they were removed by May 7th. However, a third-party service indexed them, so they are still accessible.
Although these documents reveal potential ranking factors, they do not specify the importance of each in the final ranking, as SEO expert Mike King pointed out in his summary.
Earlier this year, Google rolled out a major update to Search that prioritizes “useful” content. The new algorithms aim to determine whether a website is built for search engines or for real people.
Update
In a statement emailed to Android Central, a Google representative cautioned the public not to jump to conclusions without all the facts.
“We would caution against making incorrect assumptions about Search based on out-of-context, out-of-date or incomplete information,” the spokesperson said. “We’ve shared extensive information about how Search works and the types of factors our systems weigh, while also working to protect the integrity of our results from manipulation.”
Google also mentioned that it traditionally does not comment on the specifics of its ranking systems. Sharing such sensitive information could help spammers and bad actors manipulate results, according to the company.
Search is always changing, and Google says it’s constantly tweaking its systems to deliver the best results. The spokesperson added that while the basic principles of Google’s rankings remain the same, individual signals may change frequently, be removed or simply tested and never used.
The search giant also reiterated its commitment to providing accurate information while protecting the integrity of search results. Finally, Google highlighted the potential for misinterpretation of leaked documents.