2013 Search Engine Ranking Factors
Correlations
To compute the correlations, we followed the same
process as in 2011. We started with a large set of keywords from
Google AdWords (14,000+ this year) that spanned a wide range of search
volumes across all topic categories. Then, we collected the top 50
organic search results from Google-US in a depersonalized way. All
SERPs were collected in early June, after the Penguin 2.0 update.
For each search result, we extracted all the factors we
wanted to analyze and finally computed the mean Spearman correlation
across the entire data set. Except for some of the details that I will
discuss below, this is the same general process that both
Searchmetrics and
Netmark
recently used in their excellent studies. Jerry Feng and Mike O'Leary
on the Data Science team at Moz worked hard to extract many of these
features (thank you!):
When interpreting the correlation results, it is important to remember that correlation does not prove causation.
Rand has a
nice blog post
explaining the importance of this type of analysis and how to interpret
these studies. As we review the results below, I will call out the
places with a high correlation that may not indicate causation.
Enough of the boring methodology, I want the data!
Here's the first set, Mozscape link correlations:
Correlations: Page level
Correlations: Domain level
Page Authority is
a machine learning model inside our Mozscape index that predicts
ranking ability from links and it is the highest correlated factor in
our study. As in 2011, metrics that capture the diversity of link
sources (C-blocks, IPs, domains) also have high correlations. At the
domain/sub-domain level, sub-domain correlations are larger then domain
correlations.
In the survey, SEOs also thought links were very important:
Survey: Links
Anchor text
Over the past two years, we've seen Google crack
down on over-optimized anchor text. Despite this, anchor text
correlations for both
partial and
exact match were also quite large in our data set:
Interestingly, the surveyed SEOs thought that an
organic anchor text distribution (a good mix of branded and non-branded)
is more important then the number of links:
The anchor text correlations are one of the most
significant differences between our results and the Searchmetrics study.
We aren't sure exactly why this is the case, but suspect it is because
we included navigational queries while Searchmetrics removed them from
its data. Many navigational queries are branded, and will organically
have a lot of anchor text matching branded search terms, so this may
account for the difference.
On-page
Are keywords still important on-page?
We measured the relationship between the keyword and the
document both with the TF-IDF score and the language model score and
found that the title tag, the body of the HTML, the meta description and
the H1 tags all had relatively high correlation:
Correlations: On-page
See my blog post on
relevance vs. ranking
for a deep dive into these numbers (but note that this earlier post
uses a older version of the data, so the correlation numbers are
slightly different).
SEOs also agreed that the keyword in the title and on the page were important factors:
Survey: On-page
We also computed some additional on-page correlations to check
whether structured markup (schema.org or Google+ author/publisher) had
any relationship to rankings. All of these correlations are
close to zero, so we conclude that they are not used as ranking signals (yet!).
Exact/partial match domain
The ranking ability of exact and partial match domains (EMD/PMD)
has been heavily debated by SEOs recently, and it appears Google is
still adjusting their ranking ability (e.g.
this recent post by Dr. Pete). In our data collected in early June (before the June 25 update), we found EMD correlations to be
relatively high at 0.17 (0.20 if the EMD is also a dot-com), just about on par with the value from our 2011 study:
This was surprising, given the MozCast data that shows
EMD percentage is decreasing, so we decided to dig in. Indeed, we do
see that the EMD percent has decreased over the last year or so (blue
line):
However, we see a see-saw pattern in the EMD
correlations (red line) where they decreased last fall, then rose back
again in the last few months. We attribute the decrease last fall to
Google's EMD update (
as announced by Matt Cutts).
The increase in correlations between March and June says that the EMDs
that are still present are ranking higher overall in the SERPs, even
though they are less prevalent. Could this be Google removing lower
quality EMDs?
Netmark recently calculated a correlation of 0.43 for
EMD, and it was the highest overall correlation in their data set. This
is a major difference from our value of 0.17. However, they used the
rank-biserial correlation instead of the
Spearman correlation
for EMD, arguing that it is more appropriate to use for binary values
(if they use the Spearman correlation they get 0.15 for the EMD
correlation). They are right, the rank-biserial correlation is
preferred over Spearman in this case. However, since the rank-biserial
is just the
Pearson correlation
between the variables, we feel it's a bit of an apples-to-oranges
comparison to present both Spearman and rank-biserial side by side.
Instead, we use Spearman for all factors.
Social
As in 2011, social signals were some of our
highest correlated factors, with Google+ edging out Facebook and Twitter:
SEOs, on the other hand, do not think that social signals are very important in the overall algorithm:
This is one of those places where the correlation may be
explainable by other factors such as links, and there may not be direct
causation.
Back in 2011, after we released our initial social results, I showed how
Facebook correlations could be explained mostly by links.
We expect Google to crawl their own Google+ content, and links on
Google+ are followed so they pass link juice. Google also crawls and
indexes the public pages on Facebook and Twitter.
Takeaways and the future of search
According to our survey respondents, here is how Google's overall algorithm breaks down:
We see:
- Links are still believed to be the most important part of the algorithm (approximately 40%).
- Keyword usage on the page is still fundamental, and other than links is thought to be the most important type of factor.
- SEOs do not think social factors are important in the 2013 algorithm (only 7%), in contrast to the high correlations.
Looking into the future, SEOs see a shift away from traditional
ranking factors (anchor text, exact match domains, etc.) to deeper
analysis of a site's perceived value to users, authorship, structured
data, and social signals: