We spend a lot of time in the SEO community debating terms and
definitions, even when they are established activities we've been doing
for years. This is doubly true for folks in tech who are not in the
search industry. If you have an abundance of free time, you can jump
into any Hacker News thread related to SEO and see there's still no agreement on whether or not SEO is a valid term or discipline.
"Semantic search is a search or a question or an action that produces meaningful results, even when the retrieved items contain none of the query terms, or the search involves no query text at all."
I've heard everything from "entity-based SEO" to "entity SEO" to "Search Entity Optimization" as descriptors for optimizing around entity-based results. I'd personally lean towards "semantic search optimization" or "semantic SEO," but I can guarantee one thing: It doesn't matter what you call it at the end of the day. Adjusting to the semantic search landscape will be part of the SEO's job description going forward.
It's obvious that Google's Knowledge Graph result above is generated primarily from the Freebase entry
on Sam Peckinpah. The shift that will be much harder to deconstruct
will be search results ranking sites that aren't clearly optimizing for
specific keyword queries, or may not contain what SEOs would consider
strong link profiles with exact- or partial-match anchor text. Consider
this result for the classic phrase from the movie 2001: A Space Odyssey:
The YouTube clips and other search results on the first page all
contain what you might expect to see in terms of on-page optimization
and anchor text profiles: keyword usage in the title/META tags/URL, and a
mix of exact- and partial-match anchor text in the link profiles. But
the IMDb and Wikiquote pages are a bit different, and don't contain
strong signals in either of those areas. There are quite a few links to
the IMDb page, but relatively little in the way of partial- or
exact-match anchor text that an SEO might expect to see.
Additionally, while the phrase is found in the body content of the page,
the usual SEO sweet spots in the URL, internal anchor text, and HTML
title tag aren't optimized for the quote.
Gianluca Fiorelli recently wrote a piece on graphs and entity recognition, which addressed this topic and how it may relate to co-occurrence and co-reference across web documents. Google released the Wikilinks Corpus this year, and in the release they describe a system of co-reference to aid in entity resolution. Specifically, when are different mentions or queries referencing the same entity across web documents?
The Google/UMass Wikilinks project provides a good illustration of cross-document co-reference with two web documents that both link to the disambiguated entity 'Banksy' on Wikipedia:
Or in my previous example above, when people are searching for "I'm
sorry Dave," Google can fairly easily match that query to the entity 2001: A Space Odyssey
across web documents that co-reference the IMDb page, and return
results for that entity without relying on keyword string matches in
HTML tags and anchor text.
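To make the co-reference idea a little more concrete, here is a minimal sketch in Python. It is not how Google or the Wikilinks project actually implements entity resolution. It only illustrates the principle that different mentions resolve to the same entity when the pages using them link to the same disambiguated target. The page URLs, anchor texts, entity identifiers, and function names below are all hypothetical.

```python
from collections import defaultdict

# Hypothetical toy data: (source page, anchor text, linked entity).
# The "entity:..." identifiers stand in for a disambiguated target
# such as a Wikipedia or IMDb page.
MENTIONS = [
    ("blog-a.example/post", "Banksy", "entity:banksy"),
    ("news-b.example/story", "the anonymous street artist", "entity:banksy"),
    ("film-c.example/review", "I'm sorry Dave", "entity:2001-a-space-odyssey"),
    ("quotes-d.example/hal", "2001: A Space Odyssey", "entity:2001-a-space-odyssey"),
]


def normalize(text):
    """Lowercase and collapse whitespace so trivial variants match."""
    return " ".join(text.lower().split())


def build_index(mentions):
    """Map each normalized mention to the set of entities it links to."""
    index = defaultdict(set)
    for _source_page, anchor_text, target in mentions:
        index[normalize(anchor_text)].add(target)
    return index


def resolve(query, index):
    """Return the entities co-referenced by a query, or an empty list."""
    return sorted(index.get(normalize(query), ()))


INDEX = build_index(MENTIONS)
print(resolve("I'm sorry Dave", INDEX))
```

In this toy version, the query resolves to the film entity even though the entity's own page never has to be optimized for the phrase. A real system would need fuzzy matching and far more signals than an exact lookup.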
It will be interesting to see what testing data and correlation
studies tell us about structured data markup as a ranking factor. If
Google and Bing can derive a clean signal from the presence of this
markup, it certainly meets other criteria we've typically used to mark
something as a ranking factor. Here at Moz, we'll soon be publishing
ongoing updates to the 2011 Search Engine Ranking Factors study. It should be interesting to once again see any changes in correlation data as well as the latest SEO survey results.
For many practitioners in the SEO industry, it feels like we may
have seen this movie before. Let's say, for example, that you spent
considerable time and money optimizing images with an eye on increasing
your visibility in Google image search. The recent UI change to Google image search results
likely had a significant negative ROI impact on that effort. There's a
very sound takeaway in that Define Media Group post: It's still a good
idea to adhere to SEO best practices for image search optimization, but
it likely changes how heavily you'd opt to prioritize that work versus
an activity that will yield more traffic or visibility. The same ROI
calculation should be applied to structured data markup, whether it's
schema.org, Open Graph Protocol, or Twitter Cards markup.
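For readers weighing that effort, it may help to see how small a piece of schema.org markup can be. The sketch below builds a JSON-LD snippet for a schema.org Movie in Python. JSON-LD is only one of the syntaxes schema.org supports, the choice of properties is my own assumption about a sensible minimum, and `movie_jsonld` is a hypothetical helper rather than part of any library.

```python
import json


def movie_jsonld(name, director, year):
    """Build a schema.org Movie description wrapped in a JSON-LD script tag."""
    data = {
        "@context": "https://schema.org",
        "@type": "Movie",
        "name": name,
        "director": {"@type": "Person", "name": director},
        "datePublished": str(year),
    }
    body = json.dumps(data, indent=2)
    return '<script type="application/ld+json">\n' + body + "\n</script>"


# Illustrative values only; swap in whatever your own page describes.
SNIPPET = movie_jsonld("2001: A Space Odyssey", "Stanley Kubrick", 1968)
print(SNIPPET)
```

The markup itself is cheap to produce. The real cost in the ROI calculation is keeping it accurate across every template and page type on a site.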
2. What do "entity-based search results" look like now?
The first wave of entity-based results in Google has come through "answer cards" and Knowledge Graph results. We're used to seeing Google results for people, places, and media objects that look like this:
3. So is the keyword dead?
Interestingly enough, I've read two pieces from very sharp SEOs who have a different take on that. AJ Kohn makes a compelling case that keywords still matter as they are crucial in determining user intent and matching that to relevant results. While entity-based SEO and Knowledge Graph results attempt to guess user intent through localization, personalization, and entity disambiguation, there's nothing more clear in terms of intent than a keyword string of "hospitals in Seattle" or "What's the best Xbox 360 game?" (Obviously it's Bioshock.)
But there are a couple of signs that the keyword may be fading a bit as the ultimate arbiter of user intent. Consider the launch of Google's "conversational search," which layers what you've searched for, who you are, and where you are as intent modifiers to your query. Even stubborn old SEOs are coming to realize that there are layers of implicit intent in search results that we can't possibly unravel through keyword research or link graph metrics.
Mr. Bradley makes a very salient point in his SemTechBiz writeup (seriously, read that): Mobile is the driving force behind the semantic search revolution. Google, Bing, and Yahoo all see the writing on the wall with mobile adoption and the slow death of the desktop PC. Keywords may never die, but they're going to have a lot of company when it comes to determining user intent and serving relevant search results.
4. Is structured data markup a ranking factor?
Wouldn't we love to know? Not to be rude and answer my question with a question, but when was the last time Google actually confirmed that something is a factor in their ranking algorithm? My memory says it was the site speed announcement in 2010. Readers should feel free to correct me in the comments if there is a more recent example.
As a simplified mental model, you could group the search engine ranking factors into one of these categories:
- Popularity signals: Links, and the quality and quantity thereof in particular. Other visibility signals such as social media sharing would fall into this category.
- Relevancy signals: There's a whole lot that goes into this one, but a good reference point is the Google patent on phrase-based indexing.
- Things that dramatically affect user experience on a site: Hacked sites at the extreme, and smaller factors like site speed or reading level at the other end of the spectrum.
- Things that actually appear in the search engine results: Keywords in HTML titles, URLs, and META description tags (yes, they affect CTR at a minimum).
5. Will implementing schema.org markup actually hurt our search engine visibility in the future?
A number of SEOs have raised valid concerns about the implementation of structured data markup. Will it enable scraper sites to easily take your data and use it to outrank you? Or worse, will Google vacuum up your data for its own purposes in Knowledge Graph results or increasingly sophisticated rich snippets? This tweet from Dennis Goedegebuure concisely sums up the latter concern, and it applies to Google, Bing, Facebook, Twitter, or any other search engine or social media network:
The vast majority of the rich snippets and Knowledge Graph elements in the search results are derived from Freebase and a small handful of other semantic data sources, such as the CIA World Factbook and MusicBrainz. Whether or not we choose to mark up our sites will have little effect on the current Google or Bing SERPs.
However, there's a massive amount of data still present in good old HTML, and the search engines are keen to use structured data to display that information. You can see the limitations of document retrieval and reliance on the link graph in any number of less-than-desirable search results. I believe Google and Bing will raise the bar on the quality of search results through the wider adoption of semantic data markup.
I also believe we should consistently hold them and any other structured data consumers accountable for making sure proper attribution and responsible user interface design are key parts of their structured data consumption. SEO has received a bad rap in some circles as simply being a vehicle for spam. The reality is that SEO heavy-lifting is behind many of the better search results you'll find. Going forward, the same guideline will apply to structured data.
A healthy web ecosystem will find a balance between search engine, user, and content publisher. Let's continue to remind the aggregators of our data of that as we continue down the semantic SEO path.