31/07/2013

Useful Search Engine Queries


The search engines have been gracious enough to give us special search
commands for understanding their vast amount of data. The commands
that I find the most useful are:

cache:
site:
inurl:
intitle:
+-|

These commands, when used in combination, are powerful and
sometimes prove essential for diagnosing SEO problems. To the search engineers that created these commands, I send my sincerest gratitude.
You make my job much easier. (It should be noted that I have intentionally
left out the search engine commands that I don’t use. Because of this
decision, I recommend that you don’t treat this as a comprehensive list. It
details only the commands that I find essential for SEOs.)

These commands are useful for filtering search results to show only
pages that contain certain attributes. This means if you find a webpage
that has an issue like a misspelling in a title tag, you can use the search
engines to find all of the occurrences of this on your website and use this
information to fix the problem.

Another example of this is checking the effectiveness of keyword
targeting. It is a common SEO problem to have multiple pages targeting
the same keyword. This is a problem because all of these pages must then
compete with each other for rankings. The best practice is to combine
them into one more powerful page that competes for rankings on its own.

You can query a search engine in the following ways to yield some useful
data points:

Normal search: What is the best way to see how the search
engines will act? Run a normal search. According to my web history I
search using Google about 17 times a day. This does not include the
internal searches I do on Google properties like Gmail and YouTube
or the searches I do on my phone. I have found that the best way to
better understand Google is to continually and constantly use it. After
all, our goal as SEOs is to improve our clients’ rankings. What better
way to do this than studying search results every day?

Quotes: As I am sure you are aware, putting search queries in
quotes limits results to exact matches. This is extremely helpful when
you want to see if a random page is in the Google index. Simply find
a random sentence in the content, wrap it in quotes, and search for it.
If it is long enough, odds are it has only been written once on the
Internet and should return only one result. If it doesn’t appear it means
it isn’t indexed. If it appears more than once, it means your client has
duplicate content issues.

Cache: Cache is a copy of the file Googlebot downloads when it
visits a website. As an SEO, this information is extremely important
because it shows you exactly what Google sees. This is especially
useful for determining crawl rate and diagnosing potential geolocation
issues.


When viewing the cached version of a website, try clicking the link
labeled “Text-only version.” This shows a much better representation of
what Google sees. I can’t count how many hidden links I have found by
using this trick.




site: This simple query (for example, site:example.com) can tell you
several important things. First, it gives you an idea of the major
sections of a website. It also gives you an idea of how many pages are
indexed in Google. If you know that a given site has only
100 pages, and this query returns 100,000 results, you
know you have a duplicate content issue.
Additionally, it makes you aware of some of the
subdomains on the given site. This is extremely helpful for
understanding how Google thinks a site is organized.

inurl: This command limits search results to those where the
query appears in the URL. This is most useful when combined
with the site command (site:www.seomoz.org inurl:"Rand Fishkin").
Most SEO professionals find this technique most useful for
identifying URL parameter–induced duplicate content
(site:www.example.com inurl:"sessionid"). I use this after I identify a
problematic parameter and I want to find all of its occurrences.

intitle: Similar to the inurl command, the intitle command limits
results to only those where the query is in the title tag. This can
be helpful for many things including piracy (intitle:"index of mp3"),
vanity searches (intitle:"danny dover"), and SEO-related things
like duplicate title tag detection (intitle:"my company: Best product
ever page").

+: The plus sign, when placed directly before a term, tells
Google to search for exactly that term, not synonyms. For
example, a search for ghw bush will return results that assume you
mean “George Herbert Walker Bush”. A search for +ghw bush,
however, will return results that assume you want specific
references to “GHW” in the results.

-: The minus sign is a tremendous aid to filtering queries, and it
can be used with specific query terms (cubs -chicago -baseball will
show you results for “cubs” that do not contain Chicago or
baseball) or in conjunction with specific operators discussed in
this section. Searching for "danny sullivan" -site:searchengineland.com
will return results about Danny Sullivan that appear anywhere except
for SearchEngineLand.com. This operator works similarly to filter out
title contents (music -intitle:mp2) and URL contents
(site:nytimes.com -inurl:pagemode=print shows all indexed pages from
nytimes.com that are not “printer-friendly” versions).

|: The pipe symbol symbolizes an “OR” search and can be used
with regular query terms or with the commands listed in this
section, primarily when you’re looking for multiple items within a
given dataset. For example, site:example.com
inurl:sessionid|jsessionid will find URLs that contain either
“sessionid” or “jsessionid” in indexed URLs from example.com.
Similarly, site:seomoz.org danny|rand will return pages from
SEOmoz.org that contain either “danny” or “rand” in the copy.
(Pages that include both “danny” and “rand” will also be
included with this operator, so it’s a true “and/or” operator, not
an “exclusive or” operator.)

The search engine commands in Google must start with a
lowercase letter or they won’t work properly.
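If you run these combinations often, it can help to assemble the query strings in a small script. The following is only a rough sketch in Python. The function names and parameters (build_query, quote_if_needed, exclude_terms, and so on) are made up for illustration and are not part of any search engine API. It simply glues together the operators described above, always in lowercase.

```python
def quote_if_needed(value):
    """Wrap a value in double quotes when it contains whitespace."""
    return f'"{value}"' if any(ch.isspace() for ch in value) else value


def build_query(terms="", site=None, inurl=None, intitle=None,
                exclude_terms=(), exclude_site=None):
    """Assemble a search query string from the operators in this section.

    All parameter names are hypothetical. `inurl` may be a single string
    or a list of alternatives, which are joined with the pipe (OR) symbol.
    """
    parts = []
    if site:
        parts.append(f"site:{site}")
    if inurl:
        alternatives = [inurl] if isinstance(inurl, str) else list(inurl)
        parts.append("inurl:" + "|".join(quote_if_needed(a) for a in alternatives))
    if intitle:
        parts.append(f"intitle:{quote_if_needed(intitle)}")
    if terms:
        # Plain search terms are passed through unchanged.
        parts.append(terms)
    # The minus sign goes directly before each excluded term, with no space.
    parts.extend(f"-{quote_if_needed(t)}" for t in exclude_terms)
    if exclude_site:
        parts.append(f"-site:{exclude_site}")
    return " ".join(parts)


if __name__ == "__main__":
    print(build_query(site="example.com", inurl=["sessionid", "jsessionid"]))
    print(build_query(terms="cubs", exclude_terms=["chicago", "baseball"]))
    print(build_query(terms='"danny sullivan"', exclude_site="searchengineland.com"))
```

The three calls at the bottom rebuild the session ID, cubs, and Danny Sullivan examples from this section. The script only produces the text of the query; you still paste it into the search box yourself.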



Common Questions These Queries Can Answer

As I’ve already alluded to, you can use these queries to quickly answer
some key questions.


Is This Page Indexed?

To answer this question, all you have to do is search for the URL preceded
by the inurl command. For example, the query
inurl:"digg.com/users/jayadelson" checks to see if the Digg profile for Digg’s
CEO is indexed. Hint: It is.
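
When you need to check a batch of pages, the indexation query above can be generated rather than typed. Here is a minimal sketch in Python using only the standard library. The helper names (indexed_check_query, search_url) are hypothetical, and the base search address is an assumption on my part, so swap in whatever your search engine of choice uses. The script builds the query and a link to open by hand; it does not fetch or parse any results.

```python
from urllib.parse import urlencode, urlsplit


def indexed_check_query(page_url):
    """Return an inurl: query for a page, with the scheme stripped.

    Any query string or fragment on the page URL is dropped, so only
    the host and path are checked.
    """
    # Prefix "//" when no scheme is given so urlsplit still finds the host.
    parts = urlsplit(page_url if "://" in page_url else "//" + page_url)
    return f'inurl:"{parts.netloc}{parts.path}"'


def search_url(query, base="https://www.google.com/search"):
    """Build a link that runs the query. The base address is an assumption."""
    return f"{base}?{urlencode({'q': query})}"


if __name__ == "__main__":
    query = indexed_check_query("http://digg.com/users/jayadelson")
    print(query)
    print(search_url(query))
```

urlencode takes care of escaping the colon, quotes, and slashes so the link survives being pasted into a browser or a spreadsheet.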


Does This Page Suffer from Duplicate Content Problems?

If you have to ask, the answer is likely yes. To be sure, you can use any of
the search engine commands previously discussed to check. Alternatively,
you can use my preferred method and search for a full sentence from the
page with the site command. For example, the query site:google.com "Gmail
stores, processes and maintains your messages, contact lists and other data related
to your account in order to provide" shows you that Google has its Gmail
privacy policy posted on two different URLs. Tsk, tsk.
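
The full-sentence method can be scripted too. The sketch below, in Python, picks the longest sentence from a block of page copy and wraps it in quotes behind the site command. The function names and the max_words cap are my own illustrative choices, not something the search engines document, and the sentence splitting is deliberately naive (it breaks on ., !, and ? followed by whitespace).

```python
import re


def longest_sentence(text):
    """Return the longest sentence in the text, using a naive splitter."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return max(sentences, key=len).strip()


def duplicate_check_query(text, domain=None, max_words=32):
    """Build an exact-match query from the longest sentence in the text.

    `max_words` is a hypothetical cap that keeps the query short.
    Pass `domain` to limit the check to one site, or leave it out to
    look for copies anywhere.
    """
    sentence = longest_sentence(text).rstrip(".!?").replace('"', "")
    phrase = " ".join(sentence.split()[:max_words])
    quoted = f'"{phrase}"'
    return f"site:{domain} {quoted}" if domain else quoted


if __name__ == "__main__":
    copy = "Short one. Gmail stores, processes and maintains your messages. Ok."
    print(duplicate_check_query(copy, domain="google.com"))
```

If the resulting query returns more than one URL on the same domain, that is the same signal described above: the copy lives in more than one place.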


Omitted Results

Sometimes Google will mask similar results on searches. When it does this, it
provides an indication of this with a link that says “omitted results”. In these
cases, it is important to click this to see the pages that Google has decided are
duplicate pages.


About How Many Pages on This Domain Are Indexed?

This question can be dangerous because the number that is returned is not
always accurate. The major search engines have data centers located in
many places around the world that contain different versions of indices with
different numbers of URLs. This means that if you check the number of
pages indexed from one site, it can vary depending on which data center
you happen to be accessing at that time. (Note: the data center you are
accessing is not disclosed on the search result page.) If you are asked by
a client for indexation numbers, you can generate a rough estimate by
using the search site:example.com and using the number of results. If you do
this, it is important to let the client know the problems with this metric.


