By Lee Underwood
JupiterMedia
As a web developer, you need to know how search engines work.
While the details are rather complex, we'll look into the basics of
crawler-based search engines. This involves a certain amount of speculation,
since the exact calculations are a closely guarded secret.
The index of a crawler-based search engine is built by robots
(also called spiders or web crawlers) that operate on a fixed set of
instructions. The robot selects a page to visit from a list of links
(a "queue") gathered from previously crawled web pages. It fetches the
web page, collects certain information (such as visible text, meta tags,
and links) and sends it to an indexing program. The information is then
entered into a database, ready for search queries. Finally, the newly
gathered links are added to the queue for a future visit, and the process
begins again.
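The crawl cycle described above can be sketched as a breadth-first loop over a queue. This is a minimal Python sketch under stated assumptions: the tiny in-memory `WEB` dictionary and its URLs are hypothetical stand-ins for fetching real pages over HTTP, and a plain dictionary stands in for the indexing program's database. Real crawlers add politeness rules, retries, and revisit schedules that are left out here.

```python
from collections import deque

# Hypothetical in-memory "web": URL -> (visible text, outgoing links).
# A real crawler would fetch each page over HTTP and parse its HTML instead.
WEB = {
    "a.example/": ("home page about blues", ["a.example/delta", "b.example/"]),
    "a.example/delta": ("delta blues history", ["a.example/"]),
    "b.example/": ("music store", ["c.example/"]),
    "c.example/": ("lyrics archive", []),
}


def crawl(seed_urls, fetch, max_pages=100):
    """Breadth-first crawl: visit queued URLs, hand page data to an
    indexer, and queue newly discovered links for a later visit."""
    queue = deque(seed_urls)   # links waiting to be visited
    seen = set(seed_urls)      # never queue the same URL twice
    index = {}                 # stand-in for the indexing program's database
    while queue and len(index) < max_pages:
        url = queue.popleft()  # select the next page to visit
        page = fetch(url)
        if page is None:       # unreachable page: skip it
            continue
        text, links = page
        index[url] = {"text": text, "links": links}  # "send to the indexer"
        for link in links:     # new links go to the back of the queue
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return index


index = crawl(["a.example/"], WEB.get)
print(list(index))  # ['a.example/', 'a.example/delta', 'b.example/', 'c.example/']
```

The `seen` set is what keeps the crawler from looping forever when pages link back to each other, as `a.example/delta` does here.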
Link Analysis
Every major search engine uses link analysis as part of its ranking algorithm,
according to Danny Sullivan, editor of Search Engine Watch. It differs from
link popularity in that links are
given a "weight" (rank of importance) determined by a preset calculation,
whereas in link popularity a web page's importance is ranked according
to how many hyperlinks are pointing to that page, regardless of where
they came from. According to Google: "In essence, Google interprets
a link from page A to page B as a vote, by page A, for page B, but
Google looks at more than the sheer volume of votes, or links a page
receives; it also analyzes the page that casts the vote. Votes cast
by pages that are themselves 'important' weigh more heavily and help
to make other pages 'important.' Important, high-quality sites receive
a higher PageRank, which Google remembers each time it conducts a search....
Google combines PageRank with sophisticated text-matching techniques
to find pages that are both important and relevant to your search.
Google goes far beyond the number of times a term appears on a page
and examines all aspects of the page's content (and the content of
the pages linking to it) to decide if it's a good match for your query" (Google
Technology).
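The difference between link popularity and a weighted "vote" can be illustrated with a simplified, normalized PageRank computed by power iteration. This Python sketch is not Google's production algorithm: the damping factor of 0.85, the iteration count, and the page names are assumptions chosen for illustration.

```python
def link_popularity(graph):
    """Link popularity: count inbound links, no matter where they come from."""
    counts = {page: 0 for page in graph}
    for targets in graph.values():
        for target in targets:
            counts[target] += 1
    return counts


def pagerank(graph, damping=0.85, iterations=100):
    """Simplified, normalized PageRank computed by power iteration.

    Each page divides its current score evenly among the pages it links to,
    so a vote from a high-scoring page is worth more than one from an
    obscure page. A page with no outgoing links spreads its score evenly
    over every page. Scores always sum to 1.
    """
    n = len(graph)
    rank = {page: 1.0 / n for page in graph}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in graph}
        for source, targets in graph.items():
            if targets:
                share = damping * rank[source] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                for page in graph:
                    new_rank[page] += damping * rank[source] / n
        rank = new_rank
    return rank


# Hypothetical link graph: page -> pages it links to.
web = {
    "popular": ["x"],                 # a well-linked page votes for x
    "a": ["popular"], "b": ["popular"], "c": ["popular"], "d": ["popular"],
    "e": ["y"], "f": ["y"],           # two obscure pages vote for y
    "x": [], "y": [],
}

votes = link_popularity(web)
ranks = pagerank(web)
print(votes["x"], votes["y"])      # 1 2
print(ranks["x"] > ranks["y"])     # True
```

By link popularity, `y` (two inbound links) beats `x` (one), but `x`'s single vote comes from a page that itself collects four votes, so `x` ends up with the higher score.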
Note: An interesting discussion on the anatomy of search engines can
be found in the original PageRank paper by Google's founders Sergey
Brin and Lawrence Page, "The Anatomy of a Large-Scale Hypertextual
Web Search Engine." Although somewhat long, it makes for good reading.
Teoma, owned by Ask Jeeves, makes use of what it calls "Subject-Specific
Popularity." This technology, according to Teoma, "ranks a site based on
the number of same-subject pages that reference it, not just general
popularity." Teoma's process
allows for a fine-tuned search using the authority of the link as
a part of its relevance. Web sites are grouped into "communities" that
have the same topic. Searches are then further refined within the
communities, using Subject-Specific Popularity.
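Teoma has not published its exact method, so the following is only a rough Python illustration of the general idea of counting same-subject references. It assumes each page has already been assigned to a topic "community"; the `TOPIC` and `LINKS` tables and the site names are hypothetical.

```python
# Hypothetical pages, each tagged with the topic "community" it belongs to.
TOPIC = {
    "bluesfans.example": "blues",
    "harmonica.example": "blues",
    "juke.example": "blues",
    "news.example": "news",
    "portal.example": "general",
    "site-a.example": "blues",
    "site-b.example": "blues",
}

# Hypothetical outgoing links for each page.
LINKS = {
    "bluesfans.example": ["site-a.example", "site-b.example"],
    "harmonica.example": ["site-a.example"],
    "juke.example": ["site-a.example"],
    "news.example": ["site-b.example"],
    "portal.example": ["site-b.example"],
    "site-a.example": [],
    "site-b.example": [],
}


def inbound_links(links, topic=None):
    """Count inbound links per page. If a topic map is given, count only
    links whose source page is in the same topic community as the target."""
    counts = {page: 0 for page in links}
    for source, targets in links.items():
        for target in targets:
            if topic is None or topic[source] == topic[target]:
                counts[target] += 1
    return counts


general = inbound_links(LINKS)
same_subject = inbound_links(LINKS, TOPIC)
print(general["site-a.example"], general["site-b.example"])            # 3 3
print(same_subject["site-a.example"], same_subject["site-b.example"])  # 3 1
```

Both sites are equally popular in general, but only `site-a.example` is referenced mostly by other blues pages, so it wins on subject-specific popularity.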
Outgoing links do not count toward a page's own ranking, and for good
reason. Think about it for a moment: the web developer creates the
outgoing links. If they counted, he would only need to link to the most
popular sites on the web to raise his own site's search engine listing
position.
Topic Sensitive PageRank
Another method involved in ranking web sites is called "Topic Sensitive
PageRank" (TSPR), an enhancement to Google's PageRank. Instead of ranking
pages solely on all of their incoming links, TSPR, like Teoma, gives more
weight to links that relate to the page's main subject area. Links from
sites not directly related to the page's subject matter (topic) are
assigned less weight in the calculation. A similar process is called
"Hilltop."
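One published formulation of Topic-Sensitive PageRank, by Taher Haveliwala, gets this effect by biasing PageRank's random "jump" toward pages known to belong to a topic, rather than by reweighting individual links directly. The Python sketch below follows that idea under simplifying assumptions: the topic pages are given by hand, and the graph and page names are hypothetical.

```python
def topic_sensitive_pagerank(graph, topic_pages, damping=0.85, iterations=100):
    """PageRank whose random "jump" lands only on pages known to be on-topic.

    Score therefore flows outward from on-topic pages, and links from pages
    that no on-topic page leads to carry little or no weight. Pages with no
    outgoing links send their score back through the same topic jump.
    """
    pages = list(graph)
    jump = {p: (1.0 / len(topic_pages) if p in topic_pages else 0.0) for p in pages}
    rank = dict(jump)
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) * jump[p] for p in pages}
        for source, targets in graph.items():
            if targets:
                share = damping * rank[source] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                for p in pages:
                    new_rank[p] += damping * rank[source] * jump[p]
        rank = new_rank
    return rank


# Hypothetical graph: two blues pages link to target-a, two sports pages to target-b.
web = {
    "blues-1": ["target-a"], "blues-2": ["target-a"],
    "sports-1": ["target-b"], "sports-2": ["target-b"],
    "target-a": [], "target-b": [],
}

blues_view = topic_sensitive_pagerank(web, {"blues-1", "blues-2"})
neutral_view = topic_sensitive_pagerank(web, set(web))  # every page "on topic"
print(blues_view["target-a"] > blues_view["target-b"])  # True
```

With every page treated as on-topic, the function reduces to ordinary PageRank and the two targets tie; with a blues topic, only the target linked from blues pages gains score.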
There's much discussion about which method Google uses for site ranking.
Topics include the use of TSPR and something called "block ranking," which
basically groups internal links and uses that as a starting point for
the original PageRank algorithm (Google PageRank Calculations to Get
Faster?).
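The "block ranking" idea resembles a technique published by Stanford researchers under the name BlockRank; whether Google uses anything like it is speculation. The Python sketch below is a simplified reading of that idea, under stated assumptions: pages are grouped by host name, each host's internal links get their own local PageRank, the hosts themselves are ranked, and the product of the two becomes the starting point for ordinary PageRank. The URLs are hypothetical, and every link target is assumed to appear as a key in `links`.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def pagerank(nodes, edges, start=None, damping=0.85, iterations=100):
    """Weighted PageRank by power iteration.

    edges maps (source, target) -> weight; each node splits its score in
    proportion to its outgoing edge weights. Nodes with no outgoing edges
    spread their score evenly. start is an optional initial score vector.
    """
    n = len(nodes)
    rank = dict(start) if start else {v: 1.0 / n for v in nodes}
    out_weight = {v: 0.0 for v in nodes}
    for (u, _), w in edges.items():
        out_weight[u] += w
    for _ in range(iterations):
        dangling = sum(rank[v] for v in nodes if out_weight[v] == 0.0)
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for (u, v), w in edges.items():
            new_rank[v] += damping * rank[u] * w / out_weight[u]
        rank = new_rank
    return rank


def blockrank(links):
    """BlockRank-style start vector, then ordinary PageRank from that start."""
    host = {page: urlsplit(page).netloc for page in links}
    hosts = sorted(set(host.values()))

    # 1. Local PageRank inside each host, using only internal links.
    local = {}
    for h in hosts:
        members = [p for p in links if host[p] == h]
        internal = {(p, q): 1.0 for p in members for q in links[p] if host[q] == h}
        local.update(pagerank(members, internal))

    # 2. Host-level ("block") graph, weighting each link by its source's local rank.
    block_edges = defaultdict(float)
    for p, targets in links.items():
        for q in targets:
            block_edges[(host[p], host[q])] += local[p] / len(targets)
    block_rank = pagerank(hosts, dict(block_edges))

    # 3. Approximate global scores, then refine them with standard PageRank.
    start = {p: local[p] * block_rank[host[p]] for p in links}
    page_edges = {(p, q): 1.0 for p in links for q in links[p]}
    return pagerank(list(links), page_edges, start=start)


links = {
    "http://a.example/": ["http://a.example/blues", "http://b.example/"],
    "http://a.example/blues": ["http://a.example/"],
    "http://b.example/": ["http://b.example/delta"],
    "http://b.example/delta": ["http://a.example/"],
}
scores = blockrank(links)
for page, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{score:.3f}  {page}")
```

Because the start vector only changes where the iteration begins, the final scores match ordinary PageRank; the point of the technique is to reach them in fewer iterations on a very large graph.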
Keywords
Keywords on a web site are just that: key words. Nothing magical.
They are the words that potential visitors might enter into a search
engine that could lead them to your site.
For example, to find a particular music site, I might enter the search
terms "blues music." On Google, that returns about 796,000 links.
But let's say that I'm interested in blues music from the Mississippi
Delta. Then I would add the word "delta" to my search: "blues music"
delta (the quotes keep the words together). That narrows it down to
30,500 links. That's still a lot, and many of them probably aren't
relevant to what I want, but it's more manageable than 796,000. Of
course, the search could be refined further.
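The effect of the quotes can be sketched with simple boolean matching: a quoted phrase must appear word for word, while loose words may appear anywhere on the page. This Python sketch makes simplifying assumptions (lowercased word matching, no ranking of the results), and the three documents are hypothetical; real engines also rank the matching pages.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z0-9']+", text.lower())


def parse_query(query):
    """Split a query into quoted phrases and loose single words."""
    phrases, words = [], []
    for phrase, word in re.findall(r'"([^"]+)"|(\S+)', query):
        if phrase:
            phrases.append(tokenize(phrase))
        else:
            words.extend(tokenize(word))
    return phrases, words


def contains_phrase(tokens, phrase):
    """True if the phrase's words appear side by side, in order."""
    size = len(phrase)
    return any(tokens[i:i + size] == phrase for i in range(len(tokens) - size + 1))


def search(docs, query):
    """Return the names of documents containing every phrase and every word."""
    phrases, words = parse_query(query)
    results = []
    for name, text in docs.items():
        tokens = tokenize(text)
        if all(contains_phrase(tokens, p) for p in phrases) and all(w in tokens for w in words):
            results.append(name)
    return results


# Hypothetical documents.
docs = {
    "page-1": "Delta blues music from the Mississippi Delta",
    "page-2": "Blues music festivals in Chicago",
    "page-3": "Music of the delta: blues legends",
}

print(search(docs, '"blues music"'))        # ['page-1', 'page-2']
print(search(docs, '"blues music" delta'))  # ['page-1']
print(search(docs, 'blues music delta'))    # ['page-1', 'page-3']
```

Without the quotes, `page-3` also matches because it contains all three words, just not "blues music" side by side.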
The search engine returns these particular pages because they have
the words blues music and delta on them (or in the anchor text of links
pointing to them; see below). In this case, those are the keywords.
If you have a web site about blues music, then you need to add those
words somewhere on your web pages so they can be found by people who
want to know more about blues music in the delta. You might also be
able to have them included in the anchor text of links pointing to your site.
But do you just add the keywords at random on the page or maybe within
a comment tag?
It doesn't quite work that way. The algorithms used by the search
engines have become very sophisticated and would pick that up right
away. Instead, you should have content that contains the keywords,
such as articles about the delta blues, albums and songs with the words
in their titles, and song lyrics containing the words. However, you
don't want the page to consist of nothing but those keywords or to be
heavily weighted toward them. According to Google's standards, no more
than 2% of the words on a page should be targeted keywords, although a
commonly cited rule of thumb is 2%-8%. This is called "keyword density":
the percentage of keywords within the total number of indexable words
on a web page.
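Keyword density is plain arithmetic: keyword occurrences divided by total words, times 100. This Python sketch assumes every visible word counts as indexable and that each keyword is a single word; search engines do not publish their own counting rules, and multi-word phrases would need a different rule.

```python
import re


def keyword_density(text, keywords):
    """Percentage of the words in text that are one of the target keywords.

    Every word in text is treated as indexable, and each keyword is a single
    word; multi-word phrases would need a different counting rule.
    """
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return 0.0
    targets = {k.lower() for k in keywords}
    hits = sum(1 for w in words if w in targets)
    return 100.0 * hits / len(words)


sample = ("Delta blues grew up in the Mississippi Delta, "
          "where musicians played blues at juke joints.")
print(round(keyword_density(sample, ["delta", "blues"]), 1))  # 26.7
```

A single sentence like this one (4 keyword uses in 15 words) lands far above the 2%-8% range; the same four uses spread across a 200-word article would come to 2%.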
Generally, keywords are put in the title and meta tags. They should
also have prominence within the page, i.e., appear at the beginning of
the web page and at or near the beginning of a paragraph or a sentence.
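A rough self-check for these placements can be written with Python's standard-library `html.parser`. No search engine publishes how much weight it gives each position, so this sketch only reports where a keyword appears; the class and function names and the sample page are hypothetical.

```python
from html.parser import HTMLParser


class PageParts(HTMLParser):
    """Collect the <title> text, keyword/description meta content, and body text."""

    def __init__(self):
        super().__init__()
        self.title, self.meta, self.body = [], [], []
        self._in_title = False
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() in ("keywords", "description"):
            self.meta.append(attrs.get("content") or "")
        elif tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._in_title:
            self.title.append(data)
        elif not self._skip_depth:
            self.body.append(data)


def keyword_placement(html, keyword):
    """Report where a keyword appears: title, meta tags, and how early in the text."""
    parts = PageParts()
    parts.feed(html)
    parts.close()
    kw = keyword.lower()
    body_text = " ".join(" ".join(parts.body).split()).lower()  # collapse whitespace
    return {
        "in_title": kw in " ".join(parts.title).lower(),
        "in_meta": any(kw in m.lower() for m in parts.meta),
        "first_body_offset": body_text.find(kw),  # -1 if absent
    }


page = """<html><head><title>Delta Blues Music</title>
<meta name="description" content="A history of delta blues music.">
</head><body><h1>Delta blues</h1>
<p>The delta blues began in Mississippi.</p></body></html>"""

print(keyword_placement(page, "delta blues"))
# {'in_title': True, 'in_meta': True, 'first_body_offset': 0}
```

An offset of 0 means the keyword opens the visible text, which is the kind of prominence described above.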
There are places on the web to help determine good keywords. Some
internet marketing specialists provide special copywriting services
just for keyword placement, but that may be going a bit overboard.
Anchor Text
The visible text used in a hyperlink is called "anchor text." For
instance, "delta blues music" is the anchor text in the following statement: "Get
all your <a href="http://www.bluesmusic.com/">delta blues music</a> here." It's
used to highlight the underlying link. Search engines use this text
to enhance the relevance of the link in a related search request. The
relevancy of the link to your overall site increases its weight, or
value, with the search engine.
The page targeted by the link is enhanced, not the page the link is
on. The anchor text helps the page the link sits on only as ordinary
on-page text, that is, when it contains that page's own keywords, as in
our example above.
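The point that anchor text describes the *target* page can be sketched by collecting links with Python's standard-library `html.parser` and filing each anchor text under the URL it points to. This is an illustration only; the source page URLs are hypothetical, and real engines also weigh which page the link comes from.

```python
from collections import defaultdict
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Record (href, anchor text) for every <a href="..."> element on a page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href, self._text = href, []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = " ".join("".join(self._text).split())  # collapse whitespace
            self.links.append((self._href, text))
            self._href = None


def anchor_text_by_target(pages):
    """Map each target URL to the anchor texts of links pointing at it.

    pages maps a source page's URL to its HTML. The source URL is not used
    as a key: the anchor text describes the page being linked to.
    """
    index = defaultdict(list)
    for html in pages.values():
        collector = AnchorCollector()
        collector.feed(html)
        collector.close()
        for href, text in collector.links:
            index[href].append(text)
    return dict(index)


pages = {
    "http://example.com/good": 'Get all your <a href="http://www.bluesmusic.com/">delta blues music</a> here.',
    "http://example.com/vague": 'Get all your delta blues music <a href="http://www.bluesmusic.com/">here</a>.',
}
print(anchor_text_by_target(pages))
# {'http://www.bluesmusic.com/': ['delta blues music', 'here']}
```

Only the first link tells the target page's index entry anything useful; the second contributes just the word "here".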
Anchor text works best when used within the context of the web page.
It's important to make sure the links actually say something, as in
our example above. That's much better than using "Get all your delta
blues music <a href="http://www.bluesmusic.com/">here</a>."
Don't try to be clever with anchor text. Search engines compare the
relevancy of your content to the links in it, so the links need to make
sense to both the search engine and your visitors. There's not much
point in having a site at the top of the
search engines if you don't have anything for your visitors when they
get there and they have no reason to return. Actually, that type of
site won't last long at the top of the search engines, if it ever makes
it there in the first place.
Anchor text can be very important. Your search engine rankings can
be increased even if the anchor text used by another page does not
appear on your page. Doing a Google search for "miserable failure"
will list three web pages at the top (not counting the paid ad) that
don't have those words anywhere on the page. This practice is known as
Google bombing.
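Why a page can rank for words it never uses is easy to see in a toy scoring model that counts query words both on the page and in the anchor text of inbound links. This Python sketch is not any engine's formula: the `anchor_weight` of 2.0 and the sample texts are hypothetical, and real engines combine many more signals.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z0-9']+", text.lower())


def relevance(page_text, inbound_anchor_texts, query, anchor_weight=2.0):
    """Toy relevance score: on-page matches of each query word, plus
    anchor_weight times its matches in the anchor text of inbound links."""
    on_page = Counter(tokenize(page_text))
    in_anchors = Counter(w for anchor in inbound_anchor_texts for w in tokenize(anchor))
    return sum(on_page[w] + anchor_weight * in_anchors[w] for w in tokenize(query))


query = "miserable failure"
bombed = relevance("Welcome to our home page.", ["miserable failure"] * 3, query)
honest = relevance("A miserable failure of a plan.", [], query)
print(bombed, honest)  # 12.0 2.0
```

The first page never mentions the query words, yet three inbound links carrying them as anchor text give it a higher score than a page that actually uses them.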
Content, Content, Content
As I said, the exact calculations used by search engines, and the
manner in which they're used, are closely guarded secrets. They are
the very foundation of what makes a particular search engine different
from all the rest. Opinions as to which methods search engine companies
use vary even among the leading experts in the field.
Providing high caliber content is one of the most important things
you can do to increase your site's search engine ranking. Doing so
will keep your visitors coming back and they will recommend your site
to others. Eventually, the more important, heavier-weighted links will
point to your site, creating important incoming links. Google puts
it very simply, "make pages for users, not for search engines." According
to Danny Sullivan: "do the basic, simple things that have historically
helped with search engines. Have good titles. Have good content. Build
good links. Don't try to highly-engineer pages that you think will
please a search engine's algorithm. Focus instead on building the best
site you can for your visitors, offering content that goes beyond just
selling but which also offers information, and I feel you should succeed."