Search Engine Optimization
Search Engine
A program that searches documents for specified keywords and returns a list of the documents where the keywords were found. Although search engine is really a general class of programs, the term is often used to specifically describe systems like Alta Vista and Excite that enable users to search for documents on the World Wide Web and USENET newsgroups. Typically, a search engine works by sending out a spider to fetch as many documents as possible. Another program, called an indexer, then reads these documents and creates an index based on the words contained in each document. Each search engine uses a proprietary algorithm to create its indices such that, ideally, only meaningful results are returned for each query.
Search Engine Optimization
Search Engine Optimization, also known as SEO, is the process of increasing the number of visitors to a Web site by ranking high in the search results of a search engine. The higher a Web site ranks in the results of a search, the greater the chance that the site will be visited by a user. Internet users commonly do not click through pages and pages of search results, so where a site ranks in a search is essential for directing more traffic toward the site. SEO helps to ensure that a site is accessible to a search engine and improves the chances that the site will be found by the search engine.
Or
The process of improving web pages so that they rank higher in search engines for targeted keywords, with the ultimate goal of generating more revenue from the web site.
There are many SEO techniques. In general, these techniques can be grouped into two main areas: On Page SEO and Off Page SEO.
On Page SEO
"On Page" SEO simply refers to the text and content on our web site pages. Basically, it means editing our pages and content so that a search engine can find our web page when a surfer is searching for our web site's particular topic.
On Page Elements
On Page Elements refer to the HTML tags within the page. They include Heading Tags (<h1>), Title Tags, Bold Tags and Italic Tags on our web page. For example, the phrase "SEO Company" used in a Heading is <h1>SEO Company</h1> and in Bold is <b>SEO Company</b>.
Off Page SEO
"Off Page" SEO refers to the text and "inbound" linking from OTHER web site pages to our web site pages. This is a much more difficult, if not impossible, factor to control completely.
Off Page Factor
Off Page Factors are factors outside of our normal control of the content of our site. They are often referred to by search engine optimization experts as the "inbound links" our site receives from OTHER websites. It is almost like a popularity contest in that search engines factor in how many other websites have visited our site, thought the information was relevant to their site and "linked to us"
Spider search engines
Search engines compile their databases using "spiders" or "robots" (also called "bots") to crawl through web space from link to link, identifying and perusing pages.
Directory engines : In contrast to spider search engines, directories are not automatically generated but are compiled by editors. A website that is submitted to a directory is subsequently catalogued and linked to one or more topics. As the directories are set up by experienced editors, they generally produce more targeted results. An example is Yahoo.com.
Hybrid search engines : Many search engines today combine a spider engine with a directory service. The directory normally contains pages that have already been reviewed and assessed.
FFA lists (Free For All Links) : Free-for-all link lists are pages in which links to websites can be entered together with a short comment. Website owners enter the registration themselves and there is no editorial board to review the entries; listing is free and becomes effective immediately. Promoting a web site through FFA lists is not very popular because of the poor quality of the listings.
Pay Per Click search engine : A Pay Per Click (also known as Pay Per Ranking, Pay Per Placement or Pay Per Position) search engine enables you to list your site at the top of the search results according to the keyword bid. Examples are Overture.com, 7Search.com, FindWhat.com etc.
Meta : Metasearchers or Metacrawlers do not compile databases. Instead, they search the databases of multiple sets of individual engines simultaneously. Examples are Dogpile, MetaCrawler, Mamma etc.
How Do Search Engines Work?
1) The searcher types a query into a search engine.
2) Search engine software quickly sorts through literally millions of pages in its database to find matches to this query.
3) The search engine's results are ranked in order of relevancy.
All search engines follow this basic process when conducting searches, but because search engines differ, the results are bound to differ depending on which engine you use.
Terms Used in SEO
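The three steps above can be sketched in a few lines of Python. This is only a toy illustration, not how any real engine works: the page collection, the `build_index` and `search` helpers, and the relevance score (a plain count of query word occurrences) are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical three-page collection standing in for a search engine's database.
PAGES = {
    "page1.html": "cheap flights to orlando and cheap hotels",
    "page2.html": "flights and holidays",
    "page3.html": "used cars for sale",
}

def build_index(pages):
    """Map each word to {page: number of occurrences} (an inverted index)."""
    index = defaultdict(dict)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word][url] = index[word].get(url, 0) + 1
    return index

def search(index, query):
    """Return the pages containing every query word, best match first."""
    words = query.lower().split()
    if not words:
        return []
    # Step 2: find the pages that match all words in the query.
    matches = set(index.get(words[0], {}))
    for word in words[1:]:
        matches &= set(index.get(word, {}))

    # Step 3: rank by a toy relevance score (total occurrences of the query words).
    def score(url):
        return sum(index[word][url] for word in words)

    return sorted(matches, key=lambda url: (-score(url), url))

index = build_index(PAGES)
print(search(index, "cheap flights"))
```

Real engines use proprietary ranking algorithms, which is one reason the same query returns different results on different engines.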
Absolute link
When writing a hyperlink from one page to another, an absolute path would be written as <a href="http://www.domain.com/directory/page.html">Link</a>
Address
This is the location where a website or web page can be found on the internet. Example: http://www.PR3.co.uk/ or http://www.PR3.co.uk/seo/seo-glossary.php.
Agent Name
An agent name is the name of the software accessing a web page, e.g. Googlebot/2.1 is one of Google’s spiders or MSIE x.0 is Microsoft Internet Explorer.
Algorithm
An algorithm is a unique formula used to judge the relevance of a web page.
Alt Text
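This is the alternative text that describes an image. It is displayed when the image cannot be loaded, is read aloud by screen readers, and is shown by some browsers as a tooltip when the mouse pointer hovers over the image. The correct syntax is: <img src="/images/image.gif" alt="Place your ALT text here!" width="50" height="50">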
Anchor Text
This is the text that appears in a clickable text link, for example in This Link, the words “This Link” are the anchor text. The correct syntax is: <a href="http://www.seopramod.co.uk">This is anchor text</a>
Backlink
A backlink is a hyperlink pointing to your site from an external source.
Blind Traffic
This is extremely low quality traffic, often sent by low relevance pages.
Black Hat SEO
This is the practice of “unethical” SEO: gaining an advantage over your competition using techniques that are generally against Google’s ‘webmaster guidelines’ and those of other search engines. Examples of black hat techniques include cloaking and doorway pages.
Blog
Short for weblog, a blog is a quick way to publish news and happenings through a browser based admin panel.
Bot
Short for robot. In some cases it is also known as a search engine spider, whose role is to search the web for documents and changes in existing documents, read the contents of the documents and send them to the indexer. Bots are simply pieces of software that automate a data gathering process, from collecting website content to harvesting email addresses.
Browser
Software used to browse the internet. Typical examples include Microsoft Internet Explorer (MSIE x.0) and Firefox.
C-Class IP
An IP is the location of a web server on the internet; typically C-Class IPs are used by SMEs. When building networks for the purpose of SEO, it is important that any cross linking is carried out from servers on different C-Class IPs. If site A links to site B and both sites originate from the same IP, the chances are, depending on the scale of cross linking, that you may receive a penalty. If however you use different C-Class IPs, the only way these sites can be linked back to the originator is through whois data.
Click Through Rate (CTR)
CTR is the number of clicks as a percentage of the number of impressions. For example, if you run a Google sponsored ad that is displayed 800 times in one day and is clicked 10 times, the click through rate (CTR) would be 1.25%.
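As a quick check of the arithmetic, here is a minimal sketch in Python. The function name `click_through_rate` is made up for this illustration.

```python
def click_through_rate(clicks, impressions):
    """Return the CTR as a percentage of impressions."""
    if impressions == 0:
        return 0.0  # no impressions means there is nothing to click on
    return clicks * 100 / impressions

# An ad displayed 800 times and clicked 10 times.
print(f"{click_through_rate(10, 800):.2f}%")  # 1.25%
```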
Cloaking
Cloaking is the use of server side software to detect when a search engine spider is crawling your website. As soon as a spider is detected, it is served a different version of your site from the one your visitors see.
Comment Tag
The comment tag is sometimes used to insert keywords which are only visible in the source code of an HTML document. Some people say they make a difference in search engine ranking; others, myself included, believe that they carry no weight at all. Correct syntax: <!-- Comments go here -->
Conversion Rate (CR)
A conversion rate is a ratio or percentage of how many unique visitors have carried out the main purpose of the website or web page, whether the main purpose is to sell an item or generate new members for your forum. Example: your main purpose is to sell widgets and your site attracts 10,000 unique visitors who generate 300 sales; the CR would be:
300 ÷ 10,000 x 100 = 3% CR (or 1:33, i.e. 1 widget sold for every 33 unique visitors)
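The same calculation as a small Python sketch; `conversion_rate` and `visitors_per_sale` are names invented for this example.

```python
def conversion_rate(conversions, unique_visitors):
    """Return the conversion rate as a percentage of unique visitors."""
    return conversions * 100 / unique_visitors

def visitors_per_sale(conversions, unique_visitors):
    """Return how many unique visitors it takes, on average, to make one sale."""
    return unique_visitors / conversions

print(conversion_rate(300, 10_000))           # 3.0
print(int(visitors_per_sale(300, 10_000)))    # 33, i.e. a ratio of about 1:33
```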
Cookie Cutter
A quick way to create hundreds or thousands of pages in a short time, often using a template file and a database. The pages often redirect to a specific page.
CPA
Cost per action.
CPC
Cost per click. There is a cost per click for all clicks, whether it is from PPC adverts or organic listings from SEO.
CPL
Cost per lead. Similar to CPS, except instead of making a sale, you will be bringing in new clients, for example, loan applicants.
CPM
Cost per thousand. Advertising networks sell advertising in blocks of 1,000 page impressions.
CPS
Cost per sale. This value is how much money needs to be spent to make one sale. In our example above, 1 sale was made out of every 33 unique visitors, so if our CPC is $0.50 our cost per sale would be 33 x $0.50 = $16.50.
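A minimal sketch of the cost per sale arithmetic in Python. The function name is hypothetical, and the figures are the ones from the example above (33 visitors per sale at $0.50 per click).

```python
def cost_per_sale(cost_per_click, visitors_per_sale):
    """Return the money spent on clicks to produce one sale."""
    return cost_per_click * visitors_per_sale

print(f"${cost_per_sale(0.50, 33):.2f}")  # $16.50
```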
Crawl
When a search engine spiders a site, crawling from page to page by following links.
Cross Linking
The method of creating multiple sites and hyperlinking them all together.
CSS (Cascading Style Sheet)
CSS is a way to control how pages appear. From one external file you can control the fonts, colors, sizes, positioning and more.
De-listing
When a web page has been removed from a search engine's index. This may be due to a penalty from the search engine in question.
Dead Link
A hyperlink pointing to a non-existent URL resulting in an error 404.
Description
A meta tag used to describe or summarize the contents of a web page. The correct syntax is: <meta name="description" content="Describe the contents of your web page here!"> Some search engines, such as Yahoo, use the description tag as their snippet.
Directory
A directory categorizes websites by topic and is edited by hand rather than by an automated crawling process. To be included in a directory you must submit your website URL, along with a title, description and sometimes some keywords. Your submission will be reviewed by an editor and may be edited or moved to a more suitable category before being included.
DNS
The Domain Name System (DNS) is what translates a domain name into the IP address of the server/location where a website is stored.
DMOZ
DMOZ or ODP (Open Directory Project) is the largest, most comprehensive human edited directory of the web. It is maintained by a global community of volunteer editors and its results feed many other directories.
Doorway Page
Doorway pages are created to achieve a high ranking position in search engine results often by keyword stuffing. Doorways are against Google's guidelines because they are often mass produced with only minor changes to each page and often use automatic redirects so the user will not see the requested page as he/she will be redirected to somewhere else.
Dynamic Site
A dynamic site uses a database such as MySQL to store and serve its content.
Error 404
When a hyperlink is pointing to a location on the web that doesn't exist, the error returned is an error 404.
FFA
FFA stands for free for all. FFA sites use software to automatically update a links page when someone submits a link to it. No human administration is required and therefore FFA sites are common ground for spammers.
Flash
Flash is software by Macromedia used to create vector based graphics, animations and complete user interfaces. It can be used to create complete websites.
Frames
A technique used to load multiple HTML documents within the same page to give the look of one page when it is really 2 or more pages.
Forum
A forum is an online notice board where users can post questions or post answers or comments to those questions. Popular SEO forums include Search Guild.
Google Dance
When Google updates its index, results in the SERPs fluctuate between data centers. Search terms ‘dance’ up and down the SERPs as the index is updated across over 100,000 servers in around 14 worldwide data centers.
GoogleGuy
GoogleGuy is a Google employee who regularly posts in the Google forum at Webmaster World with the username ‘Googleguy’.
Googlebot
The agent name of Google’s search engine spider which crawls the web to create its searchable index.
Heading Tag (<H1>, <H2>)
An HTML tag used to headline a web page and describe its contents. Correct syntax: <h1>Headline goes here!</h1>
Hidden Text
Text that is the same color as the background, for example white text on a white background. Spammers use this as a technique to stuff a page with keywords that are not visible to its visitors.
.htaccess
.htaccess is an Apache configuration file that holds server configuration instructions for the directory it is placed in.
HTML
Hypertext Markup Language (HTML) is the standard language for creating web pages and is basically a set of instructions used by a browser to enable it to display a web page correctly.
HTML Validation
As with all languages, there is a proper way to use it and an improper way to use it. Valid HTML is HTML that has been written in accordance with W3C standards and uses the correct syntax.
HTTP
Hypertext Transfer Protocol (HTTP) is the way documents are transferred across the web.
HTTPS
HTTPS is a secure protocol to transfer information across the web by encrypting the information before transferring.
Inbound Link (IBL)
An inbound link is a link pointing to your website from an external site.
Indexer
An indexer reads the documents that are fetched by a search engine spider and indexes them according to the words contained in each document as well as external factors such as quantity and quality of inbound links and the anchor text used on those inbound links.
IP Address
An IP address is a numeric address for the location of a server or computer accessing the internet. An IP address is made up of four groups of numbers known as octets, e.g. 212.227.127.86. The first octet determines what class of IP it is; for example, the first octet of a class C IP ranges from 192 to 223. The second and third octets are also used as an identifier and the last octet determines the host.
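The octet rules can be sketched in Python as follows. Both helper names are invented for this illustration, and `same_c_block` assumes the usual SEO reading of "same C-Class": two addresses that share their first three octets. Classful addressing is a historic scheme, so treat this as an illustration of the terminology only.

```python
def ip_class(address):
    """Classify a dotted IPv4 address by its first octet (historic classful scheme)."""
    first = int(address.split(".")[0])
    if 1 <= first <= 126:
        return "A"
    if 128 <= first <= 191:
        return "B"
    if 192 <= first <= 223:
        return "C"
    return "other"

def same_c_block(a, b):
    """Return True when two addresses share their first three octets."""
    return a.split(".")[:3] == b.split(".")[:3]

print(ip_class("212.227.127.86"))                        # C
print(same_c_block("212.227.127.86", "212.227.127.90"))  # True
print(same_c_block("212.227.127.86", "212.227.128.1"))   # False
```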
JavaScript
JavaScript is a client side programming language designed to be embedded into HTML documents. It is used to create simple programs such as clocks and calendars that can be seen on web pages. JavaScript can also be used to create website navigation, but search engines find this difficult to navigate since search engine spiders do not have the luxury of a JavaScript interpreter like those installed in browsers such as Microsoft Internet Explorer.
KEI
Keyword Effectiveness Index from WordTracker.com is a comparison of how many times a keyword is searched for on a monthly basis with the number of competing websites for that term. The higher the KEI the higher the ROI, since there will be a lot of traffic with a minimal number of competitors.
Keyword / Keyword Phrase
A keyword is the subject matter a web page is targeting. A travel agent may sell flights, so flights would be a keyword. However, single words are extremely diluted with competing sites, so web marketers tend to target niche keyword phrases such as cheap flights or, better still, cheap flights to Orlando.
Keyword Density
The number of times a keyword appears on a web page as a percentage of the total number of words used.
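A minimal sketch of the calculation in Python. The `keyword_density` helper, its simple word-splitting rule and the sample sentence are assumptions for this example; tools differ in how they count words and multi-word phrases.

```python
import re

def keyword_density(text, keyword):
    """Return how often a single keyword occurs, as a percentage of all words."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return 0.0
    return words.count(keyword.lower()) * 100 / len(words)

# Hypothetical page text: 8 words, 2 of which are "flights".
text = "Cheap flights to Orlando. Book cheap flights today."
print(keyword_density(text, "flights"))  # 25.0
```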
Keyword Stuffing
Techniques used to stuff keywords into an HTML document, which carry very little or no weight at all. Places that can be stuffed with keywords include the keyword tag, title tag and comment tag.
Keyword Tag
An HTML tag which is used to describe or summarize the page with a series of keywords. The correct syntax is: <meta name="keywords" content="keyword 1, keyword 2, keyword 3">
Link Building
Link building is as it sounds: building up inbound links to a website to increase link popularity.
Link farm
A site that features links in no particular order which are totally unrelated to each other. Links on pages like these offer no link popularity benefit whatsoever.
Link Popularity
The number of inbound links to a website.
Meta Refresh
A method used to automatically redirect a user to another page. Example:
<head>
<meta http-equiv="refresh" content="0;URL=http://www.domain.com/dir/page.html">
</head>
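Meta Search
A search that queries the databases of several individual search engines at once instead of compiling its own database. See the metasearchers described above.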
Meta Tags
HTML statements used for instructions to search engines. Examples include:
<meta name="description" content="Description">
<meta name="abstract" content="Abstraction">
<meta name="keywords" content="Keywords">
<meta http-equiv="Content-Language" content="EN-US">
<meta name="author" content="www.imagewebsolutions.com/">
<meta name="distribution" content="Global">
<meta http-equiv="Pragma" content="no-cache">
<meta http-equiv="Cache-Control" content="no-cache">
<meta name="robots" content="FOLLOW,INDEX">
<meta http-equiv="refresh" content="60">
<meta name="revisit-after" content="1 Days">
<meta name="expires" content="mon, 01 jan 2010 01:30:00 GMT">
<meta name="rating" content="general">
Mirror Sites
A mirror site is a site that exactly duplicates another site. For example, the same content on domain.co.uk as domain.com including pages in directories and sub-directories.
Misspellings
Sometimes people deliberately spell something wrong in order to rank high for it and wait for people to accidentally misspell that same keyword. For example, Dubia should be spelled Dubai; the correct spelling has 17.5 million competing sites whereas the incorrect spelling has just 158,000. Misspellings are generally added in the keyword tag so they won't show in the snippet.
Mod_Rewrite
Mod_Rewrite is an Apache module which allows URLs to be rewritten. Often this is used to convert dynamic URLs with multiple query strings into static URLs. An example of this would be to convert the dynamic URL yourdomain.com/search.php?day=31&month=may&year=2005 to yourdomain.com/search-31-may-2005.htm
This is done by adding the following in a .htaccess file:
Options +FollowSymLinks
RewriteEngine on
RewriteRule search-(.*)-(.*)-(.*)\.htm$ /search.php?day=$1&month=$2&year=$3
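To see what the pattern in the RewriteRule captures, the mapping can be imitated with Python's `re` module. This is only a rough imitation for illustration (Apache uses its own regex engine and request handling), and the `rewrite` function is a made-up name.

```python
import re

# The same pattern as in the RewriteRule above.
RULE = re.compile(r"search-(.*)-(.*)-(.*)\.htm$")

def rewrite(path):
    """Map a static looking path to the dynamic URL, or return it unchanged."""
    match = RULE.search(path)
    if match is None:
        return path
    day, month, year = match.groups()
    return f"/search.php?day={day}&month={month}&year={year}"

print(rewrite("search-31-may-2005.htm"))
# /search.php?day=31&month=may&year=2005
```

The visitor and the spider only ever see the static looking address; the server maps it back to the dynamic script internally.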
MSSQL
Microsoft's relational database (Microsoft SQL Server), used on Windows servers.
MySQL
An open source relational database management system that uses Structured Query Language (SQL).
Natural Listing
When a query is searched for and the result page has loaded, the first few results are sponsored links or PPC (pay per click). These results are paid advertising and below these ads are the organic listings. Organic listings are what SEOs are able to achieve by carrying out various on-page and off-page optimization techniques.
Niche
To specialize in an individual segment of a specific market or to capitalize in an untapped market.
NOFOLLOW
An attribute used in a link to instruct search engines not to follow the link. Correct syntax: <a href="http://www.domain.com/dir/page.php" rel="nofollow">Text Link</a>
ODP
Open Directory Project. See DMOZ.
Off-Page Factors
Factors that alter search engine positions that occur externally on other websites. The most important off-page factor is backlinks and, more importantly, the correct use of keyworded anchor text within the clickable text of the link pointing to a website.
On-Page Factors
Factors that alter search engine positions that occur internally within a page of a website. Such factors could include use of text, headings and sub-headings etc.
Organic Listing
When a query is searched for and the result page has loaded, the first few results are sponsored links or PPC (pay per click). These results are paid advertising and below these ads are the organic listings. Organic listings are what SEOs are able to achieve by carrying out various on-page and off-page optimization techniques.
Orphan
When a web page has no navigable links pointing to it, it is often referred to as an orphan page. Orphan pages are mainly the result of gateway pages, where hundreds of pages are engineered to appear high in the SERPs and often redirect to a homepage.
Outbound Link
This is a link on a web page that points to an external site.
Overture
An advertising network used by search engines such as Yahoo & MSN, which display its PPC adverts in their results pages. Advertisers can bid on the keywords that are searched for on these search engines and the highest bidder will appear higher in the search engine results as a “sponsored result”.
Page Impressions
The number of times pages in your website have been loaded. For example, a user goes to the homepage at yourdomain.com, clicks a link and then goes back to the home page. This user has created three page impressions.
PageRank
PageRank™ is a unique algorithm developed by Google founders Larry Page and Sergey Brin at Stanford University and determines the importance of a web page. The main factor behind the PageRank algorithm is link popularity. If one site links to another site, then Google interprets this link as a vote, the more votes cast, obviously the more important the page must be. When Google calculates PageRank (commonly known as PR) not only does it take into account the number of votes cast, but also how important the actual vote is. For example, if you were to go for a job interview with a good reference from your previous supervisor at hand it would be looked at by the interviewer as a positive point. But if that same reference was from the CEO, it would be considered far superior. A link from a high PR site would increase the importance of the page being linked to.
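The "votes weighted by the importance of the voter" idea can be sketched with a simplified version of the PageRank calculation. This is a textbook-style illustration, not Google's actual implementation: the four-page link graph is hypothetical, and the damping factor of 0.85 and the fixed number of iterations are assumptions of this sketch.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank. `links` maps every page to the list of pages it links to."""
    pages = list(links)
    n = len(pages)
    ranks = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with a small base score.
        new_ranks = {page: (1 - damping) / n for page in pages}
        for page, outlinks in links.items():
            if outlinks:
                # A page splits its vote evenly between the pages it links to.
                share = damping * ranks[page] / len(outlinks)
                for target in outlinks:
                    new_ranks[target] += share
            else:
                # A page with no outbound links spreads its vote over all pages.
                for target in pages:
                    new_ranks[target] += damping * ranks[page] / n
        ranks = new_ranks
    return ranks

# Hypothetical link graph: a, b and d all link to c; c links back to a.
LINKS = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(LINKS)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

Page c collects the most votes and ranks highest. Page a has only one inbound link, but because that link comes from the highly ranked c, a ends up well ahead of b and d, which have no inbound links at all.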
PFI
Pay For Inclusion.
PPC
Pay per click. For instant traffic, pay per click is an excellent approach to search engine marketing. Website owners can bid on keywords and pay a set amount for each click. Clicks can cost as little as $0.05 or as much as $80.00 (for a keyword such as mesothelioma). You can find out how much keywords cost at the Overture Advisor Center.
Query
Another word for a search term.
Query String
In dynamic websites, content is stored in and served from a database, and specific information is stored in specific tables of that database. To display specific content from a table in a web page, the URL of that web page must contain a reference to that table. For example, our directory displays its listings from a database; the categories of our directory are stored in a table named 'c' and each category has an ID, so c=25 displays listings for Web and Internet related sites. The more query strings in a URL, the less likely it will be spidered. For sites with a large number of query strings in their URLs, see also Mod_Rewrite to convert them to static looking URLs.
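For illustration, the query string of a URL can be pulled apart with Python's standard `urllib.parse` module. The directory URL below is hypothetical; the second one is the search URL from the Mod_Rewrite entry.

```python
from urllib.parse import urlsplit, parse_qs

def query_parameters(url):
    """Return the query string parameters of a URL as {name: [values]}."""
    return parse_qs(urlsplit(url).query)

# Hypothetical directory page showing category 25.
print(query_parameters("http://www.domain.com/directory.php?c=25"))
# {'c': ['25']}

# A URL with three query string parameters.
params = query_parameters("http://yourdomain.com/search.php?day=31&month=may&year=2005")
print(len(params))  # 3
```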
Ranking
Ranking is when you achieve a position in the natural listings in search engine results for a keyword you are targeting. When you search for a keyword, the engine will report, for example, results 1 – 10 of about 15,400,000; out of those 15 million results only the first 1,000 are available.
Reciprocal Linking
In your link building campaign some webmasters will only link to your site if they receive a link to their site in return. Linking back to a site is reciprocal linking.
Redirects
The use of specific code to redirect the user to another page, this can be done manually using a redirect script or automatically using a meta refresh.
Referrer
A referrer is the URL of the page that the visitor came from when he entered a website.
Relative Link
When writing a hyperlink from one internal page to another, the relative path is the path to the target web page relative to the location of the current page and would be written like: <a href="../directory/page.html">Link</a> (the two dots represent the parent directory of the current one).
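How a browser or spider resolves such a link can be checked with `urljoin` from Python's standard library. The base page URL below is a made-up example.

```python
from urllib.parse import urljoin

# Hypothetical page on which the links appear.
base = "http://www.domain.com/seo/tools/index.html"

# ".." steps up one directory from the current page's directory (/seo/tools/).
print(urljoin(base, "../directory/page.html"))
# http://www.domain.com/seo/directory/page.html

# A leading slash starts from the root directory of the site instead.
print(urljoin(base, "/directory/page.html"))
# http://www.domain.com/directory/page.html

# A bare file name stays in the current directory.
print(urljoin(base, "page.html"))
# http://www.domain.com/seo/tools/page.html
```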
Robot
Also known as a search engine spider, whose role is to search the web for documents and changes in existing documents, read the contents of the documents and send them to the indexer. Bots are simply pieces of software that automate a data gathering process, from collecting website content to harvesting email addresses.
Robots.txt
Robots.txt is a text file which is stored in the root directory and instructs robots what they can and cannot do. For example, you can instruct all robots not to view the contents of your images directory.
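The images example can be tried out with Python's standard `urllib.robotparser` module, which reads robots.txt rules the way a well behaved robot would. The file contents and URLs below are assumptions made up for the illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt telling all robots to stay out of the images directory.
ROBOTS_TXT = """\
User-agent: *
Disallow: /images/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

print(parser.can_fetch("Googlebot", "http://www.domain.com/images/logo.gif"))  # False
print(parser.can_fetch("Googlebot", "http://www.domain.com/index.html"))       # True
```

Note that robots.txt is only an instruction; it is up to each robot whether it obeys it.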
ROI
Return on investment (or ROI) is the amount of money made from a marketing or advertising campaign relative to the amount spent on it.
RSS
Rich Site Summary / Really Simple Syndication is a method to syndicate your site content. This is done by creating an XML document which summarizes specific site content such as news, blog posts or comments and forum threads. An example of RSS/XML syndication would be PR3.co.uk's Blog Feed.
Sandbox Theory/Effect
The general theory behind the sandbox is that Google filters new websites for between two and six months to discourage spammers and short term link brokers. The sandbox will filter out all backlinks, so a site will on face value appear to be penalized. Some say, myself included, that regardless of the age of a site, the sandbox can be triggered if the site suddenly adds a lot of content and/or backlinks. In that case allintext searches return your site, but regular searches won’t.
Search Engine Optimization
Search engine optimization (or SEO for short) is a set of methodologies aimed at improving the visibility of a website in search engine listings. The aim of search engine optimization is to use various techniques which will enable a website to be placed as high up in the search engine results pages (SERPs) as possible, by developing the on-page and off-page factors that search engines use to create their results. A well optimized website can draw thousands of low cost referrals by achieving a top place in organic results.
Search Query
The search term that is used to find websites in search engines. For example, if I am looking for a used car, I may type ‘used cars Tallahassee’ into Google. In this example, the words ‘used cars Tallahassee’ are the search query.
SEM
SEM stands for Search Engine Marketing.
SEMPO
Search Engine Marketing Professional Organization (SEMPO for short) is a non-profit professional association working to increase awareness and promote the value of Search Engine Marketing worldwide.
SERP(S)
Search Engine Result Page.
Sitemap
A sitemap is a page that lists links to all the different pages in a website, although the general rule of thumb is to keep the number of links on a particular page below 100. If you have more than 100 pages, a link to a secondary sitemap or a category specific sitemap is advisable.
Slurp
Slurp is the agent name of Yahoo’s search engine spider.
Snippet
The snippet is the description of a web page that is used in a search engine result page to describe its listings. In Yahoo, the snippet is generally the description tag of an HTML document. In other search engines such as Google the snippet is taken from the content of the web page according to what the search query was.
Spam
In terms of SEO, spam is the use of techniques on pages that do not adhere to a search engine's TOS. Posting links to a site in newsgroups or forums can also be referred to as spam.
Spider
A piece of software designed to crawl a website or crawl the internet. A spider scans the contents of a web page and follows links to other pages to read. A search engine spider is designed to find new documents or changes in existing documents and send them back to the search engine indexer so it can analyze them through its algorithm.
Spider Trap
When a search engine spider is spidering a dynamic site it can often get trapped in an endless loop of code, this may be due to the script a spider is attempting to read requiring some form of input to continue.
Splash Page
An intro page to a website, often includes nothing more than a company logo and strapline or flash animation with a button or text link directing the user to "Enter the Site".
Stemming
Stemming is an algorithm developed to reduce the words of a search query to their stem or root form. In other words, variations of particular words, such as past tense and plural and singular usage, are taken into account when performing a search. For example, applies, applying & applied all match apply.
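A deliberately naive sketch of the idea in Python. The suffix table and the `naive_stem` name are assumptions for this example; real stemming algorithms use far more elaborate rules and handle many more word forms.

```python
# (suffix, replacement) pairs, checked in order; a toy rule set for illustration.
SUFFIX_RULES = (
    ("ies", "y"),
    ("ied", "y"),
    ("ying", "y"),
    ("ing", ""),
    ("ed", ""),
    ("s", ""),
)

def naive_stem(word):
    """Strip one common English suffix, keeping at least three leading letters."""
    word = word.lower()
    for suffix, replacement in SUFFIX_RULES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: len(word) - len(suffix)] + replacement
    return word

for variant in ("apply", "applies", "applying", "applied"):
    print(variant, "->", naive_stem(variant))  # each line ends in "apply"
```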
Stop Word
Stop words are very common words such as ‘a’, ‘the’, ‘and’ & ‘that’, which are filtered out of your search query.
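A minimal sketch of stop word filtering in Python. The stop word list holds only the four words named above and the function name is hypothetical; real engines use their own, longer lists.

```python
STOP_WORDS = {"a", "the", "and", "that"}

def strip_stop_words(query):
    """Return the words of a query with the stop words removed."""
    return [word for word in query.lower().split() if word not in STOP_WORDS]

print(strip_stop_words("the cheap flights and hotels"))
# ['cheap', 'flights', 'hotels']
```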
Syntax
The correct use of language when coding a website.
Title Attribute
In a text link, a title attribute can be added which, similar to ALT text, describes the link. Correct syntax: <a href="http://www.pr3.co.uk/seo/index.php" title="Title">Text Link</a>
Title Tag
The title tag is one of the most important HTML tags for on-page SEO. It should be used to describe the web page using targeted keywords in no more than 60 characters (including spaces).
TLD
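Top Level Domain. The TLD is the last part of a domain name, such as .com, .net or .biz; country domains such as .co.uk are built on a country code TLD (.uk). Some free hosts provide you with an address under their own domain. For example, with a free account with 50megs.com your address would be “domain.50megs.com”; here 50megs.com is the host's domain and .com is the TLD.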
Traffic
The number of visitors a website receives over a given period, generally this number is classed as unique visitors.
TOS
Terms of Service.
Unique Visitor
When someone visits a website, his/her IP address is logged. The first time that IP is logged on a given day, it is logged as a unique visit; if he/she returns later that day, the visit won’t be counted as another unique visit but as a page impression.
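The difference between unique visitors and page impressions can be sketched by counting a small access log in Python. The log entries and the `daily_traffic` helper are made up for the illustration.

```python
from collections import defaultdict

# Hypothetical access log: (date, visitor IP address) for each page load.
LOG = [
    ("2005-05-31", "212.227.127.86"),
    ("2005-05-31", "212.227.127.86"),  # same visitor returning later that day
    ("2005-05-31", "192.0.2.10"),
    ("2005-06-01", "212.227.127.86"),  # counts as unique again on a new day
]

def daily_traffic(log):
    """Return {date: (unique_visitors, page_impressions)}."""
    ips = defaultdict(set)
    impressions = defaultdict(int)
    for date, ip in log:
        ips[date].add(ip)        # a set keeps each IP only once per day
        impressions[date] += 1   # every page load is an impression
    return {date: (len(ips[date]), impressions[date]) for date in impressions}

print(daily_traffic(LOG))
# {'2005-05-31': (2, 3), '2005-06-01': (1, 1)}
```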
URL
Uniform resource locator (or URL) is the World Wide Web address of a website.
User Agent
A user agent (or agent name) is the name of the software accessing a web page, e.g. Googlebot/2.1 is one of Google’s spiders or MSIE x.0 is Microsoft Internet Explorer.
Whois Data
Registration data such as the company name, address and telephone number when registering a TLD domain name.
White Hat SEO
Refers to the ethical practice of SEO methodologies that adhere to search engine TOS.
XML
eXtensible Markup Language
Different types of Search Engine
Automated Search Engines
Alta Vista
HotBot
Lycos
Geographically Specific Resources
YELL
UKIndex
WWLib
Classified Directories
Yahoo
Galaxy
Subject Specific Gateways
SOSIG
OMNI
ADAM
Meta Search Engines
MetaSearch
Dogpile
Some Generalisations...
|          | Search Engines | Classified Directories | Meta Search Engines | Geog. Specific | Subject Gateways |
| Data     | Robot          | Manually               | Other               | Automated      | Usually          |
| Indexing | Full Text      | Key words              | -                   | Full Text or   | Usually          |
| Query    | Key word       | Browsable              | Key word            | Input Field    | Browsable        |
What Goes into the Index?
In General...
- Automated Search Engines
  - Every word on the page goes into the index.
- Classified Directories
  - The key words provided by the person who submits the page to the Directory go into the index.
  - The page is listed under the Category chosen by the person submitting the page.
Tips for Getting Noticed (by Automated Search Engines)
- Make sure you have descriptive text containing key words on your home page;
- Try to use key words in the title;
- Use important key words early on in the document;
- Always use ALT text for images;
- Make sure you have used as many potential key words as possible;
- Submit all your key pages individually;
- Make sure all important pages are linked from the home page;
- Resubmit regularly;
- Use META tags, especially when descriptive text is limited for some reason.
What do META tags do?
- They can be used to control the summary generated from your page by some search engines (Alta Vista, HotBot, InfoSeek);
- They help provide key words if your page lacks text for some reason - e.g. when a frameset is being specified or there is extensive use of graphics;
- Can improve relevancy but they do not guarantee that your page will appear higher in the list of results.
What do META tags look like?
<HTML>
<HEAD>
<TITLE>Search Engines - An Evaluation</TITLE>
<META NAME="DESCRIPTION" CONTENT="Information Resources Discovery on the World Wide Web: An Evaluation of Tools and Methodologies">
<META NAME="KEYWORDS" CONTENT="search engines, classified directories, meta search engines, subject specific gateways, indexing, boolean syntax">
</HEAD>
<BODY>
....
</BODY>
</HTML>
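To show what an indexer might read from such a page, here is a minimal sketch using Python's standard `html.parser` module. The `MetaTagParser` class is a name invented for this illustration, and the page is a shortened copy of the example above.

```python
from html.parser import HTMLParser

class MetaTagParser(HTMLParser):
    """Collect <meta name="..." content="..."> pairs from an HTML document."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names for us.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        if name and content is not None:
            self.meta[name.lower()] = content

PAGE = """
<HTML>
<HEAD>
<TITLE>Search Engines - An Evaluation</TITLE>
<META NAME="DESCRIPTION" CONTENT="Information Resources Discovery on the World Wide Web">
<META NAME="KEYWORDS" CONTENT="search engines, classified directories, indexing">
</HEAD>
<BODY>
</BODY>
</HTML>
"""

parser = MetaTagParser()
parser.feed(PAGE)
print(parser.meta["description"])
print(parser.meta["keywords"].split(", "))
```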
It is thought that the architecture of WWLib-TNG will resemble the following diagram (but this is an early sketch):
There are essentially six components:
- A Spider that automatically retrieves documents from the Web;
- An Archiver that receives Web pages from the Spider, stores a local copy, assigns to it a unique accession number and generates a new metadata file. It also distributes local copies to the Extractor, Classifier and Builder and adds subsequent metadata generated by the Classifier and the Builder to the assigned metadata file;
- An Extractor that analyses pages provided by the Archiver for embedded hyperlinks to other documents. If found, URLs are passed to the Archiver where they are evaluated to check that they are pointing to locations in the UK, before being passed to the Spider;
- A Classifier that analyses pages provided by the Archiver and generates DDC classmarks;
- A Builder that analyses pages provided by the Archiver and outputs metadata which is stored by the Archiver in the document's metadata file and is also used to build the index database that will be used to quickly associate keywords with document accession numbers;
- A Searcher that accepts query strings from the user and uses them to interrogate the index database built by the Builder. It then uses the resulting accession numbers to retrieve the appropriate metadata and local document copies and then uses all this information to generate detailed results, ranked according to relevance to the original query.