Unit 2


SEO (Search Engine Optimization)

the process of maximizing the number of visitors to a particular website by ensuring that the site appears high on the list of results returned by a search engine.

Index

All the data gathered by the spiders is compiled into an enormous index, which is continually expanded and updated. When a user enters one or more keywords into the search box in his/her browser, the search engine examines its index and finds matches for the required keywords. Of course, unless the search is for an extremely obscure topic, there will be many matches (there is even a game to find a search topic on which Google returns only one result!). A search on "Namibian Economy" today (8/1/2017) yielded 448,000 results, or 'hits', and "Lady Gaga" returned 72,900,000 hits! All of these are web addresses, each of which you could, in theory, click on to see a page relating to Lady Gaga!

Techniques used by Search Engines

- Index
- Page Ranking
- Page Rank algorithm for Google

Basic Search Strategies

Earlier versions of search engines allowed complex Boolean searches (AND, OR, NOT etc.). Although an 'advanced' search option is still usually available (pushed onto a second page), searches have generally been 'dumbed down' so that the searcher is simply confronted with one large box in which to type keywords. We rely on the intelligence of the search engine to interpret, parse, paraphrase or correct the spelling of the request, if necessary. Some users do not even bother to type a URL into the browser address field, even if they know it; they simply enter the name of the site (or something like it) into Google, which is set as their home page, and let Google find it. Nevertheless, mastering some aspects of advanced search can dramatically improve search results. We will start by looking at the basics of the search strategies that you can use to find the information you are looking for, and then at how you can refine those searches. We will be using Google to explain these strategies.

Page Rank algorithm for Google

For Google, by far the most popular search engine, the core algorithm is known; it is called the Page Rank. Incidentally, this algorithm to rank pages was invented by a man with the surname "Page" - Larry Page, the co-founder of Google! The Page Rank algorithm runs iteratively (in repeating cycles) over the number of incoming links: in the first run, pages that have a high number of links to them rank highest. But with only one run this algorithm could easily be tricked: just create a worthless page on the WWW, and then create thousands of pages (called link farms) containing nothing but links to your page. This trick would make your page immediately rank high in the first run of the Page Rank algorithm, and such tricks (called link bombing) were used extensively in the early days of the World Wide Web. Here comes the improvement of the Page Rank algorithm: in the second and further runs, the Page Rank of the pages linking to a page is considered. This makes it important not only to have links to your site, but to have important (i.e., high-ranking) pages link to it. A bogus page containing only links would itself have a low Page Rank, and the original page would rank lower and lower in successive runs of the Page Rank algorithm.
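The iterative idea described above can be sketched in a few lines of Python. This is a toy illustration on a hypothetical four-page graph, not Google's production algorithm; the damping factor of 0.85 comes from the original PageRank paper, and all page names are made up.

```python
# Minimal iterative Page Rank sketch on a toy link graph.
# links: dict mapping each page to the list of pages it links to.

def pagerank(links, iterations=20, d=0.85):
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}          # start with equal ranks
    for _ in range(iterations):
        new = {p: (1 - d) / n for p in pages}   # base rank from damping
        for p, outs in links.items():
            if not outs:                         # dangling page: spread evenly
                for q in pages:
                    new[q] += d * rank[p] / n
            else:                                # share rank among out-links
                for q in outs:
                    new[q] += d * rank[p] / len(outs)
        rank = new
    return rank

# Hypothetical graph: C is linked to by A, B and D, so after repeated
# runs C accumulates the highest rank.
graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(graph)
print(max(ranks, key=ranks.get))  # → C
```

Note how a link-farm page like D, which nothing links to, keeps a low rank itself, so in later iterations its link contributes little — exactly the defence against link bombing described above.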

Searching for information

Let's start by looking at what is available on Google immediately when you type your search phrase. If we type in "Lady Pohamba Hospital", the following appears on the Google page:

1. The number of hits (29,400) and the amount of time it took to find them (0.37 seconds).
2. The Google information box on the right-hand side.
3. Search information categories. For example, if you click on Images, you will only see images related to your search phrase; if you click on News, you will see newspaper articles related to it.

Page Ranking

Most search engines keep their exact ranking algorithms confidential. This is because all web site publishers, especially commercial ones, would like their pages to appear at or near the top of the search rankings, and if they knew exactly how the ranking algorithms worked, they could manipulate their sites to achieve this. Nevertheless, the general principles are known. These include:

- Traffic to the site - does the site get a lot of visits, including recent ones? (Dormant sites will be downgraded in the rankings.)
- Incoming links - do many other web sites link to the site, indicating that it is recognised as being important in the online community?
- Social popularity - has the site received a large number of Facebook 'likes'?

Using quotation marks to narrow your search

You may think it is not needed, but adding quotation marks around the phrase you are looking for narrows the search considerably: without quotes, Google returns pages matching the individual words in any order, whereas with quotes it returns only pages containing the exact phrase.

Search Engines

Sophisticated software which indexes billions of web sites with keywords and retrieves them according to search requests.

Meta search engines

There is a difference, though, with these so-called meta or 'consolidated' search engines, which act like a search 'insurance broker'. A broker takes your insurance enquiry and farms it out to different insurance companies to get the best policy for you. Similarly, a meta search engine takes your search keywords and forwards them to a group of primary search engines, and you then get their consolidated reply - possibly a broader set of results. Examples are search.com and dogpile.com.

Smart Search Engines

What search companies are now talking about are smart engines which 'know' what you are looking for without you having to tell them! This seems to mean that as you begin to type your search, the engine will suggest topics it has starting with those letters. So as soon as you type "Cat...", results for catalyst, catamaran etc. appear in a drop-down list. If the word or phrase you want appears in the list, you can click on it straight away, which saves a little time, but many people will be irritated at being 'second-guessed' in this way. The engine may also remember your search history, so that it knows you are more likely to be interested in catalysts than catamarans (because, say, you previously searched for information on various chemicals).
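The suggestion mechanism can be sketched as a simple prefix lookup over a sorted list of known terms. This is a toy model with a made-up term list; real engines draw suggestions from query logs, popularity data and, as noted above, your own search history.

```python
# Toy autocomplete sketch: suggest indexed terms starting with the
# typed prefix. Binary search (bisect) finds the first candidate fast.
import bisect

terms = sorted(["catalyst", "catamaran", "category", "dog"])

def suggest(prefix, limit=5):
    i = bisect.bisect_left(terms, prefix)   # first term >= prefix
    out = []
    while i < len(terms) and terms[i].startswith(prefix) and len(out) < limit:
        out.append(terms[i])
        i += 1
    return out

print(suggest("cat"))  # → ['catalyst', 'catamaran', 'category']
```

Because the list is sorted, all terms sharing a prefix sit next to each other, so the drop-down can be filled without scanning every term.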

How do web trawlers or spiders work?

When you upload a new web page or site, sooner or later a spider will visit it and note its contents for the engine's index. How does the algorithm become aware of the new page? There are two possibilities. Either someone links to your page from a web site that is already known to the spider: when that page is due to be visited again, the algorithm will recognise the new link and include it in its pages to be visited. Or, if nobody knows about your page yet, you can manually register it with the search engine so that it starts including it in its spider visits. The spiders must continue their journey around the web endlessly, because web sites are continually being updated.
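The link-following discovery process above can be sketched as a breadth-first traversal. This toy version simulates the web with a dictionary and invented page names; a real spider would fetch pages over HTTP, extract links from the HTML, respect robots.txt, and revisit pages on a schedule.

```python
# Minimal crawler sketch: starting from a seed page, follow links
# breadth-first, discovering pages that are linked from known ones.
from collections import deque

# Simulated web: each page maps to the links found on it.
web = {
    "seed.example": ["a.example", "b.example"],
    "a.example": ["b.example", "new-page.example"],
    "b.example": [],
    "new-page.example": [],
}

def crawl(seed):
    seen, queue = {seed}, deque([seed])
    visited = []
    while queue:
        url = queue.popleft()
        visited.append(url)          # here a real spider would index the page
        for link in web.get(url, []):
            if link not in seen:     # a new page is discovered via a link
                seen.add(link)
                queue.append(link)
    return visited

print(crawl("seed.example"))
```

Note that `new-page.example` is found only because `a.example` links to it — a page nobody links to would never be reached, which is exactly why manual registration with the search engine exists.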

Basic search techniques

You can type in whatever you are looking for on the Google search box. However, there are certain strategies which allow you to perform searches more efficiently. We will now discuss them one by one.

Objectives

- describe generally how a search engine works
- evaluate different search engines
- use a search engine for simple and advanced information retrieval

How do search engines work?

A web search engine operates entirely automatically, or algorithmically, without human intervention. It must do, if 30 trillion pages can be searched and a list of results displayed in a fraction of a second. Of course, not every one of the 30 trillion pages is checked when a search is made; instead, an extremely sophisticated indexing system is created. The precise mechanism depends on the particular search engine, but firstly automated browsers called trawlers - or, more appropriately since this is the Web, 'spiders' - spend their lives visiting web pages. The spider will note keywords of the site; collect its title and location and its textual content; and record its metatags: information about the content of the page, such as the author, creation date, and the categories the web page falls into. Information about images on the page can also be recorded - not yet directly, but via the image's text caption or title, if it has one. Google's index now stands at 100 million gigabytes. A good match between the search keywords and the index determines which pages are returned, and repeated matches will, to some extent, reinforce a page's position. In the 'metatag' part of the page code, the page writer can include keywords to 'help' the indexing spider, and this is fine if used with discretion.
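The idea of answering queries from an index rather than scanning every page can be sketched with a tiny inverted index. The documents and page names below are invented for illustration; real engines index billions of pages with far more elaborate data structures.

```python
# Toy inverted index sketch: map each keyword to the set of pages
# containing it, then answer multi-keyword queries by set intersection
# (an implicit Boolean AND, as in a modern one-box search).
from collections import defaultdict

docs = {
    "page1.example": "lady gaga concert review",
    "page2.example": "namibian economy report",
    "page3.example": "lady pohamba hospital news",
}

# Build the index once, ahead of query time.
index = defaultdict(set)
for url, text in docs.items():
    for word in text.split():
        index[word].add(url)

def search(*keywords):
    """Return the pages matching ALL of the given keywords."""
    results = [index.get(w, set()) for w in keywords]
    return set.intersection(*results) if results else set()

print(sorted(search("lady")))          # → ['page1.example', 'page3.example']
print(sorted(search("lady", "gaga")))  # → ['page1.example']
```

Because the index is built in advance, each query only touches the sets for its keywords — this is why adding a second keyword narrows the results and why lookups stay fast regardless of how many pages exist.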

