Web Option Topic
Define the term search engine.
A program that searches for and identifies items in a database that correspond to keywords or characters specified by the user, used especially for finding particular sites on the World Wide Web. A search engine is really a general class of programs; however, the term is often used specifically for systems like Google, Bing and Yahoo! Search that enable users to search for documents on the World Wide Web.
Evaluate methods of searching for information on the web.
A search engine in general is defined as an information retrieval system. As such it can be used for navigational search or for research. These differ in that in navigational search the user usually knows which document they are looking for, while research focuses on finding new, previously unknown documents. Apart from this difference, search engines can differ in their scope and target audience. For instance, web search engines such as Google, Bing, Yahoo! or DuckDuckGo aim to search the entire publicly accessible web (World Wide Web). Other search engines may limit their scope: Google Scholar searches for academic papers only, while Cloud Kite is a search engine for cloud documents from Google Drive. The YouTube search bar can be considered a search engine for videos from that platform only, and a similar concept applies to online shopping websites.
Describe the purpose of a URL
A URL is a URI that identifies a resource and also provides the means of locating it by describing the way to access it. A Uniform Resource Locator (URL), informally termed a web address, is a reference to a web resource that specifies its location on a computer network and a mechanism for retrieving it. A URL is a specific type of Uniform Resource Identifier (URI), although many people use the two terms interchangeably. A URL implies the means to access an indicated resource, which is not true of every URI. URLs occur most commonly to reference web pages (http), but are also used for file transfer (ftp), email (mailto), database access (JDBC), and many other applications.
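A minimal sketch, using Python's standard library, of how a URL encodes both the access mechanism (scheme) and the location of a resource; the URL itself is invented for illustration.

```python
# Parse an example URL into its components to show what a URL expresses.
from urllib.parse import urlparse

url = "https://www.example.org:443/articles/web-science?topic=url#section-2"
parts = urlparse(url)

print(parts.scheme)    # 'https'  -> how to access the resource
print(parts.netloc)    # 'www.example.org:443' -> where it is hosted
print(parts.path)      # '/articles/web-science' -> which resource on the host
print(parts.query)     # 'topic=url' -> parameters passed to the resource
print(parts.fragment)  # 'section-2' -> position within the resource
```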
Explain the functions of a browser.
A software tool for retrieving, presenting, and traversing information resources on the web.
Define the term semantic web
An idea from the World Wide Web inventor Tim Berners-Lee that the Web as a whole can be made more intelligent and perhaps even intuitive about how to serve a user's needs. Although search engines index much of the Web's content, they have little ability to select the pages that a user really wants or needs.
• The semantic web links people, articles, books, songs and other information together in meaningful ways.
• When enough meaningful links and relationships are made, context is created. When a person searches for a word, the context around that word is what gives it meaning. The semantic web provides all these relationships.
• It can start helping you interact with the things you're interested in once it understands what you're interested in.
Discuss the management of issues such as copyright and intellectual property on the web.
As information becomes publicly available on the web, it is important to specify how this information can be used. The most standard way to protect intellectual property is to specify copyright. However, as sharing information becomes more important, various licenses are available. Creative Commons licenses give the freedom to share, adapt, and even use information commercially. They come in different variants: some allow usage without crediting the author, but none allow presenting the work as one's own intellectual property.
Discuss the use of white hat and black hat search engine optimization.
Black hat: uses aggressive SEO strategies that exploit search engines rather than focusing on a human audience; the return is short term. Techniques include:
Keyword stuffing: overuse of keywords on a page. It is the reason search engines no longer rely on meta tags, and it is largely ineffective today as a result.
Link farming: a group of websites that all hyperlink to one another, usually generated by a program.
Hidden text and links: text that cannot be seen by the end user but can be found by the search engine. Search engines usually identify it as search spam.
Blog comment spamming: automated posting of promotional hyperlinks on any kind of publicly accessible online discussion board. Most modern discussion boards allow users to report spam, which reduces its effectiveness.
Content automation: creating the content of a website automatically with a tool or script and publishing it on the website.
Advantages: the website becomes very large (in content, not ranking) in little time; little effort is needed, as the content is generated automatically; sudden growth in traffic.
Disadvantages: a sudden drop of the website in rankings might occur; it violates search engine guidelines; the website may be banned or blacklisted by the search engine.
Scraping: copying content from popular websites, often to attract more visits and sell advertisements.
Paid links: paying for links on other sites to receive more visits.
Doorway pages: simple HTML pages that are fully optimized for search engines. They target specific keywords or phrases for search engines, not for users; when users visit, the page automatically uses JavaScript or the meta refresh property to redirect them to another page.
Cloaking: presenting different content to web spiders than to users, by delivering content based on IP addresses.
White hat:
Guest blogging: writing a blog post for someone else's blog. It increases backlinks to the guest blogger's site and its search engine ranking, but a lot of guest blogging is needed to gain many backlinks, and the effect depends heavily on how "authoritative" the host blog is.
Link baiting: encouraging people to click on a link, usually by writing sensational or controversial content or titles. Such content is highly enticing to read, but it does not increase ranking directly, only through the viewership it attracts, and not everyone falls for "link-baity" content.
Quality content: search engines evaluate the content of a web page, so a page with more information may rank higher. Good content is more valuable in the index, and other pages may link to a page that sets a high standard. Creating good content is time consuming, but in the long run it is worth the effort.
Site optimization: through manipulation of content wording and site structure, tweaking content and meta tags maximizes search engine efficiency.
robots.txt: controls what a crawler may index, preventing irrelevant or redundant information from being indexed as duplicate content (a short robots.txt check is sketched below).
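A hedged sketch of how a well-behaved ("white hat") crawler consults robots.txt before indexing a page, using Python's standard library; the site URL and user-agent name are assumptions for illustration only.

```python
# Check robots.txt before crawling a page.
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.set_url("https://www.example.org/robots.txt")  # illustrative URL
parser.read()  # fetches and parses the robots.txt file

# can_fetch() tells the crawler whether its user agent may index this path
if parser.can_fetch("MyCrawler", "https://www.example.org/private/report.html"):
    print("Allowed to crawl and index this page")
else:
    print("robots.txt disallows this page; skip it")
```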
Describe how web pages can be connected to underlying data sources
Connection strings: a connection string is a string that specifies information about a data source and the means of connecting to it. It is commonly used for database connections.
Client-side cookies: cookies are small files stored on a user's computer. They hold data specific to a website or client and can be accessed by either the web server or the client computer. Cookies contain data values such as first-name and last-name. Once the server or client computer has read the cookie through its respective code, the data in the cookie can be retrieved and used for a website page. Cookies are usually created when a new web page is loaded. Disabling cookies on your computer aborts the writing operation that creates them; however, some sites require cookies in order to function. Cookies are used to transport information from one session on a website to another. They reduce the need for servers with huge amounts of data storage, since cookies are small and efficient.
Server-side databases: a database is an organized collection of data, which allows specific data to be retrieved easily based on queries. Data are usually organized in a way that allows the application to find them easily. There are different logical models for organizing data in a database, e.g. relational models, object models, navigational models and more. A database is accessed (to retrieve data, update them, or administer them) through a database management system (DBMS), such as MySQL, PostgreSQL or MongoDB. These systems usually differ in the database model that they use. The DBMS usually provides a library through which scripts in various languages (e.g. PHP, JavaScript, ASP.NET) can make queries and read or manipulate data (a minimal query is sketched below).
XML: XML is a flexible way to structure data and can therefore be used to store data in files or to transport data. It allows data to be easily manipulated, exported, or imported. This way, websites can also be designed independently from the data content. An example use of XML is RSS feeds, where it is used to store data about a feed.
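A minimal sketch of server-side data access: a script behind a web page queries a database and returns rows that the page would render. The database file, table and column names are assumptions for illustration; a real site would use the library of its DBMS (e.g. a MySQL driver) in the same way.

```python
import sqlite3

# Open (or create) a small local database; the path acts like a connection string.
conn = sqlite3.connect("shop.db")
conn.execute("CREATE TABLE IF NOT EXISTS products (name TEXT, price REAL)")
conn.execute("INSERT INTO products VALUES ('notebook', 3.50)")

# A parameterised query, as a server-side script (PHP, ASP.NET, ...) would issue.
for name, price in conn.execute(
        "SELECT name, price FROM products WHERE price < ?", (10,)):
    print(f"{name}: {price}")  # in practice this would be fed into the HTML page

conn.close()
```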
Distinguish between lossless and lossy compression.
Data compression reduces the amount of space needed to store files. By compressing files, more data can be stored on a single disk or sent using less bandwidth.
Lossless: the data can be retrieved without losing any of the original information. Examples (text compression): keyword encoding, run-length encoding, Huffman encoding.
Lossy: some information is lost in the process of compression. Examples: JPEG compression and MP3 compression.
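A small sketch of the lossless case using Python's zlib: after decompression the data are byte-for-byte identical to the original, which is why lossless methods suit text and binary files.

```python
import zlib

original = b"AAAAAAABBBCCCCCCCCDD" * 100
compressed = zlib.compress(original)
restored = zlib.decompress(compressed)

print(len(original), "->", len(compressed), "bytes")
assert restored == original  # no information was lost
```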
Identify the characteristics of: • internet protocol (IP) • transmission control protocol (TCP) • file transfer protocol (FTP).
TCP and IP together comprise a suite of protocols that carry out the basic functionality of the web.
Internet Protocol (IP): IP is a network protocol that defines the routing of data packets to addresses. Every computer holds a unique IP address, and IP handles the process of getting all data to the destination.
Transmission Control Protocol (TCP): information sent over the internet is broken into "packets" and sent through different routes to reach a destination. TCP creates data packets, puts them back together in the correct order, and checks that no packets were lost.
File Transfer Protocol (FTP): FTP is the protocol that provides the methods for sharing or copying files over a network. It is primarily used for uploading files to a web site, and some download sites use an FTP server; however, HTTP is more common for downloading. When using FTP, the URL begins with ftp:.
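A hedged sketch of TCP and IP in action: the socket below opens a reliable TCP connection to an IP address resolved from a host name, then sends an HTTP request over it. The host example.org is used purely for illustration and the code assumes network access.

```python
import socket

with socket.create_connection(("example.org", 80), timeout=5) as sock:
    request = b"GET / HTTP/1.1\r\nHost: example.org\r\nConnection: close\r\n\r\n"
    sock.sendall(request)          # TCP splits this into packets and keeps them in order
    response = sock.recv(4096)     # first chunk of the reply

print(response.split(b"\r\n")[0])  # e.g. b'HTTP/1.1 200 OK'
```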
Distinguish between the text-web and the multimedia-web
Text-web refers to all text-based web pages, like Wikipedia. Multimedia-web refers to pages that use pictures, videos and sound, like YouTube. It is much easier for Google to index the text-web than it is to index the multimedia-web. More sophisticated tools and techniques that link and interpret the meaning of multimedia pages will make the semantic web much more powerful than Google searches.
Identify the characteristics of the following: • uniform resource identifier (URI) • URL.
A URL is a URI that identifies a resource and also provides the means of locating it by describing the way to access it. Every URL is a URI, but a URI is not necessarily a URL.
Explain why distributed systems may act as a catalyst to a greater decentralization of the web.
A distributed system is a software system in which components located on networked computers communicate and coordinate their actions by passing messages. The components interact with each other in order to achieve a common goal; no single computer acts as the 'boss', since all nodes are on the same level. Distributed systems consist of many different nodes that interact with each other, so they are decentralized by design. The importance of distributed systems for a decentralized web therefore lies in their benefits and disadvantages compared to classic centralized client-server models.
Benefits: higher fault tolerance; stability; scalability; privacy; data portability is more likely; independence from large corporations such as Facebook, Google, Apple or Microsoft; potential for high-performance systems.
Disadvantages: more difficult to maintain; harder to develop and implement; increased need for security.
Personal conclusion: while some decentralized systems such as Bitcoin are gaining traction and other systems like Git or BitTorrent have been around for a long time already, most of the internet is still centralized, as most web applications follow the client-server model, which is further encouraged by corporations wanting to make profit. I found a post from Brewster Kahle's blog on the topic very interesting.
Explain why the effectiveness of a search engine is determined by the assumptions made when developing it.
A search engine returns results based on the algorithms and parameters chosen when it was developed. These algorithms and parameters are based on assumptions, and therefore a search engine can only be effective as long as those assumptions are met. While assumptions can come close to reality, users search in different ways, so it can be hard to make universal assumptions.
Discuss the relationship between data in a meta-tag and how it is accessed by a web crawler.
A meta tag is a special HTML tag that provides information about a web page. Unlike normal HTML tags, meta tags do not affect how the page is displayed. Instead, they provide information such as who created the page, how often it is updated, what the page is about, and which keywords represent the page's content. Many search engines use this information when building their indices. The answer depends on the crawler, but generally speaking: the title tag, not strictly a meta tag, is what is shown in the results, via the indexer; the description meta tag provides the indexer with a short description of the page; the keywords meta tag provides keywords describing the page. While meta tags used to play a role in ranking, this was overused by many pages, and therefore meta tags are no longer considered for ranking by most search engines. Crawlers now mostly use meta tags to compare the keywords and description to the content of the page in order to give it a certain weight. So while meta tags do not play the big role they used to, it is still important to include them.
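A sketch of how a crawler's indexer might read meta tags from a fetched page, using only Python's standard library; the HTML snippet is invented for the example.

```python
from html.parser import HTMLParser

class MetaTagReader(HTMLParser):
    """Collect name/content pairs from <meta> tags."""
    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attrs = dict(attrs)
            if "name" in attrs and "content" in attrs:
                self.meta[attrs["name"]] = attrs["content"]

page = """<html><head>
<meta name="description" content="Notes on web science">
<meta name="keywords" content="search engines, crawlers, SEO">
</head><body>...</body></html>"""

reader = MetaTagReader()
reader.feed(page)
print(reader.meta)   # {'description': ..., 'keywords': ...}
```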
Distinguish between an Ontology and Folksonomy
An ontology is a system for classifying and organizing information. That organization must follow the rules and systems required by the ontology, and is typically performed by "professional" workers, like web developers. It is made of classes, and classes are related to other classes (e.g. Person, mother, teacher, learning, student, child). Classes may have constraints, such as gender, or the fact that a student can only be one person. Classes can be separated into individuals or instances. Ontologies assign meaning and relationships: an ontology is an explicit formal specification of a shared conceptualization.
A folksonomy is a more informal ontology that has evolved through the use of tags posted by ordinary users. A folksonomy may involve specific tools, like "LIKE" buttons and tags, but without specific rules or systems: ordinary "folks" apply tags to websites without following any rules.
Distinguish between ambient intelligence and collective intelligence.
Ambient intelligence: ambient intelligence is closely related to ubiquitous computing (see C.3) and is based on the idea of computing being integrated unobtrusively into the environment, providing intelligent services as people need them. Characteristics commonly attributed to ambient intelligence include (Zelkha et al. 1998; Aarts, Harwig & Schuurmans 2001): embedded (integration of devices into the environment), context aware (personal identification and situational needs of the user), personalized (different responses according to personal interests/preferences), adaptive (they can change in response to you), anticipatory (providing information when adequate). Probably the most common application of ambient intelligence is the concept of a smart home, including automation and smart assistants such as Amazon Echo. The Internet of Things provides a potential platform for making ambient intelligence possible.
Collective intelligence: collective intelligence is the intellectual outcome of a group of people working together, which could range from families to international companies. The internet plays an important role, as it can connect people who would not have connected otherwise. The fact that people can share information on their websites, so that others can find it through search engines like Google, greatly contributes to collective intelligence. Wikipedia is another example of how the internet can bring people together to create high-quality intellectual content. In other areas of knowledge there exist different tools that improve collaboration in a similar way. More examples include: open source software platforms like GitHub, which facilitate the creation of open source software like Linux; Splice, an application that allows musicians to collaborate and share ideas easily; photo sharing platforms; cloud applications.
Outline future challenges to search engines as the web continues to grow.
As the web grows, it becomes harder to filter out the most relevant information, and paid results (ads) play an important role. Some data are becoming more semantic as well, and search engines will need to adapt to this. Since the number of web pages and the number of authors increase rapidly, it is becoming more and more important for search engines to filter the information the user wants. Due to the larger amount of data on the World Wide Web, crawlers have to be designed more efficiently.
Describe the function of the common gateway interface (CGI).
CGI is a standard way for web servers to interface with executable programs installed on a server that generate web pages dynamically. It is the specification that allows web pages to be connected to the server's database. It is a standard protocol for web servers to execute console programs (applications that run from the command line) in order to generate dynamic websites. It defines an interface for the web server (as in the software) to pass user information, e.g. a query, to the application, which can then process it. This passing of information between the web server and the console application is the CGI. Thanks to CGI, a variety of programming languages such as Perl, Java, C or C++ can be used, which allows for fast server-side scripting.
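A minimal sketch of a CGI program, assuming a web server configured to execute it: the server passes the query string in an environment variable and the script writes an HTTP response (headers, blank line, body) to standard output.

```python
#!/usr/bin/env python3
import os
from urllib.parse import parse_qs

# The CGI specification says the server puts the query string in QUERY_STRING.
query = parse_qs(os.environ.get("QUERY_STRING", ""))
name = query.get("name", ["world"])[0]

print("Content-Type: text/html")   # response header
print()                            # blank line separates headers from the body
print(f"<html><body><h1>Hello, {name}!</h1></body></html>")
```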
C.6.10 Discuss how collective intelligence can be applied to complex issues
Climate change: Climate CoLab is a project by the MIT Center for Collective Intelligence where people work with each other to "create, analyze, and select detailed proposals for what to do about climate change".
Finance: in one paper, scientists analyzed collective trends from Twitter posts in order to try to predict stock market indicators like the Dow Jones, NASDAQ and S&P 500. Emotional outbursts appeared to be a good indicator in this early research.
Astronomy: Galaxy Zoo is a project where people contribute to classifying a large number of stars and galaxies.
Reddit Place: Reddit's Place was a project run by Reddit over 3 days, where users of the platform could color a single pixel on a 1,000 by 1,000 pixel canvas every 5 minutes. The result is quite amazing, considering the different interests of different users. While this is certainly not a very serious project, nor a particularly complex issue, it might be considered a creative case of collective intelligence.
Describe how cloud computing is different from a client-server architecture.
Cloud computing is defined by these essential characteristics:
• On-demand self-service
• Broad network access
• Resource pooling
• Rapid elasticity
• Measured service
Cloud computing uses remote servers hosted on the internet to store, manage, and process data rather than a local server or personal computer. Cloud computing shares resources more widely than the client-server architecture. Client-server architecture merely refers to the communication between client and server and the distribution of "responsibility".
Discuss whether power laws are appropriate to predict the development of the web.
Degree distribution
• Another factor to look at when considering connectivity is the degree distribution of a network.
• The degree of a page is the number of connections (links) it has, which can be further categorized into incoming and outgoing links.
• The number of pages decreases as the degree increases. The distribution suggests a power law, although it does not fit the model exactly (a small degree-distribution computation is sketched below).
• The degree of a page is the number of links it has (in/out); the degree distribution records how many pages there are at each degree.
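A sketch of computing an out-degree distribution from a tiny link graph, the kind of data one would plot to see whether it resembles a power law; the graph itself is invented for illustration.

```python
from collections import Counter

links = {                      # page -> pages it links to
    "A": ["B", "C", "D"],
    "B": ["A"],
    "C": ["A"],
    "D": ["A"],
    "E": ["A"],
}

out_degree = {page: len(targets) for page, targets in links.items()}
distribution = Counter(out_degree.values())   # degree -> number of pages

for degree in sorted(distribution):
    print(f"degree {degree}: {distribution[degree]} page(s)")
```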
Outline the different components of a web page.
Head: contains the title, meta tags and other metadata. Metadata describe the document itself or associate it with related resources such as scripts and style sheets.
Title: defines the title shown in the browser's toolbar.
Meta tags: snippets of text that describe a page's content but don't appear on the page itself, only in the page's code. They help search engines find relevant websites.
Body: the main part of the page document, containing headings, paragraphs and all other (visible) content.
Some other typical components:
Navigation bar: usually a collection of links that helps to navigate the website.
Hyperlinks: a hyperlink is a reference to another web page.
Table of contents: might be contained in a sidebar and is used for navigation and orientation within the website.
Banner: area at the top of a web page linking to other big topic areas.
Sidebar: usually used for a table of contents or navigation bar.
Identify the characteristics of the following: • hypertext transfer protocol (HTTP) • hypertext transfer protocol secure (HTTPS) • hypertext mark-up language (HTML) • uniform resource locator (URL) • extensible mark-up language (XML) • extensible stylesheet language transformations (XSLT) • JavaScript. • cascading style sheet (CSS).
Hypertext Mark-up Language (HTML): HTML is the standard markup language used to make web pages. Characteristics:
- Allows for embedded images/objects or scripts
- HTML's predefined tags structure the document
- Tags are marked-up text strings; elements are "complete" tags, with opening and closing tags; attributes modify values of an element
- Typically paired with CSS for style
Cascading Style Sheets (CSS): CSS describes how HTML elements are displayed. It can control the layout of several web pages at once.
Extensible Mark-up Language (XML): XML is a markup specification language that defines rules for encoding documents (to store and transport data) that are both human- and machine-readable. XML, as a metalanguage, supports the creation of custom tags (unlike HTML) using Document Type Definition (DTD) files which define the tags. XML files are data, not software.
Extensible Stylesheet Language Transformations (XSLT): XSLT is a language for transforming XML documents into other XML documents or other formats such as HTML. It creates a new document based on the content of the existing one.
JavaScript: JavaScript is a dynamic programming language widely used to create web resources. Characteristics include:
- Primarily client-side (it can also be run server-side, e.g. with Node.js)
- Supports object-oriented programming styles
- Does not include its own input/output
- Can be used to embed images or documents, create dynamic forms, animation, slideshows, and validation for forms
- Also used in games and applications
Describe the interrelationship between privacy, identification and authentication.
Identification: the process that enables recognition of a user described to an automated data processing system. In human terms, client and merchant engage in mutual identification when they, for example, tell each other their names over the phone.
Authentication: a positive identification with a degree of certainty sufficient for permitting certain rights or privileges to the person or thing positively identified; the act of verifying the claimed identity of an individual, station or originator. In a human contact by phone, the client and merchant might recognize (authenticate) each other by their familiar voices. The classic methods for correlating virtual and physical identities in cyberspace parallel the methods used for authenticating human beings in the physical world. The four categories of authenticating information are:
• What you know -- a password or passphrase, for example (a salted-hash password check is sketched below);
• What you do -- e.g. how one signs one's name or speaks;
• What you are -- e.g. one's face or other biometric attributes such as fingerprints;
• What you have -- e.g. a token such as a key, or a certificate such as a driver's license.
Authorization: granting a user, program, or process the right of access. In the real world, we experience authorization every time a merchant queries our VISA or MasterCard service to see if we are authorized to spend a certain amount of money at their establishment.
Piggybacking: the unauthorized use of an existing session by unauthorized personnel. It is quite commonplace for users to initiate a transaction on a terminal or workstation and then walk away from their unprotected session to do something else. If a dishonest person sits at their place, it is possible to misuse the absent person's session. A common example of piggybacking is the misuse of someone else's e-mail program to send fraudulent messages in the absent person's name. Another example might have the thief stepping into a session to change an order or to have goods sent to a different address but paid for by the session initiator's credit card.
Hijacking: allows an attacker to take over an open terminal or login session from a user who has been authenticated by the system. Hijacking generally takes place on a remote computer, although it is sometimes possible to hijack a connection from a computer on the route between the remote computer and your local computer. Hijacking occurs when an intruder uses ill-gotten privileges to tap into a system's software that accesses or controls the behavior of the local TCP (Transmission Control Protocol). A successful hijack enables an attacker to borrow or steal an open connection (say, a telnet session) to a remote host for his own purposes. In the likely event that the genuine user has already been authenticated to the remote host, any keystrokes sent by the attacker are received and processed as if typed by the user.
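A hedged sketch of the "what you know" factor: a server never stores the password itself, only a salted hash, and authentication re-computes the hash and compares it. The function names and parameters here are illustrative, not a production scheme.

```python
import hashlib, hmac, os

def register(password):
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return salt, digest                      # what the server stores

def authenticate(password, salt, stored_digest):
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return hmac.compare_digest(digest, stored_digest)  # constant-time comparison

salt, stored = register("correct horse battery staple")
print(authenticate("correct horse battery staple", salt, stored))  # True
print(authenticate("wrong guess", salt, stored))                   # False
```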
Private cloud
In a private cloud model a company owns the data centers that deliver the services to internal users only.
Advantages: scalability; self-provisioning; direct control; changing computing resources on demand; limited access through firewalls improves security.
Disadvantages: the same high costs for maintenance, staffing and management; additional costs for cloud software.
Public cloud
In a public cloud, services are provided by a third party and are usually available to the general public over the Internet.
Advantages: easy and inexpensive, because the provider covers hardware, application and bandwidth costs; scalability to meet needs; no wasted resources, since costs are calculated by resource consumption only.
Disadvantages: no control over sensitive data; security risks.
Explain why the web may be creating unregulated monopolies.
In theory the World Wide Web should be a free place where anybody can have a website. However, hosting a website usually comes with a cost: registering a domain name, getting a hosting service or investing in servers oneself, and creating and maintaining the website (which requires technical knowledge or the cost of hiring a web developer). In addition, to reach an audience, further marketing through SEO (see C.2) is usually necessary to get good rankings in search engine results. This means that for the normal individual a traditional website is not the best option. A better alternative is to publish content on an existing platform, e.g. micro-blogging on Twitter, blogging on Wordpress or Blogspot, sharing social updates on Facebook, sharing photos on Flickr, etc. This comes with improved comfort for users. However, it easily leads to unregulated monopolies in the market because users usually stick to one platform. Tim Berners-Lee describes today's social networks as centralized silos, which hold all user information in one place. This can be a problem, as such monopolies usually control a large quantity of personal information which could be misused commercially or stolen by hackers. There are certainly many more concerns which won't fit into the scope of this site. New multinational online oligarchies or monopolies may emerge that are not restricted to one country. Innovation can drop if there is a monopoly; there is therefore a danger of one social networking site, search engine or browser creating a monopoly and limiting innovation. Examples of concerns: dominance in web browsers and cloud computing (Microsoft); Facebook dominating social networking; ISPs favoring some content over other content; mobile phone operators blocking competitor sites; censorship of content.
Net neutrality: the principle that Internet Service Providers (ISPs) and governments should treat all data and resources on the Internet the same, without discrimination due to user, content, platform, or other characteristics.
Evaluate the use of decompression software in the transfer of information
Decompression software restores compressed data. Only with lossless compression is the original file recovered exactly; with lossy compression not every bit is brought back, and some minor details may be missing. It is helpful if you do not have the original file.
Evaluation of lossy compression: significant reduction of file size, which is important for file storage and for transferring data over the internet (e.g. image files can be reduced to around 90% smaller before quality degradation is noticeable). The most important use is streaming multimedia files and VoIP, where bandwidth is usually limited. However, it does not work with all file types: text files or binary data cannot be compressed in a lossy way, as the meaning of the data would be lost. Things to consider: compression speed, decompression speed, compression ratio; think about streaming and reducing file size.
Evaluation of lossless compression: when a compressed file is decompressed, it has the same data/information as the initial file. This is important when compressing installation files and programs, where the information must be identical before compression and after decompression. There is no loss of quality with lossless compression of images and audio files, but the resulting files are larger than with lossy compression.
Text compression
Keyword encoding:
• Frequently used words are replaced with a single character.
• To decompress the document, you reverse the process.
• Limitations: the substitute characters cannot be part of the original text; separate symbols are needed for "The" and "the"; the most frequently used words tend to be short, which limits the savings.
Run-length encoding:
• A sequence of repeated characters is replaced by a flag character, followed by the repeated character, followed by a single digit that indicates how many times the character is repeated, e.g. AAAAAAA becomes *A7 (see the sketch below).
• It can also be used for simple pictures or faxes (black and white pictures, with binary 0 = white and binary 1 = black). One reason it works so well with scanned pages is that the number of consecutive white pixels is huge; in fact, there will be entire scanned lines that are nothing but white pixels.
Huffman encoding:
• Developed by Dr David Huffman.
• If we use only a few bits to represent characters that appear often and reserve longer bit strings for characters that don't appear often, the overall size of the document being represented is smaller.
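A minimal sketch of run-length encoding as described above, assuming a flag character '*' and single-digit run lengths (so runs are capped at 9 characters).

```python
def rle_encode(text, flag="*"):
    """Replace runs of repeated characters with flag + char + run length."""
    result, i = [], 0
    while i < len(text):
        run = 1
        while i + run < len(text) and text[i + run] == text[i] and run < 9:
            run += 1
        if run > 3:                          # only worth encoding longer runs
            result.append(f"{flag}{text[i]}{run}")
        else:
            result.append(text[i] * run)
        i += run
    return "".join(result)

def rle_decode(encoded, flag="*"):
    """Expand flag + char + digit sequences back into the original runs."""
    result, i = [], 0
    while i < len(encoded):
        if encoded[i] == flag:
            result.append(encoded[i + 1] * int(encoded[i + 2]))
            i += 3
        else:
            result.append(encoded[i])
            i += 1
    return "".join(result)

assert rle_encode("AAAAAAA") == "*A7"
assert rle_decode("*A7") == "AAAAAAA"
```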
JPEG Compression:
Lossy compression. With a lot of images (especially photographs), there is no need to store the image exactly as it was originally, because it contains far more detail than anyone can see. Pixels that are very similar in colour are blocked together. This is the advantage of JPEG: it removes information in the image that has little impact on the perceived quality. Furthermore, with JPEG you can choose the trade-off between quality and file size.
Limitation: edges in photos may become blurry as you zoom in.
Define the terms: mobile computing, ubiquitous computing, peer-2-peer network, grid computing.
Mobile computing: laptops, smartphones and tablets; they are portable computers. Common characteristics of mobile computers: wireless access to information, internet access, cloud services, sensors and other data capture (GPS, accelerometer, cameras), and wireless communication with other devices (Bluetooth, Near-Field Communication, internet).
Ubiquitous computing is the notion that computers are everywhere and can communicate with each other. It requires sensors and wireless communication. Ubiquitous computing is roughly the opposite of virtual reality: it forces the computer to live out here in the world with people, integrating computers with human life.
Peer-to-peer network: every computer/node/peer in the network is both a client and a server. Peers are equally privileged, equipotent participants in the application. Peers make a portion of their resources, such as processing power, disk storage or network bandwidth, directly available to other network participants, without the need for central coordination by servers or stable hosts. Peers are both suppliers and consumers of resources, in contrast to the traditional client-server model in which the consumption and supply of resources is divided.
Grid computing: a computer network in which each computer's resources are shared with every other computer in the system. Processing power, memory and data storage are all community resources that authorized users can tap into and leverage for specific tasks. A grid computing system can be as simple as a collection of similar computers running on the same operating system or as complex as inter-networked systems comprising every computer platform you can think of.
• Grid software uses your processor when you're not using it (idle) and runs partial calculations on it.
• You download and install the software, connect to the internet, and the software makes use of your processor to process its own calculations.
• Grid computing treats CPU power as a shared, collective resource (like electricity).
Explain why there needs to be a balance between expressivity and usability on the semantic web.
One of the aims of the semantic web is to give information more expressive power, so that computer systems can understand the meaning of information and its relation to other pieces of information. In theory this should allow for more powerful applications and something similar to a global database. For this reason, expressivity is the guiding factor for how semantic information will be. However, giving information more expressive power may come at the expense of usability. A language like RDF comes with high expressive power, but requires the user to specify relations to other information and to classify a given piece of information, in this case through triples. This might be feasible and convenient for building shared scientific databases, but for common user interactions, such as in social networks, the process would be counterproductive. It is therefore essential for the semantic web to be easy to use, ideally with markup similar to natural language. For non-expert user-generated content, folksonomies probably offer the best usability, as they are as easy as typing in related tags. Folksonomies provide a very low level of expressive power, but they still allow systems to suggest similar content to the user and to identify certain trends. For scientific knowledge databases, more expressive power provides users with better searchability and a potentially improved workflow; usability can usually be sacrificed, because users are experts in their fields anyway. A common tool for webmasters to structure data on their websites is Schema.org, which is currently supported by major search engines such as Google, Bing, Yandex and Yahoo!. This tool provides a common vocabulary for on-page markup, which helps search engines to understand the information on a given page and provide richer search results. For instance, it could be used to display the contact information of a company website right in the web results or to show thumbnail pictures of an article. As not all content needs to be marked up in Schema.org, this allows webmasters to mark up only as much as is usable for them, while providing a high degree of expressive power.
Outline the principles of searching algorithms used by search engines.
PageRank works by counting the number and quality of links to a page to determine a rough estimate of how important the website is. The underlying assumption is that more important websites are likely to receive more links from other websites (a minimal sketch is given below).
Hyperlink-Induced Topic Search (HITS): the idea behind hubs and authorities stemmed from a particular insight into the creation of web pages when the Internet was originally forming: certain web pages, known as hubs, served as large directories that were not actually authoritative in the information they held, but were used as compilations of a broad catalog of information that led users directly to other, authoritative pages. In other words, a good hub points to many other pages, and a good authority is a page linked to by many different hubs.
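A hedged sketch of the PageRank idea: each page's score is spread over its outgoing links and recombined with a damping factor. The tiny graph and iteration count are illustrative only, not how production search engines compute it.

```python
def pagerank(links, damping=0.85, iterations=50):
    pages = list(links)
    rank = {p: 1 / len(pages) for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1 - damping) / len(pages) for p in pages}
        for page, targets in links.items():
            if not targets:                       # dangling page: spread evenly
                for p in pages:
                    new_rank[p] += damping * rank[page] / len(pages)
            else:
                for target in targets:
                    new_rank[target] += damping * rank[page] / len(targets)
        rank = new_rank
    return rank

links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
for page, score in sorted(pagerank(links).items(), key=lambda x: -x[1]):
    print(page, round(score, 3))   # pages with more incoming links score higher
```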
Describe the range of hardware used by distributed networks.
Peer-to-peer: architectures where there are no special machines that provide a service or manage the network resources. Instead all responsibilities are uniformly divided among all machines, known as peers. Peers can serve both as clients and as servers.
Client-server: architectures where smart clients contact the server for data, then format and display it to the users. Input at the client is committed back to the server when it represents a permanent change.
Three-tier: architectures that move the client intelligence to a middle tier so that stateless clients can be used. This simplifies application deployment. Most web applications are three-tier.
The hardware of course depends on the type of distributed system, but most generally speaking, at a low level multiple CPUs need to be interconnected through some network, while at a higher level processes need to be able to communicate and coordinate. For each approach to distributed systems, more specific types of hardware may be used:
Mobile computing: wearables (e.g. Fitbit), smartphones, tablets, laptops, but also transmitters and other hardware involved in cellular networks.
Ubiquitous computing: embedded devices, IoT devices, mobile computing devices, networking devices.
Peer-to-peer computing: usually PCs, but can include dedicated servers for coordination.
Grid computing: PCs and servers.
Content delivery networks (CDNs): systems of distributed servers that can cache content and speed up the delivery of content on a global scale.
Blockchain technologies (e.g. Bitcoin, Ethereum): decentralized and based on multiple peers, which can be PCs but also server farms.
Botnets can probably be considered a form of distributed computing as well, consisting of hacked devices such as routers or PCs.
Evaluate the structure of different types of web pages.
Personal pages are pages created by individuals for personal content rather than for affiliation with an organization. They are usually informative or entertaining, containing information on topics such as personal hobbies or opinions. Blogs (weblogs) are a mechanism for publishing periodic articles on a website. Search engine results pages (SERPs) display the results returned by a search engine for a query. Forums (online discussion boards) are usually organized by topic and let people hold conversations through posted messages; they typically have different user groups which define a user's roles and abilities.
Describe the different types of web page.
Personal pages are pages created by individuals for personal content rather than for affiliation with an organization. They are usually informative or entertaining, containing information on topics such as personal hobbies or opinions. Blogs (weblogs) are a mechanism for publishing periodic articles on a website. Search engine results pages (SERPs) display the results returned by a search engine for a query. Wiki characteristics: a website or database developed by many different users; any user can add and edit content. Forums (online discussion boards) are usually organized by topic and let people hold conversations through posted messages; they typically have different user groups which define a user's roles and abilities.
MP3 Audio format compression
The popularity of MP3 came because it offered a stronger compression ratio than other formats available at the time. It employs both lossy and lossless compression. First it analyses the frequency spread and compares it to mathematical models of human psychoacoustics (the study of the interaction between the ear and the brain), discarding information that can't be heard by humans. Finally, the bit stream is compressed using a form of Huffman encoding to achieve additional compression.
C.6.9 Discuss how ambient intelligence can be used to support people
Pros:
• home care systems for elderly or handicapped people: emergency assistance, autonomy enhancement, comfort
• real-time shopping: improved experience for consumers, who could buy a good when they see it (e.g. a piece of clothing they see someone else wearing)
• personal information provides better means of risk assessment for insurance companies: personalized insurance fees
Cons:
• privacy concerns: how much data is collected and who controls it?
• surveillance
• discrimination: is a personalized insurance fair? Particularly concerning genetic conditions for health insurance
• risk of automating too much: if computer systems fail, this may have a very large impact
• reliability, maintainability, compatibility between systems
• dystopian ideas of Brave New World?
It is also important to consider the devices and technologies that are necessary for ambient intelligence, especially the different types of sensors that allow the system to act based on context. This may include cameras for facial recognition, RFID tags and sensors for accounting (e.g. smart fridges), but possibly also nanotechnology or biometrics.
Explain the importance of protocols and standards on the web.
Protocols are a set of rules for communication that ensure proper, compatible communication so that a certain process can take place successfully, e.g. TCP/IP. Protocols ensure the universality of the web. Standards, on the other hand, are a set of technical specifications that should be followed to allow for functionality, but do not necessarily have to be followed for a process to take place, e.g. HTML. Without them, it would be like communicating in a foreign language without knowing that language. For example, without TCP there would be no transport protocol and packets would be lost; without HTML there would be no standard markup language for web pages, and different web browsers might not display all pages.
Describe the main features of the web graph such as bowtie structure, strongly connected core (SCC), diameter
SCC (strongly connected core):
• a strongly connected core from which, and to which, many nodes lead
• can reach all nodes in OUT
• cannot reach nodes in IN
IN-section:
• made up of nodes that can reach the SCC
• cannot be reached from the SCC
OUT-section:
• made up of nodes that can be reached from the SCC
• cannot reach the SCC
Tubes:
• nodes not part of the SCC
• made up of nodes linking the IN-section to the OUT-section
Tendrils:
• made up of nodes that are not connected to the SCC
• connected to either the IN- or the OUT-section
Diameter:
• there are different definitions; usually the average path length between two random nodes
• usually considered for individual parts (e.g. the SCC) only, as connectivity between parts is limited (i.e. nodes in IN can usually not be reached from OUT)
Describe the different metrics used by search engines
Search engine share of referring visits: how the web page has been accessed (through direct access, referral pages or search engine results). Can indicate how meaningful traffic is.
Search engine referral: different search engines have different market shares; knowing which search engine traffic comes from helps to find potential improvements for certain search engines.
Search terms and phrases: identify the most common search keywords and optimize for them.
Conversion rate by search phrase/term: the percentage of users coming from a search term who sign up.
Number of sites receiving traffic from search engines: as large websites have many pages, it is important to see whether individual pages are being accessed through search engines.
Time taken: time spent by a user on a page after access through the search engine. An indicator of how relevant the page is and which resources were accessed.
Number of hits: a page hit is when a page is downloaded. This counts the visitors to the page and gives a rough idea of its traffic.
Quality of returns: how well a site gets placed in results, i.e. how high it is ranked by search engines.
Quantity of returns: how many pages are indexed by a search engine. The bigger the index, the more pages the search engine can return that have relevance to each query.
Relevance: determined by programs like PageRank, which evaluate the quality of websites and place them high in the index.
User experience: search engines look for the "best" results for the searcher, and part of this is the user experience a site provides. This includes ease of use and navigation; direct and relevant information; professional, modern and compatible design; and high-quality, legitimate and credible content.
Suggest how web developers can create pages that appear more prominently in search engine results.
Search engine optimization: designing, writing, coding and programming your entire website so that there is a good chance that your web pages will appear at the top of search engine queries for your selected keywords and key phrases.
To optimize:
- keyword research: what people are searching for
- placing keywords strategically throughout the site
- a user-friendly site navigation scheme
- a URL structure that search engines can crawl
- objective third-party link development
- keywords in title tags
- rich header tags
- documents rich with relevant text information
- keywords in alt tags
- keywords in internal links pointing to the page
Other ranking factors:
• keywords in the domain and/or URL, and the order in which keywords appear
• domain strength, domain speed, a local/specific domain name, quality hosting, and when the domain was registered
• the strength of the links pointing to the domain, and the topical neighborhood of the domain based on inbound and outbound links
• historical use of, and link patterns to, the domain
• inbound links and referrals from social media; inbound link score
Negative penalties (for SEO purposes): if you use social media for the sole purpose of boosting your rankings, Google might penalise your website; it will not penalise you for using these channels for normal activities. Channels concerned include forums, Facebook, Twitter, Flickr, directory themes, article submissions and blogs. However, avoid:
• too much 1-to-1 link exchange
• paid links
• FFAs (free-for-all backlinks)
Outline the purpose of web-indexing in search engines.
Search engines index websites in order to respond to search queries with relevant information as quickly as possible. For this reason, the search engine stores information about indexed web pages, e.g. keywords, title or description, in its database. This way search engines can quickly identify pages relevant to a search query. Indexing has the additional purpose of giving a page a certain weight, as described under the search algorithms; this way search results can be ranked after being indexed. Indexing allows for speedy searching and high relevancy. Web crawlers retrieve copies of each web page visited, and each page is inspected to determine its ranking for specific search terms. Spiders build Google's index of the web: they fetch a few web pages, then follow the links on those pages, then the links on those pages, and so on. When someone searches for something, Google's software searches its index of the web to find pages that include the searched terms. To narrow down the results, over 200 questions are asked about each page, such as how many times the page contains the keyword, whether the keyword appears in the title or URL, the quality of the page, and its PageRank.
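A sketch of the core data structure behind web indexing: an inverted index maps each keyword to the pages that contain it, so a query becomes a fast lookup instead of a scan of every page. The example pages are invented for illustration.

```python
from collections import defaultdict

documents = {
    "page1.html": "search engines index the web",
    "page2.html": "crawlers fetch pages from the web",
    "page3.html": "compression reduces file size",
}

index = defaultdict(set)
for url, text in documents.items():
    for word in text.lower().split():
        index[word].add(url)          # word -> set of pages containing it

print(index["web"])                   # {'page1.html', 'page2.html'}
```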
Evaluate the use of client-side scripting and server-side scripting in web pages.
Server-side scripting runs on the server; it requires a request to be sent and returns data to the client. It is more secure for the client. Technologies include PHP, JSP, and ASP.NET. Client-side scripting runs on the client's side. It can pose a security risk to the client, but is faster. It includes JavaScript, often exchanging data as JSON.
Server-side: website logic that runs on the server. Common tasks include the processing of search queries, data retrieval from a database and various data manipulation tasks. A good example is an online shop, where items are displayed based on a search query. Once the user decides to buy an item, server-side scripts check the user's credentials and make sure that the shop receives the order. Technologies include CGI and direct execution (e.g. ASP, PHP).
Client-side: scripting that happens in the browser of the client. It is used for animations, form validation and also to retrieve new data without reloading the page, e.g. in a live chat. Some technologies: JavaScript, AJAX, jQuery.
Explain that search engines and web crawling use the web graph to access information.
Simple algorithm: in its basic form the algorithm starts from a seed (a number of pages from which to start) and puts all their URLs in a queue. It then loops through the queue until it is empty, each time dequeuing a URL, requesting its document, and indexing this document while also collecting links from it. These links are added to the queue if they haven't been visited yet (see the sketch below).
Adaptive crawling: a more advanced crawler algorithm prioritizes what to crawl and adapts the queue live so that more relevant information is indexed first. There is an additional stage in the algorithm where the document is analyzed for relevance, so that the queue is reorganized accordingly.
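A minimal sketch of the simple crawler loop described above. The fetch() and extract_links() arguments are placeholders for a real HTTP client and HTML parser, not actual library functions.

```python
from collections import deque

def crawl(seed_urls, fetch, extract_links, limit=100):
    """Breadth-first crawl: dequeue a URL, index its document, enqueue new links."""
    queue = deque(seed_urls)
    visited, index = set(), {}
    while queue and len(visited) < limit:
        url = queue.popleft()
        if url in visited:
            continue
        visited.add(url)
        document = fetch(url)            # request the document (placeholder)
        index[url] = document            # index it (simplified)
        for link in extract_links(document):
            if link not in visited:      # enqueue links not yet visited
                queue.append(link)
    return index
```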
Discuss the use of parallel web crawling.
The size of the web grows, increasing the time it would take to download pages. To keep this reasonable, "it becomes imperative to parallelize the crawling process" (Stanford).
Advantages:
• Scalability: as the web grows, a single process cannot handle everything; multithreaded processing can solve the problem.
• Network load dispersion: as the web is geographically dispersed, dispersing crawlers disperses the network load.
• Network load reduction.
Issues of parallel web crawling:
• Overlapping: parallel web crawlers might index the same page multiple times.
• Quality: if a crawler wants to download "important" pages first, this might not work in a parallel process.
• Communication bandwidth: parallel crawlers need to communicate for the former reasons, which for many processes might take significant communication bandwidth.
• If parallel crawlers request the same page frequently over a short time, they will overload servers.
Explain the role of graph theory in determining the connectivity of the web.
Small world graph
• This is a mathematical graph in which not all nodes are direct neighbors, but any given pair of nodes can be reached via a small number of hops, i.e. with just a few links. This is due to nodes being interconnected through interconnected hubs.
• Properties of a small world graph: the mean shortest-path length is small, and there are many clusters (highly connected subgraphs).
• Analogy: airline flights, where you can most likely reach any city in just under three flights.
• Example: the network of neurons in our brain.
• It maximizes connectivity while minimizing the number of connections.
Six degrees of separation
• This originates from the idea that any human in the world is related to any other over 6 or fewer connections (steps). The idea can be generalized to a graph, where any given pair of nodes within the network can be reached in at most 6 steps.
• Applied to the web graph, it suggests high connectivity regardless of its large size; this does not necessarily make it a small world graph, but implies high connectivity between all nodes.
Web diameter (and its importance)
• The average distance (in steps, as each edge has the same path length) between two random nodes.
• This is important because it indicates how quickly, on average, one can reach some page from any starting page. This matters for crawlers, which want to index as many pages as possible along the shortest paths (a small average-path computation is sketched below). Factors to consider are whether the path is directed or undirected, and that often there is no direct path between two nodes.
Importance of hubs and authorities (link to C.2.3): hubs have a large number of outgoing links, while authorities have a large number of incoming links. For connectivity, this means that a larger number of hubs improves connectivity, while authorities are more likely to decrease connectivity, as they usually do not link to many other pages.
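A sketch of estimating the "diameter" of a tiny directed graph as the average shortest-path length between reachable pairs, using breadth-first search; the graph is invented for illustration.

```python
from collections import deque

graph = {"A": ["B"], "B": ["C", "D"], "C": ["A"], "D": ["E"], "E": []}

def shortest_paths(start):
    """Breadth-first search: distance (in links) from start to every reachable node."""
    dist, queue = {start: 0}, deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist

lengths = [d for node in graph
             for target, d in shortest_paths(node).items() if target != node]
print(sum(lengths) / len(lengths))   # average distance over reachable pairs
```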
Explain the differences between a static web page and a dynamic web page.
Static web pages contain the same content on each load of the page, whereas the content of dynamic web pages can change depending on user input. Static websites are faster and cheaper to develop, host, and maintain, but lack the functionality and easy updating that dynamic websites have. Dynamic web pages include e-commerce systems and discussion boards.
Static websites: sites that rely only on the client side and don't have any server-side programming. This means that all content is available through the HTML, CSS and JavaScript files served to the client; the server does nothing other than serve files. The website can still be dynamic in the sense of JavaScript manipulating the available content, e.g. for animations and such.
Advantages: lower cost to implement; flexibility.
Disadvantages: low scalability; hard to update; higher cost in the long term if content is to be updated.
Dynamic websites: sites that include server-side programming as well, usually to retrieve content dynamically from a database. This allows for data processing on the server and for much more complex applications.
Advantages: information can be retrieved in an organized way from databases; allows for content management systems; low ongoing cost, unless the design changes or extra features are implemented.
Disadvantages: sites are usually based on templates into which information is fed, so sites are less individual; higher initial cost; usually a larger codebase.
Distinguish between the surface web and the deep web.
The surface web, by definition, can be found using normal search engines. The deep web, by definition, cannot. Surface-web search bots rely on the popularity of backlinks; by the nature of the deep web, such backlinks will not be found under normal web conditions, so even if such a backlink exists and the surface web can see it, the search result will likely be buried deep in the result stack. The surface web largely consists of websites of companies, people and bloggers, like uncle Joe who writes about local county history trying to remember what his uncle told him. The common man's authoritative score is questionable, and there is a lot of junk research being offered as good; vetting of sources and citations is very much required and may be difficult. The deep web will have images of court records, census records, and perhaps archives of old newspapers. It is largely academic databases and government archives, which are highly authoritative; vetting of sources is much easier and quicker than on the surface web.
Surface web: pages that can be reached (and indexed) by a search engine; pages that can (eventually) be reached by linking from other sites in the surface web.
Deep web: pages that cannot be reached (and indexed) by a search engine; considerably larger than the surface web. These include: dynamically generated pages (results of queries, pages produced by JavaScript, or content downloaded from servers using AJAX/Flash), password-protected pages, and pages without any inlinks.
Only a fraction of the data on the web is accessible by conventional means. Deep web = invisible web = hidden web = that portion of World Wide Web content that is not indexed by standard search engines. The deep web consists of data that you won't locate with a simple Google search. No one really knows how big the deep web is, but it is hundreds (or perhaps even thousands) of times bigger than the surface web. This data isn't necessarily hidden on purpose; it's just hard for current search engine technology to find and make sense of it. The surface web consists of data that search engines can find and then offer up in response to your queries. But in the same way that only the tip of an iceberg is visible to observers, a traditional search engine sees only a small amount of the information that's available: a measly 0.03 percent.
Distinguish between the internet and World Wide Web (web).
The World Wide Web (WWW) is one set of software services running on the Internet; it represents the actual resources held on the various servers linked to the Internet. The Internet itself is a global, interconnected network of computing devices and the communication links that allow data to be transferred between server and client. This network supports a wide variety of interactions and communications between its devices; the World Wide Web is a subset of these interactions and supports websites and URIs (Uniform Resource Identifiers).
Describe the role of network architecture, protocols and standards in the future development of the web
The future development of the web can only be guessed at, but, linking to topic C.6, one trend is to make data more meaningful in order to create a semantic web. This will certainly require some new standards, but the ideal is that established network architectures, protocols and standards can still be used. It is therefore important for these to be secure enough, extensible and scalable. Scalability is also important as the web grows and cloud applications are deployed at very large scales. As new web applications emerge and more sensitive data are handled, security also plays a very important role. The transition from SSL to TLS is a good example of how some protocols will probably need to be replaced by newer ones - be it because of security flaws or because of design restrictions.
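As a small, hedged illustration of the SSL-to-TLS point, the sketch below uses Python's standard library to open a TLS-protected connection and report the negotiated protocol version; the host name is an illustrative assumption and the output depends on what the server supports.

```python
# Minimal sketch: opening a TLS connection with Python's standard library.
# The host name is an illustrative assumption.
import socket
import ssl

HOST = "example.org"  # assumed host for illustration

# create_default_context() disables the obsolete SSL protocols and enables
# certificate verification, reflecting the SSL -> TLS transition.
context = ssl.create_default_context()

with socket.create_connection((HOST, 443)) as raw_sock:
    with context.wrap_socket(raw_sock, server_hostname=HOST) as tls_sock:
        print("Negotiated protocol:", tls_sock.version())  # e.g. 'TLSv1.3'
        print("Cipher suite:", tls_sock.cipher())
```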
Hybrid cloud
The idea of a hybrid cloud is to use the best of both private and public clouds by combining both. Sensitive and critical applications run in a private cloud, while the public cloud is used for applications that require high scalability on demand. As TechTarget explains, the goal of a hybrid cloud is to "create a unified, automated, scalable environment that takes advantage of all that a public cloud infrastructure can provide while still maintaining control over mission-critical data".
Discuss the effects of a decentralized and democratic web.
The term 'Decentralized Web' is being used to refer to a series of technologies that replace or augment current communication protocols, networks, and services and distribute them in a way that is robust against single-actor control or censorship.
Benefits:
- more control over data: possibly improved privacy
- makes surveillance harder
- avoids censorship
- possibly faster speeds
Issues:
- barrier to usability: difficult for non-technical users to host their own content
- sometimes less practical
- DNS alternatives necessary for legible domain names
- higher maintenance
Describe the aims of the semantic web
- to retrieve a larger variety of information in a more "intelligent" way
- to make the web the ultimate (machine-readable) database, with the facility to link data across different enterprises (see the toy sketch after this list)
- to make the web a highly collaborative medium
- common vocabularies and methods for handling and querying data need to be developed and agreed upon
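The following toy sketch is an assumption for illustration only (not a real RDF/SPARQL implementation): it shows the semantic-web idea of facts stored as machine-readable (subject, predicate, object) triples that software can query and link; the entities and predicates are made up.

```python
# Toy triple store illustrating machine-readable, linkable facts.
triples = [
    ("TimBernersLee", "invented", "WorldWideWeb"),
    ("WorldWideWeb", "runsOn", "Internet"),
    ("GoogleScholar", "isA", "SearchEngine"),
    ("SearchEngine", "indexes", "WebPages"),
]

def query(subject=None, predicate=None, obj=None):
    """Return every triple matching the given pattern (None = wildcard)."""
    return [t for t in triples
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)]

# "What did TimBernersLee invent?"
print(query(subject="TimBernersLee", predicate="invented"))
# "Which things are search engines?"
print(query(predicate="isA", obj="SearchEngine"))
```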
Compare the major features of: • mobile computing • ubiquitous computing • peer-2-peer network • grid computing.
Ubiquitous computing is being driven forward by mobile computing: the idea is spreading and manifesting itself in more and more devices. Peer-to-peer (P2P) networking is more about ensuring connectivity and building a network of shared resources, while grid computing focuses more on infrastructure; both deal with organizing resource sharing within virtual communities. Ubiquitous computing environments are commonly characterized by multi-device interaction (as in P2P and grid computing), but the terms are not synonymous.
Describe how emergent social structures and folksonomies are changing the web:
Users are most interested in bookmarking things they believe are personally important or interesting, and they are often willing to share their knowledge. This creates more connections between web pages. The content of the web is shaped by its users, which can make it disorderly, and trends shape the web.
Smartphones:
- replace the watch
- always reachable
- sharing what you are doing, more expressive communication
Tablets:
- mobile
- replace printed books
Wearables:
- healthier lifestyle
Modifications:
- hard to live without these devices
- they always provide information
- information overflow
- users are more educated about current affairs
Describe how the web is constantly evolving.
Web 1.0
The "readable" phase of the World Wide Web, with flat data. In Web 1.0 there is only limited interaction between sites and web users. Web 1.0 is simply an information portal where users passively receive information without being given the opportunity to post reviews, comments, and feedback.
Web 2.0
The "writable" phase of the World Wide Web, with interactive data. Unlike Web 1.0, Web 2.0 facilitates interaction between web users and sites, so it allows users to interact more freely with each other. Web 2.0 encourages participation, collaboration, and information sharing. Examples of Web 2.0 applications are YouTube, wikis, Flickr, Facebook, and so on.
Web 3.0
The "executable" phase of the World Wide Web, with dynamic applications, interactive services, and "machine-to-machine" interaction. Web 3.0 is a semantic web, which refers to the future. In Web 3.0, computers can interpret information like humans and intelligently generate and distribute useful content tailored to the needs of users. One example of Web 3.0 is TiVo, a digital video recorder whose recording program can search the web and read what it finds to you based on your preferences.
Describe how a web crawler functions.
A web crawler, also known as a web spider, web robot or simply bot, is a program that browses the web in a methodical and automated manner, systematically indexing websites while collecting information about them and copying each site for the index. A bot (web robot) is a software application that runs automated, usually repetitive, tasks or scripts over the Internet, and can do so at a high rate. For each page the crawler finds, a copy is downloaded and indexed; in this process it extracts all links from the page and then repeats the same process for every found link, so that it finds as many pages as possible (a minimal crawler sketch follows this answer).
Limitations:
- crawlers might look at metadata contained in the head of web pages, but this depends on the crawler
- a crawler might not be able to read pages with dynamic content, as crawlers are very simple programs
robots.txt
A crawler consumes resources, and a page might not wish to be "crawled". For this reason robots.txt files were created, in which a site states what should be indexed and what shouldn't.
- a file that specifies the pages on a website that must not be crawled by search engine bots
- the file is placed in the root directory of the site
- the standard for robots.txt is called the "Robots Exclusion Protocol"
- rules can be specific to a particular web crawler, or apply to all crawlers
- not all bots follow this standard: "illegal" bots (malicious bots, malware) can ignore robots.txt
- it is still considered better to include a robots.txt than to leave it out: it keeps bots away from less "noteworthy" content, so more time is spent indexing the important/relevant content of the website
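The sketch below shows the crawl loop described above using only Python's standard library; the start URL and page limit are illustrative assumptions, and a real crawler would also need politeness delays, better error handling and a persistent index.

```python
# Minimal crawler sketch: fetch a page, store a copy, extract links, repeat.
import urllib.request
import urllib.robotparser
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

START_URL = "https://example.org/"   # assumed starting point
MAX_PAGES = 10                       # assumed crawl limit


class LinkExtractor(HTMLParser):
    """Collects the href targets of <a> tags on a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def allowed_by_robots(url):
    """Check the site's robots.txt (Robots Exclusion Protocol)."""
    parts = urlparse(url)
    rp = urllib.robotparser.RobotFileParser()
    rp.set_url(f"{parts.scheme}://{parts.netloc}/robots.txt")
    try:
        rp.read()
    except OSError:
        return True  # no readable robots.txt: assume crawling is allowed
    return rp.can_fetch("*", url)


def crawl(start_url, max_pages):
    frontier, visited, index = [start_url], set(), {}
    while frontier and len(visited) < max_pages:
        url = frontier.pop(0)
        if url in visited or not allowed_by_robots(url):
            continue
        visited.add(url)
        try:
            with urllib.request.urlopen(url, timeout=10) as resp:
                html = resp.read().decode("utf-8", errors="replace")
        except OSError:
            continue
        index[url] = html              # download and "index" a copy
        parser = LinkExtractor()
        parser.feed(html)
        for link in parser.links:      # repeat the process for found links
            frontier.append(urljoin(url, link))
    return index


if __name__ == "__main__":
    pages = crawl(START_URL, MAX_PAGES)
    print(f"Crawled {len(pages)} page(s)")
```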
Discuss how the web has supported new methods of online interaction such as social networking.
Web democratization refers to the way people access and contribute to the Internet. Many early web pages were static, with no way for users to add to or interact with the information. In some ways, many companies thought of the Internet as an extension of television: browsers would look passively at whatever content the web provided. Other companies had different ideas, though. The Web 2.0 philosophy emphasizes the importance of people's interactions with the Internet. Everyone has an opportunity to contribute to the web, and by paying attention to what users are looking for and doing online, a company can provide better service and build customer loyalty. Some web pages absolutely depend upon user contributions; without them there would be no website. Wikis are a good example of this: users can enter information, modify existing data or even delete entire sections. Web 2.0 and the rise of dynamic web pages have allowed user contribution to proliferate greatly and enabled the widespread use of social networking, blogging, and comment sections.
Outline the difference between the web graph and sub-graphs.
Web graph
The web graph describes the directed links between pages in the WWW. It is a directed graph with directed edges: if page A has a hyperlink to page B, this creates a directed edge from page A to page B.
Sub-graph
A sub-graph is a subset of the pages (and the links between them) in the web graph. It can be, for example:
- a set of pages linked to a specific topic, e.g. a Wikipedia article: one topic, but with references (and hyperlinks) to other web pages
- a set of pages that deal with part of an organization
Describe how the web can be represented as a directed graph.
What is a graph? In graph theory, a graph is a set of nodes (also called vertices) that can be connected through edges. Graphs are used to model the relations between objects.
- Vertex: in the web graph, each web page (identified by its URL - Uniform Resource Locator) is represented by a vertex.
- Edge: in the web graph, each hyperlink is represented by a directed edge.
Why is the web graph not complete? A complete graph would mean that every vertex is connected to every other vertex. However, not all web pages are hyperlinked to each other, which is why the web graph is not a complete graph.
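A small sketch of this representation, with hypothetical page names, stores the directed web graph as an adjacency list and checks the completeness property just described:

```python
# A web graph as a directed adjacency list. Page names are hypothetical;
# an edge A -> B means "page A links to page B".
web_graph = {
    "A": ["B", "C"],   # page A links to B and C
    "B": ["C"],        # page B links to C
    "C": ["A"],        # page C links back to A
    "D": [],           # page D has no outgoing links
}

def is_complete(graph):
    """A directed graph is complete if every page links to every other page."""
    pages = set(graph)
    return all(set(links) >= pages - {page} for page, links in graph.items())

print(is_complete(web_graph))  # False: the web graph is not a complete graph
```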
Describe how a domain name server functions.
When you enter a URL into your Web browser, your DNS server uses its resources to resolve the name into the IP address for the appropriate Web server. The Domain Name System (DNS) is a central part of the Internet, providing a way to match names (a website you're seeking) to numbers (the address for the website). Anything connected to the Internet - laptops, tablets, mobile phones, websites - has an Internet Protocol (IP) address made up of numbers. Your favourite website might have an IP address like 64.202.189.170, but this is obviously not easy to remember. However a domain name such as bestdomainnameever.com is something people can recognize and remember. DNS syncs up domain names with IP addresses enabling humans to use memorable domain names while computers on the Internet can use IP addresses.
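A minimal sketch of what this resolution achieves, using Python's standard library, is shown below; the domain name is an illustrative assumption, and the actual lookup is delegated to the operating system's resolver, which queries DNS servers on your behalf.

```python
# Minimal sketch: resolve a human-readable domain name to an IP address.
import socket

domain = "example.org"                      # assumed domain for illustration
ip_address = socket.gethostbyname(domain)   # ask the resolver / DNS
print(f"{domain} resolves to {ip_address}")
```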
Distinguish between interoperability and open standards.
Interoperability can be defined as "the ability of two or more systems or components to exchange information and to use the information that has been exchanged". In order for systems to be able to communicate they need to agree on how to proceed, and for this reason standards are necessary. A single company could build different systems that are interoperable through private standards known only to the company itself. However, for real interoperability between different systems, open standards become necessary (a small data-exchange sketch follows this answer).
Open standards are standards that follow certain open principles. Definitions vary, but the most common principles are:
- public availability
- collaborative development, usually through some organization such as the World Wide Web Consortium (W3C) or the IEEE
- royalty-free
- voluntary adoption
The need for open standards is described well by W3C director and WWW inventor Tim Berners-Lee, who said that "the decision to make the Web an open system was necessary for it to be universal. You can't propose that something be a universal space and at the same time keep control of it."
Some examples of open standards include:
- file formats, e.g. HTML, PNG, SVG
- protocols, e.g. IP, TCP
- programming languages, e.g. JavaScript (ECMAScript)
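As a tiny illustration, the sketch below uses JSON (another widely used open data format, not one of the examples listed above) to show why an openly specified format enables interoperability: any independently written system with a JSON parser can read the data. The record contents are made up for the example.

```python
# Two independently written components can exchange data because they agree
# on an open, publicly specified format (here JSON).
import json

# "System A" produces a record and serialises it to the open format.
record = {"title": "Open standards", "tags": ["interoperability", "W3C"]}
wire_format = json.dumps(record)

# "System B" - written by anyone, in any language with a JSON parser - can
# read the data back without knowing anything about System A's internals.
received = json.loads(wire_format)
print(received["title"], received["tags"])
```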
Describe how folksonomies and emergent social structures are changing the web.
- they can improve search results
- they can be used to detect trends, as for example on Twitter (see the tag-counting sketch after this list)
- they can be used to discover new content: similar tags can be identified and used for content suggestions
- they can be used for a more individual experience, as web services learn what tags a user likes
- they enable customized advertising by analyzing user preferences and interests through tags
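The sketch below is a toy example with made-up tag data; it shows how a folksonomy can be mined by counting user-assigned tags to surface trends and to suggest related content.

```python
# Toy folksonomy: pages with freely chosen user tags.
from collections import Counter

tagged_pages = [
    ("page1", ["python", "tutorial", "webdev"]),
    ("page2", ["webdev", "css"]),
    ("page3", ["python", "datascience"]),
    ("page4", ["webdev", "javascript", "css"]),
]

# Trend detection: the most frequently used tags.
tag_counts = Counter(tag for _, tags in tagged_pages for tag in tags)
print("Trending tags:", tag_counts.most_common(3))

# Content discovery: other pages sharing a tag with a page the user liked.
liked_tags = set(dict(tagged_pages)["page1"])
suggestions = [page for page, tags in tagged_pages
               if page != "page1" and liked_tags & set(tags)]
print("Suggested pages:", suggestions)
```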
Discuss the effects of the use of cloud computing for specified organizations
• Less costly
• Device and location independence
• Maintenance is easier
• Performance is easily monitored
• Security raises interesting issues
Cloud: a wide array of very large data centres with thousands of servers and a power load to rival a small city. If you're buying cloud services, you don't have to buy a new server, rack, or data centre to add users to the service. You just add accounts, user by user, resource by resource, and pay for just how much you use. This is great when you don't want a huge amount of capital investment to be tied up in compute-based physical gear.