Unit 5 - Big Data Vocabulary

***screen scraping

Automated "copying and pasting": using a computer program to copy data from a website.

***descriptive analytics

Analysis that reports what the data plainly show, with no interpretation. Facts. This type of analysis has a lower level of utility, but an extremely high confidence level, because its results can be shown to be true.

***cache

1. To store data locally in order to speed up subsequent retrievals. 2. Reserved areas of memory in a computer that are used to speed up instruction execution, data retrieval, and data updating.

indexing

A common method for keeping track of data so that it can be accessed quickly. Like an index in a book, it is a list in which each entry contains the name of the item and its location.
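
As a rough illustration of the book-index analogy, the sketch below maps each word to the pages where it appears. The `pages` data and the `build_index` helper are made up for this example.

```python
# Hypothetical "book": page number -> text on that page.
pages = {1: "big data sets", 2: "data storage", 3: "big ideas"}


def build_index(pages):
    """Map each word (the item) to the page numbers (its locations)."""
    index = {}
    for page_number, text in pages.items():
        for word in text.split():
            index.setdefault(word, []).append(page_number)
    return index


index = build_index(pages)
locations = index["data"]   # look up a word without rereading every page
```

Once the index is built, finding a word is a single dictionary lookup instead of a scan of all the text.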

***filter bubble

A filter bubble is a state of intellectual isolation that can result from personalized searches when a website algorithm selectively guesses what information a user would like to see based on information about the user, such as location, past click-behavior and search history.

***automated summarization

Automated summarization is a process in which a computer program produces a shortened version of a selected body of text. "This is like reading SparkNotes for a book, except that with SparkNotes someone had to actually read the book and produce the shortened version."

***spider bot

A spider is a program that visits Web sites and reads their pages and other information in order to create entries for a search engine index. The major search engines on the Web all have such a program, which is also known as a "crawler" or a "bot."

***regression analysis

Attempts to find a function that models the data with the least error.
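
One common case is fitting a straight line by least squares, which picks the slope and intercept that minimize the sum of squared errors. The sketch below assumes that case; the `fit_line` helper and the sample points are made up for illustration.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]          # these points lie exactly on y = 2x + 1
slope, intercept = fit_line(xs, ys)
```

Real data rarely falls exactly on a line; the same function then returns the line with the least total squared error.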

usable data

Data that can be understood and used without additional information.

data vs. information

Data are simply facts or figures: bits of information, but not information itself. When data are processed, interpreted, organized, structured or presented so as to make them meaningful or useful, they are called information. Information provides context for data.

data collection

Data collection is the process of gathering and measuring information on targeted variables in an established systematic fashion, which then enables one to answer relevant questions and evaluate outcomes.

***curation of information

Content curation is the process of gathering information relevant to a particular topic or area of interest.

***extraction

Data extraction is the act or process of retrieving data out of (usually unstructured or poorly structured) data sources for further data processing or data storage (data migration).
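
A small sketch of the idea: pulling dates out of free text so they can be stored as structured records. The sample sentence and the `YYYY-MM-DD` date format are assumptions made for this example.

```python
import re

# Hypothetical unstructured source text.
note = "Survey sent 2011-05-02; results in by 2012-01-15, report due soon."

# Assumed date format: four-digit year, two-digit month, two-digit day.
DATE_PATTERN = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")

dates = DATE_PATTERN.findall(note)

# Turn each matched string into a structured record for later processing.
records = [
    {"year": int(d[:4]), "month": int(d[5:7]), "day": int(d[8:])}
    for d in dates
]
```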

data storage

Data storage is the recording of information in a storage medium.

***crowdsourcing

The practice of obtaining information or input into a task or project by enlisting the services of a large number of people, either paid or unpaid, typically via the Internet. Ex: GoFundMe (crowdfunding).

***big data

Extremely large data sets that may be analyzed computationally to reveal patterns, trends, and associations, especially relating to human behavior and interactions.

***generation loss

Generation loss is the loss of quality between subsequent copies or transcodes of data. Anything that reduces the quality of the representation when copying, and would cause further reduction in quality on making a copy of the copy, can be considered a form of generation loss. File size increases are a common result of generation loss, as the introduction of artifacts may actually increase the entropy of the data through each generation.

***predictive analytics

More useful than descriptive analytics, but with a lower confidence level, because it is a prediction. Uses the "hard facts" of descriptive analytics to extrapolate (make inferences about) where unknown data may lie. Ex: Given that 90 of the 100 CS graduates were employed within six months in 2011, it is __ % likely that 108 of the 120 CS graduates in 2012 will be employed within six months.
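
The 108 in the example comes from carrying the 2011 employment rate forward to the 2012 class. The sketch below shows only that extrapolation step; it does not compute the likelihood that the example leaves blank, and the variable names are made up.

```python
# Descriptive facts from 2011.
employed_2011 = 90
graduates_2011 = 100

# Known size of the 2012 class.
graduates_2012 = 120

# Extrapolate: assume the 2012 class is employed at the 2011 rate.
observed_rate = employed_2011 / graduates_2011
predicted_employed_2012 = round(observed_rate * graduates_2012)
```

The prediction is only as good as the assumption that the rate stays the same from one year to the next, which is why its confidence level is lower than that of the underlying facts.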

***utility

Most of the time, we share our personal digital data because we receive something of value in return, which we call utility. This is give-and-take, and most decisions about divulging personal data are relative to their contexts.

knowledge extraction

Knowledge extraction is the creation of knowledge from structured (relational databases, XML) and unstructured (text, documents, images) sources.

data persistence

Persistent data is information that is infrequently accessed and not likely to be modified.

privacy concerns

Privacy is the ability of an individual or group to seclude themselves, or information about themselves, and thereby express themselves selectively.

collaboration

The action of working with someone to produce or create something.

human computation

The word "compute" means to calculate or reckon, so a computer is simply something that calculates or reckons information. Humans have been computers for thousands of years; evidence of counting dates back to 30,000 B.C.E. Only in the past century have we defined "computer" as a digital or mechanical device that calculates for us.

unstructured data

What it sounds like. Data that is messy and has no predefined organization, such as free text, documents, or images.

structured data

What it sounds like. Structured data is information with a high degree of organization, so that it fits seamlessly into a relational database and is readily searchable by simple, straightforward search engine algorithms or other search operations.

data set

A collection of related sets of information that is composed of separate elements but can be manipulated as a unit by a computer.

***relational database

A database structured to recognize relations among stored items of information.
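
As an illustration, the sketch below uses Python's built-in `sqlite3` module with an in-memory database. The `students` and `grades` tables and their contents are invented for this example; the relation between them is the shared student id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")   # temporary database, nothing saved to disk

# Two tables; grades.student_id relates each grade to a row in students.
conn.execute("CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE grades ("
    "student_id INTEGER REFERENCES students(id), course TEXT, score INTEGER)"
)

conn.executemany("INSERT INTO students VALUES (?, ?)", [(1, "Ada"), (2, "Alan")])
conn.executemany(
    "INSERT INTO grades VALUES (?, ?, ?)", [(1, "CS", 95), (2, "CS", 88)]
)

# The JOIN follows the relation to combine rows from both tables.
rows = conn.execute(
    "SELECT students.name, grades.score FROM students "
    "JOIN grades ON grades.student_id = students.id "
    "ORDER BY students.id"
).fetchall()
conn.close()
```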

***concatenation

A series of interconnected things or events. In computing, the joining of two or more strings or other sequences end to end.
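
In programming, concatenation usually means joining strings or lists end to end. A minimal sketch, with made-up values:

```python
# Strings: + joins them end to end, in order.
term = "big" + " " + "data"

# Lists concatenate the same way.
units = ["Unit 4"] + ["Unit 5"]

# str.join concatenates many strings with a separator between them.
joined = ", ".join(["data", "information", "knowledge"])
```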

data processing

A series of operations on data, especially by a computer, to retrieve, transform, or classify information.

***prescriptive analytics

Compiles predictive hypotheses and recommends a plan of action to maximize the likelihood of something happening. Much higher utility level. Confidence is the lowest here, because all of the prediction errors associated with the previous analyses are compounded.

analytics

Information resulting from the systematic analysis of data or statistics.

***reCAPTCHA

reCAPTCHA is a CAPTCHA-like system designed to establish that a computer user is human and, at the same time, assist in the digitization of books. "The 'make sure you are not a robot' system you see when you go to websites."

***citizen science

The collection and analysis of data relating to the natural world by members of the general public, typically as part of a collaborative project with professional scientists.

***visualization

The representation of an object, situation, or set of information as a chart or other image.

storage

The retention of retrievable data on a computer or other electronic system; memory.

