Module 5: Big Data

Volume

- a large amount of data

Variety

- a wide diversity of data types and sources

Velocity

- the speed at which data is generated and processed

Veracity

- the trustworthiness and accuracy of data

Value

- the benefit that big data can bring to the user

Business Intelligence

Business Intelligence (BI) is the process of analyzing raw data to find valuable information that helps improve and better understand the business. Using BI can help make fast and accurate business decisions.

Cloud computing

Cloud computing is a term for computing resources that are stored and run on remote servers. These resources, including software and data, can be accessed from anywhere over the internet.

Data Lake

A data lake is a repository that stores a huge amount of raw data in its original format. While a hierarchical data warehouse stores information in files and folders, a data lake uses a flat architecture to store data. Each item in the repository has a unique identifier and is marked with a set of metadata tags. When a business question arises, the repository can be searched for the relevant items, and the resulting smaller, separate set of data can then be analyzed to help answer it.
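The flat layout described above can be sketched in a few lines of Python. This is only an illustration, not how any particular data lake product works: the `DataLake` class, its `put` and `search` methods, and the sample tags are all hypothetical names chosen for this example.

```python
import uuid


class DataLake:
    """Toy in-memory data lake: a flat store of raw items (hypothetical class)."""

    def __init__(self):
        # Flat architecture: one mapping from identifier to item, no folders.
        self._items = {}

    def put(self, raw, tags):
        """Store raw data unchanged and return its unique identifier."""
        item_id = str(uuid.uuid4())
        self._items[item_id] = (raw, set(tags))
        return item_id

    def search(self, *required_tags):
        """Return the raw items that carry every one of the requested tags."""
        wanted = set(required_tags)
        return [raw for raw, tags in self._items.values() if wanted <= tags]


lake = DataLake()
# Items keep their original format: bytes, CSV text, and so on.
lake.put(b'{"user": 1, "action": "click"}', ["clickstream", "json", "2024"])
lake.put("id,amount\n7,19.99\n", ["orders", "csv", "2024"])
lake.put("id,amount\n8,5.00\n", ["orders", "csv", "2023"])

# A business question ("what were the 2024 orders?") becomes a tag search
# that yields a smaller data set for analysis.
orders_2024 = lake.search("orders", "2024")
```

Searching by tags instead of walking a folder tree is what lets differently formatted items live side by side in one store.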

Artificial Intelligence

Artificial Intelligence is intelligence exhibited by machines. It lets them perform tasks normally reserved for humans, such as speech recognition, visual perception, decision making, and prediction.

Data Scientist

A data scientist is a person who takes structured and unstructured data points and uses strong skills in statistics, mathematics, and programming to organize them. They apply analytical abilities such as contextual understanding, industry knowledge, and awareness of existing assumptions to uncover hidden solutions for business development.

Biometrics

Biometrics is a technology for recognizing people by their physical traits, such as the face or height. It often relies on Artificial Intelligence algorithms.

Data Visualization

Data visualization is useful when a quick overview of a large amount of information is required. Graphs, charts, and diagrams let the user find interesting patterns or trends in the dataset. Visualization also helps with validating data: the human eye can spot unexpected values more easily when they are presented graphically.

SQL

SQL is a language used to manage and query data in relational databases. There are many relational database systems, such as MySQL, PostgreSQL, and SQLite. Each of these systems has its own SQL dialect, which differs slightly from the others.
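As a small illustration of managing and querying data with SQL, the sketch below uses Python's built-in `sqlite3` module with an in-memory SQLite database. The `sales` table, its columns, the sample rows, and the `total_sales_by_region` helper are made up for this example.

```python
import sqlite3


def total_sales_by_region(rows):
    """Load (region, amount) rows into a temporary table and sum them per region."""
    conn = sqlite3.connect(":memory:")
    try:
        # Manage data: create a table and insert rows.
        conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
        conn.executemany("INSERT INTO sales (region, amount) VALUES (?, ?)", rows)
        # Query data: aggregate the amounts for each region.
        cursor = conn.execute(
            "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
        )
        return cursor.fetchall()
    finally:
        conn.close()


sample_rows = [("north", 120.0), ("south", 80.0), ("north", 30.0)]
result = total_sales_by_region(sample_rows)
```

The statements used here are basic enough to run largely unchanged on other systems, but more advanced features are where the dialects tend to diverge.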

Internet of Things

The Internet of Things, or IoT for short, is the concept of connecting devices, such as house lighting, heating, or even fridges, to a common network. These devices generate large amounts of data, which can be used in real-time analytics. The term is also associated with the smart home, the concept of controlling a house with a phone or similar device.

Data Warehouse

A data warehouse is a system that stores data so that it can be analyzed and processed later. The data sources vary depending on the warehouse's purpose: data can be loaded from the company's CRM systems or imported from external files or databases.

Data Mining

Data mining is an analytical process designed to study large data resources in search of regular patterns and systematic relationships between variables, and then to evaluate the results by applying the detected patterns to new subsets of data. The final goal of data mining is usually prediction, for example of customer behavior, sales volume, or the likelihood of customer loss.
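The two stages mentioned above, finding a pattern and then checking it against a new subset of data, can be sketched with a simple least-squares line fit. This is a minimal illustration only: real data mining uses far richer methods, and the figures (advertising spend against units sold) and function names are invented for the example.

```python
def fit_line(xs, ys):
    """Find the straight-line relationship y = slope * x + intercept by least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mean_abs_error(slope, intercept, xs, ys):
    """Average absolute gap between the predicted and the actual values."""
    return sum(abs(slope * x + intercept - y) for x, y in zip(xs, ys)) / len(xs)


# Hypothetical data: advertising spend (x) and units sold (y).
train_x, train_y = [1, 2, 3, 4], [12, 14, 16, 18]
new_x, new_y = [5, 6], [20, 23]

# Step 1: detect the pattern in the existing data.
slope, intercept = fit_line(train_x, train_y)
# Step 2: evaluate the pattern by applying it to a new subset of data.
error_on_new_data = mean_abs_error(slope, intercept, new_x, new_y)
```

A low error on the new subset suggests the pattern generalizes, while a high one suggests it only described the original data.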

Machine Learning

Machine learning is the ability of computers to acquire new skills without being explicitly programmed for them. In practice, this means algorithms that learn from the data they process and use what they have learned to make decisions. Machine learning is used to exploit the opportunities hidden in big data.
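One of the simplest ways to see "learning from data" in code is a nearest-neighbor classifier, sketched below. It is just one illustrative technique among many. The customer features (monthly visits, support tickets) and the labels are hypothetical.

```python
def nearest_neighbor_label(training_data, point):
    """Predict a label by copying the label of the closest known example."""

    def squared_distance(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    _, label = min(training_data, key=lambda example: squared_distance(example[0], point))
    return label


# Hypothetical examples: (monthly_visits, support_tickets) -> outcome.
training_data = [
    ((20, 0), "stays"),
    ((18, 1), "stays"),
    ((2, 5), "leaves"),
    ((1, 4), "leaves"),
]

# No rule such as "few visits means the customer leaves" is written anywhere;
# the decision comes entirely from the examples.
prediction_a = nearest_neighbor_label(training_data, (17, 1))
prediction_b = nearest_neighbor_label(training_data, (3, 6))
```

Changing the examples changes the predictions without touching the code, which is the sense in which the skill is not programmed directly.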

Neural Network

Neural networks are a series of algorithms that recognize relationships in datasets through a process loosely modeled on the way the human brain works. An important property of these systems is that they can adapt to changing input, so they can produce the best possible result without the output criteria having to be redesigned. Neural networks are widely used in finance, for instance in attempts to forecast stock market prices.
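A full neural network is beyond a flashcard, but its basic building block, a single artificial neuron, fits in a short sketch. The example below trains one neuron with the classic perceptron rule to reproduce the logical AND of two inputs. The function names, learning rate, and epoch count are choices made for this illustration.

```python
def predict(weights, bias, inputs):
    """Fire (return 1) when the weighted sum of the inputs is positive."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0


def train_perceptron(examples, epochs=20, learning_rate=1):
    """Adjust the weights a little after every wrong prediction."""
    weights = [0] * len(examples[0][0])
    bias = 0
    for _ in range(epochs):
        for inputs, target in examples:
            error = target - predict(weights, bias, inputs)
            weights = [w + learning_rate * error * x for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias


# Training data for logical AND: the output is 1 only when both inputs are 1.
and_examples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(and_examples)
learned_outputs = [predict(weights, bias, inputs) for inputs, _ in and_examples]
```

The relationship between inputs and output is never stated in the code; the neuron arrives at suitable weights from the examples alone. Real networks stack many such units in layers and use other training methods.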

Hadoop

When people think of big data, they often think of Hadoop. Hadoop (with its cute elephant logo) is an open-source software framework that includes the Hadoop Distributed File System (HDFS) and allows for the storage, retrieval, and analysis of very large data sets using distributed hardware.

