Data Science Terminology

Natural Language Processing (NLP)

a field of computer science concerned with interactions between computers and human languages.

Extract, Transform and Load (ETL)

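a process for populating a database or data warehouse: data is extracted from source systems, transformed into a consistent format, and loaded into the target store.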
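
The three ETL stages can be sketched in a few lines of Python. This is only an illustration using the standard library: the CSV content, the `sales` table, and the function names `extract`, `transform` and `load` are hypothetical and not taken from any particular tool.

```python
import csv
import io
import sqlite3

# Hypothetical raw export; in practice this would come from a source system.
RAW_CSV = """name,amount
alice, 10.50
BOB,3
"""

def extract(text):
    """Extract: read rows from CSV text into dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: normalize name casing and convert amounts to floats."""
    return [(r["name"].strip().title(), float(r["amount"])) for r in rows]

def load(records, conn):
    """Load: insert the transformed records into a table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (name TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", records)
    conn.commit()

conn = sqlite3.connect(":memory:")  # in-memory database stands in for a warehouse
load(transform(extract(RAW_CSV)), conn)
rows = conn.execute("SELECT name, amount FROM sales ORDER BY name").fetchall()
print(rows)
```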

Event Analytics

a process that shows the series of steps that led to an action.

Data Science

a recent term that has multiple definitions but is generally accepted as a discipline that incorporates statistics, data visualization, computer programming, data mining, machine learning and database engineering to solve complex problems.

Discriminant Analysis

a statistical analysis that takes advantage of known groups or clusters in data to derive a classification rule. It involves cataloguing the data as well as distributing it into groups, classes or categories.

Regression Analysis

a statistical technique for modeling the dependency of a continuous response variable on one or more predictor variables. It assumes a one-way effect from the predictors to the response.
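
As a rough illustration, the simplest case (one predictor, one response, a straight line) can be fitted by ordinary least squares. The helper name `fit_line` and the sample data are assumptions made for this sketch.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the covariance of x and y divided by the variance of x.
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up data that lies exactly on y = 2x + 1.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)
```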

Correlation Analysis

a statistical technique for determining a relationship between variables and whether that relationship is negative or positive.
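
One common way to measure such a relationship is the Pearson correlation coefficient, which ranges from -1 (perfectly negative) to +1 (perfectly positive). The sketch below computes it by hand; the function name `pearson` and the sample data are illustrative assumptions.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Made-up data: scores rise with study hours, errors fall.
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 70]
errors = [9, 7, 6, 4, 1]

print(pearson(hours, scores))  # positive relationship
print(pearson(hours, errors))  # negative relationship
```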

Classification Analysis

a systematic process for obtaining important and relevant information about data using classification algorithms.

Big Data

data sets with sizes beyond the ability of commonly used software tools to capture, curate, manage and process.

Data Modeling

development of a graphic representation that defines the structure of data, either to communicate the data needed for business processes between functional and technical people, or to communicate to an application development team how data will be stored and accessed.

Exploratory Analysis

finding patterns within data without standard procedures or methods. It is a means of discovering the data and finding the data set's main characteristics, and it constitutes an important part of the data science process.

Topological Data Analysis

focusing on the shape of complex data and identifying clusters and any statistical significance that is present within that data.

Behavioral Analytics

investigates patterns of human behavior in the data.

Data Engineer

Develops and manages infrastructure that deals with big data. A specialist in Data Wrangling. Well versed in tools such as Hadoop, NoSQL databases and MapReduce. Sets up data pipelines.

EDA

Exploratory Data Analysis - a first step in exploring data, carried out before any statistical modeling or inference.

Data Analyst

Collects data, visualizes it with various tools and looks for patterns and insights. Knows basic statistics and has business/domain knowledge.

Data Cleansing

The repair of raw data that has missing values, bad delimiters or inconsistent records so that it is syntactically and semantically correct.
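
A minimal sketch of the idea, assuming hypothetical `name,age` records in which some rows use a semicolon instead of a comma, some ages are missing, and name casing is inconsistent. The `clean_record` helper and its repair rules are illustrative choices only.

```python
def clean_record(raw):
    """Return a cleaned (name, age) tuple, or None if the record cannot be repaired."""
    # Repair a bad delimiter: treat semicolons as commas.
    parts = [p.strip() for p in raw.replace(";", ",").split(",")]
    if len(parts) != 2 or not parts[0]:
        return None  # wrong number of fields or no name: drop the record
    name, age_text = parts
    # Missing or non-numeric ages become None rather than a guessed value.
    age = int(age_text) if age_text.isdigit() else None
    # Make name casing consistent.
    return (name.title(), age)

raw_records = ["alice,30", "BOB;41", "carol,", ",25", "dave,unknown"]
cleaned = [r for r in map(clean_record, raw_records) if r is not None]
print(cleaned)
```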

Data Wrangling

Transforming raw data into a form suitable for analysis, for example by combining multiple datasets, removing inconsistencies or converting to a specific format.

Risk Analysis

the application of statistical methods to one or more datasets to determine the likely risk of a project, action or decision.

Machine Learning (ML)

the field of computer science related to the development and use of algorithms that enable machines to learn from data and experience and to improve over time.

Artificial Intelligence (A.I.)

the field of computer science related to the development of machines and software that are capable of perceiving their environment and taking appropriate action when required (in real-time), even learning from those actions.

Predictive Analysis (Predictive Analytics)

often regarded as one of the most valuable kinds of analysis within big data, as it helps predict what someone is likely to buy, visit or do, as well as how someone will behave in the (near) future. It uses a variety of data sets, such as historical, transactional, social or customer profile data, to identify risks and opportunities.

Time Series Analysis

the process of analyzing data obtained through repeated measurements over time. The data has to be well-defined and measured at successive points in time spaced at identical intervals.
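
As a small illustration, the sketch below checks that a series is evenly spaced and then smooths it with a simple moving average, one of many possible time series techniques. The helper names, the timestamps (in seconds) and the readings are made up for the example.

```python
def is_evenly_spaced(timestamps):
    """True if all consecutive timestamps are separated by the same interval."""
    gaps = {b - a for a, b in zip(timestamps, timestamps[1:])}
    return len(gaps) <= 1

def moving_average(values, window):
    """Average each run of `window` consecutive values."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]

timestamps = [0, 60, 120, 180, 240]  # one reading per minute
readings = [10, 12, 11, 15, 14]

if is_evenly_spaced(timestamps):
    smoothed = moving_average(readings, window=3)
    print(smoothed)
```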

Root-Cause Analysis

the process of determining the main cause of an event or problem.

Data Mining

the process of finding certain patterns or information from data sets in an automated way. This is one popular way to perform data exploration.

Clustering Analysis

the process of identifying objects that are similar to each other and grouping them in order to understand the differences and the similarities within the data.

Anomaly Detection

the systematic search for data items in a dataset that deviate from a projected pattern or expected behavior.
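
One simple way to do this, among many, is to flag values whose z-score (distance from the mean in standard deviations) exceeds a threshold. In the sketch below the function name `find_anomalies`, the threshold of 2.0 and the sample data are all assumptions chosen for illustration.

```python
import statistics

def find_anomalies(values, threshold=2.0):
    """Return values whose absolute z-score exceeds `threshold`."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)  # population standard deviation
    if stdev == 0:
        return []  # all values identical: nothing deviates
    return [v for v in values if abs(v - mean) / stdev > threshold]

# Made-up readings with one obvious outlier.
values = [10, 11, 9, 10, 12, 10, 50]
anomalies = find_anomalies(values)
print(anomalies)
```

Note that a large outlier inflates the standard deviation itself, so a z-score rule like this can miss anomalies in small samples.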

Business Intelligence

the theories, methodologies and processes used to make data understandable and more actionable.

