data science terms

Model Steps to Solve a Problem

Iterative process: prepare the data; build the model (from scratch or from an existing resource); train the model; deploy the model; use the model

Supervised Learning

data is labeled, and the model is trained to make correct predictions; part of ML; ex: regression, classification

Random Forest

An ensemble algorithm for regression or classification that uses a collection of decision trees; for classification the trees "vote" on the predicted class, and for regression their predictions are averaged
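
A minimal sketch of just the voting (and averaging) step, in Python. The three "trees" are hypothetical hand-written rules standing in for decision trees that a real random forest would learn from bootstrap samples of the training data; the feature names are made up for illustration.

```python
from collections import Counter

# Hypothetical stand-ins for trained decision trees: each takes a feature
# dict and returns a class label. A real random forest learns these trees
# from random (bootstrap) samples of the training data.
def tree_a(x):
    return "spam" if x["num_links"] > 3 else "ham"

def tree_b(x):
    return "spam" if x["has_offer"] else "ham"

def tree_c(x):
    return "spam" if x["num_exclamations"] > 5 else "ham"

forest = [tree_a, tree_b, tree_c]

def forest_classify(x):
    """Classification: every tree votes and the most common label wins."""
    votes = Counter(tree(x) for tree in forest)
    return votes.most_common(1)[0][0]

def forest_regress(predictions):
    """Regression: the forest averages the trees' numeric predictions."""
    return sum(predictions) / len(predictions)

email = {"num_links": 5, "has_offer": True, "num_exclamations": 1}
print(forest_classify(email))  # spam (two of the three trees vote "spam")
```

Using an odd number of trees avoids ties in this two-class example.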

DAX

Data Asset Exchange, created by IBM; a place to find open data sets for enterprise applications (images, video, text, audio, etc.); the data sets are normally easier to adapt and license, and there are even videos on how to do everything

DDL/DML for SQL

Data Definition Language (DDL) statements: define, change, or drop database objects such as tables (CREATE, ALTER, DROP); Data Manipulation Language (DML) statements: read and modify data (SELECT, INSERT, UPDATE, DELETE)
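
A minimal sketch using Python's built-in `sqlite3` module with an in-memory database; the `instructor` table and its rows are hypothetical, and other SQL dialects may differ in details such as `ALTER TABLE` support.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# DDL: define and change the structure (schema) of a table
cur.execute("CREATE TABLE instructor (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
cur.execute("ALTER TABLE instructor ADD COLUMN country TEXT")

# DML: insert, update, delete, and read rows
cur.executemany(
    "INSERT INTO instructor (id, name, city, country) VALUES (?, ?, ?, ?)",
    [(1, "Ana", "Toronto", "CA"), (2, "Ben", "Chicago", "US"), (3, "Chen", "Chicago", "US")],
)
cur.execute("UPDATE instructor SET city = 'Markham' WHERE id = 1")
cur.execute("DELETE FROM instructor WHERE id = 3")
rows = cur.execute("SELECT id, name, city FROM instructor ORDER BY id").fetchall()
print(rows)  # [(1, 'Ana', 'Markham'), (2, 'Ben', 'Chicago')]

# DDL again: drop the table (structure and data) entirely
cur.execute("DROP TABLE instructor")
conn.close()
```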

DBMS

Database Management System; set of software tools for managing the data in a database; relational systems are called RDBMS (relational database management systems), and SQL is the language used to query them

KNN

K-Nearest Neighbors algorithm; method for classifying cases based on their similarity to other cases; cases near each other are "neighbors", and a new case gets the most common class among its k nearest neighbors
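
A minimal sketch of KNN classification in plain Python, assuming 2-D numeric points, Euclidean distance, and k = 3; the training points and the labels "A"/"B" are hypothetical. Library versions (e.g. scikit-learn's `KNeighborsClassifier`) add efficient neighbor search and distance weighting.

```python
import math
from collections import Counter

def knn_classify(train, query, k=3):
    """Label `query` by majority vote among its k nearest training points.

    `train` is a list of ((x, y), label) pairs; distance is Euclidean.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical labeled cases forming two groups
train = [
    ((1.0, 1.0), "A"), ((1.5, 2.0), "A"), ((2.0, 1.0), "A"),
    ((6.0, 5.0), "B"), ((7.0, 7.0), "B"), ((6.5, 6.0), "B"),
]
print(knn_classify(train, (1.2, 1.4)))  # A
print(knn_classify(train, (6.8, 6.1)))  # B
```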

Development environments

IDEs (integrated development environments); help to implement, execute, test, and deploy work

Clustering

Machine Learning technique that involves the grouping of data points

MAX

Model Asset eXchange from IBM; free resource of deep learning (DL) models: ready-to-use, customizable DL microservices; the model-serving microservices expose a standardized REST API with prediction endpoints

SPSS

Statistical Package for the Social Sciences; build predictive models, perform statistical analysis of data, etc.

Application Programming Interface (API)

an application programming interface was originally understood to be an application-specific computing interface exposed by a particular software program or operating system, allowing third parties to extend the functionality of that software beyond its out-of-the-box capabilities

API

application programming interface; the medium through which one program communicates with other software

TCO

Total cost of ownership; factor considered when purchasing new products and services; identify the cost of a product or service over its lifetime.
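
A small worked example of the arithmetic, assuming TCO is simply the one-time costs plus the recurring yearly costs multiplied by the years of ownership (no discounting or inflation); all cost categories and amounts are hypothetical.

```python
def total_cost_of_ownership(upfront_costs, yearly_costs, years):
    """One-time costs plus recurring yearly costs over the ownership period."""
    return sum(upfront_costs.values()) + sum(yearly_costs.values()) * years

# Hypothetical costs for a software product kept for 5 years
upfront = {"license": 10_000, "training": 1_000}
yearly = {"support": 2_000, "hosting": 1_500}
tco = total_cost_of_ownership(upfront, yearly, years=5)
print(tco)  # 28500  (11,000 upfront + 3,500 per year * 5 years)
```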

Commercial software

also (less commonly) called payware; software produced for sale or that serves commercial purposes

ANOVA

analysis of variance; statistical comparison of the means of groups; uses the F-test: F = variation between the sample group means divided by the variation within the sample groups; the p-value indicates how statistically significant the difference is
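
A minimal sketch of the one-way ANOVA F statistic in plain Python, with hypothetical sample groups. It computes only F; turning F into a p-value needs the F distribution, which, if SciPy is available, `scipy.stats.f_oneway` handles by returning both values.

```python
def one_way_anova_f(*groups):
    """F = (between-group mean square) / (within-group mean square)."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    # Between groups: how far each group mean sits from the grand mean
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within groups: spread of each value around its own group mean
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

f = one_way_anova_f([1, 2, 3], [4, 5, 6], [7, 8, 9])
print(f)  # 27.0
```

A large F means the group means differ much more than the values vary inside each group.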

natural language processing

branch of AI; interaction between computers and humans using natural language

machine learning

branch of artificial intelligence (AI) that gives systems the ability to automatically learn and improve from experience without being explicitly programmed; focuses on developing computer programs that can access data and use it to learn for themselves; identifies patterns in data

target attribute

the variable the model predicts; in classification, a categorical variable with discrete values

Spark

Apache Spark; cluster-computing framework that processes data on clusters of computers, computing in parallel

Libraries

collection of functions and methods that enable you to perform a wide variety of actions without writing the code yourself

Table

collection of related things organized in rows and columns; columns hold properties/attributes; SQL statements such as CREATE, INSERT, SELECT, UPDATE, and DELETE create and work with tables

SQL

Structured Query Language; language for communicating with databases, e.g. to query data

open source software

computer software whose source code is released under a license in which the copyright holder grants users the rights to study, change, and distribute the software to anyone and for any purpose

Docker

container platform that makes it easier to build and deploy applications; can be used together with GitHub

Model Building

creating a machine learning or deep learning model using an appropriate algorithm and a lot of data

Unsupervised learning

data is not labeled; the model tries to identify patterns without external help; used for problems such as clustering and anomaly detection (identifying outliers); reinforcement learning (learning the best route through trial and error and rewards) is usually counted as a separate, third type

array

data structure consisting of a collection of elements, each identified by an index; an array can be assigned to a variable

Pandas

Python library providing data structures and tools for data cleaning, manipulation, and analysis; its DataFrame is a table with columns and rows; used in scientific computing

Seaborn

Python data visualization library; used especially for heat maps, time series plots, etc.

Keras

deep learning library for building neural networks

TensorFlow

deep learning; production and deployment

PyTorch

deep learning; regression, classification

ETL

extract, transform, load; task of data integration and transformation in the classic data warehousing world
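
A minimal sketch of the three ETL steps using only the Python standard library; the in-memory CSV text, the cleaning rules, and the `sales` target table are hypothetical stand-ins for a real source system and data warehouse.

```python
import csv
import io
import sqlite3

# Extract: read raw records (an in-memory CSV stands in for a source file)
raw_csv = io.StringIO("name,amount\n alice ,10.5\nBOB,4\n carol,\n")
records = list(csv.DictReader(raw_csv))

# Transform: clean names, convert types, drop rows with a missing amount
clean = [
    (r["name"].strip().title(), float(r["amount"]))
    for r in records
    if r["amount"].strip()
]

# Load: write the cleaned rows into a target table (in-memory SQLite here)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (name TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", clean)
loaded = conn.execute("SELECT name, amount FROM sales ORDER BY name").fetchall()
print(loaded)  # [('Alice', 10.5), ('Bob', 4.0)]
conn.close()
```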

Deep Learning

field of machine learning, based on neural networks with many layers, where computers learn to make intelligent decisions on their own; normally involves a deeper level of automation than other algorithms; frameworks: TensorFlow, PyTorch, Keras

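Derivative

the instantaneous rate of change of a function with respect to a variable (the slope of its tangent line); finding it is called differentiation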
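
A minimal numerical sketch, assuming the function can be evaluated at nearby points: the central difference (f(x + h) - f(x - h)) / (2h) approximates the derivative for a small step h. For f(x) = x², the exact derivative at x = 3 is 2x = 6.

```python
def derivative(f, x, h=1e-5):
    """Approximate f'(x) with a central difference using step size h."""
    return (f(x + h) - f(x - h)) / (2 * h)

slope = derivative(lambda x: x ** 2, 3.0)
print(round(slope, 6))  # 6.0
```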

Framework

focused solutions meant for data scientists who don't know much coding etc

Aggregate Function (SQL)

function where the values of multiple rows are grouped together to form a single summary value
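
A minimal sketch with Python's built-in `sqlite3` module and a hypothetical `orders` table, showing SUM, COUNT, and AVG both over the whole table and per group with GROUP BY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("ana", 20.0), ("ana", 30.0), ("ben", 5.0), ("ben", 15.0), ("ben", 10.0)],
)

# Without GROUP BY, an aggregate collapses the whole table to one value
total = conn.execute("SELECT SUM(amount) FROM orders").fetchone()[0]

# With GROUP BY, each group of rows collapses to one summary row
per_customer = conn.execute(
    "SELECT customer, COUNT(*), SUM(amount), AVG(amount) "
    "FROM orders GROUP BY customer ORDER BY customer"
).fetchall()
print(total)         # 80.0
print(per_customer)  # [('ana', 2, 50.0, 25.0), ('ben', 3, 30.0, 10.0)]
conn.close()
```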

Hardware vs. software

hardware: the physical parts you can hold in your hand; software: the instructions that tell the computer how to work

Multilinear regression

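regression with two or more independent variables; identifies the strength of the effect the independent variables have on the dependent variable, and predicts how the dependent variable changes when the independent variables change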
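
A minimal sketch in plain Python that fits y = b0 + b1*x1 + b2*x2 by solving the least-squares normal equations with Gaussian elimination; the data are synthetic and noise-free, so the fitted coefficients should recover the generating values. In practice a library routine such as scikit-learn's `LinearRegression` or `numpy.linalg.lstsq` is preferable, since forming X^T X explicitly can be numerically unstable.

```python
def fit_multiple_linear_regression(X, y):
    """Least-squares coefficients [b0, b1, ..., bp] from (X^T X) b = X^T y."""
    rows = [[1.0] + list(r) for r in X]  # prepend a 1 for the intercept b0
    p = len(rows[0])
    # Augmented matrix [X^T X | X^T y]
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(p)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(p)
    ]
    # Gaussian elimination with partial pivoting
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, p):
            factor = A[r][col] / A[col][col]
            for c in range(col, p + 1):
                A[r][c] -= factor * A[col][c]
    # Back substitution
    b = [0.0] * p
    for i in reversed(range(p)):
        b[i] = (A[i][p] - sum(A[i][j] * b[j] for j in range(i + 1, p))) / A[i][i]
    return b

# Synthetic data generated from y = 1 + 2*x1 + 3*x2 (no noise)
X = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1), (1, 3)]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in X]
coef = fit_multiple_linear_regression(X, y)
print([round(c, 6) for c in coef])  # [1.0, 2.0, 3.0]
```

Each coefficient b1, b2 is the estimated change in y for a one-unit change in that independent variable, holding the other fixed.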

Area plot

in Matplotlib; an area chart (area graph) displays quantitative data graphically; think of a line graph with the area below the line filled in to show quantity; stacked by default

Information vs Data Model

information model: conceptual level, defining relationships between objects; data model: concrete, with details; the blueprint of any database system

float vs integer

in Python, an integer (int) has no decimal part, while a float can have one (e.g. 3 vs. 3.5)
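
A short Python illustration of the difference; the variable names are arbitrary.

```python
count = 7    # int: a whole number
price = 7.0  # float: has a decimal point, even when the fraction is zero

print(type(count).__name__, type(price).__name__)  # int float
print(count / 2)         # 3.5   (true division always returns a float)
print(count // 2)        # 3     (floor division of two ints returns an int)
print(int(3.9))          # 3     (converting float to int truncates toward zero)
print(0.1 + 0.2 == 0.3)  # False (floats are binary approximations)
```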

Data Pipeline

is a system that captures, organizes, and routes data so that it can be used to gain insights. Raw data contains too many data points that may not be relevant. Data pipeline architecture organizes data events to make reporting, analysis, and using data easier

Classification

learning the relationship between a set of feature variables and a target variable of interest whose values are categories; part of supervised learning

NumPy

library based on N-dimensional arrays, enabling you to perform mathematical functions on these arrays; Pandas is built on top of NumPy

Matplotlib

library known for data visualization (graphs and plots); Seaborn is built on top of it

Model deployment

makes the built model available to third-party applications

statistical modeling

mathematical model that embodies a set of statistical assumptions concerning the generation of sample data.

K-means Clustering

method of vector quantization; partitions n observations into k clusters in which each observation belongs to the cluster with the nearest mean
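
A minimal sketch of k-means (Lloyd's algorithm) in plain Python, using Euclidean distance on hypothetical 2-D points. For simplicity it initializes the centroids with the first k points, and the sample points are ordered so the first two come from different groups; practical implementations such as scikit-learn's `KMeans` use smarter initialization (k-means++ by default).

```python
import math

def kmeans(points, k, iterations=100):
    """Assign each point to its nearest centroid, move each centroid to the
    mean of its points, and repeat until the assignments stop changing."""
    centroids = [points[i] for i in range(k)]  # naive init: first k points
    labels = None
    for _ in range(iterations):
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # converged
        labels = new_labels
        for c in range(k):
            members = [p for p, lbl in zip(points, labels) if lbl == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids

points = [(1.0, 1.0), (8.0, 8.0), (1.5, 2.0), (9.0, 8.5), (1.0, 0.5), (8.5, 9.0)]
labels, centroids = kmeans(points, k=2)
print(labels)  # [0, 1, 0, 1, 0, 1]
```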

Regression

modeling approach for analyzing the relationships between variables, i.e. how they relate to and together contribute to producing a particular outcome; predicts a continuous variable
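
A minimal sketch of simple linear regression (one independent variable) fitted by ordinary least squares in plain Python; the engine-size and emissions values are hypothetical. The prediction it produces is a continuous number.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: engine size (litres) vs. CO2 emissions (g/km)
engine_size = [1.0, 2.0, 3.0, 4.0]
emissions = [150.0, 190.0, 230.0, 270.0]
slope, intercept = fit_line(engine_size, emissions)
print(slope, intercept)         # 40.0 110.0
print(intercept + slope * 2.5)  # 210.0 (predicted emissions for a 2.5 L engine)
```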

Cloud computing

on demand availability of computer system resources, especially data storage and computing power, without the direct active management by the user; good for scalability, access anywhere, disaster recovery

Data Visualization

part of the initial data exploration process; can also be used as a final deliverable

Jupyter Notebooks

used to perform data cleaning, pre-processing, and exploratory analysis

Community Data License Agreement

permission to use and modify data

Microservice

contains a pre-trained DL model, code that preprocesses the input before the model analyzes it, code that post-processes the model output, and a standardized public API, which makes the model highly available; model-serving microservices expose a standardized REST API

sequence mining

predicting the next event ex: click-stream in websites
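
A deliberately simple sketch in Python, assuming a first-order model: it counts which page follows each page across hypothetical click-stream sessions and predicts the most frequent follower. Full sequence-mining algorithms look for longer frequent patterns; this only illustrates the next-event prediction idea.

```python
from collections import Counter, defaultdict

def build_next_page_model(sessions):
    """Count, for every page, which page users visited immediately after it."""
    transitions = defaultdict(Counter)
    for session in sessions:
        for current, nxt in zip(session, session[1:]):
            transitions[current][nxt] += 1
    return transitions

def predict_next(transitions, page):
    """Most frequent follow-up page, or None if the page was never followed."""
    followers = transitions.get(page)
    return followers.most_common(1)[0][0] if followers else None

# Hypothetical click-stream sessions from a website
sessions = [
    ["home", "search", "product", "cart"],
    ["home", "search", "product", "product_reviews"],
    ["home", "deals", "product", "cart"],
]
model = build_next_page_model(sessions)
print(predict_next(model, "product"))  # cart (2 of 3 transitions)
print(predict_next(model, "home"))     # search
```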

Data Manipulation

process of changing data to make it easier to read or be more organized

Data Modeling

process of creating a data model for the data to be stored in a Database. This data model is a conceptual representation of Data objects, the associations between different data objects and the rules

Database

repository for data that supports adding, modifying, and querying it; a relational database forms relationships between tables; typical flow: application (e.g. Python) -> SQL -> database instance

REST API

Representational State Transfer application programming interface; medium between a client and a resource: the client sends a request to a web service, which returns the resource (e.g. a file) to the client

querying

request for data or information from a database table or combination of tables. This data may be generated as results returned by Structured Query Language (SQL) or as pictorials, graphs or complex results, e.g., trend analyses from data-mining tools

Scikit-Learn

Python machine learning library for statistical modeling, including regression, classification, and clustering; built on NumPy, SciPy, and Matplotlib

Regression Analysis

in statistical modeling, regression analysis is a set of statistical processes for estimating the relationships between a dependent variable and one or more independent variables

3 Types of Machine Learning Models

supervised learning (e.g. regression, classification), unsupervised learning, and reinforcement learning

pivot table

table of stats that summarizes the data of a more extensive table
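
A minimal sketch of building a pivot table with the Python standard library: hypothetical sales records are summarized into one row per region and one column per quarter, summing revenue. With pandas, `DataFrame.pivot_table(index=..., columns=..., values=..., aggfunc="sum")` does the same in one call.

```python
from collections import defaultdict

# Hypothetical detailed sales records: (region, quarter, revenue)
sales = [
    ("East", "Q1", 100), ("East", "Q1", 50), ("East", "Q2", 80),
    ("West", "Q1", 70), ("West", "Q2", 90), ("West", "Q2", 30),
]

# Pivot: rows = region, columns = quarter, values = summed revenue
pivot = defaultdict(lambda: defaultdict(int))
for region, quarter, revenue in sales:
    pivot[region][quarter] += revenue

quarters = sorted({q for _, q, _ in sales})
print("region", *quarters)
for region in sorted(pivot):
    print(region, *(pivot[region][q] for q in quarters))
# region Q1 Q2
# East 150 80
# West 70 120
```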

artificial intelligence (AI)

the creation of machines that mimic human cognitive intelligence

Execution Environments

tools where data processing, model training, and deployment take place

