Data Science Chapter 1 to 2

SAS Enterprise Miner

allows users to run predictive and descriptive models based on large volumes of data across the enterprise

Alpine Miner

provides a graphical user interface for creating analytic workflows, including data manipulations and a series of analytic events such as staged data-mining techniques

Matlab

provides a high-level language for performing a variety of data analytics, algorithms, and data exploration

scientific method

provides a solid framework for thinking about and deconstructing problems into their principal parts.

business intelligence analyst

provides business domain expertise based on a deep understanding of the data, key performance indicators, key metrics, and business intelligence from a reporting perspective. they generally create dashboards and reports and have knowledge of the data feeds and sources

Business intelligence analyst

provides business domain expertise based on a deeper understanding of the data

SAS

provides integration between SAS and the analytics sandbox via multiple data connectors

data scientist

provides subject matter expertise for analytical techniques, data modeling, and applying valid analytical techniques to given business problems. designs and executes analytical methods and approaches with the data available to the project.

CRISP-DM

provides useful input on ways to frame analytics problems and is a popular approach for data mining

database administrator

provisions and configures the database environment to support the analytics needs of the working team. these responsibilities may include providing access to key databases or tables and ensuring the appropriate security levels are in place related to the data repositories

data conditioning

refers to the process of cleaning data, normalizing datasets, and performing transformations on the data
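
A minimal sketch of the three steps named above (cleaning, normalizing, transforming), using only the Python standard library. The function name `condition`, the sample records, and the choice of min-max scaling are illustrative assumptions, not part of the source material.

```python
def condition(records):
    """Clean, normalize, and transform a list of raw (name, value) records."""
    # Cleaning and transforming: drop records with a missing value,
    # trim and lowercase the names, and convert values from text to float.
    cleaned = [
        (name.strip().lower(), float(value))
        for name, value in records
        if value not in (None, "")
    ]
    # Normalizing: rescale values to the range [0, 1] (min-max scaling,
    # chosen here only as one common option).
    values = [v for _, v in cleaned]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid dividing by zero when all values match
    return [(name, (v - lo) / span) for name, v in cleaned]


# Hypothetical raw records with inconsistent names and one missing value.
raw = [(" Alice ", "10"), ("BOB", None), ("carol", "30"), ("Dave ", "20")]
conditioned = condition(raw)
```

The record with the missing value is dropped, and the remaining values are rescaled relative to the smallest and largest value in the set.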

project sponsor

responsible for the genesis of the project. provides the impetus and requirements for the project and defines the core business problem. this person sets the priorities for the project and clarifies the desired outputs

business user

someone who understands the domain area and usually benefits from the result. this person can consult and advise the project team on the context of the project

reframe business challenges as analytics challenges

specifically, this is a skill to diagnose business problems, consider the core of a given problem, and determine which kinds of candidate analytical methods can be applied to solve it.

medical information

such as genomic sequencing and diagnostic imaging

quantitative skill

such as mathematics or statistics

semi-structured

textual data files with a discernible pattern that enables parsing such as Extensible Markup Language (XML)
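
As a rough illustration of why a discernible pattern enables parsing, the sketch below reads a small XML snippet with Python's standard `xml.etree.ElementTree` module. The element names (`calls`, `call`, `customer`, `duration`) and the sample values are made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML document; the tag names are assumptions for illustration.
xml_text = """
<calls>
  <call id="1"><customer>Ana</customer><duration>120</duration></call>
  <call id="2"><customer>Ben</customer><duration>45</duration></call>
</calls>
""".strip()

root = ET.fromstring(xml_text)

# Because every <call> follows the same pattern, each one maps to a flat row.
rows = [
    {
        "id": call.get("id"),
        "customer": call.findtext("customer"),
        "duration": int(call.findtext("duration")),
    }
    for call in root.findall("call")
]
```

The self-describing tags are what let the parser turn the text into rows without any hand-written pattern matching.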

data deluge

the result of the prevalence of automatic data collection, electronic instrumentation, and online transactional processing (OLTP). sources include mobile sensors, social media, video surveillance, medical imaging, smart grids, and gene sequencing

data users and buyers

these groups directly benefit from the data collected and aggregated by others within the data value chain

technology and data enabler

this group represents people providing technical expertise to support analytical projects, such as provisioning and administrating analytical sandboxes, and managing large-scale architectures that enable widespread analytics within companies and other organizations.

design, implement, and deploy statistical models and data mining techniques on Big Data

this set of activities is mainly what people think about when they consider the role of the Data Scientist.

RDBMS (Relational Database Management System)

this stores characteristics of the support calls as typical structured data.

true

true or false: for each gigabyte of new data created, an additional petabyte of data is created about the data.

smart devices

which provide sensor-based collection of information from smart electric grids, smart buildings, and many other public and industry infrastructures.

develop insights that lead to actionable recommendations

it is critical to note that applying advanced methods to data problems does not necessarily drive new business value.

OpenRefine

it is a free, open source, powerful tool for working with messy data

skeptical mind-set and critical thinking

it is important that data scientists can examine their work critically rather than in a one-sided way

Alpine Miner

provides a GUI front end for users to develop analytic workflows and interact with Big Data tools and platforms on the back end

SQL

provides an alternative to in-memory desktop analytical tools
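
A small sketch of the idea: the aggregation is expressed in SQL and executed by the database engine, so only the summarized result is returned to the analysis program. Python's built-in `sqlite3` module stands in for a real database server here, and the `sales` table and its rows are hypothetical.

```python
import sqlite3

# SQLite is used only as a convenient stand-in for a database server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 100.0), ("west", 250.0), ("east", 50.0)],
)

# The database computes the totals; the client receives one row per region.
totals = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
conn.close()
```

With a large table, the same query would still return only one row per region to the client.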

data engineer

leverages deep technical skills to assist with tuning SQL queries for data management and data extraction, and provides support for data ingestion into the analytic sandbox.

data aggregators

make sense of the data collected from the various entities from the "SensorNet" or the "Internet of Things."

(BI vs Data Science)

-BI tends to provide reports, dashboards, and queries on business questions for the current period or in the past. It makes it easy to answer questions related to quarter-to-date revenue, progress toward quarterly targets, and how much of a given product was sold in a prior quarter or year. -Data science tends to use disaggregated data in a more forward-looking, exploratory way, focusing on analyzing the present and enabling informed decisions about the future.

communicative and collaborative

must be able to articulate the business value in a clear way and collaboratively work with other groups, including project sponsors and key stakeholders.

data warehouse

Centralized database that stores data from several databases so they can be easily analyzed.

technical aptitude

namely, software engineering, machine learning, and programming skills.

applied information economics

provides a framework for measuring intangibles and provides guidance on developing decision models, calibrating expert estimates, and deriving the expected value of information

analytic sandbox

attempts to resolve the conflict for analysts and data scientists with the EDW and more formally managed corporate data.

business user

benefits from the result

unstructured

no inherent structure

DELTA framework

offers an approach for data analytics projects, including the context of the organization's skills, datasets, and leadership engagement

Quasi-structured

Textual data with erratic formats that can be formatted with effort and software tools
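
One way to picture "formatted with effort and software tools" is a list of web click URLs, where some entries carry a search term and others do not. The sketch below uses Python's standard `urllib.parse` module. The URLs, the `q` parameter, and the helper name `parse_click` are assumptions chosen for illustration.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical clickstream entries with inconsistent shapes.
clicks = [
    "https://example.com/search?q=data+science&page=2",
    "https://example.com/products/42",
    "http://example.com/search?q=hadoop",
]


def parse_click(url):
    """Pull a consistent (path, search_term) record out of one URL."""
    parts = urlparse(url)
    query = parse_qs(parts.query)
    # Not every URL has a search term, so fall back to None.
    return {"path": parts.path, "search_term": query.get("q", [None])[0]}


parsed = [parse_click(u) for u in clicks]
```

Each URL becomes a record with the same two fields, even though the original strings do not share a fixed layout.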

MAD skills

offers input for several of the techniques mentioned that focus on model planning, execution, and key findings

SPSS Modeler

offers methods to explore and analyze data through a GUI

phase 6

operationalize

discovery

phase 1

data preparation

phase 2

model planning

phase 3

model building

phase 4

framing

process of stating the analytics problem to be solved

Octave

a free software programming language for computational modeling, has some of the functionality of Matlab

Enterprise Data Warehouse

are critical for reporting and BI tasks and solve many of the problems that proliferating spreadsheets introduce, such as which of multiple versions of a spreadsheet is correct.

big data

can come in multiple forms, including structured and non-structured data such as financial data, etc.

SQL Analysis Services

can perform in-database analytics of common data mining functions, involve aggregations, and basic predictive models

Hadoop

can perform massively parallel ingest and custom analysis for web traffic parsing, GPS location analytics, genomic analysis, etc.

data warehouse

centralizes data containers in a purpose-built space. supports BI and reporting, but restricts robust analysis.

ETLT

combination of extracting, transforming, and loading data into the sandbox

phase 5

communicate results

structured data

data containing a defined data type, format, and structure

curious and creative

data scientists are passionate about data and finding creative ways to solve problems and portray information

Unstructured data

data that has no inherent structure, which may include text documents, PDFs, images, and video. Common phenomenon that bears closer scrutiny

Analytic Sandbox

enables high-performance computing using in-database processing.

workspaces

enable teams to explore many datasets in a controlled fashion and are not typically used for enterprise-level financial reporting and sales dashboards

project manager

ensures that key milestones and objectives are met on time and at the expected quality

quasi-structured

erratic data formats

semi-structured

files with a discernible pattern that enables parsing.

WEKA

free data mining software package with an analytic workbench

data devices

gather data from multiple locations and continuously generate new data about this data

analytic sandbox

gathered from multiple sources

R

has a complete set of modeling capabilities and provides a good environment for building interpretive models with high-quality code. has the ability to interface with databases via an ODBC connection and execute statistical tests....

Data Savvy professional

has less technical depth but has basic knowledge of statistics or machine learning and can define key questions that can be answered using advanced analytics.

data collectors

includes sample entities that collect data from the device and users.

Data Wrangler

interactive tool for data cleaning and transformation; developed at Stanford University and can be used to perform many transformations on a given dataset

Python

is a programming language that provides toolkits for machine learning and analysis, such as scikit-learn, NumPy, SciPy, pandas, etc.

deep analytical talent

is technically savvy, with strong analytical skills. members possess a combination of skills to handle raw, unstructured data and to apply complex analytical techniques at massive scales.
