INFS 394 Final


URL

A Uniform Resource Locator is a string of characters that gives the address of an object retrievable using protocols deployed on the World Wide Web

Style

A combination of settings, including colors and linestyles, that has been stored in a file (style sheet)

pyplot

A matplotlib module that provides a MATLAB-like interface for simple plotting

Request

A message formatted in a specific way to obtain information from or communicate with a server

Numpy

A Python library with specifically defined objects; it uses compiled libraries (written in languages such as C and Fortran) to perform operations on ndarray objects

Figure-level function

A seaborn plotting function that sets up a matplotlib figure containing plot(s) and makes it easy to spread out a visualization across multiple axes

Array functions

Create NumPy arrays, usually from a list or other sequence; all elements must be of the same data type
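A minimal sketch of two NumPy array-creation functions, np.array and np.arange; the values are made up for illustration. It also shows what "same data type" means in practice: mixing integers and floats makes NumPy upcast every element to one common type.

```python
import numpy as np

# np.array builds an ndarray from a list; every element shares one dtype.
ints = np.array([1, 2, 3])

# Mixing ints and floats: NumPy upcasts everything to a common type (float64).
mixed = np.array([1, 2.5, 3])

# np.arange is another array-creation function, similar to Python's range().
evens = np.arange(0, 10, 2)

print(ints.dtype.kind)   # i  (some integer type; exact width depends on platform)
print(mixed.dtype)       # float64
print(evens.tolist())    # [0, 2, 4, 6, 8]
```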

HTML

Hypertext Markup Language is the core language of the World Wide Web. Web browsers use this language to display content

HTTP

Hypertext Transfer Protocol is an application-level protocol for distributed, collaborative, hypermedia information systems

JSON

JavaScript Object Notation is a data interchange standard built on two structures, one being a collection of name/value pairs and the second being an ordered list of values
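A short sketch using Python's standard json module to show both JSON structures in one document: an object (name/value pairs) that contains an array (ordered list of values). The sample record is made up.

```python
import json

# An object (name/value pairs) whose "scores" value is an array (ordered list).
text = '{"name": "Ada", "scores": [90, 85, 77]}'

record = json.loads(text)          # JSON object -> dict, JSON array -> list
print(type(record).__name__)       # dict
print(record["scores"][0])         # 90

# json.dumps goes the other way, turning Python objects back into JSON text.
print(json.dumps({"ok": True, "items": [1, 2]}))  # {"ok": true, "items": [1, 2]}
```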

ndarray objects

N-dimensional array objects; they enable operations to run faster than they would if written in pure Python

ndarray objects

NumPy multidimensional (N-dimensional) array objects that have operations performed using compiled libraries

Numpy is short for:

Numerical Python Library

Pandas is built on the ____ package

Numpy

Pandas Series

Object containing an array of data (a NumPy ndarray) and an index (which can be an array of associated labels, as in a dictionary); efficient because operations on a Pandas Series run as pre-compiled code
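A minimal sketch of a Series: an ndarray of values paired with an index of labels. The fruit prices are hypothetical.

```python
import pandas as pd

# Values plus an index of labels, so lookups work much like a dictionary.
prices = pd.Series([2.5, 1.25, 4.0], index=["apple", "banana", "cherry"])

print(prices["banana"])                # 1.25  (label lookup, dict-style)
print(type(prices.values).__name__)    # ndarray  (the underlying NumPy array)

# Vectorized arithmetic applies to every element without a Python loop.
discounted = prices * 0.5
print(discounted["cherry"])            # 2.0
```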

Pandas Dataframe

Object with a two-dimensional tabular format; it can have an index to reference rows and also has column names, which are often created when data are loaded in

columns

Pandas Feature - modifying df structure; attribute to view or change column names

index

Pandas Feature - modifying df structure; attribute to view or change index (row headings)

.set_index(columnvals)

Pandas Feature - modifying df structure; method to create an index using one or more columns

.drop()

Pandas Feature - modifying df structure; used to remove a column or row

.rename()

Pandas Feature - modifying df structure; used to rename specific columns
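The structure-modifying features above (columns, index, .set_index(), .drop(), .rename()) can be sketched together on a small hypothetical DataFrame. Each method returns a new DataFrame, which is reassigned here.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [101, 102, 103],
    "fname": ["Ann", "Bo", "Cy"],
    "score": [88, 92, 75],
})

print(list(df.columns))           # ['id', 'fname', 'score']

# .rename(): change specific column names.
df = df.rename(columns={"fname": "first_name"})

# .set_index(): use a column as the row index (a list of columns also works).
df = df.set_index("id")
print(list(df.index))             # [101, 102, 103]

# .drop(): remove a row by its index label, or columns via columns=[...].
df = df.drop(102)
df = df.drop(columns=["score"])

# The columns attribute can also be assigned to relabel every column at once.
df.columns = ["name"]
print(df)
```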

read_csv(filename)

Pandas Function, reads in data in a csv (Comma-separated values) file and creates a data frame using the first row (of header info) to name columns

read_json(filename)

Pandas Function, reads in data in a json (JavaScript Object Notation) string or file and creates a data frame, using the field names to name columns
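A minimal sketch of read_csv and read_json. To stay self-contained it wraps in-memory text in io.StringIO instead of reading real files; both functions also accept a file path. The city data are made up.

```python
from io import StringIO

import pandas as pd

csv_text = "city,pop\nAlbany,99000\nTroy,51000\n"
json_text = '[{"city": "Albany", "pop": 99000}, {"city": "Troy", "pop": 51000}]'

csv_df = pd.read_csv(StringIO(csv_text))      # header row -> column names
json_df = pd.read_json(StringIO(json_text))   # field names -> column names

print(list(csv_df.columns))        # ['city', 'pop']
print(sorted(json_df.columns))     # ['city', 'pop']
print(csv_df["pop"].sum())         # 150000
```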

zip

Python built-in sequence function that pairs up elements of several lists, tuples, or other sequences to join them together
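A quick illustration of zip with two hypothetical parallel lists, including the common pattern of building a dictionary from them.

```python
names = ["Ann", "Bo", "Cy"]
scores = [88, 92, 75]

# zip pairs the i-th elements of each sequence into tuples.
pairs = list(zip(names, scores))
print(pairs)                        # [('Ann', 88), ('Bo', 92), ('Cy', 75)]

# Building a dictionary from two parallel lists.
lookup = dict(zip(names, scores))
print(lookup["Bo"])                 # 92

# zip stops at the shortest input.
print(list(zip([1, 2, 3], "ab")))   # [(1, 'a'), (2, 'b')]
```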

.loc[]

Reporting information from a df; label-based indexer used to locate records that match specific criteria (e.g., df.loc[df.index < 5000].describe())

.astype()

Reporting information from a df; method used to convert/change data type

df.column.describe()

Reporting information from a df; only reports for a specified column

df.describe()

Reporting information from a df; reports descriptive statistics for all numeric columns, or, if there are no numeric columns, for the string (text) columns

.describe

Reporting information from a df; reports descriptive statistics, can limit pieces being reported

.count

Reporting information from a df; reports the number of non-missing records (rows) in each column

dataframe.head()

Reporting information from a df; returns the first 5 records

.dtype

Reporting information from a df; the datatype property

df["column"].dtype

Reporting information from a df; will report the data type of the column's underlying ndarray

type(df["column"])

Reporting information from a df; will report the type of object of the column

df[["column1", "column5"]].describe()

Reporting information from a df; will show descriptive statistics for a specified list of columns
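The reporting features above can be tried together on a small hypothetical DataFrame. The sketch assumes a "units" column that arrives as text, as often happens after loading, so .astype() has something to convert; it uses .loc with a condition on a column rather than on the index.

```python
import pandas as pd

df = pd.DataFrame({
    "region": ["N", "S", "E", "W", "N", "S"],
    "sales": [1200.0, 4800.0, 5300.0, None, 2500.0, 3100.0],
    "units": ["3", "10", "12", "7", "5", "6"],
})

print(df.head())                          # first 5 records by default
print(df.count())                         # non-missing values per column (sales: 5)
print(df["units"].dtype)                  # a text dtype, not a number yet

df["units"] = df["units"].astype(int)     # convert text to integers
print(type(df["units"]))                  # the column is a pandas Series

print(df.describe())                      # numeric columns only (sales, units)
print(df[["sales", "units"]].describe())  # a chosen list of columns
print(df.sales.describe())                # a single column

# .loc with a boolean condition keeps only the matching rows (NaN compares False).
small = df.loc[df["sales"] < 5000]
print(small["sales"].describe())
```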

REST

Representational State Transfer is an architecture that enables users to read data from web applications

Cleaning and Preparing Data: Reducing Fields

Keep only the fields you need, either by dropping specific columns with .drop() or by selecting a list of columns: subset_df = df[[column list]]
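A minimal sketch showing that both ways of reducing fields give the same result on a hypothetical DataFrame: selecting the columns to keep, or dropping the ones you do not need.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2],
    "name": ["Ann", "Bo"],
    "notes": ["x", "y"],
    "tmp": [0, 0],
})

# Option 1: keep only the listed columns (note the double brackets).
subset_df = df[["id", "name"]]

# Option 2: drop the columns that are not needed.
subset_df2 = df.drop(columns=["notes", "tmp"])

print(subset_df.equals(subset_df2))   # True
```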

Website's Terms of Services

Specifies the rules and conditions of the use of a website

NaN

a NumPy value representing "Not a Number", equivalent to nulls or NAs in other languages
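A short sketch of how NaN behaves: it never equals anything (not even itself), so missing values are detected with isna(), and pandas skips them in sums and means by default.

```python
import numpy as np
import pandas as pd

print(np.nan == np.nan)           # False

s = pd.Series([1.0, np.nan, 3.0])
print(s.isna().tolist())          # [False, True, False]
print(s.sum())                    # 4.0  (NaN skipped)
print(s.mean())                   # 2.0  (average of the two non-missing values)
```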

series

a Pandas data structure, which is an object that contains a NumPy ndarray and an index

BeautifulSoup

a Python package for obtaining data from HTML and XML files

seaborn

a Python package for statistical data visualization based on matplotlib

statsmodels

a Python package that has classes and functions for the estimation of many different statistical models and for conducting statistical tests and statistical data exploration

matplotlib

a Python package used to produce publication-quality figures on many platforms

SciPy

a Python package with a stats module that has over 80 statistical functions and numerous other features, such as functions that can generate variables that follow different distributions

Server

a computing device that provides shared access to files

DataFrame

a data structure provided by the Pandas library that has a two-dimensional tabular format (organized into rows and columns)

Endpoint

a destination URL that, when accessed, will return data. This could be HTML code or structured data in a format such as JSON

NumPy

a foundational package for scientific computing in Python. Numeric Python Library (NumPy) allows access to multidimensional array manipulation that uses libraries written in C and Fortran

Parser

a function or program that breaks data into smaller parts

data visualization

a graphical or pictorial representation of information that we create to help perceive or recognize meaning in data and to communicate that information to others

mplot3d

a matplotlib module that we can use to create 3D plots

urllib package

a package that comes with Python that has several modules for working with URLs

MATLAB

a programming language designed to operate primarily on whole matrices and arrays

sodapy

a python package for obtaining data from the Socrata Open Data API

Web query

a query applied to a URL that requests specific data
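A sketch using the standard urllib package to build a web query URL, take it apart again, and describe an HTTP request. The endpoint https://example.com/api/records and its parameters are hypothetical, and no request is actually sent.

```python
from urllib.parse import parse_qs, urlencode, urlparse
from urllib.request import Request

# Hypothetical endpoint; example.com is reserved for documentation.
base = "https://example.com/api/records"
params = {"city": "Albany", "limit": 10}

url = base + "?" + urlencode(params)
print(url)   # https://example.com/api/records?city=Albany&limit=10

# Break the URL back into its parts.
parts = urlparse(url)
print(parts.netloc, parts.path)   # example.com /api/records
print(parse_qs(parts.query))      # {'city': ['Albany'], 'limit': ['10']}

# A Request object describes the HTTP message; nothing is sent until it is
# passed to urllib.request.urlopen(), which is deliberately not called here.
req = Request(url, headers={"Accept": "application/json"})
print(req.get_method())           # GET
```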

axes-level function

a seaborn plotting function that draws onto a single matplotlib axes and does not otherwise affect the rest of the figure

logistic regression

a technique through which a binary categorical {0,1} variable, representing a dichotomy within the population, is related to a set of explanatory independent variables

Web scraping

a term that refers to the extraction of data from webpages

API key

a unique identifier used to enable communication with an API, frequently for billing purposes

API

an application programming interface (API) is a defined structure used to enable communication between two applications to access information such as web content

seaborn heatmap function

an axes-level function that plots rectangular data as a color-encoded matrix

seaborn countplot function

an axes-level function that shows the counts of observations in each categorical bin using bars

dtype

array attribute, tells you the data type of the array

ndim

array attribute, tells you the number of dimensions of the array
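The dtype and ndim attributes (read without parentheses) on a hypothetical 1-D and 2-D array; shape, which gives the size along each dimension, is included for comparison.

```python
import numpy as np

vector = np.array([1.5, 2.0, 3.25])
matrix = np.array([[1, 2, 3], [4, 5, 6]])

print(vector.ndim, matrix.ndim)   # 1 2
print(vector.dtype)               # float64
print(matrix.shape)               # (2, 3)
```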

seaborn distplot function

charts the distribution of values for a variable

R-squared

coefficient of determination, a measure of how well a model fits: the proportion of the variation in the dependent variable explained by the variation in the independent variable(s)
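For a least-squares model with an intercept, R-squared is commonly computed as one minus the ratio of unexplained (residual) variation to total variation, where y_i are observed values, ŷ_i are model estimates, and ȳ is the mean of y:

```latex
R^2 = 1 - \frac{SS_{\text{res}}}{SS_{\text{tot}}}
    = 1 - \frac{\sum_{i} (y_i - \hat{y}_i)^2}{\sum_{i} (y_i - \bar{y})^2}
```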

Pandas crosstab function

computes a cross-tabulation of two or more factors. By default, computes a frequency table of the factors
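A minimal crosstab sketch with two hypothetical categorical factors; by default each cell is the count of rows with that combination.

```python
import pandas as pd

df = pd.DataFrame({
    "dept": ["A", "A", "B", "B", "B"],
    "remote": ["yes", "no", "yes", "yes", "no"],
})

table = pd.crosstab(df["dept"], df["remote"])   # rows: dept, columns: remote
print(table)
```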

Outliers

data points with unrealistic values that might inappropriately skew calculations or data points that are simply wrong

descriptive statistics

division of statistics that measures characteristics of data of interest, including measures of location and measures of spread

inferential statistics

division of statistics that uses samples of data to reach conclusions about parameters of interest in the population

HTML table

element of webpages used to represent elements of data with more than one dimension
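A sketch of parsing an HTML table by its tags. To stay self-contained it uses the standard library's html.parser rather than BeautifulSoup; the TableParser class and the sample table are hypothetical, written only for illustration.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Hypothetical helper: collects <th>/<td> cell text, one list per <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])          # a new table row begins
        elif tag in ("td", "th"):
            self._in_cell = True
            self.rows[-1].append("")      # a new cell in the current row

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:                 # ignore whitespace between tags
            self.rows[-1][-1] += data.strip()


html = """
<table>
  <tr><th>Team</th><th>Wins</th></tr>
  <tr><td>Hawks</td><td>12</td></tr>
  <tr><td>Owls</td><td>9</td></tr>
</table>
"""

parser = TableParser()
parser.feed(html)
print(parser.rows)   # [['Team', 'Wins'], ['Hawks', '12'], ['Owls', '9']]
```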

Scatterplots

graphs that present the relationship between two interval-ratio variables

kwargs

keyword arguments, selectively used by specifying the name of the argument and what value it is to have
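A small illustration of keyword arguments: passing one by name to a built-in, and a hypothetical function that accepts any number of them through **kwargs.

```python
# Passing a keyword argument by name to a built-in function.
print(sorted([3, 1, 2], reverse=True))   # [3, 2, 1]


def describe_person(name, **kwargs):
    """Hypothetical helper: 'name' is required; any extras arrive as a dict."""
    extras = ", ".join(f"{key}={value}" for key, value in kwargs.items())
    return f"{name} ({extras})" if extras else name


print(describe_person("Ann"))                        # Ann
print(describe_person("Ann", age=30, city="Troy"))   # Ann (age=30, city=Troy)
```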

seaborn FacetGrid function

maps a data set onto multiple axes arrayed in a grid of rows and columns that correspond to levels of variables in the data set

matplotlib axes function

matplotlib function in the pyplot module that is used to add an axes object to the current figure (creating the figure if none exists). A matplotlib figure can have numerous axes, each containing a different plot
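A sketch of one figure holding two axes, each with its own plot, created with pyplot's figure and axes functions while the ggplot style sheet is temporarily applied. The axes positions and data are arbitrary, and the Agg backend is selected so no display window is needed.

```python
import matplotlib

matplotlib.use("Agg")                    # render off-screen
import matplotlib.pyplot as plt

with plt.style.context("ggplot"):        # apply the ggplot style only inside this block
    fig = plt.figure(figsize=(8, 3))     # the figure object
    # plt.axes([left, bottom, width, height]) adds an axes to the current figure.
    left = plt.axes([0.08, 0.15, 0.40, 0.75])
    right = plt.axes([0.57, 0.15, 0.40, 0.75])
    left.plot([1, 2, 3], [2, 4, 3])      # line plot on the first axes
    right.bar(["a", "b"], [5, 7])        # bar chart on the second axes

print(len(fig.axes))   # 2
plt.close(fig)
```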

Series .reset_index() method

moves the index values into a DataFrame as a column
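A minimal sketch of Series.reset_index() on hypothetical order counts: the named index becomes a regular column, and the result is a DataFrame.

```python
import pandas as pd

counts = pd.Series([5, 3], index=pd.Index(["N", "S"], name="region"), name="orders")

df = counts.reset_index()      # index values move into a "region" column
print(type(df).__name__)       # DataFrame
print(list(df.columns))        # ['region', 'orders']
```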

NaN

object representing Not a Number (missing value), helps identify something that is missing

pip

package manager, common way of installing numpy

Parsing

parsing is the act of systematically going through data, analyzing data character by character, or searching patterns, keywords, or other elements

Kernel data estimate

plots the shape of a distribution encoding the density of observations on one axis along with the height along the other axis

DataFrame .pivot() method

reshapes a DataFrame based on column values. Raises a ValueError if any index/column combination used in the pivot appears more than once
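A sketch of .pivot() turning hypothetical long-format sales data into a wide table, followed by the ValueError raised when a (store, quarter) pair is duplicated.

```python
import pandas as pd

long_df = pd.DataFrame({
    "store": ["A", "A", "B", "B"],
    "quarter": ["Q1", "Q2", "Q1", "Q2"],
    "sales": [10, 12, 7, 9],
})

wide = long_df.pivot(index="store", columns="quarter", values="sales")
print(wide)

# Repeating the first row duplicates the ("A", "Q1") pair, so pivot cannot reshape.
dupes = pd.concat([long_df, long_df.iloc[[0]]])
raised = False
try:
    dupes.pivot(index="store", columns="quarter", values="sales")
except ValueError as err:
    raised = True
    print("ValueError:", err)
```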

.head(n), .tail(n)

return the first or last n records; if n is not specified, the default is 5

Statistics

science of collecting, organizing, presenting, analyzing, and interpreting data to assist in making decisions that are more effective

Pandas is a common structure for when we want to look at:

specific attributes or columns

DataFrame .stack() method

stacks the prescribed level(s) from columns to index, and when the columns have a single level, the output is a Series
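A minimal .stack() sketch on a hypothetical wide table with single-level columns: the column labels move into the index, so the result is a Series.

```python
import pandas as pd

wide = pd.DataFrame({"Q1": [10, 7], "Q2": [12, 9]}, index=["A", "B"])

stacked = wide.stack()            # columns -> inner index level
print(type(stacked).__name__)     # Series
print(stacked.loc[("B", "Q2")])   # 9
```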

linear regression

statistical model that fits a straight line through the data using the method of least squares. In its simplest form, this model is ŷ = mx + b, where ŷ (y hat) is the estimate of the dependent variable y, m is the slope, x is the independent variable, and b is the intercept
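A sketch of a simple linear regression on hypothetical data, using np.polyfit (degree 1) for the least-squares slope m and intercept b, then computing R-squared from the residual and total sums of squares.

```python
import numpy as np

# Hypothetical data that roughly follow y = 2x + 1.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 4.9, 7.2, 8.8, 11.0])

# polyfit with degree 1 returns [slope, intercept] of the least-squares line.
m, b = np.polyfit(x, y, 1)
y_hat = m * x + b

# R-squared = 1 - (residual sum of squares / total sum of squares)
ss_res = np.sum((y - y_hat) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot

print(f"y_hat = {m:.2f}x + {b:.2f}, R^2 = {r_squared:.3f}")   # y_hat = 1.97x + 1.09, R^2 = 0.998
```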

business intelligence

technologies and processes used by business organizations to gain insights from large quantities of data

azimuth

the angle in the x-y plane of a 3D chart

Elevation

the angle in the z-plane of a 3D chart

Data cleaning

the process by which data are prepared for use by applying techniques to format and adjust observations and values
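A minimal data-cleaning sketch on hypothetical raw data: inconsistent text is reformatted, numbers stored as text are converted, an impossible value (a negative age) is treated as missing, and rows without a usable age are dropped. These are a few common techniques, not a complete procedure.

```python
import numpy as np
import pandas as pd

raw = pd.DataFrame({
    "age": ["34", "29", None, "-5", "41"],
    "state": [" ny", "NY ", "ny", "Ny", "NY"],
})

clean = raw.copy()
clean["state"] = clean["state"].str.strip().str.upper()   # consistent formatting
clean["age"] = pd.to_numeric(clean["age"])                # text -> numbers, None -> NaN
clean.loc[clean["age"] < 0, "age"] = np.nan               # unrealistic value -> missing
clean = clean.dropna(subset=["age"])                      # drop rows with no usable age

print(clean)
```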

ggplot style

used to adjust the style to emulate ggplot, a popular plotting package for R

Tags

used to delimit the start and end of elements in HTML markup

R-style formulas

used to represent statistical models with the dependent variable on the left-hand side, a tilde symbol to separate the left-hand side from the right-hand side, and the independent variables on the right-hand side, using the plus sign to add new columns to the model
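A sketch of an R-style formula with the statsmodels formula API (statsmodels.formula.api.ols): the dependent variable sits left of the tilde and the independent variables are joined with plus signs. The data are hypothetical and built so that sales = 5 + 2*ads + 3*promo exactly, so the fitted coefficients should recover those values.

```python
import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({
    "ads": [1, 2, 3, 4, 5, 6],
    "promo": [0, 1, 0, 1, 1, 0],
})
df["sales"] = 5 + 2 * df["ads"] + 3 * df["promo"]

# dependent ~ independent1 + independent2
model = smf.ols("sales ~ ads + promo", data=df).fit()
print(model.params.round(3))   # Intercept 5.0, ads 2.0, promo 3.0
```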

Histogram

used to visualize the distribution of values (frequencies) of a continuous variable

bar charts

used to visualize the frequencies that occur for discrete variables

dependent variable

variable that a model predicts, often use y to denote this variable

Dummy variable

variable that takes the value 1 or 0 to indicate that something is or is not a member of a category, also known as an indicator variable
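A minimal sketch of creating dummy (indicator) variables with pd.get_dummies on a hypothetical categorical column; dtype=int asks for 0/1 values instead of True/False.

```python
import pandas as pd

df = pd.DataFrame({"color": ["red", "blue", "red", "green"]})

# One 0/1 column per category, named with the given prefix.
dummies = pd.get_dummies(df["color"], prefix="color", dtype=int)
print(dummies)
```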

independent variable

variable used to fit a statistical model, typically represented by x. When there is more than one of these, it is important that they be independent of each other

categorical variable

variable used when data are of the nominal level of measurement, where we only classify and count observations of a qualitative variable
