Data Analytics Exam 1 Review

Pataasin ang iyong marka sa homework at exams ngayon gamit ang Quizwiz!

LOOKUP Functions

•VLOOKUP-looks up a value in the leftmost column of a table and returns a value in the same row from a column you specify. •HLOOKUP-looks up a value in the top row of a table and returns a value in the same column from a row you specify. •INDEX-returns a value or reference of the cell at the intersection of a particular row and column in a given range. •MATCH-returns the relative position of an item in an array that matches a specified value in a specified order. •CHOOSE-returns a value from a list based on the position in the list, specified by index_num.

Relative and absolute addressing in Excel formulas

A relative address uses just the row and column label in the cell reference An absolute address uses a dollars sign before either the row or column or both Using a dollar sign before a row label keeps the reference fixed to the row but allows the column reference to change if the formula is copied to another cell A dollar sign before both the row and column will keep the reference fixed for both the row and column

Nominal Data, Interval Data, Categorical Data, Ordinal Data, and Ratio Data

Data may be classified into four groups: 1.Categorical (nominal data) which are sorted into categories according to specified characteristics. 2.Ordinal data which can be ordered or ranked according to some relationship to one another. 3.Interval data which are ordinal but have constant differences between observations and have arbitrary zero points. Interval data allow meaningful comparison of ranges, averages and other statistics. 4.Ratio data which are continuous and have a natural zero point. Most business and economic data, such as dollars and time, fall into this category.

Decision Models

Decision models typically have three types of input: •Data, which are assumed to be constant for purposes of the model. •Uncontrollable inputs, which are quantities that can change but cannot be directly controlled by the decision maker. •Decision options, which are controllable and can be selected at the discretion of the decision maker. Decision options are often called decision variables.

Descriptive Analytics

Descriptive analytics is the most commonly used and most-well understood type of analytics. Descriptive analytics summarizes data into meaningful charts and reports, for example, about budgets, sales, revenues, or cost.

Decision Models w/ Analysis Types

Descriptive models explain behavior and allow users to evaluate potential decisions by asking "what-if?" questions. Descriptive decision models describe relationships allowing a manager to make a decision, without telling a manager what to do. Predictive models focus on what will happen in the future. Prescriptive models help decision makers identify the best solution to decision problems. Prescriptive models use optimization—the process of finding a set of values for decision options that minimize or maximize some quantity of interest.

Logical Functions

Logical functions depend on whether one or more conditions are true or false. A condition is a statement about the value of a cell, either numeric or text. Three useful business analytics applications are the following: •IF(condition, value if true, value if false)—a logical function that returns one value if the condition is true and another if the condition is false. •AND(condition1, condition 2 . . . )—a logical function that returns TRUE if all conditions are true and FALSE if not. •OR(conditions 1, condition 2, . . .)—a logical function that returns TRUE if any condition is true and FALSE if not

Models

Many decision models can be formalized using a model. A model is an abstraction or representation of a real system, idea, or object. Models capture the most important feature of a problem and present them in a form that is easy to interpret. A model can be as simple as a written or verbal description of some phenomenon, a visual representation (visual model) such as a graph or flowchart, or a mathematical or spreadsheet representation.

Time Value of Money

Net Present Value (NPV) measures the worth of a stream of cash flows, taking into account the time value of money. The discount rate reflects the opportunity costs of spending funds now versus achieving a return through another investment, as well as the risks associated with not receiving returns until a later time. A positive NPV means that the investment will provide added value because the projected return exceeds the discount rate.

Predictive Analytics

Predictive analytics seeks to predict the future by examining historical data, detecting patterns or relationships in these data, and then extrapolating these relationships forward in time.

Prescriptive Analytics

Prescriptive analytics uses optimization to identify the best alternatives to minimize or maximize some objective.

Statistics for Categorical Data

Statistics such as means and variances are not appropriate for categorical data. Instead, we are interested in the fraction of data that have certain characteristics. The formal statistical measure is called proportion.

Excel Functions

The easiest way to locate a particular function is to select a cell and click on the Insert function button (fx) which can be found under the ribbon next to the formula bar. Some of the more common functions in Excel include: •MIN(range) smallest value in a range of cells •MAX(range) largest value in a range of cells •SUM(range) sum of values in a range of cells •AVERAGE(range) average of the values in a range of cells •COUNT(range) # of cells in a range that contains numbers •COUNTIF(range, criteria) # of cells within a range that meet a specified criterion

Phases in Problem Solving

The fundamental purpose of analytics is to help managers solve problems and make decisions. Problem solving consists of several phases: 1. Recognizing the problem--a problem exists when there is a gap between what is happening and what we think should be happening. 2. Defining the problem--In defining problems, it is important to involve all people who make the decisions or who may be affected by them. 3. Structuring the problem--a formal model is often developed in this phase. 4. Analyzing the problem--experimentation or solution process, evaluating different scenarios, or analyzing risks associated with various decision alternatives. 5. Interpreting results and making a decision--models cannot capture every detail of the real problem, and managers must understand the limitations. 6. Implementing the solution--making the solution work in the organization or translating the results back to the real world.

Copying formulas in Excel

To copy a formula from a single cell or range of cells down a column or across a row 1. first select the cell or range 2. click and hold the mouse on the small square in the lower right-hand corner of the cell 3. drag the formula to the "target" cells which you wish to copy. Another way to copy a formula in Excel is to select the cell with the formula to be copied, and then press Ctrl-C and then press Ctrl-V in Windows With a MAC, use the Command key rather than Ctrl

Cross Tabulation

•A cross-tabulation is a tabular method that displays the number of observations in a data set in different subcategories of two categorical variables. •A cross-tabulation table is often called a contingency table. •The subcategories of the variables must be mutually exclusive and exhaustive, meaning that each observation can be classified into only one subcategory, and, taken together over all subcategories, they must constitute the complete data set.

Dashboards

•A dashboard is a visual representation of a set of key business measures. It is derived from the analogy of an automobile's control panel, which displays speed, gasoline level, temperature, and so on. •Dashboards provide important summaries of key business information to help manage a business process or function.

Populations and Samples

•A population consists of all items of interest for a particular decision or investigation. •A sample is a subset of the population. •Most populations are too large to deal with, thus sampling is necessary. •The purpose of sampling is to obtain sufficient information to draw a valid inference about a population.

Frequency Distribution

•Categorical data naturally define the groups in a frequency distribution. •A frequency distribution is a table that shows the number of observations in each of several non-overlapping groups. A graphical depiction of a frequency distribution in the form of a column chart is called a histogram. •To construct a frequency distribution, you need only count the number of observations that appear in each category. This can be done using the Excel COUNTIF function. •Frequencies may be expressed as a fraction, or proportion of the total; this is called relative frequency.

Dispersion in Statistics

•Dispersion refers to the degree of variation in the data, that is, the numerical spread (or compactness) of the data. Several statistical measures characterize dispersion: the range, interquartile range, variance, and standard deviation. •The range is the easiest to compute, and is the difference between the maximum value and the minimum value in the data set. •The difference between the first and third quartiles is often called the interquartile range, or the midspread. •A more commonly used measure of dispersion is the variance, whose computation depends on all of the data. The larger the variance, the more the data is spread out from the mean. •Essentially, the variance is the average of the squared deviations of the observations from the mean. •The standard deviation is the square root of the variance. The standard deviation is a popular measure of risk, particularly in financial analysis.

PivotTables

•Excel provides a powerful tool for distilling a complex database into meaningful information: PivotTables. PivotTables allows you to create custom summaries and charts of key information in the data and to drill down into a large set of data in numerous ways. •Dragging a field into the Filters area in the PivotTable field list allows you to add a third dimension to your analysis. •You may express the data in a PivotTable in various percentage views, such as the percentage of the grand total, percent of row total, percent of column total, and other options.

How would 10 raised to the power of 2 × 5 raised to the power of 3/ 100 - 73 be represented in an Excel spreadsheet?

•Formulas in Excel use common mathematical operators: •Addition + •Subtraction - •Multiplication * •Division / •Raised to the Power of (Exponentiation) ^ 10^2 * 5^3 / 100 -73

Measures of Shape

•Histograms of sample data can take on a variety of different shapes. •Skewness describes the lack of symmetry of data. •Histograms that have more mass of data on the left and tail off to the right are called positively skewed. Those that have more mass on the right and tail off to the left are called negatively skewed. •The coefficient of skewness measures the degree of asymmetry of observations around the mean. •Kurtosis refers to the peakedness or flatness of a histogram.

Statistical Notation and Measures of Location

•In statistical notation, the elements of a data set are typically labeled as subscripted variables. •Measures of location provide estimates of a single value that in some fashion represents the "centering" of a set of data. The most common is the average. The average is formally called the arithmetic mean (or simply the mean) which is the sum of the observations divided by the number of observations. •The measure of location that specifies the middle value when the data are arranged from least to greatest is the median. Half of the data are below the median, and half of the data are above it. •A third measure of location is the mode. The mode is the observation that occurs most frequently. The mode is most useful for data sets that contain a relatively small number of unique values. •A fourth measure of location that is used occasionally is the midrange. This is simply the average of the largest and smallest values in the dataset.

Excel Charts (other than column and bar)

•Line charts provide a useful means for displaying data over time. •For many types of data, we are interested in understanding the relative proportion of each data source to the total. A pie chart displays this by partitioning a circle into pie-shaped areas showing the relative proportions. •An area chart combines the features of a pie chart with those of line charts. •A scatter chart shows the relationship between two variables and consist of observations of pairs of variable data. •A bubble chart is a type of scatter chart in which the size of the data corresponds to the value of a third variable; consequently, it is a way to plot three variables in two dimensions. •A combination chart displays multiple data series on the same chart using different chart types such as a column chart and a line chart. •Radar charts show multiple metrics on a spider web, thus allowing plotting of multiple dimensions of several data series. This is a useful chart to compare survey data from one time period to another or to compare performance of different entities such as factories, companies, and so on using the same criteria. •A stock chart allows you to plot stock prices, such as daily high, low, and close values.

Coefficient of Variation

•The coefficient of variation (CV) provides a relative measure of the dispersion in data relative to the mean and is defined as CV=Standard Deviation/Mean. •The coefficient of variation provides a relative measure of risk to return. The smaller the coefficient of variation, the smaller the relative risk is for the return provided. •The reciprocal of the coefficient of variation, called return to risk, is often used because it is easier to interpret.

Excel Charts

•To create a chart in Excel, it is best to first highlight the range of the data you wish to chart. Click the Insert tab in the Excel ribbon, click the chart type, and then click a chart subtype that you want to use. •Excel distinguishes between vertical and horizontal charts, calling the former column charts and the latter bar charts. •A clustered column chart compares values across categories using vertical rectangles. •A stacked column chart displays the contribution of each value to the total by stacking the rectangles.

Measures of Association

•Two variables have a strong statistical relationship with one another if they appear to move together. •Covariance is a measure of the linear association between two variables, X and Y. •Correlation is a measure of the linear relationship between two variables, X and Y, which does not depend on the units of measurement. Correlation is measured by the correlation coefficient. •For two variables, a positive correlation coefficient indicates a linear relationship exists for which one variable increases as the other also increases.


Kaugnay na mga set ng pag-aaral

Chapter 10 Math Formulas and example problems

View Set

FCS 202 Chapter 1 Quiz questions

View Set

Accounting- Final Exam, Master Exam One - ACC201

View Set

State Laws, Rules, and Regulations

View Set

ATI - Medication Administration 4

View Set

Fundamentals Learning System Final

View Set