Module 2: Organizing, Visualizing, and Describing Data


Target downside deviation (also known as target semi-deviation) L1R02LA-BP022_2106 *** (review this question)

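$s_{\text{Target}} = \sqrt{\dfrac{\sum_{X_i \le B} (X_i - B)^2}{n - 1}}$, where $B$ is the target return, only observations at or below $B$ enter the sum, and $n$ is the total number of observations.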
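A minimal Python sketch of target downside deviation, assuming returns are a plain list in percent and the target B uses the same units. Only returns at or below B add to the sum of squared shortfalls, but the divisor still uses all n observations (n - 1). The function name and sample returns are hypothetical.

```python
import math


def target_downside_deviation(returns, target):
    """Sample target semideviation.

    Only returns at or below the target contribute squared shortfalls,
    but the divisor uses the full sample size (n - 1).
    """
    n = len(returns)
    sum_sq = sum((r - target) ** 2 for r in returns if r <= target)
    return math.sqrt(sum_sq / (n - 1))


# Hypothetical monthly returns (%) measured against a 1% target
returns = [-3.0, 2.0, 5.0, 0.0, 1.5]
print(round(target_downside_deviation(returns, 1.0), 4))  # 2.0616
```

Only two of the five returns fall below 1%, yet the divisor is still 4 (that is, 5 - 1), which is the detail most often missed on exam questions.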

Know how to calculate these:
- MAD (mean absolute deviation): $\text{MAD} = \dfrac{\sum_{i=1}^{n} \lvert X_i - \bar{X} \rvert}{n}$
- Sample variance and standard deviation: $s^2 = \dfrac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$, $s = \sqrt{s^2}$
- Relationship between the geometric and arithmetic means: $\bar{X}_G \approx \bar{X} - \dfrac{s^2}{2}$
- Target downside deviation (L1R02LA-BP022_2106, review this question): $s_{\text{Target}} = \sqrt{\dfrac{\sum_{X_i \le B} (X_i - B)^2}{n - 1}}$
- Coefficient of variation: $CV = \dfrac{s}{\bar{X}}$
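The following Python sketch computes MAD, the sample variance and standard deviation, and the exact geometric mean return for a hypothetical series of four annual returns (decimal form), then compares the geometric mean with the arithmetic mean minus half the variance. Function names and data are illustrative assumptions.

```python
import math
from statistics import mean


def mean_absolute_deviation(xs):
    """MAD: average absolute distance from the arithmetic mean (divide by n)."""
    m = mean(xs)
    return sum(abs(x - m) for x in xs) / len(xs)


def sample_variance(xs):
    """Sample variance: squared deviations from the mean divided by n - 1."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def geometric_mean_return(rs):
    """Exact geometric mean return: nth root of the product of (1 + r), minus 1."""
    return math.prod(1 + r for r in rs) ** (1 / len(rs)) - 1


returns = [0.10, -0.05, 0.20, 0.03]  # hypothetical annual returns
x_bar = mean(returns)
var = sample_variance(returns)

print(f"arithmetic mean = {x_bar:.4f}")
print(f"MAD             = {mean_absolute_deviation(returns):.4f}")
print(f"variance        = {var:.6f}, std dev = {math.sqrt(var):.4f}")
print(f"geometric mean  = {geometric_mean_return(returns):.4f}")
print(f"approximation   = {x_bar - var / 2:.4f}")
```

With these numbers the approximation (about 0.064) lands close to the exact geometric mean (about 0.066), and both sit below the arithmetic mean of 0.07, as the relationship predicts.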

Cross-sectional data, Time-series data, Panel data

Cross-sectional data: observations of the same variable for multiple observational units at a single point in time. Time-series data: observations of the same variable for the same observational unit at different points in time. Panel data: a mix of the two, that is, cross-sectional data observed over time, typically presented in a data table.
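As a small illustration, the sketch below stores hypothetical EPS figures as panel data keyed by (company, year). It then slices out a cross-section (one year, all companies) and a time series (one company, all years). Company names, years, and values are made up.

```python
# Hypothetical panel data: EPS keyed by (company, year)
panel = {
    ("A", 2021): 1.10, ("A", 2022): 1.25, ("A", 2023): 1.40,
    ("B", 2021): 0.80, ("B", 2022): 0.95, ("B", 2023): 0.90,
}


def cross_section(data, year):
    """All observational units at a single point in time."""
    return {company: v for (company, y), v in data.items() if y == year}


def time_series(data, company):
    """One observational unit across multiple points in time."""
    return {y: v for (c, y), v in data.items() if c == company}


print(cross_section(panel, 2022))  # {'A': 1.25, 'B': 0.95}
print(time_series(panel, "B"))     # {2021: 0.8, 2022: 0.95, 2023: 0.9}
```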

Charts and Graphs:

- Histogram
- Frequency polygon
- Cumulative frequency distribution (the curve tends to flatten out at extremely negative and extremely positive returns, where few observations fall)
- Bar chart (like a histogram, but for categorical rather than numerical data)
- Grouped bar chart (also known as a clustered bar chart) vs. stacked bar chart
- Tree map (colored rectangles that represent categories or intervals of data; useful for identifying subgroups)
- Word cloud (represents textual, unstructured data: the size of each word represents the value of the observations, and its color represents the category or sentiment)
- Bubble line chart (a line chart in which data points are replaced by bubbles of varying size to represent a third dimension of the data)
- Scatter plot (shows the joint variation in two numerical variables; common in linear regression)
- Heat map (assigns different colors to indicate the magnitude of a variable; often used to show the degree of correlation between variables)

Contingency table, Joint frequency, Marginal frequency

- Contingency table: summarizes two or more categorical variables in a single table, with the categories of one variable as rows and those of the other as columns.
- Joint frequency: the number of observations at the intersection of a particular row and column.
- Marginal frequency: the number of observations totaled across a row or down a column.
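A minimal sketch of building a two-way contingency table in plain Python, assuming each observation is a (sector, size) pair. The stocks and category names are hypothetical. `Counter` gives the joint frequency of each cell, and counting one variable at a time gives the row and column marginal frequencies.

```python
from collections import Counter

# Hypothetical stocks described by two categorical variables: (sector, size)
stocks = [
    ("Tech", "Large"), ("Tech", "Small"), ("Energy", "Large"),
    ("Tech", "Large"), ("Energy", "Small"), ("Energy", "Small"),
]

joint = Counter(stocks)                                # joint frequency per (row, column) cell
row_totals = Counter(sector for sector, _ in stocks)   # marginal frequency across each row
col_totals = Counter(size for _, size in stocks)       # marginal frequency down each column

sectors = sorted(row_totals)
sizes = sorted(col_totals)

print("Sector".ljust(8) + "".join(size.rjust(7) for size in sizes) + "Total".rjust(7))
for sector in sectors:
    cells = "".join(str(joint[(sector, size)]).rjust(7) for size in sizes)
    print(sector.ljust(8) + cells + str(row_totals[sector]).rjust(7))
print("Total".ljust(8) + "".join(str(col_totals[size]).rjust(7) for size in sizes)
      + str(len(stocks)).rjust(7))
```

The row marginals, the column marginals, and the joint frequencies all sum to the same grand total (6 here), which is a quick check on any contingency table.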

Categorical data

Categorical data (also called qualitative data) describe a quality or characteristic of a group of observations, which makes them useful for dividing a data set into groups for summarization and visualization. They take a limited number of mutually exclusive values, such as dividend-paying vs. non-dividend-paying, or small-cap, mid-cap, and large-cap. 1) Nominal data: categories that cannot be logically ordered or compared. 2) Ordinal data: categories that can be logically ordered or ranked (e.g., bond ratings or Morningstar star ratings).

One-dimensional array, Multidimensional array

- One-dimensional array: a single variable laid out as one line of boxes (e.g., a 4 × 1 array).
- Multidimensional array: multiple lines of boxes, that is, a data set with several variables (e.g., a 3 × 3 array).
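In plain Python, a one-dimensional array can be sketched as a list and a two-dimensional array as a list of equal-length lists. In practice NumPy arrays would be the usual choice; they are left out here to keep the example dependency-free. The values and column meanings are hypothetical.

```python
# One-dimensional array: a single variable, here 4 hypothetical closing prices (4 x 1)
prices = [101.2, 102.5, 100.9, 103.0]

# Two-dimensional array: rows are observations, columns are variables (3 x 3)
# Hypothetical columns: price, volume (millions), dividend yield (%)
table = [
    [101.2, 3.1, 2.0],
    [55.4, 1.7, 0.0],
    [78.9, 2.4, 1.5],
]


def shape(array):
    """Return (rows, columns) for a rectangular list of lists."""
    return (len(array), len(array[0]))


column = [row[1] for row in table]  # one variable (volume) pulled out of the 2-D array

print(len(prices))   # 4
print(shape(table))  # (3, 3)
print(column)        # [3.1, 1.7, 2.4]
```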

Absolute frequency, Relative frequency

- Absolute frequency: the actual number of observations counted in each group or interval.
- Relative frequency: the absolute frequency of each group or interval divided by the total number of observations, giving a normalized comparison (a percentage). Accumulating these percentages across intervals gives the cumulative relative frequency.
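A minimal sketch of a frequency distribution, assuming half-open intervals [low, high) so that each return falls into exactly one bin. The returns and bin edges are hypothetical. It shows absolute frequency, relative frequency, and cumulative relative frequency side by side.

```python
from itertools import accumulate

# Hypothetical monthly returns (%)
returns = [-2.1, 0.5, 1.7, -0.4, 3.2, 0.9, 2.8, -1.5]
# Half-open intervals [low, high) so every return falls in exactly one bin
bins = [(-3, -1), (-1, 1), (1, 3), (3, 5)]

n = len(returns)
absolute = [sum(low <= r < high for r in returns) for low, high in bins]
relative = [count / n for count in absolute]
cumulative_relative = list(accumulate(relative))

for (low, high), a, rel, cum in zip(bins, absolute, relative, cumulative_relative):
    print(f"[{low:>2}, {high:>2}): abs={a}  rel={rel:.3f}  cum={cum:.3f}")
```

The relative frequencies sum to 1, so the last cumulative value is always 100%.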

Structured data, Unstructured data

- Structured data: easily organized and presented as arrays of variables or dimensional data tables.
- Unstructured data: do not follow conventional organization approaches. Examples: filings with regulators, social media posts, other text-based financial news, management earnings calls, and analyst presentations. These words or images are difficult to logically categorize, manipulate, or use in financial modeling.

Correlation Coefficient formula

$r_{AB} = \dfrac{\operatorname{Cov}(A, B)}{s_A \, s_B}$, that is, the covariance of A and B divided by the product of their standard deviations.

Covariance formula

$\operatorname{Cov}(X, Y) = \dfrac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1}$
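A minimal Python sketch of sample covariance and the correlation coefficient, assuming both series have the same length and using the n - 1 divisor throughout (`statistics.stdev` is also a sample measure, so the choice cancels out in the correlation). Function names and returns are hypothetical.

```python
import math
from statistics import mean, stdev


def sample_covariance(xs, ys):
    """Sum of cross-products of deviations from the means, divided by n - 1."""
    x_bar, y_bar = mean(xs), mean(ys)
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (len(xs) - 1)


def correlation(xs, ys):
    """Covariance scaled by the product of the two sample standard deviations."""
    return sample_covariance(xs, ys) / (stdev(xs) * stdev(ys))


# Hypothetical returns (%) for assets A and B over four periods
a = [1.0, 2.0, 3.0, 4.0]
b = [2.0, 4.0, 5.0, 9.0]
print(round(sample_covariance(a, b), 4))  # 3.6667
print(round(correlation(a, b), 4))        # 0.9648
```

Covariance is in squared return units and hard to interpret on its own; dividing by the standard deviations rescales it to the unit-free range of -1 to +1.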

Quantiles

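$L_y = (n + 1)\dfrac{y}{100}$, where $L_y$ is the position of the $y$-th percentile in the data sorted in ascending order and $n$ is the number of observations.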
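A minimal sketch of locating a percentile with the (n + 1) * y / 100 rule. It assumes linear interpolation between the two neighboring sorted observations when the location is not a whole number. As a labeled assumption, it clamps to the smallest or largest observation when the location falls outside 1..n. The function name and data are hypothetical.

```python
import math


def percentile(data, y):
    """y-th percentile using the (n + 1) * y / 100 location rule."""
    xs = sorted(data)
    n = len(xs)
    location = (n + 1) * y / 100  # 1-based position in the sorted data
    # Assumption: clamp positions before the first or after the last observation
    if location <= 1:
        return xs[0]
    if location >= n:
        return xs[-1]
    lower = math.floor(location)  # whole-number part of the position
    fraction = location - lower
    # Interpolate between the observations at positions lower and lower + 1
    return xs[lower - 1] + fraction * (xs[lower] - xs[lower - 1])


data = [13, 1, 9, 5, 17, 3, 11, 15, 7]  # hypothetical, n = 9
print(percentile(data, 25))  # location 2.5 -> halfway between 3 and 5 = 4.0
print(percentile(data, 75))  # location 7.5 -> halfway between 13 and 15 = 14.0
```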

Negative Skew

Mean < median < mode; asymmetric with a longer tail on the left, pulled by low outliers.
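A minimal sketch of sample skewness using the approximation (1/n) Σ(X_i - X̄)^3 / s^3, with s the sample standard deviation. Statistical packages often apply slightly different bias adjustments, so their values may differ a little. The two tiny data sets are hypothetical mirror images: one low outlier vs. one high outlier. They have no meaningful mode, so only the mean and median are compared.

```python
from statistics import mean, median, stdev


def sample_skewness(xs):
    """Approximate sample skewness: (1/n) * sum((x - mean)^3) / s^3."""
    m, s, n = mean(xs), stdev(xs), len(xs)
    return sum((x - m) ** 3 for x in xs) / n / s ** 3


left_tail = [-10, 1, 2, 3, 4]      # one low outlier -> negative skew
right_tail = [10, -1, -2, -3, -4]  # one high outlier -> positive skew

for name, xs in (("left tail", left_tail), ("right tail", right_tail)):
    print(name, round(sample_skewness(xs), 3), "mean", mean(xs), "median", median(xs))
```

In the negatively skewed set the outlier drags the mean (0) below the median (2); the mirrored set shows the opposite ordering.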

Numerical data (also known as quantitative data), Continuous data, Discrete data

Numerical data: values expressed as numbers (e.g., height, weight, age), which split into two types. Continuous data: can be measured as any value within a range of values. Discrete data: result from a counting process, so they take only countable values.

* A bar chart that orders categories by frequency in descending order and includes a line displaying cumulative relative frequency is referred to as a: A. Pareto chart. B. grouped bar chart. C. frequency polygon.
---
* Which visualization tool works best to represent unstructured, textual data? A. Tree map B. Scatter plot C. Word cloud
---
* A tree map is best suited to illustrate: A. underlying trends over time. B. joint variations in two variables. C. value differences of categorical groups.
---
* A line chart with two variables (for example, revenues and earnings per share) is best suited for visualizing: A. the joint variation in the variables. B. underlying trends in the variables over time. C. the degree of correlation between the variables.
---
* A heat map is best suited for visualizing the: A. frequency of textual data. B. degree of correlation between different variables. C. shape, center, and spread of the distribution of numerical data.
---
* Which visualization tool is recommended if the goal is to compare three or more variables over time? A. Heat map B. Bubble line chart C. Scatter plot matrix

Pareto chart. A is correct. A bar chart that orders categories by frequency in descending order and includes a line displaying cumulative relative frequency is called a Pareto chart. A Pareto chart is used to highlight dominant categories or the most important groups. B is incorrect because a grouped bar chart (clustered bar chart) is used to present the frequency distribution of two categorical variables. C is incorrect because a frequency polygon is used to display frequency distributions.
---
Word cloud. C is correct. A word cloud, or tag cloud, is a visual device for representing unstructured, textual data. It consists of words extracted from text, with the size of each word proportional to the frequency with which it appears in the given text.
---
Value differences of categorical groups. C is correct. A tree map is a graphical tool used to display and compare categorical data.
---
Underlying trends in the variables over time. B is correct.
---
Degree of correlation between different variables. B is correct. A heat map is commonly used for visualizing the degree of correlation between different variables.
---
Bubble line chart. B is correct. A bubble line chart is a version of a line chart where data points are replaced with varying-sized bubbles to represent a third dimension of the data. A line chart is very effective at visualizing trends in three or more variables over time. A is incorrect because a heat map differentiates high values from low values and reflects the correlation between variables but does not help in making comparisons of variables over time. C is incorrect because a scatter plot matrix is a useful tool for organizing scatter plots between pairs of variables, making it easy to inspect all pairwise relationships in one combined visual. However, it does not help in making comparisons of these variables over time.

Lesson 1: Categorizing, Organizing, Summarizing, and Visualizing Data. LOS: Identify and compare data types.

a

Add-on: Pareto Chart

a bar graph whose bars are drawn in decreasing order of frequency or relative frequency
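A minimal sketch of the data behind a Pareto chart. It sorts categories by frequency in descending order for the bars and accumulates their relative frequencies for the cumulative line. The plotting step itself is omitted, and the trade-failure categories are hypothetical.

```python
from collections import Counter
from itertools import accumulate

# Hypothetical log of reasons for failed trades
reasons = ["price", "settlement", "price", "quantity", "price", "settlement"]

counts = Counter(reasons).most_common()  # (category, count) sorted by count, descending
n = len(reasons)
# Accumulate raw counts first, then divide, so the last value is exactly 1.0
cumulative_relative = [c / n for c in accumulate(count for _, count in counts)]

for (category, count), cum in zip(counts, cumulative_relative):
    print(f"{category:<11} bar={count}  cumulative line={cum:.0%}")
```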

Positive Skew

Mean > median > mode; asymmetric with a longer tail on the right, pulled by high outliers.

Kurtosis

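Kurtosis measures the combined weight of the tails of a distribution relative to the rest of the distribution (often loosely described as its "peakedness"). Mesokurtic: same kurtosis as the normal distribution (excess kurtosis = 0). Leptokurtic: fatter tails than normal (excess kurtosis > 0), so extreme outcomes occur more often than a normal distribution predicts. This means more uncertainty, and adjustments should be made to avoid unexpectedly large gains and losses in a security or portfolio. Platykurtic: thinner tails than normal (excess kurtosis < 0). We want meso, but platy is OK; we don't want lepto. Thus, we want mesokurtic or platykurtic.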
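A minimal sketch of sample excess kurtosis using the approximation (1/n) Σ(X_i - X̄)^4 / s^4 - 3, where subtracting 3 (the kurtosis of a normal distribution) makes 0 the mesokurtic benchmark. Other software may apply different small-sample adjustments. Both data sets are hypothetical: one is mostly flat with two extreme values, the other evenly spread.

```python
from statistics import mean, stdev


def excess_kurtosis(xs):
    """Approximate sample excess kurtosis: (1/n) * sum((x - mean)^4) / s^4 - 3."""
    m, s, n = mean(xs), stdev(xs), len(xs)
    return sum((x - m) ** 4 for x in xs) / n / s ** 4 - 3


def classify(k):
    """Label a distribution by the sign of its excess kurtosis."""
    if k > 0:
        return "leptokurtic (fatter tails than normal)"
    if k < 0:
        return "platykurtic (thinner tails than normal)"
    return "mesokurtic (same as normal)"


fat_tails = [0, 0, 0, 0, 0, 0, 0, 0, 10, -10]  # hypothetical: two extreme outcomes
flat = list(range(1, 11))                       # hypothetical: evenly spread values

for name, xs in (("fat_tails", fat_tails), ("flat", flat)):
    k = excess_kurtosis(xs)
    print(f"{name}: excess kurtosis = {k:.2f} -> {classify(k)}")
```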

Coefficient of Variation (CV)

the standardized measure of the risk per unit of return; calculated as: standard deviation / expected return
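A minimal sketch comparing two hypothetical funds by coefficient of variation, using the sample standard deviation over the mean return. The lower CV means less risk per unit of return. Raising an error when the mean is not positive is a design assumption of this sketch: with a zero or negative mean, the ratio is undefined or misleading.

```python
from statistics import mean, stdev


def coefficient_of_variation(returns):
    """Sample standard deviation per unit of mean return."""
    m = mean(returns)
    # Assumption: refuse to compute CV when the mean return is not positive
    if m <= 0:
        raise ValueError("CV is not meaningful when the mean return is not positive")
    return stdev(returns) / m


fund_a = [0.08, 0.12, 0.10]  # hypothetical annual returns
fund_b = [0.02, 0.10, 0.06]

for name, rs in (("A", fund_a), ("B", fund_b)):
    print(name, round(coefficient_of_variation(rs), 3))
```

Fund A (mean 10%, standard deviation 2%) has a CV of 0.2. Fund B (mean 6%, standard deviation 4%) has a CV of about 0.67, so A offers less risk per unit of return.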

