Data Visualization Exam 1 Terms

stacked bar chart

A bar chart that uses color to denote the contribution of each subcategory to the total.

spaghetti chart

A chart depicting possible flows through a system using a line for each possible path.

combo chart

A chart that combines two separate charts, for example, a column chart and a line chart, on the same chart.

Goal: To Show Ranking

Bar charts and column charts, sorted on the cross-sectional quantitative data of interest across categories, can be used to effectively show the rank order of categories on the quantitative variable. An example is the ten categories ranked by spending allocation in the New York City budget shown in Figure 2.2. When trying to select a chart type, we recommend starting with understanding the needs of the audience to determine the goal of the chart, understanding the types of data you have, and then selecting a chart based on the guidance provided in this section. As with most analytics tools, it is important to experiment with different approaches before arriving at a final decision on your data visualization.
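
The sorting step behind a ranking chart can be sketched in a few lines. This is a minimal illustration, not the method used for Figure 2.2: the category names and amounts are made up, and `rank_categories` is a hypothetical helper.

```python
# Hypothetical amounts by category (made-up values, arbitrary units).
amounts_by_category = {"A": 30, "B": 12, "C": 18, "D": 5}


def rank_categories(amounts):
    """Return (rank, category, amount) tuples, largest amount first."""
    ordered = sorted(amounts.items(), key=lambda item: item[1], reverse=True)
    return [(rank, name, amount)
            for rank, (name, amount) in enumerate(ordered, start=1)]


for rank, name, amount in rank_categories(amounts_by_category):
    # Text "bar" whose length is proportional to the amount, as in a bar chart.
    print(f"{rank}. {name:<3} {'#' * amount} {amount}")
```

Sorting before plotting is what lets the bar length carry the rank order.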

Composition

Composition is what makes up the whole of an entity under consideration. An example is the bar chart in Figure 2.2.

common goals for charts

- Composition: what makes up the whole of an entity under consideration. An example is the bar chart in Figure 2.2.
- Ranking: the relative order of items. Figure 2.2 is also an example of ranking, because we have sorted the categories by bar length, which is proportional to the amounts allocated.
- Correlation/Relationship: how two variables are related to one another. An example of this is the relationship between average low temperature and average annual snowfall for various cities in the United States.
- Distribution: how items are dispersed. An example of this is the number of calls received by a call center in a day, measured on an hourly basis.

Correlation/Relationship

Correlation is how two variables are related to one another. An example of this is the relationship between average low temperature and average annual snowfall for various cities in the United States.

Cross-sectional data

Data collected from several entities at the same or approximately the same point in time.

Time series data

Data collected over several points in time (minutes, hours, days, months, years, etc.)

Categorical data

Data for which categories of like items are identified by labels or names. Arithmetic operations cannot be performed on categorical data.

Quantitative data

Data for which numerical values are used to indicate magnitude, such as how many or how much. Arithmetic operations, such as addition, subtraction, multiplication, and division, can be performed on quantitative data.

Hierarchical data

Data that can be represented with a tree-like structure, where the branches of the tree lead to categories and subcategories

Data Visualization for Exploration

Data visualization is a powerful tool for exploring data to more easily identify patterns, recognize anomalies or irregularities in the data, and better understand the relationships between variables. Our ability to spot these types of characteristics of data is much stronger and quicker when we look at a visual display of the data rather than a simple listing. Visual data exploration is an important part of descriptive analytics. Data visualization can also be used directly to monitor key performance metrics, that is, to measure how an organization is performing relative to its goals, for example with a data dashboard. Visual data exploration is also critical for ensuring that model assumptions hold in predictive and prescriptive analytics. Understanding the data before using that data in modeling builds trust and can be important in determining and explaining which type of model is appropriate.

Data Visualization for Explanation

Data visualization is also important for explaining relationships found in data and for explaining the results of predictive and prescriptive models. More generally, data visualization is helpful in communicating with your audience and ensuring that your audience understands and focuses on your intended message.

Data Visualization in Practice

Data visualization is used to explore and explain data and to guide decision making in all areas of business and science.

Distribution

Distribution is how items are dispersed. An example of this is the number of calls received by a call center in a day, measured on an hourly basis.
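
As a rough sketch of the call-center example, the hourly distribution is just a count of calls per hour. The call times below are made up, and `calls_per_hour` and its default hour range are illustrative assumptions.

```python
from collections import Counter

# Hypothetical hour of day (0-23) at which each call arrived; made-up data.
call_hours = [9, 9, 10, 10, 10, 11, 13, 13, 14, 14, 14, 14, 16]


def calls_per_hour(hours, first_hour=8, last_hour=17):
    """Count calls in each hour, keeping hours with no calls as zero."""
    counts = Counter(hours)
    return {hour: counts.get(hour, 0)
            for hour in range(first_hour, last_hour + 1)}


for hour, count in calls_per_hour(call_hours).items():
    # One '#' per call gives a text version of a column chart over time.
    print(f"{hour:02d}:00 {'#' * count}")
```

Keeping the empty hours in the result matters: gaps are part of the distribution.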

Ranking

Ranking is the relative order of items. Figure 2.2 is also an example of ranking, because we have sorted the categories by bar length, which is proportional to the amounts allocated.

Benford's law

Benford's law, also known as the First-Digit Law, gives the expected probability that the first digit of a reported number takes on each of the values one through nine, based on many real-life numerical data sets such as company expense accounts.
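
The expected probabilities follow the standard formula P(d) = log10(1 + 1/d) for a first digit d from 1 to 9. The sketch below computes them and compares them with observed first-digit shares. The helper names are hypothetical, and any data passed to `first_digit_shares` would be your own.

```python
import math


def benford_probability(digit: int) -> float:
    """Expected probability that the first digit equals `digit` (1-9)."""
    if not 1 <= digit <= 9:
        raise ValueError("first digit must be between 1 and 9")
    return math.log10(1 + 1 / digit)


def leading_digit(value: float) -> int:
    """First nonzero digit of a nonzero number, ignoring its sign."""
    text = f"{abs(value):.15e}"  # scientific notation puts that digit first
    return int(text[0])


def first_digit_shares(values):
    """Observed share of each first digit 1-9 among the nonzero values."""
    digits = [leading_digit(v) for v in values if v != 0]
    if not digits:
        raise ValueError("need at least one nonzero value")
    return {d: digits.count(d) / len(digits) for d in range(1, 10)}


expected = {d: benford_probability(d) for d in range(1, 10)}
for d, p in expected.items():
    print(f"digit {d}: expected share {p:.3f}")
```

A large gap between observed and expected shares is a prompt to look closer, not proof of an error or fraud.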

volume

the amount of data generated

variety

the diversity in types and structures of data generated

veracity

the reliability of the data generated

velocity

the speed at which the data are generated

enclosure

A Gestalt principle stating that objects that are physically enclosed together are seen as belonging to the same group. For example, we can reinforce the similarity principle by creating an enclosure around points that are already in close proximity.

proximity

A Gestalt principle stating that people consider objects that are physically close to one another as belonging to a group. People will generally seek to collect objects that are near each other into a group and separate objects that are far from one another into different groups.

similarity

A Gestalt principle stating that people consider objects with similar characteristics as belonging to the same group. These characteristics could be color, shape, size, orientation, or any preattentive attribute. The audience will perceive objects that are the same color, or same shape, as belonging to the same group. We need to understand this when we design a visualization and make sure that we only use similar characteristics for objects when they belong to the same group.

column chart 2

A chart that displays a quantitative variable by category or time period using vertical bars to display the magnitude of a quantitative variable. The line chart in Figure 2.12b, and the column chart in Figure 2.15, are both good displays of the Cheetah Sports annual sales. The line chart, with its connected lines, makes it easier to see how the sales are changing over time. The column chart, with its labels, is preferred if it is important for the audience to know the values of sales in each year. Moreover, adding data labels to a line chart generally makes the chart too cluttered. On the other hand, if there are numerous categories or time periods, the line chart (without data labels) would be preferred over the column chart with data labels as the column chart would appear too cluttered and labels would not be readable.

bar chart 2

A chart that displays a quantitative variable by category using the length of horizontal bars to display the magnitude of a quantitative variable. Like column charts, bar charts are useful for comparing categorical variables and are most effective when you do not have too many categories.

clustered bar chart

A chart that displays multiple quantitative variables for categories or time periods using the length of horizontal bars to denote the magnitude of the quantitative variables and separate bars and colors to denote the different categories.

clustered column chart 2

A chart that displays multiple quantitative variables for categories or time periods with different colors, with the height of the columns denoting the magnitude of the quantitative variable.

radar chart

A chart that displays multiple quantitative variables on a polar grid with an axis for each variable. The quantitative values on each axis are connected with lines for a given category.

shot chart

A chart that displays the location of shots attempted by a basketball player during a basketball game with different symbols or colors indicating successful and unsuccessful shots.

bar chart

A chart that shows a summary of categorical data using the length of horizontal bars to display the magnitude of a quantitative variable.

geographic map

A chart that shows characteristics and the arrangement of the geography of our physical reality.

column chart

A chart that shows numerical data by the height of a column for a variety of categories or time periods.

high-low-close stock chart 2

A chart that shows the high value, low value, and closing value of the price of a share of stock over time.

funnel chart

A chart that shows the progression of a numerical variable to typically smaller values through a process, for example, the percentage of website visitors who ultimately result in a sale.

funnel chart 2

A chart that shows the progression of a quantitative variable for various nested categories from larger to smaller values. Application area: marketing.

high-low-close stock chart

A chart that shows three numerical values: high value, low value, and closing value for the price of a share of stock over time.

Line charts

A chart that uses a point to represent a pair of quantitative variable values, one value along the horizontal axis and the other on the vertical axis, with a line connecting the points. Line charts are very useful for time series data (data collected over a period of time: minutes, hours, days, years, etc.).

Form

A collection of the preattentive attributes of orientation, size, shape, length, and width. Each of these attributes can be used to call attention to a particular aspect of a data visualization.

clustered column chart

A column chart showing multiple variables of interest on the same chart, the different variables usually denoted by different colors or shades of a color with the columns side by side.

Stacked column chart

A column chart that shows part-to-whole comparisons, either over time or across categories. Different colors, or shades of color, are used to denote the different parts of the whole within a column.

data dashboard

A data visualization tool that gives multiple outputs and may update in real time. Just as the dashboard in your car measures the speed, engine temperature, and other important performance data as you drive, corporate data dashboards measure performance metrics such as sales, inventory levels, and service levels relative to the goals set by the company. These data dashboards alert management when performances deviate from goals so that corrective actions can be taken.

choropleth map

A geographic map that uses shades of a color, different colors, or symbols to indicate quantitative or categorical variables by geographic region or area.

control chart

A graphical display in which a variable of interest is plotted over time relative to lower and upper control limits.

stock chart

A graphical display of stock prices over time. Application area: finance.

big data

Any set of data that is too large or complex to be handled by standard data-processing techniques using a typical desktop computer. Big data includes text, audio, and video data. The four Vs of big data:
- volume: the amount of data generated
- velocity: the speed at which the data are generated
- variety: the diversity in types and structures of data generated
- veracity: the reliability of the data generated

scatter chart 2

A graphical presentation of the relationship between two quantitative variables. One variable is shown on the horizontal axis and the other is shown on the vertical axis and a symbol is used to plot ordered pairs of the quantitative variable values. Scatter charts are among the most useful charts for exploring pairs of quantitative data. But, what if you wish to explore the relationships between more than two quantitative variables? When exploring the relationships between three quantitative variables, a bubble chart may be useful.

scatter chart

A graphical presentation of the relationship between two quantitative variables. One variable is shown on the horizontal axis and the other is shown on the vertical axis.

area chart

A line chart with the area between the lines filled with color.

motion

A movement type of preattentive attribute that involves directed movement and can be used to show changes within a visualization. When the data visualization tool allows for the use of movement, it can be used to direct attention to certain areas of a visualization or to show changes over time or space. Because the focus of this text is on static visualizations, we will not go into detail on using the preattentive attribute of movement, but we should caution that movement can also become overwhelming and distracting if it is overused in a visualization.

Flicker

A movement type of preattentive attribute that refers to effects such as flashing to draw attention to something. Humans are attuned to detecting movement. Therefore, preattentive attributes such as flicker and motion can be effective at drawing attention to specific items or portions of a data visualization.

Spatial positioning

A preattentive attribute that refers to the location of an object within some defined space. The spatial positioning most often used in data visualization is 2D positioning. Scatter charts are a common type of chart that makes use of the preattentive attribute of spatial positioning.

color

A preattentive attribute for data visualizations that includes the attributes of hue, saturation, and luminance. Hue, saturation, and luminance can each be used to draw the user's attention to specific parts of a data visualization and to differentiate among values in a visualization. Using differences in hue in a data visualization creates bold, stark contrasts, while changing the saturation or luminance creates softer, less stark contrasts. Color can be an extremely effective attribute to use to differentiate particular aspects of data in a visualization. However, one must be careful not to overuse color, as it can become distracting in a visualization. It should also be noted that many people have color blindness, which affects their ability to differentiate between some colors.

bubble chart

A scatter chart that displays a third quantitative variable using different sized dots, which we refer to as bubbles.

heat map

A two-dimensional graphical representation of data that uses different shades of color to indicate magnitude. The heat map in Figure 2.23 helps the reader to easily identify trends and patterns. Heat maps can be used effectively to convey data over different areas, across time, or both.

Sankey chart

A type of data visualization chart that typically depicts the proportional flow of entities, where the width of each line represents the relative flow rate compared to the widths of the other lines. Because it is not easy to compare relative line widths, proportions can be difficult to compare, for example the proportion of graduation majors coming from different anticipated majors. Sankey charts can quickly become overwhelming and difficult to interpret, so one should be careful not to include too much information in this type of visualization.

length

A type of preattentive attribute associated with form. It refers to the horizontal, vertical, or diagonal distance of a line. When discussing length, we are generally referring to its use with lines, bars, or columns. Length is useful for illustrating quantitative values because a longer line corresponds to a larger value. Length is used extensively in bar and column charts to visualize data. Because it is much easier to compare relative lengths than relative sizes, bar and column charts are often preferred to pie charts for visualizing data.

Size

A type of preattentive attribute associated with form. It refers to the relative amount of 2D space that an object occupies in a visualization. Most people are not good at estimating relative size differences, so we must be careful when using the attribute of size to convey information about relative amounts.

Orientation

A type of preattentive attribute associated with form. It refers to the relative positioning of an object within a data visualization. It is a common preattentive attribute present in line graphs.

width

A type of preattentive attribute associated with form. It refers to the thickness of a line. When discussing width, we are generally referring to its use with lines, bars, or columns.

Shape

A type of preattentive attribute associated with form. It refers to the type of object used in a data visualization. In general, most shapes do not specifically correspond to certain quantitative amounts. Nevertheless, shape can be effectively used to draw attention in a visualization or as a way to group common items and distinguish between items from different groups.

waterfall chart

A visual display that shows the cumulative effect of positive and negative changes on a variable of interest. The basis of the changes can be time or categories, and changes are represented by columns anchored at the previous time or category's cumulative level. The example waterfall chart shows the gross profit by month, with blue indicating a positive gross profit and orange indicating a negative gross profit. The upper or lower end of the bar indicates the cumulative level of gross profit. For positive changes, the upper end of the bar is the cumulative level, and for negative changes, the lower end of the bar is the cumulative level. Here we see that the cumulative level of gross profit rises from January to March, drops in April and May, and then increases in June to the cumulative gross profit of $26,448 for the six-month period.
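
The anchoring rule can be sketched as a running total. The monthly changes below are made up for illustration (they are not the data behind the $26,448 example), and `waterfall_bars` is a hypothetical helper.

```python
# Hypothetical monthly changes in gross profit (made-up values).
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
changes = [5000, 3000, 4000, -2000, -1000, 6000]


def waterfall_bars(changes, start=0):
    """Return (bottom, top, cumulative) for each bar of a waterfall chart."""
    bars = []
    previous = start
    for change in changes:
        cumulative = previous + change
        # Each bar spans from the previous cumulative level to the new one:
        # the top is the cumulative level for a positive change,
        # the bottom is the cumulative level for a negative change.
        bars.append((min(previous, cumulative),
                     max(previous, cumulative),
                     cumulative))
        previous = cumulative
    return bars


for month, (bottom, top, cumulative) in zip(months, waterfall_bars(changes)):
    print(f"{month}: bar from {bottom} to {top}, cumulative {cumulative}")
```

The last bar's cumulative value is the total for the whole period.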

Goal: To show Composition

A waterfall chart shows the composition of a quantitative variable of interest over time or category. For example, Figure 2.30 shows the composition of the final value of gross profit over time. A funnel chart also shows composition in the sense that going from the bottom of the funnel to the top gives the composition of the original set at the top of the funnel. The funnel chart for the hiring process in Figure 2.34 is an example.

accounting

Accounting is a data-driven profession. Accountants prepare financial statements and examine financial statements for accuracy and conformance to legal regulations and best practices, including reporting required for tax purposes. Data visualization is a part of every accountant's tool kit. Data visualization is used to detect outliers that could be an indication of a data error or fraud. Example charts: column chart and clustered column chart.

Engineering

Engineering relies heavily on mathematics and data. Hence, data visualization is an important technique in every engineer's toolkit. For example, industrial engineers monitor the production process to ensure that it is "in control," or operating as expected. On a control chart, when the points are between the lower and upper control limits, the process is considered to be in control. When points begin to appear outside the control limits with some regularity and/or when large swings start to appear, as in Figure 1.11, this is a signal to inspect the process and make any necessary corrections. Example chart: control chart.
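
The in-control check can be sketched as a simple comparison against the two limits. How the limits are set is not covered here, so this sketch takes them as given inputs. The measurements, limits, and `out_of_control_points` helper are all hypothetical.

```python
def out_of_control_points(values, lower_limit, upper_limit):
    """Return (index, value) pairs that fall outside the control limits."""
    return [(index, value)
            for index, value in enumerate(values)
            if value < lower_limit or value > upper_limit]


# Hypothetical process measurements and control limits (made-up values).
measurements = [10.1, 9.8, 10.3, 11.6, 10.0, 8.2, 9.9]
flagged = out_of_control_points(measurements, lower_limit=8.5, upper_limit=11.5)

for index, value in flagged:
    print(f"period {index}: {value} is outside the control limits")
```

A single flagged point is only a hint. Repeated points outside the limits are the signal to inspect the process.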

Preattentive attributes

Features of a data visualization that can be processed by iconic memory. Preattentive attributes related to visual perception are generally divided into four categories: color, form, spatial positioning, and movement.

What types of memory are most important for visual processing?

For most data visualizations, iconic and short-term memory are most important for visual processing. In particular, an understanding of what aspects of a visualization can be processed in iconic memory can be helpful for designing effective visualizations.

Selecting an Appropriate Chart

How do you choose an appropriate chart? If the goal of your chart is to explain, then the answer to this question depends on the message you wish to convey to your audience. If you are exploring data, the best chart type depends on the question you are asking and hope to answer from the data. The type of data you have should also influence your chart selection. The relationship between two quantitative variables often makes a scatter chart an appropriate choice. Bar charts, scatter charts, and line charts with time on the horizontal axis are often the best choice for time series data. If your data have a spatial component, a geographic map might be a good choice.

Goal: Show Distribution

In addition to being useful for showing the relationships between quantitative variables, scatter and bubble charts can be useful for showing how the quantitative variable values are distributed over the range for each variable. Column and bar charts can be used to show the distribution of a variable of interest over discrete categories or time periods. As previously mentioned, column charts rather than bar charts should be used for distribution over time, as it is more natural to represent the progression of time from left to right. A choropleth map shows the distribution of a quantitative or categorical variable over a geographic space. Figures 2.19 and 2.21 are examples of these.

Tables versus Charts

In general, charts can often convey information to readers more quickly and easily than tables, but in some cases a table is more appropriate. Tables should be used when the:
- reader needs to refer to specific numerical values.
- reader needs to make precise comparisons between different values and not just relative comparisons.
- values being displayed have different units or very different magnitudes.

finance

Like accounting, the area of business known as finance is numerical and data-driven. Finance is the area of business concerned with investing. Financial analysts, also known as "quants," use massive amounts of financial data to decide when to buy and sell certain stocks, bonds, and other financial instruments. Data visualization is useful in finance for recognizing trends, assessing risk, and tracking actual versus forecasted values of metrics of concern. Example chart: high-low-close stock chart.

Operations

As in marketing, analytics is used heavily in managing the operations function of business. Operations management is concerned with the management of the production and distribution of goods and services. It includes responsibility for planning and scheduling, inventory planning, demand forecasting, and supply chain optimization. Figure 1.10 shows time series data for monthly unit sales for a product (measured in thousands of units sold). Each period corresponds to one month. So that a cost-effective production schedule can be developed, an operations manager might have responsibility for forecasting the monthly unit sales for the next twelve months (periods 37-48). Example chart: line chart.

Marketing

Marketing is one of the most popular application areas of analytics. Analytics is used for optimal pricing, markdown pricing for seasonal goods, and optimal allocation of marketing budget. Sentiment analysis using text data such as tweets, analysis of social networks to determine influence, and website analytics for understanding website traffic and sales are just a few examples of how data visualization can be used to support more effective marketing. Example chart: funnel chart.

Prescriptive analytics

Mathematical or logical models that suggest a decision or course of action.

Data Visualization for Exploration 2

One of the most commonly used predictive models is linear regression, which involves finding the best-fitting line to the data. In the graphs in Figure 1.2, we show the best-fitting lines for each data set. Notice that the lines are the same for each data set. In fact, the measure of how well the line fits the data (expressed by a statistic labeled R²) is the same (67% of the variation in the data is explained by the line). Identical fitted lines and fit statistics do not mean that the underlying data look alike. Hence, before applying predictive and prescriptive analytics, it is always best to visually explore the data to be used. This helps the analyst avoid misapplying more complex techniques and reduces the risk of poor results.

Predictive analytics

Techniques that use models constructed from past data to predict future events or better understand the relationships between variables. One of the most commonly used predictive models is linear regression, which involves finding the best-fitting line to the data. In the graphs in Figure 1.2, we show the best-fitting lines for each data set. Notice that the lines are the same for each data set. In fact, the measure of how well the line fits the data (expressed by a statistic labeled R²) is the same (67% of the variation in the data is explained by the line).
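
As a minimal sketch of what "best-fitting line" and "variation explained" mean, the code below fits a least-squares line and computes R² by hand. The data are the first data set of Anscombe's quartet, a well-known example whose fit gives an R² of about 0.67. Whether Figure 1.2 uses this data is an assumption, and the function names are illustrative.

```python
# First data set of Anscombe's quartet (assumed, not confirmed, to match Figure 1.2).
xs = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
ys = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]


def fit_line(xs, ys):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(xs, ys, slope, intercept):
    """Share of the variation in y that is explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_residual = sum((y - (intercept + slope * x)) ** 2
                      for x, y in zip(xs, ys))
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_residual / ss_total


slope, intercept = fit_line(xs, ys)
r2 = r_squared(xs, ys, slope, intercept)
print(f"y = {intercept:.2f} + {slope:.2f}x, R^2 = {r2:.2f}")
```

The same line and R² can come from data with a very different shape, which is why plotting the points is still worthwhile.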

cognitive load

The amount of effort necessary to accurately and efficiently process the information being communicated by a data visualization. Proper use of preattentive attributes in a data visualization reduces the cognitive load, which makes it easier for the audience to interpret the visualization.

Hue

The attribute of a color that is determined by the position the light occupies on the visible light spectrum and defines the base of the color.

Saturation

The attribute of a color that represents the amount of gray in the color and determines the intensity or purity of the hue in the color.

Luminance

The attribute of a color that represents the relative degree of black or white in the color.

Data visualization

The graphical representation of data and information using displays such as charts, graphs, and maps.

Gestalt principles

The guiding principles of how people interpret and perceive things that they see, which can be used in the design of effective data visualizations. The principles generally describe how people define order and meaning in things that they see.

Sciences

The natural and social sciences rely heavily on the analysis of data and data visualization for exploring data and explaining the results of analysis. Geographic maps are used not only to display data, but also to display the results of predictive models. An example of this is shown in Figure 1.12. Predicting the path a hurricane will follow is a complicated problem. Numerous models, each with its own set of influencing variables (also known as model features), yield different predictions. Displaying the results of each model on a map gives a sense of the uncertainty in predicted paths across all models and expands the alert to a broader range of the population than relying on a single model. Because the multiple paths resemble pieces of spaghetti, this type of map is sometimes referred to as a "spaghetti chart." More generally, a spaghetti chart is a chart depicting possible flows through a system using a line for each possible path. Example chart: spaghetti chart.

Short-term memory

The portion of memory that holds information for about a minute. It utilizes chunking, or grouping things together to hold about four chunks of visual information at one time. For instance, most people find it difficult to remember which color represents which category if more than four different colors/categories are used in a bar or column chart.

Iconic memory

The portion of memory that is processed fastest. It is processed automatically and the information is held there for less than a second.

Long-term memory

The portion of memory where information is stored for an extended amount of time. Most long-term memories are formed through repetition and rehearsal but can also be formed through clever use of storytelling.

visual perception

The process through which our brains interpret the reflections of light that enter our eyes. The process of visual perception is related to how memory works in our brain. At a very high level, there are three forms of memory that affect visual perception: iconic memory, short-term memory, and long-term memory.

Analytics

The scientific process of transforming data into insights for making better decisions. In summary, the availability of massive amounts of data, improvements in analytical methods, and substantial increases in computing power and storage have enabled the explosive growth in analytics, data science, and artificial intelligence.

Descriptive analytics

The set of analytical tools that describe what has happened.

Sports

The use of analytics for player evaluation and on-field strategy is now common throughout professional sports. Data visualization is a key component of how analytics is applied in sports. It is common for coaches to have tablet computers on the sideline that they use to make real-time decisions such as calling plays and making player substitutions. Example chart: shot chart.

Goal: To show a relationship

To show a relationship between two quantitative variables, we recommend a scatter chart. When dealing with three quantitative variables, a bubble chart can be used. Line charts can be used to emphasize the pattern across consecutive data points and are commonly used to display relationships over time. Stock charts show the relationship between time and stock price. Column charts, bar charts, and heat maps can be used to show the relationships that exist between categories.

Some Charts to Avoid

Some charts should usually be avoided, typically because the chart is overly cluttered or takes too much effort for most audiences to interpret quickly and accurately. Instead of a pie chart, consider using a bar chart. This is because research has shown that we are better at assessing differences in length than in angle and area. Small differences can be better detected in length than in area, especially when sorted by length. Also, using a bar chart simplifies the chart in that there is no longer a need for a different color for each category. Another chart to be avoided is a radar chart. A radar chart is a chart that displays multiple quantitative variables on a polar grid with an axis for each variable. The radar chart in Figure 2.36 has three axes corresponding to the three columns of data in Figure 2.35. Luckily, the three variables are of roughly the same magnitude. Variables of very different scales can distort a radar chart. Even with this very small data set, the radar chart is quite busy and difficult for an audience to interpret.

why visualize data?

We create data visualizations for two reasons: exploring data and communicating/explaining a message.

Goal: To show Composition

When the goal is to show the composition of an entity, a good choice is a bar chart, sorted by contribution to the whole. An example is the New York City budget in Figure 2.2. A stacked bar chart is appropriate for showing the composition of different categories, and a stacked column chart is good for showing composition over a time series. Figure 2.17, which shows the sales for Cheetah Sports by region, is a good example of a stacked column chart with time series data. A treemap shows composition in the situation where there is a hierarchical structure among categorical variables. In Figure 2.26, we see the brand values (the quantitative variable of interest) for companies within industry sectors. For example, the technology sector is composed of six brands in the top ten. All other sectors are composed of only a single brand.

Human Resource Management

With the increased use of analytics in business, human resource management (HRM) has become much more data-driven. Indeed, HRM is sometimes now referred to as "people analytics." HRM professionals use data and analytical models to form high-performing teams, monitor productivity and employee performance, and ensure diversity of the workforce. Data visualization is an important component of HRM, as HRM professionals use data dashboards to monitor relevant data supporting their goal of having a high-performing workforce. A key interest of HRM professionals is employee churn, or turnover in an organization's workforce. When employees leave and others are hired, there is often a loss of productivity as positions go unfilled. Also, new employees typically have a training period and then must gain experience, which means employees will not be fully productive at the beginning of their tenure with the company.

treemap

A chart that uses the size, color, and arrangement of rectangles to display the magnitudes of a quantitative variable for different categories, each of which is further decomposed into subcategories. The size of each rectangle represents the magnitude of the quantitative variable within a category/subcategory. The color of the rectangle represents the category, and all subcategories of a category are arranged together.

Some Charts to Avoid 2

A radar chart can be busy and difficult for an audience to interpret. Perhaps a better choice is a clustered column chart. Another chart that many find difficult to read is an area chart. An area chart is a line chart with the area between the lines filled with color. Combo charts can be overly cluttered and difficult to interpret, especially when they contain both a left and right vertical axis. Charts to avoid: radar, area, pie, and combo charts.

will the type of data you have influence the type of graph you should use to convey your message?

yes

