IPAC 4130 Chapter 1
(True/False) Big Data implies the collection of complete (population) data.
Answer: False Explanation: False. Big Data does not imply the collection of complete population data. It often involves working with large samples or subsets of data.
(True/False) Big Data is always used by businesses when available.
Answer: False Explanation: False. Big Data may not always be used by businesses, as it can be inconvenient and computationally burdensome. The benefits of using Big Data may not always justify the associated costs.
(True/False) Categorical data are also referred to as quantitative data.
Answer: False Explanation: False. Categorical data are not referred to as quantitative data. Categorical data represent distinct categories or labels, while quantitative data involve numerical values.
(True/False) Categorical variables can perform meaningful arithmetic operations.
Answer: False Explanation: False. Categorical variables are not suitable for performing meaningful arithmetic operations. The numerical codes assigned to categories do not have inherent mathematical significance.
(True/False) Continuous variables are always measured in discrete values.
Answer: False Explanation: False. Continuous variables are not necessarily measured in discrete values. In practice they are often recorded in discrete units (for example, weight rounded to the nearest gram), but the underlying variable can theoretically take on an uncountable number of values within an interval.
(True/False) Handling missing values is typically the last step in data preparation.
Answer: False Explanation: False. Handling missing values is not necessarily the last step in data preparation. It can be an essential part of data cleaning and can occur at various stages of data analysis. It depends on the nature and importance of missing data in the analysis.
(True/False) The presence of a large amount of data guarantees the generation of useful insights and measurable improvements.
Answer: False Explanation: False. Having a large volume of data does not guarantee the generation of useful insights or measurable improvements. It depends on factors like data quality, analysis methods, and the formulation of meaningful questions.
(True/False) Inconvenience and computational burden are not factors that can deter the use of Big Data.
Answer: False Explanation: False. Inconvenience and computational burden can be significant deterrents to the use of Big Data. These challenges may lead businesses to carefully consider whether the benefits of using Big Data justify the costs and efforts involved.
(True/False) It is always feasible to collect data from the entire population.
Answer: False Explanation: False. It is not always feasible to collect data from the entire population due to various practical constraints, such as time, cost, and the size of the population.
(True/False) Numeric data are also referred to as qualitative data.
Answer: False Explanation: False. Numeric data are referred to as quantitative data, not qualitative data. Quantitative data involve meaningful numerical values.
(True/False) Examples of categorical data include numerical grades in a course.
Answer: False Explanation: False. Numerical grades in a course are typically considered quantitative data because they represent numerical values on a scale. Categorical data would involve non-numerical categories like "Marital Status" or "Hair Color."
(True/False) On an interval scale, the zero value indicates the absence of the characteristic being measured.
Answer: False Explanation: False. On an interval scale, the zero value is arbitrary and does not indicate the absence of the characteristic being measured. For example, a temperature of 0 degrees Fahrenheit does not mean the absence of heat.
(True/False) On an ordinal scale, the difference between ranked values is meaningful and consistent.
Answer: False Explanation: False. On an ordinal scale, the difference between ranked values is not necessarily meaningful or consistent. While you can rank the categories, the numerical values assigned to them are arbitrary and lack consistent intervals.
(True/False) Sorting data is primarily used to perform complex statistical tests.
Answer: False Explanation: False. Sorting data is not primarily used to perform complex statistical tests. Its main purpose is to organize data for better understanding, identify outliers, and explore data patterns. Complex statistical tests are typically performed after data preparation.
(True/False) Numeric variables with missing values should be excluded from subsequent analysis, according to the imputation strategy.
Answer: False Explanation: False. The imputation strategy for numeric variables recommends replacing missing values with imputed values, typically using measures like the average (mean). This allows you to retain all cases for analysis, including those with missing data, by substituting reasonable estimates for missing values.
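Mean imputation for a numeric variable can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed procedure: it assumes the variable is stored as a plain list with None marking a missing value, and the function name impute_mean and the sample values are hypothetical.

```python
from statistics import mean


def impute_mean(values):
    """Replace each missing value (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill_value = mean(observed)  # average of the non-missing observations only
    return [fill_value if v is None else v for v in values]


# Hypothetical incomes (in $1,000s); None marks a missing value.
incomes = [52.0, None, 61.0, 47.0, None]
print(impute_mean(incomes))
```

Every case is retained; the two gaps are filled with the mean of the three observed values (about 53.33).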
(True/False) The number of students in a classroom is an example of a continuous variable.
Answer: False Explanation: False. The number of students in a classroom is an example of a discrete variable because it takes on countable, whole-number values.
(True/False) The ordinal scale is typically used for variables that can be measured quantitatively.
Answer: False Explanation: False. The ordinal scale is typically used for variables that can be ranked or categorized based on some characteristic, but it does not imply a quantitative measurement. It involves ordering categories without specifying the exact degree of difference between them.
(True/False) Time series data is primarily focused on specific groups of people or events.
Answer: False Explanation: False. Time series data are defined by their focus on time: they track how a characteristic of a group of people, an event, or an object changes over several time periods. The emphasis is on evolution over time, not on particular groups or events as such.
(True/False) Unstructured data conform to a pre-defined, row-column format.
Answer: False Explanation: False. Unstructured data do not conform to a predefined, row-column format; they are typically free-form and non-structured.
(True/False) Examples of unstructured data include financial data in structured databases.
Answer: False Explanation: False. Unstructured data examples, such as social media data, differ from structured data found in databases, including financial records. Unstructured data are typically not organized in rows and columns like structured data.
(True/False) Unstructured data typically follows a row-column format.
Answer: False Explanation: False. Unstructured data is the opposite of structured data and does not follow a specific row-column format. It is often free-form and lacks a predefined structure.
(True/False) Veracity in Big Data relates to the immense volume of data compiled from various sources.
Answer: False Explanation: False. Veracity in Big Data refers to the credibility and quality of the data, focusing on its reliability rather than its volume.
(True/False) Statistics is the science that deals only with numerical data.
Answer: False Explanation: False. Statistics deals with both numerical and non-numerical (categorical) data, in a wide range of types and formats.
(True/False) The characteristic "Values" in Big Data refers to the methodological plan for formulating questions and curating data.
Answer: True Explanation: True. "Values" in the context of Big Data refer to the methodological plan for formulating questions, curating the right data, and unlocking hidden potential, emphasizing the importance of having a clear strategy for data utilization.
(True/False) Counting and sorting data can help analysts verify if the data set is complete or contains missing values.
Answer: True Explanation: True. Counting and sorting data can indeed help analysts verify if the data set is complete or contains missing values. It provides a preliminary assessment of data quality.
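A first inspection pass of this kind might look like the Python sketch below, assuming a single variable held in a list with None for missing entries; inspect_variable and the scores are hypothetical. Counting shows how many observations are present or missing, and sorting the observed values puts the smallest and largest values at the ends, where unusual values are easy to spot.

```python
def inspect_variable(values):
    """Count total and missing observations, and sort the observed values."""
    n_total = len(values)
    n_missing = sum(1 for v in values if v is None)
    observed_sorted = sorted(v for v in values if v is not None)
    return n_total, n_missing, observed_sorted


# Hypothetical test scores; None marks a missing entry.
scores = [78, 91, None, 85, 12, 88, None]
total, missing, ordered = inspect_variable(scores)
print(f"{total} observations, {missing} missing")
print("sorted:", ordered)  # the low value 12 stands out at the start
```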
(True/False) Counting the number of observations in each category is a common way to summarize categorical data.
Answer: True Explanation: True. Counting the number of observations in each category, or finding percentages for each category, is a common way to summarize and analyze categorical data. This helps understand the distribution of data across different categories.
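A frequency table with counts and percentages can be sketched with Python's collections.Counter. The function name frequency_table and the marital-status data are hypothetical; the sketch assumes the categorical variable is a list of labels with no missing entries.

```python
from collections import Counter


def frequency_table(labels):
    """Return {category: (count, percent)} ordered from most to least frequent."""
    counts = Counter(labels)
    n = len(labels)
    return {category: (count, 100 * count / n) for category, count in counts.most_common()}


marital_status = ["Single", "Married", "Married", "Divorced", "Married", "Single"]
for category, (count, pct) in frequency_table(marital_status).items():
    print(f"{category:<10}{count:>3}{pct:>8.1f}%")
```

Here "Married" is the most frequent category, with 3 of 6 observations (50%).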
(True/False) The imputation strategy recommends replacing missing values in categorical variables with the predominant category.
Answer: True Explanation: True. In the imputation strategy, missing values in categorical variables are often replaced with the predominant (most frequently occurring) category. This is a straightforward way to impute missing data for categorical variables.
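Replacing missing labels with the predominant category can be sketched as follows. This assumes the variable is a Python list with None for missing labels; impute_mode and the data are hypothetical.

```python
from collections import Counter


def impute_mode(labels):
    """Replace each missing label (None) with the most frequent observed category."""
    observed = [x for x in labels if x is not None]
    predominant, _count = Counter(observed).most_common(1)[0]
    return [predominant if x is None else x for x in labels]


status = ["Married", None, "Single", "Married", "Divorced", None]
print(impute_mode(status))
# → ['Married', 'Married', 'Single', 'Married', 'Divorced', 'Married']
```

If two categories tie for most frequent, Counter.most_common returns the one encountered first, so a real analysis may want an explicit tie-breaking rule.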
(True/False) Sample data are generally collected in one of two ways.
Answer: True Explanation: True. Sample data are generally collected in one of two ways: as cross-sectional data, recorded for many subjects at the same point in time, or as time series data, recorded over several time periods.
(True/False) The choice of the statistical tool depends on the type and format of the data being analyzed.
Answer: True Explanation: True. The choice of statistical tools or methods depends on the characteristics of the data, including its type and format, as well as on the research question being addressed.
(True/False) The final step in the statistical analysis process is to clearly communicate information with actionable business insights.
Answer: True Explanation: True. The final step in the statistical analysis process is to communicate the results and insights derived from the analysis in a clear and actionable manner, often to support decision-making in business or other contexts.
(True/False) A sample in statistics is a representation of the entire population.
Answer: True Explanation: True. A sample in statistics is a subset of the population, and it is used to make inferences or draw conclusions about the entire population.
(True/False) Arithmetic operations are valid for interval- and ratio-scaled variables.
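Answer: True Explanation: True. Arithmetic operations are valid for both interval- and ratio-scaled variables because differences between values are meaningful on both scales. Note, however, that ratios (for example, "twice as much") are meaningful only on a ratio scale, where zero marks the absence of the characteristic; on an interval scale the zero point is arbitrary.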
(True/False) Categorical variables are usually coded into numbers for data processing purposes.
Answer: True Explanation: True. Categorical variables are often coded into numerical values to facilitate data processing and analysis. These numerical codes represent the categories but may not have mathematical meaning.
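Coding labels into numbers can be sketched like this, assuming alphabetical order determines the codes (an arbitrary, hypothetical choice; any one-to-one mapping would do). The function encode_categories and the hair-color data are illustrative only.

```python
def encode_categories(labels):
    """Map each distinct label to an integer code (alphabetical order, starting at 1)."""
    codes = {label: i for i, label in enumerate(sorted(set(labels)), start=1)}
    return [codes[label] for label in labels], codes


hair = ["Brown", "Black", "Blonde", "Brown"]
encoded, mapping = encode_categories(hair)
print(mapping)  # → {'Black': 1, 'Blonde': 2, 'Brown': 3}
print(encoded)  # → [3, 1, 2, 3]
```

The codes are only labels: their average (2.25 here) says nothing meaningful about hair color.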
(True/False) Categorical variables are typically represented by labels or names to identify distinguishing characteristics.
Answer: True Explanation: True. Categorical variables are typically represented by labels or names that identify the distinguishing characteristics or categories. These labels are used to categorize and differentiate observations.
(True/False) Distinguishing between different measurement scales is essential for choosing appropriate data analysis techniques.
Answer: True Explanation: True. Choosing the appropriate data analysis techniques depends on the measurement scale of the variable (e.g., categorical, ordinal, interval, or ratio). Different measurement scales require different types of analysis methods.
(True/False) Data subsetting can be based on data ranges to focus on specific portions of the data.
Answer: True Explanation: True. Data subsetting can indeed be based on data ranges, allowing analysts to focus on specific portions of the data that fall within certain range criteria. This is a common approach for selecting data that meets specific criteria for analysis.
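Range-based subsetting can be sketched in Python, assuming each observation is a dictionary and that observations with a missing value in the chosen variable are dropped from the subset. The function subset_by_range, the house prices, and the price bounds are hypothetical.

```python
def subset_by_range(rows, column, low, high):
    """Keep rows whose value in `column` lies in [low, high]; missing values are dropped."""
    return [
        row for row in rows
        if row[column] is not None and low <= row[column] <= high
    ]


houses = [
    {"id": 1, "price": 185_000},
    {"id": 2, "price": 320_000},
    {"id": 3, "price": None},
    {"id": 4, "price": 250_000},
]
mid_range = subset_by_range(houses, "price", 200_000, 300_000)
print([h["id"] for h in mid_range])  # → [4]
```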
(True/False) Descriptive statistics involve organizing and presenting data using charts and tables.
Answer: True Explanation: True. Descriptive statistics include organizing and presenting data using various methods, including charts and tables, to provide a clear summary of the data.
(True/False) Homeownership rates in the US are an example of time series data.
Answer: True Explanation: True. Homeownership rates in the US, tracked over multiple time periods, are a classic example of time series data because they involve observing and analyzing changes in homeownership rates over time.
(True/False) Making inferences about population parameters is typically based on sample statistics.
Answer: True Explanation: True. In statistical inference, we often use sample statistics as estimates of population parameters and make inferences based on them.
(True/False) A population in statistics consists of all items or members of interest.
Answer: True Explanation: True. In statistics, a population is the complete set of items or members of interest that researchers want to study.
(True/False) Inferential statistics are used to make conclusions about a larger set of data based on a smaller sample.
Answer: True Explanation: True. Inferential statistics involve drawing conclusions about a larger population based on a smaller sample, making predictions or inferences.
(True/False) Nominal scales represent categories or groups and their values differ by labels or names.
Answer: True Explanation: True. Nominal scales represent categories or groups, and the values differ based on labels or names. For example, "Marital Status" categories may include "Single," "Married," "Divorced," etc.
(True/False) Structured data can be easily entered, stored, queried, and analyzed using appropriate tools.
Answer: True Explanation: True. Structured data is designed to be easily entered, stored, queried, and analyzed using tools specifically designed for structured data management, such as databases and spreadsheet software.
(True/False) Structured data primarily consists of numerical information that is objective and not open to interpretation.
Answer: True Explanation: True. Structured data often consists of numerical information, and it is typically objective, meaning the data is based on concrete measurements and is not open to subjective interpretation.
(True/False) Subsetting involves excluding variables with excessive amounts of missing values.
Answer: True Explanation: True. Subsetting may involve excluding variables with excessive amounts of missing values to simplify the dataset and ensure that the analysis is based on variables with sufficient data.
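Dropping variables with too many missing values can be sketched as below. What counts as "excessive" is a judgment call; the 50% cutoff used here is a hypothetical default, as are the function name drop_sparse_variables and the records.

```python
def drop_sparse_variables(rows, max_missing_share=0.5):
    """Drop variables (keys) whose share of missing values exceeds max_missing_share."""
    variables = list(rows[0].keys())
    n = len(rows)
    keep = [
        var for var in variables
        if sum(1 for row in rows if row[var] is None) / n <= max_missing_share
    ]
    return [{var: row[var] for var in keep} for row in rows]


records = [
    {"age": 34, "income": 52.0, "bonus": None},
    {"age": 29, "income": None, "bonus": None},
    {"age": 41, "income": 47.0, "bonus": 5.0},
    {"age": 38, "income": 61.0, "bonus": None},
]
cleaned = drop_sparse_variables(records)
print(list(cleaned[0].keys()))  # → ['age', 'income']
```

"bonus" is missing in 3 of 4 observations (75%) and is removed; "income" is missing in only 1 of 4 and is kept.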
(True/False) The provided example of NBA Eastern Conference standings for the 2018-2019 season is an instance of cross-sectional data.
Answer: True Explanation: True. The example provided, the NBA Eastern Conference standings for the 2018-2019 season, is indeed an instance of cross-sectional data because it represents data collected at a specific point in time (the end of the 2018-2019 season) without considering changes over time.
(True/False) The number of children in a family is an example of a discrete variable.
Answer: True Explanation: True. The number of children in a family is a discrete variable because it takes on countable, whole-number values (e.g., 0, 1, 2, 3) rather than an uncountable number of values within a range.
(True/False) Time series data can include observations at various time intervals, such as hourly, daily, and annual.
Answer: True Explanation: True. Time series data can encompass various time intervals, including hourly, daily, weekly, monthly, quarterly, or annual observations, depending on the nature of the data.
(True/False) Unstructured data may have some implied structure but are still considered unstructured.
Answer: True Explanation: True. While unstructured data may exhibit some implied or inherent structure, they are still classified as unstructured because they lack a specific, predefined format.
How do we make inferences about an unknown population parameter? a) By analyzing the entire population b) By using a sample statistic calculated from sample data c) By estimating the population parameter with certainty d) By obtaining complete information on the population
Answer: b) By using a sample statistic calculated from sample data Explanation: We make inferences about an unknown population parameter by analyzing a sample of data and calculating a sample statistic. This statistic is used to estimate or draw conclusions about the population parameter.
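The idea of estimating a parameter from a sample statistic can be simulated in Python. The population here is hypothetical (10,000 values drawn uniformly between 20 and 120), and the seed 4130 is an arbitrary choice for reproducibility; in a real study the population mean would be unknown and only the sample mean would be computed.

```python
import random
from statistics import mean

random.seed(4130)  # fixed seed so the run is reproducible

# Hypothetical population: 10,000 order values whose mean we pretend not to know.
population = [random.uniform(20, 120) for _ in range(10_000)]

sample = random.sample(population, k=200)  # simple random sample without replacement
sample_mean = mean(sample)                 # sample statistic
population_mean = mean(population)         # parameter (normally unknown)

print(f"sample mean     = {sample_mean:.2f}")
print(f"population mean = {population_mean:.2f}")
```

The sample mean will not equal the population mean exactly, but with a random sample of this size it should land close to it.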
What is the primary characteristic of time series data? a) Data collected from various groups of people b) Data collected over several time periods c) Data collected without a specific focus on time d) Data collected from specific events
Answer: b) Data collected over several time periods Explanation: The primary characteristic of time series data is that it is collected over several time periods, focusing on the evolution or changes of certain groups of people, specific events, or objects over time.
What are the two main branches of statistics? a) Population and Sample statistics b) Descriptive and Inferential statistics c) Mean and Median statistics d) Quantitative and Qualitative statistics
Answer: b) Descriptive and Inferential statistics Explanation: Descriptive and Inferential statistics are the two primary branches of statistics. Descriptive statistics summarize data, while inferential statistics make inferences about populations based on sample data.
What is the recommendation of the omission strategy for dealing with observations with missing values? a) Replace missing values with imputed values b) Exclude observations with missing values from subsequent analysis c) Convert missing values into "Unknown" d) Group missing values into a separate category
Answer: b) Exclude observations with missing values from subsequent analysis Explanation: The omission strategy recommends excluding observations with missing values from subsequent analysis. This approach is reasonable when relatively few observations are incomplete, so that dropping them is unlikely to distort the results.
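The omission strategy (keeping complete cases only) can be sketched as follows, assuming each observation is a dictionary with None for missing values; omit_incomplete and the data are hypothetical.

```python
def omit_incomplete(rows):
    """Keep only observations with no missing (None) value in any variable."""
    return [row for row in rows if all(value is not None for value in row.values())]


data = [
    {"age": 34, "income": 52.0},
    {"age": None, "income": 61.0},
    {"age": 29, "income": None},
    {"age": 41, "income": 47.0},
]
complete_cases = omit_incomplete(data)
print(len(complete_cases))  # → 2
```

In this small example half of the observations are lost, which shows why omission works best when only a few cases are incomplete.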
What distinguishes an interval scale from a nominal or ordinal scale? a) Categorization of data b) Meaningful differences between values c) Presence of a true zero point d) Use of arbitrary values
Answer: b) Meaningful differences between values Explanation: An interval scale categorizes and ranks data, and the differences between values are meaningful. However, it does not have a true zero point, and the zero value is arbitrary, meaning it does not reflect the absence of the characteristic being measured.
Which of the following is an example of unstructured data? a) A structured database containing sales records b) Social media data from Twitter, YouTube, and Facebook c) Financial reports in spreadsheet format d) Scientific research data with well-defined variables
Answer: b) Social media data from Twitter, YouTube, and Facebook Explanation: Social media data, including posts, comments, and multimedia content from platforms like Twitter, YouTube, and Facebook, are classic examples of unstructured data because they lack a predefined structure and can be textual or multimedia in nature.
How are categorical variables typically coded for data processing? a) They are left in their original label or name form. b) They are transformed into numerical values. c) They are converted into textual descriptions. d) They are merged with numeric variables.
Answer: b) They are transformed into numerical values Explanation: Categorical variables are often coded into numerical values for data processing. This allows for easier analysis and manipulation of the data while retaining the information about categories.
What is the primary objective of subsetting in statistical analysis? a) To increase the size of the dataset b) To extract a portion of data relevant for subsequent analysis c) To create redundancy in the dataset d) To eliminate all missing values from the dataset
Answer: b) To extract a portion of data relevant for subsequent analysis Explanation: The primary objective of subsetting in statistical analysis is to extract a portion of the dataset that is relevant for subsequent analysis. This allows analysts to focus on specific subsets of data without including irrelevant or redundant information.
What characterizes Big Data? a) Small volume of structured data b) Data that is easily managed using traditional tools c) A massive volume of structured and unstructured data d) Data that is not generated by businesses
Answer: c) A massive volume of structured and unstructured data Explanation: Big Data is characterized by a massive volume of data, including both structured and unstructured data. It is not limited to small volumes and often presents challenges in managing, processing, and analyzing.
What is the primary role of statistics in dealing with data? a) Generating data b) Collecting data c) Analyzing data d) Storing data
Answer: c) Analyzing data Explanation: Statistics primarily involves the collection, preparation, analysis, interpretation, and presentation of data. Analyzing data is a key role of statistics.
Why is it generally not feasible to obtain population data for statistical analysis? a) Because it is cost-effective b) Because it is easy to collect c) Because it is impossible to examine every member of the population d) Because population data is readily available
Answer: c) Because it is impossible to examine every member of the population Explanation: It is generally not feasible to obtain population data because it is impossible to examine every member of the population, which is often very large or geographically dispersed.
What distinguishes a continuous variable from a discrete variable? a) Continuous variables assume a countable number of values. b) Continuous variables are typically measured in discrete values. c) Continuous variables have infinite possible values within an interval. d) Discrete variables have uncountable values.
Answer: c) Continuous variables have infinite possible values within an interval Explanation: Continuous variables can take on an uncountable number of values within an interval, and they are not restricted to whole numbers. Discrete variables have a countable number of distinct values.
Which step in the statistical analysis process involves using the appropriate statistical tools based on the data at hand? a) Data collection b) Data preparation c) Data analysis d) Data interpretation
Answer: c) Data analysis Explanation: The step that involves using the appropriate statistical tools based on the data is the data analysis step. The choice of tools depends on the data and the specific analysis goals.
What is the key characteristic of cross-sectional data? a) Data collected over a long period of time b) Data collected at multiple points in time c) Data collected at the same point in time d) Data collected only during specific seasons
Answer: c) Data collected at the same point in time Explanation: The defining characteristic of cross-sectional data is that it is collected by recording a characteristic of many subjects at the same point in time, or without regard to differences in time.
What is a characteristic of structured data? a) Data in unorganized, free-form text b) Data with subjective and interpretive content c) Data in a pre-defined, row-column format d) Data that is open to interpretation
Answer: c) Data in a pre-defined, row-column format Explanation: Structured data is characterized by its pre-defined, row-column format, which makes it organized and easily processed by spreadsheet or database applications.
In the context of data, what is the first step in the statistical analysis process? a) Interpretation b) Data collection c) Data preparation d) Data presentation
Answer: c) Data preparation Explanation: The first step in the statistical analysis process is finding the right data and preparing it for analysis. Data preparation is crucial to ensure data quality and suitability for analysis.
What distinguishes a discrete variable from a continuous variable? a) Discrete variables are represented by non-whole numbers. b) Continuous variables assume a countable number of values. c) Discrete variables have infinite possible values. d) Continuous variables are always whole numbers.
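Answer: c) Discrete variables have infinite possible values Explanation: Option c is the best of the choices only in a limited sense: a discrete variable assumes a countable number of values, and that countable set may be infinite (for example, 0, 1, 2, ...). The values need not be whole numbers. The real distinction is countability: a continuous variable can take an uncountable number of values within an interval. Options a, b, and d are false.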
Which of the following is an example of a ratio scale variable? a) Temperature b) Age c) Height d) Marital status
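Answer: c) Height Explanation: Height is an example of a ratio scale variable because it has a true zero point, which reflects the absence of height (e.g., a height of 0 cm means no height). Ratios are meaningful on a ratio scale, and arithmetic operations are valid. Age is also ratio-scaled (ratios such as "twice as old" are meaningful), so option b is defensible as well; temperature (interval) and marital status (nominal) are not.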
How is Big Data described by the definition provided by Gartner? a) Low-volume, low-velocity, and low-variety information b) Information assets that are easy to process using traditional methods c) High-volume, high-velocity, and/or high-variety information assets d) Information assets that demand minimal information processing
Answer: c) High-volume, high-velocity, and/or high-variety information assets Explanation: According to Gartner's definition, Big Data involves high-volume, high-velocity, and/or high-variety information assets that require innovative forms of processing for enhanced insight, decision making, and process automation.
How are categorical variables typically expressed for data processing? a) Using whole numbers b) In decimal format c) In words or labels d) In fractions
Answer: c) In words or labels Explanation: Categorical variables are typically expressed using words or labels that represent different categories or groups. These labels are then coded into numerical values for data processing purposes.
What is a characteristic of the ordinal scale of measurement? a) It cannot categorize data. b) It can interpret the difference between ranked values. c) It uses arbitrary numbers to rank data. d) It is primarily used for quantitative variables.
Answer: c) It uses arbitrary numbers to rank data. Explanation: Ordinal scales categorize and rank data with respect to some characteristic, but the numbers assigned to the categories are arbitrary and do not represent consistent intervals. The difference between ranked values is not necessarily meaningful.
Which of the following datasets is an example of cross-sectional data? a) Monthly temperatures recorded over a year b) Daily stock prices for a single company over a year c) NBA Eastern Conference standings for a single season d) Annual inflation rates for multiple countries
Answer: c) NBA Eastern Conference standings for a single season Explanation: The NBA Eastern Conference standings for a single season, such as the 2018-2019 season, is an example of cross-sectional data because it represents data collected at the same point in time without regard to differences in time.
How does the imputation strategy recommend handling missing values in numeric variables? a) Exclude them from the analysis b) Replace with the median value c) Replace with the average value d) Convert them into categorical variables
Answer: c) Replace with the average value Explanation: The imputation strategy for numeric variables suggests replacing missing values with the average (or mean) value of the observed data. This imputed value provides a reasonable estimate to fill in the gaps.
Which of the following is an example of time series data? a) Monthly weather data for cities around the world b) Survey responses from different age groups c) Sales figures for various products within a year d) Historical population data for countries
Answer: c) Sales figures for various products within a year Explanation: Sales figures recorded at successive points within a year (for example, monthly) represent time series data because they are collected over several time periods to analyze trends and changes.
Which of the following applications is commonly used for structured data? a) Social media platforms b) Word processing software c) Spreadsheet or database applications d) Image editing software
Answer: c) Spreadsheet or database applications Explanation: Structured data is typically managed and analyzed using spreadsheet or database applications that are designed to handle data organized in a structured format.
What is the primary focus of descriptive statistics? a) Drawing conclusions about populations b) Collecting sample data c) Summarizing important aspects of a data set d) Conducting hypothesis tests
Answer: c) Summarizing important aspects of a data set Explanation: Descriptive statistics focus on summarizing important aspects of a data set, including collecting, organizing, and presenting data using charts and tables.
What does the characteristic "Velocity" of Big Data refer to? a) The credibility and quality of data b) The immense amount of data from multiple sources c) The rapid speed at which data is generated d) The methodological plan for data curation
Answer: c) The rapid speed at which data is generated Explanation: Velocity in the context of Big Data refers to the rapid speed at which data is generated, which can pose management challenges.
Which of the following is an example of a continuous variable? a) The number of students in a classroom b) The number of cars in a parking lot c) The weight of a newborn baby d) The score on a multiple-choice test
Answer: c) The weight of a newborn baby Explanation: The weight of a newborn baby is an example of a continuous variable because it can take on an infinite number of values within an interval. In practice, it may be measured in discrete values, but the underlying variable is continuous.
What is a characteristic of categorical data? a) They are also called quantitative data. b) They represent numerical values. c) They can be defined by two or more categories. d) They are primarily used for data processing.
Answer: c) They can be defined by two or more categories Explanation: Categorical data, also called qualitative data, represent categories or labels and can be defined by two or more distinct categories. They are not numerical in nature.
Why is counting and sorting among the first tasks analysts perform when inspecting data? a) To create summary statistics b) To identify outliers c) To gain a better understanding and insights into the data d) To perform complex statistical tests
Answer: c) To gain a better understanding and insights into the data Explanation: Counting and sorting data is one of the first tasks performed by analysts to gain a better understanding and insights into the data. It helps verify data completeness and provides a visual overview of the data's characteristics.
What is a characteristic of unstructured data? a) Data conforming to a pre-defined, row-column format b) Data primarily consisting of numerical content c) Data that follows database structures d) Data that do not conform to a row-column model
Answer: d) Data that do not conform to a row-column model Explanation: Unstructured data do not conform to a predefined, row-column format or database structures. They are often free-form and lack a structured layout.
What is one reason for the expense of obtaining information on the entire population? a) It involves fewer data collection methods b) It requires smaller sample sizes c) It includes analysis of sample data d) It involves surveying every member of the population
Answer: d) It involves surveying every member of the population Explanation: One reason for the expense of obtaining information on the entire population is that it would involve surveying or collecting data from every member of the population, which can be costly and time-consuming.
What kind of arithmetic operations are typically impossible to perform on categorical variables? a) Addition and subtraction b) Multiplication and division c) Calculations involving percentages d) Meaningful arithmetic operations
Answer: d) Meaningful arithmetic operations Explanation: Categorical variables are not suitable for performing meaningful arithmetic operations such as addition, subtraction, multiplication, or division. The numerical codes used for encoding are often arbitrary and lack mathematical significance.
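To see why arithmetic on category codes is not meaningful, the sketch below encodes the same hypothetical color observations under two equally valid numeric codings. The "average code" changes with the arbitrary choice of coding, while a count does not. The colors and both codings are illustrative assumptions.

```python
from statistics import mean

# Hypothetical categorical observations.
colors = ["red", "blue", "blue", "green", "red", "red"]

# Two equally valid (and equally arbitrary) numeric encodings of the same categories.
coding_a = {"red": 1, "blue": 2, "green": 3}
coding_b = {"red": 3, "blue": 1, "green": 2}

avg_a = mean(coding_a[c] for c in colors)
avg_b = mean(coding_b[c] for c in colors)

# The "average color code" depends entirely on the chosen labels,
# so it carries no information about the colors themselves.
print(avg_a, avg_b)

# Counting a category, by contrast, gives the same answer under any coding.
count_red_a = sum(1 for c in colors if coding_a[c] == coding_a["red"])
count_red_b = sum(1 for c in colors if coding_b[c] == coding_b["red"])
print(count_red_a, count_red_b)
```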
Which scale of measurement is the least sophisticated and represents categories or groups based on labels or names? a) Ordinal b) Interval c) Ratio d) Nominal
Answer: d) Nominal Explanation: Nominal scales are the least sophisticated and represent categories or groups based on labels or names. They do not imply any order or ranking among the categories.
What is a characteristic of numeric data? a) They are also called qualitative data. b) They represent categories and labels. c) They are typically not represented by numbers. d) They represent meaningful numbers.
Answer: d) They represent meaningful numbers Explanation: Numeric data, also called quantitative data, represent meaningful numerical values. These values are used to quantify and measure characteristics of interest.
(True/False) Ratios are meaningful on both interval and ratio scales.
Answer: False Explanation: False. Ratios are meaningful on a ratio scale, where there is a true zero point. On an interval scale, ratios are not meaningful because the zero value is arbitrary (for example, 0 degrees Celsius does not mean an absence of temperature).
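A quick numerical check of this point, using temperature as an assumed example: Celsius and Fahrenheit are interval scales with arbitrary zeros, while Kelvin and Rankine are ratio scales that share a true zero. The two readings are hypothetical.

```python
def c_to_f(c):
    """Convert Celsius to Fahrenheit; both are interval scales with arbitrary zeros."""
    return c * 9 / 5 + 32

def c_to_k(c):
    """Convert Celsius to Kelvin, a ratio scale with a true zero."""
    return c + 273.15

def k_to_r(k):
    """Convert Kelvin to Rankine, another ratio scale with the same true zero."""
    return k * 9 / 5

warm, cool = 20.0, 10.0  # two hypothetical readings in degrees Celsius

ratio_c = warm / cool                                  # 2.0
ratio_f = c_to_f(warm) / c_to_f(cool)                  # 68 / 50 = 1.36
ratio_k = c_to_k(warm) / c_to_k(cool)                  # about 1.035
ratio_r = k_to_r(c_to_k(warm)) / k_to_r(c_to_k(cool))  # same as ratio_k

# Interval scales: the ratio depends on where zero happens to sit.
print(ratio_c, ratio_f)
# Ratio scales: changing units preserves the ratio because zero means "none".
print(ratio_k, ratio_r)
```

The same pair of readings gives a ratio of 2.0 in Celsius but 1.36 in Fahrenheit, so neither ratio says anything about the temperatures themselves; the Kelvin and Rankine ratios agree.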
(True/False) The objective of subsetting is to increase the dataset's size.
Answer: False Explanation: False. The objective of subsetting is not to increase the dataset's size but rather to focus on a specific, relevant subset of the data while excluding irrelevant or low-quality information.
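A minimal sketch of subsetting in plain Python: keep only the rows relevant to the question (here, one hypothetical region) and only the columns needed for the analysis. The dataset and field names are assumptions for illustration; the result is never larger than the input.

```python
# Hypothetical dataset: one dict per observation.
data = [
    {"id": 1, "region": "East", "sales": 120, "notes": "ok"},
    {"id": 2, "region": "West", "sales": 80, "notes": ""},
    {"id": 3, "region": "East", "sales": 95, "notes": "late"},
    {"id": 4, "region": "North", "sales": 150, "notes": ""},
]

def subset(rows, keep_row, columns):
    """Return only the rows where keep_row(row) is True, restricted to `columns`."""
    return [{col: row[col] for col in columns} for row in rows if keep_row(row)]

# Keep East-region observations and drop the columns not needed for a sales analysis.
east_sales = subset(data, lambda r: r["region"] == "East", ["id", "sales"])
print(east_sales)
```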
(True/False) The omission strategy involves replacing missing values with reasonable imputed values.
Answer: False Explanation: False. The omission strategy involves excluding observations with missing values rather than replacing them with imputed values. It aims to work with complete cases only.
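The contrast between omission and imputation can be sketched as follows. Omission keeps only complete cases. Mean imputation is included only for comparison; it is one common imputation strategy, assumed here for illustration rather than prescribed by the text. The data values are hypothetical.

```python
from statistics import mean

# Hypothetical observations; None marks a missing value.
rows = [
    {"age": 34, "income": 52000},
    {"age": None, "income": 61000},
    {"age": 45, "income": None},
    {"age": 29, "income": 48000},
]

def omit_missing(rows):
    """Omission strategy: drop any observation with at least one missing value."""
    return [r for r in rows if all(v is not None for v in r.values())]

def impute_mean(rows, field):
    """Imputation (for contrast): fill missing values of `field` with the mean of observed values."""
    observed = [r[field] for r in rows if r[field] is not None]
    fill = mean(observed)
    return [{**r, field: fill if r[field] is None else r[field]} for r in rows]

complete_cases = omit_missing(rows)
print(len(rows), "->", len(complete_cases))  # observations before and after omission
print(impute_mean(rows, "age"))              # same number of rows, missing age filled in
```

Omission shrinks the dataset, while imputation keeps every observation but introduces estimated values.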
(True/False) Cross-sectional data is collected by recording a characteristic of subjects at different points in time.
Answer: False Explanation: False. Cross-sectional data is collected by recording a characteristic of many subjects at the same point in time, or without regard to differences in time. Data recorded at different points in time are time series data.
(True/False) Cross-sectional data is typically used to analyze changes in data over time.
Answer: False Explanation: False. Cross-sectional data is not typically used to analyze changes over time. It provides a snapshot of data at a single point in time and is often used to compare subjects or characteristics at that point.
(True/False) A discrete variable assumes an infinite number of values.
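Answer: False Explanation: False. A discrete variable assumes a countable number of values, such as the number of customers who visit a store. It is a continuous variable that assumes an uncountable (infinite) number of values within an interval, such as weight or time.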
What is one reason for eliminating observations during the subsetting process? a) To retain all observations, regardless of quality b) To increase the data range for analysis c) To exclude redundant variables d) To eliminate observations with missing values, low-quality data, or outliers
Answer: d) To eliminate observations with missing values, low-quality data, or outliers Explanation: One reason for eliminating observations during the subsetting process is to remove observations that contain missing values, are of low quality, or are outliers. This improves the overall quality of the data used in the analysis.
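As a sketch of eliminating observations during subsetting, the snippet below drops missing values, drops values assumed to be invalid (negative sales, a hypothetical quality rule), and then drops outliers. The outlier rule, a fence at 1.5 times the interquartile range (IQR) beyond the first and third quartiles, is one common convention swapped in as an assumption; the text does not name a specific rule.

```python
from statistics import quantiles

# Hypothetical daily sales; None is missing and negative values are assumed to be entry errors.
sales = [12, 15, 14, None, 13, 16, 15, 95, -4]

def clean(values):
    """Drop missing values, invalid (negative) values, and IQR-based outliers."""
    # 1. Remove missing observations.
    present = [v for v in values if v is not None]
    # 2. Remove low-quality observations (assumed rule: sales cannot be negative).
    valid = [v for v in present if v >= 0]
    # 3. Remove outliers lying more than 1.5 * IQR outside the first and third quartiles.
    q1, _, q3 = quantiles(valid, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in valid if low <= v <= high]

print(clean(sales))
```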
What is one of the purposes of sorting data? a) To create new variables b) To hide missing values c) To determine if there are outliers d) To review the range of values for each variable
Answer: d) To review the range of values for each variable Explanation: Sorting arranges the values of a variable in ascending or descending order, which places the minimum and maximum at the two ends and makes the range of values for each variable easy to review. Reviewing those extremes can also draw attention to patterns or anomalies that warrant a closer look.
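Sorting makes the range easy to read off: after ordering, the first and last entries are the minimum and maximum. A minimal sketch with hypothetical order amounts:

```python
# Hypothetical order amounts (in dollars).
amounts = [42.5, 18.0, 250.0, 37.25, 19.99, 61.0]

ordered = sorted(amounts)          # ascending order; the original list is unchanged
smallest, largest = ordered[0], ordered[-1]
value_range = largest - smallest   # range = maximum - minimum

print("sorted:", ordered)
print("min:", smallest, "max:", largest, "range:", value_range)

# Descending order lists the largest values first.
print("top 3:", sorted(amounts, reverse=True)[:3])
```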
Which of the following is one of the three primary characteristics of Big Data? a) Velocity b) Veracity c) Values d) Volume
Answer: d) Volume Explanation: The three primary characteristics of Big Data are Volume (the immense amount of data), Velocity (the rapid speed at which data are generated), and Variety (all types, forms, and granularity of data). Because Velocity is also one of these three, option a) is equally correct as the question is written. Veracity and Value are additional characteristics often associated with Big Data.