Udemy Practice Questions Course (CompTIA Data+)

Consider this dataset showing the retirement age of 11 people, in whole years: 54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60 Which of the following values is the measure of central tendency "mean"? A.) 56.3 B.) 56 C.) 56.6 D.) 56.9

Consider this dataset showing the retirement age of 11 people, in whole years: 54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60 Which of the following values is the measure of central tendency "mean"? C.) 56.6 -------------------------------------------------------------- *The mean is the sum of the values of all observations in a dataset divided by the number of observations. This is also known as the arithmetic average. *Here the sum is 623 and there are 11 observations, so the mean is 623 / 11 ≈ 56.6.
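The arithmetic above can be checked with a short Python sketch. The variable names are illustrative, and the standard-library `statistics.mean` is used only as a cross-check on the manual calculation.

```python
from statistics import mean

# Retirement ages from the question.
retirement_ages = [54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60]

# Mean = sum of the observations divided by the number of observations.
manual_mean = sum(retirement_ages) / len(retirement_ages)

# The standard library gives the same result.
library_mean = mean(retirement_ages)

print(round(manual_mean, 1))  # 56.6
```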

The Acme Corporation is working on a new data warehouse and business intelligence (DW/BI) project. They need to uncover data quality issues in data sources, and what needs to be corrected in Extract, transform, load (ETL). Which of the following methods should they use to validate the data? A.) Data Profiling B.) Cross-Validation C.) Data Correction D.) Data Auditing

The Acme Corporation is working on a new data warehouse and business intelligence (DW/BI) project. They need to uncover data quality issues in data sources, and what needs to be corrected in Extract, transform, load (ETL). Which of the following methods should they use to validate the data? A.) Data Profiling -------------------------------------------------------------- *Data profiling is the process of reviewing source data, understanding structure, content and interrelationships, and identifying potential for data projects.

Which of the following Logical functions would return TRUE? D.) =AND(A2="Bananas", B2>C2) -------------------------------------------------------------- *Microsoft Excel provides 4 logical functions to work with the logical values. *The functions are AND, OR, XOR, and NOT. *You use these functions when you want to carry out more than one comparison in your formula or test multiple conditions instead of just one. *As well as logical operators, Excel logical functions return either TRUE or FALSE when their arguments are evaluated.

Which of the following SQL commands will be used to prevent SQL injection attacks? A.) SELECT ? FROM ExamsDigest_Courses WHERE certName= * ORDER BY date B.) SELECT ? FROM ExamsDigest_Courses WHERE certName= "param" ORDER BY date C.) SELECT * FROM ExamsDigest_Courses WHERE certName= ? ORDER BY date D.) SELECT * FROM ExamsDigest_Courses WHERE certName= "param" ORDER BY date

Which of the following SQL commands will be used to prevent SQL injection attacks? C.) SELECT * FROM ExamsDigest_Courses WHERE certName= ? ORDER BY date -------------------------------------------------------------- *Parameterized SQL queries allow you to place parameters in an SQL query instead of a constant value. *A parameter takes a value only when the query is executed, which allows the query to be reused with different values and for different purposes. *Parameterized SQL statements are available in some analysis clients and are also available through the Historian SDK.

A SQL database administrator wants to display the total number of employees. Which of the following SQL commands should the administrator use to display the total number? A.) SELECT COUNT(+) FROM HumanResources.Employee; GO B.) SELECT COUNT(*) FROM HumanResources.Employee; GO C.) SELECT COUNT(NUM *) FROM HumanResources.Employee; GO D.) SELECT COUNT(NUM) FROM HumanResources.Employee; GO

A SQL database administrator wants to display the total number of employees. Which of the following SQL commands should the administrator use to display the total number? B.) SELECT COUNT(*) FROM HumanResources.Employee; GO -------------------------------------------------------------- *The rest of the commands use improper syntax.

A business analyst requests an analysis of data to display a table of all of a company's cloth products that were sold in the UK in June, compare the sales figures with those in September, and then compare them with other product sales in the UK over the same period. Which of the following methods enables the data scientist to extract and query data in order to analyze it from different angles? A.) Online Query Processing B.) Online Analytical Processing C.) Online Transactional Processing D.) Online Standard Processing

A business analyst requests an analysis of data to display a table of all of a company's cloth products that were sold in the UK in June, compare the sales figures with those in September, and then compare them with other product sales in the UK over the same period. Which of the following methods enables the data scientist to extract and query data in order to analyze it from different angles? B.) Online Analytical Processing -------------------------------------------------------------- *Online Analytical Processing (OLAP) queries help with trend analysis, financial reporting, sales forecasting, budgeting, and other planning purposes, among other things. OLAP is used for business analyses, including planning, budgeting, forecasting, data mining, and deals with few queries, but they are complex and involve a lot of data (for example, aggregate queries). Mainly uses the select statement. *Online transaction processing (OLTP) captures, stores, and processes data from transactions in real-time. An OLTP database stores and manages data related to everyday operations within a system or a company. However, OLTP is focused on transaction-oriented tasks. OLTP typically deals with query processing (inserting, updating, deleting data in a database), and maintaining data integrity and effectiveness when dealing with numerous transactions simultaneously. *Online query processing and Online standard processing are incorrect as these are imaginary terms.

A company wants to create a chart that presents the growth in online sales broken down by customer type, based on the mix of channels they used similar to the one you see below. *Image on other side* Which of the following type of visualization is MOST suitable? A.) Waterfall B.) Heat Map C.) Word Cloud D.) Infographic

A company wants to create a chart that presents the growth in online sales broken down by customer type, based on the mix of channels they used similar to the one you see below. Which of the following type of visualization is MOST suitable? A.) Waterfall -------------------------------------------------------------- *A waterfall visualization shows how an initial value is increased and decreased by a series of intermediate values, leading to a final cumulative value shown in the far-right column. *The intermediate values can either be time-based or category based. *A waterfall chart is a specific type of bar chart that reveals the story behind the net change in something's value between two points. *Instead of just showing a beginning value in one bar and an ending value in a second bar, a waterfall chart dis-aggregates all the unique components that contributed to that net change and visualizes them individually.
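As a rough illustration of how a waterfall chart's bars are derived, the sketch below computes the running total after each intermediate value. The starting value and the category changes are made-up numbers for illustration only; they are not taken from the chart in the question.

```python
from itertools import accumulate

# Hypothetical figures: a starting value and three category-based changes.
start_value = 100
changes = [("New customers", 40), ("Returning customers", 25), ("Lost customers", -15)]

# Each bar in a waterfall starts where the previous one ended,
# so the chart is built from the cumulative totals.
running_totals = list(accumulate((delta for _, delta in changes), initial=start_value))

# The last running total is the final cumulative value (the far-right column).
final_value = running_totals[-1]

print(running_totals)  # [100, 140, 165, 150]
```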

A financial analyst gathers information, assembles spreadsheets, and writes reports. He wants every change he makes to the files to be synced and updated across all of his devices. Which of the following storage environments does the analyst need to save his files? A.) Sync Storage B.) Share Drive C.) Local Storage D.) Cloud Storage

A financial analyst gathers information, assembles spreadsheets, and writes reports. He wants every change he makes to the files to be synced and updated across all of his devices. Which of the following storage environments does the analyst need to save his files? D.) Cloud Storage -------------------------------------------------------------- *Cloud services are popular because they enable many businesses to access application software without the need to invest in computer software and hardware. *Other benefits include scalability, reliability, and efficiency.

A web analyst wants to display a chart of daily hits on the keyword "it certifications" for four different states with the help of Google Trends and Python. Which of the following data collection methods will effectively request data from Google Trends and display it in a chart? A.) Public Databases B.) Survey C.) Web Scraping D.) Application Programming Interface (API)

A web analyst wants to display a chart of daily hits on the keyword "it certifications" for four different states with the help of Google Trends and Python. Which of the following data collection methods will effectively request data from Google Trends and display it in a chart? D.) Application Programming Interface (API) -------------------------------------------------------------- *An increasingly popular method for collecting data online is via a Representational State Transfer Application Programming Interface (REST API), or simply API. *Google makes many APIs available for public use. One such API is Google Trends. *This provides data on query activity for any keyword you can think of. Basically, this data tells you what topics people are interested in getting information on. *Python can be used for API calls. For example, to access the Google Trends API we can use the Python library pytrends.

A web analyst wants to use automated bots to crawl through the internet and extract data from targeted sites. He wants the collected data to be delivered in a readable form such as CSV for further analysis. Which of the following data collection methods is the MOST appropriate method to employ? A.) Web Scraping B.) Public Databases C.) Survey D.) Application Programming Interface (API)

A web analyst wants to use automated bots to crawl through the internet and extract data from targeted sites. He wants the collected data to be delivered in a readable form such as CSV for further analysis. Which of the following data collection methods is the MOST appropriate method to employ? A.) Web Scraping -------------------------------------------------------------- *Web scraping is a process of using automated bots to crawl through the internet and extract data. *The bots collect information by first breaking down the targeted site to its most basic form, HTML text, then scan through to gather data according to some preset parameters. *After that, the collected data is delivered in CSV or Excel format, so it is readable for whoever wants to use it.

A web developer is developing a new application using React.js for the front-end and Python for the backend. He wants to store data in JavaScript Object Notation (JSON) format in order to transmit the data between the web application and the server. Which of the following formats represents a JSON file? A.) (Company- ExamsDigest - email - [email protected] - Country - United Kingdom) B.) <div>Company ExamsDigest</div> <div>Email [email protected]</div> <div>Country United Kingdom </div> C.) {"Company":"ExamsDigest", "email":"[email protected]", "Country":"United Kingdom"} D.) [Company] - [ExamsDigest] - [email] - [[email protected]] - [Country] - [United Kingdom]

A web developer is developing a new application using React.js for the front-end and Python for the backend. He wants to store data in JavaScript Object Notation (JSON) format in order to transmit the data between the web application and the server. Which of the following formats represents a JSON file? C.) {"Company":"ExamsDigest", "email":"[email protected]", "Country":"United Kingdom"}
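A minimal sketch of why option C is valid JSON: Python's standard `json` module can parse it into a dictionary and serialize it back. The string is copied from the answer as it appears on the card; the variable names are illustrative.

```python
import json

# The JSON text from option C: an object of "key": "value" pairs in braces.
raw = '{"Company":"ExamsDigest", "email":"[email protected]", "Country":"United Kingdom"}'

# Parsing succeeds because keys and string values are double-quoted,
# pairs are comma-separated, and the object is wrapped in { }.
record = json.loads(raw)

# Serializing turns the dictionary back into JSON text for transmission.
round_trip = json.dumps(record)

print(record["Country"])  # United Kingdom
```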

A web developer wants to ensure that malicious users can't type SQL statements when they are asked for input, like their username/userid. Which of the following query optimization techniques would effectively prevent SQL Injection attacks? A.) Indexing B.) Temporary Table in the query set C.) Subset of records D.) Parametrization

A web developer wants to ensure that malicious users can't type SQL statements when they are asked for input, like their username/userid. Which of the following query optimization techniques would effectively prevent SQL Injection attacks? D.) Parametrization -------------------------------------------------------------- *Parameterized SQL queries allow you to place parameters in an SQL query instead of a constant value. *A parameter takes a value only when the query is executed, allowing the query to be reused with different values and purposes. *Parameterized SQL statements are available in some analysis clients and are also available through the Historian SDK. *For example, you could create the following conditional SQL query, which contains a parameter for the course name: SELECT * FROM ExamsDigest WHERE coursename=? ORDER BY tagname *SQL Injection is best prevented through the use of parameterized queries.
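The parameterized query in the note above can be tried out with Python's built-in `sqlite3` module. This is only a sketch under assumptions: the in-memory database, the two-column table layout, the sample rows, and the helper `find_courses` are all hypothetical and exist only to show the `?` placeholder in action.

```python
import sqlite3

# Hypothetical in-memory table, created only for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ExamsDigest (coursename TEXT, tagname TEXT)")
conn.executemany(
    "INSERT INTO ExamsDigest VALUES (?, ?)",
    [("Data+", "analytics"), ("Security+", "security")],
)

def find_courses(connection, coursename):
    """Run the parameterized query; the driver binds the value to the ? placeholder."""
    query = "SELECT * FROM ExamsDigest WHERE coursename = ? ORDER BY tagname"
    return connection.execute(query, (coursename,)).fetchall()

# A normal lookup returns the matching row.
safe_result = find_courses(conn, "Data+")

# An injection attempt is treated as a literal string, so nothing matches.
attack_result = find_courses(conn, "Data+' OR '1'='1")

print(safe_result)    # [('Data+', 'analytics')]
print(attack_result)  # []
```

Because the user input is passed separately from the SQL text, it is never interpreted as part of the statement.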

Consider this dataset showing the retirement age of 11 people, in whole years: 54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60 Which of the following values is the measure of central tendency "median"? A.) 54 B.) 56 C.) 57 D.) 55

Consider this dataset showing the retirement age of 11 people, in whole years: 54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60 Which of the following values is the measure of central tendency "median"? C.) 57 -------------------------------------------------------------- *The median is the middle value in a distribution when the values are arranged in ascending or descending order. *With 11 sorted values, the middle value is the 6th one, which is 57.
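A short Python sketch of the same lookup, assuming an odd number of observations as in the question (an even count would need the average of the two middle values). The names are illustrative.

```python
from statistics import median

retirement_ages = [54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60]

# Sort first, then take the middle position (index 5 is the 6th of 11 values).
ordered = sorted(retirement_ages)
middle_index = len(ordered) // 2
manual_median = ordered[middle_index]

# Cross-check with the standard library.
library_median = median(retirement_ages)

print(manual_median)  # 57
```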

Suppose that we have observed the following n = 5 resting pulse rates: 64, 68, 74, 76, 78 Which of the following values is the variance of the resting pulse rates? A.) 27.4 B.) 27.2 C.) 27.1 D.) 27.3

Suppose that we have observed the following n = 5 resting pulse rates: 64, 68, 74, 76, 78 Which of the following values is the variance of the resting pulse rates? B.) 27.2 -------------------------------------------------------------- *The variance is a measure of variability. It is calculated by taking the average of squared deviations from the mean. *Variance tells you the degree of spread in your data set. The more spread out the data, the larger the variance is in relation to the mean. Step 1: *Mean = (value 1 + value 2 + ... + value n) / n Mean = (64 + 68 + 74 + 76 + 78) / 5 Mean = 360 / 5 = 72 Step 2: *Variance = [(value 1 - mean)^2 + (value 2 - mean)^2 + ... + (value n - mean)^2] / n Variance = [(64-72)^2 + (68-72)^2 + (74-72)^2 + (76-72)^2 + (78-72)^2] / 5 Variance = (64 + 16 + 4 + 16 + 36) / 5 Variance = 136 / 5 Variance = 27.2 *Note that this calculation divides by n (the population variance). Dividing by n - 1 (the sample variance) would give 34, which is not among the options.
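The two steps can be reproduced in Python as a check. This sketch assumes the divide-by-n (population) definition used in the answer; `statistics.variance`, which divides by n - 1, is shown only for contrast.

```python
from statistics import pvariance, variance

pulse_rates = [64, 68, 74, 76, 78]

# Step 1: the mean.
mean_rate = sum(pulse_rates) / len(pulse_rates)

# Step 2: the average of the squared deviations from the mean (divide by n).
squared_deviations = [(x - mean_rate) ** 2 for x in pulse_rates]
manual_variance = sum(squared_deviations) / len(pulse_rates)

# Standard-library equivalents: pvariance divides by n, variance by n - 1.
population_variance = pvariance(pulse_rates)
sample_variance = variance(pulse_rates)

print(mean_rate)                  # 72.0
print(round(manual_variance, 1))  # 27.2
```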

True or False: Static reporting requires pulling up various reports from different sources and analyzing insights from a longer time period in order to provide a snapshot of data while Dynamic reports provide deep insights and allow users to interact with the data rather than just view it.

True or False: Static reporting requires pulling up various reports from different sources and analyzing insights from a longer time period in order to provide a snapshot of data while Dynamic reports provide deep insights and allow users to interact with the data rather than just view it. True

What's the difference between ad-hoc and structured reports? (Select two) A.) Structured reports are generated as needed for a one-time-use, in a visual format relevant to the audience B.) Ad-hoc reports use a large volume of data and are produced using a formalized reporting template C.) Structured reports use a large volume of data and are produced using a formalized reporting template D.) Ad-hoc reports are generated as needed for a one-time-use, in a visual format relevant to the audience

What's the difference between ad-hoc and structured reports? (Select two) C.) Structured reports use a large volume of data and are produced using a formalized reporting template D.) Ad-hoc reports are generated as needed for a one-time-use, in a visual format relevant to the audience -------------------------------------------------------------- *Structured reports are produced by people who have a high degree of technical experience working with business intelligence tools to mine and aggregate large amounts of data. *Ad hoc reporting relies on much smaller amounts of data. *This makes it easier for people in an enterprise to report on a specific data point that answers a specific business question.

Which of the following inferential statistical methods is a number describing how likely it is that your data would have occurred by random chance? A.) p-values B.) t-tests C.) Chi-squared D.) Z-score

Which of the following inferential statistical methods is a number describing how likely it is that your data would have occurred by random chance? A.) p-values -------------------------------------------------------------- *A p-value, or probability value, is a number describing how likely it is that data at least as extreme as yours would have occurred by random chance if the null hypothesis were true. The level of statistical significance is often expressed as a p-value between 0 and 1. The smaller the p-value, the stronger the evidence that you should reject the null hypothesis. *A t-test is a type of inferential statistic used to determine if there is a significant difference between the means of two groups, which may be related to certain features. It is mostly used when the data sets, like the data set recorded as the outcome from flipping a coin 100 times, would follow a normal distribution and may have unknown variances. A t-test is used as a hypothesis testing tool to test an assumption applicable to a population. *A Z-score is a numerical measurement that describes a value's relationship to the mean of a group of values. Z-score is measured in terms of standard deviations from the mean. If a Z-score is 0, it indicates that the data point's score is identical to the mean score. A Z-score of 1.0 would indicate a value that is one standard deviation from the mean. Z-scores may be positive or negative, with a positive value indicating the score is above the mean and a negative score indicating it is below the mean. *A chi-square test is a statistical test used to compare observed results with expected results. The purpose of this test is to determine if a difference between observed data and expected data is due to chance, or if it is due to a relationship between the variables you are studying.

Which of the following inferential statistical methods is a numerical measurement that describes a value's relationship to the mean of a group of values? A.) t-test B.) p-value C.) Chi-squared D.) Z-score

Which of the following inferential statistical methods is a numerical measurement that describes a value's relationship to the mean of a group of values? D.) Z-score -------------------------------------------------------------- *A Z-score is a numerical measurement that describes a value's relationship to the mean of a group of values. Z-score is measured in terms of standard deviations from the mean. If a Z-score is 0, it indicates that the data point's score is identical to the mean score. A Z-score of 1.0 would indicate a value that is one standard deviation from the mean. Z-scores may be positive or negative, with a positive value indicating the score is above the mean and a negative score indicating it is below the mean. *A t-test is a type of inferential statistic used to determine if there is a significant difference between the means of two groups, which may be related to certain features. It is mostly used when the data sets, like the data set recorded as the outcome from flipping a coin 100 times, would follow a normal distribution and may have unknown variances. A t-test is used as a hypothesis testing tool to test an assumption applicable to a population. *A p-value, or probability value, is a number describing how likely it is that data at least as extreme as yours would have occurred by random chance if the null hypothesis were true. The level of statistical significance is often expressed as a p-value between 0 and 1. The smaller the p-value, the stronger the evidence that you should reject the null hypothesis. *A chi-square test is a statistical test used to compare observed results with expected results. The purpose of this test is to determine if a difference between observed data and expected data is due to chance, or if it is due to a relationship between the variables you are studying.
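The Z-score description above corresponds to the formula (value - mean) / standard deviation. The sketch below applies it to a small made-up set of values chosen only because its mean and standard deviation come out to round numbers; the helper `z_score` is illustrative.

```python
from statistics import mean, pstdev

# Hypothetical values with mean 5 and population standard deviation 2.
values = [2, 4, 4, 4, 5, 5, 7, 9]
mu = mean(values)
sigma = pstdev(values)

def z_score(x):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma

print(z_score(5))  # 0.0  -> identical to the mean
print(z_score(9))  # 2.0  -> two standard deviations above the mean
print(z_score(2))  # -1.5 -> one and a half standard deviations below the mean
```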

Which of the following inferential statistical methods is a statistical test used to compare observed results with expected results? A.) Chi-squared B.) p-value C.) Z-score D.) t-test

Which of the following inferential statistical methods is a statistical test used to compare observed results with expected results? A.) Chi-squared -------------------------------------------------------------- *A chi-square test is a statistical test used to compare observed results with expected results. The purpose of this test is to determine if a difference between observed data and expected data is due to chance, or if it is due to a relationship between the variables you are studying. *A t-test is a type of inferential statistic used to determine if there is a significant difference between the means of two groups, which may be related to certain features. It is mostly used when the data sets, like the data set recorded as the outcome from flipping a coin 100 times, would follow a normal distribution and may have unknown variances. A t-test is used as a hypothesis testing tool to test an assumption applicable to a population. *A Z-score is a numerical measurement that describes a value's relationship to the mean of a group of values. Z-score is measured in terms of standard deviations from the mean. If a Z-score is 0, it indicates that the data point's score is identical to the mean score. A Z-score of 1.0 would indicate a value that is one standard deviation from the mean. Z-scores may be positive or negative, with a positive value indicating the score is above the mean and a negative score indicating it is below the mean. *A p-value, or probability value, is a number describing how likely it is that data at least as extreme as yours would have occurred by random chance if the null hypothesis were true. The level of statistical significance is often expressed as a p-value between 0 and 1. The smaller the p-value, the stronger the evidence that you should reject the null hypothesis.
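To make "compare observed results with expected results" concrete, here is a minimal sketch of the chi-square statistic, the sum of (observed - expected)^2 / expected over all categories. The counts are hypothetical (100 coin flips yielding 60 heads and 40 tails against an expected 50/50 split), and 3.841 is the standard critical value for one degree of freedom at the 5% significance level.

```python
# Hypothetical counts: 100 coin flips, expecting a fair 50/50 split.
observed = [60, 40]
expected = [50, 50]

# Chi-square statistic: sum of (observed - expected)^2 / expected.
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Critical value for 1 degree of freedom at the 5% significance level.
CRITICAL_VALUE_DF1_ALPHA_05 = 3.841

# A statistic above the critical value suggests the difference is unlikely
# to be due to chance alone.
reject_null = chi_square > CRITICAL_VALUE_DF1_ALPHA_05

print(chi_square)   # 4.0
print(reject_null)  # True
```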

Which of the following recurring reports helps an organization maintain and prove compliance with the General Data Protection Regulation (GDPR)? A.) Risk and Regulatory Reports B.) Operational Reports C.) Compliance Reports D.) GDPR Reports

Which of the following recurring reports helps an organization maintain and prove compliance with the General Data Protection Regulation (GDPR)? C.) Compliance Reports -------------------------------------------------------------- *Compliance reporting is the process of presenting information to auditors that shows that your company is adhering to all the requirements set by the government and regulatory agencies under a particular standard. *Compliance reports typically include information on how customer/company data is dealt with - how it is controlled or protected, obtained and stored, and how it is secured and distributed internally and externally.

Q3 2020 has just ended, and now a data analyst needs to create an ad-hoc sales report that demonstrates how well the Q3 2020 promotion went versus last year's Q3 promotion. Which of the following date parameters should the analyst use? A.) 2019 vs. YTD 2020 B.) Q3 2019 vs. Q3 2020 C.) YTD 2019 vs. YTD 2020 D.) Q4 2019 vs. Q3 2020

Q3 2020 has just ended, and now a data analyst needs to create an ad-hoc sales report that demonstrates how well the Q3 2020 promotion went versus last year's Q3 promotion. Which of the following date parameters should the analyst use? B.) Q3 2019 vs. Q3 2020

True or False: Type I error is the mistaken rejection of the null hypothesis, also known as a "false positive", while a type II error is the mistaken acceptance of the null hypothesis, also known as a "false negative".

True or False: Type I error is the mistaken rejection of the null hypothesis, also known as a "false positive", while a type II error is the mistaken acceptance of the null hypothesis, also known as a "false negative". True -------------------------------------------------------------- *Although type I and type II errors can never be avoided entirely, the investigator can reduce their likelihood by increasing the sample size (the larger the sample, the less likely it is to differ substantially from the population).

Which of the following are common examples of unstructured data? (Choose TWO) A.) JSON B.) No-SQL Databases C.) SQL Databases D.) Audio Files E.) XML

Which of the following are common examples of unstructured data? (Choose TWO) B.) No-SQL Databases D.) Audio Files -------------------------------------------------------------- *Unstructured data is more or less all the data that is not structured. *Even though unstructured data may have a native, internal structure, it's not structured in a predefined way. *There is no data model; the data is stored in its native format.

Which of the following is the FIRST step to develop a clean and well-structured dashboard according to CompTIA dashboard development process? A.) Grant approval B.) Develop dashboard C.) Deploy to production D.) Mockup design

Which of the following is the FIRST step to develop a clean and well-structured dashboard according to the CompTIA dashboard development process? D.) Mockup design -------------------------------------------------------------- *The steps according to CompTIA: 1.) Mockup/wireframe design 2.) Approval granted 3.) Develop dashboard 4.) Deploy to production

A data analyst has been asked to create an ad-hoc sales report for the Chief Executive Officer (CEO). Which of the following should be included in the report? A.) The sales representatives' home addresses B.) Line-item SKU numbers C.) YTD total sales D.) The customers' first and last names

A data analyst has been asked to create an ad-hoc sales report for the Chief Executive Officer (CEO). Which of the following should be included in the report? C.) YTD total sales -------------------------------------------------------------- *For Answer B: A SKU stands for "Stock Keeping Unit" and is a code used to differentiate products typically by an alphanumeric combination of 8-or-so characters.

A data analyst wants to measure how well a piece of information reflects reality. Which of the following data quality dimensions does the data analyst need to assess? A.) Data Accuracy B.) Data Integrity C.) Data Completeness D.) Data Consistency

A data analyst wants to measure how well a piece of information reflects reality. Which of the following data quality dimensions does the data analyst need to assess? A.) Data Accuracy

Which of the following is an example of a discrete data type? A.) 8in. (20cm) B.) 5 kids C.) 2.5mi. (4km) D.) 10.7lbs. (4.9kg)

Which of the following is an example of a discrete data type? B.) 5 kids

Which of the following types of vulnerability scans should an organization perform if it stores credit card information and must follow the Payment Card Industry Data Security Standard (PCI DSS)? A.) Compliance Scan B.) Full Scan C.) Stealth Scan D.) Discovery Scan

Which of the following types of vulnerability scans should an organization perform if it stores credit card information and must follow the Payment Card Industry Data Security Standard (PCI DSS)? A.) Compliance Scan -------------------------------------------------------------- *If your organization is governed by regulations because of its industry or business practices, you may have to perform vulnerability scans on a regular basis to show compliance with those regulations.

A company uses AWS cloud technologies for its scalability and high-performance features. The company wants to give software engineers access to control the AWS infrastructure and deny access to the rest of the employees. Which of the following access control methods SHOULD the company use to fulfill the requirement? A.) Software-based B.) Role-based C.) Developer-based D.) AWS-based

A company uses AWS cloud technologies for its scalability and high-performance features. The company wants to give software engineers access to control the AWS infrastructure and deny access to the rest of the employees. Which of the following access control methods SHOULD the company use to fulfill the requirement? B.) Role-based -------------------------------------------------------------- *Role-based access control (RBAC), also known as role-based security, is an access control method that assigns permissions to end-users based on their role within your organization. *RBAC provides fine-grained control, offering a simple, manageable approach to access management that is less error-prone than individually assigning permissions.

Which of the following data quality dimensions means that the data follows a set of standard data definitions like data type, size, and format (e.g. the date of birth of a customer is in the format "mm/dd/yyyy")? A.) Conformity B.) Integrity C.) Completeness D.) Consistency

Which of the following data quality dimensions means that the data follows a set of standard data definitions like data type, size, and format (e.g. the date of birth of a customer is in the format "mm/dd/yyyy")? A.) Conformity -------------------------------------------------------------- *Conformity means the data follows the set of standard data definitions like data type, size, and format.

What Python library provides data analysts with access to tools that allow them to better structure data? A.) NumPy B.) TensorFlow C.) pandas D.) Keras

What Python library provides data analysts with access to tools that allow them to better structure data? C.) pandas

Which of the following are common examples of structured data? (Choose TWO) A.) Audio Files B.) SQL Databases C.) Excel Files D.) Video Files E.) No-SQL Databases

Which of the following are common examples of structured data? (Choose TWO) B.) SQL Databases C.) Excel Files

Which of the following contains alphanumeric values? A.) 10.1E^2 B.) 13.6 C.) 1347 D.) A3J7

Which of the following contains alphanumeric values? D.) A3J7

Which of the following data analytics tools can execute the following command? INSERT INTO Companies (CompanyName, ContactName, Address, City, PostalCode, Country) VALUES ('ExamsDigest', 'Nick G.', 'London St. 13', 'London', '45533', 'United Kingdom'); A.) Python B.) SQL C.) Microsoft Excel D.) R

Which of the following data analytics tools can execute the following command? INSERT INTO Companies (CompanyName, ContactName, Address, City, PostalCode, Country) VALUES ('ExamsDigest', 'Nick G.', 'London St. 13', 'London', '45533', 'United Kingdom'); B.) SQL

Which of the following measures of dispersion would help analysts to measure market and security volatility — and predict performance trends? A.) Range B.) Variance C.) Standard Deviation D.) Distribution

Which of the following measures of dispersion would help analysts to measure market and security volatility — and predict performance trends? C.) Standard Deviation -------------------------------------------------------------- *Standard deviation is a statistical measurement in finance that, when applied to the annual rate of return of an investment, sheds light on that investment's historical volatility. *The greater the standard deviation of securities, the greater the variance between each price and the mean, which shows a larger price range.
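A brief sketch of how standard deviation separates a steady investment from a volatile one. Both return series are invented for illustration (annual returns in percent), and `statistics.stdev` computes the sample standard deviation.

```python
from statistics import stdev

# Hypothetical annual returns in percent for two investments.
stable_returns = [5, 6, 5, 6, 5]
volatile_returns = [-10, 20, 5, -5, 17]

# A larger standard deviation means returns are spread further from their mean,
# which is read as higher historical volatility.
stable_volatility = stdev(stable_returns)
volatile_volatility = stdev(volatile_returns)

more_volatile = "volatile" if volatile_volatility > stable_volatility else "stable"
print(more_volatile)  # volatile
```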

Which of the following statistical methods refers to the probability that a population parameter will fall between a set of values for a certain proportion of times? A.) Frequencies / Percentages B.) Percent Difference C.) Confidence Intervals D.) Percent Change

Which of the following statistical methods refers to the probability that a population parameter will fall between a set of values for a certain proportion of times? C.) Confidence Intervals -------------------------------------------------------------- *The confidence interval (CI) is a range of values that's likely to include a population value with a certain degree of confidence. *It is often expressed as a % whereby a population mean lies between an upper and lower interval.
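A minimal sketch of a 95% confidence interval for a mean, using the retirement ages from an earlier card as sample data. It assumes a normal approximation (mean ± z × standard error); for a sample this small, a t-based interval would be somewhat wider, so treat the result as illustrative only.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

ages = [54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60]
confidence = 0.95

# z value that leaves (1 - confidence) / 2 in each tail; about 1.96 for 95%.
z = NormalDist().inv_cdf(0.5 + confidence / 2)

# Interval = sample mean plus or minus z times the standard error of the mean.
center = mean(ages)
margin = z * stdev(ages) / sqrt(len(ages))
interval = (center - margin, center + margin)

print(tuple(round(bound, 1) for bound in interval))
```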

Which of the following types of visualization is a graphical representation of word frequency that gives greater prominence to words that appear more frequently in a source text? A.) Word Cloud B.) Word Counter C.) Word Rate D.) Word Frequency

Which of the following types of visualization is a graphical representation of word frequency that gives greater prominence to words that appear more frequently in a source text? A.) Word Cloud -------------------------------------------------------------- *The larger the word in the visual, the more common the word was in the document(s). *This type of visualization can assist evaluators with exploratory textual analysis by identifying words that frequently appear in a set of interviews, documents, or other text. *The other options are not real visualization types.
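
The counting step behind a word cloud can be sketched in a few lines. The sample text is hypothetical, and the drawing itself (scaling each word by its count) is left out:

```python
from collections import Counter

# Hypothetical source text standing in for interview notes.
text = "data quality matters because data drives decisions and data builds trust"

# Count how often each word appears.
counts = Counter(text.split())

# A word cloud would draw each word at a size proportional to its count.
for word, count in counts.most_common(3):
    print(word, count)
```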

Which of the following values is the measure of dispersion "range" for the scores of ten students in a test? The scores of the ten students are 17, 23, 30, 36, 45, 51, 58, 66, 72, 77. A.) 90 B.) 70 C.) 80 D.) 60

Which of the following values is the measure of dispersion "range" for the scores of ten students in a test? The scores of the ten students are 17, 23, 30, 36, 45, 51, 58, 66, 72, 77. D.) 60 -------------------------------------------------------------- *Range = 77 - 17 = 60 *Range is the interval between the highest and the lowest score. *Range is a measure of variability or scatteredness of the variates or observations among themselves and does not give an idea about the spread of the observations around some central value.
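
The same arithmetic, written as a quick check in Python:

```python
scores = [17, 23, 30, 36, 45, 51, 58, 66, 72, 77]

# Range = highest score minus lowest score.
score_range = max(scores) - min(scores)
print(score_range)  # 60
```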

A data analyst wants to make a rough comparison of two graphs of variability, considering only the most extreme cases. Which of the following measures of dispersion would effectively make the comparison? A.) Range B.) Variance C.) Distribution D.) Standard Deviation

A data analyst wants to make a rough comparison of two graphs of variability, considering only the most extreme cases. Which of the following measures of dispersion would effectively make the comparison? A.) Range -------------------------------------------------------------- *Range is the interval between the highest and the lowest score. *Range is a measure of variability or scatteredness of the variates or observations among themselves and does not give an idea about the spread of the observations around some central value.

Melinda is analyzing a movie dataset, where individual films have a star rating between 1 and 5. What type of data is this? A.) Nonparametric data B.) Redundant data C.) Duplicate data D.) Data outlier

Melinda is analyzing a movie dataset, where individual films have a star rating between 1 and 5. What type of data is this? A.) Nonparametric data

Records from governmental agencies, student records information, and existing human research subjects' data are examples of: A.) Data Transmission B.) Data Use Agreements C.) Release Approvals D.) Data Constraints

Records from governmental agencies, student records information, and existing human research subjects' data are examples of: B.) Data Use Agreements -------------------------------------------------------------- *A Data Use Agreement (DUA) is a contractual document used for transferring non-public or restricted-use data. *It is an agreement that is required under the Privacy Rule and must be entered into before there is any use or disclosure of a limited data set to an outside institution or party.

True or False: Cardinality refers to the maximum number of times an instance in one entity can relate to instances of another entity while ordinality is the minimum number of times an instance in one entity can be associated with an instance in the related entity.

True or False: Cardinality refers to the maximum number of times an instance in one entity can relate to instances of another entity while ordinality is the minimum number of times an instance in one entity can be associated with an instance in the related entity. True -------------------------------------------------------------- *Ordinality is also closely linked to cardinality. *While cardinality specifies the occurrences of a relationship, ordinality describes the relationship as either mandatory or optional. *In other words, cardinality specifies the maximum number of relationships and ordinality specifies the absolute minimum number of relationships.

Which of the following Date function commands would effectively calculate the number of days, months, or years between two dates? A.) DATEDIF() B.) DAYS() C.) DATEVALUE() D.) MONTH()

Which of the following Date function commands would effectively calculate the number of days, months, or years between two dates? A.) DATEDIF()
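
In Excel, DATEDIF takes a start date, an end date, and a unit such as "D", "M", or "Y". A rough Python analogue is sketched below; the date_diff helper is hypothetical, assumes the start date is not after the end date, and is not a complete reimplementation of Excel's function:

```python
from datetime import date


def date_diff(start: date, end: date, unit: str) -> int:
    """Return whole days ("D"), complete months ("M"), or complete years ("Y")."""
    if unit == "D":
        return (end - start).days
    months = (end.year - start.year) * 12 + (end.month - start.month)
    if end.day < start.day:
        months -= 1  # the final month is not yet complete
    if unit == "M":
        return months
    if unit == "Y":
        return months // 12
    raise ValueError(f"unsupported unit: {unit!r}")


start, end = date(2020, 1, 15), date(2021, 3, 10)
print(date_diff(start, end, "D"), date_diff(start, end, "M"), date_diff(start, end, "Y"))
```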

Which of the following policies is a set of guidelines that helps organizations keep track of how long information must be kept and how to dispose of the information when it's no longer needed? A.) Data retention policy B.) Acceptable use policy C.) Data processing policy D.) Data deletion policy

Which of the following policies is a set of guidelines that helps organizations keep track of how long information must be kept and how to dispose of the information when it's no longer needed? A.) Data retention policy -------------------------------------------------------------- *The policy should also outline the purpose of processing personal data. *This ensures that you have documented proof that justifies your data retention and disposal periods.

Which of the following recurring reports help companies to reach business goals, identify strengths, weaknesses, and trends? A.) Compliance Reports B.) Risk and Regulatory Reports C.) Operational Reports D.) Business Goal Reports

Which of the following recurring reports help companies to reach business goals, identify strengths, weaknesses, and trends? C.) Operational Reports -------------------------------------------------------------- *Operational reporting is an effective, results-driven means of tracking, measuring, and analyzing a business's regular deliverables and metrics, usually on a daily, weekly, and sometimes monthly basis with the help of modern and professional BI reporting tools. *A KPI report, which is a type of operational report, is a management tool that facilitates the measurement, organization, and analysis of the most important business key performance indicators.

One of the benefits of using Amazon QuickSight is: A.) Scale from one to one of thousands of users B.) Subscription-based charges C.) Embed BI dashboards in your applications D.) QuickSight dashboards can be accessed only from mobile devices

One of the benefits of using Amazon QuickSight is: C.) Embed BI dashboards in your applications -------------------------------------------------------------- *Amazon QuickSight is a scalable, serverless, embeddable, machine learning-powered business intelligence (BI) service built for the cloud. QuickSight lets you easily create and publish interactive BI dashboards that include Machine Learning-powered insights. QuickSight dashboards can be accessed from any device, and seamlessly embedded into your applications, portals, and websites. *Benefits: 1.) Scale from tens to tens of thousands of users 2.) Embed BI dashboards in your applications 3.) Access deeper insights with Machine Learning 4.) Ask questions of your data, receive answers

The ACME Corporation hired an analyst to detect data quality issues in their Excel documents. Which of the following are the most common issues? (Select two) A.) Misspellings B.) Duplicates C.) Apostrophe D.) Commas E.) Symbols

The ACME Corporation hired an analyst to detect data quality issues in their Excel documents. Which of the following are the most common issues? (Select two) A.) Misspellings B.) Duplicates

Which of the following can be used to translate data into another form so it can only be read by a user who has a key or a password? A.) Data encryption B.) Data transmission C.) Data protection D.) Data masking

Which of the following can be used to translate data into another form so it can only be read by a user who has a key or a password? A.) Data encryption

Which of the following is one of the most important principles you have to take into account for dashboard development? A.) Consider your audience B.) Consider the right type of dashboard C.) Consider your budget D.) Consider the coding language

Which of the following is one of the most important principles you have to take into account for dashboard development? A.) Consider your audience -------------------------------------------------------------- *You need to know who's going to use the dashboard. *You need to put yourself in your audience's shoes. *The context and device on which users will regularly access their dashboards will have direct consequences on the style in which the information is displayed.

Maria, a senior database manager, wants to create a new DB for the sales department. She needs to create 2 tables, Employees and Sales respectively. The Employees table contains a single row representing an employee with each employee assigned a unique id (primary key). The second table, Sales, contains individual sales records associated with the employee who made the sale. Which of the following database types will Maria implement? A.) Relational B.) Non-Relational C.) Snowflake D.) Star

Maria, a senior database manager, wants to create a new DB for the sales department. She needs to create 2 tables, Employees and Sales respectively. The Employees table contains a single row representing an employee with each employee assigned a unique id (primary key). The second table, Sales, contains individual sales records associated with the employee who made the sale. Which of the following database types will Maria implement? A.) Relational -------------------------------------------------------------- *A relational database, also called Relational Database Management System (RDBMS) or SQL database, stores data in tables made up of rows, also referred to as records. A relational database works by linking information from multiple tables through the use of "keys." *The non-relational database, or NoSQL database, also stores data. However, unlike the relational database, there are no tables, rows, primary keys, or foreign keys. Instead, the non-relational database uses a storage model optimized for specific requirements of the type of data being stored. *Snowflake Schema in the data warehouse is a logical arrangement of tables in a multidimensional database such that the ER diagram resembles a snowflake shape. A Snowflake Schema is an extension of a Star Schema, and it adds additional dimensions. *A Star Schema in the data warehouse has one fact table at the center of the star and a number of associated dimension tables. It is known as a star schema because its structure resembles a star. The Star Schema data model is the simplest type of Data Warehouse schema.
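
One way Maria's two tables could look, sketched with Python's built-in sqlite3 module. The column names other than the employee id, and the sample rows, are assumptions for illustration. The key on Employees and the reference from Sales are what make the design relational:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Employees (
        id   INTEGER PRIMARY KEY,   -- unique id per employee
        name TEXT NOT NULL
    );
    CREATE TABLE Sales (
        sale_id     INTEGER PRIMARY KEY,
        employee_id INTEGER NOT NULL REFERENCES Employees(id),  -- foreign key
        amount      REAL NOT NULL
    );
    INSERT INTO Employees (id, name) VALUES (1, 'Ana'), (2, 'Ben');
    INSERT INTO Sales (employee_id, amount) VALUES (1, 120.0), (1, 80.0), (2, 50.0);
    """
)

# Link the two tables through the key to total each employee's sales.
totals = conn.execute(
    """
    SELECT e.name, SUM(s.amount)
    FROM Employees AS e
    JOIN Sales AS s ON s.employee_id = e.id
    GROUP BY e.id
    ORDER BY e.id
    """
).fetchall()
print(totals)
```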

A data analyst is looking for software that includes accounting and budgeting templates for easy use and built-in calculating and formula features to organize and synthesize results. Which of the following data analytics tools is MOST suitable in this case? A.) IBM Cognos B.) R C.) Minitab D.) Microsoft Excel

A data analyst is looking for software that includes accounting and budgeting templates for easy use and built-in calculating and formula features to organize and synthesize results. Which of the following data analytics tools is MOST suitable in this case? D.) Microsoft Excel -------------------------------------------------------------- *Microsoft Excel gives businesses the tools they need to make the most of their data. At its most basic level, Excel is an excellent tool for both data entry and storage. Excel even includes accounting and budgeting templates for easy use. *R is a language and environment for statistical computing and graphics. R provides a wide variety of statistical (linear and nonlinear modeling, classical statistical tests, time-series analysis, classification, clustering, ...) and graphical techniques, and is highly extensible. *Everyone in your organization can use IBM Cognos BI to view or create business reports, analyze data, and monitor events and metrics so that they can make effective business decisions. *Minitab empowers all parts of an organization to predict better outcomes, design better products, and improve processes to generate higher revenues and reduce costs.

A data analyst wants to create "Income Categories" that would be calculated based on the existing variable "Income". The "Income Categories" would be as follows: Income category 1: less than $1 Income category 2: more than $1 and less than $20,000 Income category 3: more than $20,001 and less than $40,000 Income category 4: more than $40,001 Which of the following data manipulation techniques should the data analyst use to create "Income Categories"? A.) Data Append B.) Data Blending C.) Derived Variables D.) Data Merge

A data analyst wants to create "Income Categories" that would be calculated based on the existing variable "Income". The "Income Categories" would be as follows: Income category 1: less than $1 Income category 2: more than $1 and less than $20,000 Income category 3: more than $20,001 and less than $40,000 Income category 4: more than $40,001 Which of the following data manipulation techniques should the data analyst use to create "Income Categories"? C.) Derived Variables -------------------------------------------------------------- *Derived variables are variables that you create by calculating or categorizing variables that already exist in your data set. *Data merging is the process of combining two or more data sets into a single data set. *Data blending involves pulling data from different sources and creating a single, unique, dataset for visualization and analysis. *A data append is a process that involves adding new data elements to an existing database.
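
A derived variable like this can be sketched as a small function applied to the existing "Income" values. The cut-offs below are simplified (an assumption) so that every value, including exactly $1, $20,000, and $40,000, falls into exactly one category:

```python
def income_category(income: float) -> int:
    """Map an income value to category 1-4 (simplified boundaries, assumed)."""
    if income < 1:
        return 1
    if income <= 20_000:
        return 2
    if income <= 40_000:
        return 3
    return 4


# Hypothetical incomes; the new column is calculated from the existing one.
incomes = [0, 15_000, 20_000, 35_500, 72_000]
records = [{"Income": x, "Income Category": income_category(x)} for x in incomes]

for record in records:
    print(record)
```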

A data analyst wants to show and hide information on his sheet based on selected criteria. Which of the following data manipulation techniques is most appropriate in this case? A.) Indexing B.) Filtering C.) Sorting D.) Parametrization

A data analyst wants to show and hide information on his sheet based on selected criteria. Which of the following data manipulation techniques is most appropriate in this case? B.) Filtering -------------------------------------------------------------- *Filters allow you to show or hide information on your sheet based on selected criteria. *They're useful because they don't change the overall layout of your sheet. *You can also save filters and share them with anyone who is shared to the sheet. *You can even set default filters on your sheet so that when shared users open that sheet, they see the same view.

A data scientist wants to see which products make the most money and which products attract the most customer purchasing interest in their company. Which of the following data manipulation techniques would he use to obtain this information? A.) Data Merge B.) Data Append C.) Normalize Data D.) Data Blending

A data scientist wants to see which products make the most money and which products attract the most customer purchasing interest in their company. Which of the following data manipulation techniques would he use to obtain this information? D.) Data Blending -------------------------------------------------------------- *Data blending is combining multiple data sources to create a single, new dataset, which can be presented visually in a dashboard or other visualization and can then be processed or analyzed. *Enterprises get their data from a variety of sources, and users may want to temporarily bring together different datasets to compare data relationships or answer a specific question.

A database administrator designs a new database with two tables. 1. Employee 2. Department The Employee table has the following columns: 1. Employee_birthdate 2. Employee_Year_Of_Registration_In_Company 3. Employee_Total_Years_Of_Registration Which of the following refers to the situation that there is data in the database that can be removed without losing information? A.) Redundant Data B.) Invalid Data C.) Duplicate Data D.) Non-Parametric Data

A database administrator designs a new database with two tables. 1. Employee 2. Department The Employee table has the following columns: 1. Employee_birthdate 2. Employee_Year_Of_Registration_In_Company 3. Employee_Total_Years_Of_Registration Which of the following refers to the situation that there is data in the database that can be removed without losing information? A.) Redundant Data -------------------------------------------------------------- *In relational database design, data redundancy refers to the situation in which there is data in the database that can be removed without losing information. *Consider our case where for each employee there is a birthdate, a year of registration in the company, and the total number of years in the company. *The last piece of information can be derived from the other two, and so is redundant.

A database administrator is responsible for optimizing the data queries from 800ms to 300ms by using indexing. Which of the following techniques does indexing use to make columns faster to query? A.) Filter the data using logical functions B.) Duplicate the table where data is stored with less data C.) Sort the data alphabetically D.) Create pointers where data is stored within a database

A database administrator is responsible for optimizing the data queries from 800ms to 300ms by using indexing. Which of the following techniques does indexing use to make columns faster to query? D.) Create pointers where data is stored within a database -------------------------------------------------------------- *Indexing makes columns faster to query by creating pointers to where data is stored within a database. *Indexes allow us to create sorted lists without having to create all new sorted tables, which would take up a lot of storage space. *An index is a structure that holds the field the index is sorting and a pointer from each record to its corresponding record in the original table where the data is actually stored. Indexes are used in things like a contact list where the data may be physically stored in the order you add people's contact information, but it is easier to find people when listed out in alphabetical order.
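
The contact-list idea can be sketched as follows. This is a toy model with hypothetical data, not how any particular database engine implements indexes: the rows stay in insertion order, while the index holds the sorted names plus a pointer (row position) back to each row:

```python
import bisect

# Rows in insertion order, as they might be physically stored.
contacts = [
    ("Zoe", "555-0101"),
    ("Adam", "555-0102"),
    ("Mia", "555-0103"),
    ("Liam", "555-0104"),
]

# The index: names in sorted order, each paired with its row position.
index = sorted((name, pos) for pos, (name, _) in enumerate(contacts))
keys = [name for name, _ in index]


def lookup(name):
    """Binary-search the index, then follow the pointer to the stored row."""
    i = bisect.bisect_left(keys, name)
    if i < len(keys) and keys[i] == name:
        return contacts[index[i][1]]
    return None


print(lookup("Mia"))
```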

A forecasting analyst wants to predict sales for a company based on weather, previous sales, and GDP growth. Which of the following inferential statistical methods is MOST suitable for this case? A.) Regression B.) Correlation C.) Chi-squared D.) t-test

A forecasting analyst wants to predict sales for a company based on weather, previous sales, and GDP growth. Which of the following inferential statistical methods is MOST suitable for this case? A.) Regression -------------------------------------------------------------- *Regression is a statistical method used in finance, investing, and other disciplines that attempts to determine the strength and character of the relationship between one dependent variable (usually denoted by Y) and a series of other variables (known as independent variables). *A t-test is a type of inferential statistic used to determine if there is a significant difference between the means of two groups, which may be related to certain features. It is mostly used when the data sets, like the data set recorded as the outcome from flipping a coin 100 times, would follow a normal distribution and may have unknown variances. A t-test is used as a hypothesis testing tool to test an assumption applicable to a population. *Correlation, in the finance and investment industries, is a statistic that measures the degree to which two securities move in relation to each other. Correlations are used in advanced portfolio management, computed as the correlation coefficient, which has a value that must fall between -1.0 and +1.0. *A chi-square test is a statistical test used to compare observed results with expected results. The purpose of this test is to determine if a difference between observed data and expected data is due to chance, or if it is due to a relationship between the variables you are studying.
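
A minimal sketch of regression used for prediction, simplified to a single independent variable (the scenario on the card would need several). The numbers are made up, and the fit_line helper is hypothetical; it applies the usual least-squares formulas for slope and intercept:

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor: y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: an independent variable and the sales observed with it.
temperature = [1, 2, 3, 4, 5]
sales = [3, 5, 7, 9, 11]

slope, intercept = fit_line(temperature, sales)

# Use the fitted line to predict sales for a new value of the predictor.
forecast = intercept + slope * 6
print(slope, intercept, forecast)
```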

A hypothesis test sometimes rejects the null hypothesis even if the true value of the population parameter is the same as the value in the null hypothesis. This type of result is known as: A.) A Type I Error B.) A Type II Error C.) A correct inference D.) The confidence level of the inference

A hypothesis test sometimes rejects the null hypothesis even if the true value of the population parameter is the same as the value in the null hypothesis. This type of result is known as: A.) A Type I Error -------------------------------------------------------------- *Type I Error = "false positive"; occurs when a researcher incorrectly rejects a true null hypothesis. This means that your reported findings were significant when in fact they occurred by chance. *Likely Cause: sample size is too small *Type II Error = "false negative"; occurs when a researcher fails to reject a false null hypothesis. This means that your reported findings were not significant when in fact there is a significant effect in the population. *Likely Cause: statistical power is too low

A junior web developer is developing a new application where users can upload short videos. The first task is to create a homepage that shows the headline "Upload Your Short Videos" and a clickable button that says, "upload now". Which of the following HTML commands would help the developer to complete the task successfully? A.) <h1>Upload Your Short Videos</h1> <button>upload now</button> B.) <h1>Upload Your Short Videos</h1> <h1>upload now</h1> C.) <span>Upload Your Short Videos</span> <button>upload now</button> D.) <p>Upload Your Short Videos</p> <p>upload now</p>

A junior web developer is developing a new application where users can upload short videos. The first task is to create a homepage that shows the headline "Upload Your Short Videos" and a clickable button that says, "upload now". Which of the following HTML commands would help the developer to complete the task successfully? A.) <h1>Upload Your Short Videos</h1> <button>upload now</button> -------------------------------------------------------------- *The <h1> to <h6> tags are used to define HTML headings. <h1> defines the most important heading. <h6> defines the least important heading. *Note: Only use one <h1> per page - this should represent the main heading/subject for the whole page. *The <button> tag defines a clickable button.

A junior web developer wants to develop a new eCommerce app using Python for the back end, MongoDB for the database, and Vue.js for the front end. He wants to create a simple list called apparel with three list items ("t-shirts", "hoodies", "jackets"). Which of the following commands does he need to type to create the list in Python? A.) apparel = "t-shirts", "hoodies", "jackets" B.) apparel = ["t-shirts", "hoodies", "jackets"] C.) apparel = ("t-shirts", "hoodies", "jackets") D.) apparel = {"t-shirts", "hoodies", "jackets"}

A junior web developer wants to develop a new eCommerce app using Python for the back end, MongoDB for the database, and Vue.js for the front end. He wants to create a simple list called apparel with three list items ("t-shirts", "hoodies", "jackets"). Which of the following commands does he need to type to create the list in Python? B.) apparel = ["t-shirts", "hoodies", "jackets"]
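
Each of the four options is valid Python, but only square brackets produce a list. A quick check of what type each one creates (the variable names a to d are just labels for the options):

```python
a = "t-shirts", "hoodies", "jackets"    # option A: tuple (parentheses are optional)
b = ["t-shirts", "hoodies", "jackets"]  # option B: list
c = ("t-shirts", "hoodies", "jackets")  # option C: tuple
d = {"t-shirts", "hoodies", "jackets"}  # option D: set

for name, value in [("A", a), ("B", b), ("C", c), ("D", d)]:
    print(name, type(value).__name__)
```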

A web developer who is developing an e-commerce marketplace designs its database to capture, store, and process data from transactions in real-time. The e-commerce platform will deal with many standard and straightforward queries such as insert, delete, and update, and the data will be stored in 3NF (third normal form). In which of the following databases would the developer store the data? A.) Online Standard Processing B.) Online Query Processing C.) Online Transactional Processing D.) Online Analytical Processing

A web developer who is developing an e-commerce marketplace designs its database to capture, store, and process data from transactions in real-time. The e-commerce platform will deal with many standard and straightforward queries such as insert, delete, and update, and the data will be stored in 3NF (third normal form). In which of the following databases would the developer store the data? C.) Online Transactional Processing -------------------------------------------------------------- *Online transaction processing (OLTP) captures, stores, and processes data from transactions in real-time. *An OLTP database stores and manages data related to everyday operations within a system or a company. *OLTP is focused on transaction-oriented tasks. *OLTP typically deals with query processing (inserting, updating, deleting data in a database), and maintaining data integrity and effectiveness when dealing with numerous transactions simultaneously. *Each transaction involves individual database records made up of multiple fields or columns. *Examples include banking and credit card activity or retail checkout scanning. *Online analytical processing is incorrect. Online analytical processing (OLAP) uses complex queries to analyze aggregated historical data from OLTP systems. *OLTP and OLAP are two systems that complement each other. While OLTP deals with processing day-to-day transactions, OLAP helps analyze the processed data. *Online query processing and Online standard processing are incorrect as these are imaginary terms.

An Amazon seller wants to generate a revenue report that includes the number of sales and the number of refunds between June and August. Which of the following should the seller use to generate the report? A.) Date Range B.) Data Content C.) Views D.) Frequency

An Amazon seller wants to generate a revenue report that includes the number of sales and the number of refunds between June and August. Which of the following should the seller use to generate the report? A.) Date Range -------------------------------------------------------------- *A date range report is a custom report that allows you either to select a month to include in the report or to choose specific start and end dates for the data included in the report.

An online payment company wants to develop a new fraud detection system to protect customers against disputes and fraudulent payments. Which of the following data analytics tools is MOST suitable for developing the fraud detection system? A.) Qlik B.) Tableau C.) Dataroma D.) SPSS Modeler

An online payment company wants to develop a new fraud detection system to protect customers against disputes and fraudulent payments. Which of the following data analytics tools is MOST suitable for developing the fraud detection system? D.) SPSS Modeler -------------------------------------------------------------- *IBM SPSS Modeler is a data mining and text analytics software application from IBM. It is used to build predictive models and conduct other analytic tasks.

An online shop sends orders based on the zip code customers type during the checkout process. Many orders never reach the customers because customers mistype the zip code in the input field. Which of the following solutions SHOULD the online shop implement to solve this issue? A.) Data Outliers B.) Non-Parametric Data C.) Specification Mismatch D.) Data Type Validation

An online shop sends orders based on the zip code customers type during the checkout process. Many orders never reach the customers because customers mistype the zip code in the input field. Which of the following solutions SHOULD the online shop implement to solve this issue? D.) Data Type Validation -------------------------------------------------------------- *Data validation refers to the process of ensuring the accuracy and quality of data. *It is implemented by building several checks into a system or report to ensure the logical consistency of input and stored data. *In automated systems, data is entered with minimal or no human supervision. *Therefore, it is necessary to ensure that the data that enters the system is correct and meets the desired quality standards. *The data will be of little use if it is not entered properly and can create bigger downstream reporting issues.
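
One possible check at the input field, assuming US-style ZIP codes (five digits, optionally followed by a hyphen and four more). The pattern and the is_valid_zip helper are illustrative assumptions. A format check like this catches malformed entries but cannot tell whether a well-formed code is the customer's real one:

```python
import re

# Assumed format: 5 digits, optionally followed by "-" and 4 digits.
ZIP_PATTERN = re.compile(r"\d{5}(-\d{4})?")


def is_valid_zip(value: str) -> bool:
    """Return True only if the whole input matches the expected ZIP format."""
    return ZIP_PATTERN.fullmatch(value.strip()) is not None


for entry in ["90210", "90210-1234", "9021O", "9021"]:
    print(entry, is_valid_zip(entry))
```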

An online shop wants to expand its customer base using SMS marketing campaigns. The online shop already has a database of customers with the following columns: 1. First_Name 2. Last_Name 3. Email. The head of marketing wants to add one more column titled "Phone_Numbers". Also, he wants to match the customers' names with the respective phone numbers. Which of the following data manipulation techniques would the head of marketing use to add the phone numbers into the database without affecting the existing data? A.) Imputation B.) Transpose C.) Normalize Data D.) Data Append

An online shop wants to expand its customer base using SMS marketing campaigns. The online shop already has a database of customers with the following columns: 1. First_Name 2. Last_Name 3. Email. The head of marketing wants to add one more column titled "Phone_Numbers". Also, he wants to match the customers' names with the respective phone numbers. Which of the following data manipulation techniques would the head of marketing use to add the phone numbers into the database without affecting the existing data? D.) Data Append -------------------------------------------------------------- *Data append is a process that involves adding new data elements to an existing database. An example of a common data append would be the enhancement of a company's customer files. A data append takes the information they have and matches it against a larger database of business data, allowing the desired missing data fields to be added. *Imputation is the process of replacing missing data with substituted values. *Transposing data is where the data in the rows are turned into columns, and the data in the columns is turned into rows. (See image for example) *Data normalization is the process of structuring your relational customer database, following a series of normal forms. This improves the accuracy and integrity of your data while ensuring that your database is easier to navigate.
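
A small sketch of a data append under simple assumptions: the customer rows and the phone_book lookup are hypothetical, and names are matched exactly. The existing fields are copied unchanged, and a customer with no match gets an empty value rather than a guess:

```python
# Existing customer records (hypothetical sample rows).
customers = [
    {"First_Name": "Ana", "Last_Name": "Diaz", "Email": "ana@example.com"},
    {"First_Name": "Ben", "Last_Name": "Cole", "Email": "ben@example.com"},
]

# External source keyed by (first name, last name); hypothetical data.
phone_book = {("Ana", "Diaz"): "555-0101"}

# Build new rows: keep every existing field, add the matched phone number.
appended = [
    {**row, "Phone_Numbers": phone_book.get((row["First_Name"], row["Last_Name"]))}
    for row in customers
]

for row in appended:
    print(row)
```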

Consider the following data set: (Image on other side) **Image = students w/ class grades and "total" grades** Which of the following data manipulation techniques would arrange the data set in decreasing order of total marks so that the student with the highest marks is in the top row, and the student with the lowest marks is in the last row? A.) Filtering B.) IsEmpty() C.) CURDATE() D.) Sorting

Consider the following data set: (Image on other side) Which of the following data manipulation techniques would arrange the data set in decreasing order of total marks so that the student with the highest marks is in the top row, and the student with the lowest marks is in the last row? D.) Sorting -------------------------------------------------------------- *Sorting lets you organize all or part of your data in ascending or descending order. *Note that you cannot undo a sort after it has been saved, so you'll want to make sure that all of the rows in your sheet, including parent rows in a hierarchy, are ordered the way you want before saving.
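
A descending sort on total marks, sketched with hypothetical student rows since the original image is not available:

```python
# Hypothetical (name, total marks) rows.
students = [("Ravi", 72), ("Lena", 91), ("Omar", 58)]

# reverse=True puts the highest total in the top row.
ranked = sorted(students, key=lambda s: s[1], reverse=True)

for name, total in ranked:
    print(name, total)
```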

Consider the following dataset which contains information about houses that are for sale. Which of the following string manipulation commands will replace the "St" characters in the address column with the word "Street"? (Image on other side) A.) SELECT address, REPLACE(address, 'Street', 'St') AS new_address FROM melb LIMIT 5; B.) SELECT address, REPLACE(address, 'St', 'Street') AS new_address FROM melb LIMIT 5; C.) SELECT address, REPLACE('Street', 'St') AS new_address FROM melb LIMIT 5; D.) SELECT address, REPLACE('St', 'Street') AS new_address FROM melb LIMIT 5;

Consider the following dataset which contains information about houses that are for sale. Which of the following string manipulation commands will replace the "St" characters in the address column with the word "Street"? B.) SELECT address, REPLACE(address, 'St', 'Street') AS new_address FROM melb LIMIT 5; -------------------------------------------------------------- *Proper Syntax: REPLACE(column_name, old_string, new_string) *Note: Whether the search is case-sensitive depends on the database engine. In MySQL, whose syntax this query matches (LIMIT clause), REPLACE() performs a case-sensitive match. *String manipulation (or string handling) is the process of changing, parsing, splicing, pasting, or analyzing strings. *The REPLACE() function replaces all occurrences of a substring within a string, with a new substring.
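To see what the query in answer B does, it can be run against a throwaway in-memory database. The sketch below uses Python's sqlite3 module purely as a stand-in engine, and the three addresses are hypothetical. In SQLite the match is case-sensitive; other engines and collations may behave differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE melb (address TEXT)")
conn.executemany(
    "INSERT INTO melb (address) VALUES (?)",
    [("85 Turner St",), ("25 Bloomburg St",), ("5 Stanley St",)],  # made-up rows
)

# The query from answer B, unchanged apart from line wrapping.
rows = conn.execute(
    "SELECT address, REPLACE(address, 'St', 'Street') AS new_address "
    "FROM melb LIMIT 5"
).fetchall()
conn.close()

replaced = dict(rows)  # maps each original address to its new_address
```

REPLACE() changes every occurrence of the substring, so "5 Stanley St" becomes "5 Streetanley Street". When only a trailing "St" should change, a more specific pattern is needed.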

Consider the following dataset which contains information about houses that are for sale. (Image on other side) Which of the following string manipulation commands will combine the address and region name columns to create a full address? A.) SELECT CONCAT(address, ',', regionname) AS full_address FROM melb LIMIT 5; B.) SELECT CONCAT(address, '-', regionname) AS full_address FROM melb LIMIT 5; C.) SELECT CONCAT(regionname, '-', address) AS full_address FROM melb LIMIT 5; D.) SELECT CONCAT(regionname, ',', address) AS full_address FROM melb LIMIT 5;

Consider the following dataset which contains information about houses that are for sale. (Image on other side) Which of the following string manipulation commands will combine the address and region name columns to create a full address? A.) SELECT CONCAT(address, ',', regionname) AS full_address FROM melb LIMIT 5; -------------------------------------------------------------- *String manipulation (or string handling) is the process of changing, parsing, splicing, pasting, or analyzing strings. *SQL is used for managing data in a relational database. *The CONCAT() function adds two or more strings together.
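The CONCAT() query in answer A can be tried out in the same way. This sketch again uses sqlite3 as a stand-in; because older SQLite versions have no CONCAT() function, a small Python helper (sql_concat, a hypothetical name) is registered under that name to mimic MySQL's behavior. The rows are made up.

```python
import sqlite3


def sql_concat(*parts):
    """Mimic MySQL's CONCAT(): join the arguments, or return NULL if any is NULL."""
    if any(p is None for p in parts):
        return None
    return "".join(str(p) for p in parts)


conn = sqlite3.connect(":memory:")
conn.create_function("CONCAT", -1, sql_concat)  # -1 = any number of arguments
conn.execute("CREATE TABLE melb (address TEXT, regionname TEXT)")
conn.executemany(
    "INSERT INTO melb (address, regionname) VALUES (?, ?)",
    [
        ("85 Turner St", "Northern Metropolitan"),  # made-up rows
        ("5 Charles St", "Southern Metropolitan"),
    ],
)

# The query from answer A.
full_addresses = [
    row[0]
    for row in conn.execute(
        "SELECT CONCAT(address, ',', regionname) AS full_address FROM melb LIMIT 5"
    )
]
conn.close()
```

The separator is inserted exactly as written, so ',' gives "85 Turner St,Northern Metropolitan" with no space after the comma.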

Consider this dataset showing the retirement age of 11 people, in whole years: 54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60 This table shows a simple frequency distribution of the retirement age data.
Age | Frequency
54 | 3
55 | 1
56 | 1
57 | 2
58 | 2
60 | 2
Which of the following values is the measure of central tendency "mode"? A.) 55 B.) 54 C.) 57 D.) 56

Consider this dataset showing the retirement age of 11 people, in whole years: 54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60 This table shows a simple frequency distribution of the retirement age data. Which of the following values is the measure of central tendency "mode"? B.) 54 -------------------------------------------------------------- *There are three main measures of central tendency: the mode, the median and the mean. Each of these measures describes a different indication of the typical or central value in the distribution. *The mode is the most commonly occurring value in a distribution. Here, 54 occurs three times, more often than any other age.
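The frequency table and all three measures of central tendency for this dataset can be checked with Python's standard library. This is just a quick verification sketch; the variable names are arbitrary.

```python
from collections import Counter
import statistics

ages = [54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60]

frequency = Counter(ages)                   # age -> number of occurrences
mode_age = statistics.mode(ages)            # most frequent value
median_age = statistics.median(ages)        # middle value of the sorted data
mean_age = round(statistics.mean(ages), 1)  # arithmetic average, one decimal
```

For this data the mode is 54, the median is 57, and the mean rounds to 56.6.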

The human resources (HR) department of the ACME Corporation needs to monitor changes in employee satisfaction over time. Which of the following data collection methods will BEST help the HR department to monitor the changes? A.) Survey B.) Application Programming Interface (API) C.) Web Scraping D.) Web Services

The human resources (HR) department of the ACME Corporation needs to monitor changes in employee satisfaction over time. Which of the following data collection methods will BEST help the HR department to monitor the changes? A.) Survey -------------------------------------------------------------- *A survey is defined as the act of examining a process or questioning a selected sample of individuals to obtain data about a service, product, or process. *Data collection surveys collect information from a targeted group of people about their opinions, behavior, or knowledge. *Common types of surveys are written questionnaires, face-to-face or telephone interviews, focus groups, and electronic (e-mail or website) surveys. *It is helpful to use surveys when: 1.) Identifying customer requirements or preferences 2.) Assessing customer or employee satisfaction, such as identifying or prioritizing problems to address 3.) Evaluating proposed changes 4.) Assessing whether a change was successful 5.) Monitoring changes in customer or employee satisfaction over time

The sales of a grocery store had an average of $8,000 per day. The store introduced several advertising campaigns in order to increase sales. To determine whether the advertising campaigns have been effective in increasing sales, a sample of 64 days of sales was selected, and the sample mean was $8,300 per day. The correct null and alternative hypotheses to test whether there has been a significant increase are: A.) Null: Sample mean is 8,000; Alternative: Sample mean is greater than or equal to 8,000. B.) Null: Sample mean is 8,000; Alternative: Sample mean is greater than 8,000. C.) Null: Population mean is 8,000; Alternative: Population mean is greater than or equal to 8,000. D.) Null: Population mean is 8,000; Alternative: Population mean is greater than 8,000.

The sales of a grocery store had an average of $8,000 per day. The store introduced several advertising campaigns in order to increase sales. To determine whether the advertising campaigns have been effective in increasing sales, a sample of 64 days of sales was selected, and the sample mean was $8,300 per day. The correct null and alternative hypotheses to test whether there has been a significant increase are: D.) Null: Population mean is 8,000; Alternative: Population mean is greater than 8,000.
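Written in symbols, the hypotheses in answer D look as follows. Here \mu is the population mean of daily sales after the campaigns and \mu_0 = 8000 is the earlier average. The test statistic is shown only as a sketch: it assumes a one-sided z-test with a known population standard deviation \sigma, which the question does not give, so it is left in symbolic form.

```latex
H_0:\ \mu = 8000 \qquad \text{versus} \qquad H_1:\ \mu > 8000

z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}
  = \frac{8300 - 8000}{\sigma / \sqrt{64}}
  = \frac{2400}{\sigma}
```

The hypotheses are statements about the population mean, not the sample mean, because the sample mean (8,300) is already known. The alternative uses a strict "greater than" because the question asks about an increase.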

Which of the following categories would contain information about an individual's biometric data, genetic information, and sexual orientation? A.) Sensitive personal information B.) Personal health information C.) Intellectual property D.) Personally identifiable information

Which of the following categories would contain information about an individual's biometric data, genetic information, and sexual orientation? A.) Sensitive personal information -------------------------------------------------------------- *Sensitive Personal Information (SPI) refers to information that does not identify an individual, but is related to an individual, and communicates information that is private or could potentially harm an individual should it be made public. *Personally identifiable information (PII) is any data that can be used to identify a specific individual. Social Security numbers, mailing or email addresses, and phone numbers have most commonly been considered PII, but technology has expanded the scope of PII considerably. *Protected health information (PHI), also referred to as personal health information, generally refers to demographic information, medical histories, test and laboratory results, mental health conditions, insurance information, and other data that a healthcare professional collects to identify an individual and determine appropriate care. *Intellectual property (IP) is a term for any intangible asset: something proprietary that doesn't exist as a physical object but has value. Examples of intellectual property include designs, concepts, software, inventions, trade secrets, formulas, and brand names, as well as works of art.

Which of the following data analytics tools is a cloud-based platform designed to provide direct, simplified, real-time access to business data for decision-makers across the company with minimal IT involvement? A.) Delta Load B.) Snowflake C.) Domo D.) OLTP

Which of the following data analytics tools is a cloud-based platform designed to provide direct, simplified, real-time access to business data for decision-makers across the company with minimal IT involvement? C.) Domo -------------------------------------------------------------- *Domo specializes in business intelligence tools and data visualization.

Which of the following data file formats transmits data in web applications? A.) JSON B.) Flat C.) HTML D.) XML

Which of the following data file formats transmits data in web applications? A.) JSON -------------------------------------------------------------- *JSON transmits data in web applications. *XML provides a standard method to access information. *HTML creates web pages and web applications. *Flat imports data in data warehousing projects.
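As a small illustration of JSON as a transport format, the sketch below serializes a record to a JSON string (what a web application would send) and parses it back. The record and its field names are hypothetical.

```python
import json

# Hypothetical record a web application might send to a server.
record = {"First_Name": "Ana", "Email": "ana@example.com", "subscribed": True}

payload = json.dumps(record)    # Python dict -> JSON text
restored = json.loads(payload)  # JSON text -> Python dict
```

The payload is plain text, so any client or server that can parse JSON can read it, regardless of the language it is written in.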

Which of the following data integration processes combines data from multiple data sources into a single, consistent data store that is loaded into a data warehouse? A.) Application Programming Interface (API) B.) Extract, Transform, Load (ETL) C.) Extract, Load, Transform (ELT) D.) Delta Load

Which of the following data integration processes combines data from multiple data sources into a single, consistent data store that is loaded into a data warehouse? B.) Extract, Transform, Load (ETL) -------------------------------------------------------------- WRONG ANSWERS: *ELT stands for "Extract, Load, and Transform." In this process, data gets leveraged via a data warehouse in order to do basic transformations. That means there's no need for data staging. ELT uses cloud-based data warehousing solutions for all different types of data, including structured, unstructured, semi-structured, and even raw data types. *Delta is the incremental load between the last data load and now. For example, if yesterday's load inserted 50 records into your target table and today 80 new records have arrived in your source system, you insert only those 80 new records into the target after checking against the target table. These 80 records are the delta records. *API is the acronym for Application Programming Interface, which is a software intermediary that allows two applications to talk to each other. When you use an application on your mobile phone, the application connects to the Internet and sends data to a server. The server then retrieves that data, interprets it, performs the necessary actions and sends it back to your phone. The application then interprets that data and presents you with the information you wanted in a readable way.
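The delta load described on this card (insert only the records that are new since the last load) can be sketched as below. The dictionaries stand in for a target table and a source system, keyed by a hypothetical record id; delta_load is an illustrative helper, not part of any ETL tool.

```python
# Rows already in the target table after the previous load (hypothetical).
target = {1: "order A", 2: "order B"}

# Current contents of the source system: two new rows have arrived.
source = {1: "order A", 2: "order B", 3: "order C", 4: "order D"}


def delta_load(source_rows, target_rows):
    """Insert only source rows whose key is missing from the target.

    Returns the list of keys that were inserted (the delta records).
    """
    delta_keys = [key for key in source_rows if key not in target_rows]
    for key in delta_keys:
        target_rows[key] = source_rows[key]
    return delta_keys


inserted = delta_load(source, target)
```

Running the load a second time inserts nothing, because every source key is already present in the target.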

Which of the following data manipulation techniques improves the accuracy and integrity of your data while ensuring that your database is easier to navigate? A.) Normalize Data B.) Data Blending C.) Data Append D.) Data Merge

Which of the following data manipulation techniques improves the accuracy and integrity of your data while ensuring that your database is easier to navigate? A.) Normalize Data -------------------------------------------------------------- *Data normalization is the process of structuring your relational customer database, following a series of normal forms. This improves the accuracy and integrity of your data while ensuring that your database is easier to navigate. *Data append is a process that involves adding new data elements to an existing database. An example of a common data append would be the enhancement of a company's customer files. A data append takes the information they have, matches it against a larger database of business data, allowing the desired missing data fields to be added. *Data blending is combining multiple data sources to create a single, new dataset, which can be presented visually in a dashboard or other visualization and can then be processed or analyzed. *Data merging is the process of combining two or more data sets into a single data set.

Which of the following inferential statistical methods determines if there is a significant difference between the means of two groups? A.) Chi-squared B.) t-test C.) p-value D.) Z-score

Which of the following inferential statistical methods determines if there is a significant difference between the means of two groups? B.) t-test -------------------------------------------------------------- *A t-test is a type of inferential statistic used to determine if there is a significant difference between the means of two groups, which may be related to certain features. It is mostly used when the data sets, like the data set recorded as the outcome from flipping a coin 100 times, would follow a normal distribution and may have unknown variances. A t-test is used as a hypothesis testing tool to test an assumption applicable to a population. *A Z-score is a numerical measurement that describes a value's relationship to the mean of a group of values. Z-score is measured in terms of standard deviations from the mean. If a Z-score is 0, it indicates that the data point's score is identical to the mean score. A Z-score of 1.0 would indicate a value that is one standard deviation from the mean. Z-scores may be positive or negative, with a positive value indicating the score is above the mean and a negative score indicating it is below the mean. *A p-value, or probability value, is a number describing how likely it is that data at least as extreme as yours would have occurred by random chance if the null hypothesis were true. The level of statistical significance is often expressed as a p-value between 0 and 1. The smaller the p-value, the stronger the evidence that you should reject the null hypothesis. *A chi-square test is a statistical test used to compare observed results with expected results. The purpose of this test is to determine if a difference between observed data and expected data is due to chance, or if it is due to a relationship between the variables you are studying.
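The Z-score and t-test descriptions above can be made concrete with a short calculation. The sketch below uses made-up numbers and computes a Z-score against the population standard deviation, plus the t statistic in its unequal-variance (Welch) form, which is one common variant and an assumption here. It stops at the statistic: turning t into a p-value requires the t distribution, which the standard library does not provide.

```python
import math
import statistics


def z_score(value, data):
    """Number of population standard deviations that value lies from the mean."""
    return (value - statistics.mean(data)) / statistics.pstdev(data)


def welch_t(group_a, group_b):
    """Two-sample t statistic that does not assume equal variances."""
    var_a = statistics.variance(group_a)  # sample variance (n - 1 denominator)
    var_b = statistics.variance(group_b)
    standard_error = math.sqrt(var_a / len(group_a) + var_b / len(group_b))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / standard_error


scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data: mean 5, std dev 2
z = z_score(9, scores)

group_a = [1, 2, 3, 4, 5]          # hypothetical groups
group_b = [3, 4, 5, 6, 7]
t = welch_t(group_a, group_b)
```

With these numbers, z is 2.0 (the value 9 is two standard deviations above the mean) and t is -2.0 (group A's mean is two standard errors below group B's).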

Which of the following statements BEST describes the difference between discrete and continuous data types? A.) Continuous data is an alphanumeric type of data that includes characters with fixed data values determined by counting. Discrete data includes complex characters and varying data values that are measured over a specific time interval. B.) Discrete data is an alphanumeric type of data that includes characters with fixed data values determined by counting. Continuous data includes complex characters and varying data values that are measured over a specific time interval. C.) Continuous data is a numerical type of data that includes numbers with fixed data values determined by counting. Discrete data includes complex numbers and varying data values that are measured over a specific time interval. D.) Discrete data is a numerical type of data that includes numbers with fixed data values determined by counting. Continuous data includes complex numbers and varying data values that are measured over a specific time interval.

Which of the following statements BEST describes the difference between discrete and continuous data types? D.) Discrete data is a numerical type of data that includes numbers with fixed data values determined by counting. Continuous data includes complex numbers and varying data values that are measured over a specific time interval.

You have been tasked with creating the login form for your client's website. You design the database for the login form, but you forget to add the column that holds a Boolean value (0/1) showing whether a user has an account. If a user has an account, the database should store the value 1; otherwise, it should store the value 0. Which of the following data types should you use to complete the database? A.) Date B.) Numeric C.) Currency D.) Alphanumeric

You have been tasked with creating the login form for your client's website. You design the database for the login form, but you forget to add the column that holds a Boolean value (0/1) showing whether a user has an account. If a user has an account, the database should store the value 1; otherwise, it should store the value 0. Which of the following data types should you use to complete the database? B.) Numeric -------------------------------------------------------------- *Numeric is any intrinsic data type (Byte, Boolean, Integer, Long, Currency, Single, Double). *Numeric data types are numbers stored in database columns. *These data types are typically grouped by: Exact numeric types, values where the precision and scale need to be preserved.
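One way to store such a 0/1 flag in a numeric column is sketched below, using Python's sqlite3 module as a stand-in database. The table name users, the column name has_account, and the CHECK constraint are all assumptions made for illustration; the card itself does not name them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users ("
    " username TEXT PRIMARY KEY,"
    # Numeric flag: 1 = has an account, 0 = no account; other values are refused.
    " has_account INTEGER NOT NULL DEFAULT 0 CHECK (has_account IN (0, 1))"
    ")"
)
conn.execute("INSERT INTO users (username, has_account) VALUES ('ana', 1)")
conn.execute("INSERT INTO users (username) VALUES ('ben')")  # falls back to 0

try:
    conn.execute("INSERT INTO users (username, has_account) VALUES ('cy', 2)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True  # the CHECK constraint refused a value other than 0 or 1

flags = dict(conn.execute("SELECT username, has_account FROM users"))
conn.close()
```

The constraint is optional, but it keeps the numeric column limited to the two values the Boolean flag is meant to hold.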
