Google Data Analytics
A data professional aims to achieve a statistical power of at least _____ to consider their results statistically significant.
0.8, or 80%
What are some strategies data professionals can use when they do not have enough data to meet a business objective?
1. Locate another relevant dataset to work with. 2. Consider whether it is possible to adjust the objective. 3. Use smaller-scale data until they can find more complete data.
A data analyst wants to know how many cells from G2 through G100 contain numbers below 500. Which of the following COUNTIF statements should they use?
=COUNTIF(G2:G100,"<500")
Which function will return the number of characters in spreadsheet cell F8 in order to confirm it contains exactly 15 characters?
=LEN(F8)
Which function sorts a spreadsheet range between cells K1 and L80 in ascending order by the first column, Column K?
=SORT(K1:L80, 1, TRUE)
CONCAT
A SQL function that adds strings together to create new text strings that can be used as unique keys
CAST
A SQL function that converts data from one datatype to another
COALESCE
A SQL function that returns the first non-null value in a list
CASE
A SQL expression that checks conditions and returns a value when a condition is met, working like an if/then statement inside a query
null
A data value used to represent situations in which an actual value is unknown, unavailable or not applicable
What information is typically included in a changelog? Select all that apply.
A description of the change; the date of the change; the component that changed and the reason why
Changelog
A file containing a chronologically ordered list of modifications made to a project
Population
All possible data values in a certain dataset
DISTINCT
A keyword that is added to a SQL SELECT statement to retrieve only non-duplicate entries
Float
A number that contains a decimal
Verification
A process to confirm that a data-cleaning effort was well executed and the resulting data is accurate and reliable
COUNTA
A spreadsheet function that counts the number of non-empty cells within a specified range
Substring
A subset of a text string
Find and replace
A tool that finds a specified search term and replaces it with something else
A data analyst organizes a database to show only the 100 most recent real estate sales in Stamford, Connecticut. What steps do they take?
Add a filter to return only sales in Stamford, Connecticut, then sort the most recent sales at the top of the list.
Which of the following statements accurately describe code review and code commit? Select all that apply.
An example of code review is a data professional asking a colleague to assess their SQL query. Code commit might involve updating code within a version control system. Code review occurs prior to code commit.
Central Limit Theorem (CLT)
As sample size increases, the distribution of results from a large number of samples (for example, their means) more closely resembles the normal (bell-shaped) distribution
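A quick way to see the Central Limit Theorem at work is a simulation: draw many samples from a population that is clearly not bell-shaped (here a uniform distribution between 0 and 1) and look at the distribution of the sample means. This is a minimal sketch using only Python's standard `random` and `statistics` modules; the sample size of 30, the 2,000 repetitions, and the seed are arbitrary illustrative choices.

```python
import random
import statistics


def sample_means(sample_size: int, repetitions: int, seed: int = 42) -> list[float]:
    """Return the means of `repetitions` samples drawn from a uniform(0, 1) population."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    return [
        statistics.fmean(rng.random() for _ in range(sample_size))
        for _ in range(repetitions)
    ]


means = sample_means(sample_size=30, repetitions=2000)

# The population is flat, yet the sample means cluster symmetrically around the
# population mean (0.5) with a spread close to sigma / sqrt(n).
expected_spread = (1 / 12) ** 0.5 / 30 ** 0.5
print(round(statistics.fmean(means), 3), round(statistics.stdev(means), 4), round(expected_spread, 4))
```

Plotting a histogram of `means` would show the bell shape; raising `sample_size` narrows the spread further.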
In what way is the SQL WHERE clause similar to filtering in spreadsheets?
Both the WHERE clause and spreadsheet filters return a subset of data based on specified criteria.
Which SQL clause will consider a condition and return a value when that condition is met?
CASE WHEN column_name = 'condition' THEN 'value' END
A data professional discovers that SUV is spelled SWV in the database column car_types. Which CASE clause will enable them to correct the misspellings?
CASE WHEN car_types = 'SWV' THEN 'SUV' ELSE car_types END
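To see the CASE correction run end to end, the sketch below uses Python's built-in `sqlite3` module with an in-memory database as a stand-in for the course's SQL environment. The table name `cars` and its rows are hypothetical; only the `car_types` column comes from the question. The `ELSE car_types` branch keeps correctly spelled values, and `END` closes the expression.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cars (car_types TEXT)")  # hypothetical table
conn.executemany("INSERT INTO cars VALUES (?)", [("SUV",), ("SWV",), ("Sedan",)])

# CASE replaces the misspelling 'SWV' with 'SUV' and leaves every other value unchanged.
rows = conn.execute(
    """
    SELECT CASE WHEN car_types = 'SWV' THEN 'SUV' ELSE car_types END AS car_types_clean
    FROM cars
    ORDER BY rowid
    """
).fetchall()
print(rows)  # [('SUV',), ('SUV',), ('Sedan',)]
```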
A data analyst discovers that their database has recognized product price data as text strings. What SQL function can the analyst use to convert the text strings to floats?
CAST
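A minimal CAST sketch, assuming SQLite through Python's `sqlite3` module: SQLite names its floating-point type `REAL`, while other databases use different names (BigQuery, for example, uses `FLOAT64`). The `products` table and its prices are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (price TEXT)")  # prices were loaded as text strings
conn.executemany("INSERT INTO products VALUES (?)", [("4.99",), ("12.50",)])

# CAST converts each text string into a floating-point number.
rows = conn.execute("SELECT CAST(price AS REAL) FROM products ORDER BY rowid").fetchall()
prices = [row[0] for row in rows]
print(prices, [type(p).__name__ for p in prices])
```

Once the values are numeric, functions such as SUM and AVG treat them as numbers rather than text.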
As you migrate data from a legacy system to a new database, you find that many client records contain missing values in the email_address column. What SQL function can you use to replace these null values with a value in a different column?
COALESCE
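A small COALESCE sketch using SQLite via Python's `sqlite3` module. The `clients` table and the `backup_email` column are hypothetical; COALESCE returns the first non-null argument, so a literal placeholder can be added as the last fallback.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clients (client_id INTEGER, email_address TEXT, backup_email TEXT)")
conn.executemany(
    "INSERT INTO clients VALUES (?, ?, ?)",
    [
        (1, "a@example.com", "a2@example.com"),
        (2, None, "b2@example.com"),  # missing primary email
        (3, None, None),              # both emails missing
    ],
)

# Use email_address if present, otherwise backup_email, otherwise the text 'unknown'.
rows = conn.execute(
    """
    SELECT client_id, COALESCE(email_address, backup_email, 'unknown') AS contact_email
    FROM clients
    ORDER BY client_id
    """
).fetchall()
print(rows)  # [(1, 'a@example.com'), (2, 'b2@example.com'), (3, 'unknown')]
```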
Which SQL function combines groups of text strings from multiple cells in order to create a new string?
CONCAT
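A sketch of building a unique key by joining strings. It swaps in SQLite's standard `||` concatenation operator, run through Python's `sqlite3` module, because older SQLite builds lack a CONCAT function (recent releases, 3.44 and later, add `concat()`); in BigQuery the same key would be written as `CONCAT(product_code, '-', color)`. The `orders` table and its values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (product_code TEXT, color TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [("K789", "Red"), ("K789", "Blue")])

# The product code alone repeats; joining it with the color yields a distinct key per row.
rows = conn.execute(
    "SELECT product_code || '-' || color AS unique_key FROM orders ORDER BY rowid"
).fetchall()
print(rows)  # [('K789-Red',), ('K789-Blue',)]
```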
A team is tasked with determining how many customers have made a purchase in the past month. Using a pivot table in Google Sheets, what function will count the non-empty values in the purchase_date column of its customer database?
COUNTA
Which function enables a data professional to count the total number of spreadsheet values within a specified range?
COUNTA
During verification, you wonder if one of your data modifications was an effective update. What can you reference to revisit the modification and your reasoning behind it?
Changelog
A data professional makes a change to a file. Then, they ask a colleague to evaluate the change to identify any potential issues. What does this scenario describe?
Code review
Identify what makes data insufficient
Comes from only one source; continuously updates and is therefore incomplete; is outdated; is geographically limited
A junior data analyst needs to search their spreadsheet for a particular client ID. In order to identify all cells containing the ID, they use a spreadsheet tool that changes how cells appear when values meet specific conditions. What tool do they use?
Conditional formatting
A data professional in the logistics industry wants to calculate the margin of error for a study about transportation route efficiency. They know the population size and sample size. What must they also know in order to accurately calculate margin of error?
Confidence level
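The answer makes sense once the calculation is written out: the confidence level sets the z-score that scales the standard error. The sketch below assumes the common formula for a proportion (worst-case proportion 0.5, normal approximation) with a finite population correction, using only Python's standard library; the population and sample sizes are hypothetical.

```python
import math
from statistics import NormalDist


def margin_of_error(confidence_level: float, sample_size: int, population_size: int,
                    proportion: float = 0.5) -> float:
    """Margin of error for a proportion, with a finite population correction."""
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)  # about 1.96 for 95%
    standard_error = math.sqrt(proportion * (1 - proportion) / sample_size)
    fpc = math.sqrt((population_size - sample_size) / (population_size - 1))
    return z * standard_error * fpc


moe = margin_of_error(0.95, sample_size=400, population_size=10_000)
print(f"{moe:.1%}")  # 4.8%
```

Raising the confidence level increases the margin of error for the same sample, while a larger sample shrinks it.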
Which of the following statements accurately describe sample size, population, and confidence level? Select all that apply.
Confidence level is the probability that a sample accurately reflects the greater population. When data professionals use a sample, they are using a part of a population that is representative of the population. Random sampling involves selecting a sample from a population so that every possible type of sample has an equal chance of being chosen.
A data professional works on a financial audit. During the verification process, they keep in mind the big picture view of confirming that the company's financial statements comply with accounting standards. What activities will help them achieve this goal? Select all that apply.
Consider the business problem; consider the goal; consider the data
Typecasting
Converting data from one type to another
While working with a database table that contains the column employee_name, you notice that there are some duplicate entries. Which SQL clause would you use in a query to return the employee_name data without these duplicates?
DISTINCT employee_name
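A minimal DISTINCT example, run against an in-memory SQLite database through Python's `sqlite3` module. The `employees` table and the names in it are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (employee_name TEXT)")
conn.executemany("INSERT INTO employees VALUES (?)", [("Ana",), ("Ben",), ("Ana",)])

# DISTINCT collapses the duplicate 'Ana' entries into a single row.
rows = conn.execute(
    "SELECT DISTINCT employee_name FROM employees ORDER BY employee_name"
).fetchall()
print(rows)  # [('Ana',), ('Ben',)]
```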
Data teams use the SQL command _____ to tidy up a database that is currently cluttered with many irrelevant tables.
DROP TABLE IF EXISTS
Which data professionals are most often responsible for ensuring that data is available, secure, and backed up to prevent loss?
Data warehousing specialists
A data analyst uses a spreadsheet's Split tool to place each grain and dairy product into new, separate cells. What is the semicolon's function in this scenario?
Delimiter
A data analyst uses a changelog to record how the data evolves while cleaning their data. What data cleaning best practice does this describe?
Documentation
A data professional works in a spreadsheet column that can only contain six-digit customer ID numbers. They ensure the data points in the column are always exactly six digits long using which data analytics tool?
Field length
Which of the following statements accurately describe sorting and filtering?
Filtering involves showing data that meets specified criteria while hiding the rest. Sorting can be used to group similar data together by a classification. Data professionals sort data to make it easier to understand, analyze, and visualize.
A data professional in human resources is tasked with identifying appropriate staff members to manage upcoming projects. In the analyze phase of the data analysis process, what activities might this involve?
Format the data to filter for keywords relevant to the upcoming projects; organize an employee dataset by skills and experience
A data team collaborating with the HR department uses the SQL command _____ to add a row for a new employee to their organization's database.
INSERT INTO
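A sketch of adding a new employee row with INSERT INTO, using SQLite through Python's `sqlite3` module. It also runs `DROP TABLE IF EXISTS` first, which does not fail when the table is absent. The table layout and the employee's details are hypothetical, and the `?` placeholders are how `sqlite3` passes values safely into the statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("DROP TABLE IF EXISTS employees")  # no error even though the table does not exist yet
conn.execute(
    "CREATE TABLE employees (employee_id INTEGER PRIMARY KEY, employee_name TEXT, department TEXT)"
)

# Add one row for the new hire.
conn.execute(
    "INSERT INTO employees (employee_id, employee_name, department) VALUES (?, ?, ?)",
    (101, "Jordan Lee", "HR"),
)
conn.commit()

new_row = conn.execute("SELECT * FROM employees WHERE employee_id = 101").fetchone()
print(new_row)  # (101, 'Jordan Lee', 'HR')
```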
Deal with insufficient data
Identify trends within the available data; wait for more data if time allows; discuss with stakeholders and adjust your objective; search for a new dataset
You are working with a database table that contains data about turtles. What SQL clause will return any turtle ages that are less than three digits long from the turtle_age column?
LENGTH(turtle_age) < 3
You are using a database table that includes the column user_password, and you want to make sure all passwords are aligned to company protocols. Which SQL clause will help you confirm that the passwords are 15 characters long?
LENGTH(user_password) = 15
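The LENGTH check can be turned around to flag the rows that break the rule. This sketch uses SQLite via Python's `sqlite3` module; the `users` table and its sample values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user_id INTEGER, user_password TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [(1, "abcdefghijklmno"), (2, "short1")],  # 15 characters vs. 6 characters
)

# Rows whose value is NOT exactly 15 characters long violate the protocol.
violations = conn.execute(
    "SELECT user_id FROM users WHERE LENGTH(user_password) <> 15 ORDER BY user_id"
).fetchall()
print(violations)  # [(2,)]
```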
What SQL clause can be added to a query to ensure only the first 50 results are returned?
LIMIT 50
Data range
Numerical values that fall between predefined maximum and minimum values
A data team begins to investigate possible relationships in a dataset by sorting the data by several fields of interest. What phase of analysis is the data team in?
Organize data
What term describes data points that are very different from similarly collected data and, therefore, might not be reliable values?
Outliers
A data professional runs a query that will return a dataset containing numbers out to five decimal places. Which SQL function will round the values to two decimal places?
ROUND
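A minimal ROUND sketch using SQLite via Python's `sqlite3` module; the `measurements` table and its values are hypothetical. The second argument is the number of decimal places to keep.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE measurements (value REAL)")
conn.executemany("INSERT INTO measurements VALUES (?)", [(3.14159,), (2.71828,)])

# ROUND(value, 2) keeps two decimal places (3.14159 becomes 3.14, 2.71828 becomes 2.72).
rounded = [row[0] for row in conn.execute("SELECT ROUND(value, 2) FROM measurements ORDER BY rowid")]
print(rounded)
```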
Which of the following tasks are involved in the verification process?
Rechecking the data-cleaning effort; considering whether the data is credible and appropriate for the project; manually fixing errors found in the data
What objectives can be achieved by documenting the evolution of a dataset?
Recover from data-cleaning errors; determine the quality of the data; inform other users of changes
A research team conducts an experiment to determine if a new cybersecurity tool is more effective than the previous version. What type of results are required for the experiment to be statistically significant?
Results that are real and not caused by random chance
You're a data analyst for a sports arena who wants to better understand their customers who attend soccer games. At every event, attendees are asked to fill out a survey. The sports arena keeps the responses in the data table CustomerSurveys. Which query should you use to examine only the data from customers who attended soccer games?
SELECT * FROM CustomerSurveys WHERE event = 'soccer';
Which query will return a list of all construction businesses that have made more than $8 million, from the largest number of employees to the fewest?
SELECT * FROM `CompanyData` WHERE Business = 'Construction' AND Revenue > 8000000 ORDER BY number_of_employees DESC
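To check that WHERE with AND and ORDER BY ... DESC behave as described, the query can be run against a few hypothetical rows in an in-memory SQLite database via Python's `sqlite3` module. The sample companies are invented; the column names come from the query above, and the BigQuery-style backticks are dropped here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CompanyData (Business TEXT, Revenue INTEGER, number_of_employees INTEGER)")
conn.executemany(
    "INSERT INTO CompanyData VALUES (?, ?, ?)",
    [
        ("Construction", 9_500_000, 120),
        ("Construction", 12_000_000, 300),
        ("Construction", 5_000_000, 80),   # fails the revenue condition
        ("Retail", 20_000_000, 500),       # fails the business condition
    ],
)

# Both conditions must hold; results run from the most employees to the fewest.
rows = conn.execute(
    """
    SELECT * FROM CompanyData
    WHERE Business = 'Construction' AND Revenue > 8000000
    ORDER BY number_of_employees DESC
    """
).fetchall()
print(rows)  # [('Construction', 12000000, 300), ('Construction', 9500000, 120)]
```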
After a company merger, a data analyst receives a dataset with millions of rows of data. They need to use this data to identify insights for an important project. What tool would be most efficient for the analyst to use?
SQL
You are working with a database table that has columns about ice cream, such as ice_cream_flavor. Which SUBSTR function and AS command will retrieve the first 4 characters of each flavor and store the result in a new column called flavor_ID?
SUBSTR(ice_cream_flavor, 1, 4) AS flavor_ID
You are working with a database table that contains data about cookbooks. What SQL clause will retrieve the first eight letters of each data point in the recipe_name column, then store the result in a new column called recipe_listing?
SUBSTR(recipe_name, 1, 8) AS recipe_listing
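SUBSTR positions start at 1, so `SUBSTR(recipe_name, 1, 8)` returns characters 1 through 8, spaces included. A short sketch with SQLite via Python's `sqlite3` module; the `cookbooks` table and recipe names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cookbooks (recipe_name TEXT)")
conn.executemany("INSERT INTO cookbooks VALUES (?)", [("Pancakes Deluxe",), ("Lemon Tart",)])

# Take 8 characters starting at position 1 and store them in a new column.
rows = conn.execute(
    "SELECT SUBSTR(recipe_name, 1, 8) AS recipe_listing FROM cookbooks ORDER BY rowid"
).fetchall()
print(rows)  # [('Pancakes',), ('Lemon Ta',)]
```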
A candy manufacturer conducts a survey to learn more about its customer base. Although young people are known to purchase a large percentage of its candy, due to age requirements, the survey is only sent to customers who are 18 years or older. What is likely to result?
Sampling bias
A data analyst at a high-tech manufacturer sorts inventory data in a spreadsheet. They sort all data by ranking in the Order Frequency column, keeping together all data across rows. What spreadsheet tool are they using?
Sort Sheet
In SQL, what function can be used to remove leading spaces from a piece of data?
TRIM
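TRIM removes spaces from both ends of a value; LTRIM removes only leading spaces. The sketch below compares the two in SQLite via Python's `sqlite3` module; the `customers` table and its values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_name TEXT)")
conn.executemany("INSERT INTO customers VALUES (?)", [("  Alice",), ("Bob  ",), ("  Cara  ",)])

# TRIM strips both sides; LTRIM strips the left (leading) side only.
rows = conn.execute(
    "SELECT TRIM(customer_name), LTRIM(customer_name) FROM customers ORDER BY rowid"
).fetchall()
print(rows)  # [('Alice', 'Alice'), ('Bob', 'Bob  '), ('Cara', 'Cara  ')]
```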
You're a data analyst working with a team to analyze data. One of your team members shows you a spreadsheet they've sorted, but you notice that one of the columns doesn't seem to be correctly associated with the rest of the dataset. On closer examination, you realize that only that column was sorted, instead of the entire sheet. How can you and your team member sort the entire sheet?
The SORT function; the Sort Sheet option
Data integrity
The accuracy, completeness, consistency, and trustworthiness of data throughout its life cycle
Statistical significance
The determination of whether your result could be due to random chance or not. The greater the significance, the less likely the result is due to chance.
Confidence interval
The range of possible values within which the population's result is expected to fall, at the confidence level of the study
A junior data analyst copies data from one computer to another over their company network. The network connection goes down during the process, which results in an incomplete copy of the data. What data integrity problem does this scenario describe?
Transfer
A data analyst wants to find out how many middle school students in Helsinki have laptops. It is unlikely that they can survey every middle schooler in the city. Instead, they survey enough students to represent all middle schoolers. This describes what data analytics concept?
Using a sample
To filter for all students in the Sophomore table who live in Fairfield County, a data professional uses the _____ clause in SQL.
WHERE
Sample
a subset of the population
Data validation is a tool for checking the _____ of data before adding or importing it
accuracy and quality
Review data integrity
Assess the overall accuracy, consistency, and completeness of the data; connect objectives to data by understanding how your business objectives can be served; know when to stop collecting data
Sampling _____ occurs when some members of a population are overrepresented or underrepresented in the data.
bias
Fill in the blank: Changelogs are files containing _____ ordered lists of modifications made to a project.
chronologically
Data constraints
criteria that determine whether a piece of data is clean and valid
Consistency
degree to which data is repeatable from different points of entry or collection
Accuracy
degree to which the data conforms to the actual entity being measured or described
Completeness
degree to which the data contains all desired components or measures
To update their company name during a rebranding, a data professional uses the spreadsheet tool _____ to search for any instance of "Green Thumb Inc." and change it to "Farmer's Friend."
find and replace
Fill in the blank: When searching for the value in the first argument of the function, VLOOKUP looks in the _____ column of the specified location
leftmost
Hypothesis testing
make and test an educated guess about a problem/solution
proxy data
measurements that allow one to indirectly infer a value such as the temperature or atmospheric conditions in years past
Sort Sheet
menu sort option that sorts all data in a sheet while keeping the data in each row together
size of your sample
No less than 30; the most commonly used confidence level is 95%, but 90% can work in some cases; for a higher confidence level, use a larger sample size; to decrease the margin of error, use a larger sample size; for greater statistical significance, use a larger sample size
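The rules of thumb above follow from the usual sample-size formula for a proportion. This is a sketch under stated assumptions (worst-case proportion 0.5, normal approximation, finite population correction), using only Python's standard library; the population of 10,000 is hypothetical, and the "no less than 30" guideline is a separate rule of thumb that this formula does not produce.

```python
import math
from statistics import NormalDist


def required_sample_size(population_size: int, confidence_level: float = 0.95,
                         margin_of_error: float = 0.05, proportion: float = 0.5) -> int:
    """Smallest sample size for the given confidence level and margin of error."""
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    n0 = z ** 2 * proportion * (1 - proportion) / margin_of_error ** 2  # very large population
    n = n0 / (1 + (n0 - 1) / population_size)  # finite population correction
    return math.ceil(n)


print(required_sample_size(10_000))  # 370
```

Calling it with `confidence_level=0.99` or `margin_of_error=0.03` returns a larger number, matching "use a larger sample size" in both cases.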
4 phases of analysis
organize data, format and adjust data, get input from others, transform data
Data manipulation
process of changing data to make it more organized and easier to read
Data replication
process of storing data in multiple locations
Cross-field validation
process that ensures certain conditions for multiple data fields are satisfied
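Cross-field validation checks rules that involve more than one field at once, so each field can look valid on its own while the record as a whole is not. A minimal Python sketch; the field names and the two rules are hypothetical examples, not rules from the course.

```python
from datetime import date


def cross_field_errors(record: dict) -> list[str]:
    """Return a description of every cross-field rule the record breaks."""
    errors = []
    if record["end_date"] < record["start_date"]:
        errors.append("end_date is before start_date")
    if record["discount_price"] > record["list_price"]:
        errors.append("discount_price exceeds list_price")
    return errors


ok = {"start_date": date(2024, 1, 1), "end_date": date(2024, 1, 31),
      "list_price": 20.0, "discount_price": 15.0}
bad = {"start_date": date(2024, 2, 1), "end_date": date(2024, 1, 31),
       "list_price": 20.0, "discount_price": 25.0}

print(cross_field_errors(ok))   # []
print(cross_field_errors(bad))  # ['end_date is before start_date', 'discount_price exceeds list_price']
```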
If a test is statistically _____, the results are less likely to be due to random chance and more likely to be due to a real difference between the groups being compared
significant
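Data-cleaning pitfalls
Not checking for spelling errors; forgetting to document errors; not checking for misfielded values; overlooking missing values; looking at only a subset of the data instead of the whole picture; losing track of business objectives; not fixing the source of the error; not analyzing the system before cleaning; not backing up data before cleaning; not accounting for data cleaning in deadlines and processes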
After a change to a query is submitted, all team members will be able to access the new query once they _____ the most up-to-date version in the version control system.
sync to
Fill in the blank: When typing a MID function, the correct _____ to follow is =MID(range, reference starting point, number of middle characters)
syntax
Confidence level
the estimated probability that a population parameter lies within a given confidence interval
Margin of error
the range of percentage points in which the sample accurately reflects the population
The goal of analysis is to identify _____ within data in order to accurately answer questions and solve problems.
trends and relationships
A software engineer accesses the source code for a new app in a _____, which allows them to revert to previous versions of the code if a problem is discovered.
version control system
