Data Analytics - Course 5

An analyst notes that the "160" in cell A9 is formatted as text, but it should be Australian dollars. What spreadsheet tool can help them select the right format?

Format as Currency

In a spreadsheet, cell J10 contains the date and time value 2/23/2021 7:00. What is the correct syntax to return only the four-digit time portion of the cell value?

the syntax is =RIGHT(J10, 4)

what is filtering?

Filtering is used when you are only interested in seeing data that meets specific criteria and hiding the rest. Filtering is really useful when you have lots of data. You can save time by zeroing in on the data that is really important or the data that has bugs or errors. Most spreadsheets and SQL databases allow you to filter your data in a variety of ways. Filtering gives you the ability to find what you are looking for without too much effort.

During which of the four phases of analysis do you compare your data to external sources?

get input from others

example of filtering

if you are only interested in finding out who watched movies in October, you could use a filter on the dates so only the records for movies watched in October are displayed. Then, you could check out the names of the people to figure out who watched movies in October.

what is Sort range?

-> When you sort a range, you're selecting a specific collection of cells (the range) that you want the sorting limited to. Nothing else on the spreadsheet gets rearranged but the specified cells. -> It doesn't keep the information across rows together.

Which phase of the data analysis process has the goal of identifying trends and relationships?

analyze

what function can you use if you notice an extra space after using the CONCAT function?

the TRIM function
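To make the TRIM behavior concrete, here is a minimal Python sketch of what the spreadsheet function does: it removes leading and trailing spaces and collapses repeated spaces. The helper name `trim` and the sample text are assumptions for illustration, not part of any spreadsheet API.

```python
def trim(text: str) -> str:
    """Rough stand-in for spreadsheet TRIM: strip outer spaces, collapse repeated spaces."""
    return " ".join(part for part in text.split(" ") if part)


# Hypothetical CONCAT result that picked up stray spaces.
combined = "  Ada   Lovelace "
print(trim(combined))  # Ada Lovelace
```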

You are performing a calculation during your analysis of a dataset. Which phase of analysis are you in?

transform data

A data analyst wants to write a SQL query to combine data from two columns and into a new column. What function can they use?

CONCAT

what is just as important as cleaning data?

-> making sure it's in the right format

CONCAT with + function (usage & example) in sql

-> Adds two or more strings together using the + operator -> 'Google' + '.com'

You are working with a dataset from a local community college. You sort the students alphabetically by last name. This is an example of which phase of analysis?

Format and adjust data

Fill in the blank: A data analyst uses _____ to decide which data is relevant to their analysis and which data types and variables are appropriate.

organization

Analysis

-> process used to make sense of the data collected.

Combine ORDER BY with WHERE clauses

-> Next, modify the query so that it returns the top 10 counties with the highest birth counts for 2018 only. SELECT * FROM `bigquery-public-data.sdoh_cdc_wonder_natality.county_natality` WHERE Year = '2018-01-01' ORDER BY Births DESC LIMIT 10
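The same WHERE + ORDER BY + LIMIT pattern can be tried locally. The sketch below uses Python's built-in sqlite3 module with a tiny made-up table, so the table contents and the `LIMIT 2` are assumptions, and SQLite stands in for BigQuery.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE county_natality (Year TEXT, County TEXT, Births INTEGER)")
conn.executemany(
    "INSERT INTO county_natality VALUES (?, ?, ?)",
    [
        ("2017-01-01", "County A", 900),   # made-up rows for illustration
        ("2018-01-01", "County A", 850),
        ("2018-01-01", "County B", 1200),
        ("2018-01-01", "County C", 400),
    ],
)

# Filter to 2018 first, then sort by births (highest first), then keep the top rows.
top_2018 = conn.execute(
    """
    SELECT County, Births
    FROM county_natality
    WHERE Year = '2018-01-01'
    ORDER BY Births DESC
    LIMIT 2
    """
).fetchall()
print(top_2018)  # [('County B', 1200), ('County A', 850)]
```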

when should you consider using R?

Are you spending more time debugging queries than actually analyzing data? Maybe you should consider R.

what spreadsheet function will allow you to combine data values in separate cells into one cell?

CONCAT function

A data analyst in human resources uses a spreadsheet to keep track of employees' work anniversaries. They add color to any employee who has worked for the company for more than 10 years. Which spreadsheet tool changes how cells appear when values equal 10 or more?

Conditional formatting

You are creating a spreadsheet to help you with your job search. Every time you find an interesting job, you add it to the spreadsheet. Then, you want to indicate two possible options: Need to Apply or Applied. What spreadsheet tool will save you time by enabling you to create a dropdown list with Need to Apply and Applied as the possible options?

Data validation

When working with a spreadsheet, data analysts can use the WHERE function to locate specific characters in a string.

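False. The FIND function locates specific characters in a string; WHERE is a SQL clause used for filtering.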

when would you use a pivot table?

For example, if you're working with a simple spreadsheet, maybe five to ten rows and a few columns, then pivot tables are a great way to visualize that data.

You are querying a database of ice cream flavors to determine which stores are selling the most mint chip. For your project, you only need the first 80 records. What clause should you add to the following SQL query?

LIMIT 80

what is Sort sheet?

all of the data in a spreadsheet is sorted by the conditions of a single column, but the related information across each row stays together.
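The difference between "Sort sheet" and "Sort range" is easy to see in a small Python sketch. The movie titles and years below are made up, and the two lists only imitate what the spreadsheet menu options do.

```python
rows = [
    ("Movie C", 2019),  # hypothetical (title, release year) rows
    ("Movie A", 2021),
    ("Movie B", 2015),
]

# "Sort sheet": whole rows are reordered by the year column, so each title keeps its year.
sort_sheet = sorted(rows, key=lambda row: row[1])

# "Sort range" on the year column only: the years move, the titles stay where they were.
titles = [title for title, _ in rows]
years_sorted = sorted(year for _, year in rows)
sort_range = list(zip(titles, years_sorted))

print(sort_sheet)  # [('Movie B', 2015), ('Movie C', 2019), ('Movie A', 2021)]
print(sort_range)  # [('Movie C', 2015), ('Movie A', 2019), ('Movie B', 2021)]
```

In the second result the titles are now paired with the wrong years, which is why related columns should be sorted together.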

Fill in the blank: A data analyst is working with a spreadsheet that has very long text strings. They use the LEN function to count the number of _____ in the text strings.

characters

how should you first approach a problem?

-> it starts with how you approach a problem mentally. You've learned about different kinds of thinking skills and how to practice them in your data analysis work. From analytical, to mathematical, to structured thinking. This helps build your mental model, or your thought process, and the way you approach a problem. Data analysts use these thinking skills to approach a problem logically and break it into smaller parts.

what is the bottom line when it comes to data organization?

-> it's important to have your data in the right format. So always be prepared to adjust, no matter how far into your analysis you are

how to effectively search for answers to your problems online

-> it's important to use the right terms when searching for solutions. Knowing how to frame data analytics questions with the same language other analysts are using will help you get more search results, and it'll help you understand what other analysts are saying -> On top of being able to use the right terms to search online, you also need to be familiar with basic tools. That way, when an online resource is walking you through a new function and a tool that you've used before, you'll know how those tools work.

what is just as important as having the right tools in your toolkit?

-> knowing when to use those tools

what can incorrectly formatted data do?

-> lead to mistakes -> take time to fix -> affect stakeholders' decision-making

filtering in sql (w/ example)

-> let's say we want to filter movies by genre so that we're only working with comedies. But we still want release dates to be sorted in descending order, from most recent to oldest films. We can filter with the WHERE clause and sort with ORDER BY ... DESC. -> SELECT * -> FROM movie_data.movies -> WHERE Genre = "Comedy" -> ORDER BY Release_Date DESC

one of the biggest attributes that students should keep in mind

-> most important thing that they need to have throughout this learning journey is grit

what clause can you use to filter data in SQL?

-> the WHERE clause

what is data validation usually used for?

-> to add drop-down lists to cells with predetermined options for users to choose from -> to create custom checkboxes -> to protect structured data and formulas

what is the goal of analysis?

-> to identify trends and relationships within the data so that you can accurately answer the question you're asking.

where do you place the ORDER BY clause in your query?

-> usually the last clause in your query

how to filter data in BigQuery (w/ example)

-> we are going to filter the movies table to only show comedies -> SELECT -> * -> FROM -> movie_data.movies -> WHERE -> genre = 'Comedy'

what is sorting?

-> when you arrange data into a meaningful order to make it easier to understand, analyze, and visualize. It ranks your data based on a specific metric you choose. You can sort data in spreadsheets, SQL databases (when your dataset is too large for spreadsheets), and tables in documents.

what if you want to put a space in between the merged values?

-> you need to use the full CONCATENATE function, which allows you to combine multiple strings. -> =CONCATENATE(A2," ",B2)

What function can be used to return the number of characters in cell B8 so you can confirm that it contains exactly 20 characters?

=LEN(B8)

Spreadsheet cell F2 contains the text string "Dashboard". To return the substring "board", what is the correct syntax?

=RIGHT(F2, 5)
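For readers who think in code, RIGHT and LEN map closely onto Python string slicing and `len`. The helper `right` below is a hypothetical name written for this sketch, and the cell values are passed in as plain text.

```python
def right(text: str, num_chars: int) -> str:
    """Rough stand-in for spreadsheet RIGHT: return the last num_chars characters."""
    return text[-num_chars:] if num_chars > 0 else ""


print(right("Dashboard", 5))       # board
print(right("2/23/2021 7:00", 4))  # 7:00
print(len("Dashboard"))            # 9, the same count =LEN would report
```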

Fill in the blank: In SQL, _____ can be used to combine strings from multiple tables in order to create a new string.

CONCAT

An analyst uses =SORT to sort spreadsheet data in descending order. What do they type at the end of their sort function?

false

sorting in SQL using the ORDER BY clause (w/ example)

*sorting movies by release date* -> SELECT * -> FROM movie_data.movies -> ORDER BY Release_Date

Explore Stack Overflow (how to ask q's)

-> From the home page, click the dropdown in the upper left corner and click Questions. -> The Questions page provides different categories of questions for you to choose. Some examples include the "Newest" and "Active" categories. Read some of the questions under the different categories. Tags will help you find questions. On the left pane, click on Tags. -> For example, if you want to only find questions that have the tag "SQL," then type [SQL] in the search field, along with your keywords or question. See the example below. -> [SQL] How to use CONCAT

what do you do in the analyze phase?

-> The analyze stage is where you become the expert about your dataset. Here, you're going to understand all of the different fields. You're going to understand their averages, potentially the median of the data. You're going to understand how different rows in your data differ from each other. And it's where you're going to gain the confidence to be able to explain your findings to an audience that maybe does not have the same level of expertise with data that you have

how do you change the order of the ORDER BY clause

-> ORDER BY Release_Date DESC -> now the most recently released films are at the top

What is R? and what is used for?

-> R is another programming language, but it's not a database language like SQL. It's a programming language frequently used for statistical analysis, visualization, and other data analysis.

what is stack overflow?

-> an online platform where programmers ask code-related questions and peers are available to suggest answers. You can ask questions about programming languages such as SQL and R (which you will learn about in Course 7), data tools, and much more.

by default, what order is data put in when using ORDER BY?

-> ascending order -> oldest to newest (in the case of release date)

To narrow the scope of a query, a data analyst filters by a particular criteria. This type of filtering must be done one variable at a time. (t/f)

-> false -> filtering can be done by single variable or multiple variables, depending on the query's needs.

what can be a good idea if you're stuck on a task?

-> it can be a good idea to take a step back and reconsider how you're approaching a task.

A spreadsheet cell contains the coldest temperature ever recorded in New Zealand: -22 °Celsius. What function will display that temperature in Fahrenheit?

=CONVERT(-22, "C", "F") will display -22 °C in Fahrenheit

To convert the temperature in cell B2 in a Google spreadsheet from degrees Celsius to degrees Fahrenheit, what is the correct syntax for the CONVERT function?

=CONVERT(B2, "C", "F")
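The arithmetic behind the Celsius/Fahrenheit conversion is F = C × 9/5 + 32. Here is a small Python sketch of it; the function names are made up for illustration and are not part of any spreadsheet API.

```python
def celsius_to_fahrenheit(celsius: float) -> float:
    """Same arithmetic as =CONVERT(value, "C", "F")."""
    return celsius * 9 / 5 + 32


def fahrenheit_to_celsius(fahrenheit: float) -> float:
    """Same arithmetic as =CONVERT(value, "F", "C")."""
    return (fahrenheit - 32) * 5 / 9


# -22 °C works out to about -7.6 °F; round to hide floating-point noise.
print(round(celsius_to_fahrenheit(-22), 1))
print(fahrenheit_to_celsius(212))
```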

You ask volunteers at a theater production which tasks they have already completed and add that data to a spreadsheet containing all required tasks. You will use the information provided by the volunteers to figure out which tasks still need to be done. This is an example of which phase of analysis?

Get input from others

A data analyst clicks on the "Format cells if" drop-down menu, selects the option "Text is exactly," and enters November. This changes the color of all the cells that contain the word November. What spreadsheet tool is the analyst using?

The data analyst is using conditional formatting. Conditional formatting is a spreadsheet tool that changes how cells appear when values meet specific conditions.

A data analyst wants to ensure spreadsheet formulas continue to run correctly, even if someone enters the wrong data by mistake. Which data-validation menu option should they select to flag data entry errors?

To ensure spreadsheet formulas continue to run correctly, even if someone enters the wrong data by mistake, select Reject Invalid Inputs to flag that data as invalid.

how do you identify trends and relationships within the data?

-> To do this, you should stick to the 4 phases of analysis: organize data, format and adjust data, get input from others, and transform data by observing relationships between data points and making calculations.

what are the two methods for sorting data in spreadsheet?

-> one involves using the menu; -> the other involves writing out the sort function.

what is really important throughout the analysis phase?

-> organization ---> How your data is classified and structured will impact your findings

what are the 4 phases of analysis?

-> organize data -> format and adjust data -> get input from others -> transform data by observing relationships between data points and making calculations.

sorting movies by menu (w/ example)

*sorting movies by release date* -> click on the letter of the column to highlight all the cells -> head to the data tab in the menu -> Now you have two choices: sort a sheet or a range of data. You'll notice that we've selected just the release dates, but these release dates are specifically related to the movies in their row. -> In this case, you want the release date and the movie title to stay in the same row as you sort because they're related. To do this, you'll want to "Sort sheet." This will keep all the data together by row, no matter how you sort it. Depending on the order you want the release dates to be in, you can sort from A to Z, which will also rank the dates numerically. Or you can sort from Z to A, which will sort data the opposite way. Since we want the release dates to be in order, we'll click "Sort sheet by column B" from A to Z.

what is a customized sort order?

-> A customized sort order is when you sort data in a spreadsheet using multiple conditions. This means that sorting will be based on the order of the conditions you select

CONCAT function (usage & example) in sql

-> A function that adds strings together to create new text strings that can be used as unique keys -> CONCAT ('Google', '.com')

CONCAT_WS function (usage & example) in sql

-> A function that adds two or more strings together with a separator -> CONCAT_WS('.', 'www', 'google', 'com') -> *The separator (here, the period) is inserted between the strings, so it appears before and after google when you run the SQL function, giving www.google.com
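As a rough illustration of how CONCAT and CONCAT_WS differ, the Python sketch below imitates both, assuming a plain period as the separator. The helper names are hypothetical, and NULL handling in real SQL engines is not modeled.

```python
def concat(*parts: str) -> str:
    """Imitates SQL CONCAT: join strings with nothing in between."""
    return "".join(parts)


def concat_ws(separator: str, *parts: str) -> str:
    """Imitates SQL CONCAT_WS: join strings, placing the separator between each pair."""
    return separator.join(parts)


print(concat("Google", ".com"))                # Google.com
print(concat_ws(".", "www", "google", "com"))  # www.google.com
```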

another example of formatting (Fahrenheit to Celsius)

-> All you need to do is use the CONVERT function to change the unit of measurement. -> We'll use this empty column here. (a new column) -> =CONVERT(B2, "F", "C") ---> the B2 specifies the cell you'd like to convert -> then double tap the bottom right corner to convert all cells down the column

During which analysis phase does an analyst sort and filter data?

-> An analyst sorts and filters data during the format and adjust analysis phase.

the CAST function syntax

-> CAST (expression AS typename) -> Where expression is the data to be converted and typename is the data type to be returned.

Converting a date to a datetime in sql

-> Datetime values have the format YYYY-MM-DD hh:mm:ss, so date and time are retained together. The following CAST statement returns a datetime value from a date. -> SELECT CAST(MyDate AS DATETIME) FROM MyTable -> In the above SQL statement, the following occurs: -> SELECT indicates that you will be selecting data from a table -> CAST indicates that you will be converting the data you select to a different data type -> AS comes before and identifies the data type which you are casting to -> DATETIME indicates that you are converting the data to a datetime value -> FROM indicates which table you are selecting the data from

explanation of the sql query from the CONCAT function example

-> First, we'll input SELECT usertype to let SQL know that we want the user type as a column. Then we'll use CONCAT to combine the names of the beginning and ending stations for each trip in a new column. This will create one column based on the routes people take. We also need to input a title for this new column. We'll type in AS route to name the route column built from the beginning and ending station names we combined with CONCAT. -> After that, we want SQL to count the number of trips, so we'll input COUNT to do that. We can use an asterisk to tell it to count up the number of rows in the data we're selecting. In this case, each row represents a trip, which is why we can just count all of the rows we've selected. We'll name this output num_trips.

when would it be appropriate to save the results of a query into a new table?

-> This is a useful skill when the original data source changes continuously and you need to preserve a specific dataset for continued analysis. It's also valuable when you are dealing with a large dataset and know you'll be doing more than one analysis using the same subset of data.

CONCAT function example

-> For example, the bike-sharing company has two different kinds of customers: one-time paying customers and subscribers. Let's say we want to find out what routes are most popular with different user types. -> SELECT usertype, CONCAT(start_station_name, " to ", end_station_name) AS route, COUNT(*) AS num_trips, ROUND(AVG(CAST(tripduration AS INT64)/60), 2) AS duration -> FROM `bigquery-public-data.new_york.citibike_trips` -> GROUP BY start_station_name, end_station_name, usertype -> ORDER BY num_trips DESC -> LIMIT 10
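To experiment with this kind of query without BigQuery, here is a small sketch using Python's built-in sqlite3 module. The `trips` table and its three rows are made up. SQLite differs from BigQuery in a few ways that the sketch works around: it joins strings with `||` here instead of CONCAT, uses INTEGER instead of INT64, and needs `60.0` so the division is not truncated to a whole number.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE trips (usertype TEXT, start_station_name TEXT, "
    "end_station_name TEXT, tripduration TEXT)"
)
conn.executemany(
    "INSERT INTO trips VALUES (?, ?, ?, ?)",
    [
        ("Subscriber", "Main St", "Park Ave", "600"),   # durations in seconds, stored as text
        ("Subscriber", "Main St", "Park Ave", "900"),
        ("Customer", "Main St", "Park Ave", "1200"),
    ],
)

routes = conn.execute(
    """
    SELECT
      usertype,
      start_station_name || ' to ' || end_station_name AS route,
      COUNT(*) AS num_trips,
      ROUND(AVG(CAST(tripduration AS INTEGER) / 60.0), 2) AS duration
    FROM trips
    GROUP BY start_station_name, end_station_name, usertype
    ORDER BY num_trips DESC
    LIMIT 10
    """
).fetchall()

for row in routes:
    print(row)
```

With the sample rows, the Subscriber route appears first with 2 trips averaging 12.5 minutes, followed by the Customer route with 1 trip of 20.0 minutes.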

what is an example of when it would be appropriate to use data validation? (in the case of drop-down lists)

-> If you have a spreadsheet with a lot of collaborators, this can make it easier for them to interact with your table. You can think of it like a multiple choice question on a quiz. Since you control what's being entered into the worksheet, it cuts down on how much data cleaning you have to do later on.

how to get started with BigQuery

-> If you have never created a BigQuery project before, click CREATE PROJECT on the right side of the screen. If you have created a project before, you can use an existing one or create a new one by clicking the project dropdown in the blue header bar and selecting NEW PROJECT. -> Name your project something that will help you identify it later. You can give it a unique project ID or use an auto-generated one. Don't worry about selecting an organization if you don't know what to put. -> Click + ADD DATA at the top of the Explorer menu, then Explore public datasets from the resulting dropdown.

customized sort order example

-> Imagine you want the guests to be sorted by whether or not they've been sent an invitation. And based on that, we want those guest names to be listed alphabetically. You can do that easily with the "Sort range" option under Data. -> First, highlight all the data in the set from cells A1 to D6. Then under the Data tab in the menu, click "Sort range," then "Advanced range sorting options." -> In this case, check "Data has a header row," which makes sure that the title of the column isn't mixed into the sorting. Then, we'll make sure it's being sorted by "Sent invitation." -> Because we want to add an additional sorting condition, we'll now click on "Add another sort column." The guest names should be in alphabetical order. So let's select "Guest Names" and sort from A to Z.
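A customized sort order like this one corresponds to sorting on more than one key. The Python sketch below uses a made-up guest list and sorts first by invitation status, then by name; the field names are assumptions for illustration.

```python
guests = [
    {"name": "Priya", "sent_invitation": "Yes"},  # hypothetical guest rows
    {"name": "Omar", "sent_invitation": "No"},
    {"name": "Alice", "sent_invitation": "Yes"},
    {"name": "Ben", "sent_invitation": "No"},
]

# The tuple key applies the conditions in order: invitation status first, then name A to Z.
ordered = sorted(guests, key=lambda guest: (guest["sent_invitation"], guest["name"]))
names = [guest["name"] for guest in ordered]
print(names)  # ['Ben', 'Omar', 'Alice', 'Priya']
```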

4 phases of analysis to a real-world scenario

-> Imagine you want to buy a gift for your friend Zara's wedding. The problem is you're not sure what to get her. Fortunately, you have a ton of data from her wedding website. But instead of reading all the data on her website and scrolling through a photo album of her and her partner, you go straight to the online registry, a wish list of gifts they'd enjoy. The registry is like a dataset that you can analyze to make a decision. Now that you're checking out organized data in the registry, you want to make sure that the list of data, or gifts in this case, is formatted in a way that's easy to reference. Formatting data streamlines things and saves you time. Scrolling through hundreds of gifts can be time-consuming. Instead, you can adjust the data in a way that makes it easy to digest by filtering and sorting your data. You have a budget you want to stick to, so you sort the gift prices from low to high. You then filter prices to include gifts that are within your budget of $60. You're working with a newly formatted list of data. At this point, it's good to remember that input from other people can also be really helpful when analyzing information and making decisions. You can check the list of gifts to figure out if anyone else has already bought any of the items. You realize a few of the items in the list have been purchased, and this informs your decision. When analyzing data, gaining input from others is important because it gives you a viewpoint you might not understand or have access to.

how to save a new table

-> In order to make this subset of data easier to query from, you can save the table from the weather data into a new dataset. -> 1. From your Explorer pane, click the three vertical dots next to your project and select Create dataset. You can name this dataset demos and leave the rest of the default options. Click CREATE DATASET. -> 2. Open your new dataset and select COMPOSE NEW QUERY. Input the following query to get the average temperature, wind speed, visibility, wind gust, precipitation, and snow depth for the La Guardia and JFK stations for every day in 2020, in descending order by date and ascending order by Station ID: *paste the previous query* -> 3. Before you run the query, select the MORE menu from the Query Editor and open the Query Settings menu. In the Query Settings menu, select Set a destination table for query results. Set the dataset option to demos and name the table nyc_weather. -> 4. Run the query from earlier; now it will save as a new table in your demos dataset. -> 5. Return to the Query Settings menu by using the MORE dropdown menu. Reset the settings to Save query results in a temporary table. This will prevent you from accidentally adding every query as a table to your new dataset.

Conditional formatting can be used for which spreadsheet tasks?

-> It can be used to change a cell's color in order to highlight it.

sorting in a pivot table

-> Items in the row and column areas of a pivot table are sorted in ascending order by any custom list first. For example, if your list contains days of the week, the pivot table allows weekday and month names to sort like this: Monday, Tuesday, Wednesday, etc. rather than alphabetically like this: Friday, Monday, Saturday, etc. -> If the items aren't in a custom list, they will be sorted in ascending order by default. But, if you sort in descending order, you are setting up a rule that controls how the field is sorted even after new data fields are added.
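The effect of a custom list on sort order can be sketched in Python: sorting by position in a reference list instead of alphabetically. The `WEEKDAYS` list and sample values are assumptions for illustration.

```python
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

days = ["Friday", "Monday", "Saturday", "Tuesday"]

alphabetical = sorted(days)                    # plain A-to-Z order
custom_order = sorted(days, key=WEEKDAYS.index)  # order defined by the custom list

print(alphabetical)  # ['Friday', 'Monday', 'Saturday', 'Tuesday']
print(custom_order)  # ['Monday', 'Tuesday', 'Friday', 'Saturday']
```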

example of using data validation (drop-down)

-> Let's say our team has a spreadsheet that tracks everyone's progress. But instead of making everyone write in where they are in their task individually, we can provide a drop-down menu with multiple options, like "Not Yet Started," "In Progress," and "Ready." -> first, highlight the column you'd like to work with -> then select the Data pull-down menu at the top and click "Data validation." -> select the "List of items" option from the possible criteria and type in the selections we want to create. Then hit Save, and now all of those cells have drop-down menus that we can use to easily mark progress for each task.

what are examples of format errors?

-> Let's say you wanted to sort the movies in this spreadsheet by most recent. If the spreadsheet cast the release dates as strings instead of dates, it might sort them alphabetically. -> It's also possible that your datasets contain inconsistent units of measurement that you'll need to convert, such as a table that includes both US dollars and British pounds.

A data analyst wants to add a spreadsheet dropdown list with three options: Draft, Edit, and Final. Which option from the Data Validation menu should they select?

-> List of Items

explanation of above sql query (cont.)

-> Now let's also get the average trip duration for each route. In this case, we don't need the exact average, so we can use the ROUND function to round it. We'll put that first and then, in the parentheses, use AVG to get the average trip duration. We'll also want this data to be in integer form for this calculation, so we'll input CAST(tripduration AS INT64). BigQuery stores integers as 64-bit values, which is why there's a 64 after INT in this case. -> This query divides by 60, which is the number of seconds in a minute. Dividing by 60 returns the output "duration" in minutes instead of seconds. -> Then we tell it how far we want it to round: two decimal places. We'll name this output duration. We'll also need to tell SQL where this information is stored, using FROM. -> Since we're using the COUNT and AVG functions in our SELECT clause, we have to use GROUP BY to group together summary rows. Let's group by the start station, the end station, and the user type for this query. Finally, we'll use ORDER BY to tell it how we want to organize this data.

combining data from three cells using CONCATENATE

-> Now, combine the month, day, and year into a single data value: Date -> =CONCATENATE(C2," ",D2,", ",E2) ---> this will give you April 30, 1789 -> notice that the comma and the space after it both sit inside the quotation marks

Hands-On Activity: SQL sorting queries (births data) continued...

-> Now, modify the query to sort in the other direction, returning the top 10 counties with the highest yearly birth counts between 2016-2018. -> SELECT * FROM `bigquery-public-data.sdoh_cdc_wonder_natality.county_natality` ORDER BY Births DESC LIMIT 10

how to combine both data validation and conditional formatting (w/ example)

-> Right now we don't have a visual on how many tasks are in progress or how many upcoming deadlines there are. But if we color-coded those elements of the table, we could quickly see key pieces of data really easily. Let's start with the Status column, column C. In the last example, we created these drop-down menus with the data validation tool. Now we can use conditional formatting to add some color. Let's go to the conditional formatting option under the Format menu. -> This brings up a sidebar where we can select our range, rule, and formatting style. -> we can choose "Format cells if... Text is exactly" from the rules. For our first rule, let's write "Not Yet Started" as the text condition. Then we'll choose a color to apply to those cells that have "Not Yet Started" in them.

filtering for two conditions at once using the AND filter in sql

-> SELECT * -> FROM movie_data.movies -> WHERE Genre = "Comedy" -> AND Revenue > 300000000 -> ORDER BY Release_Date
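Here is a runnable sketch of a two-condition filter using Python's built-in sqlite3 module. The `movies` table and its rows are made up, and SQLite is standing in for BigQuery, so string literals use single quotes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movies (Title TEXT, Genre TEXT, Release_Date TEXT, Revenue INTEGER)")
conn.executemany(
    "INSERT INTO movies VALUES (?, ?, ?, ?)",
    [
        ("Movie A", "Comedy", "2019-05-01", 350000000),  # made-up rows
        ("Movie B", "Comedy", "2015-03-10", 100000000),
        ("Movie C", "Drama", "2020-01-01", 500000000),
        ("Movie D", "Comedy", "2012-07-04", 400000000),
    ],
)

# Both conditions must be true for a row to be returned.
big_comedies = conn.execute(
    """
    SELECT Title
    FROM movies
    WHERE Genre = 'Comedy'
      AND Revenue > 300000000
    ORDER BY Release_Date
    """
).fetchall()
titles = [row[0] for row in big_comedies]
print(titles)  # ['Movie D', 'Movie A']
```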

what can you do to avoid having to rewrite similar, but slightly different, queries over and over again?

-> Save the results from the original query as a table for future queries.
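One way to picture saving query results as a table is the sketch below, which uses Python's built-in sqlite3 module and a `CREATE TABLE ... AS SELECT` statement. The table names and rows are made up, and this is SQLite syntax used for illustration, not the BigQuery console workflow.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE weather (station TEXT, day TEXT, temp REAL)")
conn.executemany(
    "INSERT INTO weather VALUES (?, ?, ?)",
    [
        ("JFK", "2020-01-01", 40.5),  # made-up readings
        ("JFK", "2019-12-31", 38.0),
        ("LGA", "2020-01-01", 41.0),
    ],
)

# Save the 2020 subset once, so later queries can read it without repeating the filter.
conn.execute(
    """
    CREATE TABLE nyc_weather_2020 AS
    SELECT station, day, temp
    FROM weather
    WHERE day >= '2020-01-01' AND day < '2021-01-01'
    """
)

saved_rows = conn.execute("SELECT COUNT(*) FROM nyc_weather_2020").fetchone()[0]
print(saved_rows)  # 2
```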

what are two ways to format and adjust data when performing an analysis?

-> Sorting and filtering are two ways you can keep things organized when you format and adjust data to work with it. -> For example, a filter can help you find errors or outliers so you can fix or flag them before your analysis. Outliers are data points that are very different from similarly collected data and might not be reliable values. The benefit of filtering the data is that after you fix errors or identify outliers, you can remove the filter and return the data to its original organization.

why use tables to organize your data?

-> Tables help you organize similar kinds of data into categories and subject areas that you can focus on as you analyze. -> Tables allow you to make decisions about data types. They help you to figure out what variables you need and the data type those variables should have. So if you have a database where you need to convert a data type during your analysis, you can do that by using the CAST command in SQL or any other method that you learn on the job or from your own research.

what is the analyze stage like?

-> The analyze stage is like preparing a fabulous meal. You have done all the cleaning and the preparing and the cooking, and you're finally able to take a bite and to see if what you're originally hoping to happen or what you were expecting, to see if that is really the case. Is it delicious? Is it exactly like you expected? Or is the consistency a little off and you need to add a little bit more salt? The analysis stage begins once you've prepped and cleaned your data.

Converting a date to a string in sql

-> The following CAST statement returns a string from a date identified by the variable MyDate in the table called MyTable. -> SELECT CAST(MyDate AS STRING) FROM MyTable -> In the above SQL statement, the following occurs: -> SELECT indicates that you will be selecting data from a table -> CAST indicates that you will be converting the data you select to a different data type -> AS comes before and identifies the data type which you are casting to -> STRING indicates that you are converting the data to a string -> FROM indicates which table you are selecting the data from

Converting a number to a string in sql

-> The following CAST statement returns a string from a numeric identified by the variable MyCount in the table called MyTable. -> SELECT CAST(MyCount AS STRING) FROM MyTable -> In the above SQL statement, the following occurs: -> SELECT indicates that you will be selecting data from a table -> CAST indicates that you will be converting the data you select to a different data type -> AS comes before and identifies the data type which you are casting to -> STRING indicates that you are converting the data to a string -> FROM indicates which table you are selecting the data from

Converting a string to a number in sql

-> The following CAST statement returns an integer from a string identified by the variable MyVarcharCol in the table called MyTable. (An integer is any whole number.) -> SELECT CAST(MyVarcharCol AS INT) FROM MyTable -> In the above SQL statement, the following occurs: -> SELECT indicates that you will be selecting data from a table -> CAST indicates that you will be converting the data you select to a different data type -> AS comes before and identifies the data type which you are casting to -> INT indicates that you are converting the data to an integer -> FROM indicates which table you are selecting the data from
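CAST can be tried out quickly with Python's built-in sqlite3 module. Note the assumption: SQLite's type names are INTEGER and TEXT, not the INT64 and STRING names used in BigQuery, so this sketch only shows the general idea of converting between strings and numbers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

as_number, as_text, number_type = conn.execute(
    """
    SELECT
      CAST('42' AS INTEGER),          -- string to number
      CAST(42 AS TEXT),               -- number to string
      typeof(CAST('42' AS INTEGER))   -- confirm the resulting type
    """
).fetchone()

print(as_number, repr(as_text), number_type)  # 42 '42' integer
```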

why would you use data validation to protect structured data and formulas?

-> The more people that are working together in a spreadsheet, the more likely someone can accidentally break a formula. -> the data validation menu has an option to reject invalid inputs, which helps make sure our custom tools will continue to run correctly, even if someone puts the wrong data in by mistake.
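The idea behind rejecting invalid inputs can be sketched in a few lines of Python: a value is accepted only if it appears in a predetermined list. The status options and the function name are assumptions for illustration, not a spreadsheet API.

```python
ALLOWED_STATUSES = {"Not Yet Started", "In Progress", "Ready"}  # hypothetical drop-down options


def validate_status(value: str) -> str:
    """Accept a value only if it is one of the allowed options; otherwise reject it."""
    if value not in ALLOWED_STATUSES:
        raise ValueError(f"Invalid input: {value!r}")
    return value


print(validate_status("In Progress"))  # In Progress
```

Because bad values never make it into the column, formulas that depend on those values keep working.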

last phase of analysis from previous example (cont.)

-> Transforming data means identifying relationships and patterns between the data, and making calculations based on the data you have. Going back to our example, you were able to find a gift that you knew Zara would like, and one that fits your budget. You were also able to choose a gift that wasn't already purchased by someone else. By finding the relationship between these data points, you chose, purchased, and sent a gift that would answer the problem you wanted to solve

You are working on an international project and need to invoice your customers for the work you complete. The database you use contains an invoices table. The invoices table contains the following columns: InvoiceId, CustomerId, InvoiceDate, BillingAddress, BillingCity, BillingState, BillingCountry, BillingPostalCode, Total. Create a query to return all the columns from this table for only customers in Germany who have an invoice total greater than $5.

-> Twelve rows are returned by the following query: SELECT * FROM invoices WHERE BillingCountry = 'Germany' AND Total > 5. The AND operator lets you write a query with more than one condition. This means the query returns the 12 invoices that were billed to customers in Germany and that total more than $5.
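A small sketch of the same WHERE ... AND filter, using Python's sqlite3 module. The four rows below are invented sample data with only three of the columns, so the result has two rows rather than the twelve returned by the course database.

```python
import sqlite3

# Hypothetical, trimmed-down invoices table with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE invoices (InvoiceId INTEGER, BillingCountry TEXT, Total REAL)"
)
conn.executemany(
    "INSERT INTO invoices VALUES (?, ?, ?)",
    [
        (1, "Germany", 1.98),   # Germany, but Total is not greater than 5
        (2, "Germany", 13.86),  # matches both conditions
        (3, "Norway", 8.91),    # Total is greater than 5, but wrong country
        (4, "Germany", 5.94),   # matches both conditions
    ],
)

# Both conditions must be true for a row to be returned.
german_invoices = conn.execute(
    "SELECT * FROM invoices WHERE BillingCountry = 'Germany' AND Total > 5"
).fetchall()
print(german_invoices)

conn.close()
```

Only invoices 2 and 4 satisfy both conditions at once.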

The SAFE_CAST function in SQL

-> In BigQuery, using the CAST function on a value that can't be converted returns an error. To avoid errors when a conversion fails, use the SAFE_CAST function instead. SAFE_CAST returns NULL instead of an error when the cast fails. -> The syntax for SAFE_CAST is the same as for CAST. Simply substitute the function directly in your queries. The following SAFE_CAST statement returns a string from a date. -> SELECT SAFE_CAST(MyDate AS STRING) FROM MyTable
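SAFE_CAST is specific to BigQuery, so the sketch below is only a Python analogy of the idea: attempt the conversion and return None (standing in for NULL) when it fails. The helper name safe_cast_int is hypothetical and is not part of any library.

```python
def safe_cast_int(value):
    """Return value converted to int, or None if the conversion fails."""
    try:
        return int(value)
    except (TypeError, ValueError):
        # A failed conversion yields None instead of raising an error,
        # which is the behavior SAFE_CAST provides with NULL.
        return None


raw_values = ["160", "abc", None, "7"]
converted = [safe_cast_int(v) for v in raw_values]
print(converted)  # [160, None, None, 7]
```

A plain int("abc") would stop the program with an error, which is the behavior the card attributes to CAST.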

how to combine both data validation and conditional formatting (example 2)

-> We can also combine data validation and conditional formatting to track upcoming deadlines. We have a column of dates called "Review By This Date." First, let's use the data validation functionality to make sure users only enter valid dates. We'll go back to the Data dropdown at the top, pull up Data validation, and select Date as our criteria. Also select "Reject input." -> Then we can go to the Format menu at the top. Go down to conditional formatting and open the sidebar again. We'll click the "Select range" icon and select the "Review By This Date" column. Now under Format rules, we can select "Date is after," which will give us another option. Let's choose "today." Then choose a color. -> If the date listed in these rows is after today, the cell will be filled in orange. -> You can also choose a specific locked date if needed. But for now, let's go with today.

how to lock in the data when using a formula

-> We'll copy the values and then right click in a new column. There's an option for "Paste special." And there's an option to "Paste values only." And now we have the static values in this column.

example of using data validation (checkbox)

-> We'll go back to the data validation menu. But instead of choosing "List from a range," we'll choose "Checkbox." There's an option to use custom cell values. Let's choose that and put in "Approved" and "Not approved." Now these tasks can be checked off by whoever's reviewing them, like a project manager, for example.

what is conditional formatting?

-> a spreadsheet tool that changes how cells appear when values meet specific conditions. -> This lets you add visual cues to your spreadsheets that make it easier to understand your table at a glance, and it makes the information in the worksheet clearer to your stakeholders

what is the CONCAT function used for in SQL?

-> allows you to join multiple text strings from multiple sources -> use CONCAT to combine strings from multiple tables to create new strings.

when should you convert and format your data?

-> as early as you can

strings in spreadsheets (example)

-> Building a simple formula to separate the dates from the timestamps in one column of bike data ---> the columns are "starttime" and "stoptime" -> For example, if we'd like to find the average time between start times, we don't need the dates there. -> =LEN(C2) ---> shows how many characters are in the cell -> =FIND(" ", C3) ---> shows where the date on the left side of the cell ends (it looks for the space between the date and the time and returns 11, the position of the space) ---> therefore the timestamp substring starts at character 12. -> We can use the LEFT and RIGHT functions to select which parts of the string we want to isolate in a new column. We'll use RIGHT on one of these cells to indicate that we want to grab the right side. -> =RIGHT(C2, 8) ---> C2 is the cell and 8 is the number of characters in the portion we'd like to isolate -> The LEFT function works the same way.
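The same steps can be sketched with Python string operations. The sample value is made up; it was chosen so that the space sits at position 11, as in the card above. Python counts positions from 0, so 1 is added to match the 1-based result of FIND.

```python
# Hypothetical cell value: a 10-character date, a space, and an 8-character time.
start_time = "2019-10-01 00:00:01"

length = len(start_time)                   # like =LEN(C2)
space_position = start_time.find(" ") + 1  # like =FIND(" ", C2), 1-based
time_part = start_time[-8:]                # like =RIGHT(C2, 8)
date_part = start_time[:space_position - 1]  # like =LEFT(C2, 10)

print(length)          # 19
print(space_position)  # 11
print(time_part)       # 00:00:01
print(date_part)       # 2019-10-01
```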

CONCAT example

-> combine the two sets of names in columns First Name and Last Name in a new column called Full Name. -> Click on cell F2. This is where you start the data for the new column. After you click on the cell, type =CONCAT(A2,B2) into the function bar and hit Return (Mac). -> Notice that the two names were combined without a space between them.
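A short Python sketch of why the combined names run together and one way to add the missing space. The names are placeholders, and plain string concatenation stands in for the spreadsheet CONCAT function.

```python
# Placeholder values standing in for cells A2 and B2.
first_name = "Ada"
last_name = "Lovelace"

# Joining the two strings directly, as =CONCAT(A2,B2) does, adds no separator.
joined = first_name + last_name

# Adding an explicit space between the parts gives a readable full name.
full_name = first_name + " " + last_name

print(joined)     # AdaLovelace
print(full_name)  # Ada Lovelace
```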

example of formatting data (numbers into currency)

-> for example if there's a column or two that has numbers but not the currency -> highlight the column(s) you want to format -> go to format, currency -> or the currency shortcut, which is displayed as $

what is a tip when adding data to tables using a formula

-> go back and paste the data in as values afterwards -> That way they're locked in. Otherwise the cell stays as a formula and could get confusing when you start working with the data.

example of sorting

-> if you need to rank things or create chronological lists, you can sort by ascending or descending order. If you are interested in figuring out a group's favorite movies, you might sort by movie title to figure it out. Sorting will arrange the data in a meaningful way and give you immediate insights. Sorting also helps you to group similar data together by a classification. For movies, you could sort by genre -- like action, drama, sci-fi, or romance.

Hands-On Activity: SQL sorting queries (births data)

-> imagine you were asked by your manager to figure out which 10 counties had the lowest birth count for 2016-2018. You could accomplish this by modifying your query to use the ORDER BY clause. SELECT * FROM `bigquery-public-data.sdoh_cdc_wonder_natality.county_natality` ORDER BY Births LIMIT 10
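The ORDER BY and LIMIT pattern above can be tried locally with Python's sqlite3 module. The table name, county names, and birth counts below are invented for illustration and are not taken from the BigQuery public dataset; LIMIT 2 is used instead of LIMIT 10 to keep the sample small.

```python
import sqlite3

# Hypothetical stand-in for the county natality table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE county_natality (County TEXT, Births INTEGER)")
conn.executemany(
    "INSERT INTO county_natality VALUES (?, ?)",
    [
        ("County A", 5200),
        ("County B", 310),
        ("County C", 1475),
        ("County D", 980),
    ],
)

# ORDER BY sorts ascending by default, so the lowest counts come first;
# LIMIT keeps only the first rows of the sorted result.
lowest = conn.execute(
    "SELECT * FROM county_natality ORDER BY Births LIMIT 2"
).fetchall()
print(lowest)  # [('County B', 310), ('County D', 980)]

conn.close()
```

Adding DESC after the column name would return the highest counts instead.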

what does data validation do in spreadsheets?

-> it allows you to control what can and can't be entered in your worksheet.

what should you keep in mind when using the SORT function in a spreadsheet?

-> It's good to keep in mind that when you use the SORT function, you aren't actually changing the existing dataset. The function outputs a sorted copy in a new location, unlike the sort options in the Data menu, which rearrange the data in the original dataset.

most of the data you'll use in your analysis will be organized in what?

-> tables

SORT function in action (example)

=SORT(A2:D6, 2, TRUE) -> After the opening parenthesis, reference the first cell of the data you want to sort. In this case, that's A2. -> Then add a colon and the last cell you want included in the function, which is D6. A2:D6 is the range for this function. -> Next, write a comma to separate the range from the column we're sorting by, which is column B. Keep in mind that this part of the function doesn't recognize column letters, so we use the corresponding number instead: 2, since column B is the second column in our range. -> Finally, decide whether the data in this column should be in ascending or descending order. TRUE sorts in ascending order, and FALSE sorts in descending order. Because we want the tables listed starting from table number one, we write TRUE for ascending and end the function with a closing parenthesis. -> Our party guests are now sorted by the table they're seated at.
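A rough Python analogy of =SORT(range, 2, TRUE), assuming a made-up guest list where the second column holds the table number. Python's sorted function, like SORT, returns a new sorted copy and leaves the original rows in place.

```python
# Hypothetical guest list: (name, table number).
guests = [
    ("Ana", 3),
    ("Ben", 1),
    ("Cy", 2),
]

# Sort by the second column (index 1 in Python, column 2 in the spreadsheet).
# reverse=False corresponds to TRUE (ascending) in the SORT function.
sorted_guests = sorted(guests, key=lambda row: row[1], reverse=False)

print(sorted_guests)  # [('Ben', 1), ('Cy', 2), ('Ana', 3)]
print(guests)         # original order is unchanged
```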

A data analyst uses a function to sort a spreadsheet range between cells H1 and K65. They sort in ascending order by the first column, Column H. What is the syntax they are using?

=SORT(H1:K65, 1, TRUE)

You are working with three datasets about voter turnout in your county. First, you identify relationships and patterns between the datasets. Then, you use formulas and functions to make calculations based on your data. This is an example of which phase of analysis?

Transform data

Fill in the blank: _____ involves arranging data into a meaningful order to make it easier to understand, analyze, and visualize.

sorting

Which menu sort function is used to keep data together across rows?

sort sheet

