Data Analytics Course 5 Week 1

Lakukan tugas rumah & ujian kamu dengan baik sekarang menggunakan Quizwiz!

To reach your goal, stick to the 4 phases of analysis (by observing relationships between data points and making calculations.)

1. Organize Data 2. Format and Adjust Data 3. Get input from others 4. Transform data The beauty of the analysis process is that you probably already analyze situations in your everyday life. Whether you're analyzing data in your personal life or in your career, these four tasks can help you make better decisions. The more you do it, the more comfortable you'll feel with the process.

Four phases of analysis

1. Organize data 2. Format and adjust data 3. Get input from others 4. Transform data

Organization isn't just about making things look orderly. It's also about making it easier to search and locate the data you need in a quick and easy way.

As a data analyst, you'll find yourself rearranging and sifting through databases pretty often.

Formatting and adjusting your data

As you move closer to analyzing your data, you will want to have the data formatted and ready to go. In this part of the course, you will learn all about converting and formatting data, including how to use SQL queries to combine data. You will also discover the value of feedback and support from your colleagues and how it can lead to new insights that you can apply to your work

Data Analysis Process

Ask, Prepare, Process, Analyze, Share, Act

Analysis Process (Get input from others)

At this point, it's good to remember that input from other people can also be really helpful when analyzing information and making decisions. When analyzing data, gaining input from others is important because it gives you a viewpoint you might not understand or have access to. On top of gaining input from other people, it's also important to seek out others' perspectives early. That way, if they predict any obstacles or challenges, you'll know beforehand. The people you'll look to for input don't have to be experts to be helpful. Sometimes all you need is for someone who's familiar with a topic or data you're considering.

Performing data calculations

Calculations are one of the more common tasks that data analysts perform during an analysis. In this part of the course, you will explore formulas, functions, and pivot tables in spreadsheets and SQL queries. All of these are used in data calculations. You will also learn about the benefits of using SQL to manage temporary database tables

Outliers

Data points that are very different from similarly collected data and might not be reliable values.

Database organization

Database organization enables analysts to make decisions about which data is relevant to pull for a specific analysis. It also helps them decide which data types and variables are appropriate.

​Aggregating data for analysis

During an analysis, you might need to combine data to gain insights and complete business objectives. In this part of the course, you will explore the functions, procedures, and syntax to combine, or aggregate data. You will learn how to combine data within multiple cells in spreadsheets, and within multiple database tables using SQL queries

Sort Function

In spreadsheets, functions are preset commands that perform a specific process. So in this case, the SORT function, as you might be able to guess, sorts your data. It's good to keep in mind that when you use the SORT function, you're actually changing the existing dataset, unlike when you used the Data tab in the menu, which rearranged the data in the original dataset. Once you have an idea of the data you want to be sorted and how, applying functions to your data is simple.

Sorting in a pivot table

Items in the row and column areas of a pivot table are sorted in ascending order by any custom list first. For example, if your list contains days of the week, the pivot table allows weekday and month names to sort like this: Monday, Tuesday, Wednesday, etc. rather than alphabetically like this: Friday, Monday, Saturday, etc. If the items aren't in a custom list, they will be sorted in ascending order by default. But, if you sort in descending order, you are setting up a rule that controls how the field is sorted even after new data fields are added.

ORDER BY to sort

ORDER BY: You can use the ORDER BY clause to sort results returned in a query. With this clause, you can choose to order data by fields in a certain column. ORDER BY command is usually the last clause in our query. That makes sure that all the results of the query you're running are sorted by that clause. By default, the ORDER BY clause sorts data in ascending order. (oldest to most recent)

Organizing data to begin analysis

Organizing data makes the data easier to use in an analysis. In this part of the course, you will learn the importance of organizing your data with sorting and filtering. You will explore organizing data in both spreadsheets and with SQL queries and temporary tables Organizing your data is one of the most important steps for analysis! Once you get organized, you can perform calculations to find clear and objective answers to any data question

IMPORTANT: Can use the like clause and a wildcard such as % to filter for a substring when writing a query. EX: The County_of_Residence column included both the city and state but I only wanted data for CA. That is where the LIKE clause and wildcard come into play

SELECT * FROM `bigquery-public-data.sdoh_cdc_wonder_natality.county_natality` WHERE County_of_Residence LIKE "%CA%" AND Year = "2018-01-01" ORDER BY Births DESC Limit 25

Difference between sorting and filtering

To recap, the easiest way to remember the difference between sorting and filtering is that you can use sort to quickly order the data, and filter to display only the data that meets the criteria that you have chosen. Use filtering when you need to reduce the amount of data that is displayed. It is important to point out that, after you filter data, you can sort the filtered data, too. If you revisit the example of finding out who watched movies in October, after you have filtered for the movies seen in October, you can then sort the names of the people who watched those movies in alphabetical order.

Analysis Process (Transform Data)

Transforming data means identifying relationships and patterns between the data, and making calculations based on the data you have.

There's two methods for sorting spreadsheet data

one involves using the menu; the other involves writing out the sort function. The first method uses the Data tab in the menu of your spreadsheet program. The second way to store information in a spreadsheet is by writing a SORT function.

Customized Sort Order

A customized sort order is when you sort data in a spreadsheet using multiple conditions. This means that sorting will be based on the order of the conditions you select. To use this, highlight your range and then go to data - sort range - and then select your multiple criteria to sort by

Filtering

Filtering is used when you are only interested in seeing data that meets a specific criteria, and hiding the rest. Filtering is really useful when you have lots of data. Most spreadsheets and SQL databases allow you to filter your data in a variety of ways. Filtering gives you the ability to find what you are looking for without too much effort. For example, if you are only interested in finding out who watched movies in October, you could use a filter on the dates so only the records for movies watched in October are displayed. Filter: Filtering is showing only the data that meets a specific criteria while hiding the rest. Typically you can use filters when you want to narrow down the amount of data you want to sift through. Using a filter slims down larger data sets to smaller subsets that are relevant to what you need. When it comes to sifting through large, disorganized piles of data, filters are your friend.

Analysis Process (Format and Adjust Data)

Formatting data streamlines things and saves you time. Scrolling through hundreds of gifts can be time-consuming. Instead, you can adjust the data in a way that makes it easy to digest by filtering and sorting your data

How to create a temporary data table (subset of data from a large data set)

Save results from a query as a new table so you can work with only that specific data that you already queried 1. From your Explorer pane, click the three vertical dots next to your project and select Create dataset. You can name this dataset demos and leave the rest of the default options. Click CREATE DATASET. 2. Open your new dataset and select COMPOSE NEW QUERY. 3. Before you run the query, select the MORE menu from the Query Editor and open the Query Settings menu. In the Query Settings menu, select Set a destination table for query results. Set the dataset option to demos and name the table nyc_weather. 4. Run the query from earlier; now it will save as a new table in your demos dataset. 5. Return to the Query settings menu by using the MORE dropdown menu. Reset the settings to Save query results in a temporary table. This will prevent you from accidentally adding every query as a table to your new dataset. The ability to save your results into a new table is a helpful trick when you know you're only interested in a subset of a larger complex dataset that you plan on querying multiple times, such as the weather data for just La Guardia and JFK. This also helps minimize errors during your analysis.

When sorting data in a spreadsheet, you can choose "Sort sheet" or "Sort range."

Sort Sheet: all of the data in a spreadsheet is sorted by the conditions of a single column, but the related information across each row stays together. Sort Sheet keeps all of the data together that you sorted Data - Sort - Sort sheet Sort Range: On the other hand, "Sort range" doesn't keep the information across rows together. When you sort a range, you're selecting a specific collection of cells or the range that you want the sorting limited to. Nothing else on the spreadsheet gets rearranged but the specified cells. Sort range only sorts one column and doesn't sort any of the other rows. This causes the data to be jumbled as all of the data doesn't stay grouped together. You will most likely use Sort Sheet but should know both to not get them mixed up

Data analysts also need to format and adjust data when performing an analysis

Sorting and filtering are two ways you can keep things organized when you format and adjust data to work with it. For example, a filter can help you find errors or outliers so you can fix or flag them before your analysis.

Sorting and filtering in sheets

Sorting in Google Sheets helps you quickly spot trends in numbers. One trend might be gross revenue by sales region. In this case, you could sort the gross revenue column in descending (Z to A) order to spot the top performing regions at the top, or sort the gross revenue column in ascending (A-Z) order to spot the lowest performing regions at the top. Although an alphabetical order is implied, these sorting options do sort numbers, as our gross revenue example highlighted. And like the SORT function, you can use the FILTER function to filter by any matching criteria you like. This creates a custom filter. You might recall that you can filter data and then sort the filtered results. Using the FILTER and SORT functions together in a range of cells can programmatically and automatically achieve these results for you.

Sorting for data analysts (SQL)

Sorting is a useful way to rearrange data because it can help you understand the data you have in a different light. As you've probably already noticed, a lot of things you can do in spreadsheets can also be done in SQL. Sorting is one of those things. When a spreadsheet has too much data, you can get error messages, or it can cause your program to crash. SQL shortens processes that would otherwise take a very long time or be impossible to complete in a spreadsheet.

Sorting Datasets

Sorting is amazing. Not only does it add order and meaning to your spreadsheets, it also gives you the power to reimagine data altogether. When you sort data based on a specific metric, you can uncover new patterns and relationships within datasets you might not have otherwise noticed. This is especially true for spreadsheets, which you'll use a lot in your work as a data analyst. Knowing how to sort data in spreadsheets can make you a stronger and more confident analyst.

Sorting

Sorting is when you arrange data into a meaningful order to make it easier to understand, analyze, and visualize. It ranks your data based on a specific metric you choose. You can sort data in spreadsheets, SQL databases (when your dataset is too large for spreadsheets), and tables in documents. For example, if you need to rank things or create chronological lists, you can sort by ascending or descending order Sorting will arrange the data in a meaningful way and give you immediate insights. Sorting also helps you to group similar data together by a classification Sorting is when you arrange data into a meaningful order to make it easier to understand, analyze, and visualize. Sorting ranks your data based on a specific metric that you can choose. You can sort data in spreadsheets and databases that use SQL.

Most data I will use in my analysis will be organized in tables

Tables help organize similar kinds of data into categories and subject areas that you can focus on as you analyze You can use these tables and fields to help you decide how to move forward with your analysis. The structure of this database can help you decide which data you need to pull to meet your objectives. Tables allow you to make decisions about data types. They help you to figure out what variables you need and the data type those variables should have. So if you have a database where you need to convert a data type during your analysis, you can do that by using the CAST command in SQL or any other method that you learn on the job or from your own research If you're performing your analysis in a spreadsheet, you want to make sure that the columns and rows are effectively organized. You can even hide columns that you won't need for analysis or that show duplicate information. The organization of datasets is really important for data analysts. Most of the datasets you will use will be organized as tables. Tables are helpful because they let you manipulate your data and categorize it. Having distinct categories and classifications lets you focus on, and differentiate between, your data quickly and easily.

Benefit of filtering data

The benefit of filtering the data is that after you fix errors or identify outliers, you can remove the filter and return the data to its original organization.

Analysis

The process used to make sense of the data collected Taking the right steps to proceed and think about your data in different ways The goal of analysis is to identify trends and relationships within the data so that you can accurately answer the question you're asking.

Filtering in SQL

You can also filter data in SQL using the WHERE clause. The WHERE clause works similarly to filtering in a spreadsheet because it returns rows based on a condition you name. If the data in a where clause is a string format, you have to use single or double quotations You can apply multiple filters to a database. You can even sort and filter data at the same time for even more precise results.

Sorting and filtering in excel

You can also sort in ascending (A-Z) and descending (Z-A) order in Microsoft Excel. Excel offers Smallest to Largest and Largest to Smallest sorting when you are working with numbers. Similar to the SORT function in Google Sheets, Excel includes custom sort capabilities that are available from the menu. After you select the data range, click the Sort & Filter button to select the criteria for sorting. You can even sort by the data in rows instead of by the data in columns if you select Sort left to right under Options. (Sort top to bottom is the default setting to sort the data in columns.)

Filtering in spreadsheets

You can use filters and spreadsheet programs, like Excel and Sheets, to only display data from rows that match the range or condition you've set.

Using sorts and filters to display different info in SQL

You might remember that while sorting puts data in a specific order, filters narrow down data, so you only see data that fits the filter. In order to do this, just combine a WHERE clause with the ORDER BY clause You can combine the AND filter with the WHERE clause to filter for two conditions


Set pelajaran terkait

18 Regulation of Gene Expression

View Set

NC - Policy Provisions, Options, and Other Features

View Set

Judicial Activism vs. Judicial Restriant

View Set

Legal Environment of Business Mid Term

View Set

Internal 2 Exam 2 (after quizzes 1 and 2) Logan U

View Set

CompTIA A+ Exam 220-1101 - Network Protocols Quiz

View Set

Chapter 3: Temperature and Humidity Extremes

View Set

Chapter 29: Orthopaedic Injuries

View Set

ЗНО англ слова которые чаще всего встречаются (F, H, I)

View Set