DSC 10 WEEK 3-WHO KNOWS

Histogram

A chart that displays the distribution of numerical values. It uses bins, with one bar for each bin. It follows the area principle: the area of each bar is the percent of individuals in the corresponding bin. AREA IS THE PERCENTAGE. Every histogram has to follow the area principle.

aged.bin('Age', bins=make_array(0, 5, 10, 20)): the second-to-last row is?

The bin starting at 10. It won't count anything over 20. The last row, 20, will have a count of 0: values equal to the final endpoint are counted in the second-to-last row, so that bin is [10, 20].

What should happen to our histogram if we combine the two bins [20, 40) and [40, 60) into one large bin [20, 60)?

The [20, 60) bin has to be twice as wide. Since the width is fixed by the bins, you can't choose the height and the area independently. The area of the bar for bin [20, 60) should be the sum of the areas of the bars for bins [20, 40) and [40, 60). This preserves the area. YOU HAVE TO ADD THE AREAS. AREA IS THE PERCENTAGE.

num_rows

Compute the number of rows in a table

select

Create a copy of a table with only some of the columns

take()

Create a copy of the table with only the rows whose indices are in the given array

Height measures.....

DENSITY: how packed things are in the bin. MOST DENSE = MORE STUFF per unit of width, shown by the height of the column. Density depends on width.

Question 1. Assign us_death_rate to the total US annual death rate during this time interval (July 1, 2016 to July 1, 2017). The annual death rate for a year-long period is the number of deaths in that period as a proportion of the population at the start of the period.

# The denominator is the population at the start of the period (July 1, 2016).
us_death_rate = sum(pop.column('DEATHS')) / sum(pop.column('2016'))
us_death_rate

Table

Table() Create an empty table, usually to extend with data

Table.read_table

Table.read_table("my_data.csv") Create a table from a data file

Histogram AXES

The area of a bar is a percentage of the whole: area = %. The horizontal axis is a number line.

What should happen to our histogram if we combine the two bins [20, 40) and [40, 60) into one large bin [20, 60)? What is the density of the new bin?

The new bin has about twice as many movies and is twice as wide as each original bin, so it has about the same density as each original bin. Doubling the width doubles the area, so the height stays about the same.
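The arithmetic behind merging two bins can be sketched in plain Python. The percents below are hypothetical, chosen only to illustrate that areas add while the height comes out near the original heights.

```python
# Hypothetical percents of movies in two adjacent bins (not from the course data).
bins = {(20, 40): 30.0, (40, 60): 34.0}  # bin -> percent of all movies (the bar's area)

def density(percent, left, right):
    """Height of a histogram bar: percent per unit of width."""
    return percent / (right - left)

# Heights of the two original bars.
heights = {b: density(p, b[0], b[1]) for b, p in bins.items()}

# Merging the bins: the areas (percents) add, and the width doubles.
merged_percent = sum(bins.values())
merged_height = density(merged_percent, 20, 60)

print(heights)
print(merged_percent, merged_height)
```

With equal-width bins, the merged height is the average of the two original heights, which is why it stays "about the same".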

Don't use normed=False

True

def f(s): return np.round(s / sum(s) * 100, 2) 1. What does this function do?

a

Show counts on the axis: for each age range, how many things fall in that age range

aged.hist('Age', bins=np.arange(0,101,20),normed=False)

group

Aggregates all rows with the same value for a column into a single row in the result. First argument: which column you want to group by. Second argument: what to do with the other columns.

group by color

all_cones.group('Color') gives a table with the columns Color and count: brown 1, red 2.

group by Flavor and Color

all_cones.group(['Flavor', 'Color']). The argument should be a list.
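As a rough illustration of what grouping with counts does, here is a plain-Python sketch using collections.Counter in place of the Table method. The rows are hypothetical, not the course's actual table.

```python
from collections import Counter

# Hypothetical rows of (flavor, color); not the course's actual data.
all_cones = [
    ("strawberry", "pink"),
    ("chocolate", "light brown"),
    ("chocolate", "dark brown"),
    ("strawberry", "pink"),
    ("chocolate", "dark brown"),
]

# Like group('Flavor'): one entry per distinct flavor, with a count.
by_flavor = Counter(flavor for flavor, _ in all_cones)

# Like group(['Flavor', 'Color']): one entry per distinct (flavor, color) pair.
by_flavor_and_color = Counter(all_cones)

print(by_flavor)
print(by_flavor_and_color)
```

Grouping by two columns produces one row per combination that actually occurs, not one row per possible combination.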

bar

The data is already grouped by a category. Histogram: you decide the bins, based on different buckets.

def f(s): return np.round(s / sum(s) * 100, 2) 13, 14. What output will it give?

An array of numbers

Question 2. Sort the data in decreasing order by NEI, naming the sorted table by_nei. Create another table called by_nei_pter that's sorted in decreasing order by NEI-PTER instead.

by_nei = unemployment.sort('NEI', descending=True)
by_nei_pter = unemployment.sort('NEI-PTER', descending=True)

Question 5. Add pter as a column to unemployment (named "PTER") and sort the resulting table by that column in decreasing order. Call the table by_pter.

by_pter = unemployment.with_columns("PTER", pter).sort("PTER", descending=True)
by_pter

starters.group('TEAM',max)

For each team, max picks the biggest value in every other column. For columns of strings, the biggest value is the one that comes last in alphabetical order.
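A small plain-Python sketch of why max behaves this way on strings. The position labels and salaries are hypothetical.

```python
# Hypothetical position labels for one team (not from the course data).
positions = ["C", "PF", "PG", "SF", "SG"]
last_alphabetically = max(positions)   # strings compare alphabetically

# Hypothetical salaries in millions.
salaries = [5.2, 22.9, 1.4]
largest_salary = max(salaries)         # numbers compare by size

print(last_alphabetically, largest_salary)
```

Because each column is reduced separately, the maximum position and the maximum salary in a grouped row need not belong to the same player.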

scatter plot

Compares two numerical variables.

Question 3. Make a table of the number of complaints made against each company. Call it complaints_per_company. It should have one row per company and 2 columns: "company" (the name of the company) and "number of complaints" (the number of complaints made against that company).

complaints_per_company = complaints.group('company').relabeled("count", "number of complaints")
complaints_per_company

Question 5. Make a bar chart of just the 5 companies with the most complaints.

complaints_per_company.sort("number of complaints",descending=True).take(np.arange(5)).barh("company")

Question 6. Make a bar chart like the one above, with one difference: The size of each company's bar should be the proportion (among all complaints made against any company in complaints) that were made against that company.

complaints_per_company.with_column("proportion of all complaints", complaints_per_company.column("number of complaints") / complaints.num_rows) \
    .sort("proportion of all complaints", descending=True) \
    .drop("number of complaints") \
    .take(np.arange(5)) \
    .barh('company')

HW3 How many complaints were made against each kind of product? Make a table called 'complaints_per_product' with one row per product category and 2 columns: "product" (the name of the product) and "number of complaints" (the number of complaints made against that kind of product). You should be able to do this in one line of code.

complaints_per_product = complaints.group('product').relabeled('count', 'number of complaints')
complaints_per_product

def f(s): return np.round(s / sum(s) * 100, 2) 12. What kind of input does it take? Example: s = 1, 2, 3; s/6 = 1/6, 2/6, 3/6; s/6*100 = 1/6*100, 2/6*100, 3/6*100

computes percents
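The same computation can be sketched in plain Python, using a list and the built-in round instead of a NumPy array and np.round. The name percents is a hypothetical stand-in for f.

```python
def percents(values):
    """Return each value as a percent of the total, rounded to 2 decimal places."""
    total = sum(values)
    return [round(v / total * 100, 2) for v in values]

# The card's own example input: 1, 2, 3 (total 6).
result = percents([1, 2, 3])
print(result)
```

The input is a collection of numbers and the output has one percent per input element.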

most expensive chocolate ice-cream

cones.where('Flavor', 'chocolate').column

Binning

Counting the number of numerical values that lie within ranges, called bins (putting numbers into groups based on their range). Each bin includes its left endpoint and excludes its right endpoint.
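A minimal sketch of binning with left-inclusive, right-exclusive ranges, in plain Python. The function name bin_counts and the ages are hypothetical, and the sketch treats every bin as half-open, so it does not model any special handling of the final endpoint that a library may apply.

```python
def bin_counts(values, edges):
    """Count how many values fall in each half-open bin [edges[i], edges[i+1])."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(edges) - 1):
            if edges[i] <= v < edges[i + 1]:
                counts[i] += 1
                break  # a value belongs to at most one bin
    return counts

ages = [3, 5, 9, 10, 19, 25]               # hypothetical ages
counts = bin_counts(ages, [0, 5, 10, 20])  # bins [0, 5), [5, 10), [10, 20)
print(counts)
```

The value 5 lands in [5, 10), not [0, 5), and 25 is outside every bin, so it is not counted.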

Apply

Creates an array by calling a function on every element in the input column(s): table_name.apply(function_name, 'column_label')

with_column("name",.....)

The data that you want to go into that column. The new column will be added to the end of the table.

def spread(values): return max(values) - min(values)

In def spread(values): spread is the name, and values is the argument name (parameter).
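Written out as runnable Python, the function and a call might look like this. The input values are hypothetical.

```python
def spread(values):
    """Difference between the largest and smallest value."""
    return max(values) - min(values)

heights = [62, 70, 65]   # hypothetical inputs
print(spread(heights))
```

The name after def is what you call later, and the name in parentheses is the placeholder for whatever argument is passed in.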

In sort(), descending is by default.....

False (the table is sorted in ascending order)

C_to_F(y/4)

Computes y/4 first, then calls C_to_F and plugs in that value.

Question 4.3. What's the title of the earliest movie in the dataset? You could just look this up from the output of the previous cell. Instead, write Python code to find out.

earliest_movie_title = imdb_by_year.column('Title').item(0)
earliest_movie_title

Do it for all the data at one time

For every set of parents, predict the height of their child, then compare the predicted height with the actual height.

Question 2. Assign fastest_growth to an array of the names of the five states with the fastest population growth rates in descending order of growth rate.

fastest_growth = pop.with_column('R', -pop.column(3) / pop.column(2)).sort('R').take(np.arange(5)).column(1)  # SOLUTION
fastest_growth

aged.bin('Age', bins=make_array(2, 4, 6, 8, 10))

Bins start at 2, 4, 6, and so on: a fine level of detail.

add my_flower to the original table using...

flowers.with_row(my_flower)

L.7: group('Age')

for each group how many movies there were for that age

nba.

For each possible pair of team and position, find the max of each column. For the player column, max returns the max string, because it is measuring in alphabetical order.

aged.hist('Age', bins=np.arange(0,101,20),unit='year')

From 0 to 100, in chunks of 20. The unit (year) is added to the x- and y-axis labels. By default the vertical axis measures percent per unit.

Question 3. Use take to make a table containing the data for the 8 quarters when NEI was greatest. Call that table greatest_nei.

# The upper bound of np.arange is exclusive, so np.arange(8) gives indices 0 through 7.
greatest_nei = by_nei.take(np.arange(8))
greatest_nei

cones.group('Flavor')

Groups by flavor. The result has the columns Flavor and count: chocolate 3, strawberry 2.

L.7: top.group('Studio')

group by studio

Crowdedness of a bin is......

height

L.7: distribute

how many people have that value

If we want just the ratings of the movies, we can get an array that contains the data in that column:

imdb.column("Rating") returns an array

If you create a table column from a list,

it will automatically be converted to an array. A row, on the other hand, mixes types.

if the column has numbers,

it will sort numerically.

Question 5. Assign less_than_west_births to the number of states that had a total population in 2017 that was smaller than the number of babies born in region 4 (the Western US) during this time interval.

less_than_west_births = pop.where('2017', are.below(west_births)).num_rows
less_than_west_births

Question 3. Assign movers to the number of states for which the absolute annual rate of migration was higher than 0.5%. The annual rate of migration for a year-long period is the net number of migrations (in and out) as a proportion of the population at the start of the period. The MIGRATION column contains estimated annual net migration counts by state.

# abs() makes the rate absolute, so large net out-migration counts too.
movers = pop.with_column("test", abs(pop.column("MIGRATION") / pop.column("2016"))).where("test", are.above(.005)).num_rows
movers

The horizontal axis is a

number line

L.7: sum is over 100, so

overlap

Area measures.....

percent

Look at similar families, mid_parent function

Predict the result. Should be able to vary.

Compute an array containing the percentage of people who were PTER in each quarter. (The first element of the array should correspond to the first row of unemployment, and so on.)

pter = unemployment.column('NEI-PTER') - unemployment.column('NEI')
pter

Function requirements are not ......

required

def cut_off_at_100(age): 'The smaller of age and 100'. Fill in the body: return their age or 100, whichever is smaller.

return min(age, 100). For example, cut_off_at_100(104) returns 100.
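Put together as a complete definition, the function might read as follows. This is a sketch assembled from the card's signature, docstring, and return line.

```python
def cut_off_at_100(age):
    """The smaller of age and 100."""
    return min(age, 100)

print(cut_off_at_100(104))  # an age above the cutoff
print(cut_off_at_100(17))   # an age below the cutoff is returned unchanged
```

A function like this is typically passed to apply so that every age in a column is capped in one step.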

what if some of the columns can't be summed b/c they're strings?

Select the columns you want and then group. The grouped column is relabeled with the function's name: nba.select('POSITION', 'SALARY').group('POSITION', np.mean).sort('SALARY mean', descending=True)

histogram

Shows the distribution of numerical data. Don't use it when... Each column represents a group defined by a continuous, quantitative variable.

L.7: easier to compare

sort first

If the column has strings in it....

sort will sort alphabetically

Who is the best-paid starter?

start_salaries.

Which will rank the teams in order of their highest-paid starter?

starters.select('TEAM

Given 'Data Science rocks!', the result is 'Data Science rocks!: length is 19'. Define a function str_len that takes a string as a parameter and returns a new string that consists of: the given string, a colon and a space, "length is", and the length of the string.

def str_len(s): return s + ": length is " + str(len(s))  # str() turns the number into a string
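A runnable version of the function, checked against the card's own example string. The trailing space after "is" is an assumption needed to match the expected output shown in the question.

```python
def str_len(s):
    """Return s followed by ': length is ' and the number of characters in s."""
    # len(s) is an int, so convert it with str() before concatenating.
    return s + ": length is " + str(len(s))

print(str_len("Data Science rocks!"))
```

Concatenating a string and an int directly raises a TypeError, which is why the str() conversion is needed.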

L.7: What proportion did not use their phone for online banking?

Sum is over 100, so

combine tables

t=drinks.join('Cafe',discounts,"Location")

using apply with multiple arguments def midParent(mother_height,father_height)

table_name.

a common method to use with np.arange

take()

both sorted.take(np.arange(18, 30))?

take() keeps only the rows at the given indices. Here that is rows 18 through 29, because the upper bound 30 is excluded.
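The index range can be checked in plain Python with the built-in range, which follows the same start-inclusive, stop-exclusive convention as np.arange. The 40-row list is a hypothetical stand-in for a table.

```python
# Same start/stop convention as np.arange(18, 30): 18 is included, 30 is not.
indices = list(range(18, 30))

# Hypothetical stand-in for a 40-row table: one label per row.
rows = [f"row {i}" for i in range(40)]

# Like take(indices): keep only the rows at those positions.
taken = [rows[i] for i in indices]

print(indices[0], indices[-1], len(taken))
```

The number of rows kept is stop minus start, so 30 - 18 gives 12 rows.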

with_columns

tbl = Table().with_columns("N", np.arange(5), "2*N", np.arange(0, 10, 2)) Create a copy of a table with more columns

column

tbl.column("N") Create an array containing the elements of a column

drop

tbl.drop("2*N") Create a copy of a table without some of the columns

where

tbl.where("N", are.above(2)) Create a copy of a table with only the rows that match some predicate

cones.group('Flavor', max)

Each of the other columns will hold its own max value for the group, and those max values can come from different rows that got grouped together.

def C_to_F(x): return x*9/5+32

to define a function, make your own function
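As a runnable sketch, here is the definition together with a call whose argument is an expression. The value of y is hypothetical.

```python
def C_to_F(x):
    """Convert a temperature from degrees Celsius to degrees Fahrenheit."""
    return x * 9 / 5 + 32

y = 40                   # hypothetical value
result = C_to_F(y / 4)   # y / 4 is evaluated first, then passed in as x
print(result)
```

Python evaluates the argument expression before the function body runs, so the function only ever sees the single number 10.0.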

If the name of the table is top and the name of our function is str_len, how do we find the length of each movie title?

top.apply(str_len, "Title")

Turn a data table into a line plot with the x- and y-axes labeled

unemployment.with_columns("PTER", pter, "Year", 2000 + np.arange(by_pter.num_rows) / 4).plot("Year", "PTER")  # plot(x-axis column, y-axis column)

bar chart

Used to compare variables. Each column (or row) represents a group defined by a categorical variable.

Question 4. Assign west_births to the total number of births that occurred in region 4 (the Western US).

west_births = sum(pop.where('REGION', are.equal_to('4')).column('BIRTHS'))
west_births

add a row to a table

with_row

L.7: scatter()

x and y axis labels

minimize the cost

you should get espresso at nefeli

starters.drop('POSITION').group('TEAM', max).sort(1, descending=True)

you're sorting by column 1

