Pandas

obj = Series([4, 7, -5, 3], dtype="int64")
values = obj.values
index = obj.index
for i in index:
    print(i, values[i])
What is printed, and what happens if the print call is not indented?

0 4
1 7
2 -5
3 3
Each i is an index label and values[i] is the matching value. If the print call is not indented, Python raises an IndentationError.

obj = Series([4, 7, -5, 3, "a"])
print(obj)

0     4
1     7
2    -5
3     3
4     a
dtype: object

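obj3 = Series(["blue", "orange", "purple", "yellow"], index=[0, 1, 2, 4])
print(obj3)
print(obj3.reindex(range(6), method="ffill"))

0      blue
1    orange
2    purple
4    yellow
dtype: object
0      blue
1    orange
2    purple
3    purple
4    yellow
5    yellow
dtype: object
The missing labels 3 and 5 are forward-filled from the previous label.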

obj = Series([4, 7, -5, 3], index=["d", "b", "a", "c"])
print(obj["d"])
print(obj[["b", "d"]])

4
b    7
d    4
dtype: int64

data = {"State": ["Ohio", "Ohio", "ohio", "Nevada", "Nevada"],
        "year": [2000, 2001, 2002, 2001, 2002],
        "pop": [1.5, 1.7, 3.6, 2.4, 2.9]}
frame = DataFrame(data)
print(frame)
print(frame.values)
print(frame.values.dtype)

    State  year  pop
0    Ohio  2000  1.5
1    Ohio  2001  1.7
2    ohio  2002  3.6
3  Nevada  2001  2.4
4  Nevada  2002  2.9
[['Ohio' 2000 1.5]
 ['Ohio' 2001 1.7]
 ['ohio' 2002 3.6]
 ['Nevada' 2001 2.4]
 ['Nevada' 2002 2.9]]
object
Columns follow the dictionary's key order; older pandas versions sorted them alphabetically (State, pop, year).

What are the two main data structures of pandas in Python?

Series and DataFrame, both built on top of NumPy.

obj = Series([4, 7, -5, 3], dtype="int64")
values = obj.values
index = obj.index
print(values)
print(index)

[ 4  7 -5  3]
RangeIndex(start=0, stop=4, step=1)

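pop = {"Nevada": {2001: 2.4, 2002: 2.9},
       "Ohio": {2000: 1.5, 2001: 1.7, 2002: 3.6}}
frame2 = DataFrame(pop)
print(frame2.values)
print(frame2.values.dtype)

[[nan 1.5]
 [2.4 1.7]
 [2.9 3.6]]
float64
The values attribute returns the data as a 2D NumPy array. The rows are shown in year order (2000, 2001, 2002); the row order can depend on the pandas version, and frame2.sort_index() gives the order shown.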

Create a 1D array of numbers from 0 to 9

arr = np.arange(10)
arr
#> array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

obj = Series([4, 7, -5, 3], index=["d", "b", "a", "c"])
print(obj)

d    4
b    7
a   -5
c    3
dtype: int64

obj = Series([4.5, 7.2, -5.3, 3.6], index=["d", "b", "a", "c"])
print(obj)
obj2 = obj.reindex(["a", "b", "c", "d"])
print(obj2)

d    4.5
b    7.2
a   -5.3
c    3.6
dtype: float64
a   -5.3
b    7.2
c    3.6
d    4.5
dtype: float64

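obj = Series([4, 7, -5, 3], index=["d", "b", "a", "c"])
print(obj * 2)
print(obj)  # Purpose of this last statement?

d     8
b    14
a   -10
c     6
dtype: int64
d    4
b    7
a   -5
c    3
dtype: int64
The last print shows that obj * 2 returns a new Series and leaves obj itself unchanged.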
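To see why printing obj after obj * 2 is worth doing, here is a minimal sketch assuming pandas is imported as pd. The names doubled and original_after are illustrative, not from the cards.

```python
import pandas as pd

obj = pd.Series([4, 7, -5, 3], index=["d", "b", "a", "c"])

# Arithmetic returns a new Series; obj itself is not modified.
doubled = obj * 2
original_after = obj.tolist()

print(doubled.tolist())   # doubled values
print(original_after)     # original values, unchanged

# To keep the doubled values under the same name, rebind it explicitly.
obj = obj * 2
print(obj.tolist())
```

Rebinding the name is the usual way to "update" a Series with the result of an arithmetic expression.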

obj = Series([4, 7, -5, 3], index=["d", "b", "a", "c"])
print(obj > 0)
print(obj[obj > 0])
print(obj)

d     True
b     True
a    False
c     True
dtype: bool
d    4
b    7
c    3
dtype: int64
d    4
b    7
a   -5
c    3
dtype: int64

Create a 3×3 numpy array of all True's

np.full((3, 3), True, dtype=bool)
#> array([[ True,  True,  True],
#>        [ True,  True,  True],
#>        [ True,  True,  True]], dtype=bool)
# Alternate method:
np.ones((3, 3), dtype=bool)

Create a Series and assign a dtype?

obj = Series([4, 7, -5, 3], dtype="int64")

Reindexing a Series?

obj = Series([4.5, 7.2, -5.3, 3.6], index=["d", "b", "a", "c"])
print(obj)
obj2 = obj.reindex(["a", "b", "c", "d"])
print(obj2)

Creating a Series?

s = pd.Series([])

What is Pandas?

pandas is an open source Python library for data analysis.

Pandas data types?

- "object"
- "float64" (takes priority over "int64" when integers and floats are mixed)
- "int64"
- "bool"
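A minimal sketch of how these dtypes come out of type inference, assuming pandas is imported as pd. The variable names are illustrative only.

```python
import pandas as pd

ints = pd.Series([4, 7, -5, 3])           # all integers
floats = pd.Series([4.5, 7.2, -5.3])      # all floats
mixed_num = pd.Series([4, 7.2, -5])       # integers and floats together
flags = pd.Series([True, False, True])    # booleans
mixed = pd.Series([4, 7, "a"])            # numbers and a string

# Print the dtype pandas inferred for each Series.
for name, s in [("ints", ints), ("floats", floats),
                ("mixed_num", mixed_num), ("flags", flags),
                ("mixed", mixed)]:
    print(name, s.dtype)
```

Mixing integers with floats promotes the whole Series to float64, while mixing numbers with strings falls back to object.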

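iloc, and how to import it?

Selection by position (purely integer-based indexing). iloc is an attribute of Series and DataFrame objects, so it needs no import of its own beyond pandas (import pandas as pd).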
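A short sketch of position-based selection with iloc, contrasted with label-based loc, assuming pandas is imported as pd. The variable names are illustrative.

```python
import pandas as pd

obj = pd.Series([4, 7, -5, 3], index=["d", "b", "a", "c"])

first = obj.iloc[0]       # element at position 0
last = obj.iloc[-1]       # element at the last position
middle = obj.iloc[1:3]    # positions 1 and 2 (end position excluded)
by_label = obj.loc["a"]   # loc selects by label instead of position

print(first, last, by_label)
print(middle)
```

iloc ignores the index labels entirely, so it behaves the same whether the index holds strings, integers, or dates.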

Steps to create a DataFrame from a dictionary?

1. Create a dictionary:
   data = {"State": ["Ohio", "Ohio", "ohio", "Nevada", "Nevada"],
           "year": [2000, 2001, 2002, 2001, 2002],
           "pop": [1.5, 1.7, 3.6, 2.4, 2.9]}
2. Pass the dictionary to DataFrame and store the result in a variable:
   frame = DataFrame(data)

Steps to create a DataFrame from a nested dictionary?

1. Create a nested dictionary:
   pop = {"Nevada": {2001: 2.4, 2002: 2.9},
          "Ohio": {2000: 1.5, 2001: 1.7, 2002: 3.6}}
2. Pass the dictionary to DataFrame and store the result in a variable:
   frame = DataFrame(pop)

DataFrame?

A DataFrame is a tabular data structure made up of rows and columns, akin to a spreadsheet or a database table. You can also think of a DataFrame as a group of Series objects, one per column, that share the same index (the row labels).
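A minimal sketch of the "group of Series" view, assuming pandas is imported as pd and reusing the State/year/pop data from the other cards. The names year_col and pop_col are illustrative.

```python
import pandas as pd

data = {"State": ["Ohio", "Ohio", "ohio", "Nevada", "Nevada"],
        "year": [2000, 2001, 2002, 2001, 2002],
        "pop": [1.5, 1.7, 3.6, 2.4, 2.9]}
frame = pd.DataFrame(data)

# Selecting a single column returns a Series.
year_col = frame["year"]
pop_col = frame["pop"]

print(type(year_col).__name__)               # class name of a column
print(year_col.index.equals(pop_col.index))  # columns share the row index
print(frame.shape)                           # (rows, columns)
```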

Series?

A Series is a one-dimensional object similar to an array, list, or column in a table. It will assign a labeled index to each item in the Series. By default, each item will receive an index label from 0 to N, where N is the length of the Series minus one.
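A quick sketch of the default index, assuming pandas is imported as pd. The variable names are illustrative.

```python
import pandas as pd

# No index is given, so pandas labels the items 0 .. len(s) - 1.
s = pd.Series(["blue", "orange", "purple", "yellow"])

labels = list(s.index)
print(labels)
print(labels[-1] == len(s) - 1)  # last label is the length minus one
```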

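pop = {"Nevada": {2001: 2.4, 2002: 2.9},
       "Ohio": {2000: 1.5, 2001: 1.7, 2002: 3.6}}
frame = DataFrame(pop)
print(frame)

      Nevada  Ohio
2000     NaN   1.5
2001     2.4   1.7
2002     2.9   3.6
"Nevada" and "Ohio" are the keys of the outer dictionary and become the column names. Their values are the inner dictionaries {2001: 2.4, 2002: 2.9} and {2000: 1.5, 2001: 1.7, 2002: 3.6}. The years are the keys of the inner dictionaries, with the decimals as their values, and become the row labels. Nevada has no entry for 2000, so that cell is NaN. The row order can depend on the pandas version; frame.sort_index() gives the order shown.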
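A minimal sketch of how the outer and inner keys map onto columns and rows, assuming pandas is imported as pd. sort_index() is called so the result does not depend on how a given pandas version orders the row labels.

```python
import pandas as pd

pop = {"Nevada": {2001: 2.4, 2002: 2.9},
       "Ohio": {2000: 1.5, 2001: 1.7, 2002: 3.6}}

# Sort the row labels so the order is the same on every pandas version.
frame = pd.DataFrame(pop).sort_index()

columns = list(frame.columns)                  # outer keys
years = list(frame.index)                      # inner keys
missing = pd.isna(frame.loc[2000, "Nevada"])   # no Nevada entry for 2000
ohio_2001 = frame.loc[2001, "Ohio"]            # value looked up by both keys

print(columns, years, missing, ohio_2001)
```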

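obj = Series([4, 7, -5, 3, "a"], dtype="int64")
print(obj)

Error: the string "a" cannot be converted to int64, so the Series constructor fails before anything is printed.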
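A hedged sketch of the failure, assuming pandas is imported as pd. The exact exception class is an assumption (a ValueError is expected, but both ValueError and TypeError are caught here to be safe), and outcome and fallback are illustrative names.

```python
import pandas as pd

try:
    # "a" cannot be cast to int64, so construction is expected to fail.
    pd.Series([4, 7, -5, 3, "a"], dtype="int64")
    outcome = "no error"
except (ValueError, TypeError) as exc:
    outcome = type(exc).__name__

print(outcome)

# Without a forced dtype, pandas falls back to the generic object dtype.
fallback = pd.Series([4, 7, -5, 3, "a"])
print(fallback.dtype)
```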

Import numpy as `np` and print the version number.

import numpy as np
print(np.__version__)
#> 1.13.3

Imports for using pandas?

import pandas as pd
import numpy as np
from pandas import Series, DataFrame

