Pandas
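obj = Series([4, 7, -5, 3], dtype="int64")
values = obj.values
index = obj.index
for i in index:
    print(i, values[i])
What happens if the print statement is not indented?
0 4
1 7
2 -5
3 3
Each i is an index label and values[i] is the matching value. If the print statement is not indented, Python raises an IndentationError, because a for loop needs an indented body.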
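The loop in this card works because the default labels 0 to 3 happen to match the positions in values. A minimal sketch, assuming a current pandas release, that walks label/value pairs directly with Series.items() and so does not depend on that coincidence (the name pairs is only for illustration):

```python
import pandas as pd

obj = pd.Series([4, 7, -5, 3], dtype="int64")

# items() yields (index label, value) pairs, so no positional lookup is needed.
pairs = [(label, value) for label, value in obj.items()]

for label, value in pairs:
    print(label, value)
```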
obj = Series([4, 7, -5, 3, "a"])
print(obj)
0     4
1     7
2    -5
3     3
4     a
dtype: object
obj3 = Series(["blue", "orange", "purple", "yellow"], index=[0, 1, 2, 4])
print(obj3)
print(obj3.reindex(range(6), method="ffill"))
0      blue
1    orange
2    purple
4    yellow
dtype: object
0      blue
1    orange
2    purple
3    purple
4    yellow
5    yellow
dtype: object
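To see what method="ffill" contributes, it can help to compare it with a plain reindex. A short sketch, assuming a current pandas release (the names plain and filled are illustrative):

```python
import pandas as pd

obj3 = pd.Series(["blue", "orange", "purple", "yellow"], index=[0, 1, 2, 4])

# Without a fill method, labels that did not exist before (3 and 5) become NaN.
plain = obj3.reindex(range(6))

# With ffill, each new label takes the value of the closest earlier label.
filled = obj3.reindex(range(6), method="ffill")

print(plain)
print(filled)
```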
obj = Series([4, 7, -5, 3], index=["d", "b", "a", "c"])
print(obj["d"])
print(obj[["b", "d"]])
4
b    7
d    4
dtype: int64
data = {"State": ["Ohio", "Ohio", "ohio", "Nevada", "Nevada"],
        "year": [2000, 2001, 2002, 2001, 2002],
        "pop": [1.5, 1.7, 3.6, 2.4, 2.9]}
frame = DataFrame(data)
print(frame)
print(frame.values)
print(frame.values.dtype)
    State  year  pop
0    Ohio  2000  1.5
1    Ohio  2001  1.7
2    ohio  2002  3.6
3  Nevada  2001  2.4
4  Nevada  2002  2.9
[['Ohio' 2000 1.5]
 ['Ohio' 2001 1.7]
 ['ohio' 2002 3.6]
 ['Nevada' 2001 2.4]
 ['Nevada' 2002 2.9]]
object
Columns follow the dictionary's key order in current pandas; older releases sorted them alphabetically (State, pop, year).
What are the two main data structures in pandas?
Series and DataFrame, both built on top of NumPy.
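A minimal sketch of the two structures, assuming current pandas and NumPy installs; the column names and numbers are illustrative only. It shows that the data behind both objects comes back as a NumPy array.

```python
import numpy as np
import pandas as pd

s = pd.Series([4, 7, -5, 3])
df = pd.DataFrame({"year": [2000, 2001, 2002], "pop": [1.5, 1.7, 3.6]})

# .values exposes the underlying data as a NumPy array.
print(type(s.values))
print(type(df.values))

# A Series is one-dimensional, a DataFrame is two-dimensional.
print(s.values.shape)
print(df.values.shape)
```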
obj = Series([4, 7, -5, 3], dtype="int64")
values = obj.values
index = obj.index
print(values)
print(index)
[ 4  7 -5  3]
RangeIndex(start=0, stop=4, step=1)
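pop = {"Nevada": {2001: 2.4, 2002: 2.9},
       "Ohio": {2000: 1.5, 2001: 1.7, 2002: 3.6}}
frame2 = DataFrame(pop)
print(frame2.values)
print(frame2.values.dtype)
[[ nan  1.5]
 [ 2.4  1.7]
 [ 2.9  3.6]]
float64
The values attribute returns the data as a 2D array. The rows shown are sorted by year (2000, 2001, 2002); depending on the pandas version, the rows may instead follow the order in which the years first appear in the dictionary.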
Create a 1D array of numbers from 0 to 9
arr = np.arange(10)
arr
#> array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
obj = Series([4, 7, -5, 3], index=["d", "b", "a", "c"])
print(obj)
d    4
b    7
a   -5
c    3
dtype: int64
obj = Series([4.5, 7.2, -5.3, 3.6], index=["d", "b", "a", "c"])
print(obj)
obj2 = obj.reindex(["a", "b", "c", "d"])
print(obj2)
d    4.5
b    7.2
a   -5.3
c    3.6
dtype: float64
a   -5.3
b    7.2
c    3.6
d    4.5
dtype: float64
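obj = Series([4, 7, -5, 3], index=["d", "b", "a", "c"])
print(obj * 2)
print(obj)  # Purpose of this last statement?
d     8
b    14
a   -10
c     6
dtype: int64
d    4
b    7
a   -5
c    3
dtype: int64
The last print shows that obj * 2 returns a new Series and leaves obj itself unchanged.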
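Arithmetic on a Series produces a new object rather than changing the original. A small sketch of that behavior, assuming a current pandas release (the names doubled and unchanged are illustrative):

```python
import pandas as pd

obj = pd.Series([4, 7, -5, 3], index=["d", "b", "a", "c"])

doubled = obj * 2          # a new Series
unchanged = obj.tolist()   # obj still holds the original numbers

print(doubled.tolist())
print(unchanged)

# To keep the doubled values under the same name, rebind the name explicitly.
obj = obj * 2
print(obj.tolist())
```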
obj = Series([4, 7, -5, 3], index=["d", "b", "a", "c"])
print(obj > 0)
print(obj[obj > 0])
print(obj)
d     True
b     True
a    False
c     True
dtype: bool
d    4
b    7
c    3
dtype: int64
d    4
b    7
a   -5
c    3
dtype: int64
Create a 3×3 numpy array of all True's
np.full((3, 3), True, dtype=bool)
#> array([[ True,  True,  True],
#>        [ True,  True,  True],
#>        [ True,  True,  True]], dtype=bool)
# Alternate method:
np.ones((3, 3), dtype=bool)
Create a Series and assign a dtype?
obj=Series([4,7,-5,3],dtype="int64")
Reindexing a Series?
obj = Series([4.5, 7.2, -5.3, 3.6], index=["d", "b", "a", "c"])
print(obj)
obj2 = obj.reindex(["a", "b", "c", "d"])
print(obj2)
Creating a series?
s = pd.Series([ ])
What is Pandas?
pandas is an open source Python library for data analysis.
Pandas datatypes?
- "object": strings or mixed types
- "float64": takes priority over "int64" when integers and floats are mixed
- "int64"
- "bool" (boolean)
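A quick way to check these dtypes is to let pandas infer them from different inputs. This is a sketch assuming a current pandas release; the variable names are illustrative.

```python
import pandas as pd

ints = pd.Series([4, 7, -5, 3])
mixed_numbers = pd.Series([4, 7.5, -5, 3])   # one float upcasts every value
flags = pd.Series([True, False, True])
mixed = pd.Series([4, 7, -5, 3, "a"])        # no common numeric type

for s in (ints, mixed_numbers, flags, mixed):
    print(s.dtype)
```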
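iloc, and how to import it?
- Selection by position (purely integer-based indexing), for example obj.iloc[0].
- No separate import is needed: iloc is an attribute of every Series and DataFrame, so importing pandas is enough.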
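The difference between position and label is easiest to see on a Series whose labels are not integers. A minimal sketch, assuming a current pandas release; loc is included only for contrast and the variable names are illustrative.

```python
import pandas as pd

obj = pd.Series([4, 7, -5, 3], index=["d", "b", "a", "c"])

first = obj.iloc[0]        # position 0, whatever its label is
by_label = obj.loc["a"]    # label "a", whatever its position is
first_two = obj.iloc[:2]   # positions 0 and 1; the slice end is excluded

print(first, by_label)
print(first_two)
```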
Steps to Creating a DataFrame using a dictionary?
1. Create a dictionary whose keys are the column names:
data = {"State": ["Ohio", "Ohio", "ohio", "Nevada", "Nevada"],
        "year": [2000, 2001, 2002, 2001, 2002],
        "pop": [1.5, 1.7, 3.6, 2.4, 2.9]}
2. Pass the dictionary to DataFrame and store the result in a variable:
frame = DataFrame(data)
Steps to Creating a DataFrame using a Nested dictionary?
1. Create a nested dictionary:
pop = {"Nevada": {2001: 2.4, 2002: 2.9},
       "Ohio": {2000: 1.5, 2001: 1.7, 2002: 3.6}}
2. Pass the dictionary to DataFrame and store the result in a variable:
frame = DataFrame(pop)
DataFrame?
A DataFrame is a tabular data structure made up of rows and columns, akin to a spreadsheet or a database table. You can also think of a DataFrame as a group of Series objects, one per column name, that share the same row index.
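The "group of Series" view can be checked directly: each column pulled out of a DataFrame is a Series carrying the frame's index. A small sketch, assuming a current pandas release; the data is illustrative.

```python
import pandas as pd

frame = pd.DataFrame({"State": ["Ohio", "Nevada"], "pop": [1.5, 2.4]})

state = frame["State"]   # one column, returned as a Series
pop = frame["pop"]

print(type(state))
# Both columns carry the same index as the frame they came from.
print(state.index.equals(pop.index))
print(state.index.equals(frame.index))
```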
Series?
A Series is a one-dimensional object similar to an array, list, or column in a table. Every item in the Series has an index label. By default, the labels run from 0 to N, where N is the length of the Series minus one.
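A short sketch of the default labels next to custom ones, assuming a current pandas release (the names default and labeled are illustrative):

```python
import pandas as pd

default = pd.Series([4, 7, -5, 3])                              # labels 0 to 3
labeled = pd.Series([4, 7, -5, 3], index=["d", "b", "a", "c"])  # custom labels

print(list(default.index))
print(list(labeled.index))
print(labeled["a"])   # look a value up by its label
```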
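pop = {"Nevada": {2001: 2.4, 2002: 2.9},
       "Ohio": {2000: 1.5, 2001: 1.7, 2002: 3.6}}
frame = DataFrame(pop)
print(frame)
      Nevada  Ohio
2000     NaN   1.5
2001     2.4   1.7
2002     2.9   3.6
"Nevada" and "Ohio" are the keys of the outer dictionary and become the column names. The inner dictionaries {2001: 2.4, 2002: 2.9} and {2000: 1.5, 2001: 1.7, 2002: 3.6} are their values. The years are the keys of the inner dictionaries, with the decimals as their values, and become the row labels. Nevada has no entry for 2000, so that cell is NaN. The row order may differ between pandas versions.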
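The mapping from a nested dictionary to columns, row labels, and missing cells can be inspected piece by piece. A minimal sketch, assuming a current pandas release; the row labels are sorted before printing because their order may vary by version.

```python
import pandas as pd

pop = {"Nevada": {2001: 2.4, 2002: 2.9},
       "Ohio": {2000: 1.5, 2001: 1.7, 2002: 3.6}}
frame = pd.DataFrame(pop)

print(list(frame.columns))           # outer keys become columns
print(sorted(frame.index))           # inner keys become row labels
print(frame.loc[2000, "Nevada"])     # Nevada has no 2000 entry, so this is NaN
print(frame.loc[2001, "Ohio"])
```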
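obj = Series([4, 7, -5, 3, "a"], dtype="int64")
print(obj)
Error: the string "a" cannot be converted to int64, so the Series constructor raises a ValueError.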
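The failure can be observed without stopping the script by wrapping the constructor in try/except. This is a sketch assuming a current pandas release; TypeError is also caught as a precaution in case a version reports the failed conversion differently, and the flag name failed is illustrative.

```python
import pandas as pd

failed = False
try:
    # "a" cannot be represented as int64, so the constructor is expected to fail.
    pd.Series([4, 7, -5, 3, "a"], dtype="int64")
except (ValueError, TypeError) as exc:
    failed = True
    print("conversion failed:", exc)

# Leaving dtype out lets pandas fall back to the object dtype instead.
fallback = pd.Series([4, 7, -5, 3, "a"])
print(fallback.dtype)
```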
Import numpy as `np` and print the version number.
import numpy as np
print(np.__version__)
#> 1.13.3
Modules for Importing Pandas?
import pandas as pd
import numpy as np
from pandas import Series, DataFrame