Python Programming II- Midterm
matplotlib
Line plot (the default): dataset_1 = anscombe[anscombe['dataset'] == 'I'] plt.plot(dataset_1['x'], dataset_1['y']) Points only (pass 'o' as the format string): plt.plot(dataset_1['x'], dataset_1['y'], 'o')
working with mutable vs immutable args
#immutable The double_the_number() function def double_the_number(value) : value = value * 2 # new int object created return value # new int object must be returned ◦ The calling code in the main() function value1 = 25 # int object created value2 = double_the_number(value1) print(value1) # 25 print(value2) # 50 #mutable The add_to_list() function def add_to_list(list, item): list.append(item) # list object changed ◦ The calling code in the main() function # list object created inventory = ["staff", "hat", "bread"] add_to_list(inventory, "robe") print(inventory) # ["staff","hat","bread","robe"] # NOTE: no need to return list object
strings
- can be enclosed in single or double quotes - multi line strings uses triple quotes - string literals: print('python is cool')
it makes sense to use inheritance when...
- one object is a type of another object - both classes are part of the same logical domain - the subclass primarily adds features to the superclass
Python will sort the strings that follow in this sequence: Peach peach 1peach 10Peaches
10Peaches, 1peach, Peach, peach
relational operators
== equal to != not equal to > greater than < less than >= greater than or equal to <= less than or equal to
chaining decorators
@star @percent def printer(msg): print(msg) is equivalent to def printer(msg): print(msg) printer = star(percent(printer)) ***************** %%%%%%%%%% Hello %%%%%%%%%% ******************
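A minimal runnable sketch of the chaining above. The bodies of star and percent are assumptions (the card only shows their output), and the line width of 30 is arbitrary.

```python
def star(func):
    """Print a line of asterisks before and after calling func."""
    def inner(*args, **kwargs):
        print("*" * 30)
        func(*args, **kwargs)
        print("*" * 30)
    return inner


def percent(func):
    """Print a line of percent signs before and after calling func."""
    def inner(*args, **kwargs):
        print("%" * 30)
        func(*args, **kwargs)
        print("%" * 30)
    return inner


@star       # applied second (outermost wrapper)
@percent    # applied first (innermost wrapper)
def printer(msg):
    print(msg)


printer("Hello")
```

The decorator closest to the def is applied first, so the stars end up on the outside.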
boxplot
Bivariate: ◦A box plot shows multiple statistics: the minimum, first quartile, median, third quartile, maximum, and, if applicable, outliers based on the interquartile range. boxplot = plt.figure() axes1 = boxplot.add_subplot(1,1,1) axes1.boxplot( [tips[tips['sex'] == 'Female']['tip'], tips[tips['sex'] == 'Male']['tip']], labels = ['Female', 'Male']) axes1.set_title('Boxplot of Tips by Sex') axes1.set_xlabel('Sex') axes1.set_ylabel('Tip') boxplot.show()
violin plot
Bivariate: ◦Boxplots are a classical statistical visualization, but they can obscure the underlying distribution of the data. ◦Violin plots are able to show the same values as a boxplot, but plot the "boxes" as a kernel density estimation.
hexbin plot
Bivariate: ◦Scatterplots are great for comparing two variables. However, sometimes there are too many points for a scatterplot to be meaningful. ◦Just as histograms can bin a variable to create a bar, so hexbin can bin two variables. ◦ A hexagon is used for this purpose because it is the most efficient shape to cover an arbitrary 2D surface.
scatterplot
Bivariate: ◦Scatterplots are used when a continuous variable is plotted against another continuous variable. scatter_plot = plt.figure() axes1 = scatter_plot.add_subplot(1,1,1) axes1.scatter(tips['total_bill'], tips['tip']) axes1.set_title('Scatterplot of Total Bill vs Tip') axes1.set_xlabel('Total Bill') axes1.set_ylabel('Tip') scatter_plot.show()
scatter plot bivariate
Bivariate: ◦There are a few ways to create a scatterplot in seaborn. ◦ There is no explicit function named scatter. Instead, we use regplot. It will plot a scatterplot and also fit a regression line
2d density plot
Bivariate: ◦You can also create a 2D kernel density plot. ◦This kind of process is similar to how sns.kdeplot works, except it creates a density plot across two variables.
deepcopy()
How to make a deep copy of a list import copy list_one = [1, 2, 3, 4, 5] list_two = copy.deepcopy(list_one) list_two[1] = 4 print(list_one) # [1, 2, 3, 4, 5] print(list_two) # [1, 4, 3, 4, 5]
IDE
Integrated Development Environment- makes program management easier
logical operators order of precedence
not, then and, then or (highest to lowest precedence)
SciPy
Python-based ecosystem of open-source software for mathematics, science, and engineering.
rug plot
Univariate ◦Rug plots are a one-dimensional representation of a variable's distribution. ◦They are typically used with other plots to enhance a visualization.
histograms
Univariate: ◦Histograms are the most common means of looking at a single variable. ◦The values are "binned," meaning they are grouped together and plotted to show the distribution of the variable. fig = plt.figure() axes1 = fig.add_subplot(1,1,1) axes1.hist(tips['total_bill'], bins=10) axes1.set_title('Histogram of Total Bill')
how to define list of lists
With 3 rows and 4 columns students = [ ["Joel", 85, 95, 70], ["Anne", 95, 100, 100], ["Mike", 77, 70, 80, 85] ] ◦ With 3 rows and 3 columns movies = [ ["The Holy Grail", 1975, 9.99], ["Life of Brian", 1979, 12.30], ["The Meaning of Life", 1983, 7.50] ]
escape sequences
\n: new line \t: tab \r: carriage return \": double quotation mark \': single quotation mark \\: backslash
what is playbook?
a single YAML file play- defines a set of activities(tasks) to be run on hosts task- an action to be performed on the host - execute command list of dictionaries, each play is a dictionary, each dictionary has a list of properties(name, hosts, tasks), tasks is a list, lists are ordered collections
ansible playbook
adds a user, create the hosts and tasks user is a module written for you in python, when you push it's created - hosts: all_my_web_servers_in_DR tasks: - user: name: johndoe
run ping command ansible
ansible target1 -m ping -i inventory.txt calls that text file because it stores the passwords result: target1 | SUCCESS => { "ansible_facts": { "discovered_interpreter_python": "/usr/bin/python3" }, "changed": false, "ping": "pong" }
run playbook
ansible-playbook playbook.yaml ansible-playbook --help
len(list)
builtin function for getting the length of a list
multivariate data
additional variables can be shown as different colors on the same plot; here the recoded sex column is passed to scatter() through the c argument def recode_sex(sex): if sex == 'Female': return 0 else: return 1 tips['sex_color'] = tips['sex'].apply(recode_sex) scatter_plot = plt.figure() axes1 = scatter_plot.add_subplot(1,1,1) axes1.scatter(tips['total_bill'], tips['tip'], c=tips['sex_color']) axes1.set_title('Scatterplot of Total Bill vs Tip') axes1.set_xlabel('Total Bill') axes1.set_ylabel('Tip') scatter_plot.show()
yaml format
can have key:value pair, array lists, and dictionaries
pandas: exporting and importing values
can use pickle to save to binary format scientists.to_pickle('../output/scientists_df.pickle') scientist_names_from_pickle = pd.read_pickle('../output/scientists_names_series.pickle') scientists_from_pickle = pd.read_pickle('../output/scientists_df.pickle') can also save to csv in similar way
count()
count = numlist.count(14) # 2
controller
create one computer as ansible, push to all other computers, that one computer is the controller that can do things like installing security patches, adding user, reset admin password, etc
@property
decorator that lets us make a method behave like an attribute. ex) @property def fullname(self): return "%s %s" % (self.name, self.surname)
constructor
def __init__(self[, parameters]): self.attrname1 = attrValue1 self.attrName2 = attrValue2
make_pretty()
def make_pretty(func): def inner(): print("I got decorated") func() return inner can use the @ symbol along with the name of the decorator function and place it above the definition of the function to be decorated.
pandas: grouped and aggregated cols
df.groupby('year')['lifeExp'].mean() split data into parts by year, then get the lifeexp col and calc the mean
pandas: subsetting rows
df.loc[0] - selects the row whose index label is 0 (the first row when the default integer index is used) and returns that record and its features. df.iloc[0] - selects by integer position instead of by label, so df.iloc[-1] gets the last row
sorted()
foodlist = ["orange", "apple", "Pear", "banana"] How to use the key argument to fix the sort order sorted_foodlist = sorted(foodlist, key=str.lower) print(sorted_foodlist) # ["apple", "banana", "orange", "Pear"]
sort()
foodlist = ["orange", "apple", "Pear", "banana"] foodlist.sort(key=str.lower) # ["apple", "banana", "orange", "Pear"]
loop thru rows and cols of a 2d list
for movie in movies: for item in movie: print(item, end = " | ") print()
import class
from module_name import ClassName1
recursion
function that calls itself in a kind of loop, needs a base case/exit condition
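A small sketch of the idea, using factorial as the example (the choice of function is an assumption, not something from the course notes):

```python
def factorial(n):
    """Return n! for a non-negative integer n."""
    if n <= 1:                       # base case: stops the recursion
        return 1
    return n * factorial(n - 1)      # recursive case: the function calls itself


print(factorial(5))  # 120
```

Without the base case, the calls would continue until Python raises a RecursionError.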
tuple
how to create: - mytuple = (item1, item2, ...) # a tuple of 5 floating-point numbers stats = (48.0, 30.5, 20.2, 100.0, 48.0) # a tuple of 6 strings herbs = ("lavender", "pokeroot", "chamomile", "valerian", "nettles", "oatstraw") # a tuple that stores the data for a movie movie = ("Monty Python and the Holy Grail", 1975, 9.99) to access items: herbs[0] herbs[1:4]
choice() and shuffle()
import random numlist = [5, 15, 84, 3, 14, 2, 8, 10, 14] choice = random.choice(numlist) # gets random item random.shuffle(numlist) # shuffles items randomly
index()
index(item) i = inventory.index("hat") #1
input() function
input([prompt]) e.g.: first_name = input("Enter your first name: ")
console input/output
input: use the input() function output: use the print() function
check ip address on controller
ip a use to connect to controller on ansible
how to add to list of lists
movies = [ ["The Holy Grail", 1975, 9.99], ["Life of Brian", 1979, 12.30] ] movie = [] # Create empty movie movie.append("The Meaning of Life") # Add name to movie movie.append(1983) # Add year to movie movie.append(7.5) # Add price to movie movies.append(movie) # Add movie to movies
writing to CSV file
movies = [["Monty Python and the Holy Grail", 1975], ["Cat on a Hot Tin Roof", 1958], ["On the Waterfront", 1954]] ◦How to import the CSV module import csv ◦How to write the list to a CSV file with open("movies.csv", "w", newline="") as file: writer = csv.writer(file) writer.writerows(movies)
round() function
mpg = round(miles_driven / gallons_used, 2)
list syntax
mylist = [item1, item2, ...] Code that creates lists temps = [48.0, 30.5, 20.2, 100.0, 42.0] # 5 float values inventory = ["staff", "hat", "shoes"] # 3 str values movie = ["The Holy Grail", 1975, 9.99] # str, int, float test_scores = [] # an empty list
slice a list
mylist[start:end:step] Code that slices with the start and end arguments numbers = [52, 54, 56, 58, 60, 62] numbers[0:2] # [52, 54] numbers[:2] # [52, 54] numbers[4:] # [60, 62] Code that slices with the step argument numbers[0:4:2] # [52, 56] numbers[::-1] # [62, 60, 58, 56, 54, 52]
playbook.yaml
- name: Test connectivity to the target servers hosts: all tasks: - name: Ping test ping:
lambda with map()
num1 = [4, 5, 6] num2 = [5, 6, 7] result = map(lambda n1, n2: n1+n2, num1, num2) print(list(result)) result: [9, 11, 13] (added two lists together by using a function and applying it to lists while mapping it)
min() and max()
numlist = [5, 15, 84, 3, 14, 2, 8, 10, 14, 25] minimum = min(numlist) # 2 maximum = max(numlist) # 84
reverse()
numlist.reverse()
create object of a class
objectName = ClassName([parameters]) e.g.: product1 = Product('Stanley 13 Ounce Wood Hammer', 12.99, 62)
access attrs of object
objectName.attributeName ex) product1.discountPercent = 40
open file in write mode and close file manually
outfile = open("test.txt", "w") outfile.write("Test") outfile.close()
melt()
pew_long = pd.melt(pew, id_vars='religion') Pandas has a function called melt that will reshape the dataframe into a tidy format. melt takes a few parameters: §id_vars is a container (list, tuple, ndarray) that represents the variables that will remain as is. §value_vars identifies the columns you want to melt down (or unpivot). By default, it will melt all the columns not specified in the id_vars parameter. §var_name is a string for the new column name when the value_vars is melted down. By default, it will be called variable. § value_name is a string for the new column name that represents the values for the var_name. By default, it will be called value.
The feature of inheritance that allows an object of a subclass to be treated as if it were an object of the superclass is known as
polymorphism
pop()
pop([index]) ◦inventory = ["staff", "hat", "robe", "bread"] ◦item = inventory.pop() # item = "bread" # inventory = ["staff", "hat", "robe"] ◦item = inventory.pop(1) # item = "hat" # inventory = ["staff", "robe"]
using chaining to get value in 1 stmt
price = float(input("Enter the price: "))
print() function
print([data])
builtin functions
python contains thousands of modules that contain many funcs that you can use in your programs, use 'import' to import the module
raise statement
raise ExceptionName("Error Message") raising a ValueError exception: raise ValueError("Invalid value") you can raise an exception for the Exception class or any class that's a child class of the Exception class.
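A short sketch of raising and then catching a ValueError. The get_price() helper and its rule that prices can't be negative are hypothetical, made up for illustration.

```python
def get_price(text):
    """Convert text to a price, rejecting negative values (hypothetical rule)."""
    price = float(text)          # float() raises ValueError itself for non-numeric text
    if price < 0:
        raise ValueError("Price can't be negative")
    return price


try:
    get_price("-5")
except ValueError as e:
    print("Error:", e)           # Error: Price can't be negative
```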
range() function
range(stop) range(start, stop[, step]) Examples of the range() function range(5) # 0, 1, 2, 3, 4 range(1, 6) # 1, 2, 3, 4, 5 range(2, 10, 2) # 2, 4, 6, 8 range(5, 0, -1) # 5, 4, 3, 2, 1 - for i in range(5)
reader() in CSV
reader(file) ◦How to read data from a CSV file with open("movies.csv", newline="") as file: reader = csv.reader(file) for row in reader: print(row[0] + "(" + str(row[1]) + ")")
ValueError exception
reasons: can't convert the data argument into an int/float value syntax: try: statements except [ExceptionName]: statements
missing data
recode/replace - use the fillna method to recode missing values to another value fill forward - ebola.fillna(method='ffill').iloc[0:10, 0:5] fill backward - ebola.fillna(method='bfill').iloc[0:10, 0:5] interpolate - uses existing vals to fill in missing vals - fills in linearly, treats missing values as if they should be equally spaced apart - ebola.interpolate() drop missing values - use dropna(): ebola.dropna()
concatenation
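row_concat = pd.concat([df1, df2, df3]) concatenates several objects at once. To add a single object, older pandas versions offered append(); it was removed in pandas 2.0, where pd.concat() covers that case too. Pass ignore_index=True to reset the row index after concatenation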
pandas: dropping values
scientists_dropped = scientists.drop(['Age'], axis=1)
str() function
str(data) used to join a number to other strings e.g.: ◦name = "Bob Smith" ◦age = 40 ◦message = name + " is " + str(age) + " years old"
datatypes
string: "Mike" int: 1 2 3 float: 21.9
strip()
strips whitespace from string
dictionary
syntax: dictionary_name = {key1:value1, key2:value2}
create inventory file on ansible
target1 ansible_host=192.168.1.42 ansible_user=shakour ansible_ssh_pass=whatever target2 ansible_host=192.168.1.52 ansible_user=shakour ansible_ssh_pass=whatever target3 ansible_host=192.168.1.45 ansible_user=shakour ansible_ssh_pass=whatever
@property
temperature = property(get_temperature, set_temperature) creates a property object named temperature. Simply put, property attaches some code (get_temperature and set_temperature) to accesses of the member attribute (temperature). Any code that retrieves the value of temperature will automatically call get_temperature() instead of doing a dictionary (__dict__) look-up. Similarly, any code that assigns a value to temperature will automatically call set_temperature(). property() is a built-in function that creates and returns a property object. The property object has three methods, getter(), setter(), and deleter(), which specify fget, fset, and fdel; they can be used as decorators
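A minimal sketch of property() in use. The Celsius class name, the _temperature backing attribute, and the range check are assumptions added for illustration.

```python
class Celsius:
    def __init__(self, temperature=0):
        self.temperature = temperature        # goes through set_temperature()

    def get_temperature(self):
        return self._temperature              # value lives in a "private" attribute

    def set_temperature(self, value):
        if value < -273.15:
            raise ValueError("Temperature below -273.15 is not possible")
        self._temperature = value

    # Attach the getter and setter to the attribute name "temperature"
    temperature = property(get_temperature, set_temperature)


c = Celsius(25)
c.temperature = 30        # calls set_temperature(30)
print(c.temperature)      # calls get_temperature() and prints 30
```

The same class could be written with @property and @temperature.setter instead of the explicit property() call.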
unzipping zip()
the * operator can be used in conjunction with zip() to unzip the list zip(*zippedList)
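A quick sketch of zipping and then unzipping; the list contents are made up for illustration.

```python
names = ["staff", "hat", "bread"]
prices = [12.5, 3.0, 1.25]

zipped = list(zip(names, prices))     # [("staff", 12.5), ("hat", 3.0), ("bread", 1.25)]

# The * operator unpacks the pairs as separate arguments to zip(),
# which regroups them into one tuple of names and one tuple of prices.
unzipped_names, unzipped_prices = zip(*zipped)

print(unzipped_names)     # ('staff', 'hat', 'bread')
print(unzipped_prices)    # (12.5, 3.0, 1.25)
```

Note that unzipping returns tuples, not the original lists.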
date and time
to import: from datetime import date, time, datetime to use: date.today() datetime.now()
density plot
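univariate: den, ax = plt.subplots() ax = sns.distplot(tips['total_bill'], hist=False) ax.set_title('Total Bill Density') ax.set_xlabel('Total Bill') ax.set_ylabel('Unit Probability') plt.show() (distplot is deprecated in recent seaborn releases; sns.kdeplot(tips['total_bill']) draws the same kind of density curve)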
format()
used to format numbers into things like prices, percentages, etc. print("{:.2f}".format(fp_number))
1-to-1 merge
visited_subset = visited.loc[[0, 2, 6]] c2c_merge = site.merge(visited_subset, left_on='name', right_on='site')
use loop to read each line of file
with open("members.txt") as file: for line in file: print(line, end="") print() ◦How to read the entire file as a string with open("members.txt") as file: contents = file.read() print(contents)
read entire file as a list
with open("members.txt") as file: members = file.readlines() print(members[0], end="") print(members[1]) ◦How to read each line of the file with open("members.txt") as file: member1 = file.readline() print(member1, end="") member2 = file.readline() print(member2)
write()
write(str) ◦How to write one line to a text file with open("members.txt", "w") as file: file.write("John Cleese\n") ◦How to append one line to a text file with open("members.txt", "a") as file: file.write("Eric Idle\n")
block structure
•A Code Block (or just simply a Block) is a set of statements grouped together as a section of the code. Blocks can also be nested within other blocks. •The creation of a code block requires a specific syntax to mark the beginning and the end of the block. Different languages use different syntax to denote the beginning and the end. Some example would be the use of the keywords "begin" and "end" as in ALGOL and SQL, while others use the curly braces "{" and "}". •Unlike most other programming languages, Python uses indentation to signify its block structure instead of brackets and braces. •The Python style guidelines recommend four spaces per level of indentation, and only spaces (no tabs). However, Python will work fine with any number of spaces or with tabs, providing that the indentation used is consistent.
statement
•A statement is the smallest unit of code that the Python interpreter can execute. •Example: •print('hello, world') •x = 1 •A program of Python usually contains an ordered sequence of statements. The interpreter will execute the statements one after another, top to bottom, just as you normally read a page in English.
class
•A user-defined prototype for an object that defines a set of attributes that characterize any object of the class. The attributes are data members (class variables and instance variables) and methods, accessed via dot notation.
value
•A value is one of the basic things a program works with, like a letter or a number. •For example, the value 'Hello, World!' is a string; it's called that because it contains a "string", or sequence, of letters. •You (and the interpreter) can identify strings because they are enclosed in quotation marks. •Note that you can use both single and double quotes to enclose a Python string. •If you are not sure what the type of a specific value is, you can use the built-in function type(). •print(type('hello, world')) •The statement above will print: •<class 'str'>
variable
•A variable is a storage location paired with an associated symbolic name (an identifier), which contains information referred to as a value. •Another way to think of a variable is as a named storage location in memory. •Variables in Python are dynamically-typed where the type of a variable is the type of the value it refers to. •Variables are created (defined) and initialized in the same statement. •If a variable is not "defined" (assigned a value), trying to use it will give you an error: NameError: name 'n' is not defined.
function arguments- arbitrary arguments
•An Argument can be defined as a tuple (*t), allowing variable number of values to be passed in the function call. Before any arbitrary variable definition, zero or more normal arguments can be passed.
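A small sketch of an arbitrary-argument parameter after one normal parameter. The total_score() function and its values are hypothetical.

```python
def total_score(name, *scores):
    """name is a normal parameter; scores collects any extra values in a tuple."""
    return name, sum(scores)


print(total_score("Joel", 85, 95, 70))   # ('Joel', 250)
print(total_score("Anne"))               # ('Anne', 0) since scores is an empty tuple
```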
expression
•An expression is a combination of values, variables, and operators. A value all by itself is considered an expression, and so is a variable. 17 x x + 17 •Every expression is a statement, but not every statement is an expression.
decorators
•Decorators add functionality to existing code. •This is also called metaprogramming, because one part of the program modifies another part of the program (in Python, the decorator runs when the decorated function is defined). •Everything in Python (Yes! Even classes) is an object. •Functions are no exception; they are objects too (with attributes). •Various different names can be bound to the same function object. •Functions can be passed as arguments to another function. •Furthermore, a function can return another function.
function arguments- default values
•Function arguments can have a default value assigned in the function definition. •Arguments with a default value are optional, so a function with default values can be called with fewer arguments than it is defined to allow.
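A minimal sketch of a default value; the greet() function is made up for illustration.

```python
def greet(name, greeting="Hello"):
    """greeting is optional because it has a default value."""
    return f"{greeting}, {name}!"


print(greet("Anne"))                      # Hello, Anne!
print(greet("Anne", greeting="Welcome"))  # Welcome, Anne!
```

Parameters with defaults have to come after the parameters without them in the function header.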
functions
•Function is a named sequence of statements that performs a computation. When you define a function, you specify the name and the sequence of statements. Later, you can "call" the function by name. •It is common to say that a function "takes" an argument and "returns" a result. The result is called the return value. •To create a new function, we use the keyword def followed by the function name, then the argument(s) then a colon. •Function name follows the same rules as variable name •The first line of the function definition is called the header; the rest is called the body. def circle_area(radius): •Inside the function, the arguments are assigned to variables called parameters.
matplotlib
•Matplotlib is a Python 2D plotting library. •Produces publication quality figures in a variety of hardcopy formats and interactive environments across platforms. •Matplotlib can be used in: •Python scripts, •the Python and IPython shells, •the Jupyter notebook, •web application servers, and •four graphical user interface toolkits.
Numpy
•NumPy is the fundamental package for scientific computing with Python. •It contains among other things: •a powerful N-dimensional array object •sophisticated (broadcasting) functions •tools for integrating C/C++ and Fortran code •useful linear algebra, Fourier transform, and random number capabilities •NumPy can also be used as an efficient multi-dimensional container of generic data. Arbitrary data-types can be defined. This allows NumPy to seamlessly and speedily integrate with a wide variety of databases.
operators
•Operators are special symbols that represent computations like addition and multiplication. •The values the operator is applied to are called operands. •The operators +, -, *, / and ** perform addition, subtraction, multiplication, division and exponentiation, as in the following examples: hours = 1 minutes = 50 51 + 74 hours += 1 minutes = hours*60 + minutes minutes /= 60 5**2 (5+9)*(15+7) •In addition, the modulus operator % returns the remainder of a division.
arguments/parameters
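•Parameters used inside a function are local. •A parameter's scope is the function itself: it can be used only inside the function. •Arguments are passed as object references; because basic types such as int, float, and str are immutable, they behave as if they were passed by value.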
pandas
•Python Data Analysis Library •BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.
SymPy
•SymPy is a Python library for symbolic mathematics. •It aims to become a full-featured computer algebra system (CAS) while keeping the code as simple as possible in order to be comprehensible and easily extensible. •SymPy is written entirely in Python. •Lightweight: SymPy only depends on mpmath, a pure Python library for arbitrary floating point arithmetic, making it easy to use.
functions(cont.)
•The function definition is followed by the function body statements, indented to mark the function's scope def circle_area(radius): return 3.14*(radius**2) •To call the function, we just use the function name and pass the desired arguments: print(circle_area(4)) •If the function has a return value, then we can assign the value to a variable: x = circle_area(6) •There is a difference between the function type and the function return type:
map()
•The map() function applies a given function to each item of an iterable and returns an iterator (a map object) that yields the results. •The returned map object can then be passed to functions like list() (to create a list), set() (to create a set) and so on.
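A short sketch showing what map() returns in Python 3; the numbers are arbitrary.

```python
numbers = [1, 2, 3, 4]

squares = map(lambda n: n ** 2, numbers)   # a map object, nothing computed yet
print(type(squares))                        # <class 'map'>

squares_list = list(squares)                # consumes the map object
print(squares_list)                         # [1, 4, 9, 16]

leftover = list(squares)                    # the map object is now exhausted
print(leftover)                             # []
```

Because the map object can be consumed only once, convert it to a list or set if the results are needed more than once.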
return value from zip()
•The zip() function returns an iterator of tuples based on the iterables passed to it. •If no parameters are passed, zip() returns an empty iterator. •If a single iterable is passed, zip() returns an iterator of 1-tuples, meaning that each tuple contains one element. •If multiple iterables are passed, the ith tuple contains the ith element from each iterable. •Suppose two iterables are passed, one containing 3 elements and the other containing 5. Then the returned iterator has 3 tuples, because the iterator stops when the shortest iterable is exhausted.
zip()
•The zip() function takes iterables (zero or more), makes an iterator that aggregates elements from each of the iterables passed, and returns that iterator of tuples. •zip(*iterables) •zip() Parameters •iterables - can be built-in iterables (like: list, string, dict), or •user-defined iterables (objects that have an __iter__ method).
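A small sketch of zip() with zero, one, and two iterables; the sample lists are made up.

```python
letters = ["a", "b", "c"]
numbers = [1, 2, 3, 4, 5]

pairs = list(zip(letters, numbers))   # stops at the shorter iterable
print(pairs)                          # [('a', 1), ('b', 2), ('c', 3)]

singles = list(zip(letters))          # one iterable gives 1-tuples
print(singles)                        # [('a',), ('b',), ('c',)]

empty = list(zip())                   # no iterables gives an empty iterator
print(empty)                          # []
```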
operators(cont)
•Floating point is fully supported; operators with mixed-type operands convert the integer operand to floating point. •Prior to Python 3.x, the division operator / performed floor division if both values were integers print(30/60) # 3.x => 0.5, 2.7 => 0 print(30.0/60.0) # 0.5 •In Python 3 the operator // is used for floor division print(10/4) # 2.5 print(10//4) # 2 •Operators can be overloaded in Python (some libraries already have overloaded operators; the built-in set type uses | and & for union and intersection). •The operator ** is used to calculate the power, whereas the operator ^ is used as bitwise XOR
loops and iterations
•Two main pattern of loops: for and while •"for" loops are traditionally used when you have a piece of code which you want to repeat n number of times. On the other hand, "while" loop is used when a condition is to be met, or if you want a piece of code to repeat forever. •In an infinite while loop, you can use "continue" and "break" to control the loop exit strategy.
variables(cont.)
•Variable name can be created with a combination of letters, numbers, and underscore (_): •Rules: ◦It must start with a letter or underscore, but can't start with a number. ◦The remainder of the variable name may consist of letters, numbers and underscores ◦Names are case sensitive ◦It can't be a keyword •Conventions: ◦It's a good practice to start the variable with a lowercase letter ◦Use underscore to separate multiple words in the variable name (instead of CamelCase) ◦Variable name should be descriptive, meaningful, and indicates the variable usage •window_size_x = 500 •employee_name = 'john'
decorators: __call__()
•any object which implements the special method __call__() is termed callable. • •So, in the most basic sense, a decorator is a callable that returns a callable. • •Basically, a decorator takes in a function, adds some functionality and returns it.
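A sketch of a class used as a decorator through __call__(). The CountCalls class and the add() function are hypothetical names chosen for illustration.

```python
class CountCalls:
    """Callable wrapper that counts how often the wrapped function runs."""

    def __init__(self, func):
        self.func = func
        self.calls = 0

    def __call__(self, *args, **kwargs):
        # Instances are callable because the class defines __call__()
        self.calls += 1
        return self.func(*args, **kwargs)


@CountCalls                 # same as: add = CountCalls(add)
def add(a, b):
    return a + b


add(1, 2)
add(3, 4)
print(add.calls)            # 2
print(callable(add))        # True
```

Here the decorator is a callable (the class) that returns another callable (the instance).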
bitwise operators
•x << y : Returns x with the bits shifted to the left by y places (and new bits on the right-hand-side are zeros). This is the same as multiplying x by 2**y. •x >> y : Returns x with the bits shifted to the right by y places. This is the same as integer (floor) division of x by 2**y. •x & y : "bitwise AND". Each bit of the output is 1 if the corresponding bit of x AND of y is 1, otherwise it's 0. •x | y : "bitwise OR". Each bit of the output is 0 if the corresponding bit of x AND of y is 0, otherwise it's 1. •~ x : Returns the complement of x - the number you get by switching each 1 for a 0 and each 0 for a 1. This is the same as (-x - 1). •x ^ y : "bitwise exclusive or". Each bit of the output is the same as the corresponding bit in x if that bit in y is 0, and it's the complement of the bit in x if that bit in y is 1.
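A quick sketch that evaluates each operator on two sample values; x = 12 and y = 10 are arbitrary choices.

```python
x = 0b1100   # 12
y = 0b1010   # 10

results = {
    "x << 2": x << 2,   # 48, same as 12 * 2**2
    "x >> 2": x >> 2,   # 3, same as 12 // 2**2
    "x & y": x & y,     # 8  (0b1000)
    "x | y": x | y,     # 14 (0b1110)
    "x ^ y": x ^ y,     # 6  (0b0110)
    "~x": ~x,           # -13, same as -x - 1
}

for expression, value in results.items():
    print(f"{expression:8} = {value:4d} ({bin(value)})")
```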
class variable
•A variable that is shared by all instances of a class. Class variables are defined within a class but outside any of the class's methods. Class variables are not used as frequently as instance variables are.
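A minimal sketch of a class variable next to an instance variable. The Product class here is a simplified, hypothetical version, not the one from the course.

```python
class Product:
    count = 0                      # class variable: one copy shared by all instances

    def __init__(self, name):
        self.name = name           # instance variable: one copy per object
        Product.count += 1         # update the shared counter through the class


p1 = Product("hammer")
p2 = Product("saw")

print(Product.count)   # 2
print(p1.count)        # 2, instances see the shared value
print(p1.name)         # hammer
print(p2.name)         # saw
```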
Naming variables
◦A variable must begin with a letter or underscore. ◦A variable name can't contain spaces, punctuation, or special character other than the underscore. ◦A variable name can't begin with a number, but can use numbers later in the name. ◦A variable name can't be the same as a keyword that's reserved by Python. underscore notation: variable_name camel case: variableName
count plot(bar plot)
◦Bar plots are very similar to histograms, but instead of binning values to produce a distribution, bar plots can be used to count discrete variables Bivariate: ◦Bar Plots can also be used to show multiple variables. ◦By default, barplot will calculate a mean, but you can pass any function into the estimator parameter.
dataframe
◦DataFrame can be thought of as a dictionary of Series objects. ◦Dictionaries are the most common way of creating a DataFrame. ◦Before Python 3.7, the column order was not guaranteed because dictionaries were not ordered; from Python 3.7 on, dictionaries preserve insertion order. ◦On older versions, use OrderedDict from the collections module if the order matters. scientists = pd.DataFrame({ 'Name': ['Rosaline Franklin', 'William Gosset'], 'Occupation': ['Chemist', 'Statistician'], 'Born': ['1920-07-25', '1876-06-13'], 'Died': ['1958-04-16', '1937-10-16'], 'Age': [37, 61]}) print(scientists)
Syntax for calling any function
◦Function_name ([arguments])
pickle
◦How to import the pickle module import pickle ◦How to write an object to a binary file with open("movies.bin", "wb") as file: # write binary pickle.dump(movies, file)
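A sketch of the full round trip, writing with pickle.dump() and reading back with pickle.load(). Writing into a temporary directory is an assumption made to keep the example self-contained; the notes write to movies.bin in the current folder.

```python
import os
import pickle
import tempfile

movies = [["The Holy Grail", 1975, 9.99],
          ["Life of Brian", 1979, 12.30]]

# Temporary location so the example doesn't leave files behind in the project
path = os.path.join(tempfile.mkdtemp(), "movies.bin")

with open(path, "wb") as file:       # write binary
    pickle.dump(movies, file)

with open(path, "rb") as file:       # read binary
    loaded = pickle.load(file)

print(loaded == movies)              # True
```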
2 ways to continue 1 stmt over 2 or more lines
◦Implicit continuation print("Total Score: " + str(score_total) + "\nAverage Score: " + str(average_score)) ◦Explicit continuation print("Total Score: " + str(score_total) \ + "\nAverage Score: " + str(average_score)) ◦With implicit continuation, you can divide statements after parentheses, brackets and braces, and before or after operators like plus or minus signs. With explicit continuation, you can use the \ character to divide statements anywhere in a line
Python Origins
◦Invented in early 90's by Guido van Rossum in the Netherlands ◦Named after Monty Python (Flying Circus) ◦Open-source, general purpose, high-level, multi-paradigm programming language ◦Fully supports object-oriented programming (OOP) as well as structured programming ◦Highly extensible language •In this course we will be using CPython version 3.7.0.
series
◦Pandas Series is a one-dimensional container, similar to the built-in Python list. ◦It is the data type that represents each column of the DataFrame. All values in a column must be of the same dtype. The easiest way to create a Series is to pass in a Python list s = pd.Series(['Wes', 'Creator'], index=['Person', 'Who'])
Why python is a great 1st language
◦Python has a simple syntax that's easier to read and use than most other languages. ◦Python has most of the features of traditional programming languages. As a result, you can use Python to learn concepts and skills that apply to those languages too. ◦Python supports the development of a wide range of programs, including games , web applications, and system administration. ◦Python is used by many successful companies, including Google, IBM, Disney, and EA Games. As a result, knowing Python is a valuable skill. ◦Python is open source. There are many advantages to being open source.
Indentation coding rules
◦Python relies on proper indentation. Incorrect indentation causes an error. ◦The standard indentation is four spaces.
How python compiles and runs code
◦Step 1 The Programmer uses a text editor or IDE to enter and edit the source code. Then, the programmer saves the source code to a file with a .py extension. ◦Step 2 The source code is compiled by the Python interpreter into bytecode. ◦Step 3 The bytecode is translated by the Python virtual machine into instructions that can interact with the operating systems of the computer.
pandas: data
◦The quintessential example for creating visualizations of data is Anscombe's quartet. This data set was created by English statistician Frank Anscombe to show the importance of statistical graphs. ◦The Anscombe data set contains four sets of data, each of which contains two continuous variables ◦Each set has the same mean, variance, correlation, and regression line.
how to use with stmts to open and close files
◦The syntax of the with statement for file I/O with open(file, mode) as file_object: statements ◦Code that opens a text file in write mode and automatically closes it with open("test.txt", "w") as outfile: outfile.write("Test") ◦Code that opens a text file in read mode and automatically closes it with open("test.txt", "r") as infile: print(infile.readline())
how to code literal values
◦To code literal values for a string, enclose the characters of the string in single or double quotation marks. This is called a string literal. ◦To code a literal value for a number , code the number without quotation marks. This is called a numeric literal.
How to use shell
◦To test a statement, type it at the prompt and press the Enter key. You can also type the name of a variable at the prompt to see what its value is. ◦Any variables that you create remain active for the current session. As a result, you can use them in statements that you enter later in the same session. ◦To retype your previous entry, press Alt+p (Windows) or Command+p (OS X). ◦To cycle through all of the previous entries, continue pressing the Alt+p (Windows) or Command+p (OS X) keystroke until the entry you want is displayed at the prompt.
UML diagramming
◦UML (Unified Modeling Language) is the industry standard used to describe the classes and objects of an object-oriented application. ◦A UML class diagram describes the attributes and methods of one or more classes.
Comment guidelines
◦Use comments to describe portions of code that are hard to understand, but don't overdo them. ◦Use comments to comment out (or disable) statements that you don't want to test. ◦If you change the code that's described by comments, change the comments too.
How disk storage and main memory work together
◦When you start the computer, it loads the operating system into main memory. Then, you use the features of the operating system to start an application. ◦When you start an application, the operating system loads it into main memory. Then, it runs the application. ◦As the application runs, it may read data from disk storage into main memory or write data from main memory to disk storage.
append(), insert(), remove()
◦append(item) ◦insert(index, item) ◦remove(item) ◦stats.append(99.5) # [48.0, 30.5, 20.2, 100.0, 99.5] ◦ ◦inventory.insert(3, "robe") # ["staff", "hat", "shoes", # "robe", "bread", "potion"] ◦inventory.remove("shoes") #["staff", "hat", #"bread","potion"]
check whether an item is in a list
◦inventory = ["staff", "hat", "bread", "potion"] ◦ ◦item = "bread" ◦if item in inventory: ◦inventory.remove(item) #["staff", "hat", "potion"]
seaborn
◦seaborn builds on matplotlib by providing a higher-level interface for statistical graphics. ◦It provides an interface to produce prettier and more complex visualizations with fewer lines of code. ◦The seaborn library is tightly integrated with Pandas and the rest of the PyData stack (numpy, scipy, statsmodels), making visualizations from any part of the data analysis process a breeze.