Python -Final
Steps in Python for database interface:
- Connect to the database and obtain handle - Execute commands using the handle - Close the database using the handle
SQLite DBMS
- It is a common database. Most used database engine. - The file format is stable, cross-platform, and backwards compatible. Open source GNU license. - It is built into Python, in mobile phones and most computers. Firefox uses SQLite.
['2', '19', '42']
>>> import re >>> x = 'My 2 favorite numbers are 19 and 42' >>> y = re.findall('[0-9]+',x) >>> print(y)
['2', '19', '42'] []
>>> import re >>> x = 'My 2 favorite numbers are 19 and 42' >>> y = re.findall('[0-9]+',x) >>> print(y) >>> y = re.findall('[AEIOU]+',x) >>> print(y)
['$10.00']
>>> import re >>> x = 'We just received $10.00 for cookies.' >>> y = re.findall('\$[0-9.]+',x) >>> print(y)
set
A __________ is an unordered collection of immutable data types but the __________ itself is mutable -are iterable, mutable and have no duplicate elements -can have multiple types (integers, strings, tuples, etc.) -cannot have duplicates -Elements in a set cannot be referenced individually and are immutable
cursor
A ____________is like a file handle. Calling ___________ is similar to a file open() command
immutable
Are sets mutable immutable?
immutable
Are tuples mutable or immutable, and list?
"import re"
Before you can use regular expressions in your program, you must import the library using ____________
Oracle, MySQL, Microsoft SQL Server, SQLite
Common Relational DBMS
HTML is a Language while HTTP is a Protocol. ... On the contrary, HTTP (Hypertext Transfer Protocol) is a protocol for transferring the hypertext pages from Web Server to Web Browser. For exchanging web pages between Server and Browser, an HTTP session is setup using protocol methods (e.g. GET, POST etc.)
Difference between HTTP vs HTML
Hypertext Markup Language
HTML
The HyperTextTransfer Protocol is the set of rules to allow browsers to retrieve web documents from servers over the Internet
HTTP
• Assumption: Database and a Table exists • Create the data in a list() • Construct a string containing the SQL command • Call the execute() method with the appropriate variables
How do you write a a database from a csv file?
re.findall()
If we actually want the matching strings to be extracted, we use _______________
Parentheses
In Fine-Tuning String Extraction ______________ are not part of the match - but they tell where to start and stop what string to extract >>> x='From [email protected] Sat Jan 5 09:14:16 2008' >>> y = re.findall('\S+@\S+',x) >>> print(y) >>> y = re.findall('^From (\S+@\S+)',x) >>> print(y)
commit() changes to the database, otherwise the delete will not be persistent
In order to make sure you delete changes to your database you need to use __________.
yes
List can be sliced, reassigned, and concatenated, can tuples?
^ $ . \S \S * *? + +? [aeiou] [^XYZ] [a-z0-9] ( )
Matches the beginning of a line Matches the end of the line Matches any character Matches whitespace Matches any non-whitespace character Repeats a character zero or more times Repeats a character zero or more times (non-greedy) Repeats a character one or more times Repeats a character one or more times (non-greedy) Matches a single character in the listed set Matches a single character not in the listed set The set of characters can include a range Indicates where string extraction is to start Indicates where string extraction is to end
add()
The __________ method adds an item to an existing set The addition is only accomplished if the item does not exist in the set
update()
The __________ method adds multiple items to an existing set • update takes an iterable and makes a unions of the set with that iterable. • Multiple datatypes can be added • Duplicate items are ignored
union()
The __________ method gives the union of all elements in the two sets -The symbol "|" can be used to form the union of sets
pop()
The __________ method removes a random element from the set (since sets are unordered) -method returns the removed element
discard()
The ____________ method removes the element if it exists and does not give an error if the element cannot be found
intersection()
The _____________ method gives the common elements in the two sets • The original sets are not modified • The symbol "&" can be used to form the intersection of sets
difference()
The ______________ method gives the unique elements in the first set as compared to the second set • The original sets are not modified • The symbol "-" can be used to form the union of sets
remove()
The ______________ method removes the element if it exists or gives an error if it does not
symmetric_difference()
The __________________ method gives the unique elements in the first set and the second set • The original sets are not modified • The symbol "^" can be used to form the _________________________ of two sets
conn.commit()
The changes are committed to the database using __________
Greedy Matching
The repeat characters (* and +) push outward in both directions (greedy) to match the largest possible string
Non-Greedy Matching
The repeat characters (* and +) push outward in both directions (greedy) to match the largest possible string
A Uniform Resource Locator (URL), colloquially termed a web address, is a reference to a web resource that specifies its location on a computer network and a mechanism for retrieving it.
URL
multiple
We can return single or multiple values from a function using a tuple datatype
hand = open('mbox.txt') for line in hand: line = line.rstrip() if line.startswith('From:') : print(line) #or import re hand = open('mbox.txt') for line in hand: line = line.rstrip() if re.search('^From:', line) : print(line)
We fine-tune what is matched by adding special characters to the string
• There is some controversy about web page scraping and some sites are a bit strict about it. • Read the Terms of Service before scraping data for commercial purposes. • Republishing copyrighted information is not allowed • Yelp Terms of Service extract:
What are the Don'ts of Web Scraping
It will create a new one
What happens if the file you are trying to create in DB4S does not exist?
•When a program or script pretends to be a browser and retrieves web pages, looks at those web pages, extracts information, and then looks at more web pages • Search engines scrape web pages - we call this "spidering the web" or "web crawling"
What is Web Scraping?
•A set of rules that all parties follow so we can predict each other's behavior •And not bump into each other-In the US, drive on the right-hand side of the road-In the UK, drive on the left-hand side of the road
What is a Protocol?
Delete a Table
What is the code doing? import sqlite3 conn = sqlite3.connect('music.sqlite') cur = conn.cursor() cur.execute('DELETE FROM Tracks WHERE Plays < 100') conn.commit() cur.close()
Add rows to a Table
What is the code doing? import sqlite3 conn = sqlite3.connect('music.sqlite') cur = conn.cursor() cur.execute('INSERT INTO Tracks (Title, Plays) VALUES (?,?)', ('Thunderstruck', 20)) conn.commit() cur.close()
Display data from a Table
What is the code doing? import sqlite3 conn = sqlite3.connect('music.sqlite') cur = conn.cursor() print("Tracks: ") cur.execute('SELECT Title, Plays FROM Tracks') for row in cur: print(row) cur.close()
['From: Using the :'] Greedy Matching
What is the output and what is this an example of? >>> import re >>> x = 'From: Using the : character' >>> y = re.findall('^F.+:', x) >>> print(y)
['From:'] Non-Greedy Matching
What is the output and what is this an example of? >>> import re >>> x = 'From: Using the : character' >>> y = re.findall('^F.+?:', x) >>> print(y)
Create a Database file and a Table with 2 columns in Python
What is this code doing? import sqlite3 conn = sqlite3.connect('music.sqlite') cur = conn.cursor() cur.execute('DROP TABLE IF EXISTS Tracks') cur.execute('CREATE TABLE Tracks (Title TEXT, Plays INTEGER)') conn.close()
['[email protected]'] ['[email protected]']
What's the output? >>> x='From [email protected] Sat Jan 5 09:14:16 2008' >>> y = re.findall('\S+@\S+',x) >>> print(y) >>> y = re.findall('^From (\S+@\S+)',x) >>> print(y)
tuple
When insert values in a table for SQLlite DB4S. The values of the row are passed as a
re.findall()
When we use ____________, it returns a list of zero or more sub-strings that match the regular expression
• Pull data - particularly social data - who links to who? • Get your own data back out of some system that has no "export capability" • Monitor a site for new information • Spider the web to make a database for a search engine
Why Scrape Data?
re.findall()
You can refine the match for _______________ and separately determine which portion of the match is to be extracted by using parentheses
re.search()
You can use ___________ to see if a string matches a regular expression
re.findall()
You can use ____________ to extract portions of a string that match your regular expression
Tuples
________ are used for heterogeneous sequence of fixed length
Lists
__________ are used for a homogeneous sequence of variable length
re.search()
______________returns a True/False depending on whether the string matches the regular expression
('Pepsi', 'Coke', '7-Up') ('Water', 'Coke', '7-Up')
a = ("Pepsi", "Coke", "7-Up") print(a) a=("Water", "Coke", "7-Up") print(a)
('Pepsi', 'Coke', '7-Up') TypeError: 'tuple' object does not support item assignment
a = ("Pepsi", "Coke", "7-Up") print(a) a[0]="Water" print(a)
2182131317312 2182131821696
a = ("Pepsi", "Coke", "7-Up") print(id(a)) a=("Water", "Coke", "7-Up") print(id(a))
['Pepsi', 'Coke', '7-Up'] ['Water', 'Coke', '7-Up']
a = ["Pepsi", "Coke", "7-Up"] print(a) a[0]="Water" print(a)
2182131303872 2182131303872
a = ["Pepsi", "Coke", "7-Up"] print(id(a)) a[0]="Water" print(id(a))
Row (or Tuple)
a set of fields that generally represents an "object" like a person or a music track
[(701, 'S.Oak', 'Arlington', 'TX')] [(701, 'S.Oak', 'Arlington', 'TX'), (211, 'S.Cooper', 'Arlington', 'TX')]
addr_list=list() addr_list.append((701,'S.Oak','Arlington','TX')) print(addr_list) addr_list.append((211,'S.Cooper','Arlington','TX')) print(addr_list)
Database
contains many tables
Table (or Relation)
contains rows and columns
CREATE TABLE
creates a new table with the specified fields
(1, 2, 3) <class 'tuple'>
def myoutput(i): if i==0: return(1,2,3) else: return('a','b','c') print(myoutput(0)) print(type(myoutput(1)))
DROP TABLE
deletes the table and all its contents
tuple
is a sequence of values • Any value can be stored in a tuple • Values are indexed by integers like lists • Comma separated list of values
execute()
method is used to execute commands in the Database using SQL
3
mylist=[frozenset({'a','b'}), frozenset({'c','d'})] mydict1=dict() count=1 for item in mylist: mydict1[item]=count count=count+1 print(count)
frozenset({'d', 'c'})
mylist=[frozenset({'a','b'}), frozenset({'c','d'})] mydict1=dict() count=1 for item in mylist: mydict1[item]=count count=count+1 print(item)
{frozenset({'a', 'b'}): 1, frozenset({'d', 'c'}): 2}
mylist=[frozenset({'a','b'}), frozenset({'c','d'})] mydict1=dict() count=1 for item in mylist: mydict1[item]=count count=count+1 print(mydict1)
<class 'set'> <class 'set'> {1, (1, 2), 'hello'} <class 'dict'> <class 'set'> {1, 2, 3, 4}
myset = {"a", "b", "c"} print(type(myset)) myset = set(["a", "b","c"]) print(type(myset)) myset = {1,'hello',(1,2)} print(myset) mynewset={} print(type(mynewset)) myrealset=set() print(type(myrealset)) myset = {1,2,3,4,3,2} print(myset)
{'a', 'b'} {'a', 'c', 'b'} {'d', 'c', 'e'} {'a', 'b'}
myset1={'a', 'b', 'c'} myset2={'c','d','e'} print(myset1.difference(myset2)) print(myset1, myset2) print(myset1-myset2)
{'c'} {'a', 'c', 'b'} {'d', 'c', 'e'} {'c'}
myset1={'a', 'b', 'c'} myset2={'c','d','e'} print(myset1.intersection(myset2)) print(myset1, myset2) print(myset1&myset2)
{'d', 'b', 'a', 'e'} {'a', 'c', 'b'} {'d', 'c', 'e'} {'d', 'b', 'a', 'e'}
myset1={'a', 'b', 'c'} myset2={'c','d','e'} print(myset1.symmetric_difference(myset2)) print(myset1, myset2) print(myset1^myset2)
{'d', 'a', 'e', 'c', 'b'} {'a', 'c', 'b'} {'d', 'e'} {'d', 'a', 'e', 'c', 'b'}
myset1={'a', 'b', 'c'} myset2={'d','e'} print(myset1.union(myset2)) print(myset1) print(myset2) print(myset1|myset2)
{'a', 'c', 'b'} frozenset({'a', 'c', 'b'})
myset1={'a','b','c'} print(myset1) myfrozenset1=frozenset(myset1) print(myfrozenset1)
True False False True
myset1={'a','b'} myset2={'a','b','c','d'} print(myset1<myset2) print(myset1>myset2) print(myset2<myset1) print(myset2>myset1)
{'d', 'b', 'a', 'e', 'c'} {'d', 'b', 'a', 'c'} {'d', 'b', 'a', 'c'}
myset={'a', 'b', 'c','d','e'} print(myset) myset.discard('e') print(myset) myset.discard('e') print(myset)
{'d', 'b', 'a', 'e', 'c'} {'d', 'b', 'a', 'c'} KeyError: 'e'
myset={'a', 'b', 'c','d','e'} print(myset) myset.remove('e') print(myset) myset.remove('e') print(myset)
{'d', 'b', 'a', 'e', 'c'} {'b', 'a', 'e', 'c'} d {'a', 'e', 'c'} b
myset={'a', 'b', 'c','d','e'} print(myset) x=myset.pop() print(myset) print(x) y=myset.pop() print(myset) print(y)
{'d', 'c', 'b', 'a', 'y', 'e', 'x'} {'h', 'b', 'o', 'a', 'e', 'l', 'c'} {'a', 'c', 'b'} {'world', 'b', 'a', 'hello', 'c'} {('hello', 'world'), 'a', 'c', 'b'}
myset={'a', 'b', 'c'} myset.update('dx','ey') print(myset) myset={'a', 'b', 'c'} myset.update('hello') print(myset) myset={'a', 'b', 'c'} print(myset) myset.update(['hello','world']) print(myset) myset={'a', 'b', 'c'} myset.update([('hello','world')]) print(myset)
{'a', 'c', 'b'} {'d', 'a', 'c', 'b'} {'d', 'a', 'c', 'b'}
myset={'a', 'b', 'c'} print(myset) myset.add('d') print(myset) myset.add('d') print(myset)
{'a', 'c', 'b'} {'d', 'b', 'a', 'e', 'c'} {'d', 1, 2, 3, 'b', 'a', 'e', 'c'} TypeError: 'int' object is not iterable
myset={'a', 'b', 'c'} print(myset) myset.update('d','e') print(myset) myset.update([1,2,3]) print(myset) myset.update(5,6) myset.update([5,6],{7,8},(9,10)) print(myset)
Column (or Attribute or Field)
one of possibly many elements of data corresponding to the object represented by the row
(1, 2, 3, 1, 2, 3) (1, 2, 3, 1)
t=(1,2,3) newtuple=t+t print(newtuple) newtuple=t+t[0:1] print(newtuple)
(1,) (1, 2) (1, 2, 3)
t=(1,2,3) print(t[0:1]) print(t[:2]) print(t[:])
('h', 'e', 'l', 'l', 'o') (1, 2, 3) (1, 2, 3)
t=tuple('hello') print(t) t=tuple([1,2,3]) print(t) t=tuple((1,2,3)) print(t)
Database Browser for SQLite (DB4S)
• DB4S is a high quality, visual, open source tool to create, design, and edit database files compatible with SQLite. • Uses a familiar spreadsheet-like interface • Does not require familiarity with SQL commands, simple point and click interface
frozenset "freezes" "freeze"
• Note that a set is mutable • The ____________ method ___________ a set and makes it immutable -A set cannot be used as a dictionary key (since it is mutable) -One can ________ a set and use it as a dictionary key
< and <= > and >=
• The __________ operators test if one set is a subset of the other set • The __________ operators test if one set is a superset of the other set
Select
• The ______________ command is used to retrieve rows from a Table • After the ___________ command is executed the cursor is like a file handle that we can loop through to retrieve the values
Python Interface for SQL
• Using the Structured Query Language (SQL), all the DB4S commands and more can be executed • Python provides an interface into a SQLite database file through SQL commands