Files
Different modes of opening a file
'r' read-only (N.B. 'r' is the default mode if none is specified)
● If file exists --> opens the file; if file does not exist --> Error
'w' write-only
● If file exists --> clears the file contents; if file does not exist --> creates and opens a new file
'a' write-only
● If file exists --> file contents unchanged and new data appended at end of file; if file does not exist --> creates and opens a new file
'r+' read and write
● If file exists --> reads and overwrites from beginning of file; if file does not exist --> Error
'w+' read and write
● If file exists --> clears the file contents; if file does not exist --> creates and opens a new file
'a+' read and write
● If file exists --> file contents unchanged and read and write at end of file; if file does not exist --> creates and opens a new file
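The behaviour of the three most common modes can be checked with a small experiment. This is only a sketch: the file names demo.txt and missing.txt are made up, and everything happens in a temporary directory so no real files are touched.

```python
import os
import tempfile

# Work in a throwaway directory so no real files are touched.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "demo.txt")  # hypothetical file name

    # 'w' creates the file (or clears it if it already exists).
    with open(path, "w") as f:
        f.write("one\n")

    # 'a' keeps the existing contents and appends at the end.
    with open(path, "a") as f:
        f.write("two\n")

    # 'r' is the default mode; the file must already exist.
    with open(path) as f:
        after_append = f.read()

    # Opening with 'w' again clears the previous contents.
    with open(path, "w") as f:
        f.write("three\n")
    with open(path) as f:
        after_rewrite = f.read()

    # 'r' on a file that does not exist raises an error.
    try:
        open(os.path.join(tmp_dir, "missing.txt"), "r")
        missing_error = None
    except FileNotFoundError as err:
        missing_error = type(err).__name__

print(repr(after_append))
print(repr(after_rewrite))
print(missing_error)
```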
file path
A path to a file is a path through the hierarchy to the node that contains a file /bill/python/code/myCode.py ● path is from the root node /, to the bill directory, to the python directory, to the code directory where the file myCode.py resides
Two types of files
Files come in two general types: ● text files. Files where control characters such as "\n" are translated. These are generally human readable ● binary files. All the information is taken directly without translation. Not human readable. We won't be using binary files in this subject.
different 'paths' for different OSs
It turns out that each OS has its own way of specifying a path:
● Windows C:\bill\python\myFile.py
● Unix /Users/bill/python/myFile.py
● Python knows that: the functions in os.path build and split paths in the style of the OS the program is running on
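One way to see the difference is to build the same path with both sets of rules. In this sketch, ntpath and posixpath are the standard-library modules that sit behind os.path on Windows and Unix respectively; calling them directly here is only for comparison, and normal programs would simply use os.path.

```python
import ntpath      # Windows-style path rules
import os
import posixpath   # Unix-style path rules

parts = ["bill", "python", "myFile.py"]

# os.path.join uses the separator of the OS the program is running on.
native = os.path.join(*parts)

# The two flavours can also be called directly to compare them.
windows_style = ntpath.join("C:\\", *parts)
unix_style = posixpath.join("/Users", *parts)

print(native)
print(windows_style)   # C:\bill\python\myFile.py
print(unix_style)      # /Users/bill/python/myFile.py
```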
writing to a file
Once you have created a file object, opened for writing, you can use the print function to write to it by adding the keyword argument file=file_object:
# open file for writing
# creates file if it does not exist
# overwrites file if it exists
temp_file = open("temp.txt", "w")   # function
print("first line", file=temp_file)
print("second line", file=temp_file)
temp_file.close()   # method
Open function
Syntax: file_object = open(file_name [, access_mode][, buffering])
● file_name: a string that contains the name of the file you want to access.
● access_mode: determines the mode in which the file is opened, i.e., read, write, append, etc. The possible values are listed under "Different modes of opening a file". This parameter is optional; the default access mode is read ('r').
● buffering: if the value is 0, no buffering takes place (in Python 3 this is allowed only for binary files). If the value is 1, line buffering is performed (text files only). An integer greater than 1 sets the buffer size. If negative, the buffer size is the system default (the default behavior).
close method
When the program is finished with a file, we close the file:
● flush the buffer contents from the computer to the file
● tear down the connection to the file
● close is a method of a file object: file_object.close()
● All files should be closed!
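A common way to make sure close is always called, even if writing fails, is a try/finally block. The sketch below assumes a made-up file name, log.txt, inside a temporary directory.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "log.txt")   # hypothetical file name

    out_file = open(path, "w")
    try:
        out_file.write("buffered text\n")   # may still be sitting in the buffer
    finally:
        out_file.close()                    # flushes the buffer, tears down the connection

    was_closed = out_file.closed            # file objects record whether they are closed

    # After close(), the text is guaranteed to be in the file on disk.
    check_file = open(path)
    contents = check_file.read()
    check_file.close()

print(was_closed)
print(repr(contents))
```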
os.path names
assume p = '/Users/bill/python/myFile.py'
● os.path.basename(p) returns 'myFile.py'
● os.path.dirname(p) returns '/Users/bill/python'
● os.path.split(p) returns the tuple ('/Users/bill/python', 'myFile.py')
● os.path.splitext(p) returns the tuple ('/Users/bill/python/myFile', '.py')
● os.path.join(os.path.split(p)[0], 'other.py') returns '/Users/bill/python/other.py'
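These calls can be tried directly. The sketch below uses posixpath, the Unix flavour of os.path, purely so that the results are the same on any OS; on a Unix machine os.path gives identical answers.

```python
import posixpath   # the Unix flavour of os.path, so results match on any OS

p = '/Users/bill/python/myFile.py'

base = posixpath.basename(p)        # last component of the path
directory = posixpath.dirname(p)    # everything before the last component
head_tail = posixpath.split(p)      # (directory, base) as a tuple
root_ext = posixpath.splitext(p)    # (path without extension, extension)
sibling = posixpath.join(posixpath.split(p)[0], 'other.py')

for value in (base, directory, head_tail, root_ext, sibling):
    print(value)
```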
Making a file object
in_file = open("my_file.txt", "r") ● in_file is the file object. It contains the buffer of information. ● The open function creates the connection between the disk file and the file object. The first quoted string is the file name on disk, the second is the mode to open it (here, "r" means to read)
csv writer
much the same, except: ● the opened file must have write enabled ● the method is writerow, and it takes a list of strings to be written as a row
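A minimal sketch of writing rows with a csv writer. The file name scores.csv and the data are invented for illustration; passing newline="" to open is what the csv module documentation recommends.

```python
import csv
import os
import tempfile

# Made-up data: each inner list becomes one row of the CSV file.
rows = [["name", "score"], ["bill", "90"], ["ann", ""]]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "scores.csv")   # hypothetical file name

    # The file must be opened with write enabled.
    with open(path, "w", newline="") as out_file:
        writer = csv.writer(out_file)
        for row in rows:
            writer.writerow(row)

    # Read the raw text back to see what was written.
    with open(path, newline="") as in_file:
        written_lines = in_file.read().splitlines()

print(written_lines)
```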
with statement
open and close occur in pairs (or should), so Python provides a shortcut, the with statement:
● creates a context that includes an exit which is invoked automatically
● for files, the exit is to close the file
● Syntax:
with expression as variable:
    suite
● Example:
>>> with open(filename) as f:
...     f.readlines()
● File is closed automatically when the suite ends
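The automatic close can be observed through the closed attribute of the file object. A small sketch, assuming a made-up file notes.txt in a temporary directory:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    filename = os.path.join(tmp_dir, "notes.txt")   # hypothetical file name

    # Write two lines; the file is closed when the suite ends.
    with open(filename, "w") as f:
        print("first line", file=f)
        print("second line", file=f)

    with open(filename) as f:
        lines = f.readlines()
        open_inside = not f.closed   # still open inside the suite

    closed_after = f.closed          # closed as soon as the suite has ended

print(lines)
print(open_inside, closed_after)
```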
CSV, basic sharing
● A basic approach to share spreadsheet data is the comma separated value (CSV) format: --> it is a text format, accessible to all apps --> each line (even if blank) is a row --> in each row, each value is separated from the others by a comma (even if it is blank) --> cannot capture complex things like formulas
What is a file?
● A file is a collection of data that is stored on secondary storage like a disk or a thumb drive ● accessing a file means establishing a connection between the file and the program and moving data between the two ● Without file I/O, we can't store data persistently - our programs will 'forget' everything when they finish.
csv module
● As simple as that sounds, even CSV format is not completely universal --> different apps have small variations
● Python provides a module to deal with these variations called the csv module
● This module allows you to read spreadsheet info (saved as CSV) into your program
● The csv module can also write CSV files that a spreadsheet app can open
Directory tree
● Directories can be organised in a hierarchy, with the root directory and subsequent branch and leaf directories ● Each directory can hold files or other directories ● This allows for sub and super directories
What we already know
● Files are bytes on disk.
● Two types: text and binary (we are working with text)
● open creates a connection between the disk contents and the program
● different modes of opening a file, 'r', 'w', 'a'
● files might have different encodings (the default depends on the platform; UTF-8 is the most common)
Text files use strings
● Remember that everything in a text file is a string ● everything you read is a string (including numbers) ● if you write to a file, you can only write a string
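In practice this means converting with str before writing and with int (or float) after reading. A short sketch with invented numbers and a made-up file name:

```python
import os
import tempfile

numbers = [3, 4, 5]   # made-up data

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "numbers.txt")   # hypothetical file name

    with open(path, "w") as out_file:
        for n in numbers:
            out_file.write(str(n) + "\n")   # write() needs a string, not an int

    with open(path) as in_file:
        raw = in_file.readlines()           # every item read back is a string

# Convert each line back to a number before doing arithmetic.
# int() ignores the surrounding whitespace, including the "\n".
total = sum(int(line) for line in raw)

print(raw)
print(total)
```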
Two special directory names
● The directory name '.' is shortcut for the name of the current directory you are in as you traverse the directory tree ● The directory name '..' is a shortcut for the name of the parent directory of the current directory you are in --> up one level in tree
Utility to find strings in files
● The main point of this function is to look through all the files in a directory structure and see if a particular string exists in any of those files ● Pretty useful for mining a set of files ● Allows you to look thru all of the text files in a directory and find files that match the text you're looking for
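One possible shape for such a utility is sketched below. The function name find_in_files, the decision to skip files that cannot be read as UTF-8 text, and the sample files are all assumptions made for illustration; they are not taken from the slides.

```python
import os
import tempfile


def find_in_files(top, target):
    """Return the sorted paths of files under top whose text contains target."""
    matches = []
    for dir_name, dir_list, files in os.walk(top):
        for name in files:
            path = os.path.join(dir_name, name)
            try:
                with open(path, encoding="utf-8") as f:
                    if target in f.read():
                        matches.append(path)
            except (UnicodeDecodeError, OSError):
                continue   # skip files that are not readable text
    return sorted(matches)


# Build a tiny directory tree to search (all names are made up).
with tempfile.TemporaryDirectory() as top:
    os.mkdir(os.path.join(top, "sub"))
    samples = {
        "a.txt": "hello world\n",
        os.path.join("sub", "b.txt"): "nothing here\n",
        os.path.join("sub", "c.txt"): "say hello\n",
    }
    for rel_path, text in samples.items():
        with open(os.path.join(top, rel_path), "w", encoding="utf-8") as f:
            f.write(text)

    found = [os.path.relpath(p, top) for p in find_in_files(top, "hello")]

print(found)
```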
What is the os module?
● The os module in Python is an interface between the operating system and the Python language. ● As such, it has many sub-functionalities dealing with various aspects. ● We will look mostly at the file related stuff ● os module uses 'wrappers' for OS commands --> for functions at command line or DOS prompt level
Where is the disk file?
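● When opened, the name of the file can come in one of two forms:
--> "file.txt" assumes the file name is file.txt and it is located in the current working directory
--> "c:\bill\file.txt" or "/Users/bill/file.txt" are fully qualified file names including the directory information
● In Python source, write the Windows form as a raw string, r"c:\bill\file.txt", because \b and \f are escape sequences in an ordinary string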
File objects or stream
● When opening a file, you create a file object or file stream that is a connection between the file information on disk and the program. ● The stream contains a buffer of the information from the file, and provides the information to the program
What is a directory/folder?
● Whether in Windows, Linux or on OS X, all OSes maintain a directory structure. ● A directory is a container of files or other directories ● These directories are arranged in a hierarchy or tree directories == folders
more of what we know
● all access, reading or writing, to a text file is by the use of strings
● we can read and write using various file methods
● iteration via a for loop gathers info from a file opened for reading one line at a time
● we write to a file opened for writing using the print function with an argument file=...
the path module
● allows you to gather some info on a path's existence: ● os.path.isfile(path_str) - is this a path to an existing file? (T/F) ● os.path.isdir(path_str) - is this a path to an existing directory? (T/F) ● os.path.exists(path_str) - does the path (either file or directory) exist? (T/F)
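The three checks can be compared side by side on a file, a directory, and a path that does not exist. The names data.txt and nope.txt in this sketch are made up.

```python
import os
import tempfile


def describe(path_str):
    """Return (isfile, isdir, exists) for a path."""
    return (
        os.path.isfile(path_str),
        os.path.isdir(path_str),
        os.path.exists(path_str),
    )


with tempfile.TemporaryDirectory() as tmp_dir:
    file_path = os.path.join(tmp_dir, "data.txt")     # hypothetical file
    open(file_path, "w").close()                      # create it, empty
    missing_path = os.path.join(tmp_dir, "nope.txt")  # never created

    file_info = describe(file_path)
    dir_info = describe(tmp_dir)
    missing_info = describe(missing_path)

print(file_info)
print(dir_info)
print(missing_info)
```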
csv reader
● import the csv module ● open the file as normal, creating a file object. ● create an instance of a csv reader, used to iterate through the file just opened --> you provide the file object as an argument to the constructor ● iterating with the reader object yields a row as a list of strings ● Universal new line is working by default --> needed for this worksheet ● A blank line in the CSV shows up as an empty list ● empty column shows up as an empty string in the list
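The steps above can be sketched as follows. The CSV text is invented and includes a blank line and an empty column so that both special cases show up; newline="" is what the csv module documentation recommends when opening the file.

```python
import csv
import os
import tempfile

# Made-up CSV text with a blank line and an empty last column.
text = "name,score\nbill,90\n\nann,\n"

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "scores.csv")   # hypothetical file name
    with open(path, "w", newline="") as out_file:
        out_file.write(text)

    # Open the file as normal, then hand the file object to the reader.
    with open(path, newline="") as in_file:
        reader = csv.reader(in_file)
        rows = [row for row in reader]   # each row is a list of strings

for row in rows:
    print(row)
```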
More ways to read
● my_file.read()
--> Reads the entire contents of the file as a string and returns it. It can take an optional integer argument to limit the read to N characters (N bytes for a binary file), that is my_file.read(N)
● my_file.readline()
--> Delivers the next line as a string.
● my_file.readlines() # note plural
--> Returns a single list of all the lines from the file
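The three methods can be compared on the same small file. This is a sketch with made-up contents; the file is reopened before each call so that every read starts from the beginning.

```python
import os
import tempfile

text = "alpha\nbeta\ngamma\n"   # made-up contents

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "words.txt")   # hypothetical file name
    with open(path, "w") as my_file:
        my_file.write(text)

    with open(path) as my_file:
        whole = my_file.read()          # the entire file as one string

    with open(path) as my_file:
        first_three = my_file.read(3)   # at most 3 characters

    with open(path) as my_file:
        line1 = my_file.readline()      # next line, newline included
        line2 = my_file.readline()      # the line after that

    with open(path) as my_file:
        all_lines = my_file.readlines() # list with one string per line

print(repr(whole))
print(repr(first_three))
print(repr(line1), repr(line2))
print(all_lines)
```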
Some os commands
● os.getcwd() - Returns the full path of the current working directory
● os.chdir(path) - Changes the current directory to the path provided
● os.listdir(path) - Returns a list of the files and directories in the path (the special entries '.' and '..' are not included)
● os.rename(source_path, dest_path) - Renames a file or directory
● os.mkdir(path) - Makes a new directory
--> os.mkdir('/Users/bill/python/new') creates the directory new under the directory python
● os.remove(path) - Removes the file
● os.rmdir(path) - Removes the directory, but the directory must be empty
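A short tour of these commands, run inside a temporary directory so nothing real is renamed or removed. The names new, draft.txt and final.txt are made up for this sketch.

```python
import os
import tempfile

start_dir = os.getcwd()   # remember where we started

with tempfile.TemporaryDirectory() as tmp_dir:
    # Move into the temporary directory, check, then move back.
    os.chdir(tmp_dir)
    moved = os.path.samefile(os.getcwd(), tmp_dir)
    os.chdir(start_dir)

    new_dir = os.path.join(tmp_dir, "new")        # hypothetical names
    os.mkdir(new_dir)

    draft_path = os.path.join(new_dir, "draft.txt")
    open(draft_path, "w").close()                 # create an empty file

    final_path = os.path.join(new_dir, "final.txt")
    os.rename(draft_path, final_path)

    listing = os.listdir(new_dir)                 # no '.' or '..' entries

    os.remove(final_path)                         # the directory is now empty
    os.rmdir(new_dir)                             # so it can be removed
    after_cleanup = os.listdir(tmp_dir)

print(moved)
print(listing)
print(after_cleanup)
```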
the walk function
● os.walk(path) - Starts at the path directory. For each directory it visits, it yields a tuple of three values:
--> dir_name, name of the current directory
--> dir_list, list of subdirectories in the directory
--> files, list of files in the directory
● If you iterate through, walk will visit every directory in the tree. Default is top down --> every file in every folder
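A small sketch of iterating with os.walk over an invented directory tree. The directory and file names are made up, and paths are shown relative to the starting directory to keep the output short.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as top:
    # Build a small tree: top/readme.txt, top/code/main.py,
    # top/code/tests/test_main.py (all names are made up).
    os.makedirs(os.path.join(top, "code", "tests"))
    for rel_path in (
        "readme.txt",
        os.path.join("code", "main.py"),
        os.path.join("code", "tests", "test_main.py"),
    ):
        open(os.path.join(top, rel_path), "w").close()

    visited = []
    # Top down: a directory is reported before its subdirectories.
    for dir_name, dir_list, files in os.walk(top):
        rel_dir = os.path.relpath(dir_name, top)
        visited.append((rel_dir, sorted(dir_list), sorted(files)))

for entry in visited:
    print(entry)
```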