Working with strings in Python

Strings as a Sequence

'Hello World' is a sequence of 11 characters—remember that a space is a character. Because a sequence has an order, we can number the characters by their position in the sequence, the 'index' of the character within the string. In Python, and other languages as well, the first index (the first position) in a sequence is index 0.

Indexing and Slicing

A 'slice' is a sub-sequence of the string selected with the proper indices.
N.B. - a slice returns a new string and does not change the original string.
● uses the syntax [start : finish], where:
● start is the index of where we start the sub-sequence
● finish is the index of one after where we end the sub-sequence
● if start is not provided, it defaults to the beginning of the sequence; if finish is not provided, it defaults to the end of the sequence
To take a slice, you indicate a range of indices within the square brackets by providing a pair of indices separated by a colon (:).
Example:
    >>> hello_str = "Hello World"
    >>> hello_str[6:11]
    'World'
    >>> hello_str[6:] # no ending value defaults to the end of string
    'World'
    >>> hello_str[:5] # no start value defaults to beginning of string, i.e. index = 0
    'Hello'
    >>> hello_str[0:5]
    'Hello'
    >>> hello_str[-1] # negative index works back from the end
    'd'
    >>> hello_str[3:-2]
    'lo Wor'

Collection

The 'string type' is a special kind of collection; a collection is a group of Python objects that can be treated as a single object.

Chaining of Methods

A powerful feature of the Python language is that methods and functions can be chained, meaning there is a series of "dot notation" invocations, such as 'A string'.upper().find('S'). The calls are chained in the sense that an object returned from one method can be used as the calling object in another method. The rule for the order of invocation is to proceed from left to right, using the resulting object of the previous method as the calling object in the next method.
Example:
    >>> my_str = 'Python rules!'
    >>> my_str.upper()
    'PYTHON RULES!'
    >>> my_str.upper().find('O') # convert to uppercase and then 'find'
    4

Sequence

A sequence type has its collection of objects organized in some order — a sequence of objects.

Operator overloading

A single operator can perform multiple tasks depending on the types of its operands.
● what does a + b mean?
● what operation does the above represent? It depends on the types!
● two strings => concatenation
● two integers => addition
The + operator is overloaded: the operation that + performs depends on the types it is working on.
N.B. - If you give a Python operator a combination of types it does not have an operation for, it generates an error.
Example:
    digits = "0123456789"
    sum = 12
    print(digits + sum)

    Traceback (most recent call last):
      File "G:/CP1404 Programming I/Sandbox/junk.py", line 7, in <module>
        print(digits + sum)
    TypeError: Can't convert 'int' object to str implicitly
The exact wording of the message varies between Python versions; recent versions report: can only concatenate str (not "int") to str. Converting explicitly, as in digits + str(sum), avoids the error.

Comparison Operators - single-character strings

All comparisons between two single characters are done on the basis of their Unicode code points (the integers returned by ord).
1. The equality operator ==. If the two single characters are the same, as in 'a' == 'a', the expression returns True. Note that the expression 'a' == 'A' returns False, as those are indeed two different strings.
2. The greater than (>) or less than (<) operators:
    >>> print(ord("A"))
    65
    >>> print(ord("a"))
    97
    >>> print('a' > 'A')
    True
    >>> 'a' > 'a'
    False
    >>> 'a' > 'A'
    True

Comparing Strings with More than One Character

The process is slightly more complicated, though still based on the concept of a character's Unicode code point.
1. Start at index 0, the beginning of both strings.
2. Compare the two single characters at the present index of each string. If the two characters are equal, increase the present index of both strings by 1 and go back to the beginning of step 2. If the two characters are not equal, return the result of comparing those two characters as the result of the string comparison.
3. If both strings are equal up to some point but one is shorter than the other, then the longer string is always greater. For example, 'ab' < 'abc' returns True.
Examples:
    >>> 'abc' < 'cde' # different at index 0, 'a' < 'c'
    True
    >>> 'abc' < 'abd' # different at index 2, 'c' < 'd'
    True
    >>> 'abc' < 'abcd' # 'abc' equal up to 'd' but shorter than 'abcd'
    True
    >>> '' < 'a' # the empty string's length is 0, always smaller
    True
The empty string ('') is always less than any other string, because it is the only string of length 0.
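The comparison steps above can be sketched as a small function. This is only an illustration of the described procedure, not how Python implements < internally; the name string_less_than is hypothetical.

```python
def string_less_than(left: str, right: str) -> bool:
    """Return True if left < right, following the steps described above.

    string_less_than is a hypothetical helper written for illustration.
    """
    index = 0
    # Steps 1-2: walk both strings until the characters differ.
    while index < len(left) and index < len(right):
        if left[index] != right[index]:
            # Compare the code points of the first differing pair.
            return ord(left[index]) < ord(right[index])
        index += 1
    # Step 3: one string is a prefix of the other; the shorter one is smaller.
    return len(left) < len(right)


for a, b in [("abc", "cde"), ("abc", "abd"), ("abc", "abcd"), ("", "a")]:
    # The hand-written version should agree with the built-in operator.
    print(repr(a), repr(b), string_less_than(a, b), a < b)
```

In real code the built-in < operator is the right tool; the sketch only makes the index-by-index walk visible.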

Non-printing (escape) characters

Non-printing characters are written inside a string as escape sequences, each preceded by a backslash (the \ character):
● new line '\n'
● tab '\t'

Copy Slice

If the programmer provides neither a beginning nor an end (that is, there is only a colon character in the square brackets, [:]), the slice covers the complete string.
Example:
    >>> name_one = 'Monty'
    >>> name_two = name_one[:]
    >>> name_two
    'Monty'
N.B. - Remember, a slice yields a string as its result; the original string is not modified. Thus a copy slice gives a string equal to the original. (Because strings are immutable, CPython may hand back the very same object rather than a physical copy; this makes no difference to program behaviour.)

Optional Arguments with Methods

Some methods have additional optional arguments. If the argument is not provided, a default for that argument is assumed. The default value depends on the method. However, you can choose to provide that argument and override the default.
The 'find' method is one with default arguments. By default, find starts at index 0, the leftmost index, but if you provide a second argument, that is the index where the find process begins.
Example:
    >>> a_str = 'He had the bat.'
    >>> a_str.find('t')
    7
    >>> a_str.find('t', 8)
    13
The 'find' method also has a third optional argument, the index where searching stops. The default is the end of the string, but if you provide the third argument, find will stop its search just before that index.
Example:
    >>> a_str = 'He had the bat.'
    >>> a_str.find('t', 1, 6) # searches indices 1 up to, but not including, 6; 't' is not found there
    -1
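One common use of the optional start argument is to collect every position of a substring by restarting the search just past the previous hit. The sketch below assumes a hypothetical helper named find_all; it is not a built-in string method.

```python
def find_all(text: str, target: str) -> list:
    """Return the indices of every occurrence of target in text.

    find_all is a hypothetical helper, built only on str.find.
    """
    positions = []
    start = text.find(target)  # default start: index 0
    while start != -1:
        positions.append(start)
        # Resume one past the previous match, using the second argument.
        start = text.find(target, start + 1)
    return positions


print(find_all('He had the bat.', 't'))  # [7, 13]
```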

STRING OPERATIONS - Concatenation (+) and Repetition (*)

The + and the * operators can be used with string objects. However, their meanings are not what you are used to with integers and floats:
+ : concatenate. The operator + requires two string objects and creates a new string object. The new string object is formed by concatenating copies of the two string objects together: the first string joined at its end to the beginning of the second string.
* : repeat. The * takes a string object and an integer and creates a new string object. The new string object has as many copies of the string as is indicated by the integer.
Examples:
    >>> my_str = "Hello"
    >>> your_str = "World"
    >>> my_str + your_str # concatenation
    'HelloWorld'
    >>> your_str + my_str # order does matter in concatenation
    'WorldHello'
    >>> my_str + ' ' + your_str # add a space between
    'Hello World'
    >>> my_str * 3 # replication
    'HelloHelloHello'
    >>> 3 * my_str # order does not matter in replication
    'HelloHelloHello'
    >>> (my_str + ' ') * 3 # parentheses force ordering
    'Hello Hello Hello '
    >>> my_str + ' ' * 3 # without parentheses: repeats 3 spaces
    'Hello   '
    >>> my_str
    'Hello'
    >>> your_str # original strings unchanged
    'World'
    >>> 'hello' + 3 # wrong types for concatenation, requires two strings
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: can only concatenate str (not "int") to str
    >>> 'hello' * 'hello' # wrong types for replication: requires string and int
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: can't multiply sequence by non-int of type 'str'
The exact wording of these error messages varies between Python versions.

Indexing operator [ ]

The indexing operator is represented by the square brackets operator [ ].
Example:
    >>> hello_str = 'Hello World'
    >>> hello_str[0] # counting starts at zero
    'H'
    >>> hello_str[5] # space is a character
    ' '
    >>> hello_str[-1] # negative index works back from the end
    'd'
    >>> hello_str[10]
    'd'
    >>> hello_str[11] # error as index is out of range
    Traceback (most recent call last):
      File "<pyshell#19>", line 1, in <module>
        hello_str[11]
    IndexError: string index out of range
N.B. - a single index does not change the original string in any way. Strings are immutable objects in Python.

'string type'

The string type is one of the many collection types provided by Python.

String Collections Are Immutable

This means that once the string is created, usually by assignment, its contents cannot be modified.
Example:
    >>> my_str = 'Hello'
    >>> my_str[0] = 'J' # change 'H' to 'J', make the string 'Jello'
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: 'str' object does not support item assignment
You cannot change a string once you make it:
    >>> word = 'spam'
    >>> word[1] = 'l' # raises the same TypeError
However, you can use it to make another string (copy it, slice it, etc.):
    >>> new_word = word[:1] + 'l' + word[2:]
    >>> word
    'spam'
    >>> new_word
    'slam'
By definition, strings cannot be changed; they are immutable. As a result, all Python string operators must generate a new string. You must create a new string to reflect any changes you desire.
Example:
    >>> my_str = 'Hello'
    >>> my_str = 'J' + my_str[1:] # create new string with 'J' and a slice
    >>> my_str # my_str is now associated with the new string
    'Jello'
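The slice-and-concatenate pattern above can be wrapped in a small function. This is a minimal sketch; replace_at is a hypothetical name, not a built-in method.

```python
def replace_at(text: str, index: int, new_char: str) -> str:
    """Return a new string with the character at index replaced.

    replace_at is a hypothetical helper; the original string is untouched.
    """
    # Everything before index, the replacement, then everything after index.
    return text[:index] + new_char + text[index + 1:]


word = 'spam'
new_word = replace_at(word, 1, 'l')
print(word, new_word)  # spam slam
```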

Floating-Point Precision Descriptor

When printing floating-point values, it is desirable to control the number of digits to the right of the decimal point, that is, the precision. Precision is specified in the format descriptor using a decimal point followed by an integer.
Example:
    >>> import math
    >>> print(math.pi) # unformatted printing
    3.141592653589793
    >>> print("Pi is {:.4f}".format(math.pi)) # floating-point precision 4
    Pi is 3.1416
    >>> print("Pi is {:8.4f}".format(math.pi)) # specify both precision and width
    Pi is   3.1416
    >>> print("Pi is {:8.2f}".format(math.pi))
    Pi is     3.14
There is a % floating-point descriptor that converts from a decimal to a percent, including the insertion of the % character.
Example:
    >>> 2/3
    0.6666666666666666
    >>> print("{:8.2%}".format(2/3))
      66.67%

Nesting of Methods

You can also use method and function invocations as arguments to another method call. The rule for nested calls is that all invocations inside parentheses, such as those found in a function invocation, are done first.
Example:
    >>> a_str = 'He had the bat.'
    >>> a_str.find('t') # look for 't' starting at beginning
    7
    >>> a_str.find('t', 8) # start at index 8 = 7 + 1
    13
    >>> a_str.find('t', a_str.find('t') + 1) # start at one after the first 't'
    13

Determining Method Names and Method Arguments

You can use the IDE to find available methods for any type. You enter a variable of the type, followed by the '.' (dot) and then a tab. Remember, methods match with a type. Different types have different methods. If you type a method name, the IDE will remind you of the needed and optional arguments.

More string functions

my_str = 'Python rules!'
● max(my_str) --> 'y'
● min(my_str) --> ' '
● ord(c) - Given a string representing one Unicode character, return an integer representing the Unicode code point of that character. ord('?') --> 63
● chr(i) - Return the string representing a character whose Unicode code point is the integer i. chr(63) --> '?'
● sorted(iterable) - Return a new sorted list from the items in iterable. sorted(my_str) --> [' ', '!', 'P', 'e', 'h', 'l', 'n', 'o', 'r', 's', 't', 'u', 'y']

Descriptor Codes

Type descriptors:
● s - string
● d - decimal integer
● f - floating-point decimal
● e - floating-point exponential
● % - floating-point as percent
Alignment descriptors:
● < - left
● > - right
● ^ - centre

'str' constructor

Strings are created with the constructor 'str' or by enclosing characters in a pair of quotes.
    my_string = "ABCD"
    my_num = str(12)
N.B. Just as important as its creation, the type of an object determines much of what you can do with that object.

Identity operators

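● 'is' and 'is not' are the identity operators in Python.
● They are used to check whether two values (or variables) refer to the same object in memory.
● Two variables being equal does not imply that they are identical.
    >>> print('a' is 'a')
    True
    >>> print('a' is not 'A')
    True
● The first result relies on the interpreter reusing one object for identical short literals, which is an implementation detail; recent Python versions also emit a SyntaxWarning when 'is' is used with a literal. Use == to compare string contents.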

Functions

● A function is a part of a program that performs some operation. Its details are hidden (encapsulated); we only need to know about its interface (how to use it).
● A function takes some number of inputs (arguments) and returns a value based on the arguments and the function's operation.
● Not all functions take arguments or return values:
    def menu():
        print("Enter a or b")

String Methods

● A method is a variation on a function. It looks very similar: it has a name, a list of arguments in parentheses, and an output. It differs, however, in the way it is invoked or called.
● The invocation is done using what is called the dot ('.') notation.
● Every method is called in conjunction with a particular object. The kinds of methods that can be used in conjunction with an object depend on the object's type (class).
● A method is a function that belongs to a class.
● Every value in Python is an object, an instance of some class.
● String objects have a set of methods suited for strings, just as integers have integer methods, and floats have float methods.

String

● Any sequence of characters: - not necessarily a word as we know it - independent of language ● characters are ORDERED ● A string is indicated between ' ' or " " Example: 'good4u2' is a legal string 'Gesundheit' is a legal string ● The exact sequence of characters is maintained ● A sequence of characters requires no underlying meaning; it is only a sequence.

The Index

● Because the elements of a string are a sequence, we can associate each element with an index, a location in the sequence: ● positive values count up from the left, beginning with index 0 ● negative values count down from the right, starting with -1

String methods - testing strings

● How could you check a filename has the right extension?
    file_name = input("Text file name: ")
    if not file_name.endswith(".txt"):
        print("Let me stop you right there")
● How could you check that a phone number (string) has the right area code for Queensland ('07') at the start?
    phone_number = "0742356478"
    if phone_number.startswith("07"):
        print(True)
● Many string methods start with is... (a great prefix for a Boolean-returning function):
    my_string.isalpha()
    my_string.islower()
    my_string.istitle()
    my_string.isnumeric()
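Several of these tests can be combined into one Boolean-returning function. The sketch below is illustrative only: the function name is hypothetical, and the 10-digit length rule is an assumption made for the example, not something stated above.

```python
def looks_like_qld_landline(phone_number: str) -> bool:
    """Rough check for a Queensland landline number.

    Hypothetical helper; assumes a valid number is exactly 10 digits
    and starts with the '07' area code.
    """
    return (phone_number.startswith("07")   # right area code
            and phone_number.isdigit()      # digits only
            and len(phone_number) == 10)    # assumed length


print(looks_like_qld_landline("0742356478"))  # True
print(looks_like_qld_landline("0242356478"))  # False
```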

The 'in' Operator

● Membership operator
● The in operator is useful for checking membership in a collection.
● The operator takes two arguments: the element we are looking for (on the left) and the collection we are testing (on the right).
● BOOLEAN operator: returns True or False
● As it is a membership check, it returns a Boolean value to indicate whether the first argument is a member of (can be found in) the second argument.
● As it applies to strings, the operator tests to see if a substring occurs within a string.
Example:
    >>> vowels = 'aeiou'
    >>> 'a' in vowels
    True
    >>> 'x' in vowels
    False
    >>> 'eio' in vowels
    True
    >>> 'aiu' in vowels
    False
    >>> if 'e' in vowels:
    ...     print("it's a vowel")
    ...
    it's a vowel

Extended Slicing

● Slicing allows a third parameter that specifies the step in the slice.
● The extended form takes three arguments: [start:finish:countBy]
● defaults are:
● start is beginning
● finish is end
● countBy is 1
Example:
    >>> hello_str = "Hello World"
    >>> hello_str[::2] # every other letter in the slice
    'HloWrd'
    >>> hello_str[::3] # every third letter
    'HlWl'
    >>> hello_str[::-1] # step backwards from the end to the beginning
    'dlroW olleH'
    >>> hello_str[::-2] # backwards, every other letter
    'drWolH'
An interesting application of the step can be seen by using a string of digits. Different steps and different starting points yield even, odd, or reversed digits, as shown in this session:
    >>> digits = "0123456789"
    >>> digits[::2] # even digits (default start at 0; skip every other)
    '02468'
    >>> digits[1::2] # odd digits (start at 1; skip every other)
    '13579'
    >>> digits[::-1] # reverse digits
    '9876543210'
    >>> digits[::-2] # reverse odds
    '97531'
    >>> digits[-2::-2] # reverse evens (start with 2nd last letter)
    '86420'
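A typical use of the [::-1] reversal is a palindrome test. The following is a minimal sketch; is_palindrome is a hypothetical name, and ignoring case and non-letters is a choice made for this example.

```python
def is_palindrome(text: str) -> bool:
    """Return True if text reads the same forwards and backwards.

    Hypothetical helper: only letters are kept, and case is ignored.
    """
    letters = "".join(ch.lower() for ch in text if ch.isalpha())
    # A step of -1 walks the string from the end to the beginning.
    return letters == letters[::-1]


print(is_palindrome("Madam, I'm Adam"))  # True
print(is_palindrome("Hello World"))      # False
```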

String Representation

● Strings, like all other data, are represented in a computer as numbers.
● every character is "mapped" (associated) with an integer
● Unicode is such a mapping: it assigns each character an integer code point (UTF-8 is one way of encoding those code points as bytes)
● the function ord() takes a character and returns its integer code point; chr() takes an integer and returns the corresponding character.
Example:
    >>> chr(65)
    'A'
    >>> chr(97)
    'a'
    >>> ord('A')
    65
    >>> ord('B')
    66
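Because characters map to integers, ord() and chr() can be combined to do arithmetic on letters. The sketch below shifts a lowercase letter along the alphabet; shift_letter is a hypothetical name, and the code assumes the input is a single lowercase ASCII letter.

```python
def shift_letter(letter: str, offset: int) -> str:
    """Shift a lowercase ASCII letter by offset, wrapping from 'z' to 'a'.

    Hypothetical helper; assumes letter is one character in 'a'..'z'.
    """
    position = ord(letter) - ord('a')          # 0 for 'a', 25 for 'z'
    wrapped = (position + offset) % 26         # stay inside the alphabet
    return chr(ord('a') + wrapped)


print(shift_letter('a', 1))  # b
print(shift_letter('z', 1))  # a
```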

The 'is' Operator

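● Tests object identity, i.e. whether two names refer to the same object, not whether the contents are equal.
Example:
    >>> s = "Hello World"
    >>> s == "Hello World"
    True
    >>> s is "Hello World" # typically a different object in an interactive session
    False
● The result of 'is' on equal strings depends on the implementation, so use == to compare contents.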
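The difference between equality and identity is easier to see with variables than with literals. The sketch below uses hypothetical names; the last result reflects how CPython behaves for a string built at run time and should be treated as an implementation detail.

```python
greeting = "Hello World"
# Build an equal string at run time so that it is a separate object.
rebuilt = " ".join(["Hello", "World"])
# A second name bound to the very same object.
alias = greeting

print(greeting == rebuilt)   # same contents
print(greeting is alias)     # same object
print(greeting is rebuilt)   # identity, not contents, is tested here
```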

find method

● The 'find' method's task is to locate a substring within the calling string (the object that invoked the method using the dot notation). The find method returns the index where the substring first occurs (if there are multiple occurrences) but returns -1 if the substring is not found.
Example:
    >>> my_str = 'mellow yellow'
    >>> my_str.find('m')
    0
    >>> my_str.find('ll')
    2
    >>> my_str.find('z')
    -1
● Note how the method 'find' operates on the string object, my_str, and the two are associated by using the "dot" notation: my_str.find('ll').
● Terminology: the thing in parentheses, i.e. the 'll' in this case, is called an argument.

String Functions

● The 'len' function is used to find a string's length, i.e. the number of individual characters in a string.
● The 'len' function operates on any collection, for example: dict, set, list
Example:
    >>> my_str = 'Hello World'
    >>> len(my_str)
    11
    >>> length_int = len(my_str)
    >>> print(length_int)
    11
    >>> len()
    Traceback (most recent call last):
      File "<pyshell#48>", line 1, in <module>
        len()
    TypeError: len() takes exactly one argument (0 given)

upper method

● The 'upper' method takes the associated object and creates a new string where all the letters are converted to uppercase.
Example:
    >>> my_str = 'Python rules!'
    >>> my_str.upper()
    'PYTHON RULES!'

String formatting

● The basic form of the 'format' string method is shown below:
    print("First: {}, Second: {} value".format(x, y))
● As with all strings, use of the format method creates a new string. Strings are immutable!
● The programmer can insert special character sequences, enclosed in braces {}, in the format string that indicate a kind of substitution that should occur at that position in the new string.
● The substitution is driven by the arguments provided in the format method. The objects in the arguments will be placed in the new string at the indicated positions, formatted as the braces specify. After substituting the formatted data into the new string, the new string is returned.
● In its simplest form, the formatting commands are just empty braces. The objects that will be substituted for each brace are determined by the order of both the braces and the arguments.
● The first brace will be replaced by the first argument, the second brace by the second argument, and so on.
Example:
    >>> "{} is {} years old".format("Bill", 25)
    'Bill is 25 years old'
    >>> import math
    >>> "{} is nice but {} is divine!".format(1, math.pi)
    '1 is nice but 3.141592653589793 is divine!'

Strings Are Iterable

● The for loop iterates through each element of a sequence in order.
● For a string, this means character by character:
Example:
    >>> for char in 'Hi mom':
    ...     print(char, type(char))
    ...
    H <class 'str'>
    i <class 'str'>
      <class 'str'>
    m <class 'str'>
    o <class 'str'>
    m <class 'str'>
N.B. - Iterating through the elements of a string is a very common operation in Python.

Structure of 'format' command

● The general structure of the most commonly used parts of the format command is: {:[align][minimum width][.precision][descriptor]}
● where the square brackets, [ ], indicate optional parts. It is important to note the placement of the colon: all the optional information comes after a colon in the braces.
● the contents of the curly braces are the format specification, a description of how to organize that particular substitution.
● descriptors give the kind of thing to substitute; numbers give the total width in characters.
Example:
    {:.2f}   # float with two digits after the decimal point
    {:4}     # at least 4 characters wide, including the argument
    {:>10s}  # string at least 10 characters wide, including the argument, right justified
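Combining alignment, width, precision and descriptor makes it possible to line up columns of output. The sketch below uses made-up sample data and column widths chosen only for illustration.

```python
# Hypothetical sample rows: (name, quantity, unit price).
rows = [("apple", 3, 0.5), ("kiwi", 12, 0.25)]

lines = []
for name, count, price in rows:
    # name: left-aligned in 8 columns; count: right-aligned integer in 4;
    # price: right-aligned float in 8 columns with 2 decimal places.
    lines.append("{:<8s}{:>4d}{:>8.2f}".format(name, count, price))

print("\n".join(lines))
```

Each formatted line is exactly 20 characters wide (8 + 4 + 8), so the columns stay aligned as long as no value is wider than its field.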

split method

● The split method will take a string and break it into multiple new string parts depending on the argument string, e.g. "," or "." or "_" or " "
● by default, if no argument is provided, split is on any run of whitespace characters (tab, blank, newline, etc.)
● you can assign the pieces with multiple assignment if you know how many pieces are yielded:
Example:
    >>> name = "John Marwood Cleese"
    >>> first, middle, last = name.split()
    >>> transformed = last + ", " + first + " " + middle
    >>> print(transformed)
    Cleese, John Marwood
    >>> print(first)
    John
    >>> print(last)
    Cleese
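The same idea works with an explicit separator. The sketch below splits a made-up comma-separated record and then converts the pieces; the record contents and variable names are hypothetical.

```python
# Hypothetical record with three comma-separated fields.
record = "2024-03-15,Brisbane,31.5"

# Split on commas; multiple assignment works because we expect 3 pieces.
date, city, temperature = record.split(",")
# Split one of the pieces again, this time on the hyphen.
year, month, day = date.split("-")

# The pieces are still strings, so convert them before doing arithmetic.
print(city, int(year), float(temperature))

# With no argument, runs of any whitespace act as a single separator.
words = "  spaced \t out\nwords ".split()
print(words)  # ['spaced', 'out', 'words']
```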

map args to {}

● A new string is produced in which the {} elements of the format string are replaced by the format method arguments (the original string is not modified)
● The replacement is in order: first {} is replaced by the first argument, second {} by the second argument and so forth.
    print("First: {}, Second: {} value".format(x, y))

Override the {}-to-argument matching

● To override the {}-to-argument matching we have seen, you can indicate the index of the argument you want inside the braces
● if other descriptor information is needed, it goes after the argument index, separated by a :
    >>> print("{0} is {2} and {0} is also {1}".format("Bill", 25, "tall"))
    Bill is tall and Bill is also 25

Iteration through a sequence

● We can use the for statement to iterate over the elements of a sequence, such as a list or a string
● We use the for statement to process each element of a sequence, one element at a time:
    for item in sequence:
        suite

enumerate iterator

● We frequently look for both an index and the character of a string; the enumerate iterator provides both the index of the character and the character itself as it steps through the string.
● On each step, enumerate yields a pair of values: the index of an element and the element itself
● You can use it to iterate through both the index and the element simultaneously, using dual assignment
Example:
    >>> river = "Mississippi"
    >>> for index, letter in enumerate(river):
    ...     print(index, letter)
    ...
    0 M
    1 i
    2 s
    3 s
    4 i
    5 s
    6 s
    7 i
    8 p
    9 p
    10 i
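Having the index and the character together is handy when the positions themselves are the result. A small sketch, with hypothetical variable names, that collects where a given letter occurs:

```python
river = "Mississippi"

# Keep the index whenever the letter at that index matches.
s_positions = [index for index, letter in enumerate(river) if letter == "s"]
print(s_positions)  # [2, 3, 5, 6]

# enumerate accepts an optional start value for the counter.
first_pair = next(iter(enumerate(river, start=1)))
print(first_pair)  # (1, 'M')
```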

How can you find out the methods?

● You can use the IDE to find available methods for any type. You enter a variable of the type, followed by the '.' (dot) and then a tab. ● Remember, methods match with a type. Different types have different methods ● If you type a method name, the IDE will remind you of the needed and optional arguments.

docstring

● a brief description of a function using the """ """ ● If you have a long, multi-line comment that you want to insert, consider using a triple-quoted string. You need provide only the quotes at the beginning and the end of the comment, unlike using the # at the beginning of every line.

More string methods

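● count method:
    my_str = 'Python rules!'
    my_str.count("y") returns 1
    my_str.count("y", 1, 4) returns 1 (the searched range, 'yth', contains one 'y')
    my_str.count("y", 2, 4) returns 0 (the searched range, 'th', contains no 'y')
● endswith(suffix[, start[, end]]) - Return True if the string ends with the specified suffix, otherwise return False.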

more on dot notation

● In general, dot notation looks like: object.method(...)
● It means that the object to the left of the dot is calling the method to the right of the dot that is associated with that object's type.
● The method is an attribute of the object
● The methods that can be called are tied to the type of the object calling them. Each type has different methods.

string literal

● represents a fixed value that cannot be changed --> an immutable type Example: my_string = "Hello World"

half open range for slices

● slicing uses what is called a half-open range ● the first index is included in the sequence ● the last index is one after what is included

join method

● str.join(iterable)
● Return a string which is the concatenation of the strings in iterable, with the string the method is called on inserted as a separator between them.
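A short sketch of join in use; the sample values are made up. Note that every item in the iterable must already be a string, so numbers are converted first.

```python
words = ["John", "Marwood", "Cleese"]

# The string before the dot is the separator placed between the items.
full_name = " ".join(words)
print(full_name)  # John Marwood Cleese

# A string is itself an iterable of one-character strings.
dashed = "-".join("abc")
print(dashed)  # a-b-c

# Non-string items must be converted with str() before joining.
numbers = [1, 2, 3]
listed = ", ".join(str(n) for n in numbers)
print(listed)  # 1, 2, 3
```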

Triple-Quote String

● triple quotes preserve both the vertical and horizontal formatting of the string ● allows you to type tables, paragraphs, whatever and preserve the formatting """this is a test today"""

String methods - changing case

● upper, lower, title, swapcase
Example:
    >>> s = "Hello world"
    >>> s.upper()
    'HELLO WORLD'
    >>> s.title()
    'Hello World'
● Note that these do NOT modify s, but return a new string
● Strings are immutable!

