Chapter 7 - Pattern Matching with Regular Expressions


10. What is the difference between the + and * characters in regular expressions?

The + matches one or more. The * matches zero or more.
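
A minimal sketch of the difference, using the made-up patterns ba+ and ba* (the variable names are illustrative only):

```python
import re

one_or_more = re.compile(r'ba+')   # 'b' followed by at least one 'a'
zero_or_more = re.compile(r'ba*')  # 'b' followed by any number of 'a's, including none

# 'b' alone has zero 'a's, so only the * version matches
print(one_or_more.search('b'))            # None
print(zero_or_more.search('b').group())   # b

# With 'a's present, both match (and both take all of them)
print(one_or_more.search('baaa').group())   # baaa
print(zero_or_more.search('baaa').group())  # baaa
```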

22. How would you write a regex that matches a sentence where the first word is either Alice, Bob, or Carol; the second word is either eats, pets, or throws; the third word is apples, cats, or baseballs; and the sentence ends with a period? This regex should be case-insensitive. It must match the following: 'Alice eats apples.' 'Bob pets cats.' 'Carol throws baseballs.' 'Alice throws Apples.' 'BOB EATS CATS.' but not the following: 'Robocop eats apples.' 'ALICE THROWS FOOTBALLS.' 'Carol eats 7 cats.'

re.compile(r'(Alice|Bob|Carol)\s(eats|pets|throws)\s(apples|cats|baseballs)\.', re.IGNORECASE)
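
One way to check this answer is to run it against every example string from the question. This sketch uses search(), which also accepts the pattern inside a longer string; if the whole string must match, fullmatch() (Python 3.4+) is a stricter alternative. The list names are only for illustration.

```python
import re

sentence_regex = re.compile(
    r'(Alice|Bob|Carol)\s(eats|pets|throws)\s(apples|cats|baseballs)\.',
    re.IGNORECASE,
)

should_match = ['Alice eats apples.', 'Bob pets cats.', 'Carol throws baseballs.',
                'Alice throws Apples.', 'BOB EATS CATS.']
should_not_match = ['Robocop eats apples.', 'ALICE THROWS FOOTBALLS.', 'Carol eats 7 cats.']

for s in should_match:
    assert sentence_regex.search(s), s
for s in should_not_match:
    assert sentence_regex.search(s) is None, s
print('all checks passed')
```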

21. How would you write a regex that matches the full name of someone whose last name is Nakamoto? You can assume that the first name that comes before it will always be one word that begins with a capital letter. The regex must match the following: 'Satoshi Nakamoto' 'Alice Nakamoto' 'Robocop Nakamoto' but not the following: 'satoshi Nakamoto' (where the first name is not capitalized) 'Mr. Nakamoto' (where the preceding word has a nonletter character) 'Nakamoto' (which has no first name) 'Satoshi nakamoto' (where Nakamoto is not capitalized)

re.compile(r'[A-Z][a-z]*\sNakamoto')
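
A quick check of this regex against the strings listed in the question (the list names are illustrative):

```python
import re

nakamoto_regex = re.compile(r'[A-Z][a-z]*\sNakamoto')

good_names = ['Satoshi Nakamoto', 'Alice Nakamoto', 'Robocop Nakamoto']
bad_names = ['satoshi Nakamoto', 'Mr. Nakamoto', 'Nakamoto', 'Satoshi nakamoto']

for name in good_names:
    assert nakamoto_regex.search(name), name
for name in bad_names:
    assert nakamoto_regex.search(name) is None, name
```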

20. How would you write a regex that matches a number with commas for every three digits? It must match the following: '42' '1,234' '6,368,745' but not the following: '12,34,567' (which has only two digits between the commas) '1234' (which lacks commas)

re.compile(r'^\d{1,3}(,\d{3})*$') creates this regex, although other regex strings can produce an equivalent one.
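
A sketch that tests the anchored regex on the question's examples. One subtlety worth knowing: in Python, $ also matches just before a trailing newline, so fullmatch() without the anchors is a slightly stricter option.

```python
import re

comma_number_regex = re.compile(r'^\d{1,3}(,\d{3})*$')

for num in ['42', '1,234', '6,368,745']:
    assert comma_number_regex.search(num), num
for num in ['12,34,567', '1234']:
    assert comma_number_regex.search(num) is None, num

# '$' also matches just before a trailing newline, so this passes the anchored regex...
assert comma_number_regex.search('42\n')
# ...while fullmatch() requires the entire string to match
assert re.fullmatch(r'\d{1,3}(,\d{3})*', '42\n') is None
```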

18. If numRegex = re.compile(r'\d+'), what will numRegex.sub('X', '12 drummers, 11 pipers, five rings, 3 hens') return?

'X drummers, X pipers, five rings, X hens'
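
Running the expression from the question confirms the result: each run of one or more digits is replaced with 'X', while the spelled-out "five" is left alone.

```python
import re

numRegex = re.compile(r'\d+')
result = numRegex.sub('X', '12 drummers, 11 pipers, five rings, 3 hens')
print(result)  # X drummers, X pipers, five rings, X hens
```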

17. What is the character class syntax to match all numbers and lowercase letters?

Either [0-9a-z] or [a-z0-9]
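
A small sketch showing that both orderings behave the same (the sample string is made up); uppercase letters and punctuation are skipped:

```python
import re

sample = 'R2-D2 and c3po'
print(re.findall(r'[0-9a-z]', sample))  # ['2', '2', 'a', 'n', 'd', 'c', '3', 'p', 'o']

# Order inside the brackets does not matter
assert re.findall(r'[0-9a-z]', sample) == re.findall(r'[a-z0-9]', sample)
```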

5. In the regex created from r'(\d\d\d)-(\d\d\d-\d\d\d\d)', what does group 0 cover? Group 1? Group 2?

Group 0 is the entire match, group 1 covers the first set of parentheses, and group 2 covers the second set of parentheses.
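
A minimal sketch with the regex from the question; the phone number in the sample text is made up:

```python
import re

phone_regex = re.compile(r'(\d\d\d)-(\d\d\d-\d\d\d\d)')
mo = phone_regex.search('My number is 415-555-4242.')
print(mo.group(0))  # 415-555-4242   (entire match)
print(mo.group(1))  # 415            (first set of parentheses)
print(mo.group(2))  # 555-4242       (second set of parentheses)
print(mo.groups())  # ('415', '555-4242')
```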

7. The findall() method returns a list of strings or a list of tuples of strings. What makes it return one or the other?

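If the regex has no groups, findall() returns a list of strings, one per full match. If the regex has two or more groups, it returns a list of tuples of strings, with one string per group. (With exactly one group, it returns a list of strings containing just that group's text.)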
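
A sketch comparing the three cases on the same made-up text:

```python
import re

text = 'Cell: 415-555-9999 Work: 212-555-0000'

no_groups = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')
print(no_groups.findall(text))   # ['415-555-9999', '212-555-0000']

two_groups = re.compile(r'(\d\d\d)-(\d\d\d-\d\d\d\d)')
print(two_groups.findall(text))  # [('415', '555-9999'), ('212', '555-0000')]

one_group = re.compile(r'(\d\d\d)-\d\d\d-\d\d\d\d')
print(one_group.findall(text))   # ['415', '212']
```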

14. How do you make a regular expression case-insensitive?

Passing re.I or re.IGNORECASE as the second argument to re.compile() will make the matching case insensitive.
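
A minimal sketch using the short flag name re.I; the sample sentences are made up:

```python
import re

robocop = re.compile(r'robocop', re.I)
print(robocop.search('RoboCop is part man, part machine.').group())  # RoboCop
print(robocop.search('ROBOCOP protects the innocent.').group())      # ROBOCOP
```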

6. Parentheses and periods have specific meanings in regular expression syntax. How would you specify that you want a regex to match actual parentheses and period characters?

Periods and parentheses can be escaped with a backslash: \., \(, and \).
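
A sketch contrasting escaped and unescaped characters (sample strings are made up):

```python
import re

# Escaped parentheses and period match the literal characters
area_code = re.compile(r'\(\d\d\d\) \d\d\d-\d\d\d\d\.')
print(area_code.search('Call (415) 555-4242.').group())  # (415) 555-4242.

# An unescaped '.' matches any character; '\.' matches only a period
print(re.search(r'a.c', 'axc').group())   # axc
print(re.search(r'a\.c', 'axc'))          # None
print(re.search(r'a\.c', 'a.c').group())  # a.c
```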

2. Why are raw strings often used when creating Regex objects?

Raw strings are used so that backslashes do not have to be escaped.
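
A short sketch of why this matters: a raw string keeps each backslash literally, while a normal string may turn it into a Python escape sequence. For example, '\b' in a normal string is a backspace character, not the regex word boundary.

```python
import re

# r'\d' is the same two characters as '\\d'
print(r'\d' == '\\d')  # True
print(len(r'\d'))      # 2

# '\b' in a normal string becomes a backspace, so this pattern looks for literal backspaces
print(re.search('\bcat\b', 'the cat sat'))           # None
print(re.search(r'\bcat\b', 'the cat sat').group())  # cat
```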

15. What does the . character normally match? What does it match if re.DOTALL is passed as the second argument to re.compile()?

The . character normally matches any character except the newline character. If re.DOTALL is passed as the second argument to re.compile(), then the dot will also match newline characters.
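
A minimal sketch on a made-up two-line string: without re.DOTALL, .* stops at the newline; with it, .* runs to the end.

```python
import re

text = 'Serve the public trust.\nProtect the innocent.'

no_newline = re.compile(r'.*')
with_newline = re.compile(r'.*', re.DOTALL)

print(repr(no_newline.search(text).group()))    # 'Serve the public trust.'
print(repr(with_newline.search(text).group()))  # 'Serve the public trust.\nProtect the innocent.'
```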

16. What is the difference between these two: .* and .*?

The .* performs a greedy match: it matches as much text as possible. The .*? performs a nongreedy match: it matches as little text as possible.
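
A sketch on a made-up string with two closing angle brackets, where the two forms stop at different places:

```python
import re

text = '<To serve man> for dinner.>'

greedy = re.compile(r'<.*>')      # runs to the LAST '>'
nongreedy = re.compile(r'<.*?>')  # stops at the FIRST '>'

print(greedy.search(text).group())     # <To serve man> for dinner.>
print(nongreedy.search(text).group())  # <To serve man>
```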

9. What two things does the ? character signify in regular expressions?

The ? character can either mean "match zero or one of the preceding group" or be used to signify nongreedy matching.
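
A sketch of both meanings; the patterns and sample strings are illustrative:

```python
import re

# Meaning 1: the preceding character or group is optional (zero or one)
colour = re.compile(r'colou?r')
print(colour.search('The color is red.').group())   # color
print(colour.search('The colour is red.').group())  # colour

# Meaning 2: placed after a quantifier, it makes the match nongreedy
print(re.search(r'\d+', 'Order 12345').group())   # 12345
print(re.search(r'\d+?', 'Order 12345').group())  # 1
```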

13. What do the \D, \W, and \S shorthand character classes signify in regular expressions?

The \D, \W, and \S shorthand character classes match a single character that is not a digit, not a word character (letter, digit, or underscore), or not a whitespace character (such as a space, tab, or newline), respectively.
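
A sketch on a made-up sample string. Each uppercase class is the complement of its lowercase partner, so every character falls into exactly one of \d or \D.

```python
import re

sample = 'R2-D2 beeps!'
print(re.findall(r'\D', sample))  # ['R', '-', 'D', ' ', 'b', 'e', 'e', 'p', 's', '!']
print(re.findall(r'\W', sample))  # ['-', ' ', '!']
print(re.findall(r'\S', sample))  # ['R', '2', '-', 'D', '2', 'b', 'e', 'e', 'p', 's', '!']

# \d and \D together cover every character exactly once
assert len(re.findall(r'\d', sample)) + len(re.findall(r'\D', sample)) == len(sample)
```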

12. What do the \d, \w, and \s shorthand character classes signify in regular expressions?

The \d, \w, and \s shorthand character classes match a single digit, word character (letter, digit, or underscore), or whitespace character (such as a space, tab, or newline), respectively.
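
A sketch on a made-up string that includes a tab, to show that \s matches more than the space character:

```python
import re

sample = 'R2-D2 beeps!\tBoop.'
print(re.findall(r'\d', sample))  # ['2', '2']
print(re.findall(r'\w', sample))  # ['R', '2', 'D', '2', 'b', 'e', 'e', 'p', 's', 'B', 'o', 'o', 'p']
print(re.findall(r'\s', sample))  # [' ', '\t']
```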

4. How do you get the actual strings that match the pattern from a Match object?

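The group() method returns the matched text. Calling group() or group(0) returns the entire match; group(1), group(2), and so on return the text matched by each set of parentheses.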
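
A minimal sketch (pattern and sample text are made up). Passing several group numbers to group() returns a tuple of their texts.

```python
import re

mo = re.compile(r'(\d+) (cats|dogs)').search('I have 12 cats and 3 dogs.')
print(mo.group())      # 12 cats   (entire match, same as group(0))
print(mo.group(1))     # 12
print(mo.group(2))     # cats
print(mo.group(1, 2))  # ('12', 'cats')
```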

19. What does passing re.VERBOSE as the second argument to re.compile() allow you to do?

The re.VERBOSE argument allows you to add whitespace and comments to the string passed to re.compile().
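
A sketch of a phone-number pattern split across lines with comments (the pattern and sample text are illustrative). In verbose mode, unescaped whitespace outside a character class is ignored and # starts a comment, so a literal space would need to be escaped or put in brackets.

```python
import re

phone_regex = re.compile(r'''
    \d{3}     # area code
    -         # separator
    \d{3}     # first three digits
    -         # separator
    \d{4}     # last four digits
''', re.VERBOSE)

print(phone_regex.search('Call 415-555-4242 today').group())  # 415-555-4242
```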

1. What is the function that creates Regex objects?

The re.compile() function returns Regex objects.

3. What does the search() method return?

The search() method returns a Match object for the first match it finds, or None if the pattern is not found in the string.
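
A minimal sketch with made-up strings. Because search() can return None, checking for it before calling group() avoids an AttributeError.

```python
import re

digits = re.compile(r'\d+')

mo = digits.search('Agent 47 reporting')
if mo is not None:
    print(mo.group())  # 47

print(digits.search('no digits here'))  # None
```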

11. What is the difference between {3} and {3,5} in regular expressions?

The {3} matches exactly three instances of the preceding group. The {3,5} matches between three and five instances.
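
A sketch on a made-up string. Like other quantifiers, {3,5} is greedy and takes as many repetitions as it can, up to five.

```python
import re

exactly_three = re.compile(r'(Ha){3}')
three_to_five = re.compile(r'(Ha){3,5}')

print(exactly_three.search('HaHaHaHaHa').group())  # HaHaHa
print(three_to_five.search('HaHaHaHaHa').group())  # HaHaHaHaHa
print(three_to_five.search('HaHa'))                # None (fewer than three)
```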

8. What does the | character signify in regular expressions?

The | character signifies an "either, or" match: the regex matches either the expression on its left or the expression on its right.
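
A sketch with illustrative patterns. Because | applies to everything on each side, parentheses limit the alternatives to one part of the pattern.

```python
import re

pets = re.compile(r'cat|dog')
print(pets.findall('dog chases cat'))  # ['dog', 'cat']

# Inside a group, | chooses only among the grouped alternatives
bat = re.compile(r'Bat(man|mobile|copter)')
mo = bat.search('Batmobile lost a wheel')
print(mo.group())   # Batmobile
print(mo.group(1))  # mobile

# Without the group, the alternatives are 'Batman' or 'mobile'
print(re.search(r'Batman|mobile', 'Batmobile').group())  # mobile
```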

Chapter summary

While a computer can search for text quickly, it must be told precisely what to look for. Regular expressions allow you to specify the precise patterns of characters you are looking for. In fact, some word processing and spreadsheet applications provide find-and-replace features that allow you to search using regular expressions. The re module that comes with Python lets you compile Regex objects. These values have several methods: search() to find a single match, findall() to find all matching instances, and sub() to do a find-and-replace substitution of text. There's a bit more to regular expression syntax than is described in this chapter. You can find out more in the official Python documentation at http://docs.python.org/3/library/re.html. The tutorial website http://www.regular-expressions.info/ is also a useful resource. Now that you have expertise manipulating and matching strings, it's time to dive into how to read from and write to files on your computer's hard drive.

Intro

You may be familiar with searching for text by pressing CTRL-F and typing in the words you're looking for. Regular expressions go one step further: They allow you to specify a pattern of text to search for. You may not know a business's exact phone number, but if you live in the United States or Canada, you know it will be three digits, followed by a hyphen, and then four more digits (and optionally, a three-digit area code at the start). This is how you, as a human, know a phone number when you see it: 415-555-1234 is a phone number, but 4,155,551,234 is not.

Regular expressions are helpful, but not many non-programmers know about them even though most modern text editors and word processors, such as Microsoft Word or OpenOffice, have find and find-and-replace features that can search based on regular expressions. Regular expressions are huge time-savers, not just for software users but also for programmers. In fact, tech writer Cory Doctorow argues that even before teaching programming, we should be teaching regular expressions: "Knowing [regular expressions] can mean the difference between solving a problem in 3 steps and solving it in 3,000 steps. When you're a nerd, you forget that the problems you solve with a couple keystrokes can take other people days of tedious, error-prone work to slog through."[1]

In this chapter, you'll start by writing a program to find text patterns without using regular expressions and then see how to use regular expressions to make the code much less bloated. I'll show you basic matching with regular expressions and then move on to some more powerful features, such as string substitution and creating your own character classes. Finally, at the end of the chapter, you'll write a program that can automatically extract phone numbers and email addresses from a block of text.

