Regular Expressions
Match a lower case word that starts and ends with the same letter
([a-z])[a-z]*\1
Match the word cat or bobcat with any capitalization
([bB][oO][bB])?[cC][aA][tT]
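A minimal sketch of how this pattern could be checked, assuming Python's built-in re module as the engine. The sample words are made up for illustration, and the IGNORECASE variant is an alternative spelling, not something the card itself uses.

```python
import re

# Pattern from the card: an optional "bob" in any capitalization, then "cat".
pattern = re.compile(r"([bB][oO][bB])?[cC][aA][tT]")

# Hypothetical sample words, chosen only for illustration.
words = ["cat", "BobCat", "BOBCAT", "bob", "dog"]
matched = [w for w in words if pattern.fullmatch(w)]
print(matched)  # ['cat', 'BobCat', 'BOBCAT']

# The same idea written with Python's case-insensitive flag.
short = re.compile(r"(bob)?cat", re.IGNORECASE)
same = [w for w in words if short.fullmatch(w)]
```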
Match the word data repeated any number of times and ending with a single digit
(data)+\d
Write a regular expression that matches cat? Hat. bAt# 123! but does not match fatt Dan9 zApS
...\W or \w\w\w\W
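One way to confirm the answer is to run it against the strings from the question. This is a sketch assuming Python's re module; fullmatch is used so the whole string has to fit the pattern.

```python
import re

# Three word characters followed by one non-word character.
pattern = re.compile(r"\w\w\w\W")

should_match = ["cat?", "Hat.", "bAt#", "123!"]
should_not_match = ["fatt", "Dan9", "zApS"]

hits = [s for s in should_match if pattern.fullmatch(s)]
misses = [s for s in should_not_match if pattern.fullmatch(s)]

print(hits)    # ['cat?', 'Hat.', 'bAt#', '123!']
print(misses)  # []
```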
What is a -1 in the Pearson Product-Moment Correlation?
A perfect negative correlation
What is a 1 in the Pearson Product-Moment Correlation?
A perfect positive correlation
What is an Engine?
A piece of software that can process these expressions and attempt to match the pattern to a given string.
What would the string \D\s\d look for?
A sequence of any non-digit character followed by a whitespace character followed by a digit from 0-9
What is the Pearson Product-Moment Correlation?
A value between -1 and 1
What do we need in order to use regular expressions?
An "engine"
What is a Literal?
Any character we use in a search or matching expression (E.g. to find ind in windows the ind is the literal string).
Character Classes: What would the pattern [0-9] match? What about [B-E]?
Any digit; any single upper case letter from B to E (upper case only)
What would \d\d find?
Any string of 2 digits from 0-9 in the data set
What would the pattern \D\D\D find?
Any string of 3 non digit characters in a data set
How is the CORREL function written?
CORREL(array1, array2)
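As a rough sketch of the calculation behind a CORREL-style function, the Pearson coefficient can be computed by hand in Python. The function name pearson_r and the sample data are assumptions made for illustration; a spreadsheet may handle edge cases (empty cells, constant columns) differently.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable.
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # Note: a constant sequence gives zero variance and a division by zero.
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data: y rises with x, falls with x, or shows no linear trend.
rising = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])
falling = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])
flat = pearson_r([1, 2, 3, 4], [1, -1, -1, 1])
print(rising, falling, flat)
```

The three calls line up with the cards on 1, -1, and 0: a perfect positive, a perfect negative, and no linear correlation.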
What does the CORREL or Pearson Product-Moment Correlation attempt to do?
Measure how closely the data follow a line of best fit
What is everything in data (with regard to regular expressions)?
Everything is essentially a character.
What does a ^ at the start of a character class do?
Excludes any characters in the class (matches characters not listed between the square brackets)
What is an escape sequence?
A way of indicating that we want to use one of our meta-characters as a literal.
What is a meta-character?
One or more special characters that have a unique meaning and are not used as literals in the search expression (e.g. \/ ^ $ [ ] + * etc.)
What specific kinds of correlations does the Pearson Product-Moment Correlation work for?
Linear Correlations
What does \r do?
Looks for a carriage return
What does \n do?
Looks for a line break
What does \s do?
Looks for a whitespace character (space, tab, line break)
What does \t do?
Looks for a tab
What does \S do?
Looks for anything that is not a whitespace character
What would the sequence \d\.\d match?
Match a digit, a literal period, and another digit (e.g. 3.5)
What would the line \w\r\n\w do?
Match a word character followed by the end of a line and another word character
What would the pattern [^bc]at find?
Match any single character other than b or c followed by at (e.g. hat or rat, but not bat or cat)
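A short sketch of the negated class in action, assuming Python's re module. The sample sentence is invented; note that the first character can be anything except b or c, including a digit.

```python
import re

# [^bc] matches exactly one character that is neither b nor c.
text = "bat cat hat rat 9at"
found = re.findall(r"[^bc]at", text)
print(found)  # ['hat', 'rat', '9at']
```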
What would the search c.t find?
Match any three character string starting with a 'c' and ending with a 't'; the middle can be any single character (letter, digit, or symbol)
What would (...){min,max} match?
Match the pattern min to max times
What would (...)+ match?
Match the pattern one or more times
What would the pattern (...)* match?
Match the pattern zero or more times
What would (...)? match?
Match the pattern zero or one times
What would the pattern [bc]at match?
Match a single b or c followed by at. In bobcat the match is cat, not bcat, because the class matches only a single character.
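The bobcat case can be checked directly. This is a small sketch assuming Python's re module; the second sample string is made up.

```python
import re

# The class [bc] consumes exactly one character, so "bcat" is never a match.
in_bobcat = re.findall(r"[bc]at", "bobcat")
in_sentence = re.findall(r"[bc]at", "a bat and a cat")

print(in_bobcat)    # ['cat']
print(in_sentence)  # ['bat', 'cat']
```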
What would [aeiou]{3,5} match?
Match three to five lower case vowels in a row
What would \d+ match?
Matches a digit one or more times
What would c.?t match?
Matches a word starting with c and ending with t that has any or no character between (ct, cat, czt, c6t, c%t)
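To see the optional middle character at work, here is a sketch using Python's re module with the examples from the card plus one invented non-match ("coat" has two characters in the middle).

```python
import re

# . matches any single character; ? makes that character optional.
pattern = re.compile(r"c.?t")

candidates = ["ct", "cat", "czt", "c6t", "c%t", "coat"]
accepted = [s for s in candidates if pattern.fullmatch(s)]
print(accepted)  # ['ct', 'cat', 'czt', 'c6t', 'c%t']
```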
Character Classes: What would [a-zA-Z] match? What about [a-zA-Z0-9]? What about [a-z\n\t]?
Matches any letter (upper or lower case). Matches any letter or digit. Matches any lower case letter and the tab or line break characters.
What does a . do? How would you match a period?
Matches any single character (letter, digit, white space; in most engines everything except a line break). To match a period you would need to search \.
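The difference between the bare dot and the escaped dot shows up as soon as the text contains something other than a period. A sketch assuming Python's re module, with an invented sample string:

```python
import re

text = "3.5 3x5 42"

# Unescaped dot: any character is accepted between the two digits.
any_char = re.findall(r"\d.\d", text)
# Escaped dot: only a literal period is accepted.
literal_dot = re.findall(r"\d\.\d", text)

print(any_char)     # ['3.5', '3x5']
print(literal_dot)  # ['3.5']
```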
What does ^ match?
Matches the beginning of a line (except when used in a character class like [^...])
Kleene Star and Plus Notation: What does c* match?
Matches the character c zero or more times
Kleene Star and Plus Notation: What does c+ match?
Matches the character c one or more times
Kleene Star and Plus Notation: What does c? match?
Matches the character c zero or one times
What does $ match?
Matches the end of a line
What would [a-z]* match?
Matches the lower case letters zero or more times.
Conditional: What would patt1|patt2 match?
Matches the pattern patt1 or patt2
Conditional: What would (patt1|patt2) match?
Matches patt1 or patt2, and the parentheses let the alternation be part of a larger grouping
What do the \b's do in \b([a-z])[a-z]*\1\b?
Matches word boundaries
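A sketch of what the boundaries change, assuming Python's re module. The sample words are invented. Without \b the pattern is free to match a fragment inside a longer word; with \b it has to cover a whole word.

```python
import re

bounded = r"\b([a-z])[a-z]*\1\b"
unbounded = r"([a-z])[a-z]*\1"

# finditer with group(0) returns the whole match rather than just group 1.
text = "level dad cat stats"
whole_words = [m.group(0) for m in re.finditer(bounded, text)]
print(whole_words)  # ['level', 'dad', 'stats']

# Inside "cattle" the unbounded pattern finds the fragment "tt".
loose = re.search(unbounded, "cattle").group(0)
strict = re.search(bounded, "cattle")
print(loose, strict)  # tt None
```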
Do Correlations Imply Causation?
NO
What is a 0 in the Pearson Product-Moment Correlation?
No correlation
What does the expression [xyz] match?
Only a single x y or z (the exact characters between the brackets).
How are letters and numbers expressed in regular expressions?
Ordinary characters (or literals) are matched literally (a matches a, b matches b, 1 matches 1)
Groups: What is the purpose of groups?
Sometimes we need to reference a match we made previously
How do we match whole lines and not allow partial matches?
Use the anchors ^ and $
What do Groups allow us to remember?
What we matched
What does a line ended with \r\n do?
\r\n marks the end of the line (carriage return plus line feed), so the search moves on to the next line.
Match a digit or a lower case letter
[a-z]|[0-9] (or, more simply, [a-z0-9])
How do we match [ ] brackets?
\[ \]
What is the difference between \b and \B
\b is a word boundary; \B is not a word boundary
Difference between \d and \D
\d is a digit; \D is not a digit
Write a regular expression to match the phone number pattern ###-###-#### where # = any digit from 0-9
\d\d\d-\d\d\d-\d\d\d\d
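A sketch of the phone number pattern in use, assuming Python's re module. The sample numbers are made up, and the {3}/{4} version is an equivalent shorthand built from the repetition notation, not a different rule.

```python
import re

long_form = re.compile(r"\d\d\d-\d\d\d-\d\d\d\d")
short_form = re.compile(r"\d{3}-\d{3}-\d{4}")  # same pattern using {n}

# Hypothetical inputs: one well formed, one without dashes, one too short.
samples = ["555-867-5309", "5558675309", "555-867-530"]

long_results = [bool(long_form.fullmatch(s)) for s in samples]
short_results = [bool(short_form.fullmatch(s)) for s in samples]
print(long_results)   # [True, False, False]
print(short_results)  # [True, False, False]
```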
What is the difference between \h and \H
\h is a horizontal space (space, tab); \H is not a horizontal space
Difference between \l and \L
\l is a lower case letter; \L is not a lower case letter
What is the difference between \s and \S
\s is a spacing character; \S is not a spacing character
What is the difference between \u and \U
\u is an upper case letter; \U is not an upper case letter
What is the difference between \v and \V
\v is a vertical space; \V is not a vertical space
What is the difference between \w and \W
\w is a word character; \W is not a word character
Match a line of characters that begins with a single digit and a lower case letter and ends with the same lower case letter followed by the same digit.
^([0-9])([a-z]).*\2\1$
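This card combines anchors, groups, and backreferences. As a sketch (Python's re module assumed, sample lines invented), the two groups remember the digit and the letter, and \2\1 requires them again in reverse order at the end of the line.

```python
import re

pattern = re.compile(r"^([0-9])([a-z]).*\2\1$")

m = pattern.search("3a hello a3")
remembered = (m.group(1), m.group(2))  # what groups 1 and 2 captured
print(remembered)  # ('3', 'a')

# Wrong letter at the end, and right characters in the wrong order.
wrong_letter = pattern.search("3a hello b3")
wrong_order = pattern.search("3a hello 3a")
print(wrong_letter, wrong_order)  # None None
```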
What are the two ways to match whole words?
^([a-z])[a-z]*\1$ - only matches if the word is the only text on the line; \b([a-z])[a-z]*\1\b - matches any whole word
Match a line that starts with hello
^hello.* (looks for a line beginning with hello and then matches any character zero or more times)
How would you match lines that only have lower case letters?
^[a-z]+$
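A sketch of whole-line matching, assuming Python's re module. In Python, ^ and $ only apply to each line when the MULTILINE flag is set; other engines and editors may treat this differently. The sample text is invented.

```python
import re

text = "hello\nHello\nabc123\nworld"

# With MULTILINE, ^ and $ anchor at the start and end of every line.
lower_only = re.findall(r"^[a-z]+$", text, flags=re.MULTILINE)
print(lower_only)  # ['hello', 'world']
```

The lines Hello and abc123 are rejected because the anchors leave no room for a partial match.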
What is \R
any new line character (\n \r and others)
What is \r?
Carriage return (CR control character)
Match the word cat or the word dog
cat|dog
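A sketch of alternation, assuming Python's re module. One detail worth checking in whatever engine is being used: spaces inside a pattern are normally literal characters, so in Python the alternation is written without spaces around the bar.

```python
import re

tight = re.compile(r"cat|dog")
spaced = re.compile(r"cat | dog")  # alternatives are "cat " and " dog"

tight_results = [bool(tight.fullmatch(w)) for w in ["cat", "dog", "cow"]]
print(tight_results)  # [True, True, False]

# The spaced version needs the space to be present in the text.
spaced_plain = spaced.fullmatch("cat")
spaced_with_space = spaced.fullmatch("cat ")
print(spaced_plain is None, spaced_with_space is not None)  # True True
```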
What would xy?z match?
It would match xyz or xz (because y? matches y zero or one times)
What does \d do?
looks for any digit from 0 - 9
What does \D do?
looks for any non-digit character
What does \W do?
looks for any non-word character (anything other than a letter, digit, or underscore)
What does \w do?
looks for any word character (letter, digit, or underscore)
What does the pattern [x-y] do?
matches any single character in the range from x to y
What does c{n} match? What would c{3,5} match?
c{n} matches the character c exactly n times; c{3,5} matches c three to five times.
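The counts can be checked by building strings of increasing length. A sketch assuming Python's re module:

```python
import re

exact = re.compile(r"c{3}")    # exactly three c's
ranged = re.compile(r"c{3,5}") # three to five c's

# Try runs of 1 to 6 c's and record which lengths are accepted.
exact_lengths = [n for n in range(1, 7) if exact.fullmatch("c" * n)]
ranged_lengths = [n for n in range(1, 7) if ranged.fullmatch("c" * n)]

print(exact_lengths)   # [3]
print(ranged_lengths)  # [3, 4, 5]
```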
What is \n?
new line
What is ^?
start of a line
What is \t?
tab character
What is $?
the end of a line
Repetitions: What is the notation for matching a character multiple times?
{min,max}