Week 3 - #2 - Automation
You can also iterate through a string using regular expressions and the scan method, which accepts a regexp as an argument. In the following example, you could iterate through each letter in a string using the wildcard character.
"Ruby".scan(/./) do |letter| puts letter end R u b y => "Ruby"
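As a small sketch of the same idea, scan can also be called without a block, in which case it returns an array of every match. The helper names `letters_in` and `numbers_in` are hypothetical and only serve the illustration.

```ruby
# Hypothetical helper: collect every character matched by the wildcard.
def letters_in(text)
  # Without a block, scan returns an array of all matches.
  text.scan(/./)
end

# Hypothetical helper: scan works with more specific patterns too,
# here runs of one or more digits.
def numbers_in(text)
  text.scan(/\d+/)
end

p letters_in("Ruby")                # prints ["R", "u", "b", "y"]
p numbers_in("10 apples, 3 pears")  # prints ["10", "3"]
```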
If you begin your regular expression with %r, you can use whatever delimiter you want instead of forward slashes. This is useful when your pattern needs to match many forward slash characters, as in a Unix directory path or a URL, and you don't want to escape them all.
%r{/} =~ "forward/slash" => 7
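A minimal sketch comparing the two delimiter styles on a path-like pattern. Both helpers are hypothetical names; they are meant to show that the %r form and the escaped form find the same position.

```ruby
# Hypothetical helper: with %r{...}, forward slashes need no escaping.
def usr_bin_index(text)
  %r{/usr/bin} =~ text
end

# Hypothetical helper: the same pattern with slash delimiters,
# where every forward slash must be escaped.
def usr_bin_index_escaped(text)
  /\/usr\/bin/ =~ text
end

p usr_bin_index("ls /usr/bin")          # prints 3
p usr_bin_index_escaped("ls /usr/bin")  # prints 3
p usr_bin_index("ls /tmp")              # prints nil
```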
Examples mixing regex
/.ickle$/i =~ "triCKle" => 1 /.ickle$/i =~ "tickle" => 0 /.ickle$/i =~ "TRICKLE" => 1 /.ickle$/i =~ "tricycle" => nil /.ickle$/i =~ "trickles" => nil
The dot symbol is a wildcard and can be used to match any single character (except a newline, by default).
/.ig/ =~ "wig" => 0 /.ig/ =~ "pig" => 0
Escape also works for forward slashes
/// =~ "forward/slash" SyntaxError: irb:1: syntax error, unexpected =~ /// =~ "forward/slash" /\// =~ "forward/slash" => 7
When used with the wildcard symbol, the repetition character can match a pattern containing any number of characters. The following pattern matches any string, no matter how long, as long as it starts with a capital A and ends in a lowercase a.
/^A.*a$/ =~ "Albania" => 0 /^A.*a$/ =~ "Argentina" => 0 /^A.*a$/ =~ "Afghanistan" => nil /^A.*a$/ =~ "Aa" => 0
The match method is called with the string to match the pattern against, and an optional positional argument to specify where in the string to start the search. If no match is found, nil is returned. If a match is found, instead of the starting position of the matched text, the match method returns an object of the MatchData class.
/R.*y/.match("Ruby") => #<MatchData "Ruby"> /R.*y/.match("Ruby", 1) => nil
You can also give the match method a block, similar to iteration and looping. The MatchData object will be passed to the block, where you can manipulate it as needed. If there's no match, the nil value gets returned and the block won't execute unnecessarily.
/Ruby/.match("Scripting with Ruby!") do |m| puts m.to_s end Ruby => nil /Emerald/.match("Scripting with Ruby!") do |m| puts m.to_s end => nil
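The block form and the optional start position can be sketched together as follows. `shout_match` is a hypothetical helper used only for illustration; it relies on match returning the block's value when there is a match and nil otherwise.

```ruby
# Hypothetical helper: return the matched text upcased, or nil if no match.
def shout_match(pattern, text)
  # The block runs only when the match succeeds.
  pattern.match(text) { |m| m.to_s.upcase }
end

p shout_match(/Ruby/, "Scripting with Ruby!")     # prints "RUBY"
p shout_match(/Emerald/, "Scripting with Ruby!")  # prints nil

# The optional second argument is the index where the search starts.
p(/R.*y/.match("Ruby", 0))  # prints #<MatchData "Ruby">
p(/R.*y/.match("Ruby", 1))  # prints nil, because the R is at index 0
```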
more examples using regexp
/^abc/ =~ "abc" => 0 /^abc/ =~ "abcdef" => 0 /^abc/ =~ "123abc" => nil /abc$/ =~ "123abc" => 3 /^abc$/ =~ "abc" => 0 /^abc$/ =~ "abcdef" => nil
more examples using plus or asterisk, where plus means that the preceding character is repeated one or more times and asterisk means that the preceding character is repeated zero or more times.
/a*bc/ =~ "bc" => 0 /a+bc/ =~ "bc" => nil /a+bc/ =~ "aaaaabc" => 0
more examples using wildcard dot character
/ab.de/ =~ "abcde" => 0 /ab.de/ =~ "ab4de" => 0 /ab.de/ =~ "ab de" => 0 /ab.de/ =~ "abde" => nil
.* (dot asterisk) means any number of any characters
/abc.*/ =~ "abcdef" => 0 /abc.*/ =~ "abc" => 0
If you need to match a string containing a special character, like $ or ?, use the escape character, the backslash (\).
/abc./ =~ "abcd" => 0 /abc\./ =~ "abcd" => nil /$/ =~ "$10.00" => 6 /\$/ =~ "$10.00" => 0
Repetition symbols include both the + character, which matches one or more occurrences of the preceding character, and the * character, which matches zero or more occurrences of the preceding character.
/p*ickle/ =~ "pickle" => 0 /p*ickle/ =~ "pppppickle" => 0 /p*ickle/ =~ "ickle" => 0 /p+ickle/ =~ "pickle" => 0 /p+ickle/ =~ "pppppickle" => 0 /p+ickle/ =~ "ickle" => nil
Use ___________________________ anchors to match the start and end of the whole string if it spans multiple lines.
\A and \Z
Regular expression patterns can be grouped using parentheses, usually called capture groups. In the following example, we combine the wildcard and repetition symbols to signify we're only interested in text that contains both the abc and 123 substrings. In plain English: Match abc, followed by any number of other characters, followed by 123, followed by any number of characters. The capture groups, represented by the parentheses, both group the regular expressions inside of them and capture the text that matches them for later use. The inspect method just gives you a representation of the object it's called on, in this case showing that m.captures returns an array.
m = /(abc).*(123).*/i.match("abcdefg1234567") => #<MatchData "abcdefg1234567" 1:"abc" 2:"123"> puts m.captures.inspect ["abc", "123"] => nil puts m[0] abcdefg1234567 => nil puts m[1] abc => nil puts m[2] 123 => nil
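A hedged sketch of working with capture groups when a match may fail. `abc_and_123` is a hypothetical helper; it assumes both substrings are wrapped in parentheses so that each one is captured.

```ruby
# Hypothetical helper: return the captured groups, or an empty array
# when the text does not contain abc followed later by 123.
def abc_and_123(text)
  m = /(abc).*(123).*/i.match(text)
  # match returns nil on failure, so guard before calling captures.
  m ? m.captures : []
end

p abc_and_123("abcdefg1234567")  # prints ["abc", "123"]
p abc_and_123("ABC-123")         # prints ["ABC", "123"]
p abc_and_123("xyz")             # prints []
```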
You can use a dash to specify a range of characters in your character classes. For example, to match any number between zero and nine, you could write the following:
m = /[0-9]/.match("2") => #<MatchData "2"> puts m.to_s 2 => nil
The MatchData object contains all of the information resulting from a Regex pattern match, like the Regexp used, the original string, the matched and unmatched portion of the string, etc.
m = /Ruby/.match("Scripting with Ruby!") => #<MatchData "Ruby"> # print original string with the string method puts m.string Scripting with Ruby! => nil # print pattern searched for with the regexp method m.regexp => /Ruby/ # see matching pattern string with to_s method puts m.to_s Ruby => nil # See the text before the match puts m.pre_match Scripting with => nil # See the text after the match puts m.post_match ! => nil
Ruby regular expressions can also accept __________________ characters after their final forward slash, which further refine pattern matching.
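One way to put the MatchData accessors to use is to split a string into the parts before, inside, and after a match. `around_match` is a hypothetical helper name chosen for this sketch.

```ruby
# Hypothetical helper: return [text before, matched text, text after],
# or nil when the pattern does not match.
def around_match(pattern, text)
  m = pattern.match(text)
  return nil if m.nil?
  [m.pre_match, m.to_s, m.post_match]
end

p around_match(/Ruby/, "Scripting with Ruby!")
# prints ["Scripting with ", "Ruby", "!"]
p around_match(/Emerald/, "Scripting with Ruby!")
# prints nil
```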
modifier
You can use split to chop up a string into an array, using either a character or regular expression to tell Ruby where to do the splitting. For example, break an IP address into its constituent octets. In the following example, we've passed a regular expression that matches the dot character literally, because we've used the backslash to escape it. Split uses this pattern to break the string into chunks. This gives us each octet stored in an array, which we can then print.
octets = "192.186.1.1".split(/\./) => ["192", "186", "1", "1"] octets.each { |octet| puts octet } 192 186 1 1 => ["192", "186", "1", "1"]
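A short sketch of why the escape matters when splitting. `octets_of` is a hypothetical helper and the address is an arbitrary example value.

```ruby
# Hypothetical helper: split an IPv4-style string on literal dots
# and convert each piece to an integer.
def octets_of(ip)
  ip.split(/\./).map(&:to_i)
end

p octets_of("192.168.1.1")  # prints [192, 168, 1, 1]

# Without the backslash, the dot is a wildcard: every character becomes
# a separator, only empty strings remain, and split drops them all.
p "192.168.1.1".split(/./)  # prints []
```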
Regexp example: Match any string that contains the characters igers, optionally preceded by the character t, and check the pattern against several strings.
pattern = /t?igers/ => /t?igers/ puts pattern =~ ("tigers") 0 => nil puts pattern =~ ("ligers") 1 => nil puts pattern =~ ("bears") => nil
The easiest and fastest way to find out if a regular expression pattern matches a string is to use the basic pattern matching operator =~ (an equals sign followed by a tilde). If the pattern matches, the operator returns the index in the string where the matching substring starts. Otherwise, it returns nil.
puts /Ruby/ =~ "The word Ruby is contained in this text." 9 => nil
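Because =~ returns either an index or nil, it can drive a conditional directly. The sketch below uses a hypothetical helper, `describe_match`, and relies on the fact that 0 is truthy in Ruby, so a match at the very start of the string still counts.

```ruby
# Hypothetical helper: report where the word Ruby appears, if at all.
def describe_match(text)
  index = /Ruby/ =~ text
  # Only nil and false are falsy in Ruby, so index 0 still passes.
  if index
    "found at index #{index}"
  else
    "not found"
  end
end

puts describe_match("The word Ruby is contained in this text.")
# prints: found at index 9
puts describe_match("Ruby rocks")   # prints: found at index 0
puts describe_match("Python")       # prints: not found
```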
The following example combines two ranges in one character class to match any lowercase letter or any digit between zero and nine.
puts /[a-z0-9]/.match("a").to_s a => nil puts /[a-z0-9]/.match("5").to_s 5 => nil
Ruby offers specialized metacharacters that behave like character classes and can be used as shortcuts to match specific types of text. For example, instead of using the square brackets syntax to match a numeric digit, you could use the \d metacharacter. In the following example, it is combined with the repetition functionality to match three digits in a row.
puts /\d/.match("5").to_s 5 => nil puts /\d/.match("a").to_s => nil puts /\d{3}/.match("123").to_s 123 => nil puts /\d{3}/.match("123456").to_s 123 => nil puts /^\d{3}$/.match("123456").to_s => nil
The ^ anchor character indicates that the regexp should match at the beginning of a line, which is NOT necessarily the beginning of the string.
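A hedged sketch of the difference between "contains three digits" and "is exactly three digits". Both helper names are hypothetical, and match? assumes Ruby 2.4 or later. Without anchors, \d{3} is satisfied by the first three digits of a longer run.

```ruby
# Hypothetical helper: true when the text contains three digits in a row.
def contains_three_digits?(text)
  /\d{3}/.match?(text)
end

# Hypothetical helper: true only when the whole string is three digits.
def exactly_three_digits?(text)
  /\A\d{3}\z/.match?(text)
end

p contains_three_digits?("123456")  # prints true
p exactly_three_digits?("123456")   # prints false
p exactly_three_digits?("123")      # prints true
p exactly_three_digits?("12a")      # prints false
```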
puts /^x/ =~ "xylem" 0 => nil puts /^x/ =~ "foxes" => nil
The i modifier character ignores case when performing a pattern match.
puts /yelling/i =~ "I'm not YELLING!" 8 => nil puts /yelling/ =~ "I'm not YELLING!" => nil
The $ anchor character indicates that the regexp should match at the end of a line, which is NOT necessarily the end of the string.
puts /z$/ =~ "quiz" 3 => nil puts /z$/ =~ "zanzibar" => nil
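The line-versus-string distinction only shows up with multi-line input. The sketch below contrasts the line anchors ^ and $ with the string anchors \A and \z; the helper names are hypothetical and match? assumes Ruby 2.4 or later.

```ruby
# Hypothetical helpers contrasting line anchors with string anchors.
def line_starts_with_x?(text)
  /^x/.match?(text)    # ^ matches at the start of any line
end

def string_starts_with_x?(text)
  /\Ax/.match?(text)   # \A matches only at the start of the string
end

def line_ends_with_z?(text)
  /z$/.match?(text)    # $ matches at the end of any line
end

def string_ends_with_z?(text)
  /z\z/.match?(text)   # \z matches only at the very end of the string
end

p line_starts_with_x?("fox\nxylem")    # prints true
p string_starts_with_x?("fox\nxylem")  # prints false
p line_ends_with_z?("quiz\nshow")      # prints true
p string_ends_with_z?("quiz\nshow")    # prints false
```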
_______________________ or regexp is short for regular expression and is essentially a search query for text, expressed as a string pattern. Tools like sed, awk, grep, egrep, and ed all support it.
regex
Character classes, delineated by square brackets in a regexp, allow you to match against the set of characters contained within them.
regex = /[Rr]uby/ => /[Rr]uby/ m1 = regex.match("Ruby") #<MatchData "Ruby"> puts m1.to_s Ruby => nil m2 = regex.match("ruby") #<MatchData "ruby"> puts m2.to_s ruby => nil
________________________ are commonly defined using the forward slash as a delimiter, similar to how quotation marks delimit string objects. For example, /Ruby/ matches the literal character sequence Ruby.
regexps
By adding regular expressions, we can make substitutions based on patterns. In the following example, the question mark symbol means the character before it in a regular expression is optional, so the pattern matches both tomato and tomahto. Ruby reads the string and uses sub to replace only the first substring that matches, swapping tomato with banana. If we wanted to replace both the tomato and tomahto substrings, we could use gsub.
s = "you say tomato, I say tomahto" => "you say tomato, I say tomahto" puts s.sub(/tomah?to/, "banana") you say banana, I say tomahto => nil s = "you say tomato, I say tomahto" => "you say tomato, I say tomahto" puts s.gsub(/tomah?to/, "banana") you say banana, I say banana => nil
Sometimes extracting text by fixed string positions is problematic. For example, the position of the information you want may shift when other parts of the string change.
str = "July 31 07:51:48 mycomputer bad_process[12345]: ERROR Performing package upgrade" => "July 31 07:51:48 mycomputer bad_process[12345]: ERROR Performing package upgrade" puts str[40,5] 12345 => nil *Problems: process IDs, dates, and host names change in length, which shifts the index for future queries.
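The same first-match versus every-match distinction can be sketched with a different pattern. The helper names and the sample string are hypothetical.

```ruby
# Hypothetical helper: sub replaces only the first match.
def mask_first_digit(text)
  text.sub(/\d/, "#")
end

# Hypothetical helper: gsub replaces every match.
def mask_all_digits(text)
  text.gsub(/\d/, "#")
end

puts mask_first_digit("call 555-1234")  # prints: call #55-1234
puts mask_all_digits("call 555-1234")   # prints: call ###-####
```

Neither method changes the original string; both return a new one.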
regular expressions example
str = "July 31 07:51:48 mycomputer bad_process[12345]: ERROR Performing package upgrade" => "July 31 07:51:48 mycomputer bad_process[12345]: ERROR Performing package upgrade" regex = /\[(\d+)\]/ => /\[(\d+)\]/ results = regex.match(str) => #<MatchData "[12345]" 1:"12345"> puts results.captures 12345 => nil
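A minimal sketch of wrapping this pattern in a reusable method, so it keeps working when the date, host name, or process ID changes length. `process_id` is a hypothetical helper, and the second log line is invented sample data in the same layout.

```ruby
# Hypothetical helper: extract the process ID found between square
# brackets, or return nil when the line has no such ID.
def process_id(log_line)
  m = /\[(\d+)\]/.match(log_line)
  # m[1] is the first capture group: the digits inside the brackets.
  m && m[1].to_i
end

line1 = "July 31 07:51:48 mycomputer bad_process[12345]: ERROR Performing package upgrade"
line2 = "Aug 1 10:00:00 host cron[98]: INFO started"  # invented sample line

p process_id(line1)             # prints 12345
p process_id(line2)             # prints 98
p process_id("no id in here")   # prints nil
```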