Information Technology Ch. 10
what a regular expression represents
A pattern that describes a set of strings (more specifically, a string that combines literal characters, such as 1 or 0, with metacharacters, symbols that have a special meaning rather than standing for themselves). Matching the pattern against input yields the strings that fit the pattern
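A minimal sketch of a pattern that mixes literal characters with metacharacters. The four sample strings and the variable name matches are assumptions made up for this illustration.

```shell
# Sample input: one candidate string per line (made up for this example).
# The pattern ^ab*c$ reads: an "a", then zero or more "b", then "c", and nothing else.
matches=$(printf '%s\n' ac abc abbbc abd | grep '^ab*c$')

# Prints ac, abc and abbbc (one per line); abd does not fit the pattern.
echo "$matches"
```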
[[:alpha:]]
Alphabetic-upper or lower case
metacharacters: [[:class:]]
An alternative form of [ ] where the :class: can be one of several categories such as alpha (alphabetic), digit, alnum (alphabetic or numeric), punct, space, upper, lower
[[:punct:]]
Any punctuation character
[[:graph:]]
Any visible character
[[:print:]]
Any visible character plus the space
[[:space:]]
Any whitespace (space, tab, newline, carriage return, vertical tab, form feed)
Bash wildcard: [[:class:]]
As with regular expressions, matches any character in the specified class
ls *[[:upper:]]*
FOO, FOO11.txt, FOO1.dat, FOO.txt,/FOO4
ls *[[:upper:]]*.txt
FOO.txt, FOO11.txt
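The class wildcard can be tried in a scratch directory. This is a sketch only: the file names are assumed from the listings used elsewhere in these notes, and dir and upper_txt are hypothetical variable names.

```shell
# Use the C locale so the expanded names sort in a predictable order.
LC_ALL=C
export LC_ALL

# Scratch directory with an assumed set of sample files.
dir=$(mktemp -d)
( cd "$dir" && touch foo.txt foo1.txt foo2.dat foo11.txt FOO.txt FOO1.dat FOO11.txt )

# Names that contain at least one upper-case letter and end in .txt.
upper_txt=$(cd "$dir" && ls *[[:upper:]]*.txt)
echo "$upper_txt"

rm -rf "$dir"
```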
Escape character: \b
Match a word boundary
Escape character: \w
Match any word character: a letter (a-zA-Z), a digit, or an underscore
Escape character: \D
Match any non-digit
Escape character: \W
Match any non-word character (anything other than a letter, digit, or underscore)
Escape character: \S
Match any non-white space
Escape character: \B
Match at any position that is not a word boundary
Escape character: \s
Match any white space
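A short sketch of the word-related escapes, assuming GNU grep (which accepts \b and \w in its default syntax). The sample sentence and variable names are made up. \d and \D are left out because plain GNU grep does not accept them without the -P option.

```shell
# Sample text (made up for this example).
line='the cat concatenates 42 cats'

# \bcat\b matches "cat" only as a whole word,
# not inside "concatenates" or "cats".
whole_word=$(echo "$line" | grep -o '\bcat\b')

# \w\w* matches each run of word characters; wc -l counts the runs.
words=$(echo "$line" | grep -o '\w\w*' | wc -l)

echo "$whole_word"
echo "$words"
```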
metacharacters: [^...]
Match any single character that is not one of the characters listed in [ ]
metacharacters: ^
Match only if the expression that follows appears at the beginning of the string (line)
Bash wildcard: **
With the globstar option enabled, matches all files and directories, descending recursively into subdirectories
Bash wildcard: []
Matches any one of the enclosed characters. Ranges are permitted when using a hyphen. If the first character after the [ is either a ! or a ^, it matches any character that is not enclosed in the brackets.
Bash wildcard: @
Written @(pattern-list) with the extglob option enabled; matches any one of the listed patterns
Bash wildcard: ?
Matches any single character (note: does not match 0 characters)
Bash wildcard: *
Matches any string, including the null string
Bash wildcard: !
Written !(pattern-list) with the extglob option enabled; matches anything except one of the listed patterns
Bash wildcard: */
Matches directories
Bash wildcard: +
Written +(pattern-list) with the extglob option enabled; matches one or more occurrences of the listed patterns (similar to + in regular expressions)
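The @, ! and + wildcards belong to Bash's extended patterns, which take a parenthesised, |-separated pattern list and need the extglob option. The sketch below starts bash with -O extglob so the option is on before the pattern is parsed. In an interactive shell, shopt -s extglob does the same job. File and variable names are assumptions.

```shell
# Use the C locale so the expanded names sort in a predictable order.
LC_ALL=C
export LC_ALL

# Scratch directory with an assumed set of sample files.
dir=$(mktemp -d)
( cd "$dir" && touch foo.txt foo1.txt foo11.txt foo2.dat )

# @(a|b): exactly one of the listed patterns.
one_of=$(cd "$dir" && bash -O extglob -c 'echo @(foo|foo1).txt')

# !(pattern): anything except names matching the pattern.
not_txt=$(cd "$dir" && bash -O extglob -c 'echo !(*.txt)')

# +(pattern): one or more occurrences of the pattern.
digits=$(cd "$dir" && bash -O extglob -c 'echo foo+([[:digit:]]).txt')

echo "$one_of"    # foo.txt foo1.txt
echo "$not_txt"   # foo2.dat
echo "$digits"    # foo1.txt foo11.txt

rm -rf "$dir"
```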
metacharacters: ( )
The items in ( ) are treated as a group; match the entire sequence.
grep option: -e regex
The regular expression is placed after -e rather than where it is normally positioned in the instruction; this protects the regular expression if it starts with an unusual character, for instance, the hyphen
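A small sketch of why -e helps: the pattern here starts with a hyphen, so without -e grep would read it as an option. The two sample lines and the variable name hits are made up.

```shell
# Search for the literal text "-v". The -e marks it as the pattern,
# so grep does not mistake it for the -v (invert) option.
hits=$(printf '%s\n' 'run with -v for details' 'no flags here' | grep -e '-v')

# Prints: run with -v for details
echo "$hits"
```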
Bash wildcard: \
Used to escape the meaning of the given character (similar to regular expressions)
grep option: -d read
Specifies how directories are handled: with read, a directory is read as if it were an ordinary file; use recurse in place of read to search all files of a given directory, and recursively all files of its subdirectories
ls *
Will list all items in directory
ls *.{dat,txt}
Will list all items in directory ending in either .txt or .dat
ls *[[:digit:]]*
Will list every item that contains a digit
[[:alnum:]]
alphanumeric- letter or digit
[[:cntrl:]]
any control character
grep
command that searches one or more text files for strings that match a given regular expression
grep option: -c
count the number of matching lines and output the total; do not output the matches themselves
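The -c total is a count of lines, which can differ from the number of individual matches when one line matches more than once. A sketch with a made-up three-line file (file, line_count and match_count are hypothetical names):

```shell
# Temporary sample file: the first line contains "foo" twice.
file=$(mktemp)
printf '%s\n' 'foo foo' 'bar' 'foo' > "$file"

# -c reports how many lines contain a match.
line_count=$(grep -c 'foo' "$file")

# To count every occurrence, print each match on its own line and count those.
match_count=$(grep -o 'foo' "$file" | wc -l)

echo "$line_count"    # 2
echo "$match_count"   # 3

rm -f "$file"
```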
[[:digit:]]
digit
ls foo[0-2].*
foo1.txt, foo2.dat
ls foo[[:digit:]].*
foo1.txt, foo2.dat, (it does not list foo11.txt because we are only seeking 1 digit, and it does not list foo5?.txt because we do not provide for the ? after the digit and before the period)
ls *\?.*
foo5?.txt
how grep/egrep works
grep pattern filename(s); you should enclose the pattern in single quotes ('') so that the shell does not interpret its metacharacters
[[:xdigit:]]
hexadecimal digit
grep option: -i
ignore case (e.g. [a-z] would match any letter whether upper or lower case)
grep option: -v
invert the match, that is, print all lines that do not match the given regular expression
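A brief sketch combining -i and -v on a made-up three-line log. The text and variable names are assumptions for illustration.

```shell
# Sample log text (made up for this example).
log=$(printf '%s\n' 'Error: disk full' 'warning: low memory' 'ERROR: timeout')

# -i: "error" matches regardless of case.
any_case=$(printf '%s\n' "$log" | grep -i 'error')

# -v together with -i: keep only the lines that do NOT mention error in any case.
inverted=$(printf '%s\n' "$log" | grep -iv 'error')

echo "$any_case"   # the two error lines
echo "$inverted"   # warning: low memory
```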
ls *.txt
items returned: foo.txt, foo1.txt, foo11.txt, FOO.txt, FOO11.txt, foo5?.txt
ls *.*
items returned: foo.txt, foo1.txt, foo2.dat, foo11.txt, FOO.txt, FOO1.dat, FOO11.txt, foo5?.txt
ls foo?.*
items returned: foo1.txt, foo2.dat
ls foo??.*
items returned: foo11.txt, foo5?.txt
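The ? listings above can be reproduced in a scratch directory. This is a sketch under the assumption that the lower-case files named in these notes are present; dir and the result variables are hypothetical names.

```shell
# Use the C locale so the expanded names sort in a predictable order.
LC_ALL=C
export LC_ALL

# Scratch directory; one file name contains a literal question mark.
dir=$(mktemp -d)
( cd "$dir" && touch foo.txt foo1.txt foo2.dat foo11.txt 'foo5?.txt' )

# foo?.*  : exactly one character between "foo" and the period.
one_char=$(cd "$dir" && ls foo?.*)

# foo??.* : exactly two characters between "foo" and the period.
two_chars=$(cd "$dir" && ls foo??.*)

# \? escapes the wildcard, so only a literal question mark matches.
literal_q=$(cd "$dir" && ls *\?.*)

echo "$one_char"
echo "$two_chars"
echo "$literal_q"

rm -rf "$dir"
```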
[[:lower:]]
lower case letter
how to use regular expressions in a long list of files
ls -l * | egrep 'pattern' (pipe the long listing into egrep, replacing pattern with the regular expression)
Escape character: \d
match any digit
metacharacters: |
match any of these strings (OR)
metacharacters: []
match any one of the characters listed in []
metacharacters: {n,}
match if the string contains at least n consecutive occurrences of the preceding character
metacharacters: {n,m}
match if the string contains between n and m consecutive occurrences of the preceding character
metacharacters: {n}
match if the string contains n consecutive occurrences of the previous character
metacharacters: {,m}
match if the string contains no more than m consecutive occurrences of the preceding character
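A sketch of the counting forms with grep -E. The strings are made up, and the ^ and $ anchors are added so that the count applies to the whole line rather than to any part of it. The {,m} form is left out because not every grep accepts it.

```shell
# {n}: exactly two consecutive a's.
exactly=$(printf '%s\n' a aa aaa aaaa | grep -E '^a{2}$')

# {n,}: three or more consecutive a's.
at_least=$(printf '%s\n' a aa aaa aaaa | grep -E '^a{3,}$')

# {n,m}: between two and three consecutive a's.
between=$(printf '%s\n' a aa aaa aaaa | grep -E '^a{2,3}$')

echo "$exactly"    # aa
echo "$at_least"   # aaa and aaaa
echo "$between"    # aa and aaa
```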
metacharacters: .
match any single character
metacharacters: [char-char]
match if this expression contains any characters in the range from char to char (ex. 1-9, a-z, A-Z)
metacharacters: $
match if this expression ends a string
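A sketch contrasting the ^ and $ anchors on three made-up lines. The variable names are assumptions.

```shell
# ^cat : lines that begin with "cat".
starts=$(printf '%s\n' 'cat food' 'bobcat' 'cat' | grep '^cat')

# cat$ : lines that end with "cat".
ends=$(printf '%s\n' 'cat food' 'bobcat' 'cat' | grep 'cat$')

# ^cat$ : lines that are exactly "cat".
whole=$(printf '%s\n' 'cat food' 'bobcat' 'cat' | grep '^cat$')

echo "$starts"   # cat food, cat
echo "$ends"     # bobcat, cat
echo "$whole"    # cat
```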
metacharacters: ?
match the preceding character if it appears 0 or 1 time
metacharacters: *
match the preceding character if it appears 0 or more times
metacharacters: +
match the preceding character if it appears 1 or more times
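The three repetition metacharacters can be compared on the same input. A sketch with made-up spellings, using grep -E so that ? and + are treated as metacharacters:

```shell
# u? : the "u" may appear 0 or 1 time.
optional=$(printf '%s\n' color colour colouur | grep -E '^colou?r$')

# u* : the "u" may appear 0 or more times.
any_number=$(printf '%s\n' color colour colouur | grep -E '^colou*r$')

# u+ : the "u" must appear 1 or more times.
one_or_more=$(printf '%s\n' color colour colouur | grep -E '^colou+r$')

echo "$optional"      # color, colour
echo "$any_number"    # color, colour, colouur
echo "$one_or_more"   # colour, colouur
```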
grep option: -o
only output the portion of the line that matches the regular expression
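A sketch of -o pulling just the matching text out of a line. The sentence and the variable name only are made up.

```shell
# Without -o grep would print the whole line; with -o each run of digits
# is printed on its own line.
only=$(echo 'order 66 shipped to bay 7' | grep -o '[[:digit:]][[:digit:]]*')

# Prints 66 and then 7.
echo "$only"
```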
grep option: -L
output any filenames with no matches, do not output matches
grep option: -n
output line numbers
grep option: -a
process a binary file as if it were a text file (this lets you search binary files for specific strings of binary numbers)
grep option: -R, -r
recursive search (same as -d recurse)
[[:blank:]]
space or tab
grep option: -m NUM
stop reading a file after NUM matches
grep option: -h
suppress filename from the output
metacharacters: \
the next character should be interpreted literally, used to escape the meaning of a metacharacter, for instance \$ means "match a $"
[[:upper:]]
upper-case letter
grep option: -E
use egrep (allow the extended expression set)