Regular expressions and sed & awk
Regular expressions
• Key to powerful, efficient, and flexible text processing by allowing for variable information in the search patterns
• Defined as a string composed of letters, numbers, and special symbols that defines one or more strings
• You have already used similar patterns in selecting files, when you used the asterisk (*) and question mark (?)
characters to select filenames
• Used by several Unix utilities such as ed, vi, emacs, grep, sed, and awk to search for and replace strings
– Checking the author, subject, and date of each message in a given mail folder
egrep "^(From|Subject|Date): " <folder>
– The quotes above are not a part of the regular expression but are needed by the command shell
– The metacharacter | (or) is a convenient one to combine multiple expressions into a single expression that matches
any of the individual expressions contained therein
∗ The subexpressions are known as alternatives
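The alternation in the egrep example can be tried without a real mail folder. The following is a minimal sketch that uses Python's re module as a stand-in for egrep (the same pattern is valid in both); the sample folder text and the helper name header_lines are invented for illustration and are not part of the original notes.

```python
import re

# Same pattern as the egrep example: ^ anchors at the start of the line,
# and | separates the three alternatives inside the parentheses.
HEADER = re.compile(r"^(From|Subject|Date): ")

# Hypothetical mail folder contents, made up for this sketch.
SAMPLE_FOLDER = """\
From: alice@example.com
To: bob@example.com
Subject: Lunch
Date: Mon, 1 Jan 2024 12:00:00 +0000

See you From: noon.
"""


def header_lines(text):
    """Return the lines that begin with one of the three alternatives."""
    return [line for line in text.splitlines() if HEADER.search(line)]


for line in header_lines(SAMPLE_FOLDER):
    print(line)
```

The last line of the sample contains "From: " but not at the start of the line, so the ^ anchor keeps it out of the result.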
• A regular expression is composed of characters, delimiters, simple strings, special characters, and other metacharacters
defined below
• Characters
– A character is any character on the keyboard except the newline character ’\n’
– Most characters represent themselves within a regular expression
– All the characters that represent themselves are called literals
– A special character is one that does not represent itself (such as a metacharacter) and needs to be quoted to be
matched literally
∗ The special characters in the example above (with egrep) are ", ^, (, |, and ); the quotes are interpreted by the
shell, the others by egrep
– We can treat the regular expressions as a language in which the literal characters are the words and the
metacharacters are the grammar
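The difference between literals, metacharacters, and quoted special characters can be seen in a short experiment. This is a sketch in Python's re module rather than in egrep, so the quoting character (backslash) and the helper re.escape are Python's; the sample string is made up for illustration.

```python
import re

# Hypothetical sample text, invented for this sketch.
text = "price (USD) is 3|4"

# Literals represent themselves: the pattern "price" matches the word "price".
literal_hit = re.search(r"price", text).group(0)

# Unquoted, | is the "or" metacharacter: the pattern matches "3" or "4".
alternation = re.findall(r"3|4", text)

# Quoted with a backslash, | loses its special meaning and matches itself.
literal_bar = re.findall(r"3\|4", text)

# Unquoted parentheses only group; the match itself is just "USD".
grouped = re.search(r"(USD)", text).group(0)

# Quoted parentheses match the parenthesis characters in the text.
literal_parens = re.findall(r"\(USD\)", text)

# re.escape quotes every special character in a string for us.
escaped_hit = re.findall(re.escape("(USD)"), text)

print(alternation, literal_bar, grouped, literal_parens)
```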
• Delimiters
– A delimiter is a character that marks the beginning and end of a regular expression
– The delimiter is always a special character for the regular expression being delimited
– The delimiter does not represent itself but marks the beginning and end of the regular expression
– Any character can be used as a delimiter as long as it (the same charact