A Quick-Start Tutorial on Regular Expressions

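Abstract:
I came across a book with an excellent regular expression tutorial and wrote up these notes from it. The tutorial uses the Linux grep command throughout, so you can pick up regular expressions quickly. Regular expressions are used widely in nginx configuration and in Linux commands. I have tried to keep the tutorial simple. Even a long, intimidating regular expression can be understood with patient analysis, because a regular expression is built from small pieces placed one after another, unlike complex and abstract program logic.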

grep is a commonly used Linux command that compares strings: it prints every line that matches the given pattern.

grep 'search string' filename

An example:

grep 'root' /etc/passwd
root:x:0:0:root:/root:/bin/bash

To highlight the matched text in color, define an alias for grep:

alias grep='grep --color=auto'

The sample file r.txt

On Linux it can be fetched with the following commands:

wget http://linux.vbird.org/linux_basic/0330regularex/regular_express.txt
mv regular_express.txt r.txt
cat r.txt
"Open Source" is a good mechanism to develop programs.
apple is my favorite food.
Football game is not use feet only.
this dress doesn't fit me.
However, this dress is about $ 3183 dollars.
GNU is free air not free beer.
Her hair is very beauty.
I can't finish the test.
Oh! The soup taste good.
motorcycle is cheap than car.
This window is clear.
the symbol '*' is represented as start.
Oh!     My god!
The gd software is a library for drafting programs.
You are the best is mean you are the no. 1.
The world <Happy> is the same with "glad".
I like dog.
google is the best tools for search keyword.
goooooogle yes!
go! go! Let's go.
# I am VBird

The file has 22 lines in total; the last line is blank.

Basic regular expression exercises

Example 1:

# grep -n 'the' r.txt

8:I can't finish the test.
12:the symbol '*' is represented as start.
15:You are the best is mean you are the no. 1.
16:The world <Happy> is the same with "glad".
18:google is the best tools for search keyword.
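
Note that line 9 ("Oh! The soup taste good.") is missing from the output because the search is case-sensitive. The following is a minimal sketch of that behavior, assuming a POSIX shell; the variable names are illustrative only, and the two sample lines are copied from r.txt.

```shell
# Two sample lines from r.txt; only the second contains a lowercase "the".
sample=$(printf '%s\n' 'Oh! The soup taste good.' "I can't finish the test.")

# Default matching is case-sensitive: only the second line matches.
sensitive=$(printf '%s\n' "$sample" | grep -n 'the')
echo "$sensitive"

# -i ignores case, so "The" matches as well.
insensitive=$(printf '%s\n' "$sample" | grep -in 'the')
echo "$insensitive"
```

Adding -i to the command in Example 1 would therefore list line 9 as well.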

Example 2: searching with square brackets []

To find the two words test and taste, note that they share the shape 't?st'. You can search like this:

# grep -n 't[ae]st' r.txt

8:I can't finish the test.
9:Oh! The soup taste good.

No matter how many characters the brackets contain, [] stands for exactly one character. To find lines containing oo:

# grep -n 'oo' r.txt

1:"Open Source" is a good mechanism to develop programs.
2:apple is my favorite food.
3:Football game is not use feet only.
9:Oh! The soup taste good.
18:google is the best tools for search keyword.
19:goooooogle yes!

To exclude an oo that is preceded by g:

# grep -n '[^g]oo' r.txt
2:apple is my favorite food.
3:Football game is not use feet only.
18:google is the best tools for search keyword.
19:goooooogle yes!

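Lines 18 and 19 still appear: in line 18 the oo in tools is preceded by t, and in line 19 an oo inside goooooogle can itself be preceded by another o. To find an oo that is not preceded by a lowercase letter:

# grep -n '[^a-z]oo' r.txt
3:Football game is not use feet only.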
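
To see which part of a line a negated bracket actually matched, the -o option (available in GNU and BSD grep, not required by POSIX) prints only the matched text. This is a small sketch with illustrative variable names; the sentence 'a good day' is a made-up test string.

```shell
# "tools" supplies "too": the character before "oo" is t, not g.
in_tools=$(echo 'google is the best tools for search keyword.' | grep -o '[^g]oo')
echo "$in_tools"        # too

# In "goooooogle" the character before "oo" can itself be an o.
in_goooooogle=$(echo 'goooooogle yes!' | grep -o '[^g]oo' | head -n 1)
echo "$in_goooooogle"   # ooo

# A line whose only "oo" follows a g does not match at all.
# "|| true" keeps the script going, since grep exits with status 1 on no match.
only_goo=$(echo 'a good day' | grep -c '[^g]oo' || true)
echo "$only_goo"        # 0
```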

The same idea applies to [a-z], [A-Z], [0-9], [a-zA-Z0-9] and so on. For example:

# grep -n '[0-9]' r.txt
5:However, this dress is about $ 3183 dollars.
15:You are the best is mean you are the no. 1.

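Example 3: the line-start and line-end anchors ^ and $

To list only the lines that begin with the:

# grep -n '^the' r.txt
12:the symbol '*' is represented as start.

To list the lines that begin with a lowercase letter: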

# grep -n '^[a-z]' r.txt
2:apple is my favorite food.
4:this dress doesn't fit me.
10:motorcycle is cheap than car.
12:the symbol '*' is represented as start.
18:google is the best tools for search keyword.
19:goooooogle yes!
20:go! go! Let's go.

To list the lines that do not begin with an English letter:

# grep -n '^[^a-zA-Z]' r.txt
1:"Open Source" is a good mechanism to develop programs.
21:# I am VBird

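Note: ^ means different things inside and outside square brackets []. Inside [], as the first character, it negates the set; outside [] it anchors the match to the start of the line.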
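
The two roles of ^ can be compared side by side. This is a minimal sketch; the four short lines are abbreviated from r.txt and the variable names are illustrative.

```shell
lines=$(printf '%s\n' 'the symbol' 'The world' '# I am VBird' '"Open Source"')

# Outside brackets, ^ anchors the match to the start of the line.
anchored=$(printf '%s\n' "$lines" | grep '^the')
echo "$anchored"            # the symbol

# Inside brackets, a leading ^ negates the set: any character except a letter.
# Combined with the anchor: lines whose first character is not an English letter.
non_letter_start=$(printf '%s\n' "$lines" | grep -c '^[^a-zA-Z]')
echo "$non_letter_start"    # 2
```
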
To find the lines that end with a period (.):

# grep -n '\.$' r.txt
1:"Open Source" is a good mechanism to develop programs.
2:apple is my favorite food.
3:Football game is not use feet only.
4:this dress doesn't fit me.
10:motorcycle is cheap than car.
11:This window is clear.
12:the symbol '*' is represented as start.
15:You are the best is mean you are the no. 1.
16:The world <Happy> is the same with "glad".
17:I like dog.
18:google is the best tools for search keyword.
20:go! go! Let's go.

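The period has a special meaning in regular expressions (covered below), so it must be escaped with a backslash (\). Lines 5 to 9 also end with a period, so why were they not printed? Use cat -A to inspect those lines, together with line 10 for comparison: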

# cat -An r.txt | head -n 10 | tail -n 6
     5  However, this dress is about $ 3183 dollars.^M$
     6  GNU is free air not free beer.^M$
     7  Her hair is very beauty.^M$
     8  I can't finish the test.^M$
     9  Oh! The soup taste good.^M$
    10  motorcycle is cheap than car.$

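Lines 5 to 9 use the Windows (DOS) line ending (^M$), while line 10 uses the Linux line ending ($). In lines 5 to 9 the last character before the end of the line is the carriage return ^M rather than the period, so '\.$' does not match them. This output also shows why $ is the symbol for the end of a line. To find blank lines:

# grep -n '^$' r.txt
22: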
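
The effect of DOS line endings described above can be reproduced on a tiny file. This is a sketch under the assumption that stripping every carriage return with tr is acceptable for the file at hand; the sample text and variable names are made up for illustration.

```shell
# Build a two-line sample: the first line ends in CRLF (DOS), the second in LF (Unix).
sample=$(mktemp)
printf 'dos line.\r\nunix line.\n' > "$sample"

# With the carriage return still in place, only the Unix line ends in ".".
before=$(grep -c '\.$' "$sample")

# Deleting every \r with tr gives Unix line endings; now both lines match.
after=$(tr -d '\r' < "$sample" | grep -c '\.$')

echo "before=$before after=$after"   # before=1 after=2
rm -f "$sample"
```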

Linux configuration files contain many comments that start with #. To hide both blank lines and comments:

# grep -v '^$' /etc/deluser.conf | grep -v '^#'
REMOVE_HOME = 0
REMOVE_ALL_FILES = 0
BACKUP = 0
BACKUP_TO = "."
ONLY_IF_EMPTY = 0
EXCLUDE_FSTYPES = "(proc|sysfs|usbfs|devpts|tmpfs|afs)"
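
The two grep calls in the pipeline can also be merged into one. The sketch below passes two patterns with -e (a standard grep option) and inverts the match with -v; the config text is a hypothetical stand-in for a real file such as /etc/deluser.conf.

```shell
# Hypothetical config text: comments, a blank line, and two settings.
config=$(printf '%s\n' '# a comment' '' 'REMOVE_HOME = 0' '# another comment' 'BACKUP = 0')

# Two -e patterns with -v: drop lines that are empty or that start with #.
settings=$(printf '%s\n' "$config" | grep -v -e '^$' -e '^#')
echo "$settings"
```

This prints only the two setting lines, the same result the two-step pipeline would give.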

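Example 4: any single character . and the repetition character *

. (period): matches exactly one arbitrary character;
* (asterisk): repeats the preceding character zero or more times;
Suppose you want to find strings of the form g??d: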

# grep -n 'g..d' r.txt
1:"Open Source" is a good mechanism to develop programs.
9:Oh! The soup taste good.
16:The world <Happy> is the same with "glad".

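To list oo, ooo, oooo and so on, you need the asterisk. Note that 'o*' stands for '', 'o', 'oo', 'ooo' and so on; that is, it also matches the empty string. 'oo*' stands for 'o', 'oo', 'ooo' and so on; that is, at least one o. By the same logic, at least two o's is written 'ooo*':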

# grep -n 'ooo*' r.txt
1:"Open Source" is a good mechanism to develop programs.
2:apple is my favorite food.
3:Football game is not use feet only.
9:Oh! The soup taste good.
18:google is the best tools for search keyword.
19:goooooogle yes!
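
The difference between 'o*', 'oo*' and 'ooo*' is easiest to see by counting matching lines with grep -c. This is a minimal sketch on four lines taken from r.txt; the variable names are illustrative.

```shell
lines=$(printf '%s\n' 'I like dog.' 'Oh! The soup taste good.' \
    'This window is clear.' 'Her hair is very beauty.')

# 'o*' also matches the empty string, so every line matches, even one without an o.
zero_or_more=$(printf '%s\n' "$lines" | grep -c 'o*')

# 'oo*' needs at least one o.
one_or_more=$(printf '%s\n' "$lines" | grep -c 'oo*')

# 'ooo*' needs at least two consecutive o's.
two_or_more=$(printf '%s\n' "$lines" | grep -c 'ooo*')

echo "$zero_or_more $one_or_more $two_or_more"   # 4 3 1
```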

To find at least one o between two g's, that is, gog, goog, gooog and so on:

# grep -n 'goo*g' r.txt
18:google is the best tools for search keyword.
19:goooooogle yes!

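To find a string that starts with g and ends with g, is the pattern 'g*g'? No: 'g*g' means zero or more g's followed by one g. The correct pattern is 'g.*g', where '.*' stands for zero or more arbitrary characters: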

# grep -n 'g.*g' r.txt
1:"Open Source" is a good mechanism to develop programs.
14:The gd software is a library for drafting programs.
18:google is the best tools for search keyword.
19:goooooogle yes!
20:go! go! Let's go.

To allow only English letters between the two g's:

# grep -n 'g[a-zA-Z]*g' r.txt
18:google is the best tools for search keyword.
19:goooooogle yes!
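
Why line 20 matches 'g.*g' but not 'g[a-zA-Z]*g' becomes clear when only the matched text is printed. The sketch below uses -o, which GNU and BSD grep support but POSIX does not require; the variable names are illustrative.

```shell
# '.*' is greedy: the match runs from the first g to the last g on the line,
# crossing spaces and punctuation on the way.
greedy=$(echo "go! go! Let's go." | grep -o 'g.*g')
echo "$greedy"   # go! go! Let's g

# Restricting the middle to letters keeps the match inside a single word.
word=$(echo 'google is the best tools for search keyword.' | grep -o 'g[a-zA-Z]*g')
echo "$word"     # goog
```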

To find lines that contain a number of any length:

# grep -n '[0-9][0-9]*' r.txt
5:However, this dress is about $ 3183 dollars.
15:You are the best is mean you are the no. 1.
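
The leading [0-9] in this pattern matters. As a rough sketch (again relying on -o from GNU and BSD grep, with illustrative variable names and a made-up second test line), the first command extracts the whole number, while the second shows that '[0-9]*' alone filters nothing because it also matches the empty string.

```shell
# One digit followed by zero or more digits: the complete number is matched.
number=$(echo 'However, this dress is about $ 3183 dollars.' | grep -o '[0-9][0-9]*')
echo "$number"       # 3183

# '[0-9]*' can match zero digits, so even a line without digits is selected.
unfiltered=$(printf '%s\n' 'no digits here' 'no. 1' | grep -c '[0-9]*')
echo "$unfiltered"   # 2
```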

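Example 5: limiting the number of repetitions with {}

So far, . and * have been used to match anywhere from zero to infinitely many repeated characters. What if you need to limit the repeat count? That requires the interval braces {}. In basic regular expressions an unescaped { or } is an ordinary character, so the braces must be written with backslashes, \{ and \}, to act as an interval. To find strings with two consecutive o's: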

# grep -n 'o\{2\}' r.txt
1:"Open Source" is a good mechanism to develop programs.
2:apple is my favorite food.
3:Football game is not use feet only.
9:Oh! The soup taste good.
18:google is the best tools for search keyword.
19:goooooogle yes!

To find a g followed by 2 to 5 o's and then another g:

# grep -n 'go\{2,5\}g' r.txt
18:google is the best tools for search keyword.

Line 19 has 6 o's between the two g's, so it is not selected.
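
The interval syntax can be tried on a few standalone words. This is a minimal sketch with made-up test words and illustrative variable names; it also shows the unescaped form that grep -E accepts and the open-ended form \{2,\}.

```shell
words=$(printf '%s\n' 'gog' 'google' 'gooogle' 'goooooogle')

# BRE: the braces are escaped. Two to five o's between the g's.
bre=$(printf '%s\n' "$words" | grep 'go\{2,5\}g')
echo "$bre"          # google and gooogle

# ERE (-E): the same interval is written without backslashes.
ere=$(printf '%s\n' "$words" | grep -E 'go{2,5}g')

# Leaving out the upper bound means "two or more".
open_ended=$(printf '%s\n' "$words" | grep -c 'go\{2,\}g')
echo "$open_ended"   # 3
```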

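Summary of basic regular expressions

RE character	Meaning
^word	The string to find (word) is at the start of the line
word$	The string to find (word) is at the end of the line
.	Matches exactly one arbitrary character
\	Escape character
*	Repeats the preceding character zero or more times
[list]	Lists the characters to choose from; for example, 'a[al]y' finds aay and aly
[n1-n2]	Gives a range of characters to choose from; for example, '[0-9]' means the decimal digits
[^list]	Lists the characters or ranges to exclude; for example, '[^A-Z]' excludes uppercase letters
\{n,m\}	n to m consecutive occurrences of the preceding RE character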

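Extended regular expressions

To use extended regular expressions with grep, add the -E option or use the egrep command directly.

RE character	Meaning
+	Repeats the preceding character one or more times
?	Matches zero or one occurrence of the preceding character
|	Finds several strings joined by "or". For example: egrep -n 'gd|good' r.txt
()	Groups a string. For example, to find glad or good: egrep -n 'g(la|oo)d' r.txt
()+	Repeats the preceding group one or more times. For example, to find "AxyzxyzxyzxyzC": echo 'AxyzxyzxyzxyzC' | egrep 'A(xyz)+C'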
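
The extended operators in the table can be checked with grep -E on a handful of lines. This is a sketch with abbreviated sample lines and illustrative variable names, not output from r.txt itself.

```shell
lines=$(printf '%s\n' 'The gd software' 'taste good' 'the same with "glad"' 'I like dog.')

# | separates alternatives: lines containing gd or good.
alternation=$(printf '%s\n' "$lines" | grep -E -c 'gd|good')
echo "$alternation"   # 2

# () groups the alternatives: glad or good.
grouped=$(printf '%s\n' "$lines" | grep -E -c 'g(la|oo)d')
echo "$grouped"       # 2

# ()+ repeats the whole group one or more times.
repeated=$(echo 'AxyzxyzxyzxyzC' | grep -E 'A(xyz)+C')
echo "$repeated"      # AxyzxyzxyzxyzC

# ? makes the preceding character optional: gd and god match, good does not.
optional=$(printf '%s\n' 'gd' 'god' 'good' | grep -E -c '^go?d$')
echo "$optional"      # 2
```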

It is worth stressing that the exclamation mark ! is not a special character in regular expressions.

That's all. I hope it helps.