Analyzing Unicode Text with Regular Expressions
Andy Heninger
IBM Corporation
******@us.
Abstract
For decades, regular expressions have been used to analyze text data: to search for
key words, to extract desired fields or substrings from larger bodies of text, and to
edit or transform text.
This paper discusses the application of regular expressions to Unicode text data,
including the approaches and extensions that are required to work effectively with the
very large Unicode character repertoire. The emphasis is on Unicode specifically, not on
the features of regular expressions in general, a subject about which entire books
can be, and have been, written.
A Very Quick Look At Regular Expressions
Although this paper deals primarily with Unicode-related questions, a regular
expression language is still needed for discussion and for use in examples. Here is a
minimalist one, smaller than most real implementations, but sufficient for the purpose.
26th Internationalization and Unicode Conference 1 San Jose, CA, September 2004
Item                            Definition
.                               Match any single character.
[range or set of characters]    Match any character of a class or set of characters.
                                Set expressions will be described later.
*                               Match 0 or more occurrences of the preceding item.
+                               Match 1 or more occurrences of the preceding item.
Literal Characters              Match themselves.
\udddd, \Udddddddd              Unicode Code Point Values, 16 or 32 bits.
( sub-expression )              Grouping. (abc)*, for example.
a|b|c                           Alternation. Match any one of 'a' or 'b' or 'c'.
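As a rough illustration, the items in the table above can be tried out in an existing engine. The sketch below uses Python's standard re module, which is an assumption made for convenience and not the implementation discussed in this paper; the helper name matches is hypothetical. Python's syntax happens to cover every item in the table, although its "." does not match a newline unless the DOTALL flag is given.

```python
import re

def matches(pattern: str, text: str) -> bool:
    """Return True if the whole of `text` matches `pattern` (hypothetical helper)."""
    return re.fullmatch(pattern, text) is not None

# (pattern, text) pairs, one or more per row of the table above.
checks = [
    (r".", "\u00e9"),               # . matches any single character (here U+00E9)
    (r"[a-z]", "q"),                # a set matches one character from the set
    (r"(abc)*", ""),                # * allows zero occurrences of the group
    (r"(abc)*", "abcabc"),          # ... or several occurrences
    (r"a+", "aaa"),                 # + requires at least one occurrence
    (r"Hello", "Hello"),            # literal characters match themselves
    (r"\u0041", "A"),               # \udddd: 16-bit code point value, U+0041
    (r"\U0001F600", "\U0001F600"),  # \Udddddddd: 32-bit form, U+1F600
    (r"a|b|c", "b"),                # alternation
]

for pattern, text in checks:
    print(f"{pattern!r:16} {text!r:12} {matches(pattern, text)}")
```

Every pair in the list is chosen to match. Changing a text, for example matching a+ against the empty string, shows the corresponding failure case.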
To make things more concrete, here are a few samples of simple expressions:
Expression      Description
Hello           Match or select appearances of the word “Hello” in the target text.
aa[a-z]*        Match any word beginning with “aa” and consisting of only the
                lower case letters a-z. (Just what is in the range
