The following Regex finds WikiWords in an i18n-proof way:
using System.Text.RegularExpressions;

private static readonly Regex _wikiWords = new Regex(@"
\b                    # start on a word boundary, then match one of:
(?:
  \p{Lu}              # an uppercase letter,
  \p{Ll}*             # zero or more lowercase letters,
  \p{Lu}              # one uppercase letter
  \w*                 # and zero or more word characters
|                     # or
  \p{L}+\d\w*         # letters followed by a digit
|                     # or
  \d+\p{L}\w*         # digits followed by a letter
)", RegexOptions.IgnorePatternWhitespace);

Speaking of i18n: the term is flawed. i18n is short for "internationalization", which suggests crossing national borders. I am from the Netherlands. If I want to support only my own language, no international boundary is crossed, but I still need an i18n-proof WikiWord engine: Dutch words can contain letters such as ë and é, which an ASCII-only pattern like [A-Z][a-z]* does not match.
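To try the pattern on Dutch text, here is a minimal sketch. It is not the original code: the class name WikiWordDemo and the Find helper are hypothetical, and the three alternatives are grouped so that the leading \b applies to all of them. The sample word IdeeënLijst contains ë, which an ASCII class such as [a-z] would reject.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Minimal sketch: WikiWordDemo and Find are hypothetical names for illustration.
public static class WikiWordDemo
{
    // Same idea as the pattern above, with \b applied to every alternative.
    private static readonly Regex WikiWords = new Regex(@"
        \b                         # start on a word boundary
        (?:
            \p{Lu}\p{Ll}*\p{Lu}\w* # upper, lower*, upper, then word characters
          | \p{L}+\d\w*            # letters followed by a digit
          | \d+\p{L}\w*            # digits followed by a letter
        )", RegexOptions.IgnorePatternWhitespace);

    // Returns every WikiWord in the text, in order of appearance.
    public static string[] Find(string text) =>
        WikiWords.Matches(text).Cast<Match>().Select(m => m.Value).ToArray();

    public static void Main()
    {
        // Dutch sample: "IdeeënLijst" contains ë, which [a-z] would not accept.
        foreach (string word in Find("Zie IdeeënLijst, FrontPage en Windows95, maar niet gewoon."))
        {
            Console.WriteLine(word);
        }
    }
}
```

For the sample sentence, Find returns IdeeënLijst, FrontPage and Windows95; plain lowercase words such as "gewoon" are skipped because they have neither an inner capital nor a digit.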