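The following Regex finds WikiWords in an i18n-proof way. It uses Unicode character categories such as \p{Lu} (uppercase letter) and \p{Ll} (lowercase letter) instead of ASCII ranges like [A-Z] and [a-z]: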

    using System.Text.RegularExpressions;

    private static readonly Regex _wikiWords = new Regex(@"
        \b (?:           # start on a word boundary (grouped so it applies to every alternative)
            \p{Lu}       # start with an uppercase letter
            \p{Ll}*      # zero or more lowercase letters
            \p{Lu}       # one uppercase letter
            \w*          # and zero or more word characters
          |              # or
            \p{L}+\d\w*  # letters, then a digit, then word characters
          |              # or
            \d+\p{L}\w*  # digits, then a letter, then word characters
        )", RegexOptions.IgnorePatternWhitespace);

Speaking of i18n: the term itself is flawed. I am from the Netherlands. If I want to support only my own language, no international boundary is crossed. Yet I still need an i18n-proof WikiWord engine, because Dutch words can contain letters outside ASCII, such as the ë in "ideeën".
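A minimal console sketch of how such a pattern might be used on Dutch text is shown here. The class name WikiWordDemo and the helper FindWikiWords are hypothetical, chosen only for illustration. The sketch also wraps the three alternatives in a non-capturing group (?: ... ), which is an assumption about the intended behavior: alternation binds more loosely than anything else in a regex, so without a group the leading \b guards only the first alternative, and an identifier like foo_bar1 would yield the partial match bar1.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class WikiWordDemo
{
    // Assumption: the alternatives are grouped so that \b applies to all three.
    private static readonly Regex WikiWords = new Regex(@"
        \b (?:
            \p{Lu} \p{Ll}* \p{Lu} \w*   # CamelCase, e.g. ReünieVerslag
          | \p{L}+ \d \w*               # letters then a digit, e.g. Pagina2
          | \d+ \p{L} \w*               # digits then a letter, e.g. 2eKamer
        )", RegexOptions.IgnorePatternWhitespace);

    // Hypothetical helper: returns every WikiWord found in the text, in order.
    public static string[] FindWikiWords(string text) =>
        WikiWords.Matches(text).Cast<Match>().Select(m => m.Value).ToArray();

    public static void Main()
    {
        // Dutch sample text with accented letters; foo_bar1 should not match.
        string text = "Zie ÉénTwee, ReünieVerslag en Pagina2, maar niet gewoon of foo_bar1.";
        Console.WriteLine(string.Join(", ", FindWikiWords(text)));
    }
}
```

For the sample text, FindWikiWords returns ÉénTwee, ReünieVerslag and Pagina2. An ASCII-only pattern such as [A-Z][a-z]*[A-Z]\w* would miss ÉénTwee entirely, because É is not in [A-Z]. Depending on the console's encoding, the accented letters may not render correctly when printed.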