Showing posts with label i18n. Show all posts

09 April 2010

i18n-proof C# regex recognizing uppercase and lowercase letters

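Finding WikiWords with patterns like [A-Z][a-z][...] won't do: those ranges cover only the ASCII letters, so accented or non-Latin letters such as É or ž are never recognized as uppercase or lowercase, and the pattern is not i18n-proof.
The following Regex uses the Unicode categories \p{Lu} (uppercase letter), \p{Ll} (lowercase letter) and \p{L} (any letter) to find WikiWords in an i18n-proof way: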
private static readonly Regex _wikiWords = new Regex(@"
    \b                #start on a word boundary
    (?:               #group, so the boundary applies to every alternative
    \p{Lu}            #start with an uppercase letter
    \p{Ll}*           #zero or more lowercase letters
    \p{Lu}            #one uppercase letter
    \w*               #and zero or more word characters
    |                 #or
    \p{L}+\d\w*       #letters, a digit, then any word characters
    |                 #or
    \d+\p{L}\w*       #digits, a letter, then any word characters
    )
", RegexOptions.IgnorePatternWhitespace);
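To see the difference in practice, here is a small sketch that runs an ASCII-only pattern and a Unicode-aware pattern over the same sentence. The class name WikiWordDemo, the Find helper and the sample text are made up for illustration. The Unicode-aware pattern uses the same three alternatives as above, wrapped in a non-capturing group on the assumption that the word boundary is meant to apply to all of them.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class WikiWordDemo
{
    // ASCII-only attempt: knows A-Z and a-z, nothing else.
    public static readonly Regex AsciiOnly =
        new Regex(@"\b[A-Z][a-z]*[A-Z]\w*");

    // Unicode-aware variant built on the \p{Lu}, \p{Ll} and \p{L} categories.
    // The (?: ... ) group is an assumption about the intent: it makes \b
    // apply to all three alternatives instead of only the first one.
    public static readonly Regex UnicodeAware = new Regex(@"
        \b
        (?:
            \p{Lu}\p{Ll}*\p{Lu}\w*   # Upper, lowers, Upper, then word chars
          | \p{L}+\d\w*              # letters followed by a digit
          | \d+\p{L}\w*              # digits followed by a letter
        )
    ", RegexOptions.IgnorePatternWhitespace);

    // Hypothetical helper: returns every match of the regex as a string.
    public static string[] Find(Regex regex, string text) =>
        regex.Matches(text).Cast<Match>().Select(m => m.Value).ToArray();

    public static void Main()
    {
        const string text =
            "See ÉcoleNormale, ŽivotPraha and HomePage, or web2 and 3D.";

        // prints: HomePage
        Console.WriteLine(string.Join(", ", Find(AsciiOnly, text)));

        // prints: ÉcoleNormale, ŽivotPraha, HomePage, web2, 3D
        Console.WriteLine(string.Join(", ", Find(UnicodeAware, text)));
    }
}
```

The ASCII-only pattern skips ÉcoleNormale and ŽivotPraha entirely, because É and Ž fall outside A-Z, while the category-based pattern picks them up without any language-specific configuration.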
Speaking of i18n: the term is flawed. I am from the Netherlands. If I want to support my own language, and no other language, no international boundary is crossed. But I still need an i18n-proof WikiWord engine.