
25 September 2009

Html Agility Pack to the rescue

One of our products is a messaging processor that uses XML internally. Somehow someone messed up and managed to insert invalid characters into the XML, making the message "invalid XML". By the way, "invalid XML" is a contradictio in terminis. I would rather call an "invalid XML file" A Text File That Looks A Lot Like XML But Isn't.

Solving the issue properly would have required updating a shared core component that many applications use, and updating that component was not an option for a bugfix.

We decided to fix the bug in the following major release, but still needed a workaround for the problems in the field.

Enter Html Agility Pack. It is a tolerant library that can read HTML files (text files that look a lot like SGML or XML) and convert them to XHTML. We inserted a processor into the processing chain that XMLifies each message as it passes through, et voilà!
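Html Agility Pack itself is a .NET library, and the post does not show the processor's code. As a rough illustration of the same idea (parse tolerantly, then re-serialize as well-formed XML), here is a minimal Python sketch using only the standard library: `html.parser.HTMLParser` plays the tolerant reader and `xml.etree.ElementTree` writes the output. This is not Html Agility Pack's API. The names `TolerantTreeBuilder`, `xmlify` and the `<root>` wrapper element are hypothetical, and stripping characters that XML 1.0 forbids is an extra assumption of this sketch, not something the post says the library does.

```python
import re
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Characters outside the XML 1.0 "Char" production (e.g. most C0 control codes).
# Removing them is an assumption of this sketch.
_ILLEGAL_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def _clean(text):
    return _ILLEGAL_XML_CHARS.sub("", text)


class TolerantTreeBuilder(HTMLParser):
    """Builds an ElementTree from markup that is not necessarily well-formed."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.root = ET.Element("root")  # hypothetical wrapper element
        self.stack = [self.root]        # currently open elements

    def _original_name(self, tag):
        # HTMLParser lower-cases tag names; recover the original case
        # from the raw text of the start tag just parsed.
        match = re.match(r"<([^\s/>]+)", self.get_starttag_text() or "")
        return match.group(1) if match else tag

    def _add(self, tag, attrs):
        # Valueless attributes (value None) become empty strings.
        attrib = {name: _clean(value or "") for name, value in attrs}
        return ET.SubElement(self.stack[-1], self._original_name(tag), attrib)

    def handle_starttag(self, tag, attrs):
        self.stack.append(self._add(tag, attrs))

    def handle_startendtag(self, tag, attrs):
        self._add(tag, attrs)  # self-closing tag: nothing stays open

    def handle_endtag(self, tag):
        # Close the nearest open element with this name (and everything
        # still open inside it); a stray end tag is simply dropped.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag.lower() == tag:
                del self.stack[i:]
                return

    def handle_data(self, data):
        data = _clean(data)
        parent = self.stack[-1]
        if len(parent):
            # Text after a child element belongs in that child's tail.
            parent[-1].tail = (parent[-1].tail or "") + data
        else:
            parent.text = (parent.text or "") + data


def xmlify(markup):
    """Return a well-formed XML string built from tolerantly parsed markup."""
    builder = TolerantTreeBuilder()
    builder.feed(markup)
    builder.close()
    return ET.tostring(builder.root, encoding="unicode")
```

For example, `xmlify('<Order id=42><Item>AT&T rocks\x01</Order> trailing')` returns `<root><Order id="42"><Item>AT&amp;T rocks</Item></Order> trailing</root>`: the bare ampersand is escaped, the unquoted attribute is quoted, the unclosed `<Item>` is closed and the control character is gone. Comments, processing instructions (including an `<?xml ...?>` declaration) and doctypes are silently dropped, and attribute names come out lower-cased, so a production message processor would need more care than this sketch.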


Ali Shemirani said...

This product has many memory-leak problems; you can test it by loading a document 1000 times! Some properties, like OuterHTML, keep their old value after the DOM tree has changed.

Jan Willem said...

I am not sure I understand your comment.

Ali Shemirani said...

Please try this and look at the result.
Load an HTML document 1000 times and watch the memory. Every time you load the document, memory usage increases and the memory is never freed.
Calling GC.Collect doesn't help either.

Jan Willem said...

You should probably post this issue on the Html Agility Pack issues site: