Validating XML without namespace
I think the flaw here is that HTML is a Chomsky Type 2 grammar (context-free grammar) and regex is a Chomsky Type 3 grammar (regular grammar). Since a Type 2 grammar is fundamentally more complex than a Type 3 grammar (see the Chomsky hierarchy), you can't possibly make this work.

TransformerFactoryImpl"); You may want to read Section 2.4.1, “JAXP” for more details, in particular if you are using Java 1.4 or later.

It allows an XML comparison to be made that ignores differences in the values of text and attribute nodes, for example when comparing a skeleton or outline piece of XML to some generated XML.

Maybe if you give examples of the "(X)HTML syntax errors implemented in real world user agents" you're referring to, I'll understand what you're getting at.

Mihalcin is exactly right. Most extant regex engines are more powerful than Chomsky Type 3 grammars (e.g. non-greedy matching, backrefs).

Obviously we could use a DTD or a schema to validate the message output, but this approach wouldn't allow us to distinguish between valid XML with correct content (e.g.

The failure message indicates both what the difference is and the XPath locations of the nodes that were being compared.
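To make the "skeleton versus generated XML" comparison concrete, here is a minimal sketch in Python using only the standard library. It is not the API of the library described above. The function name `structural_differences` and the sample documents are hypothetical, and the path format only imitates XPath locations. It compares element names, attribute names, and child order, and deliberately ignores text and attribute values.

```python
import xml.etree.ElementTree as ET


def structural_differences(control, test, path=None):
    """Compare two elements by tag names, attribute names, and child order.

    Text and attribute *values* are ignored on purpose. Returns a list of
    messages, each prefixed with an XPath-like location (hypothetical format).
    """
    path = path or "/" + control.tag + "[1]"
    if control.tag != test.tag:
        return [f"{path}: expected element <{control.tag}>, found <{test.tag}>"]

    diffs = []
    # Attribute names must match; attribute values are not inspected.
    if set(control.attrib) != set(test.attrib):
        diffs.append(
            f"{path}: expected attributes {sorted(control.attrib)}, "
            f"found {sorted(test.attrib)}"
        )

    control_children, test_children = list(control), list(test)
    if len(control_children) != len(test_children):
        diffs.append(
            f"{path}: expected {len(control_children)} child elements, "
            f"found {len(test_children)}"
        )

    # Walk children pairwise, numbering same-named siblings for the path.
    seen = {}
    for c_child, t_child in zip(control_children, test_children):
        seen[c_child.tag] = seen.get(c_child.tag, 0) + 1
        child_path = f"{path}/{c_child.tag}[{seen[c_child.tag]}]"
        diffs.extend(structural_differences(c_child, t_child, child_path))
    return diffs


# Hypothetical sample documents: a skeleton and two generated messages.
skeleton = ET.fromstring('<order id=""><item sku=""><qty/></item></order>')
generated = ET.fromstring('<order id="42"><item sku="A1"><qty>3</qty></item></order>')
broken = ET.fromstring('<order id="42"><item sku="A1"><price>3</price></item></order>')

same_shape = structural_differences(skeleton, generated)
wrong_shape = structural_differences(skeleton, broken)
```

With these inputs, `same_shape` is empty because only values differ, while `wrong_shape` holds one message that names the mismatched element and where it sits in the tree.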
Even Jon Skeet cannot parse HTML using regular expressions. HTML and regex go together like love, marriage, and ritual infanticide. The force of regex and HTML together in the same conceptual space will destroy your mind like so much watery putty. If you parse HTML with regex you are giving in to Them and their blasphemous ways which doom us all to inhuman toil for the One whose Name cannot be expressed in the Basic Multilingual Plane, he comes. As I have answered in HTML-and-regex questions here so many times before, the use of regex will not allow you to consume HTML. Regex is not a tool that can be used to correctly parse HTML.
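For anyone who wants to see the failure rather than take it on faith, here is a small Python sketch of one way it goes wrong. The sample markup and the `DepthTracker` class are made up for illustration. A non-greedy pattern stops at the first closing tag it meets, so nested elements get cut short. A real parser (here the standard library's `html.parser`) lets you keep a stack and follow the nesting.

```python
import re
from html.parser import HTMLParser

# Hypothetical sample: one <div> nested inside another.
NESTED = "<div>outer <div>inner</div> tail</div>"

# The non-greedy match ends at the FIRST </div>, which belongs to the inner
# element, so the captured "content" of the outer <div> is truncated.
regex_match = re.search(r"<div>(.*?)</div>", NESTED).group(1)


class DepthTracker(HTMLParser):
    """Follows nesting with an explicit stack of open tags."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.max_depth = 0
        self.text_by_depth = []

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)
        self.max_depth = max(self.max_depth, len(self.stack))

    def handle_endtag(self, tag):
        # Pop only when the closing tag matches the innermost open tag.
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()

    def handle_data(self, data):
        # Record each text run together with how deeply nested it is.
        self.text_by_depth.append((len(self.stack), data))


tracker = DepthTracker()
tracker.feed(NESTED)
tracker.close()
```

Here `regex_match` comes out as `outer <div>inner`, which is not the content of either element, while `tracker.text_by_depth` attributes each piece of text to the right nesting level. This only shows one failure mode. It is not a proof, and real-world HTML has plenty of other ways to break a pattern.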