Tuesday, August 01, 2006

XMLSearch, XPath, and XML namespaces in ColdFusion

Sample code used in this posting is ColdFusion-specific, but the XPath syntax will likely apply to many XPath implementations in other languages.
I was looking for a specific set of elements in a SOAP response and kept getting an empty array returned from XMLSearch. It turned out this was related to how the namespaces in the response were defined and assigned to certain elements (see this Talking Tree posting).
The information in the Talking Tree posting helped me identify my problem, but in my case, it wasn't about noname namespaces. In fact, what if you want to find all elements of a given name and don't care about the namespace or even where the element lies in the hierarchy? In the example below, I want to quickly retrieve all of the Response elements. Note that one Response element is a child of Other, while the remaining two are direct children of ResponseList.

<ResponseList xmlns="urn:shama.lama.dingdong.net">
<Response>
<ns1:success xmlns:ns1="urn:core.shama.lama.dingdong.net">true</ns1:success>
<baseRef internalId="1234" xmlns:ns2="urn:core.shama.lama.dingdong.net"/>
</Response>
<Response>
<ns3:status xmlns:ns3="urn:core.shama.lama.dingdong.net">
<ns3:success>false</ns3:success>
<ns3:statusDetail type="ERROR">
<ns3:code>USER_ERROR</ns3:code>
<ns3:message>That record does not exist.</ns3:message>
</ns3:statusDetail>
</ns3:status>
<baseRef internalId="4421" xmlns:ns4="urn:core.shama.lama.dingdong.net"/>
</Response>
<Other>
<Response>
<ns3:status xmlns:ns3="urn:core.shama.lama.dingdong.net">
<ns3:success>false</ns3:success>
<ns3:statusDetail type="ERROR">
<ns3:code>RECORDNOTFOUND_ERROR</ns3:code>
<ns3:message>That record does not exist.</ns3:message>
</ns3:statusDetail>
</ns3:status>
</Response>
<warning>Import timed out briefly and process was restarted. No further errors were reported.</warning>
</Other>
</ResponseList>

Now if it weren't for the namespaces, I could simply use the following:
<cfset MyArray = XMLSearch(MyXMLDoc, "//Response")> 

...but in this case that would return an empty array.

Ok, but what about the noname namespace syntax:
<cfset MyArray = XMLSearch(MyXMLDoc, "//:Response")> 

That's fine if there's an actual noname namespace assigned to that element, but in this case there isn't one assigned.

After trying countless variations of XPath syntax, I still couldn't get anything other than a blank array returned. I became desperate and quickly wrote a function that strips all of the namespace definitions and labels out of the XML, then searched on the results of that. But I felt this was rather convoluted and added too much overhead. Surely there was a better way. I kept searching, and lo and behold, came across the local-name function. If you want to ignore namespaces and hierarchical context completely, you can search by the local name of the element:
<cfset MyArray = XMLSearch(MyXMLDoc, "//*[local-name()='Response']")>

And now you have your array containing the three Response elements.
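
For anyone who wants to try this end to end, here is a minimal sketch. The XML is a trimmed-down, well-formed stand-in for the response above, and the variable names SampleXML and StrictArray are my own inventions for illustration. The second search is an assumption on my part about what you might want next: it still matches on local name, but also requires the element's namespace URI via the standard XPath namespace-uri() function, so a same-named element from an unrelated namespace would not be picked up.

```cfml
<!--- Trimmed-down, well-formed stand-in for the SOAP response (illustration only) --->
<cfsavecontent variable="SampleXML">
<ResponseList xmlns="urn:shama.lama.dingdong.net">
  <Response>
    <ns1:success xmlns:ns1="urn:core.shama.lama.dingdong.net">true</ns1:success>
  </Response>
  <Other>
    <Response>
      <ns3:success xmlns:ns3="urn:core.shama.lama.dingdong.net">false</ns3:success>
    </Response>
  </Other>
</ResponseList>
</cfsavecontent>

<!--- Trim so the parser does not see leading whitespace before the root element --->
<cfset MyXMLDoc = XMLParse(Trim(SampleXML))>

<!--- Match on local name only, ignoring namespace and depth in the tree --->
<cfset MyArray = XMLSearch(MyXMLDoc, "//*[local-name()='Response']")>

<!--- Stricter variant: same local name, but the namespace URI must match too --->
<cfset StrictArray = XMLSearch(MyXMLDoc, "//*[local-name()='Response' and namespace-uri()='urn:shama.lama.dingdong.net']")>

<cfoutput>
Loose match: #ArrayLen(MyArray)# element(s)<br>
Strict match: #ArrayLen(StrictArray)# element(s)
</cfoutput>
```

The loose form is handy when you truly don't care which namespace an element belongs to. The strict form trades a longer expression for some protection against unrelated elements that happen to share the name.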

March 11, 2008 UPDATE: Ryan commented that he had tried the syntax below with similar success. I have not tried this myself, but give it a shot:
<cfset MyArray = XMLSearch(MyXMLDoc, "//*:Response")>

11 comments:

Anonymous said...

Excellent solution.

Colin said...

Hey,

I too came to this solution after a lot of head scratching, but now that my source XML is getting quite large I'm finding XmlSearch really slow. Have you experienced any performance problems?

Jeremy said...

Colin, actually I have run into that. Perhaps there are XML parsing performance improvements between MX 6 and 7 (we only just started to use 7 in production in the last few months where I work), but my experience with extremely large xml docs in 6 was similar to what you describe. Unfortunately I'm not aware of an elegant solution. I'm sure doing searches for explicit paths (as opposed to wildcard searches) would help quite a bit, but sometimes that can hurt the extensibility of the code.

Anonymous said...

Great stuff - i also thought namespace stripping was the answer, but you've opened my eyes to a more elegant solution. Thank you!

Ryan said...

We also had this problem, we are on CF8 now, so it may be a little different, but the cleanest syntax we found was a namespace of '*'.

<cfset MyArray = XMLSearch(MyXMLDoc, "//*:Response")>

Jeremy Q. Afterglide said...

Thanks, Ryan! We haven't moved to 8 yet, but I will definitely keep that in mind.

Rob said...

Just in case someone else comes across this ... the solution above did not work for me in CF8. (I did not try it in any other version.) Instead, I had to specify no namespace identifier.

For example, getting Google Picasa XML and performing an XPath search for "entry": "/:feed/:entry".

To get the "media" namespace and the group element (media:group), use "/:feed/:entry/media:group"

Andrey said...

Thanks. It saved me time. Andrey

marchinram said...

Lifesaver, thanks man!

Amir said...

Namespace issue caught me out today, thanks for the solution, worked great!

sam said...

thanks for the tip NS issue was boggling my mind all morning