How to use SimpleXML to parse XML data with namespaces and CDATA

Many social networking sites (e.g. YouTube) and big-brand websites (e.g. Amazon, Yahoo) offer REST, SOAP, or even RSS web services that give developers access to enormous amounts of data. Using this data, we can build a website with virtually no database of our own; all the information comes from these big sites. The nice thing is that virtually all of the data returned for a web service client request is an XML data stream. So we can use SimpleXML, which is built into PHP 5, to parse the XML stream and format the data according to our own website's needs. However, the XML format varies from site to site, and some of the XML returned from the server includes namespaces and CDATA. In that case, you need a way to access a sub-item in the XML data using the same "->" syntax.

In the case of CDATA, simplexml_load_string() takes optional arguments that let you merge CDATA sections into text nodes. Pass two additional arguments: the class name 'SimpleXMLElement' as a string, and the constant LIBXML_NOCDATA (PHP 5.1.0 or later is required):

$xml = simplexml_load_string($xml_txt, 'SimpleXMLElement', LIBXML_NOCDATA);

If $xml_txt is:

<root>
<folder ID="65" active="1" permission="1"><![CDATA[aaaa]]></folder>
<folder ID="65" active="1" permission="1"><![CDATA[bbbb]]></folder>
</root>

Then echo $xml->folder[0]; will output:

aaaa
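
The CDATA steps above can be put together as a small script. This is only a minimal sketch: it reuses the sample XML from this post, and the $names array and the error check are my own additions for illustration.

```php
<?php
// Sample XML from above, held in a nowdoc string (needs PHP 5.3+).
$xml_txt = <<<'XML'
<root>
<folder ID="65" active="1" permission="1"><![CDATA[aaaa]]></folder>
<folder ID="65" active="1" permission="1"><![CDATA[bbbb]]></folder>
</root>
XML;

// LIBXML_NOCDATA asks libxml to merge CDATA sections into plain text nodes.
$xml = simplexml_load_string($xml_txt, 'SimpleXMLElement', LIBXML_NOCDATA);
if ($xml === false) {
    die('Could not parse the XML');
}

$names = array(); // hypothetical helper array, not part of the original post
foreach ($xml->folder as $folder) {
    // Cast to string to get the text content; attributes use array syntax.
    $names[] = (string) $folder;
    echo $folder['ID'], ': ', $folder, "\n";
}
```

Casting each element to a string before storing it avoids keeping SimpleXMLElement objects around, which can be surprising when the values are later compared or serialized.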

If the XML data includes namespaces, you can use the SimpleXMLElement->children() method to get the nodes in a given namespace, for example media: just pass the namespace URI as an argument to the method. For example, a typical XML response from YouTube might look like the following, where the request URL is $feedURL:

<?xml version='1.0' encoding='UTF-8'?>
<feed xmlns='http://www.w3.org/2005/Atom'
xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/'
xmlns:gml='http://www.opengis.net/gml' xmlns:georss='http://www.georss.org/georss'
xmlns:media='http://search.yahoo.com/mrss/'
xmlns:yt='http://gdata.youtube.com/schemas/2007'
xmlns:gd='http://schemas.google.com/g/2005'>
<id>http://gdata.youtube.com/feeds/api/standardfeeds/most_viewed</id>
<updated>2008-03-06T14:43:27.000-08:00</updated>
<category scheme='http://schemas.google.com/g/2005#kind'
term='http://gdata.youtube.com/schemas/2007#video'/>
<title type='text'>Most Viewed</title>
<logo>http://www.youtube.com/img/pic_youtubelogo_123x63.gif</logo>
<link rel='alternate' type='text/html' href='http://www.youtube.com/browse?s=mp'/>
<link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml'
href='http://gdata.youtube.com/feeds/api/standardfeeds/most_viewed'/>
<link rel='self' type='application/atom+xml'
href='http://gdata.youtube.com/feeds/api/standardfeeds/most_viewed?start-index=1&amp;max-results=5'/>
<link rel='next' type='application/atom+xml'
href='http://gdata.youtube.com/feeds/api/standardfeeds/most_viewed?start-index=6&amp;max-results=5'/>
<author>
<name>YouTube</name>
<uri>http://www.youtube.com/</uri>
</author>
<generator version='beta' uri='http://gdata.youtube.com/'>
YouTube data API</generator>
<openSearch:totalResults>94</openSearch:totalResults>
<openSearch:startIndex>1</openSearch:startIndex>
<openSearch:itemsPerPage>5</openSearch:itemsPerPage>
<entry>
<id>http://gdata.youtube.com/feeds/api/videos/dMH0bHeiRNg</id>
<published>2006-04-06T14:30:53.000-07:00</published>
<updated>2008-03-12T00:22:25.000-07:00</updated>
<category scheme='http://gdata.youtube.com/schemas/2007/keywords.cat'
term='Dancing'/>
<category scheme='http://schemas.google.com/g/2005#kind'
term='http://gdata.youtube.com/schemas/2007#video'/>
<category scheme='http://gdata.youtube.com/schemas/2007/keywords.cat'
term='comedy'/>
<category scheme='http://gdata.youtube.com/schemas/2007/categories.cat'
term='Comedy' label='Comedy'/>
<title type='text'>Evolution of Dance</title>
<content type='text'>The funniest 6 minutes you will ever see!
Remember how many of these you have done!
Judson Laipply is dancing -
http://www.evolutionofdance.com -
for more info including song list!</content>
<link rel='alternate' type='text/html'
href='http://www.youtube.com/watch?v=dMH0bHeiRNg'/>
<link rel='http://gdata.youtube.com/schemas/2007#video.responses'
type='application/atom+xml'
href='http://gdata.youtube.com/feeds/api/videos/dMH0bHeiRNg/responses'/>
<link rel='http://gdata.youtube.com/schemas/2007#video.related'
type='application/atom+xml'
href='http://gdata.youtube.com/feeds/api/videos/dMH0bHeiRNg/related'/>
<link rel='self' type='application/atom+xml'
href='http://gdata.youtube.com/feeds/api/standardfeeds/most_viewed/dMH0bHeiRNg'/>
<author>
<name>judsonlaipply</name>
<uri>http://gdata.youtube.com/feeds/api/users/judsonlaipply</uri>
</author>
<media:group>
<media:title type='plain'>Evolution of Dance</media:title>
<media:description type='plain'>The funniest 6 minutes you will ever see!
Remember how many of these you have done!
Judson Laipply is dancing -
http://www.evolutionofdance.com -
for more info including song list!</media:description>
<media:keywords>comedy, Dancing</media:keywords>
<yt:duration seconds='360'/>
<media:category label='Comedy'
scheme='http://gdata.youtube.com/schemas/2007/categories.cat'>Comedy
</media:category>
<media:content
url='http://www.youtube.com/v/dMH0bHeiRNg' type='application/x-shockwave-flash'
medium='video' isDefault='true' expression='full' duration='360' yt:format='5'/>
<media:content
url='rtsp://rtsp2.youtube.com/ChoLENy73wIaEQnYRKJ3bPTBdBMYDSANFEgGDA==/0/0/0/video.3gp'
type='video/3gpp' medium='video' expression='full' duration='360'
yt:format='1'/>
<media:content
url='rtsp://rtsp2.youtube.com/ChoLENy73wIaEQnYRKJ3bPTBdBMYESARFEgGDA==/0/0/0/video.3gp'
type='video/3gpp' medium='video' expression='full' duration='360'
yt:format='6'/>
<media:player url='http://www.youtube.com/watch?v=dMH0bHeiRNg'/>
<media:thumbnail
url='http://img.youtube.com/vi/dMH0bHeiRNg/2.jpg' height='97' width='130'
time='00:03:00'/>
<media:thumbnail
url='http://img.youtube.com/vi/dMH0bHeiRNg/1.jpg' height='97' width='130'
time='00:01:30'/>
<media:thumbnail
url='http://img.youtube.com/vi/dMH0bHeiRNg/3.jpg' height='97' width='130'
time='00:04:30'/>
<media:thumbnail
url='http://img.youtube.com/vi/dMH0bHeiRNg/0.jpg' height='240' width='320'
time='00:03:00'/>
</media:group>
<yt:statistics viewCount='78060679' favoriteCount='400468'/>
<gd:rating min='1' max='5' numRaters='276123' average='4.65'/>
<gd:comments>
<gd:feedLink href='http://gdata.youtube.com/feeds/api/videos/dMH0bHeiRNg/comments'
countHint='124130'/>
</gd:comments>
</entry>
<entry>
<id>http://gdata.youtube.com/feeds/api/videos/cQ25-glGRzI</id>
<published>2007-02-27T15:08:01.000-08:00</published>
<updated>2008-03-12T00:46:25.000-07:00</updated>
...
</entry>
</feed>

In the feed above, the attribute xmlns:media='http://search.yahoo.com/mrss/' gives the URI that the media prefix stands for. So after doing the following:

$sxml = simplexml_load_file($feedURL);
foreach ($sxml->entry as $entry) {
$media = $entry->children('http://search.yahoo.com/mrss/');
...

then you can access the first thumbnail URL in the media group like this:

$attrs = $media->group->thumbnail[0]->attributes();
$img = $attrs['url'];
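
To try the namespace lookup without calling YouTube, here is a self-contained sketch. It assumes a trimmed-down copy of the feed above held in a string (so it uses simplexml_load_string() instead of simplexml_load_file($feedURL)); the variable names $mediaNs, $ytNs and $results are made up for illustration.

```php
<?php
// Trimmed-down stand-in for the YouTube feed shown above (one entry only).
$feed = <<<'XML'
<feed xmlns='http://www.w3.org/2005/Atom'
      xmlns:media='http://search.yahoo.com/mrss/'
      xmlns:yt='http://gdata.youtube.com/schemas/2007'>
  <entry>
    <title type='text'>Evolution of Dance</title>
    <media:group>
      <media:title type='plain'>Evolution of Dance</media:title>
      <yt:duration seconds='360'/>
      <media:thumbnail url='http://img.youtube.com/vi/dMH0bHeiRNg/2.jpg' height='97' width='130'/>
      <media:thumbnail url='http://img.youtube.com/vi/dMH0bHeiRNg/1.jpg' height='97' width='130'/>
    </media:group>
    <yt:statistics viewCount='78060679' favoriteCount='400468'/>
  </entry>
</feed>
XML;

// Namespace URIs copied from the xmlns:media and xmlns:yt declarations.
$mediaNs = 'http://search.yahoo.com/mrss/';
$ytNs    = 'http://gdata.youtube.com/schemas/2007';

$sxml = simplexml_load_string($feed);
$results = array();

foreach ($sxml->entry as $entry) {
    // Children of <entry> that live in the media and yt namespaces.
    $media = $entry->children($mediaNs);
    $yt    = $entry->children($ytNs);

    // The attributes themselves have no prefix, so attributes() needs no argument.
    $thumb = $media->group->thumbnail[0]->attributes();
    // <yt:duration> sits inside <media:group>, so switch namespace there.
    $duration = $media->group->children($ytNs)->duration->attributes();
    $stats = $yt->statistics->attributes();

    $results[] = array(
        'title'     => (string) $media->group->title,
        'thumbnail' => (string) $thumb['url'],
        'seconds'   => (int) $duration['seconds'],
        'views'     => (int) $stats['viewCount'],
    );
}

print_r($results);
```

children() can also take a prefix instead of a URI when its second argument is true, e.g. $entry->children('media', true), but the URI form keeps working even if the feed later binds the namespace to a different prefix.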



Post Info

This entry was posted on 19 April 2008 at 8:36 PM and is filed under php.
