How to use SimpleXML to parse XML data with namespaces and CDATA

Many social network websites (e.g. YouTube) and big brand websites (e.g. Amazon, Yahoo) use REST, SOAP, or even RSS web services to expose enormous amounts of data to developers. Using this data, we can build a website with virtually no database of our own; all the information comes from these big sites. The nice thing is that virtually all of the data returned for a web service client request is an XML data stream. So we can use PHP 5's built-in SimpleXML extension to parse the XML and format the data according to our own website's needs. However, the XML format varies from site to site, and some of the XML returned from the server includes namespaces and CDATA. In that case, you need a way to access a sub-item in the XML data using the same "->" syntax.

In the case of CDATA, simplexml_load_string() takes optional arguments that let you merge CDATA sections into text nodes. Just pass two additional arguments, the class name string 'SimpleXMLElement' and the constant LIBXML_NOCDATA (PHP >= 5.1.0 is required):

$xml = simplexml_load_string($xml_txt, 'SimpleXMLElement', LIBXML_NOCDATA);

If $xml_txt is:

<root>
<folder ID="65" active="1" permission="1"><![CDATA[aaaa]]></folder>
<folder ID="65" active="1" permission="1"><![CDATA[bbbb]]></folder>
</root>

then echo $xml->folder[0] will print:

aaaa
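
As a minimal sketch of the CDATA case, the snippet below parses the same sample document and collects the text of every folder element. The variable names $names and $firstId are only illustrative and do not come from any particular API.

```php
<?php
// Sample document from the text above.
$xml_txt = <<<'XML'
<root>
<folder ID="65" active="1" permission="1"><![CDATA[aaaa]]></folder>
<folder ID="65" active="1" permission="1"><![CDATA[bbbb]]></folder>
</root>
XML;

// LIBXML_NOCDATA merges CDATA sections into ordinary text nodes.
$xml = simplexml_load_string($xml_txt, 'SimpleXMLElement', LIBXML_NOCDATA);
if ($xml === false) {
    throw new RuntimeException('Could not parse the XML string');
}

// Collect the text of every <folder> element, in document order.
$names = [];
foreach ($xml->folder as $folder) {
    $names[] = (string) $folder;
}

// Attributes are read with array syntax on the element.
$firstId = (string) $xml->folder[0]['ID'];

echo implode(', ', $names), "\n";
echo $firstId, "\n";
```

As far as I know, casting a single element to a string returns its CDATA text even without the flag; LIBXML_NOCDATA matters mostly when the whole tree is dumped or converted, for example with print_r() or json_encode().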

When the XML data includes namespaces, you can use the SimpleXMLElement->children() method to get the nodes in a given namespace, such as media; just pass the namespace URI as an argument to the method. For example, a typical XML response from YouTube may look like the following (say the request URL is $feedURL):

<?xml version='1.0' encoding='UTF-8'?>
<feed xmlns='http://www.w3.org/2005/Atom'
xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/'
xmlns:gml='http://www.opengis.net/gml' xmlns:georss='http://www.georss.org/georss'
xmlns:media='http://search.yahoo.com/mrss/'
xmlns:yt='http://gdata.youtube.com/schemas/2007'
xmlns:gd='http://schemas.google.com/g/2005'>
<id>http://gdata.youtube.com/feeds/api/standardfeeds/most_viewed</id>
<updated>2008-03-06T14:43:27.000-08:00</updated>
<category scheme='http://schemas.google.com/g/2005#kind'
term='http://gdata.youtube.com/schemas/2007#video'/>
<title type='text'>Most Viewed</title>
<logo>http://www.youtube.com/img/pic_youtubelogo_123x63.gif</logo>
<link rel='alternate' type='text/html' href='http://www.youtube.com/browse?s=mp'/>
<link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml'
href='http://gdata.youtube.com/feeds/api/standardfeeds/most_viewed'/>
<link rel='self' type='application/atom+xml'
href='http://gdata.youtube.com/feeds/api/standardfeeds/most_viewed?start-index=1&amp;max-results=5'/>
<link rel='next' type='application/atom+xml'
href='http://gdata.youtube.com/feeds/api/standardfeeds/most_viewed?start-index=6&amp;max-results=5'/>
<author>
<name>YouTube</name>
<uri>http://www.youtube.com/</uri>
</author>
<generator version='beta' uri='http://gdata.youtube.com/'>
YouTube data API</generator>
<openSearch:totalResults>94</openSearch:totalResults>
<openSearch:startIndex>1</openSearch:startIndex>
<openSearch:itemsPerPage>5</openSearch:itemsPerPage>
<entry>
<id>http://gdata.youtube.com/feeds/api/videos/dMH0bHeiRNg</id>
<published>2006-04-06T14:30:53.000-07:00</published>
<updated>2008-03-12T00:22:25.000-07:00</updated>
<category scheme='http://gdata.youtube.com/schemas/2007/keywords.cat'
term='Dancing'/>
<category scheme='http://schemas.google.com/g/2005#kind'
term='http://gdata.youtube.com/schemas/2007#video'/>
<category scheme='http://gdata.youtube.com/schemas/2007/keywords.cat'
term='comedy'/>
<category scheme='http://gdata.youtube.com/schemas/2007/categories.cat'
term='Comedy' label='Comedy'/>
<title type='text'>Evolution of Dance</title>
<content type='text'>The funniest 6 minutes you will ever see!
Remember how many of these you have done!
Judson Laipply is dancing -
http://www.evolutionofdance.com -
for more info including song list!</content>
<link rel='alternate' type='text/html'
href='http://www.youtube.com/watch?v=dMH0bHeiRNg'/>
<link rel='http://gdata.youtube.com/schemas/2007#video.responses'
type='application/atom+xml'
href='http://gdata.youtube.com/feeds/api/videos/dMH0bHeiRNg/responses'/>
<link rel='http://gdata.youtube.com/schemas/2007#video.related'
type='application/atom+xml'
href='http://gdata.youtube.com/feeds/api/videos/dMH0bHeiRNg/related'/>
<link rel='self' type='application/atom+xml'
href='http://gdata.youtube.com/feeds/api/standardfeeds/most_viewed/dMH0bHeiRNg'/>
<author>
<name>judsonlaipply</name>
<uri>http://gdata.youtube.com/feeds/api/users/judsonlaipply</uri>
</author>
<media:group>
<media:title type='plain'>Evolution of Dance</media:title>
<media:description type='plain'>The funniest 6 minutes you will ever see!
Remember how many of these you have done!
Judson Laipply is dancing -
http://www.evolutionofdance.com -
for more info including song list!</media:description>
<media:keywords>comedy, Dancing</media:keywords>
<yt:duration seconds='360'/>
<media:category label='Comedy'
scheme='http://gdata.youtube.com/schemas/2007/categories.cat'>Comedy
</media:category>
<media:content
url='http://www.youtube.com/v/dMH0bHeiRNg' type='application/x-shockwave-flash'
medium='video' isDefault='true' expression='full' duration='360' yt:format='5'/>
<media:content
url='rtsp://rtsp2.youtube.com/ChoLENy73wIaEQnYRKJ3bPTBdBMYDSANFEgGDA==/0/0/0/video.3gp'
type='video/3gpp' medium='video' expression='full' duration='360'
yt:format='1'/>
<media:content
url='rtsp://rtsp2.youtube.com/ChoLENy73wIaEQnYRKJ3bPTBdBMYESARFEgGDA==/0/0/0/video.3gp'
type='video/3gpp' medium='video' expression='full' duration='360'
yt:format='6'/>
<media:player url='http://www.youtube.com/watch?v=dMH0bHeiRNg'/>
<media:thumbnail
url='http://img.youtube.com/vi/dMH0bHeiRNg/2.jpg' height='97' width='130'
time='00:03:00'/>
<media:thumbnail
url='http://img.youtube.com/vi/dMH0bHeiRNg/1.jpg' height='97' width='130'
time='00:01:30'/>
<media:thumbnail
url='http://img.youtube.com/vi/dMH0bHeiRNg/3.jpg' height='97' width='130'
time='00:04:30'/>
<media:thumbnail
url='http://img.youtube.com/vi/dMH0bHeiRNg/0.jpg' height='240' width='320'
time='00:03:00'/>
</media:group>
<yt:statistics viewCount='78060679' favoriteCount='400468'/>
<gd:rating min='1' max='5' numRaters='276123' average='4.65'/>
<gd:comments>
<gd:feedLink href='http://gdata.youtube.com/feeds/api/videos/dMH0bHeiRNg/comments'
countHint='124130'/>
</gd:comments>
</entry>
<entry>
<id>http://gdata.youtube.com/feeds/api/videos/cQ25-glGRzI</id>
<published>2007-02-27T15:08:01.000-08:00</published>
<updated>2008-03-12T00:46:25.000-07:00</updated>
...
</entry>
</feed>

Above, xmlns:media='http://search.yahoo.com/mrss/' tells you the URI of the media namespace. So, after doing the following:

$sxml = simplexml_load_file($feedURL);
foreach ($sxml->entry as $entry) {
$media = $entry->children('http://search.yahoo.com/mrss/');
...

you can access the first thumbnail URL in the media group like this:

$attrs = $media->group->thumbnail[0]->attributes();
$img = $attrs['url'];
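
To try the namespace steps without calling the live service, here is a small self-contained sketch. It assumes a trimmed-down copy of the feed above (one entry, two thumbnails) held in a string; the names $feed, $mediaNs, and $thumbnails are made up for this illustration.

```php
<?php
// Trimmed-down feed: one entry, two thumbnails, same namespaces as above.
$feed = <<<'XML'
<feed xmlns='http://www.w3.org/2005/Atom'
      xmlns:media='http://search.yahoo.com/mrss/'>
  <entry>
    <title type='text'>Evolution of Dance</title>
    <media:group>
      <media:title type='plain'>Evolution of Dance</media:title>
      <media:thumbnail url='http://img.youtube.com/vi/dMH0bHeiRNg/2.jpg' height='97' width='130'/>
      <media:thumbnail url='http://img.youtube.com/vi/dMH0bHeiRNg/1.jpg' height='97' width='130'/>
    </media:group>
  </entry>
</feed>
XML;

// Namespace URI taken from the xmlns:media declaration on <feed>.
$mediaNs = 'http://search.yahoo.com/mrss/';

$sxml = simplexml_load_string($feed);
if ($sxml === false) {
    throw new RuntimeException('Could not parse the feed');
}

$thumbnails = [];
foreach ($sxml->entry as $entry) {
    // Unprefixed (Atom) elements are reachable with plain ->name access.
    $title = (string) $entry->title;

    // Switch to the media namespace to reach <media:group> and its children.
    $media = $entry->children($mediaNs);

    // The url attribute has no prefix, so attributes() is called without arguments.
    $attrs = $media->group->thumbnail[0]->attributes();

    // Cast to string; otherwise the value stays a SimpleXMLElement object.
    $thumbnails[$title] = (string) $attrs['url'];
}

foreach ($thumbnails as $title => $url) {
    echo $title, ': ', $url, "\n";
}
```

children() also accepts a prefix instead of the URI when its second argument is true, e.g. $entry->children('media', true), but passing the URI keeps working if the feed ever changes its prefix.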

6 Comments
  1. jeroen says:

    You have built a good website.

  2. clonks says:

    Thanks! GREAT!

  3. jcvangent says:

    To speed things up (since I only want the description), if I do:
    $entry->children('http://search.yahoo.com/mrss/')->group->description;

    it should give me the contents of the description, right? Somehow it ends up being an empty string instead of the description. What am I doing wrong here? Some help would be greatly appreciated ^_^

  4. jcvangent says:

    Even when doing it like the example above, it gives me an empty string:

    $media = $item->children('http://search.yahoo.com/mrss/');

    $videoDescription = $media->group->description;

    echo "videoDescription: $videoDescription\n";

    Any idea what could be going wrong? Thanks again ^_^

  5. David Adam says:

    Are you sure the live feed you got is in the correct format? Here is the code I ran on the above example:
    $feedURL = 'testfeed.xml';
    $sxml = simplexml_load_file($feedURL);
    foreach($sxml->entry as $entry){
    $media = $entry->children('http://search.yahoo.com/mrss/');

    $mediaDescription = $media->group->description;
    echo $mediaDescription;

    }

    And, I get:
    The funniest 6 minutes you will ever see! Remember how many of these you have done!Judson Laipply is dancing – http://www.evolutionofdance.com – for more info including song list!

    So, for me, it seems fine.

    P.S. Remember to clean up the format of testfeed.xml, because I just copied the content from the post above.

  6. jcvangent says:

    What is the URL you are using to get the XML feed? Currently I’m using 'http://gdata.youtube.com/feeds/base/videos/-/dance', for example, as a feed URL.
    If I look at the source (in Firefox), it gives me very nasty and ugly code back instead of normal clean XML. So maybe that is the problem: I’m not using the correct search URL to search for videos, which results in an incorrect XML file and causes that part of the code not to work….
