How to use SimpleXML to parse XML data with namespaces and CDATA
Many social network sites (e.g. YouTube) and big brand sites (e.g. Amazon, Yahoo) offer REST, SOAP, or even RSS web services that expose enormous amounts of data to developers. With this data we can build a website with virtually no database of our own; all the information comes from these big sites. The nice thing is that virtually all of the data returned to your web service client is an XML data stream, so we can use PHP 5's built-in SimpleXML extension to parse it and format the data to suit our own website. However, the XML format varies from site to site, and some of the XML returned from the server includes namespaces and CDATA. In that case, you need a way to reach a sub-item in the XML while still using the same "->" syntax.
For CDATA, the simplexml_load_string() function has optional parameters that let you merge CDATA sections into text nodes. Just pass two additional arguments: the class name string 'SimpleXMLElement' and the constant LIBXML_NOCDATA (PHP >= 5.1.0 is required):
$xml = simplexml_load_string($xml_txt, 'SimpleXMLElement', LIBXML_NOCDATA);
If $xml_txt is:
<root>
<folder ID="65" active="1" permission="1"><![CDATA[aaaa]]></folder>
<folder ID="65" active="1" permission="1"><![CDATA[bbbb]]></folder>
</root>
then echo $xml->folder[0]; prints:
aaaa
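To make the CDATA case easy to try, here is a minimal, self-contained sketch built from the sample document above. The variable names $names and $firstId are only illustrative, and the error handling is an assumption about how you might want to react to a parse failure.

```php
<?php
// Sample document taken from the example above.
$xml_txt = <<<XML
<root>
<folder ID="65" active="1" permission="1"><![CDATA[aaaa]]></folder>
<folder ID="65" active="1" permission="1"><![CDATA[bbbb]]></folder>
</root>
XML;

// LIBXML_NOCDATA asks libxml to merge CDATA sections into plain text nodes.
$xml = simplexml_load_string($xml_txt, 'SimpleXMLElement', LIBXML_NOCDATA);
if ($xml === false) {
    throw new RuntimeException('Could not parse the XML');
}

// Cast each <folder> element to a string to get its text content.
$names = array();
foreach ($xml->folder as $folder) {
    $names[] = (string) $folder;
}

// Prints "aaaa" and "bbbb" on separate lines.
echo implode("\n", $names), "\n";

// Attributes are read with array syntax and also need a cast.
$firstId = (string) $xml->folder[0]['ID']; // "65"
```

As far as I can tell, casting an element to a string returns the CDATA content even without the flag; LIBXML_NOCDATA seems to matter mostly when you dump the object with print_r() or convert it with json_encode().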
If the XML data includes namespaces, you can use the SimpleXMLElement->children() method to get the nodes in a given namespace, such as media; just pass the namespace URI as the argument to the method. For example, a typical XML response from YouTube might look like the following, where the request URL is $feedURL.
<?xml version='1.0' encoding='UTF-8'?>
<feed xmlns='http://www.w3.org/2005/Atom'
xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/'
xmlns:gml='http://www.opengis.net/gml' xmlns:georss='http://www.georss.org/georss'
xmlns:media='http://search.yahoo.com/mrss/'
xmlns:yt='http://gdata.youtube.com/schemas/2007'
xmlns:gd='http://schemas.google.com/g/2005'>
<id>http://gdata.youtube.com/feeds/api/standardfeeds/most_viewed</id>
<updated>2008-03-06T14:43:27.000-08:00</updated>
<category scheme='http://schemas.google.com/g/2005#kind'
term='http://gdata.youtube.com/schemas/2007#video'/>
<title type='text'>Most Viewed</title>
<logo>http://www.youtube.com/img/pic_youtubelogo_123x63.gif</logo>
<link rel='alternate' type='text/html' href='http://www.youtube.com/browse?s=mp'/>
<link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml'
href='http://gdata.youtube.com/feeds/api/standardfeeds/most_viewed'/>
<link rel='self' type='application/atom+xml'
href='http://gdata.youtube.com/feeds/api/standardfeeds/most_viewed
?start-index=1&amp;max-results=5'/>
<link rel='next' type='application/atom+xml'
href='http://gdata.youtube.com/feeds/api/standardfeeds/most_viewed
?start-index=6&amp;max-results=5'/>
<author>
<name>YouTube</name>
<uri>http://www.youtube.com/</uri>
</author>
<generator version='beta' uri='http://gdata.youtube.com/'>
YouTube data API</generator>
<openSearch:totalResults>94</openSearch:totalResults>
<openSearch:startIndex>1</openSearch:startIndex>
<openSearch:itemsPerPage>5</openSearch:itemsPerPage>
<entry>
<id>http://gdata.youtube.com/feeds/api/videos/dMH0bHeiRNg</id>
<published>2006-04-06T14:30:53.000-07:00</published>
<updated>2008-03-12T00:22:25.000-07:00</updated>
<category scheme='http://gdata.youtube.com/schemas/2007/keywords.cat'
term='Dancing'/>
<category scheme='http://schemas.google.com/g/2005#kind'
term='http://gdata.youtube.com/schemas/2007#video'/>
<category scheme='http://gdata.youtube.com/schemas/2007/keywords.cat'
term='comedy'/>
<category scheme='http://gdata.youtube.com/schemas/2007/categories.cat'
term='Comedy' label='Comedy'/>
<title type='text'>Evolution of Dance</title>
<content type='text'>The funniest 6 minutes you will ever see!
Remember how many of these you have done!
Judson Laipply is dancing -
http://www.evolutionofdance.com -
for more info including song list!</content>
<link rel='alternate' type='text/html'
href='http://www.youtube.com/watch?v=dMH0bHeiRNg'/>
<link rel='http://gdata.youtube.com/schemas/2007#video.responses'
type='application/atom+xml'
href='http://gdata.youtube.com/feeds/api/videos/dMH0bHeiRNg/responses'/>
<link rel='http://gdata.youtube.com/schemas/2007#video.related'
type='application/atom+xml'
href='http://gdata.youtube.com/feeds/api/videos/dMH0bHeiRNg/related'/>
<link rel='self' type='application/atom+xml'
href='http://gdata.youtube.com/feeds/api/standardfeeds/most_viewed/dMH0bHeiRNg'/>
<author>
<name>judsonlaipply</name>
<uri>http://gdata.youtube.com/feeds/api/users/judsonlaipply</uri>
</author>
<media:group>
<media:title type='plain'>Evolution of Dance</media:title>
<media:description type='plain'>The funniest 6 minutes you will ever see!
Remember how many of these you have done!
Judson Laipply is dancing -
http://www.evolutionofdance.com -
for more info including song list!</media:description>
<media:keywords>comedy, Dancing</media:keywords>
<yt:duration seconds='360'/>
<media:category label='Comedy'
scheme='http://gdata.youtube.com/schemas/2007/categories.cat'>Comedy
</media:category>
<media:content
url='http://www.youtube.com/v/dMH0bHeiRNg' type='application/x-shockwave-flash'
medium='video' isDefault='true' expression='full' duration='360' yt:format='5'/>
<media:content
url='rtsp://rtsp2.youtube.com/ChoLENy73wIaEQnYRKJ3bPTBdBMYDSANFEgGDA==
/0/0/0/video.3gp' type='video/3gpp' medium='video' expression='full' duration='360'
yt:format='1'/>
<media:content
url='rtsp://rtsp2.youtube.com/ChoLENy73wIaEQnYRKJ3bPTBdBMYESARFEgGDA==
/0/0/0/video.3gp' type='video/3gpp' medium='video' expression='full' duration='360'
yt:format='6'/>
<media:player url='http://www.youtube.com/watch?v=dMH0bHeiRNg'/>
<media:thumbnail
url='http://img.youtube.com/vi/dMH0bHeiRNg/2.jpg' height='97' width='130'
time='00:03:00'/>
<media:thumbnail
url='http://img.youtube.com/vi/dMH0bHeiRNg/1.jpg' height='97' width='130'
time='00:01:30'/>
<media:thumbnail
url='http://img.youtube.com/vi/dMH0bHeiRNg/3.jpg' height='97' width='130'
time='00:04:30'/>
<media:thumbnail
url='http://img.youtube.com/vi/dMH0bHeiRNg/0.jpg' height='240' width='320'
time='00:03:00'/>
</media:group>
<yt:statistics viewCount='78060679' favoriteCount='400468'/>
<gd:rating min='1' max='5' numRaters='276123' average='4.65'/>
<gd:comments>
<gd:feedLink href='http://gdata.youtube.com/feeds/api/videos/dMH0bHeiRNg
/comments' countHint='124130'/>
</gd:comments>
</entry>
<entry>
<id>http://gdata.youtube.com/feeds/api/videos/cQ25-glGRzI</id>
<published>2007-02-27T15:08:01.000-08:00</published>
<updated>2008-03-12T00:46:25.000-07:00</updated>
...
</entry>
</feed>
Above, xmlns:media='http://search.yahoo.com/mrss/' tells you the URI of the media namespace. So after doing the following:
$sxml = simplexml_load_file($feedURL);
foreach ($sxml->entry as $entry) {
    $media = $entry->children('http://search.yahoo.com/mrss/');
    ...
you can access the first thumbnail URL in the media group like this:
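// attributes() with no argument returns the un-namespaced attributes
$attrs = $media->group->thumbnail[0]->attributes();
// cast to string so $img holds plain text rather than a SimpleXMLElement
$img = (string) $attrs['url'];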
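The namespace steps above can be tried without a network request. The sketch below is only an illustration: it parses a trimmed-down, hypothetical copy of the feed from an inline string (instead of $feedURL), and the names $feed_txt and $results are my own. It also shows the optional second argument of children(), which lets you pass the prefix instead of the full namespace URI.

```php
<?php
// Trimmed-down, hypothetical feed using the same namespaces as the response above.
$feed_txt = <<<XML
<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:media="http://search.yahoo.com/mrss/"
      xmlns:yt="http://gdata.youtube.com/schemas/2007">
  <entry>
    <title type="text">Evolution of Dance</title>
    <media:group>
      <media:title type="plain">Evolution of Dance</media:title>
      <yt:duration seconds="360"/>
      <media:thumbnail url="http://img.youtube.com/vi/dMH0bHeiRNg/2.jpg" height="97" width="130"/>
      <media:thumbnail url="http://img.youtube.com/vi/dMH0bHeiRNg/1.jpg" height="97" width="130"/>
    </media:group>
  </entry>
</feed>
XML;

$sxml = simplexml_load_string($feed_txt);
if ($sxml === false) {
    throw new RuntimeException('Could not parse the feed');
}

$results = array();
foreach ($sxml->entry as $entry) {
    // Elements in the default (Atom) namespace are reachable directly.
    $title = (string) $entry->title;

    // Select the media namespace by its URI.
    $media = $entry->children('http://search.yahoo.com/mrss/');
    $attrs = $media->group->thumbnail[0]->attributes();
    $thumb = (string) $attrs['url'];

    // Select the yt namespace by its prefix (second argument set to true).
    $yt = $media->group->children('yt', true);
    $durationAttrs = $yt->duration->attributes();
    $seconds = (int) $durationAttrs['seconds'];

    $results[] = array('title' => $title, 'thumb' => $thumb, 'seconds' => $seconds);
}

print_r($results);
```

Passing the URI is usually the safer choice, since a feed is free to bind the same namespace to a different prefix; the prefix form is just shorter to type when you control or trust the document.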
Posted on April 19, 2008 at 8:36 PM in php. Tags: php, Web Service, XML.