Collecting Information Using XMLReader in PHP

cased

New Member
In a browser game I play, the developers have allowed access to an XML file containing town data. I'm trying to write a script to search and compile that data in various formats that will benefit players. It's been a pain in the butt, but I finally got it to print out town names and player names. However, it's not 100% correct: it's assigning towns to the wrong players. I'd appreciate any help with this. Here's the code:

Code:
<?php
 
 
$ch = curl_init("http://uk1.illyriad.co.uk/data_downloads/datafile_towns.xml");
$fp = fopen("towndata.xml", "w");
curl_setopt($ch, CURLOPT_FILE, $fp);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_exec($ch);
curl_close($ch);
fclose($fp);
  $reader = new XMLReader();
  $reader->open("towndata.xml");
  echo("<HTML><BODY>");
 
 
 
   while ($reader->read()) {
 
   FindElement($reader,"mapx");
   $mapx = $reader->value;
   FindElement($reader,"mapy");
   $mapy = $reader->value;
   FindElement($reader,"terraintype");
   $terraintype = $reader->value;
   FindElement($reader,"playername");
   $playername = $reader->value;
   FindElement($reader,"playerrace");
   $playerrace = $reader->value;
   FindElement($reader,"alliancename");
   $alliancename = $reader->value;
   FindElement($reader,"allianceticker");
   $allianceticker = $reader->value;
   FindElement($reader,"alliancetaxrate");
   $alliancetaxrate = $reader->value;
 
   FindElement($reader,"townname");
   $townname = $reader->value;
 
   FindElement($reader,"population");
   $population = $reader->value;
   FindElement($reader,"iscapitalcity");
   $iscapitalcity = $reader->value;
   FindElement($reader,"isalliancecapitalcity");
   $isalliancecapitalcity = $reader->value;
 
   if($allianceticker == "FDU")
   {
    echo "Town Name:  ", $townname, "<BR>";
    echo "Player Name:  ", $playername, "<BR>";
 
   }
 
   }
 
   echo("</BODY></HTML>");
 
 
function FindElement(&$readerobject, $name)
{
    // Advance the reader until the current node's name matches $name (or EOF).
    while ($readerobject->name != $name) {
        if (!$readerobject->read()) break;
    }
    // Step onto the element's content so the caller can read ->value.
    if ($readerobject->name == $name) {
        $readerobject->read();
        return 1;
    }
    else return 0;
}
?>

FindElement is a function I wrote to skip to the appropriate tag containing the information I'm after. I'm not familiar enough with XMLReader and had a heck of a time finding any examples or tutorials on how to do this, so the problem may be there.
 

misson

Community Paragon
Community Support
Examine the raw data and you'll note not all players have alliances. When your script comes across one of these, it gets some of the data for that user (such as the player name) and skips forward to the next town that has an alliance, getting the rest of the data (such as town name) for this other player.

To fix this, use something other than XMLReader, which isn't well suited to data extraction (it's better used to walk an XML document). Take a look at the XML classes that support XPath queries, such as SimpleXML and DOM. For example, the XPath expression "//allianceticker[text()='FDU']/ancestor::town" will select every <town> element for a town in the FDU alliance.

PHP:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
  <title></title>
  <style type="text/css">
     table thead tr, table tfoot tr {
       background-color: #AAC;
       color: black;
     }
     table tbody tr:nth-child(even) {
       background-color: #CCC;
       border-collapse: collapse;
       border-spacing: 0;
     }
     table tbody tr:nth-child(even) td {
       border: 1px solid black;
       border-left-width: 0px;
       border-right-width: 0px;
     }
     table tbody tr:nth-child(even) td:first-child {
       border-left-width: 1px;
     }
     table tbody tr:nth-child(even) td:last-child {
       border-right-width: 1px;
     }
  </style>
</head>
<body>
<?php
 
$datafile = 'data/towndata.xml';

set_error_handler('log_error');

$ch = curl_init("http://uk1.illyriad.co.uk/data_downloads/datafile_towns.xml");
$fp = @fopen($datafile, "w");
if ($fp) {
    curl_setopt_array($ch,
                      array(CURLOPT_FILE   => $fp,
                            CURLOPT_HEADER => 0));
    curl_exec($ch);
    curl_close($ch);
    fclose($fp);
    $data = simplexml_load_file($datafile);
    $FDUTowns = $data->xpath('//allianceticker[text()="FDU"]/ancestor::town');
    $fields = array('townname' => 'Town', 'playername' => 'Player', 'allianceticker' => 'Alliance');
  ?>
  <table>
    <thead>
      <tr><th><?php echo implode('</th><th>', $fields);?></th></tr>
    </thead>
    <tfoot>
      <tr><th><?php echo implode('</th><th>', $fields);?></th></tr>
    </tfoot>
    <tbody>
      <?php foreach ($FDUTowns as $town): ?>
        <tr>
          <?php foreach ($fields as $name => $label): ?>
            <td><?php $val = $town->xpath(".//$name/text()"); echo $val[0]; ?></td>
          <?php endforeach; ?>
        </tr>
      <?php endforeach; ?>
    </tbody>
  </table>
  <?php 
} else { // !$fp: couldn't open file 
  ?>
  <p class="error">Couldn't open a temporary file to store data. It's been logged, and we'll look into it. Chances are the problem is transitory. Please try again in a few hours and, if it still fails, try again tomorrow.</p>
  <?php 
} 

function log_error($errno, $errstr, $errfile, $errline) {
  // ...
}
?>
</body>
</html>
 

essellar

Community Advocate
Community Support
Note that if you decide to opt for the DOM parser, the documentation in the PHP manual is horrid. As far as I have been able to determine, the DOM class handles everything that's in the W3C DOM Level 3, but even such staple properties as nodeName, nodeType and nodeValue are entirely undocumented. If you've ever used the standard DOM methods in JavaScript (or in another programming language) you should be able to do just about anything you want, but you'll need something other than the PHP manual for a property/method reference.
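For instance, here's a rough, untested sketch of the same FDU query using DOM and DOMXPath against the towndata.xml file from earlier in the thread (it assumes the same element names):

PHP:
<?php
// Rough sketch: the same FDU query via DOM/DOMXPath instead of SimpleXML.
// query(), getElementsByTagName() and nodeValue are standard W3C DOM members,
// even if the PHP manual barely documents them.
$doc = new DOMDocument();
$doc->load('towndata.xml');

$xpath = new DOMXPath($doc);
$towns = $xpath->query("//allianceticker[text()='FDU']/ancestor::town");

foreach ($towns as $town) {
    $name   = $town->getElementsByTagName('townname')->item(0)->nodeValue;
    $player = $town->getElementsByTagName('playername')->item(0)->nodeValue;
    echo $name, ' - ', $player, "<br>\n";
}
?>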
 

misson

Community Paragon
Community Support
One more thing: that XML file is rather sizable. If your script runs on X10 with any frequency, you will probably find your site suspended. Since the other site only updates the data every so often, you can cache the result of processing the data and use an If-Modified-Since or If-None-Match header to check whether the cached data has gone stale (with the latter you'll have to cache the ETag for the document, while with the former you can simply use the modification time of the cached file). To send an If-Modified-Since header with curl, set CURLOPT_TIMECONDITION to CURL_TIMECOND_IFMODSINCE and CURLOPT_TIMEVALUE to the cached file's modification time. If the resource hasn't been modified, curl simply won't return any data (except headers, if you request them).
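For the If-None-Match variant, something along these lines should work (towndata.etag is just a placeholder name for wherever you keep the cached ETag):

PHP:
<?php
// Rough sketch of the If-None-Match approach: cache the ETag from the last
// response and send it back on the next request.
$url      = 'http://uk1.illyriad.co.uk/data_downloads/datafile_towns.xml';
$datafile = 'towndata.xml';
$etagfile = 'towndata.etag';   // placeholder name for the cached ETag

$headers = array();
if (is_readable($etagfile)) {
    $headers[] = 'If-None-Match: ' . trim(file_get_contents($etagfile));
}

$ch = curl_init($url);
curl_setopt_array($ch, array(
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_HEADER         => true,   // keep headers so the new ETag can be read
    CURLOPT_HTTPHEADER     => $headers,
));
$response = curl_exec($ch);
$status   = curl_getinfo($ch, CURLINFO_HTTP_CODE);
$headsize = curl_getinfo($ch, CURLINFO_HEADER_SIZE);
curl_close($ch);

if ($status == 200) {
    // Fresh data: save the body and remember the new ETag.
    file_put_contents($datafile, substr($response, $headsize));
    if (preg_match('/^ETag:\s*(.+)$/mi', substr($response, 0, $headsize), $m)) {
        file_put_contents($etagfile, trim($m[1]));
    }
}
// A 304 means the cached towndata.xml is still current, so there's nothing to do.
?>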
 

cased

New Member
I actually discovered the problem last night: I had assumed every town had alliance tags. I tried SimpleXML first, but it couldn't handle the large file; I was directed to XMLReader after that failure. Thanks a lot for the advice on caching the data file. I'm not confident the page will be used that often, but better safe than sorry.
 

misson

Community Paragon
Community Support
Interesting. In my test script, SimpleXML was able to handle the data in under 30 s while using 1 MiB of memory. Of course, I tested it on my development server, not X10.

If your script is timing out before completion, you could break it up into three scripts: download the data, extract the data you care about, and display the data (along with any other processing). The first two (which are also the tasks that can use caching) could run as cron jobs; the third (which doesn't need caching) would be accessed via the browser.
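As a rough sketch, the extraction step could be as small as this (fdu_towns.ser is just a made-up name for the cache file):

PHP:
<?php
// Rough sketch of the "extract" step: run it from cron after the download
// script, keep only the rows the pages need, and cache them so the
// browser-facing script never has to parse the big XML file.
$data = simplexml_load_file('towndata.xml');
if ($data === false) {
    exit(1); // keep the old cache if the XML can't be parsed
}

$rows = array();
foreach ($data->xpath("//allianceticker[text()='FDU']/ancestor::town") as $town) {
    $name   = $town->xpath('.//townname');
    $player = $town->xpath('.//playername');
    $rows[] = array('townname'   => (string) $name[0],
                    'playername' => (string) $player[0]);
}
file_put_contents('fdu_towns.ser', serialize($rows));

// The display script then only needs:
//   $rows = unserialize(file_get_contents('fdu_towns.ser'));
?>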
 

cased

New Member
misson,

More good advice, thanks. Below is a post I submitted on a different forum; that's where I was told the file I was using was too big for SimpleXML:

I'm trying to use the xml data provided by the GMs of a game I'm currently playing, but I'm having difficulty. I'd appreciate any help. Here's my code:
Code:
<?php
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "http://uk1.illyriad.co.uk/data_downloads/datafile_towns.xml");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$output = curl_exec($ch);
curl_close($ch);    
 
$p = xml_parser_create();
xml_parse_into_struct($p, $output, $vals, $index);
xml_parser_free($p);
echo "Index array\n";
print_r($index);
echo "\nVals array\n";
print_r($vals);
 
?>
It doesn't work though. Instead I get the following error:
Fatal error: Allowed memory size of 67108864 bytes exhausted (tried to allocate 40 bytes) in /home/cased/public_html/IllyriadTownData.php on line 19
Can anyone help?

Just FYI, I have a working tool (bare bones) right now at ohiotech.elementfx.com/Illyriad.html
 

cased

New Member
Well, it happened just like misson said: my account was suspended for overuse. However, I'm confused, because I followed his advice. Here is my code. It seemed to work, so what did I do wrong?
Code:
if (is_readable("towndata.xml")) {
    $ch2 = curl_init("http://uk1.illyriad.co.uk/data_downloads/datafile_towns.xml");
    curl_setopt($ch2, CURLOPT_TIMECONDITION, CURL_TIMECOND_IFMODSINCE);
    curl_setopt($ch2, CURLOPT_TIMEVALUE, filemtime("towndata.xml"));
    curl_exec($ch2);
    $status = curl_getinfo($ch2, CURLINFO_HTTP_CODE);
} else {
    $status = 0;
}

if ($status != 304) {
    $ch = curl_init("http://uk1.illyriad.co.uk/data_downloads/datafile_towns.xml");
    $fp = fopen("towndata.xml", "w");

    curl_setopt($ch, CURLOPT_FILE, $fp);
    curl_setopt($ch, CURLOPT_HEADER, 0);

    curl_exec($ch);
    curl_close($ch);
    fclose($fp);
}
 

misson

Community Paragon
Community Support
For one thing, you're sometimes fetching the file twice. If you want to test for a 304 response first, set CURLOPT_NOBODY to send a HEAD request the first time (then unset it and set CURLOPT_HTTPGET the second time). You can also simply make a single request, though a 304 response will clobber the file you save to, so save to a temporary file and copy it over to towndata.xml if the temporary file has any content.
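Roughly, the single-request version could look like this (towndata.tmp is just a placeholder name for the temporary file):

PHP:
<?php
// Rough sketch of the single-request approach: save the conditional GET to a
// temporary file and only replace towndata.xml when a body actually arrived
// (i.e. the response wasn't a 304).
$url = 'http://uk1.illyriad.co.uk/data_downloads/datafile_towns.xml';
$tmp = 'towndata.tmp';   // placeholder name

$ch = curl_init($url);
$fp = fopen($tmp, 'w');
if (is_readable('towndata.xml')) {
    curl_setopt($ch, CURLOPT_TIMECONDITION, CURL_TIMECOND_IFMODSINCE);
    curl_setopt($ch, CURLOPT_TIMEVALUE, filemtime('towndata.xml'));
}
curl_setopt($ch, CURLOPT_FILE, $fp);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_exec($ch);
$status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
curl_close($ch);
fclose($fp);

clearstatcache();
if ($status == 200 && filesize($tmp) > 0) {
    rename($tmp, 'towndata.xml');  // fresh copy: replace the cached file
} else {
    unlink($tmp);                  // 304 (or error): keep the old cached file
}
?>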

The second thing is that even though this reduces network usage, you still need to process a lot of data, which will cause high CPU usage and suspensions. Caching search results probably won't help much (though that depends entirely on usage patterns). The only thing that will really help is if you can discard some of the data before processing it.
 

raccia

New Member
I'm using XMLReader to load an XML file like this:
<head>
<element param="hello"/>
<element param="hello"/>
<element param="hello"/>
</head>

And it all works fine!

But now I need to write to the XML file repeatedly and append elements, and I have a problem with the <head></head> wrapper.

Without <head></head> it would be easy; I could simply append to the file using PHP's file functions:
<element param="hello"/>
<element param="hello"/>
<element param="hello"/>
<element param="hello"/>
<element param="hello"/>
...

But without <head></head>, XMLReader gives an error when opening the file.

How can I read an XML file that has no <head></head> wrapper with XMLReader?
or
How can I write to the XML between the last element and the closing </head>?
Please help.
 

misson

Community Paragon
Community Support
Don't threadjack. Start your own thread rather than reviving old ones.
 