Parsing HTML

jensen

Active Member
Messages
1,168
Reaction score
22
Points
38
How do we parse a raw HTML page to retrieve data so that we can display it on our own website? I am trying to get an image that shows the number of currently active members of this forum, for display on my website.
 

The_Magistrate

New Member
Messages
1,118
Reaction score
0
Points
0
Really, the easiest way to do it would be to ask if there were an XML file which might have that data. I don't know if one is available, but Corey or another admin might be able to create one.
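
If such an XML file did exist, reading it from PHP would be short. The following is only a sketch: the `<stats>` and `<activeMembers>` element names and the `readActiveMembers()` helper are made up for illustration, since I don't know what a real feed from this forum would look like. It relies on PHP's SimpleXML extension.

```php
<?php
// Hypothetical feed layout; the real forum may not publish anything like this
$xml = '<stats><activeMembers>42</activeMembers></stats>';

/**
 * Return the active-member count from an XML string,
 * or null if the XML is malformed or the element is missing.
 */
function readActiveMembers(string $xml): ?int
{
    // Collect parse errors internally instead of emitting warnings
    $previous = libxml_use_internal_errors(true);
    $doc = simplexml_load_string($xml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if ($doc === false || !isset($doc->activeMembers)) {
        return null;
    }
    return (int) $doc->activeMembers;
}

echo readActiveMembers($xml), "\n";
```

With a real feed you would pass in the downloaded XML text instead of the hard-coded string.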

Otherwise, the way I've done it is with PHP. Below is a small example that scans a webpage for a starting string and an ending string and collects the lines in between, so you can read information that spans more than one line of the HTML source:

Code:
<?php
// Read the webpage into the PHP script and store it as an array of strings
// (fetching a URL with file() requires allow_url_fopen to be enabled)
$theWebpage = file("##A VALID URL##");
if ($theWebpage === false)
	die("Could not read the page.");

// Read each line of the array
foreach ($theWebpage as $key => $value)
{
	// If the start was not found yet and the string you want to find is in this line...
	if (!isset($start) && strpos($value, "##START OF THE STRING TO FIND##") !== FALSE)
		$start = $key;  // This is the line we want to start at

	// If the start was already found, and the string you want to end on is in this line
	if (isset($start) && strpos($value, "##END OF THE STRING TO FIND##") !== FALSE)
	{
		// This is the last line we want; the line holding the end string is left out
		$end = $key - 1;
		break;  // Stop the loop.
	}
}

// Read all the lines in the webpage between the beginning and the end and store them.
$theExtractedInfo = "";
if (isset($start, $end))
	for ($i = $start; $i <= $end; $i++)
		$theExtractedInfo .= $theWebpage[$i];
?>
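
Once the lines are extracted, you still need to pull the actual number out of them. Here is a rough sketch of one way to do that with `strip_tags()` and `preg_match()`. The `extractCount()` helper, the "Active members" label and the sample markup are all assumptions for illustration; the real page will look different, so the label will need adjusting.

```php
<?php
/**
 * Pull the first number that follows a label out of an HTML fragment.
 * Returns null when the label, or a number after it, is not found.
 */
function extractCount(string $html, string $label): ?int
{
    // Drop tags so the pattern only has to deal with visible text
    $text = strip_tags($html);

    // Label, optional colon and whitespace, then digits with optional thousands separators
    $pattern = '/' . preg_quote($label, '/') . '\s*:?\s*(\d[\d,]*)/i';
    if (preg_match($pattern, $text, $matches) !== 1) {
        return null;
    }
    return (int) str_replace(',', '', $matches[1]);
}

// Hypothetical fragment; the real forum markup will differ
$theExtractedInfo = '<div class="stats">Active members: <b>1,234</b></div>';
echo extractCount($theExtractedInfo, 'Active members'), "\n";
```

Matching on the visible text rather than on exact tags means small layout changes on the remote page are less likely to break the script, though a label change still would.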
 

jensen

That's a great help. You must be coding regularly. I'll fit it in and learn. Thanks.
 

Bryon

I Fix Things
Messages
8,149
Reaction score
101
Points
48
I have a suggestion for this.

You should have a cron script run every few minutes that fetches the data and saves it (in a file or in MySQL). Loading a page over HTTP can take a few seconds, so if your image pulled the data directly from the remote site on every request, its load time would be a few seconds as well.
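
To make the idea concrete, here is a minimal sketch of the file-based variant. The scheduled script calls one function to store the number, and the page or image script calls the other to read it back without touching the network. The function names `saveCount()` and `loadCount()` and the cache path are placeholders I picked for the example.

```php
<?php
// Hypothetical cache location; any path writable by both scripts would do
$cacheFile = sys_get_temp_dir() . '/active_members.txt';

/** Called from the scheduled script after the slow fetch-and-parse step. */
function saveCount(string $cacheFile, int $count): void
{
    // LOCK_EX stops two writers from interleaving their output
    file_put_contents($cacheFile, (string) $count, LOCK_EX);
}

/** Called from the page or image script; returns null if nothing usable is cached. */
function loadCount(string $cacheFile): ?int
{
    if (!is_file($cacheFile)) {
        return null;
    }
    $raw = file_get_contents($cacheFile);
    if ($raw === false || !is_numeric(trim($raw))) {
        return null;
    }
    return (int) trim($raw);
}

saveCount($cacheFile, 1234);  // stand-in for the value parsed from the remote page
echo loadCount($cacheFile), "\n";
```

The script that draws the image then only reads a tiny local file, so its response time no longer depends on how fast the remote page loads.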
 

jensen

That's an important consideration that I overlooked: the loading time.
Won't the cron job also need to load the page over HTTP? Anyway, I'm not sure how to set up and run a cron job. Also, given the concerns about cron using up server bandwidth, I'll take it step by step.

But I'm always ready to learn from the wise programmers at x10. You're the best.
 

Bryon

jensen said:
That's an important consideration that I overlooked: the loading time.
Won't the cron job also need to load the page over HTTP? Anyway, I'm not sure how to set up and run a cron job. Also, given the concerns about cron using up server bandwidth, I'll take it step by step.

But I'm always ready to learn from the wise programmers at x10. You're the best.

The cron job wouldn't be hard to set up.

Once you have the script that loads and parses the HTML, have it update a MySQL database table with whatever information you'd like to store (users online, for example). Then, for the cron job, you could use the command:

Code:
php -q /home/[username]/public_html/path/to/cron/script.php

I would say running the cron job every 5 minutes would be reasonable. The script itself wouldn't have much impact on the server in terms of CPU or MySQL usage.
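
If your host lets you edit the crontab directly (for example with `crontab -e`), a five-minute schedule for that command would look roughly like the line below. The five leading fields are minute, hour, day of month, month and day of week, and `*/5` in the minute field means "every fifth minute". The path is the same placeholder as above, and whether a plain `php` is on cron's PATH is an assumption; some hosts need the full path to the PHP binary, and some control panels ask for the schedule and the command in separate fields instead.

```
*/5 * * * * php -q /home/[username]/public_html/path/to/cron/script.php
```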
 