PHP script not working (RSSgenr8.php)

Status
Not open for further replies.

deepwater

New Member
cPanel: dwadmin
PKG: no-ads
http://deepwater.x10hosting.com

path to script: http://deepwater.x10hosting.com/rssgen/rssgenr8.php
(also tried it in the root)

sample test file to parse: http://deepwater.x10hosting.com/rss-test.html

I played around with permissions to no avail. The script works here (although, for whatever reason, it's not picking up the 'rss:item' tags right now): http://www.xmlhub.com/rssgenr8.php

PHP:
<?php
if ($pageurl) {
  parse_html($pageurl);
} else {
  show_form();
}

function show_form() {
  $server = getenv("SERVER_NAME");
  $request = getenv("REQUEST_URI");
?>
<html>

<head>
<meta http-equiv="Content-Type" content="text/html; charset=windows-1252">
<title>RSSgenr8: HTML to RSS Converter - Generate an RSS feed from a web page</title>
<meta name="description" content="RSSgenr8 is a hosted HTML to RSS Scraper Tool which generates a RSS feed from a HTML web page">
</head>

<body>
   <table width="100%" style="border-collapse: collapse" bordercolor="#111111" cellpadding="0" cellspacing="0">
   <tr>
   <td><a href="http://www.xmlhub.com">
<img border="0" src="http://www.xmlhub.com/images/xh_red_2.gif" alt="xml hub" width="120" height="60"></a>
   </td>
   <td>
   <B>RSSgenr8: HTML to RSS Converter</B>
   </td></tr>
   </table>
   <p>This form takes your web page and turns it into RSS 0.92. 
   <br />RSSgenr8 is a hosted HTML to RSS Scraper Tool which dynamically generates a RSS feed from a HTML web page.
   <br />Changes to the web page are then automatically reflected in the RSS feed.</p>
   <p align="left">RSSgenr8 is based on 
   <a href="http://www.voidstar.com">RSSify from VoidStar.com</a> but is much 
   modified.
   <br />Acknowledgements also to Aaron Swartz who came up with the idea and <a href="http://logicerror.com/blogifyYourPage">the first implementation.</a></p>
   <ol>
   <li>Put &lt;span class="rss:item"> ... &lt;/span> round each item in your page.
   <br />In Blogger you'd do this by going to your template in blogger and changing 
<br /><b>&lt;$BlogItemBody$></b> to <b>&lt;span class="rss:item">&lt;$BlogItemBody$>&lt;/span></b>
<br />And then publish something to re-create the page with the new template 
<li>Then put the URL of your new and modified page into the form below.
<li>Check that what you get back looks like RSS.
<li>Now you can make a link to this file like &quot;http://www.xmlhub.com/rssgenr8.php?pageurl=your_web_page_url"
<li>Finally add a link to it on your web page, using something like the XML image below.
<li>(As a condition of use, we ask you to show a visible HTML link to www.xmlhub.com on your site.)
</ol>
<center>
<font size=1>
<img src="images/xml.gif" alt="This gif is freely copyable. Just right click, save" width="36" height="14">
<br />
Powered by <br /><a href="http://www.xmlhub.com/rssgenr8.php">RSSgenr8 at xmlhub.com</A>
</font>
</center>

   <form action="<? print 'http://' . $server . $request; ?>">
     The URL of your web page:
     <br /><input type="text" name="pageurl" size=50> Include a final "/" or a filename.
     <br /><input type="submit" value="Create RSS">
   </form>
   <p>If your web server runs PHP, please 
   <a href="http://www.xmlhub.com/download.htm">download</a>
   the source and run it on your own server.
   No configuration is needed - Just copy one file to the server.</p>
   <br /><b>Usage</b>: http://www.xmlhub.com/rssgenr8.php?pageurl=your_web_page_url
   <p><B>Notes:</B>
   <ul>
   <li>The channel title is taken from the web page title.
   <li>The channel description is taken from the meta description.
   <li>The item text is put in the description element.
   <li>The first line or the first 100 characters of html stripped description are put in the title element.
   <li>The first link in the description is put in the link element. If there isn't one, the web page url is used.
   <li>Relative paths in the link url are converted to absolute paths.
   <li>All tags except &lt;A> &lt;B> &lt;BR> &lt;BLOCKQUOTE> &lt;CENTER> &lt;DD> &lt;DL> &lt;DT> &lt;HR> &lt;I> &lt;IMG> &lt;LI> &lt;OL> &lt;P> &lt;PRE> &lt;U> &lt;UL> are stripped from the description.
   <li>Tabs, NewLines, etc, in the description are converted to a single space
   <li>A maximum of 25 items are included in the rss.
   <li>if you want more detail about RSS, take a look at the
   <a href="http://www.xmlhub.com/rssfaqs.htm">FAQs</a>.</ul>
   <center>
   <a href="/">Home</a>
   </center>
 </body>
</html>   
<?  
}

function parse_html($pageurl){
  $itemregexp = "%rss:item *\" *>(.+?)</span>%is";
  $allowable_tags = "<A><B><br /><br><BLOCKQUOTE><CENTER><DD><DL><DT><HR><I><IMG><LI>&nbsp;<OL><P><PRE><U><UL>";

  $pageurlparts = parse_url($pageurl);
  if ($pageurlparts[path] == "") $pageurl .= "/";

  if ($fp = @fopen($pageurl, "r")) {
    while (!feof($fp)) {
      $data .= fgets($fp, 128);
    }
    fclose($fp);
  }

//  print "<pre>";
//  print htmlentities($data);  

//  eregi("<title>(.*)</title>", $data, $title);
//  $channel_title = $title[1];

  $channel_title = "";
  if (preg_match('/<title>(.+?)<\/title>/i', $data, $regs) > 0) {
    $channel_title = $regs[1];
  }

  if (preg_match('/<meta .*description.*"(.+?)"/i', $data, $regs) > 0) {
    $channel_desc = $regs[1];
  }
  if ($channel_desc == "") $channel_desc = $pageurl;

  $match_count = preg_match_all($itemregexp, $data, $items);
  $match_count = ($match_count > 25) ? 25 : $match_count;
  
  header("Content-Type: text/xml");

  $output .= "<?xml version=\"1.0\" encoding=\"ISO-8859-1\" ?>\n";
  $output .= "<!-- generator=\"rssgenr8/0.92\" -->\n";
  $output .= "<!DOCTYPE rss PUBLIC \"-//W3C//ENTITIES Latin 1 for XHTML//EN\" \"http://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent\">\n";
  $output .= "<rss version=\"0.92\">\n";
  $output .= "  <channel>\n";
  $output .= "    <title>". htmlentities(strip_tags($channel_title)) ."</title>\n";
  $output .= "    <link>". htmlentities($pageurl) ."</link>\n";
  $output .= "    <description>". htmlentities($channel_desc) ."</description>\n";
  $output .= "    <webMaster>". htmlentities("webmaster") ."</webMaster>\n";
  $output .= "    <generator>". htmlentities("RSSgenr8 from XMLhub.com") ."</generator>\n";
  $output .= "    <language>en</language>\n";

  for ($i=0; $i< $match_count; $i++) {

    $desc = $items[1][$i];
    $title = wsstrip($desc);
    $descout = $desc;
    

      if (preg_match("/(.+?)(?:<\/P|<\/div|<br|<\/h|<\/td)/i", $title, $regs) > 0) { 
        $title = $regs[1];
        if (strlen(wsstrip(trim(strip_tags($title)))) < 100) {
          $descout = str_replace($title,"",$descout);
        }
      }
    
    $title = wsstrip(trim(strip_tags($title)));
    if (strlen($title) > 100) {
      $title = substr($title,0,100) . " ...";
    }


    
    $item_url = get_link($desc, $pageurl);
    $descout = wsstrip(strip_tags($descout, $allowable_tags));
      $pos = strpos($descout, "<br>");
      if (is_int($pos) and ($pos == 0)) {
        $descout=substr($descout, 4);
      }  
      $pos = strpos($descout, "<br />");
      if (is_int($pos) and ($pos == 0)) {
        $descout=substr($descout, 6);
      }

    $descout = htmlentities(wsstrip($descout));

    $output .= "    <item>\n";
    $output .= "      <title>". htmlentities($title) ."</title>\n";
    $output .= "      <link>". htmlentities($item_url) ."</link>\n";
    $output .= "      <description>". $descout ."</description>\n";
    $output .= "    </item>\n";
  }

  $output .= "  </channel>\n";
  $output .= "</rss>\n";

  print $output;
//  print htmlentities($output);
//  print "</pre>"; 
}

function get_link($desc, $pageurl) {
  if (stristr($desc, "href")) {
    $linkurl = stristr($desc, "href");
    $linkurl = substr($linkurl, strpos($linkurl, "\"")+1);
    $linkurl = substr($linkurl, 0, strpos($linkurl, "\""));
    $linkurl = trim($linkurl);
    $pageurlarray = parse_url($linkurl);
    if (empty($pageurlarray['host'])) {
      $linkurl = make_abs($linkurl, $pageurl);
    }
    return $linkurl;
  } else {
    return $pageurl;
  }
}

function wsstrip($str)
{
 $str=ereg_replace("[\r\t\n]"," ",$str);
 $str=ereg_replace (' +', ' ', trim($str));
return $str;
}

 
function make_abs($rel_uri, $base, $REMOVE_LEADING_DOTS = true) { 
 preg_match("'^([^:]+://[^/]+)/'", $base, $m); 
 $base_start = $m[1]; 
 if (preg_match("'^/'", $rel_uri)) { 
  return $base_start . $rel_uri; 
 } 
 $base = preg_replace("{[^/]+$}", '', $base); 
 $base .= $rel_uri; 
 $base = preg_replace("{^[^:]+://[^/]+}", '', $base); 
 $base_array = explode('/', $base); 
 if (count($base_array) and !strlen($base_array[0])) 
  array_shift($base_array); 
 $i = 1; 
 while ($i < count($base_array)) { 
  if ($base_array[$i - 1] == ".") { 
   array_splice($base_array, $i - 1, 1); 
   if ($i > 1) $i--; 
  } elseif ($base_array[$i] == ".." and $base_array[$i - 1] != "..") { 
   array_splice($base_array, $i - 1, 2); 
   if ($i > 1) { 
    $i--; 
    if ($i == count($base_array)) array_push($base_array, ""); 
   } 
  } else { 
   $i++; 
  } 
 } 
 if (count($base_array) and $base_array[-1] == ".") 
  $base_array[-1] = ""; 

 if ($REMOVE_LEADING_DOTS) { 
  while (count($base_array) and preg_match("/^\.\.?$/", $base_array[0])) { 
   array_shift($base_array); 
  } 
 } 
 return($base_start . '/' . implode("/", $base_array)); 
}

?>
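
A side note for anyone running this script on a newer PHP: `ereg_replace()`, which `wsstrip()` relies on, was deprecated in PHP 5.3 and removed in PHP 7.0. Below is a rough sketch of an equivalent built on `preg_replace()`, assuming the behaviour to preserve is exactly what the original does (turn CR, tab and LF into spaces, then collapse runs of spaces). The name `wsstrip_pcre` is made up here so it does not clash with the original function.

```php
<?php
// Hypothetical replacement for wsstrip(): same steps as the original,
// but using the PCRE functions that current PHP versions still ship.
function wsstrip_pcre(string $str): string {
    // Turn carriage returns, tabs and newlines into plain spaces.
    $str = preg_replace('/[\r\t\n]/', ' ', $str);
    // Trim the ends, then collapse each run of spaces into one space.
    return preg_replace('/ +/', ' ', trim($str));
}

echo wsstrip_pcre("  one\ttwo\r\n three  ");
```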
 

Corey

I Break Things
Staff member
Can you echo after the POST?
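
For anyone unsure how to do that, here is a minimal sketch. It assumes the form submits with the default GET method (the `<form>` tag in the script sets no `method`), so the value would normally arrive in `$_GET` rather than `$_POST`; both are printed to be safe. The helper name `dump_request` is hypothetical.

```php
<?php
// Hypothetical debugging helper: builds a printable dump of the request
// parameters so you can see what the script actually received.
function dump_request(array $get, array $post): string {
    $out  = "GET:\n"  . print_r($get, true);
    $out .= "POST:\n" . print_r($post, true);
    return $out;
}

// Temporarily place this at the very top of rssgenr8.php while debugging.
// htmlspecialchars() keeps any submitted markup from being rendered.
echo '<pre>' . htmlspecialchars(dump_request($_GET, $_POST)) . '</pre>';
```

If `pageurl` shows up under GET here but the script still displays the form, the value is reaching PHP and the problem is in how the script reads it.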
 

deepwater

Sorry Corey, I know what you're talking about, but I don't know how to do it.
 

Corey

Okay, try removing the @ symbol before all the functions so it throws errors instead of suppressing them.
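
A rough sketch of what that could look like, assuming the host allows `ini_set()` (some shared hosts lock `display_errors`). The function name `fetch_page` is hypothetical and only stands in for the `fopen` loop in the script. Note also that `fopen()` on an `http://` URL only works when `allow_url_fopen` is enabled, which may be worth checking.

```php
<?php
// Report every error, warning and notice while debugging.
error_reporting(E_ALL);
// Print errors into the page instead of hiding them.
ini_set('display_errors', '1');

// Hypothetical stand-in for the script's download loop, with the "@"
// removed so that a failed fopen() emits a visible warning.
function fetch_page(string $url): string {
    $fp = fopen($url, 'r');
    if ($fp === false) {
        return '';
    }
    $data = stream_get_contents($fp);
    fclose($fp);
    return $data === false ? '' : $data;
}
```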

-Corey
 

deepwater

Sorry for the late reply, Corey. For some reason it appears I'm not being notified of replies (it was set to 'instant notification' and I never changed it).

The only function with an '@' is 'fopen'. I removed the '@' and the script acted the same as before (it just returns to the same page without creating the XML output). There was no error output, and no 'error_log.txt' was created. (I have since put the '@' back.)

If you want to try it, go here:
http://www.deepwater.x10hosting.com/rssgen/rssgenr8.php
and input:
http://deepwater.x10hosting.com/rss-test.html
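
One possible explanation for that symptom, offered as a guess rather than a confirmed diagnosis: the script tests a bare `$pageurl` variable, which PHP only fills in from the query string when the `register_globals` setting is on. With it off, `$pageurl` is always empty, so the script falls through to `show_form()`, which would look exactly like "returns to the same page". A sketch of reading the value explicitly instead; `get_pageurl` is a hypothetical helper, not part of RSSgenr8.

```php
<?php
// Hypothetical helper: read "pageurl" from the query string explicitly
// instead of relying on register_globals to create $pageurl.
function get_pageurl(array $query): string {
    if (isset($query['pageurl']) && is_string($query['pageurl'])) {
        return trim($query['pageurl']);
    }
    return '';
}

$pageurl = get_pageurl($_GET);
// The script's existing dispatch would then follow unchanged:
// if ($pageurl) { parse_html($pageurl); } else { show_form(); }
```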
 

Corey

Where did you get the script? Is it possible to get help from the author? I just don't have time to debug it right now. Someone on the forums might, though.

-Corey
 

deepwater

The guy hasn't returned emails. If you know, off the top of your head, a better way to create RSS feeds (I create large articles every day with 20+ headlines), let me know. The script automates the process by looking at the <span class="rss:item"> tags, so all I have to do is add the tags.

Don't worry about debugging it -- I'm on a free account and I don't expect any support.
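
For reference, the tag-based extraction described above can be sketched in a few lines. This reuses the same pattern idea as the script's `$itemregexp` and its 25-item cap; the function name `extract_items` and the sample HTML are made up for illustration. Like the original, it stops at the first `</span>`, so an item that contains a nested span would be cut short.

```php
<?php
// Sketch: collect the contents of every <span class="rss:item">...</span>.
function extract_items(string $html, int $max = 25): array {
    // Lazy match from the end of the class attribute to the next </span>.
    preg_match_all('%rss:item *" *>(.+?)</span>%is', $html, $m);
    // Keep at most $max items, as the original script does.
    return array_slice($m[1], 0, $max);
}

// Made-up sample page fragment with two tagged headlines.
$html = '<span class="rss:item"><b>First</b> headline</span>'
      . '<p>filler</p>'
      . '<span class="rss:item">Second headline</span>';
print_r(extract_items($html));
```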
 

Corey

I'm actually not very familiar with RSS feeds so I probably wouldn't be able to help there. If you post in the site management sub forum you could probably get someone to debug it for you or help you get it working. If things settle down in a few days I'll be able to take a quick look at it also.

-Corey
 