My Markup Validator

Discussion in 'Scripts, 3rd Party Apps, and Programming' started by nonsensep, Oct 24, 2007.

  1. nonsensep

    Hi! I would like to tell you about my first web tool. It's not really a markup validator, like the title says. It's just a PHP script that sends an HTTP request to the W3C Markup Validator and then outputs what it says about a given URI (as a PNG image). Here are some examples of using the validator on various websites:

    Code:
    <img src="http://www.nonsensep.x10hosting.com/markupvalid.php?uri=http://www.w3.org/" />
    [Image: valid markup]
    Code:
    http://www.nonsensep.x10hosting.com/markupvalid.php?uri=http://www.youtube.com/
    [Image: invalid markup]
    Code:
    http://www.nonsensep.x10hosting.com/markupvalid.php?uri=http://www.w3.org/&charset=iso-8859-1
    [Image: tentatively valid markup]
    Code:
    http://www.nonsensep.x10hosting.com/markupvalid.php?uri=http://www.circuitcity.com/
    [Image: no <!DOCTYPE>]
    Code:
    http://www.nonsensep.x10hosting.com/markupvalid.php?uri=http://www.bestbuy.com/
    [Image: 404 error, not a URI, etc.]

    The image is just 176x31 pixels. As the URLs show, one script (markupvalid.php) covers every case; only the uri parameter changes. I hope you guys find it useful.
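
    The thread never shows the script's source, so the following is only a minimal sketch of how the validator's verdict could be read. It assumes the W3C validator reports its result in X-W3C-Validator-Status, X-W3C-Validator-Errors and X-W3C-Validator-Warnings response headers; parse_validator_headers() is a hypothetical helper name, not part of the original script.

```php
<?php
// Hypothetical helper: turn raw HTTP response header lines into a summary.
// The X-W3C-Validator-* header names are an assumption about the validator's
// API; check them against the live service before relying on them.
function parse_validator_headers(array $headerLines): array
{
    $result = ['status' => 'Unknown', 'errors' => 0, 'warnings' => 0];
    foreach ($headerLines as $line) {
        // Split "Name: value" on the first colon only.
        $parts = explode(':', $line, 2);
        if (count($parts) !== 2) {
            continue; // e.g. the "HTTP/1.1 200 OK" status line
        }
        $name = strtolower(trim($parts[0]));
        $value = trim($parts[1]);
        if ($name === 'x-w3c-validator-status') {
            $result['status'] = $value;
        } elseif ($name === 'x-w3c-validator-errors') {
            $result['errors'] = (int) $value;
        } elseif ($name === 'x-w3c-validator-warnings') {
            $result['warnings'] = (int) $value;
        }
    }
    return $result;
}
```

    In a real script the header lines could come from get_headers('http://validator.w3.org/check?uri=' . urlencode($uri)), which returns false on failure, so check the return value before passing it in.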

    Any suggestions? Comments? Tell me so! I plan on making a better style sometime. But it works, and that's what counts. And feel free to use it however you like.
     
    Last edited: Oct 25, 2007
  2. QuwenQ

    That seems useful, except for the minor problem that someone already made a button like this:
    [Image: W3C "Valid XHTML 1.0" button]
    What's the advantage of using yours?
     
    Last edited: Oct 25, 2007
  3. nonsensep

    Yeah, but that image will always say "Valid XHTML 1.0", even when there are errors. My image is regenerated on every request and reports, straight from the W3C validator, whether the page is valid and how many errors and warnings it has. For example:

    Code:
    http://www.nonsensep.x10hosting.com/markupvalid.php?uri=http://forums.x10hosting.com/
    [Image: validator result for this URI]

    Now let's see what W3C says about the same URI:

    http://validator.w3.org/check?uri=http://forums.x10hosting.com/

    Try using my image with your own URI. Just enter this in your address bar:

    Code:
    http://www.nonsensep.x10hosting.com/markupvalid.php?uri=http://YOUR_SITE_HERE/
     
    Last edited: Oct 25, 2007
  4. Archkronos

    Hooray for lazy coding!
     
  5. nonsensep

    That's how computers began! Laziness!
     
    Last edited: Oct 25, 2007
  6. Slothie

    Interesting concept :) Glad to know there are actually developers on x10...

    Have you implemented caching yet? Parsing the site every time the image is loaded would be rather taxing on the server. It would also make the image load more slowly, since it has to wait for the server to download and parse the page...

    Good work though :D
     
  7. eminemix

    Good job nonsensep
    :coolugh:
     
  8. nonsensep

    Thank you, Slothie and eminemix! And about caching, I see what you're saying, but I'm not sure whether that'd defeat the purpose of it or not...

    Well, do you think there's a way to have a session for the website and when the session ends the cache expires? That would be sort of a compromise, I guess. I'll look into that.

    Oh, and I'm planning on making one for CSS, also.
     
  9. Slothie

    You could cache it on an hourly basis or so. Sessions would be too short :p
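
    Here's a minimal sketch of what hourly caching could look like, assuming a writable cache directory and one small JSON file per URI. cached_result() and the $fetch callback (which stands in for the actual validator request) are hypothetical names, not anything from the original script.

```php
<?php
// Hypothetical helper: return a cached result if it is younger than $ttl
// seconds; otherwise call $fetch, store its result, and return it.
function cached_result(string $uri, callable $fetch, string $cacheDir, int $ttl = 3600): array
{
    // One cache file per URI; md5() here only produces a filesystem-safe name.
    $file = $cacheDir . '/' . md5($uri) . '.json';
    if (is_file($file) && (time() - filemtime($file)) < $ttl) {
        $cached = json_decode(file_get_contents($file), true);
        if (is_array($cached)) {
            return $cached; // still fresh, skip the validator entirely
        }
    }
    // Missing, expired, or unreadable cache entry: fetch and overwrite it.
    $fresh = $fetch($uri);
    file_put_contents($file, json_encode($fresh), LOCK_EX);
    return $fresh;
}
```

    With the default $ttl of 3600 seconds, a page with many visitors would cause at most about one validator request per hour; a shorter $ttl trades server load for fresher results.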
     
  10. nonsensep

    I don't know, an hour seems too long. I don't think that there will be too much of a problem if the image takes a little while to load.
     
  11. Slothie

    No? Imagine a site with 10000 viewers. That's 10000 people constantly reloading the image, which in turn means that your script has to parse that page that many times.

    A slightly less CPU-intensive way of caching would be to hash the data at the URL and compare it to the hash in the existing cache (possibly stored as the filename). This would save some CPU time, but you'd still be using a decent chunk of bandwidth.
     
  12. Thewinator

    Well, it would be a problem for the site owner.
    It's also a good reason for them not to use it, so I suggest you work on it ;)
    You could, though, store an MD5 checksum of the site and then compare it.
    If it's the same, show the same image; otherwise hand it over to the W3C.
     
  13. nonsensep

    What do you mean? How would I do this?
     
  14. Avalanche

    Use fopen() and md5(), nuff said.
     
  15. Slothie

    Get the entire content of the site, since you're parsing it anyway. Then prepend or append the site URL.

    Then $hash = md5($reallylongvariable);
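
    Spelled out as a small function, that could look like the sketch below. page_checksum() is a hypothetical name; the newline separator is just one arbitrary way to join the URL and the content.

```php
<?php
// Hypothetical helper: one checksum per (URL, content) pair.
function page_checksum(string $url, string $content): string
{
    // Prepending the URL means identical content served from two different
    // sites still produces two different checksums.
    return md5($url . "\n" . $content);
}
```

    For example, $hash = page_checksum($uri, file_get_contents($uri)); would work where allow_url_fopen is enabled.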
     
  16. nonsensep

    So each time my script runs...

    It will get an MD5 checksum of the URI's contents.
    It will look up the URI of the site in a database on my server, find the checksum associated with the URI, and compare the two.
    If they are the same, it will look up in the database the variables that were stored along with the URI (num. errors, num. warnings, markup type, etc.) and then display an image based on those variables.
    If not, it will call fsockopen() to query the W3C validator for that URI.
    Then, after all that, it will store the URI, checksum, and variables in the database.
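
    The steps above can be sketched roughly as follows. To keep it self-contained, a plain array ($store) stands in for the database table and a callback ($validate) stands in for the fsockopen() request to the W3C; both, along with the name lookup_result(), are assumptions for illustration only.

```php
<?php
// Sketch of the lookup flow. $store maps URI => ['checksum' => ..., 'result' => ...]
// and stands in for the database; $validate stands in for the validator request.
function lookup_result(string $uri, string $content, array &$store, callable $validate): array
{
    $checksum = md5($content);
    // Same checksum as last time: the page is unchanged, so reuse the stored result.
    if (isset($store[$uri]) && $store[$uri]['checksum'] === $checksum) {
        return $store[$uri]['result'];
    }
    // New or changed page: ask the validator and remember the answer.
    $result = $validate($uri);
    $store[$uri] = ['checksum' => $checksum, 'result' => $result];
    return $result;
}
```

    This version only writes to the store when the page is new or has changed. It still has to download the page on every request to compute the checksum, so it saves validator calls rather than bandwidth.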



    Sound good?
     
    Last edited: Oct 28, 2007
  17. Slothie

    There are tonnes of ways to optimize a script that does what yours does. That's one of 'em :p
     
  18. nonsensep

    Ok. So, basically, it's either bandwidth or database memory. At least the way I'm approaching it. I guess I could store them in an XML file instead of a database. Then it wouldn't be database memory.
     
    Last edited: Oct 28, 2007
  19. Slothie

    No. No. No.
    That would be a bad idea. Databases are MEANT for storing information. It takes more effort to parse an XML file than to run a query against the database.

    Basically it's either
    bandwidth or CPU consumption.

    Less frequent checks would save you some bandwidth.
    The hashing method we just discussed would save some CPU time, since you wouldn't re-parse every page that has the button.

    You might want to consider using the HTTP referer as the default value, so someone can just put the button on their site without a GET variable. That way they can track their subpages as well, instead of needing a separate image link for each page.
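
    A minimal sketch of that fallback is below. In PHP the referer is available as $_SERVER['HTTP_REFERER']; a bare variable like $http_referer is only populated when register_globals is enabled. target_uri() is a hypothetical helper name, and the request arrays are passed in as parameters so the logic can be tried outside a web request.

```php
<?php
// Hypothetical helper: pick the URI to validate from the request.
// In a real script, call it as target_uri($_GET, $_SERVER).
function target_uri(array $get, array $server): ?string
{
    if (!empty($get['uri'])) {
        return $get['uri']; // an explicit ?uri= parameter wins
    }
    if (!empty($server['HTTP_REFERER'])) {
        return $server['HTTP_REFERER']; // the page that embedded the image
    }
    return null; // nothing to validate
}
```

    The referer is supplied by the browser and can be missing or altered, so the script should still handle the null case, for instance by showing an error image.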
     
  20. nonsensep

    How do you accept the referer? I tried using $http_referer, but it didn't work and I don't know why.
     
