extracting from a string

Discussion in 'Scripts, 3rd Party Apps, and Programming' started by garrensilverwing, May 20, 2009.

  1. garrensilverwing

    garrensilverwing New Member

    Messages:
    148
    Likes Received:
    0
    Trophy Points:
    0
    To speed up the process of adding chess games to my website, I would like a piece of code that will search a large string for certain bits of information, so that instead of typing them into the database by hand every time I could just enter the string and have it do the job for me. Here is an example of what I want to do:

    Code:
    [Event "Wyoming Open"]
    [Site "Laramie College"]
    [Date "2006.05.06"]
    [Round "3"]
    [White "James Kulbacki"]
    [Black "Chris Peterson"]
    [Result "0-1"]
    [ECO "A00"]
    [WhiteElo "1880"]
    [BlackElo "1668"]
    [PlyCount "38"]
    [EventDate "2006.??.??"]
    [TimeControl "35/90:0"]
    
    1. b4 e5 2. Bb2 Bxb4 3. f4 Nc6 4. fxe5 f6 5. exf6 Nxf6 6. Nf3 O-O 7. g3 Ng4 
    8.c3 Bc5 9. d4 Qe7 10. Qd3 d5 11. dxc5 Bf5 12. Qxd5+ Kh8 13. Nbd2 Rad8 
    14. Qb3 Qe3 15. c4 Qf2+ 16. Kd1 Rxd2+ 17. Kxd2 Rd8+ 18. Kc1 Ne3 
    19. Nd2 Qe1+ 
    0-1
    This is a PGN-format chess game. I need to extract the first and last names (separately) for both White and Black, both their ratings, the ECO code, the result, and the year. The rest of the information can be discarded. I have no idea how to do this, and the tutorials online are not very helpful, so thanks in advance.
     
    Last edited: May 20, 2009
  2. garrettroyce

    garrettroyce Community Support Community Support

    Messages:
    5,601
    Likes Received:
    239
    Trophy Points:
    63
    Code:
    $strings = explode('[', $string); // every time a [ is encountered, a new string is added to the array $strings; element 0 is the (empty) text before the first [
    $white = substr(trim($strings[5]), 7, -2); // White is element 5; the name starts after the 7 chars of 'White "'; drop the last 2 chars "]
    $parts = explode(' ', $white, 2); // split white's full name at the first space
    $white_first = $parts[0]; // everything before the first space
    $white_last = isset($parts[1]) ? $parts[1] : ''; // everything after the first space (empty if there is none)
    $white_rating = substr(trim($strings[9]), 10, -2); // WhiteElo is element 9; the rating starts after the 10 chars of 'WhiteElo "'
    $result = substr(trim($strings[7]), 8, -2); // you get the picture by now :)
    $year = substr($strings[12], 11, 4); // EventDate is element 12; year is always 4 chars long
    I'll leave black up to you, I'm going to bed :p
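
    A variation on the approach above, as a minimal sketch: rather than counting elements and characters, look each tag up by name, which also covers Black without more offset arithmetic. `pgnTag` and `splitName` are hypothetical helper names, not part of PHP or of the code above, and the sketch assumes each tag is written exactly as `[Name "value"]`.

    ```php
    <?php
    // Hypothetical helper: return the value of one PGN tag, or null if absent.
    // Assumes the tag is written exactly as [Name "value"].
    function pgnTag($pgn, $name) {
        $open = '[' . $name . ' "';
        $start = strpos($pgn, $open);
        if ($start === false) {
            return null;
        }
        $start += strlen($open);
        $end = strpos($pgn, '"]', $start);
        if ($end === false) {
            return null;
        }
        return substr($pgn, $start, $end - $start);
    }

    // Hypothetical helper: split a full name at the first space.
    function splitName($full) {
        $parts = explode(' ', $full, 2);
        return array('first' => $parts[0], 'last' => isset($parts[1]) ? $parts[1] : '');
    }

    $pgn = "[Event \"Wyoming Open\"]\n[Date \"2006.05.06\"]\n"
         . "[White \"James Kulbacki\"]\n[Black \"Chris Peterson\"]\n"
         . "[Result \"0-1\"]\n[WhiteElo \"1880\"]\n[BlackElo \"1668\"]\n";

    $black = splitName(pgnTag($pgn, 'Black'));
    $blackRating = pgnTag($pgn, 'BlackElo');
    $year = substr(pgnTag($pgn, 'Date'), 0, 4);
    ```

    Because the search string includes the opening bracket and the space before the quote, asking for `White` does not accidentally pick up `WhiteElo`, and `Date` does not pick up `EventDate`.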
     
    Last edited: May 20, 2009
  3. misson

    misson Community Paragon Community Support

    Messages:
    2,572
    Likes Received:
    72
    Trophy Points:
    48
  4. garrensilverwing

    garrensilverwing New Member

    I was hoping to do it in PHP so I don't have to fiddle with any more types of code lol
     
  5. garrettroyce

    garrettroyce Community Support Community Support

    misson's answer is PHP code; it just uses a more complex system. Regular expressions are extremely powerful, and even though the rules are simple, they are difficult to master. If you go misson's route, it will definitely be a good tool for you to have in the future, but it will take a lot of patience and learning. My route is very simple, but it takes more code to do the same thing.
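
    For comparison, here is roughly what the regular-expression route looks like for a single tag. This is only a sketch: the helper name `tagValue` is made up for illustration, and it assumes tag values never contain a double quote.

    ```php
    <?php
    // Hypothetical helper: fetch one tag value with a regular expression.
    function tagValue($pgn, $name) {
        // \[ and \] match the literal brackets; ([^"]*) captures everything
        // up to the closing quote.
        $pattern = '/\[' . preg_quote($name, '/') . ' "([^"]*)"\]/';
        return preg_match($pattern, $pgn, $m) ? $m[1] : null;
    }

    $pgn = "[White \"James Kulbacki\"]\n[Black \"Chris Peterson\"]\n[WhiteElo \"1880\"]\n";

    $white = tagValue($pgn, 'White');     // no element counting or character offsets needed
    $rating = tagValue($pgn, 'WhiteElo');
    ```

    The pattern does not care where in the string the tag appears, so it keeps working if the tags arrive in a different order.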
     
  6. garrensilverwing

    garrensilverwing New Member

    Messages:
    148
    Likes Received:
    0
    Trophy Points:
    0
    Well, I'm not too worried about length of code right now, because sometimes you have to do things the hard way first. I'm doing everything else the hard way and then converting it once I get it working, which is probably an ass-backwards way of doing it, but I am learning so much.
     
  7. misson

    misson Community Paragon Community Support

    To illustrate the utility of regular expressions, here are some example functions for this problem. pgn2array will turn a string containing PGN formatted pairs into an associative array.
    PHP:
    function parseName($name) {
        preg_match('/^(?:(\S+) (?:(.*) )?)?(\S+)$/', $name, $name);
        return array_combine(array('full', 'first', 'middle', 'last'), $name);
    }

    function parseDate($date) {
        return array_combine(array('year', 'month', 'day'), explode('.', $date));
    }

    function pgn2array($pgn) {
        static $filters = array('white' => 'parseName', 'black' => 'parseName',
                                'date' => 'parseDate', 'eventdate' => 'parseDate');
        $arr = False;
        if (preg_match_all('/\[(\w+) "([^"]+)"\]/', $pgn, $matches, PREG_PATTERN_ORDER)) {
            $arr = array_combine(array_map('strtolower', $matches[1]), $matches[2]);
            foreach ($filters as $key => $filter) {
                $arr[$key] = call_user_func($filter, $arr[$key]);
            }
        }
        return $arr;
    }
    If the data is stored in a file, the above would require you to read the whole file before parsing it. To parse a file in place, try:
    PHP:
    function filterTagPair($key, $value) {
        switch ($key) {
        case 'white':
        case 'black':
            return parseName($value);
            break;
        case 'date':
        case 'eventdate':
            return parseDate($value);
            break;
        }
        return $value;
    }

    function pgnFile2array($file) {
        $arr = array();
        $fpos = ftell($file);
        while (!feof($file) && $line=fgets($file)) {
            if (preg_match('/\[(\w+) "([^"]+)"\]/', $line, $matches)) {
                $matches[1] = strtolower($matches[1]);
                $matches[2] = filterTagPair($matches[1], $matches[2]);
                $arr[$matches[1]] = $matches[2];
                $fpos = ftell($file);
            } else {
                fseek($file, $fpos);
                break;
            }
        }
        return count($arr) ? $arr : False;
    }
    Looping over an array of filters (as pgn2array does) would work just as well for pgnFile2array. I used a switch to illustrate another approach.

    None of the above deals with errors, so they could be fleshed out in this regard.
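
    As one way to flesh out the error handling, the sketch below collects the tags so that missing tags and malformed dates yield `null` rather than a warning. The names `safeParseDate` and `pgnTags` are hypothetical, and treating `??` placeholders as unknown is an assumption about what is wanted.

    ```php
    <?php
    // Hypothetical helper: parse "YYYY.MM.DD"; unknown parts ("??") become null.
    function safeParseDate($date) {
        $parts = explode('.', $date);
        if (count($parts) !== 3) {
            return null; // not a PGN-style date
        }
        $out = array();
        foreach (array('year', 'month', 'day') as $i => $key) {
            $out[$key] = ctype_digit($parts[$i]) ? (int) $parts[$i] : null;
        }
        return $out;
    }

    // Hypothetical helper: collect all tag pairs, lower-casing the keys.
    // Returns an empty array when the text holds no tags.
    function pgnTags($pgn) {
        $tags = array();
        if (preg_match_all('/\[(\w+) "([^"]*)"\]/', $pgn, $m, PREG_SET_ORDER)) {
            foreach ($m as $pair) {
                $tags[strtolower($pair[1])] = $pair[2];
            }
        }
        return $tags;
    }

    $tags = pgnTags("[Date \"2006.05.06\"]\n[EventDate \"2006.??.??\"]\n");
    $date = isset($tags['date']) ? safeParseDate($tags['date']) : null;
    $eventDate = isset($tags['eventdate']) ? safeParseDate($tags['eventdate']) : null;
    $round = isset($tags['round']) ? $tags['round'] : null; // tag absent: null, no notice
    ```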
     
    Last edited: May 21, 2009
  8. xav0989

    xav0989 Community Public Relation Community Support

    Messages:
    4,467
    Likes Received:
    95
    Trophy Points:
    0
    Well in this case, the hard way, regular expressions, is also the best way!
     
  9. misson

    misson Community Paragon Community Support

    As for the power of regular expressions, XKCD says it best:

    [Image: XKCD comic]
     
  10. xav0989

    xav0989 Community Public Relation Community Support

    Nice one there, misson!
     
  11. misson

    misson Community Paragon Community Support

    Thank Randall Munroe, not me.
     
  12. garrensilverwing

    garrensilverwing New Member

    OK, so if I have a file of, say, 500 PGNs and I want to write code to extract certain information from it, I can use regular expressions to do that. But how would I keep the games separate, rather than having it extract from all of them at the same time?
     
    Last edited: Jun 3, 2009
  13. fguy64

    fguy64 New Member

    Messages:
    218
    Likes Received:
    0
    Trophy Points:
    0
    Probably you have considered this, but...

    You say you want to extract first and last name separately. How confident are you that your source data is created in a consistent manner? Do you know for example that first name always comes before last name? With a space separator?

    I've done a lot of stuff with PGN files, but mostly using Java. I usually used string tokenizers to parse the data, and I don't know whether there is a similar construct in PHP.

    Just out of curiosity, what is your reason for separating first and last name?
     
  14. garrensilverwing

    garrensilverwing New Member

    Well, I want to have them separated for ease of searching, but it's not necessary. A majority of the games will have usernames from games played online, which means there won't be a first/last name. I get the games from my friend, and I told him to make sure they are in that particular format (the PGNs are generated by a program called Chessbase, which almost always writes them in that format). If the name is a username, I was just going to have the username be both the first and last name. It all really depends on how it works out.
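
    The username fallback described here could look something like the following sketch. `splitPlayerName` is a hypothetical name, and splitting at the last space (so that any middle names stay with the first name) is an assumption rather than something settled in this thread.

    ```php
    <?php
    // Hypothetical helper: split a player name into first and last parts.
    function splitPlayerName($name) {
        $name = trim($name);
        $pos = strrpos($name, ' ');
        if ($pos === false) {
            // Single word, e.g. an online handle: use it for both fields.
            return array('first' => $name, 'last' => $name);
        }
        return array(
            'first' => substr($name, 0, $pos),
            'last'  => substr($name, $pos + 1),
        );
    }

    $real = splitPlayerName('James Kulbacki');
    $handle = splitPlayerName('knightrider42'); // made-up username for illustration
    ```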
     
  15. fguy64

    fguy64 New Member

    I know Chessbase; it handles PGN nicely. I suppose having an accurate parsing of first and last will make for speedier searching, as long as you have quality parsing. But if you don't have accurate parsing, better to just leave it as is, and search on substrings.

    Anyway, if regular expressions make your head spin, try string tokenizers. They are easier to understand, and they will do the job. I did a little checking and it can be done in PHP; here is one link:

    http://cs.metrostate.edu/~fitzgesu/courses/ics325/summer04/Ch0405.htm
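
    In PHP the closest built-in to a string tokenizer is `strtok`. A minimal sketch of reading one tag line with it follows; `tokenizeTag` is a hypothetical helper name, and the sketch assumes each line holds one complete `[Name "value"]` pair with a non-empty value.

    ```php
    <?php
    // Hypothetical helper: return array(name, value) for a tag line, else null.
    function tokenizeTag($line) {
        $line = trim($line);
        if ($line === '' || $line[0] !== '[') {
            return null; // not a tag line (blank line or movetext)
        }
        // First token: the text between '[' and the first space is the tag name.
        $name = strtok($line, '[ ');
        // Second token: switch the delimiter to the double quote to get the value.
        $value = strtok('"');
        if ($name === false || $value === false) {
            return null;
        }
        return array($name, $value);
    }

    $pair = tokenizeTag('[White "James Kulbacki"]');
    $none = tokenizeTag('1. b4 e5 2. Bb2 Bxb4');
    ```

    `strtok` keeps its position between calls, which is why the second call passes only a new delimiter and not the string again.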
     
  16. garrensilverwing

    garrensilverwing New Member


    Well, I will want to learn them eventually, so I will just buckle down and figure them out. I just want to know about the file thing, or whether I will have to do each PGN manually.
     
  17. misson

    misson Community Paragon Community Support

    I think any approach will come down to separating the games first, then parsing each game separately. Trying to do it all in one go will just be a mess. You can split the games up, find the ones you want, and then parse those, or you can extract the games you want (using REs, maybe?) and parse those. Splitting the games up may make it easier to find games, depending on what search criteria you want to allow.
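
    A sketch of the "separate the games first" step, assuming every game begins with an `[Event ...]` tag at the start of a line (the usual export order, but an assumption here). `splitGames` is a hypothetical helper name, and the second game in the sample data is made up; each returned chunk can then be handed to a single-game parser.

    ```php
    <?php
    // Hypothetical helper: cut a multi-game PGN string into one string per game.
    function splitGames($pgnText) {
        // Split just before every line that starts with "[Event ".
        // The lookahead keeps the tag itself in the chunk that follows.
        $chunks = preg_split('/^(?=\[Event )/m', $pgnText, -1, PREG_SPLIT_NO_EMPTY);
        $games = array();
        foreach ($chunks as $chunk) {
            $chunk = trim($chunk);
            if ($chunk !== '') {
                $games[] = $chunk;
            }
        }
        return $games;
    }

    $file = "[Event \"Wyoming Open\"]\n[White \"James Kulbacki\"]\n\n1. b4 e5 0-1\n\n"
          . "[Event \"Club Match\"]\n[White \"Chris Peterson\"]\n\n1. e4 c5 1/2-1/2\n";
    $games = splitGames($file);
    ```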

    Give me a use case. How do you decide which games you are looking for? Is this a one-time thing? Are you going to search the large file just once for each game (ie multiple searches, but any particular game will only be found by one search)? Are you going to repeat searches (ie multiple searches, and a particular game may be picked up by multiple searches)?
     
  18. garrensilverwing

    garrensilverwing New Member

    I'll see if I can explain it without sounding crazy or stupid :D

    Right now, in order to display the games on my site, I use JavaScript code generated by Chessbase, which uses iframes to separate the list of games and the chessboard from the actual game itself (brianwallchess.x10hosting.com/games). My friend Brian Wall wanted me to add a search feature to the games, so I created a MySQL database and table, manually put in all the information for the games, and set up a search feature (which I was working on in my previous posts, if you remember). That was really tedious, so I want an easier way to put the information into the database, since he just sent me over 400 games to put on the website. Instead of manually putting in all the information, I want to write code that will do the majority of the work for me.

    So far this is what I've got, but it doesn't extract anything yet because I wanted to see about doing all the games at once first: http://www.brianwallchess.x10hosting.com/php/pgn.php
     
    Last edited: Jun 3, 2009
  19. fguy64

    fguy64 New Member

    Garren, not to cut in on what misson is doing, because he knows his stuff. But consider the following...

    Programs such as Chessbase and Chess Assistant are not just repositories for games. They are sophisticated database applications that have all kinds of search and query functionality that is specifically designed for this kind of data.

    If you are trying to write code to manipulate or extract data, do queries, selects, create subsets of games based on any criteria imaginable, then maybe before you go to a lot of work, consider that going back to these programs to massage data may save you a lot of work.

    I would not be surprised if you can use these applications to convert from PGN to a comma-delimited text file, which can be imported directly into MySQL.

    In any event, a PGN database is not that far away from a delimited file that can be imported directly into a database. So instead of parsing what is basically a flat text file, consider importing the data pretty much as is, with maybe a few modifications, into MySQL, and then use SQL commands to do the work you are currently trying to get PHP to do.
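
    One way to get from parsed tags to a comma-delimited file that MySQL can import is PHP's `fputcsv`. The sketch below is illustrative only: the column list and the helper name `gamesToCsv` are assumptions, and each game is taken to be an array of tag values keyed by lower-cased tag name.

    ```php
    <?php
    // Hypothetical helper: write one CSV row per game to an open file handle.
    function gamesToCsv(array $games, $handle) {
        $columns = array('white', 'black', 'whiteelo', 'blackelo', 'eco', 'result', 'date');
        fputcsv($handle, $columns, ',', '"', '\\'); // header row
        foreach ($games as $game) {
            $row = array();
            foreach ($columns as $col) {
                // Missing tags become empty fields rather than errors.
                $row[] = isset($game[$col]) ? $game[$col] : '';
            }
            fputcsv($handle, $row, ',', '"', '\\');
        }
    }

    // Write to an in-memory stream here; a real run would fopen() a file instead.
    $out = fopen('php://memory', 'w+');
    gamesToCsv(array(array(
        'white' => 'James Kulbacki', 'black' => 'Chris Peterson',
        'whiteelo' => '1880', 'blackelo' => '1668',
        'eco' => 'A00', 'result' => '0-1', 'date' => '2006.05.06',
    )), $out);
    rewind($out);
    $csv = stream_get_contents($out);
    ```

    A file written this way could then be loaded with MySQL's `LOAD DATA INFILE` statement or through an import screen in a database admin tool.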

    anyways, if parsing pgn directly is still the way to go, I will step back now and let you guys deal with it.
     
  20. garrensilverwing

    garrensilverwing New Member

    Well, I have an older version of Chessbase, and it doesn't look like it can output anything other than PGN or TXT, neither of which is helpful to me. Maybe if I could install Chessbase on the web server I could do something, but I doubt that is possible.
     
    Last edited: Jun 3, 2009
