extracting from a string

garrensilverwing

New Member
Messages
148
Reaction score
0
Points
0
to try and speed up the process of adding chess games to my website i would like a piece of code that will search a large string for certain bits of information, so instead of hand typing them into the database every time i could just enter the string and it will do the job for me. here is an example of what i want to do:

Code:
[Event "Wyoming Open"]
[Site "Laramie College"]
[Date "2006.05.06"]
[Round "3"]
[White "James Kulbacki"]
[Black "Chris Peterson"]
[Result "0-1"]
[ECO "A00"]
[WhiteElo "1880"]
[BlackElo "1668"]
[PlyCount "38"]
[EventDate "2006.??.??"]
[TimeControl "35/90:0"]

1. b4 e5 2. Bb2 Bxb4 3. f4 Nc6 4. fxe5 f6 5. exf6 Nxf6 6. Nf3 O-O 7. g3 Ng4 
8.c3 Bc5 9. d4 Qe7 10. Qd3 d5 11. dxc5 Bf5 12. Qxd5+ Kh8 13. Nbd2 Rad8 
14. Qb3 Qe3 15. c4 Qf2+ 16. Kd1 Rxd2+ 17. Kxd2 Rd8+ 18. Kc1 Ne3 
19. Nd2 Qe1+ 
0-1

this is a PGN format chess game. i need to extract the first and last names (separately) for both white and black, both their ratings, the ECO, the result and the year. the rest of the information can be discarded. I have no idea how to do this and tutorials online are not very helpful so thanks in advance :biggrin:
 

garrettroyce

Community Support
Community Support
Messages
5,609
Reaction score
250
Points
63
Code:
$strings = explode('[', $string); // split on "["; $strings[0] is whatever precedes the first "[" (normally empty)
$white = substr(trim($strings[5]), 7, -2); // White is element 5; skip the 7 chars of 'White "' and drop the trailing '"]'
$space = strpos($white, ' '); // position of the first space in white's full name
$white_first = substr($white, 0, $space); // everything before the first space
$white_last = substr($white, $space + 1); // everything after the first space
$white_rating = substr(trim($strings[9]), 10, -2); // WhiteElo is element 9; skip 'WhiteElo "' (10 chars), drop '"]'
$result = substr(trim($strings[7]), 8, -2); // Result is element 7; skip 'Result "' (8 chars), drop '"]'
$year = substr($strings[12], 11, 4); // EventDate is element 12; the year is the 4 chars after 'EventDate "' (11 chars)

I'll leave black up to you, I'm going to bed :p
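
For Black, the same explode approach might look like the sketch below. It assumes the tags come in exactly the order of the example game; because the string starts with "[", explode leaves an empty element 0, so Black ends up as element 6, ECO as element 8 and BlackElo as element 10. `$string` here is a shortened copy of the example, and the variable names are only for illustration. If the tag order ever differs, these indexes will be wrong.

```php
<?php
// Sample input: the first ten tags of the game from the first post.
$string = '[Event "Wyoming Open"]
[Site "Laramie College"]
[Date "2006.05.06"]
[Round "3"]
[White "James Kulbacki"]
[Black "Chris Peterson"]
[Result "0-1"]
[ECO "A00"]
[WhiteElo "1880"]
[BlackElo "1668"]';

$strings = explode('[', $string);   // element 0 is the empty text before the first "["

$black = substr(trim($strings[6]), 7, -2);   // skip 'Black "' (7 chars), drop the trailing '"]'
$space = strpos($black, ' ');                // the first space separates first and last name
$black_first = substr($black, 0, $space);
$black_last = substr($black, $space + 1);
$black_rating = substr(trim($strings[10]), 10, -2);  // skip 'BlackElo "' (10 chars)
$eco = substr(trim($strings[8]), 5, -2);     // skip 'ECO "' (5 chars)

echo "$black_first | $black_last | $black_rating | $eco\n";
```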
 

misson

Community Paragon
Community Support
Messages
2,572
Reaction score
72
Points
48
Regular expressions are the easiest way to do it, though REs can be hard to learn. Studying the relationship between regular languages and finite state machines will give you the best understanding. You could also try the MDC RE guide or the Perl RE tutorial.

For example, /\[(White|Black) ".* ([A-Za-z']+)"\]/ will match and extract the player's color and last name. What language do you want to use to translate the data?
 

garrensilverwing

i was hoping to do it in php so i dont have to fiddle with any more types of code lol
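
In PHP, misson's pattern above can be tried with preg_match_all. This is only a minimal sketch: `$pgn` is a shortened sample and `$lastNames` is a name made up for the illustration.

```php
<?php
$pgn = '[White "James Kulbacki"]
[Black "Chris Peterson"]';

// Group 1 captures the color, group 2 the last word of the name.
preg_match_all('/\[(White|Black) ".* ([A-Za-z\']+)"\]/', $pgn, $matches, PREG_SET_ORDER);

$lastNames = array();
foreach ($matches as $m) {
    $lastNames[$m[1]] = $m[2];   // e.g. $lastNames['White']
}
print_r($lastNames);
```

Note that the pattern requires a space inside the quotes, so a one-word name such as an online username would not match.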
 

garrettroyce

misson's answer is PHP code; it just uses a more complex system. Regular expressions are extremely powerful, and even though the rules are simple, they are difficult to master. If you go misson's route, it will definitely be a good tool for you to use in the future, but it will take a lot of patience and learning. My route is very simple, but it takes more code to do the same thing.
 

garrensilverwing

well im not too worried about length of code right now because sometimes you have to do things the hard way first and im doing everything else the hard way and then converting it when i get it working which is probably an ass backwards way of doing it but i am learning so much
 

misson

To illustrate the utility of regular expressions, here are some example functions for this problem. pgn2array will turn a string containing PGN formatted pairs into an associative array.
PHP:
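// Split a player name into full, first, middle and last parts.
function parseName($name) {
    if (!preg_match('/^(?:(\S+) (?:(.*) )?)?(\S+)$/', $name, $parts)) {
        return False; // no match, e.g. an empty name or trailing whitespace
    }
    return array_combine(array('full', 'first', 'middle', 'last'), $parts);
}

// Split a PGN date such as "2006.05.06" into year, month and day.
function parseDate($date) {
    return array_combine(array('year', 'month', 'day'), explode('.', $date));
}

// Turn the tag pairs of one game into an array keyed by lowercased tag name.
function pgn2array($pgn) {
    static $filters = array('white' => 'parseName', 'black' => 'parseName',
                            'date' => 'parseDate', 'eventdate' => 'parseDate');
    $arr = False;
    if (preg_match_all('/\[(\w+) "([^"]+)"\]/', $pgn, $matches, PREG_PATTERN_ORDER)) {
        $arr = array_combine(array_map('strtolower', $matches[1]), $matches[2]);
        foreach ($filters as $key => $filter) {
            if (isset($arr[$key])) { // skip tags this game does not have
                $arr[$key] = call_user_func($filter, $arr[$key]);
            }
        }
    }
    return $arr;
}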

If the data is stored in a file, the above would require you to read the whole file before parsing it. To parse a file in place, try:
PHP:
function filterTagPair($key, $value) {
    switch($key) {
    case 'white':
    case 'black':
        return parseName($value);
    case 'date':
    case 'eventdate':
        return parseDate($value);
    }
    return $value; // all other tags pass through unchanged
}

function pgnFile2array($file) {
    $arr = array();
    $fpos = ftell($file);
    while (!feof($file) && $line=fgets($file)) {
        if (preg_match('/\[(\w+) "([^"]+)"\]/', $line, $matches)) {
            $matches[1] = strtolower($matches[1]);
            $matches[2] = filterTagPair($matches[1], $matches[2]);
            $arr[$matches[1]] = $matches[2];
            $fpos = ftell($file);
        } else {
            fseek($file, $fpos);
            break;
        }
    }
    return count($arr) ? $arr : False;
}
Looping over an array of filters (as pgn2array does) would work just as well for pgnFile2array. I used a switch to illustrate another approach.

None of the above deals with errors, so they could be fleshed out in this regard.
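
As one possible starting point for the error handling, a game could be checked for the tags you need before anything goes into the database. The sketch below is an assumption about what such a check might look like, not part of the functions above; `readTags` and `missingTags` are hypothetical helpers.

```php
<?php
// Hypothetical helper: collect tag pairs into an array keyed by lowercased tag name.
function readTags($pgn) {
    $tags = array();
    if (preg_match_all('/\[(\w+) "([^"]*)"\]/', $pgn, $m, PREG_SET_ORDER)) {
        foreach ($m as $pair) {
            $tags[strtolower($pair[1])] = $pair[2];
        }
    }
    return $tags;
}

// Hypothetical helper: list the required tags that are absent from $tags.
function missingTags(array $tags, array $required) {
    return array_values(array_diff($required, array_keys($tags)));
}

$tags = readTags('[White "James Kulbacki"]' . "\n" . '[Result "0-1"]');
$missing = missingTags($tags, array('white', 'black', 'result', 'eco'));
if ($missing) {
    echo 'Missing tags: ' . implode(', ', $missing) . "\n";
}
```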
 

xav0989

Community Public Relation
Community Support
Messages
4,467
Reaction score
95
Points
0
Well in this case, the hard way, regular expressions, is also the best way!
 

misson

As for the power of regular expressions, XKCD says it best:

 

xav0989

Nice one there, misson!
 

garrensilverwing

ok so if i have a file of say 500 pgn's and i want to write a code to extract certain information from it i can use the regular expressions to do that but how would i keep it from extracting it from all of them at the same time rather than keeping them separate?
 

fguy64

New Member
Messages
218
Reaction score
0
Points
0
Probably you have considered this, but...

You say you want to extract first and last name separately. How confident are you that your source data is created in a consistent manner? Do you know for example that first name always comes before last name? With a space separator?

I've done a lot of stuff with pgn files, but mostly using java. Usually I used string tokenizers to parse data, so I don't know if there is a similar construct in php.

Just out of curiosity, what is your reason for separating first and last name?
 

garrensilverwing


well i want to have them separated for ease of searching but its not necessary. a majority of the games will have usernames from games played online, which means there wont be a first/last name. i get the games from my friend and i told him to make sure they are in that particular format (the pgns are generated by a program called chessbase which almost always does them in that format). if the name is a username i was just going to have the username be both the first and last name. it all really depends how it works out
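
The username fallback described here could be sketched roughly as follows. `splitPlayerName` is a hypothetical helper, and treating everything after the last space as the last name is an assumption; a name like "James T Kulbacki" would come out as first "James T", last "Kulbacki".

```php
<?php
// Hypothetical helper: split a player name, falling back to the
// username for both parts when there is no space to split on.
function splitPlayerName($name) {
    $name = trim($name);
    $pos = strrpos($name, ' ');   // last space; everything after it is the last name
    if ($pos === false) {
        return array('first' => $name, 'last' => $name);
    }
    return array(
        'first' => substr($name, 0, $pos),
        'last'  => substr($name, $pos + 1),
    );
}

print_r(splitPlayerName('James Kulbacki'));
print_r(splitPlayerName('someusername'));
```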
 

fguy64


I know chessbase, it handles pgn nicely. I suppose having an accurate parsing of first and last will make for speedier searching, as long as you have quality parsing. But if you don't have accurate parsing, better to just leave it as is, and search on substrings.

anyways, if regular expressions make your head spin, try string tokenizers, they are easier to understand, and they will do the job. I did a little checking and it can be done in php, here is one link.

http://cs.metrostate.edu/~fitzgesu/courses/ics325/summer04/Ch0405.htm
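
PHP's built-in tokenizer is strtok. A rough sketch of pulling one tag line apart with it, assuming the line has the same shape as the tags in the first post (the variable names are only for illustration):

```php
<?php
$line = '[White "James Kulbacki"]';

// First token: the tag name, delimited by "[" and the space.
$tag = strtok($line, '[ ');
// Second token: keep reading the same string, now delimited by the quotes.
$value = strtok('"');

// Tokenize the value itself to split first and last name on the space.
$first = strtok($value, ' ');
$last = strtok(' ');

echo "$tag: first=$first last=$last\n";
```

Each call that passes a new string restarts the tokenizer, which is why the tag line has to be finished before the name is split.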
 

garrensilverwing



well i will want to learn them eventually so i will just buckle down and figure them out i just want to know about the file thing or if i will have to do each pgn manually
 

misson

I think any approach will come down to separating the games first, then parsing each game separately. Trying to do it all in one go will just be a mess. You can split the games up, find the ones you want, then parse those, or you can extract the games you want (using REs, maybe?) and parse those. Splitting the games up may make it easier to find games, depending on what search criteria you want to allow.

Give me a use case. How do you decide which games you are looking for? Is this a one-time thing? Are you going to search the large file just once for each game (i.e. multiple searches, but any particular game will only be found by one search)? Are you going to repeat searches (i.e. multiple searches, and a particular game may be picked up by multiple searches)?
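
Separating the games first could be sketched like this. It assumes every game begins with an [Event "..."] tag at the start of a line, which holds for the example game but is not guaranteed for every PGN source. `splitPgnGames` is a hypothetical helper, and the second game in the sample text is made up for the illustration.

```php
<?php
// Hypothetical helper: split the text of a multi-game PGN file into one string per game.
function splitPgnGames($text) {
    // Split at the (empty) position in front of each line that starts with [Event.
    $games = preg_split('/^(?=\[Event\s)/m', $text, -1, PREG_SPLIT_NO_EMPTY);
    return array_map('trim', $games);
}

$text = '[Event "Wyoming Open"]
[White "James Kulbacki"]
[Black "Chris Peterson"]
[Result "0-1"]

1. b4 e5 2. Bb2 Bxb4 0-1

[Event "Made-up Event"]
[White "Player One"]
[Black "Player Two"]
[Result "1-0"]

1. e4 e5 1-0';

// Each game can now be parsed on its own.
foreach (splitPgnGames($text) as $i => $game) {
    preg_match('/\[White "([^"]*)"\]/', $game, $m);
    echo ($i + 1) . ': ' . $m[1] . "\n";
}
```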
 

garrensilverwing


i'll see if i can explain it without sounding crazy or stupid :D

right now, in order to display the games on my site, i use javascript code generated by chessbase which uses iframes to separate the list of games and the chessboard from the actual game itself (brianwallchess.x10hosting.com/games). my friend brian wall wanted me to add a search feature to the games, so i created a mysql database and table, manually put in all the information for the games and set up a search feature (which i was working on in my previous posts if you remember). that was really tedious, so i want an easier way to put the information into the database since he just sent me over 400 games to put on the website. instead of manually putting in all the information, i want to write code that will do a majority of the work for me.

so far this is what i got but it doesnt extract anything yet cause i wanted to see about doing all the games at once first http://www.brianwallchess.x10hosting.com/php/pgn.php
 

fguy64

Garren, not to cut in on what misson is doing, cause he knows his stuff. But consider the following...

Programs such as Chessbase and Chess Assistant are not just repositories for games. They are sophisticated database applications that have all kinds of search and query functionality that is specifically designed for this kind of data.

If you are trying to write code to manipulate or extract data, do queries, selects, create subsets of games based on any criteria imaginable, then maybe before you go to a lot of work, consider that going back to these programs to massage data may save you a lot of work.

I would not be surprised if you can use these applications to convert from pgn to a comma delimited text file, which can be imported directly into MySQL.

In any event, a pgn database is not that far away from a delimited file that can be imported directly into a database. So instead of parsing what is basically a flat text file, consider importing the data pretty much as is, with maybe a few modifications, into MySQL, and then use SQL commands to do the work you are currently trying to get php to do.

anyways, if parsing pgn directly is still the way to go, I will step back now and let you guys deal with it.
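
If the chess program cannot export a delimited file itself, a few lines of PHP could produce one from the PGN tags. The sketch below is only an illustration of that idea; `pgnTagsToRow` and `csvLine` are hypothetical helpers, and the column list is an assumption about which tags are wanted. PHP's own fputcsv could write the rows to a file instead, and the result should be loadable with MySQL's LOAD DATA INFILE.

```php
<?php
// Hypothetical helper: pick the wanted tags out of one game, in column order.
function pgnTagsToRow($game, array $columns) {
    preg_match_all('/\[(\w+) "([^"]*)"\]/', $game, $m);
    $tags = array_combine($m[1], $m[2]);
    $row = array();
    foreach ($columns as $column) {
        $row[] = isset($tags[$column]) ? $tags[$column] : '';  // empty field if the tag is absent
    }
    return $row;
}

// Hypothetical helper: quote every field and double any embedded quotes.
function csvLine(array $fields) {
    $quoted = array();
    foreach ($fields as $field) {
        $quoted[] = '"' . str_replace('"', '""', $field) . '"';
    }
    return implode(',', $quoted);
}

$columns = array('White', 'Black', 'WhiteElo', 'BlackElo', 'ECO', 'Result', 'Date');
$game = '[Date "2006.05.06"]
[White "James Kulbacki"]
[Black "Chris Peterson"]
[Result "0-1"]
[ECO "A00"]
[WhiteElo "1880"]
[BlackElo "1668"]';

echo csvLine($columns) . "\n";
echo csvLine(pgnTagsToRow($game, $columns)) . "\n";
```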
 

garrensilverwing


well i have an older version of chessbase and it doesnt look like it has the ability to output anything other than PGN or TXT neither of which are helpful to me, maybe if i could install chessbase onto the webserver then i could do something but i doubt that is possible
 