preg_match help

Mitch

New Member
Messages
908
Reaction score
0
Points
0
I've got this at the moment:
PHP:
preg_match( "[a-zA-Z0-9]", $msg )
I want it to also check for whitespace and the characters !@?.:$=&
I'm saving the data in MySQL, and it gets loaded into a PHP image.
 

misson

Community Paragon
Community Support
Messages
2,572
Reaction score
72
Points
48
Patterns must begin and end with delimiters. You can use basically any non-alphanumeric, non-whitespace, non-backslash character you want (or matching braces); '/' is the most common choice. If you want to match multiple characters (or subpatterns), place a quantifier after the subpattern: '*' for 0 or more, '+' for 1 or more, or '{n,m}' for n through m repeats. '^' at the start of the pattern anchors it to the start of the string (or of a line, in multiline mode); '$' similarly anchors the end. '\s' matches whitespace, even within character classes. For the other characters, just include them in the character class. Character classes have a different set of metacharacters: '\', '[', ']', '^' and '-'. Characters other than those (e.g. '.') aren't special within character classes.
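
To make those pieces concrete, here is a small sketch. The sample strings are made up, and the character set is only a guess at what you're after.

```php
<?php
// '/' delimiters, a character class containing \s and some punctuation,
// '+' for "one or more", and ^...$ to anchor the match to the whole string.
$pattern = '/^[a-zA-Z0-9\s!@?.:$=&]+$/';

var_dump(preg_match($pattern, 'Hello world!'));  // int(1): every character is in the class
var_dump(preg_match($pattern, 'Hello <world>')); // int(0): '<' and '>' are not in the class
var_dump(preg_match($pattern, ''));              // int(0): '+' requires at least one character
```

Note that '$' and '.' sit inside the brackets here, so they are ordinary characters rather than metacharacters.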

If you're doing this to prevent SQL injection, you should instead use prepared statements, which are immune to it.
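
For reference, a prepared statement with PDO looks roughly like this. It's a sketch, not your actual code: the "messages" table and "body" column are invented, and it uses an in-memory SQLite database (which needs the pdo_sqlite extension) only so that it runs on its own. For MySQL you'd pass a mysql: DSN plus your credentials to the PDO constructor instead.

```php
<?php
// Hypothetical schema: a "messages" table with a single "body" column.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE messages (body TEXT)');

// Input that would break a query built by string concatenation.
$msg = "it's here'); DROP TABLE messages; --";

// The placeholder keeps the data separate from the SQL text.
$insert = $db->prepare('INSERT INTO messages (body) VALUES (:body)');
$insert->execute([':body' => $msg]);

$stored = $db->query('SELECT body FROM messages')->fetchColumn();
var_dump($stored === $msg); // bool(true): the text was stored verbatim
```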

That's all I can tell you without a more precise description of the problem, including the behavior you expect and the behavior you get.
 

Mitch

So something like this:

PHP:
preg_match( "/[a-zA-Z0-9]/!/@/$/_/=/-/+/\s/", $msg )
Updated it:
PHP:
preg_match( "/[a-zA-Z0-9]!@$_=-+\s/", $msg )

I found this on your first link. It is useful and clear.
Source: http://www.pcre.org/pcre.txt
  \      general escape character with several uses
  ^      assert start of string (or line, in multiline mode)
  $      assert end of string (or line, in multiline mode)
  .      match any character except newline (by default)
  [      start character class definition
  |      start of alternative branch
  (      start subpattern
  )      end subpattern
  ?      extends the meaning of (
         also 0 or 1 quantifier
         also quantifier minimizer
  *      0 or more quantifier
  +      1 or more quantifier
         also "possessive quantifier"
  {      start min/max quantifier

  \d     any decimal digit
  \D     any character that is not a decimal digit
  \h     any horizontal whitespace character
  \H     any character that is not a horizontal whitespace character
  \s     any whitespace character
  \S     any character that is not a whitespace character
  \v     any vertical whitespace character
  \V     any character that is not a vertical whitespace character
  \w     any "word" character
  \W     any "non-word" character

I am now using this:
PHP:
preg_match( "/[a-zA-Z0-9]!@$_=-+\s\h\v\H\V/", $msg )
It still doesn't accept spaces. Gonna try \w now (any "word" character) (nope, it accepts too much).

The thing is, I want it to accept only characters that can be used in a PHP image string message.
Edit: I think \s doesn't work in the PHP version that x10 has.
The \v escape that was in the Perl documentation for a long time was never in fact recognized. However, the character itself was treated as whitespace at least up to 5.002. In 5.004 and 5.005 it does not match \s.
 

misson

Read up on PHP strings, specifically the difference between single and double quotes. The backslash is special in strings: it starts what's called an escape sequence, which is basically a way of specifying characters you can't otherwise type into the string (such as double quotes within a double quoted string). If you want to be certain of a literal backslash in a double quoted string, repeat the backslash. In single quoted strings, you only need to double a backslash if it precedes another backslash or a single quote, or if it's the last character of the string (single quotes themselves must always be escaped).
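
A quick way to see the difference, as a sketch you can run yourself:

```php
<?php
// In double quotes "\n" is an escape sequence for one newline character;
// in single quotes '\n' is two characters, a backslash and an "n".
var_dump(strlen("\n")); // int(1)
var_dump(strlen('\n')); // int(2)

// Doubling the backslash yields a single literal backslash in both kinds of string.
var_dump('\\s' === "\\s"); // bool(true)
var_dump(strlen('\\s'));   // int(2)

// In single quotes a lone backslash before an ordinary character is kept as-is.
var_dump('\s' === '\\s');  // bool(true)
```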

As for the regular expressions, I can tell you all of them are most likely wrong, but can't tell you what's right because you still haven't described exactly what your goal is. What are you trying to do, verify that user input doesn't contain illegal characters? As far as the pattern is concerned, do you want to match a sequence of characters, or only one? If a sequence, do you want the elements to be homogeneous (all of the same type) or heterogeneous (first characters from some class, then others from a different class, etc.)?

"\w" matches "word" characters; it's equivalent to the character class [_a-zA-Z0-9], though it's not a literal equivalent, since you can use it within a character class. "\s" works fine; it's equivalent to [ \t\n\r\v\f].
 

Mitch

I want it so that when you enter an invalid character, your message isn't saved:

PHP:
if(!preg_match( "/[a-zA-Z0-9]!@$_=-+\s/", $msg ))
{
	echo "invalid characters";
}
 

misson

To better understand regular expressions, I recommend learning the connection between REs and non-deterministic finite state machines (aka nondeterministic finite automata, or NFA). The subject is illuminating, though it should be noted that Perl-compatible regular expressions (PCREs, which is what the preg_* functions use) are more expressive and powerful than classic REs.

I want it so that when you enter an invalid character, your message isn't saved:

In that case, you don't want to negate the result of preg_match, to start with. What will happen is that preg_match will do its best to find a match. As long as there's a single valid character, the match will succeed, causing the "invalid" test to fail. The only way your test would succeed is if the input contains no valid characters at all.
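
Here's a small sketch of that failure mode, using a cut-down class (letters and digits only) and made-up inputs. The function name looksInvalid is just for illustration.

```php
<?php
// Negating a "find one valid character" test only flags input that has
// no valid characters at all.
function looksInvalid($msg) {
    return !preg_match('/[a-zA-Z0-9]/', $msg);
}

var_dump(looksInvalid('abc'));   // bool(false)
var_dump(looksInvalid('abc<>')); // bool(false): '<>' slips through because 'a' matched
var_dump(looksInvalid('<>'));    // bool(true): flagged only because nothing valid is present
```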

Note: you've changed what punctuation is valid. In your first post, you specified [!@?.:$=&]. In your last, you're looking for [!@$_=-+]. I'm going to assume you want the union of these classes, plus a few others.

Fortunately, all you need is a single character class. You can either have a character class for valid characters, and have that class match the entire string, or have a character class for invalid characters and have that match at least once. Defining the valid characters as a class is easy, and negating a character class is easier: just place a caret ("^") immediately after the opening square bracket. That is, [^0-9] is the negation of [0-9]. The preceding classes are equivalent to "\D" and "\d", respectively. Similarly, "\w" and "[_a-zA-Z0-9]" are the opposite of "\W" and "[^_a-zA-Z0-9]".
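
As a quick illustration of negated classes (the sample strings are arbitrary):

```php
<?php
// [^0-9] and \D both match any single character that is not a digit.
var_dump(preg_match('/[^0-9]/', '12345')); // int(0): only digits, nothing to match
var_dump(preg_match('/[^0-9]/', '123a5')); // int(1): 'a' is not a digit
var_dump(preg_match('/\D/', '123a5'));     // int(1): same result with the shorthand

// A caret anywhere other than first in the class is a literal caret.
var_dump(preg_match('/[0-9^]/', '^'));     // int(1)
```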

Note that if you put two opposite escape sequences (i.e. a backslash shorthand and its negation) in a character class, the result will match every character, because every character matches one or the other. That is, classes including "\w\W", "\s\S", "\d\D" etc. match any character at all, like "." with the "s" (dot-all) modifier; without that modifier, "." matches every character except a newline.
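
A short sketch of how such a class compares with the dot, using a lone newline as the subject:

```php
<?php
$newline = "\n";

var_dump(preg_match('/[\s\S]/', $newline)); // int(1): the class matches any character
var_dump(preg_match('/./', $newline));      // int(0): by default "." skips newlines
var_dump(preg_match('/./s', $newline));     // int(1): the "s" modifier lets "." match newlines too
```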

What you want, if you haven't guessed, is:
PHP:
if (preg_match('/[^-\s\w.!?,:;@$=&]/', $msg)) {
    // message contains invalid characters; print error message and process no further.
    // ...
} else {
    // message contains only valid characters; process it.
}
Until you're familiar with backslash escaping in strings, you should probably escape the backslashes in the above RE as a reminder, even though it's not necessary. If you used double quotes, the pattern would need to be "/[^-\\s\\w.!?,:;@\$=&]/".

Earlier I mentioned you could check that all the characters are valid (rather than that at least one is invalid); the pattern for this is /^[-\s\w.!?,:;@$=&]*$/ (if you also want to check that the message has at least one character, change the "*" to a "+"). However, looking for an invalid character is the simpler pattern.
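
To show the two approaches side by side, here's a sketch that uses the same character set in both patterns and a few made-up messages:

```php
<?php
$invalidChar = '/[^-\s\w.!?,:;@$=&]/';   // matches if any one character is outside the set
$allValid    = '/^[-\s\w.!?,:;@$=&]*$/'; // matches if every character is inside the set

foreach (['Hello, world!', 'price: $5 & up', '<b>hi</b>', ''] as $msg) {
    $viaInvalid = !preg_match($invalidChar, $msg);
    $viaValid   = (bool) preg_match($allValid, $msg);
    var_dump($viaInvalid === $viaValid); // bool(true) each time: the two tests agree
}
```

Only '<b>hi</b>' is rejected; the empty string passes both tests because of the "*".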

Note all the valid characters are inside the square brackets because they are all supposed to belong to the same character class; if you place them outside the brackets, you are asking them to match subsequent characters. Also, some of the characters (?, ., + and $, in your case) have special meanings outside of square brackets; in particular, an unescaped '$' asserts the end of the string, so it has to be escaped to match a literal dollar sign. For example, '/[a-zA-Z0-9]!@?.\$_=-+\s/' will match:
  • 'a!^$_=- '
  • 'Q!@($_=- '
  • '9!@.$_=----- '
But not:
  • 'a!^$_= ' (no "-")
  • 'Q!@@($_=- ' (too many "@")
  • '9!@?.$_=----- ' (has a "?")
  • 'A!@?{$_=-+ ' (has a "?" and "+")
Furthermore, if you use double quotes and don't escape the "$", then "$_" will be replaced by the value of the $_ variable, which is most likely empty.
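
If you want to check those examples yourself, a sketch along these lines should do. It writes the pattern in single quotes with the "$" escaped, since an unescaped "$" outside a class would be read as an end-of-string anchor.

```php
<?php
// The example pattern, with "\$" standing for a literal dollar sign.
$pattern = '/[a-zA-Z0-9]!@?.\$_=-+\s/';

// Subject => expected preg_match result.
$samples = [
    'a!^$_=- '       => 1,
    'Q!@($_=- '      => 1,
    '9!@.$_=----- '  => 1,
    'a!^$_= '        => 0, // no "-"
    'Q!@@($_=- '     => 0, // too many "@"
    '9!@?.$_=----- ' => 0, // has a "?"
    'A!@?{$_=-+ '    => 0, // has a "?" and "+"
];
foreach ($samples as $subject => $expected) {
    var_dump(preg_match($pattern, $subject) === $expected); // bool(true) for every sample
}
```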
 

Mitch

Thanks, it works fine now :)

http://mitch.exofire.net/sig/

I don't know whether it now blocks all characters that PHP GD doesn't support, because a friend of mine used a character that wasn't &#***;
 

misson

HTML entities aren't unescaped by the GD functions. If you want to unescape them, you'll first need to call html_entity_decode.
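
A minimal sketch of the decoding step, assuming the stored message is UTF-8 (the sample text is made up):

```php
<?php
// The GD text functions draw "&#8364;" as seven literal characters.
// Decoding first turns each entity into the character it stands for.
$raw = 'Price: 5 &#8364; &amp; up';
$decoded = html_entity_decode($raw, ENT_QUOTES, 'UTF-8');

var_dump($decoded === "Price: 5 \xE2\x82\xAC & up"); // bool(true): UTF-8 euro sign and a plain "&"
```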

It sounds like you actually want a pattern to match all characters that are guaranteed to have glyphs in every font. In other words, you want the printable ASCII characters. The pattern to check for invalid characters should thus be: /[^\x20-\x7E]/ (\x7F is the DEL control character, so the printable range ends at \x7E). Alternatively, load a font with Unicode support and test for non-graphic characters (try the pattern /\p{C}/u; the "u" modifier makes PCRE treat the subject as UTF-8). I highly recommend the latter approach.
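
Roughly how the two checks behave, as a sketch with made-up inputs. The ASCII range here ends at \x7E (the tilde), and the Unicode check assumes the input is valid UTF-8.

```php
<?php
$nonPrintableAscii = '/[^\x20-\x7E]/'; // anything outside space..tilde
$nonGraphic        = '/\p{C}/u';       // Unicode "Other" category; "u" means UTF-8 mode

$plain    = 'Hello, world!';
$accented = "caf\xC3\xA9"; // "café" encoded as UTF-8
$control  = "beep\x07";    // contains the BEL control character

var_dump(preg_match($nonPrintableAscii, $plain));    // int(0)
var_dump(preg_match($nonPrintableAscii, $accented)); // int(1): the bytes of "é" are outside ASCII
var_dump(preg_match($nonGraphic, $accented));        // int(0): "é" is a letter, not a control character
var_dump(preg_match($nonGraphic, $control));         // int(1)
```

Keep in mind that newlines and tabs are control characters too, so both patterns flag them.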

If you want to see which characters a font supports, try:
PHP:
<?php
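// Draw a 16x16 grid of the characters 0x00-0xFF in one of GD's built-in fonts (1-5).
$font = isset($_REQUEST['font']) ? (int) $_REQUEST['font'] : 5;
$lineHeight = imagefontheight($font) + 5;
$charWidth = imagefontwidth($font) + 2;
$padding = 5;
// 4 columns for the row labels plus 16 character columns;
// 1 header row plus 16 character rows.
$imgWidth = $charWidth * 20 + 2 * $padding;
$imgHeight = $lineHeight * 17 + 2 * $padding;
$img = imagecreate($imgWidth, $imgHeight);

// With imagecreate(), the first color allocated becomes the background.
$bg = imagecolorallocate($img, 255, 255, 255);
$fg = imagecolorallocate($img, 0, 0, 0);

header('Content-Type: image/png');
imagerectangle($img, 0, 0, $imgWidth - 1, $imgHeight - 1, $fg);

// Column headers: the low hex digit of each character code.
for ($i = 0; $i < 16; ++$i) {
    imagechar($img, $font, ($i + 4) * $charWidth, $padding, dechex($i), $fg);
}
// One row per high hex digit: the row label, then the 16 characters in that row.
for ($i = 0; $i < 16; ++$i) {
    imagestring($img, $font, $padding, ($i + 1) * $lineHeight + $padding, dechex($i * 16), $fg);
    for ($j = 0; $j < 16; ++$j) {
        imagechar($img, $font, ($j + 4) * $charWidth, ($i + 1) * $lineHeight + $padding, chr($i * 16 + $j), $fg);
    }
}
imagepng($img);
imagedestroy($img);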
?>
 

misson

Don't neglect to look for fonts with Unicode support. System fonts are installed in "/usr/share/fonts" and "/usr/share/X11/fonts/", but I don't know if any have Unicode support. Search around for open fonts that support Unicode (or even just open fonts). For TTF fonts, you'll need to use imagettftext or, assuming the FreeType 2 library is installed, imagefttext to draw the text (imageloadfont only works with bitmap fonts).
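
Drawing with a TrueType font might look something like the sketch below. The function name renderMessage, the image size and the font path are all placeholders; point the path at whatever font file you end up using.

```php
<?php
// Render $msg with a TrueType font and return the PNG bytes, or null if
// imagettftext() or the font file is unavailable.
function renderMessage($msg, $fontFile) {
    if (!function_exists('imagettftext') || !is_readable($fontFile)) {
        return null;
    }
    $img = imagecreatetruecolor(400, 40);
    $bg = imagecolorallocate($img, 255, 255, 255);
    $fg = imagecolorallocate($img, 0, 0, 0);
    imagefilledrectangle($img, 0, 0, 399, 39, $bg);

    // Arguments: image, size, angle, x and y of the baseline, color, font file, text.
    imagettftext($img, 14, 0, 5, 28, $fg, $fontFile, $msg);

    // Capture the PNG output in a string instead of sending it to the browser.
    ob_start();
    imagepng($img);
    return ob_get_clean();
}

// Hypothetical font path; the text is expected to be UTF-8.
$png = renderMessage('Grüße, мир', '/path/to/SomeUnicodeFont.ttf');
var_dump($png === null);
```

When the result isn't null, send a Content-Type: image/png header and echo the bytes.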
 