Inserting Chinese text into MySQL

diabolo

PHP:
<?php
define ('ROOT', '../');
include ROOT.'config.php';

if(isset($_POST['submit'])) {
   $chapters = $_POST['chapters'];
   $class    = $_POST['class'];
   
   $chapters = str_replace('課', '課 <br />', $chapters);
   $chaptersRefined = str_replace('課', '', $chapters);
   $chaptersRefined = str_replace('第', '', $chaptersRefined);
   
   $chaptersArray        = explode('<br />', $chapters);
   //echo '<br />';
   //echo $chaptersArray[0];
   //echo '<br />';
   $chaptersRefinedArray = explode('<br />', $chaptersRefined);
   //echo '<br />';
   //echo $chaptersRefinedArray[0];
   
   $i = 0;
   $values = '';
   // The last element is whatever follows the final <br />, so the loop skips it.
   $max = (count($chaptersArray))-1;
   while($i < $max) {
      // Escape every value so quotes in the input can't break the query.
      $row = "('".mysql_real_escape_string($class)."', '".mysql_real_escape_string($chaptersRefinedArray[$i])."', '".mysql_real_escape_string($chaptersArray[$i])."')";
      // Rows are separated by commas; the last row gets none.
      $values .= ($i == $max-1) ? $row : $row.',';
      $i++;
   }
   //echo $values;
   
   $sql = 'INSERT INTO '.dbPre.'textbook (class, sTitle, fTitle) VALUES '.$values;
   //echo '<br />';
   //echo $sql;
   mysql_query($sql) or die(mysql_error());
   echo 'Entered: <br /><br />';
      
   echo '<br />';
   echo '<br />';
   echo $class;
   echo '<br />';
   echo '<br />';
   echo $chapters;
   echo '<br />';
   echo $chaptersRefined;
} else {
   echo '<form name="book" method="post" action="#"><input type="text" name="class" /><br /><textarea cols="50" rows="4" name="chapters"></textarea><br /><input type="submit" name="submit" value="Submit"></form>';
}

?>

Here is my code. I enter the class and the chapter list, then the script cleans everything up and prepares it for insertion into the database.

It inserts the information into the database; the only problem is that it is stored as Unicode rather than as Chinese characters.
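For comparison, the script above builds the multi-row INSERT by pasting the values straight into the SQL string. A minimal sketch of the same statement built with placeholders instead, assuming you can move from the old mysql_* functions to PDO; buildChapterInsert is a hypothetical helper name and the 'tb_' prefix stands in for dbPre:

```php
<?php
// Hypothetical helper: builds one multi-row INSERT with "?" placeholders
// plus the flat list of values to bind, instead of concatenating the values.
function buildChapterInsert(string $prefix, string $class, array $shortTitles, array $fullTitles): array
{
    if (count($fullTitles) === 0) {
        throw new InvalidArgumentException('No chapter titles to insert.');
    }
    $rows = [];
    $params = [];
    foreach ($fullTitles as $i => $fullTitle) {
        $rows[] = '(?, ?, ?)';
        // Same column order as the original query: class, sTitle, fTitle.
        array_push($params, $class, $shortTitles[$i], $fullTitle);
    }
    $sql = 'INSERT INTO ' . $prefix . 'textbook (class, sTitle, fTitle) VALUES ' . implode(', ', $rows);
    return [$sql, $params];
}

// 'tb_' is an assumed stand-in for the dbPre constant from config.php.
[$sql, $params] = buildChapterInsert('tb_', 'Book 1', ['一', '二'], ['第一課', '第二課']);
echo $sql, "\n"; // INSERT INTO tb_textbook (class, sTitle, fTitle) VALUES (?, ?, ?), (?, ?, ?)

// With a PDO connection (not shown), the values would be bound like this:
// $pdo->prepare($sql)->execute($params);
```

The table prefix is still concatenated into the SQL, so it must come from your config, never from the form.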

My connect.php:
PHP:
<?php
// connect to the database server
if (!($db = mysql_pconnect($dbPort, $dbUsername , $dbPassword))){
   die("Can't connect to database server.");    
} else {
   // set character encoding
   mb_internal_encoding("UTF-8"); 
   mysql_query('SET NAMES "utf8"');
   // select a database
   if (!(mysql_select_db($dbName, $db))){
      die("Can't connect to database.");
   }
}
?>
 

marshian

Chinese characters aren't available in single-byte character sets like latin1, so storing the data as Unicode should have the effect you need. I don't understand where the problem is. Does the data come out wrong?
 

diabolo

I'm sorry, my mistake; I didn't fully understand what I was talking about.

I meant to say it is entered as an NCR (Unicode numeric character reference).
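An NCR is simply the character's Unicode code point written in decimal between &# and ;. A small sketch of the round trip, assuming PHP 7.2+ with the mbstring extension (for mb_ord and mb_chr); toNcr and fromCodePoint are just illustrative names:

```php
<?php
// Turn one UTF-8 character into its decimal numeric character reference (NCR).
function toNcr(string $char): string
{
    return sprintf('&#%d;', mb_ord($char, 'UTF-8'));
}

// Turn a decimal code point back into the UTF-8 character.
function fromCodePoint(int $codePoint): string
{
    return mb_chr($codePoint, 'UTF-8');
}

echo toNcr('課'), "\n";                    // &#35506;
echo dechex(mb_ord('課', 'UTF-8')), "\n";  // 8ab2, i.e. U+8AB2
echo fromCodePoint(35506), "\n";           // 課
```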
 

marshian

Just to make sure we understand each other, 課 (whatever that may mean xD) would translate into a numeric character reference, right?

Let's see, troubleshooting time. I've got two suspects at the moment:
1. What's the character set for the field you're storing the data in?
2. HTTP might not be able to work with these characters, causing *any* browser to change the characters to this notation.
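On suspect 2: when a page doesn't declare its encoding, browsers submit characters that don't fit the page's charset as decimal NCRs, which would explain what ended up in the table. If rows have already been saved that way, a sketch like this could convert them back to UTF-8, assuming the data only contains decimal NCRs such as &#35506; (decodeDecimalNcrs is a hypothetical name; html_entity_decode($s, ENT_QUOTES, 'UTF-8') also works but additionally decodes named entities such as &amp;):

```php
<?php
// Replace every decimal NCR (&#NNNN;) with the UTF-8 character it encodes.
function decodeDecimalNcrs(string $text): string
{
    $result = preg_replace_callback(
        '/&#(\d+);/',
        function (array $m): string {
            $char = mb_chr((int) $m[1], 'UTF-8');
            // Leave invalid code points untouched rather than dropping them.
            return $char === false ? $m[0] : $char;
        },
        $text
    );
    // preg_replace_callback returns null on error; fall back to the input.
    return $result ?? $text;
}

echo decodeDecimalNcrs('&#31532;&#19968;&#35506;'), "\n"; // 第一課
```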

But since it's around 2:15 AM for me at the moment, I'll look into it tomorrow :)
 

diabolo

marshian said:
Just to make sure we understand each other, 課 (whatever that may mean xD) would translate into a numeric character reference, right?

Not to be nitpicky, but to be more precise, it would appear as:
Code:
&#35506;

marshian said:
Let's see, troubleshooting time. I've got two suspects at the moment:
1. What's the character set for the field you're storing the data in?
2. HTTP might not be able to work with these characters, causing *any* browser to change the characters to this notation.

I'm not sure what you meant by number 2, but I've added
HTML:
<meta content="text/html; charset=utf-8" http-equiv="content-type"/>
to my page, since I usually include it in my header.php.

This then led to my whole script breaking down.

I think I have isolated it to this section:
PHP:
// NOTE: the Chinese displayed here is actually in NCR form (&#35506;)
//$chapters = str_replace('課', '課 <br />', $chapters);
//$chaptersRefined = str_replace('課', '', $chapters);
//$chaptersRefined = str_replace('第', '', $chaptersRefined);

Before, it was able to search for "課", but now it can't because of the meta tag I inserted.
So, new problem:
How can I do a str_replace, or similar, for a Chinese character?

LOL, I think I've already solved this on my own. xD
Before, I wasn't able to copy and paste the Chinese into the script, but I copied it from here and it worked. xD So I think the script should be stable now.

Just out of curiosity: is this the best way to do a str_replace with a Chinese character? Should I put the actual character in the script, or is there another, more robust way?
 

marshian

Haha, good to hear you managed to solve it :)
The script probably broke because the form now sends the data as UTF-8, which can represent the Chinese characters directly without NCRs, while your script was still searching for NCRs. When you pasted the Chinese characters into your PHP file, it was probably saved as UTF-8, so the two strings are byte-for-byte identical again.

According to php.net, str_replace is binary-safe, so it shouldn't be a problem to use it for this.
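To illustrate: str_replace compares raw bytes, so it matches 課 as long as the PHP file and the submitted form data both use UTF-8. If you'd rather not paste the character into the source, PHP 7.0+ also accepts a Unicode escape in double-quoted strings, so "\u{8AB2}" is 課. A sketch (the constant names KE and DI are made up for the example):

```php
<?php
// "\u{8AB2}" is the double-quoted-string escape for U+8AB2 (課); "\u{7B2C}" is 第.
const KE = "\u{8AB2}";
const DI = "\u{7B2C}";

$input = '第一課第二課';
// Byte-level replacement works because both strings are UTF-8.
$lines = str_replace(KE, KE . "\n", $input);
$numbers = str_replace([KE, DI], '', $input);

var_dump(KE === '課');  // bool(true) when this file is saved as UTF-8
echo $lines;            // 第一課 and 第二課, one per line
echo $numbers, "\n";    // 一二
echo strlen(KE), "\n";  // 3 (bytes in UTF-8)
```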

Good luck :)
 

diabolo

Well, everything runs great; I just need help sanitizing the input.
第一課


第二課


第三課


第四課

第五課


第六課


第七課


第八課

第九課


第十課


第十一課


第十二課
That is what I paste into the textarea, so when the script inserts it into the DB, the whitespace goes in too. I didn't catch it before because the extra whitespace gets collapsed when the page displays it. So it gets entered as this:



第二課

PHP:
$sTitle = trim($chaptersRefinedArray[$i], " ");
$fTitle = trim($chaptersArray[$i], " ");
I tried doing a trim to get rid of the spaces, but that didn't work.
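For reference, trim($x, " ") strips only plain spaces, so any line breaks or tabs around a title survive. Without the second argument, trim() strips spaces, \t, \n, \r, \0 and \x0B. Neither form removes the ideographic (full-width) space U+3000, which can show up in copied Chinese text. A sketch of a Unicode-aware variant (mbTrim is a hypothetical name):

```php
<?php
// Hypothetical helper: trim whitespace including the ideographic space U+3000.
function mbTrim(string $s): string
{
    // \x{3000} is listed explicitly so the pattern doesn't depend on how \s is interpreted.
    return preg_replace('/^[\s\x{3000}]+|[\s\x{3000}]+$/u', '', $s) ?? $s;
}

$title = "\r\n   \u{3000}第二課 \r\n";
var_dump(trim($title, " ")); // unchanged: starts with \r, ends with \n
var_dump(trim($title));      // still starts with the U+3000 space
var_dump(mbTrim($title));    // string(9) "第二課"
```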
 

marshian

Looking at your explanation it's not completely clear to me what you want to achieve, sorry.

Is my assumption correct that, with the above input, you want the following variables to have these values when you fill in $values?
Code:
$chaptersArray = array("第一課", "第二課", "第三課", "第四課", "第五課", "第六課", "第七課", "第八課", "第九課", "第十課", "第十一課", "第十二課");
$chaptersRefinedArray = array("一", "二", "三", "四", "五", "六", "七", "八", "九", "十", "十一", "十二");
 

misson

trim works fine for me:
PHP:
$chapters = str_replace('課', '課 <br />', $chapters);
$chaptersRefined = str_replace(array('課', '第'), '', $chapters);

$chaptersArray = array_map('trim', explode('<br />', $chapters));
$chaptersRefinedArray = array_map('trim', explode('<br />', $chaptersRefined));

If you remove the space from the replacement string and change the str_replace to a preg_replace, you can do away with the trim:
PHP:
$chapters = preg_replace('/課\n*/', "課<br />\n", $chapters);
$chaptersRefined = preg_replace('/[課第]/u', '', $chapters); // /u: match whole UTF-8 characters, not single bytes
...
$chaptersArray = explode("<br />\n", $chapters);
$chaptersRefinedArray = explode("<br />\n", $chaptersRefined);

$i = 0;
$maxI = (count($chaptersArray))-1;
// last elements should always be empty, but check before popping them just to be safe.
if (empty($chaptersArray[$maxI])) {
    array_pop($chaptersArray);
    array_pop($chaptersRefinedArray);
    --$maxI;
}

I'm not certain which of preg_replace or str_replace(array(... performs better for creating $chaptersRefined (only measuring will tell), but both should be better than two calls to str_replace.
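Measuring is easy enough with hrtime() (PHP 7.3+). A rough sketch comparing the two ways of building $chaptersRefined; the synthetic input and iteration count are arbitrary assumptions, and the timings depend on machine and PHP version, so none are quoted here:

```php
<?php
// Time $iterations calls of $fn and return the elapsed time in milliseconds.
function timeMs(callable $fn, int $iterations): float
{
    $start = hrtime(true); // nanoseconds
    for ($i = 0; $i < $iterations; $i++) {
        $fn();
    }
    return (hrtime(true) - $start) / 1e6;
}

// Synthetic input: 12 chapter titles, each followed by a line break.
$chapters = str_repeat("第十二課<br />\n", 12);

$viaStr = fn() => str_replace(['課', '第'], '', $chapters);
// /u so that [課第] is a set of two characters, not of individual bytes.
$viaPreg = fn() => preg_replace('/[課第]/u', '', $chapters);

printf("str_replace:  %.2f ms\n", timeMs($viaStr, 10000));
printf("preg_replace: %.2f ms\n", timeMs($viaPreg, 10000));
```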
 

diabolo

marshian said:
Is my assumption correct that you would want with the above input that the following variables have these values when you fill in $values?
I am trying to get rid of the excess whitespace that comes along when I paste in the chapter titles.
Here is what I paste in:
Code:
第一課
                               第二課
                               第三課
                               第四課
                                         第五課
                               第六課
                               第七課
                               第八課
                                         第九課
                               第十課
                               第十一課
                               第十二課
And yes, those are the values I want to insert.

I tried the preg_replace, but that still didn't work. I left out the $maxI part because I didn't understand it. xP I was just lazy last night.

Also, I have never really learned regular expressions, so I have no idea what's wrong.
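One likely reason the preg_replace attempt fails: browsers submit textarea line breaks as \r\n, so \n* right after 課 stops at the \r, and the indentation before the next title stays in place. Another option is to skip the cleanup and pull each title out directly with preg_match_all. A sketch, assuming every title has the form 第…課 with no whitespace inside it (extractChapters is a hypothetical name):

```php
<?php
// Capture every "第<number>課" title; group 1 holds just the number part.
// The /u modifier makes the pattern work on UTF-8 characters instead of bytes.
function extractChapters(string $input): array
{
    if (preg_match_all('/第(\S+?)課/u', $input, $matches) === false) {
        return [[], []]; // e.g. the input was not valid UTF-8
    }
    // $matches[0]: full titles (fTitle), $matches[1]: numbers only (sTitle).
    return [$matches[0], $matches[1]];
}

$input = "第一課\r\n\r\n                               第二課\r\n   第十一課";
[$chaptersArray, $chaptersRefinedArray] = extractChapters($input);
print_r($chaptersArray);        // 第一課, 第二課, 第十一課
print_r($chaptersRefinedArray); // 一, 二, 十一
```

The two arrays line up index by index, so they can feed the existing $values loop; since there is no trailing empty element here, the loop bound would be count($chaptersArray) rather than count($chaptersArray) - 1.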
 

marshian

Try this bit of code:
(I'm not sure about the 3rd line, the \s might be picked up by the PHP parser. If it complains, change \s to \\s.)
PHP:
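// \s can be any whitespace character (\n, \r, \t, " "); the + means once or more.
// Whenever any whitespace is found in the entire string, it's removed.
$chapters = preg_replace("/\s+/u", "", $chapters);
// Put a line break after every 課 so each title ends up on its own line.
$chapters = str_replace("課", "課\n", $chapters);
$chaptersRefined = str_replace(array("課", "第"), array("", ""), $chapters);

// Both arrays end with an empty element (the text after the last 課),
// which the count()-1 bound in the original while loop already skips.
$chaptersArray = explode("\n", $chapters);
$chaptersRefinedArray = explode("\n", $chaptersRefined);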
 

diabolo


Thank you very much! This saved me a lot of time I would otherwise have spent entering the data manually!
 