If you use str_split() on a string that contains multibyte characters, the results will look odd. For example, in UTF-8 the accented character in the word Café takes up two bytes. If you run str_split() on it, you get the following array: [0] => C [1] => a [2] => f [3] => � [4] => �. This is because str_split() is not multibyte-safe: it splits the string by bytes rather than by characters, so the two bytes of é end up in separate elements, and neither is a valid character on its own.
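To see the byte-level view for yourself, here is a minimal sketch that compares byte and character counts and prints each piece str_split() produces as hex. It assumes the mbstring extension is loaded; the variable names are only for illustration.

```php
<?php
// "é" is encoded in UTF-8 as the two bytes 0xC3 0xA9.
// Written with escapes so the result does not depend on the file's encoding.
$word = "Caf\xC3\xA9"; // same bytes as 'Café' in a UTF-8 source file

echo strlen($word), "\n";             // 5 (bytes)
echo mb_strlen($word, 'UTF-8'), "\n"; // 4 (characters)

// Hex-dump every one-byte piece returned by str_split().
$hex = array_map('bin2hex', str_split($word));
echo implode(' ', $hex), "\n";        // 43 61 66 c3 a9
```

The last two pieces, c3 and a9, only form é when they are kept together, which is why they show up as � when printed separately.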
If you don't understand why you get these results, read my blog post “What you need to know about PHP’s internal character encoding”.
Here is a UTF-8 multibyte-safe version of str_split():
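// Multibyte-safe str_split() for UTF-8 strings (requires the mbstring extension).
// Note: PHP versions before 7.0 have a built-in split(), so pick another name there.
function split($str, $len = 1) {
    // A chunk length below 1 would make the loop run forever.
    if ($len < 1) {
        return false;
    }
    $arr = [];
    $length = mb_strlen($str, 'UTF-8'); // length in characters, not bytes
    for ($i = 0; $i < $length; $i += $len) {
        $arr[] = mb_substr($str, $i, $len, 'UTF-8');
    }
    return $arr;
}

print_r(str_split('Café')); // Wrong! Array ( [0] => C [1] => a [2] => f [3] => � [4] => � )
print_r(split('Café')); // Right! Array ( [0] => C [1] => a [2] => f [3] => é )
print_r(split('Café', 3)); // Array ( [0] => Caf [1] => é )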
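On PHP 7.4 and later, the built-in mb_str_split() covers the same ground. The sketch below is an alternative to the function above, not a replacement for it: it uses mb_str_split() where available and otherwise falls back to a regular expression with the /u modifier. The name utf8_split is hypothetical, and the type declarations assume PHP 7.0 or later.

```php
<?php
// Hypothetical helper; the name utf8_split is not from the original post.
function utf8_split(string $str, int $len = 1): array
{
    if ($len < 1) {
        throw new InvalidArgumentException('$len must be at least 1');
    }

    // PHP 7.4+ ships mb_str_split(), which splits by character natively.
    if (function_exists('mb_str_split')) {
        return mb_str_split($str, $len, 'UTF-8');
    }

    // Fallback: with /u, PCRE treats the subject as UTF-8, so "." matches
    // a whole code point instead of a single byte ("s" lets it match "\n").
    preg_match_all('/.{1,' . $len . '}/us', $str, $matches);
    return $matches[0] ?? [];
}

print_r(utf8_split("Caf\xC3\xA9"));    // C, a, f, é
print_r(utf8_split("Caf\xC3\xA9", 3)); // Caf, é
```

Both branches split by code point, so a letter followed by a separate combining accent would still be split into two elements.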
Tim Bennett is a freelance web designer from Leeds. He has a First Class Honours degree in Computing from
Leeds Metropolitan University and currently runs his own one-man web design company, Texelate.