Simple PHP Function to Remove Invalid Characters from a String

I’ve run into situations where I’ve installed content management systems for customers who like to add their own content and/or copy content from documents they’ve created. Often, this results in them pasting non-ASCII characters such as smart quotes, ellipses, or em dashes. I’m not sure exactly why PHP has trouble with these characters (maybe someone can educate me by posting a comment below), but I’ve come up with a way to replace them with plain ASCII characters or character sequences that display reliably. The function is below.
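Before the function itself, it can help to see which characters a customer's text actually contains. Here is a minimal sketch, assuming PHP 7.0 or later and UTF-8 input. The name `findNonAscii()` is hypothetical. It uses `preg_match_all()` with the `/u` modifier to pick out every character outside the ASCII range, and `bin2hex()` to show each one's raw bytes.

```php
<?php
// Hypothetical helper (not part of the original post): list the distinct
// non-ASCII characters in a UTF-8 string together with their raw bytes in hex.
function findNonAscii(string $text): array
{
    $found = array();
    // The /u modifier makes PCRE read the subject as UTF-8, so each match is a
    // whole character rather than one byte. For input that is not valid UTF-8,
    // preg_match_all() returns false and this function returns an empty array.
    if (preg_match_all('/[^\x00-\x7F]/u', $text, $matches)) {
        foreach ($matches[0] as $char) {
            // Keying by the character collapses repeated occurrences.
            $found[$char] = bin2hex($char);
        }
    }
    return $found;
}

// \u{...} escapes (PHP 7.0+) build UTF-8 text regardless of this file's encoding.
$sample = "It\u{2019}s a \u{201C}test\u{201D}\u{2026}";
foreach (findNonAscii($sample) as $char => $hex) {
    echo $char, ' => ', $hex, "\n";
}
```

Each smart quote shows up as three bytes (`e28099` for ’), which is why a one-character replacement table has to match multibyte strings.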

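function cleanString($string) {
  // str_replace() matches these strings byte for byte, so this file and the
  // text being cleaned must use the same encoding (normally UTF-8).
  $find = array();
  $replace = array();

  $find[] = '“';  // left side double smart quote
  $find[] = '”';  // right side double smart quote
  $find[] = "‘";  // left side single smart quote
  $find[] = "’";  // right side single smart quote
  $find[] = '…';  // ellipsis
  $find[] = '—';  // em dash
  $find[] = '–';  // en dash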

  $replace[] = '"';
  $replace[] = '"';
  $replace[] = "'";
  $replace[] = "'";
  $replace[] = '...';
  $replace[] = '-';
  $replace[] = '-';

  return str_replace($find, $replace, $string);
}
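Because `$find` and `$replace` are two parallel arrays, adding an entry to one and forgetting the other silently shifts every replacement after it. The sketch below shows one hedged alternative that keeps each pair together in a single associative array and passes it to `strtr()`. The name `cleanStringMap()` is hypothetical. It also assumes PHP 7.0+ for the `\u{...}` escapes, which produce UTF-8 bytes no matter how the source file is saved.

```php
<?php
// Hypothetical variant of cleanString(): each character and its replacement
// live side by side, so the two lists cannot drift out of sync.
function cleanStringMap(string $string): string
{
    $map = array(
        "\u{201C}" => '"',   // left double smart quote
        "\u{201D}" => '"',   // right double smart quote
        "\u{2018}" => "'",   // left single smart quote
        "\u{2019}" => "'",   // right single smart quote
        "\u{2026}" => '...', // ellipsis
        "\u{2014}" => '-',   // em dash
        "\u{2013}" => '-',   // en dash
    );
    // strtr() with an array replaces all keys in a single pass, so text that
    // has already been substituted is never scanned again.
    return strtr($string, $map);
}

echo cleanStringMap("\u{201C}Don\u{2019}t stop\u{201D} \u{2014} wait\u{2026}"), "\n";
```

For this table the result is the same as the `str_replace()` version; the single map is simply harder to get out of order.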

cleanString() is essentially a very simple string replacement: str_replace() swaps each character in $find for the entry at the same position in $replace and returns the cleaned string. For the characters in the list, this should prevent the weird diamonds or boxes you may be seeing when you echo the text in PHP.
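A replacement table only fixes the characters it lists. Anything else, including the "�" diamond (U+FFFD) itself, passes through unchanged. Text saved as Windows-1252 rather than UTF-8 is another gap: there the smart quotes are single bytes such as 0x93 and 0x94, which the UTF-8 search strings never match. One blunt, hedged last resort is to delete every byte outside the ASCII range after the replacement step. The name `stripNonAscii()` is hypothetical, and this approach also throws away legitimate accented letters.

```php
<?php
// Hypothetical last-resort helper: delete every byte outside 7-bit ASCII.
// Run it after a replacement step such as cleanString(). Caution: it also
// removes legitimate characters such as "é".
function stripNonAscii(string $text): string
{
    // No /u modifier: the pattern works byte by byte, so it also handles input
    // that is not valid UTF-8. Every byte of a multibyte UTF-8 character is
    // >= 0x80, so whole characters are removed rather than left half-deleted.
    return preg_replace('/[^\x00-\x7F]/', '', $text);
}

echo stripNonAscii("caf\u{00E9} \u{FFFD} ok"), "\n";
```

Running the replacement first and stripping second keeps the readable substitutes (quotes, dashes, "...") and drops only what the table does not cover.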



One Response to “Simple PHP Function to Remove Invalid Characters from a String”

  1. parveen sharma says:

    How do you remove characters such as “�” from this string, as well as the html code?
