
So you have HTML stored in your database. How do you create a plain text introduction from it? Here’s how.

This tutorial assumes you are using the UTF-8 charset.
function getplaintextintrofromhtml($html, $numchars) {

    // Remove the HTML tags
    $html = strip_tags($html);

    // Convert HTML entities to single characters
    $html = html_entity_decode($html, ENT_QUOTES, 'UTF-8');

    // Text that already fits is returned as-is, without an ellipsis
    if (mb_strlen($html, 'UTF-8') <= $numchars) {
        return $html;
    }

    // Make the string the desired number of characters
    // Note that substr is not suitable as it counts bytes, not characters
    $html = mb_substr($html, 0, $numchars, 'UTF-8');

    // Add an ellipsis
    $html .= "…";

    return $html;

}
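
To see why the byte-versus-character distinction matters, here is a small sketch. The sample string is made up for illustration, and it assumes the mbstring extension is available and PHP 7 or later (for the \u{...} escape).

```php
<?php
// Hypothetical sample text: "é" takes two bytes in UTF-8,
// so "café" is 4 characters but 5 bytes.
$text = "caf\u{00E9} au lait";

// substr counts bytes, so this cut lands in the middle of "é"
$byBytes = substr($text, 0, 4);

// mb_substr counts characters, so "é" survives intact
$byChars = mb_substr($text, 0, 4, 'UTF-8');

// The byte-based cut is no longer valid UTF-8
var_dump(mb_check_encoding($byBytes, 'UTF-8'));
var_dump($byChars);
```

A string cut mid-character like this tends to show up as a replacement character or broken output in the browser.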

strip_tags on its own won’t do because it leaves entities in place: &amp;amp; would still come out as the literal text &amp;amp; rather than &amp;. If you’re outputting to a web page you may think you don’t need to decode the entities, but what if your string cut-off point is part way through an entity? This function converts HTML to pure plain text so you can do what you want with it. That does mean if you are outputting to HTML you will need to call htmlspecialchars or htmlentities on the result.
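
As a rough illustration of the cut-off problem, the sketch below compares cutting before and after decoding, then re-escapes the result for HTML output. The sample markup and the variable names are invented for this example.

```php
<?php
// Hypothetical sample input
$html = '<p>Fish &amp; chips</p>';

// Cutting without decoding: the 7-character limit lands inside the entity
$naive = mb_substr(strip_tags($html), 0, 7, 'UTF-8');

// Decoding first means the entity counts as a single character
$plain = html_entity_decode(strip_tags($html), ENT_QUOTES, 'UTF-8');
$intro = mb_substr($plain, 0, 7, 'UTF-8');

// Plain text must be escaped again before it goes into a web page
$safe = htmlspecialchars($intro, ENT_QUOTES, 'UTF-8');

var_dump($naive, $intro, $safe);
```

Here $naive ends with a broken "&a" fragment, while $intro is clean plain text and $safe is ready to be echoed into HTML.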

Tim Bennett is a freelance web designer from Leeds. He has a First Class Honours degree in Computing from Leeds Metropolitan University and currently runs his own one-man web design company, Texelate.