@shakedko
IF AN EXPERT SAYS IT CAN'T BE DONE GET ANOTHER EXPERT.
- DAVID BEN-GURION

WordPress - avoiding the wpautop function

Long time, huh?

So I've been working on a WordPress plugin for a while, and I found some common issues with plugins that need to integrate with WordPress's posts.

One of the main issues is the wpautop function.

Description

From WordPress's website:

Changes double line-breaks in the text into HTML paragraphs (<p>...</p>).

WordPress uses it to filter the content and the excerpt.
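
To make that description concrete, here is a minimal sketch of just the "split on blank lines and wrap in <p>" step. The function name sketch_autop is hypothetical and this is not WordPress's implementation; it leaves out everything the real function does around block-level tags, <br /> handling and <pre>.

```php
<?php
// Hypothetical helper, NOT WordPress's wpautop: it only reproduces the
// "split on blank lines, wrap each chunk in <p>" step.
function sketch_autop(string $text): string
{
    // Normalize Windows and old Mac newlines to "\n".
    $text = str_replace(array("\r\n", "\r"), "\n", $text);
    // Split wherever two newlines are separated only by whitespace.
    $chunks = preg_split('/\n\s*\n/', $text, -1, PREG_SPLIT_NO_EMPTY);
    $out = '';
    foreach ($chunks as $chunk) {
        $out .= '<p>' . trim($chunk, "\n") . "</p>\n";
    }
    return $out;
}

echo sketch_autop("First paragraph.\n\nSecond paragraph.");
```

With the two-paragraph input above, each chunk ends up in its own <p>...</p> pair, one per line.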

Now let's dig a bit deeper, huh?

<?php
function wpautop($pee, $br = 1) {
    if ( trim($pee) === '' )
            return '';
    $pee = $pee . "\n"; // just to make things a little easier, pad the end
    $pee = preg_replace('|<br />\s*<br />|', "\n\n", $pee);
    // Space things out a little
    $allblocks = '(?:table|thead|tfoot|caption|col|colgroup|tbody|tr|td|th|div|dl|dd|dt|ul|ol|li|pre|select|option|form|map|area|blockquote|address|math|style|input|p|h[1-6]|hr|fieldset|legend|section|article|aside|hgroup|header|footer|nav|figure|figcaption|details|menu|summary)';
    $pee = preg_replace('!(<' . $allblocks . '[^>]*>)!', "\n$1", $pee);
    $pee = preg_replace('!(</' . $allblocks . '>)!', "$1\n\n", $pee);
    $pee = str_replace(array("\r\n", "\r"), "\n", $pee); // cross-platform newlines
    if ( strpos($pee, '<object') !== false ) {
            $pee = preg_replace('|\s*<param([^>]*)>\s*|', "<param$1>", $pee); // no pee inside object/embed
            $pee = preg_replace('|\s*</embed>\s*|', '</embed>', $pee);
    }
    $pee = preg_replace("/\n\n+/", "\n\n", $pee); // take care of duplicates
    // make paragraphs, including one at the end
    $pees = preg_split('/\n\s*\n/', $pee, -1, PREG_SPLIT_NO_EMPTY);
    $pee = '';
    foreach ( $pees as $tinkle )
            $pee .= '<p>' . trim($tinkle, "\n") . "</p>\n";
    $pee = preg_replace('|<p>\s*</p>|', '', $pee); // under certain strange conditions it could create a P of entirely whitespace
    $pee = preg_replace('!<p>([^<]+)</(div|address|form)>!', "<p>$1</p></$2>", $pee);
    $pee = preg_replace('!<p>\s*(</?' . $allblocks . '[^>]*>)\s*</p>!', "$1", $pee); // don't pee all over a tag
    $pee = preg_replace("|<p>(<li.+?)</p>|", "$1", $pee); // problem with nested lists
    $pee = preg_replace('|<p><blockquote([^>]*)>|i', "<blockquote$1><p>", $pee);
    $pee = str_replace('</blockquote></p>', '</p></blockquote>', $pee);
    $pee = preg_replace('!<p>\s*(</?' . $allblocks . '[^>]*>)!', "$1", $pee);
    $pee = preg_replace('!(</?' . $allblocks . '[^>]*>)\s*</p>!', "$1", $pee);
    if ($br) {
            $pee = preg_replace_callback('/<(script|style).*?<\/\\1>/s', '_autop_newline_preservation_helper', $pee);
            $pee = preg_replace('|(?<!<br />)\s*\n|', "<br />\n", $pee); // optionally make line breaks
            $pee = str_replace('<WPPreserveNewline />', "\n", $pee);
    }
    $pee = preg_replace('!(</?' . $allblocks . '[^>]*>)\s*<br />!', "$1", $pee);
    $pee = preg_replace('!<br />(\s*</?(?:p|li|div|dl|dd|dt|th|pre|td|ul|ol)[^>]*>)!', '$1', $pee);
    if (strpos($pee, '<pre') !== false)
            $pee = preg_replace_callback('!(<pre[^>]*>)(.*?)</pre>!is', 'clean_pre', $pee );
    $pee = preg_replace( "|\n</p>$|", '</p>', $pee );
    return $pee;
}

You may find more at http://core.trac.wordpress.org/browser/tags/3.3.1/wp-includes/formatting.php

As you can see, this function changes a lot of things in a WordPress post, such as:

  • Adding \n around (almost) every block-level HTML tag
  • Normalizing newlines across platforms (\r\n and \r become \n)
  • Stripping whitespace around <param> and </embed> inside <object> tags
  • Adding <p> tags
  • Cleaning up the contents of <pre> tags

For me, the most important item on this list is "Adding <p> tags".

Why is it so important for me?

As I wrote before, it seems that one of the biggest problems for developers who write WordPress plugins is the wpautop function. When they try to modify the user's posts, they always have to struggle with the <p> tags, and sometimes they don't know how to handle them properly.

Suggestions...

What to do

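If I remember correctly, wpautop is hooked to the_content at the default priority, which is 10 (do_shortcode runs right after it, at 11). Anything that needs to clean up after wpautop therefore has to run at a higher priority number.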
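
Why the priority number matters can be sketched with a tiny stand-alone hook registry. The SketchHooks class is hypothetical and written only for illustration; it is not how WordPress implements add_filter, and the first callback merely stands in for wpautop.

```php
<?php
// Hypothetical mini hook registry, written only to illustrate priority
// ordering. It is NOT WordPress's implementation of add_filter/apply_filters.
class SketchHooks
{
    /** @var array callbacks grouped by priority number */
    private $callbacks = array();

    public function add(callable $callback, int $priority = 10): void
    {
        $this->callbacks[$priority][] = $callback;
    }

    public function apply(string $value): string
    {
        // Lower priority numbers run first.
        ksort($this->callbacks);
        foreach ($this->callbacks as $group) {
            foreach ($group as $callback) {
                $value = $callback($value);
            }
        }
        return $value;
    }
}

$hooks = new SketchHooks();
// Registered first, but runs last because 12 > 10.
$hooks->add(function (string $html): string {
    return str_replace(array('<p>', '</p>'), '', $html);
}, 12);
// Stand-in for wpautop at the default priority.
$hooks->add(function (string $text): string {
    return '<p>' . $text . '</p>';
}, 10);

echo $hooks->apply('hello');
```

The cleanup callback sees the <p> tags only because it runs after the callback that adds them; with the two priorities swapped, the tags would survive.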

I have split my solution into 4 steps:

  1. Mark your posts with your own "signature"
  2. Create your own HTML template and don't use <p> tags
  3. Change the posts through the_posts
  4. Remove the <p> tags from your plugin's HTML

How?

Marking your posts:

Include a special marker inside post_content. For example, when a new post is added you may insert the text "my-plugin-text-needs-to-be-manipulated", so the user's post will look like:

This is a new post

my-plugin-text-needs-to-be-manipulated

Thank you all!
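
As a rough sketch of what happens to such a post later on, the marker can be swapped for the plugin's HTML with plain string functions. The function name sketch_replace_marker and the sample template are assumptions made up for this example; only the marker text comes from the post above.

```php
<?php
// Hypothetical helper for illustration; the marker string is the one used
// in the example post, the HTML template is made up.
function sketch_replace_marker(string $content, string $marker, string $html): string
{
    if (strpos($content, $marker) === false) {
        return $content; // not one of "our" posts, leave it untouched
    }
    // Collapse the template onto one line so no blank lines remain inside it.
    $html = str_replace(array("\r\n", "\r", "\n"), '', $html);
    return str_replace($marker, $html, $content);
}

$post = "This is a new post\n\nmy-plugin-text-needs-to-be-manipulated\n\nThank you all!";
$template = "<div class=\"our_plugins_class_wrapper\">\n<span>Hello</span>\n</div>";

echo sketch_replace_marker($post, 'my-plugin-text-needs-to-be-manipulated', $template);
```

The rest of the post keeps its blank lines, so the user's own paragraphs are still formatted as usual.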

Changing the posts:

http://pastebin.com/w58WqjHP

<?php
/**
 * Requires more work but does the job.
 */
class Posts {
    const CATCH_POST_PRIORITY = -32767;
    const TEXT_TO_CHANGE = 'my-plugin-text-needs-to-be-manipulated';

    /**
     * Manipulate our plugin's text/data
     * @param array $posts
     * @return array
     */
    public function manipulate($posts) {
        $newPosts = array();
        foreach ($posts as $post) {
            // Leave the post untouched if it lacks our marker or fails the preview check
            if (false === strpos($post->post_content, self::TEXT_TO_CHANGE)
                || !self::IsPreviewMode($post->ID)) {
                $newPosts[] = $post;
                continue;
            }
            // TODO: extract your data to your HTML (the real plugin uses a View object and renders the data)
            // Remove new lines from the template
            $html = str_replace("\n", "", file_get_contents("some.html"));
            // Assign back to post content
            $post->post_content = str_replace(self::TEXT_TO_CHANGE, $html, $post->post_content);
            $newPosts[] = $post;
        }
        return $newPosts;
    }

    /**
     * Preview check: true when 'preview' and 'p' are both in the query string
     * and 'p' differs from the given post id
     * @param int $postID
     * @return bool
     */
    private static function IsPreviewMode($postID) {
        return isset($_GET['preview']) && isset($_GET['p']) && $_GET['p'] != $postID;
    }
}
$posts = new Posts;
// the_posts is a filter hook, so register the callback with add_filter
add_filter('the_posts', array($posts, 'manipulate'), Posts::CATCH_POST_PRIORITY);

Removing the <p> tags:

http://pastebin.com/afk9SJNk

<?php
/**
 * DOM Handler
*/
class Extended_DOMDocument extends DOMDocument {
    /**
     * @var DOMElement $pluginStatus
    */
    private $pluginStatus = false;
    const CONTAINER = "our_plugins_class_wrapper";
    const UTF8_META = '<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />';
    public function __construct($post){
        parent::__construct("1.0", "UTF-8");
        if (!@$this->loadHTML(self::UTF8_META.$post)){
            return ;
        }
        $this->isPluginInvolved();
    }
    private function isPluginInvolved(){
        $this->pluginStatus = $this->getElementByClassName(self::CONTAINER);
    }
    /**
     * Remove p tags
     * @return string
     */
    public function removeWpAutoP(){
        $content = $this->saveHTML();
        if (!$this->pluginStatus){
            return $content;
        }
        $html = self::DOMinnerHTML($this->pluginStatus);
        // The "s" modifier lets a paragraph that spans several lines match as well
        return str_replace($html,preg_replace('#<p>(.*?)</p>#s','$1',$html),$content);
    }
    /**
     * @see http://stackoverflow.com/questions/5404941/php-domdocument-outerhtml-for-element
     */
    public static function DOMinnerHTML($n, $outer=true) {
        $d = new DOMDocument('1.0');
        $b = $d->importNode($n->cloneNode(true),true);
        $d->appendChild($b);
        $h = $d->saveHTML();
        // remove outer tags
        if (!$outer) $h = substr($h,strpos($h,'>')+1,-(strlen($n->nodeName)+4));
        return $h;
    }
    /**
    * Get all elements that have a tag of $tag and class of $className
    *
    * @param string $className The class name to search for
    * @param string $tag       Tag of the items to search
    * @return array            Array of DOMNode items that match
    */
    public function getElementsByClassName($className, $tag="*") {
        $nodes = array();
        $domNodeList = $this->getElementsByTagName($tag);
        for ($i = 0; $i < $domNodeList->length; $i++) {
            $item = $domNodeList->item($i)->attributes->getNamedItem('class');
            if ($item) {
                $classes = explode(" ", $item->nodeValue);
                for ($j = 0; $j < count($classes); $j++) {
                    if ($classes[$j] == $className) {
                        $nodes[] = $domNodeList->item($i);
                    }
                }
            }
        }
        return $nodes;
    }
    /**
     * Convenience method to return a single element by class name when we know there's only going to be one
     *
     * @param string $className The class name to search for
     * @param string $tag       Tag of the items to search
     * @return DOMNode|false    The first matching DOMNode, or false if none match
     */
    public function getElementByClassName($className, $tag="*") {
        $nodes = $this->getElementsByClassName($className, $tag);
        return count($nodes) > 0 ? $nodes[0] : false;
    }
}
// Priority 12 makes this filter run after wpautop has already added its <p> tags
add_filter('the_content', 'rm_wpautop', 12);
/**
 * remove auto wp p tags
 * @param string $content
 * @return string
 */
function rm_wpautop($content) {
    $dom = new Extended_DOMDocument($content);
    $content = $dom->removeWpAutoP();
    return $content;
}
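
For comparison, the same idea (find the plugin's wrapper by class, then drop the <p> tags inside it) can be sketched with DOMXPath instead of a hand-written class lookup and a regular expression. This is an alternative under stated assumptions, not the code from the plugin: the function name sketch_unwrap_paragraphs, the sketch-root id and the sample markup are all hypothetical, and the class name is assumed to be a trusted constant because it is placed into the XPath query as-is.

```php
<?php
// A sketch using DOMXPath; names and sample markup are hypothetical.
function sketch_unwrap_paragraphs(string $html, string $class): string
{
    $dom = new DOMDocument('1.0', 'UTF-8');
    // Same charset trick as the class above, plus a known root element
    // so that only our fragment is serialized at the end.
    $meta = '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">';
    libxml_use_internal_errors(true); // collect parser warnings quietly
    $loaded = $dom->loadHTML($meta . '<div id="sketch-root">' . $html . '</div>');
    libxml_clear_errors();
    if (!$loaded) {
        return $html;
    }

    $xpath = new DOMXPath($dom);
    // Every <p> that sits anywhere inside an element carrying $class.
    $query = sprintf(
        '//*[contains(concat(" ", normalize-space(@class), " "), " %s ")]//p',
        $class
    );
    // Copy the node list first, since the tree is modified inside the loop.
    foreach (iterator_to_array($xpath->query($query)) as $p) {
        while ($p->firstChild) {
            // Move each child in front of its <p>, keeping the order.
            $p->parentNode->insertBefore($p->firstChild, $p);
        }
        $p->parentNode->removeChild($p);
    }

    // Serialize only the children of our temporary root.
    $root = $xpath->query('//div[@id="sketch-root"]')->item(0);
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

$content = '<div class="our_plugins_class_wrapper"><p>Hello</p><p>World</p></div><p>Outside</p>';
echo sketch_unwrap_paragraphs($content, 'our_plugins_class_wrapper');
```

Paragraphs outside the wrapper are left alone, which is the point of scoping the cleanup to the plugin's own container. The serializer may insert newlines between block-level elements, so the output should not be compared byte for byte with the input.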

What not to do

It's pretty easy: don't remove the filter.

Why?

Because removing the filter usually creates bugs and problems with other posts and plugins. For example, once the filter is disabled you will notice that some posts are displayed without any line breaks at all.

Summary

WordPress's wpautop makes our lives very hard. As I see it, WordPress should give developers more freedom when writing their plugins. This might not be the most useful way to write this fix, but then you may ask whether it's useful to use WordPress at all :)

Some of the functions were found on Google or on various websites; I didn't write them all, although I did write the "logic". I have tried to credit my sources in the code's comments (such as Stack Overflow), so if you find something that belongs to you, please contact me and I will add your name or remove the script.

Hope this guide will help others as it helped me.

-Shak
