How I fixed Yoast SEO sitemaps on a large WordPress site

One of my Covered Web Services clients recently came to me with a problem: Yoast SEO sitemaps were broken on their largest, highest-traffic WordPress site. Yoast SEO breaks your sitemap up into chunks. On this site, the individual chunks were loading, but the sitemap index (its “table of contents”) would not load, and was giving a timeout error. This prevented search engines from finding the individual sitemap chunks.

Sitemaps are really helpful for providing information to search engines about the content on your site, so fixing this issue was a high priority to the client! They were frustrated, and confused, because this was working just fine on their other sites.

Given that this site has over a decade of content, I figured that Yoast SEO’s dynamic generation of the sitemap was simply taking too long, and the server was giving up.

So I increased the site’s various timeout settings to 120 seconds.

No good.

I increased the timeout settings to 300 seconds. Five whole minutes!

Still no good.

This illustrates one of the problems that WordPress sites can face when they accumulate a lot of content: dynamic processes start to take longer. A process that takes a reasonable 5 seconds with 5,000 posts might take 100 seconds with 500,000 posts. I could have eventually made the Yoast SEO sitemap index work if I increased the timeout high enough, but that wouldn’t have been a good solution.

  1. It would have meant increasing the timeout settings irresponsibly high, leaving the server potentially open to abuse.
  2. Even though it is search engines, not people, who are requesting the sitemap, it is unreasonable to expect them to wait over 5 minutes for it to load. They’re likely to give up. They might even penalize the site in their rankings for being slow.

I needed the sitemap to be reliably generated without making the search engines wait.

When something intensive needs to happen reliably on a site, look to the command line.

The Solution

Yoast SEO doesn’t have WP-CLI (WordPress command line interface) commands, but that doesn’t matter — you can just use wp eval to run arbitrary WordPress PHP code.

After a little digging through the Yoast SEO code, I determined that this WP-CLI command would output the index sitemap:

wp eval '
$sm = new WPSEO_Sitemaps;
$sm->build_root_map();
$sm->output();
'

That took a good while to run on the command line, but that doesn’t matter, because I just set a cron job to run it once a day and save its output to a static file.

0 3 * * * cd /srv/www/example.com && /usr/local/bin/wp eval '$sm = new WPSEO_Sitemaps;$sm->build_root_map();$sm->output();' > /srv/www/example.com/wp-content/uploads/sitemap_index.xml

The final step that was needed was to modify a rewrite in the site’s Nginx config that would make the /sitemap_index.xml path point to the cron-created static file, instead of resolving to Yoast SEO’s dynamic generation URL.

location ~ ([^/]*)sitemap(.*).x(m|s)l$ {
    rewrite ^/sitemap.xml$ /sitemap_index.xml permanent;
    rewrite ^/([a-z]+)?-?sitemap.xsl$ /index.php?xsl=$1 last;
    rewrite ^/sitemap_index.xml$ /wp-content/uploads/sitemap_index.xml last;
    rewrite ^/([^/]+?)-sitemap([0-9]+)?.xml$ /index.php?sitemap=$1&sitemap_n=$2 last;
}

Now the sitemap index loads instantly (because it’s a static file), and is kept up-to-date with a reliable background process. The client is happy that they didn’t have to switch SEO plugins or install a separate sitemap plugin. Everything just works, thanks to a little bit of command line magic.

What other WordPress processes would benefit from this kind of approach?


Do you need WordPress services?

Mark runs Covered Web Services which specializes in custom WordPress solutions with focuses on security, speed optimization, plugin development and customization, and complex migrations.

Please reach out to start a conversation!

The 4 best WordPress hosts of 2016

As a seasoned WordPress developer, I am frequently asked what WordPress web hosts I recommend. There are so many solid choices now! The WordPress ecosystem is truly a bounty of choice in 2016. I could write an exhaustive comparison of all of the options, but these are called “exhaustive comparisons” for a reason. Let’s skip that, and I’ll just tell you the four WordPress hosts I recommend in five distinct tiers.

Note that many of these hosts target a range of sites, from starter sites to enterprise sites, so I am picking the hosts that I think fit each tier of site best, even though they might also work for other kinds of sites.

Starter Site

SiteGround is one of my favorite WordPress hosting companies. They offer a range of hosting solutions, but their WordPress-tailored plans are a tremendously good value and have many WordPress-specific perks. Ask around the WordPress community — SiteGround is a well-respected company that works hard to win and retain the business of WordPress customers. Their plans start as low as $3.95 a month, which is an incredibly good deal. If you aren’t sure what you need, SiteGround is what I would choose.

Take a look at SiteGround’s WordPress hosting plans.

Developer-Friendly Site

What if you know your way around WordPress, want things like Git and WP-CLI access, or want advanced WordPress-friendly caching for your site? SiteGround has you covered there, too. Their GoGeek plan (currently $14.95 a month) offers all of these perks, unlimited sites, WordPress staging sites, and so much more. I love working with GoGeek-level SiteGround sites, because they work really well and give me access to all the tools that I need as a developer. Or, if you’re not a developer, but have hired one to work on your site, you may want to upgrade to GoGeek hosting so she can work at maximum efficiency.

Go sign up for SiteGround’s GoGeek WordPress hosting.

Intermediate Site

WP Engine has been around since 2010, focuses entirely on WordPress hosting, and has established themself as a solid choice in the intermediate range. Their plans start at $29/month and include a 60-day money-back guarantee and free automated migration of your existing WordPress site. WP Engine also has more advanced hosting options, so they’re an option that could grow with you.

Sign up for WP Engine using this link and you’ll save 20% off your first payment.

Professional Site

Pantheon got their start as a Drupal host, but have taken their innovative container-based hosting technology to the WordPress market. As a developer, I appreciate their Git-based development flow, their powerful “Terminus” command line client, and their built-in and dead-simple dev/test/live environments. On the higher level plans, you get “Multidev” which lets you spin up a sandboxed development environment for a specific Git branch. This means you can send clients and testers URLs for testing new features in isolation, before they are merged back into the main code branch. Awesome.

Their professional tier starts at $100/month, which isn’t cheap, but your developers will love their deployment tools, their dev/test/live code staging flows, and their Git-based deploys to the development environment. Pantheon is a great choice for professional WordPress sites that have a developer on staff or on retainer.

Check out Pantheon’s professional WordPress hosting plans.

Enterprise Site

Pagely has been around since 2006! They started the whole WordPress-dedicated hosting marketplace. When they started, they targeted a range of WordPress sites, but now they focus on enterprise hosting. This is where big brands go for custom WordPress hosting solutions. The folks at Pagely know WordPress well, and will be an excellent hosting partner for your enterprise WordPress site. Their VPS solutions start at $499/month, but they also have a shared server plan called Neutrino for $99/month.

Get started with Pagely enterprise hosting.

How I Picked

My method here was simple. I thought about how I answer if a friend or a client asks me for hosting advice. I found that I regarded sites as fitting in one of five categories. Then, I considered which hosts offer the best service and value in those categories, and picked these four hosts. After I had made my picks and written about their benefits, I went to see which of my picks had affiliate programs. Three of them did, and one did not. I used affiliate links for those that offered them, and a direct link for the one that did not. Using affiliate links to sign up for their service will earn me some money, but you can of course just go directly to their sites if you like. I stand by these recommendations, either way. I’ll write a new post in 2017 with my new picks. Let me know on Twitter what hosts are your favorites, and why!

Tips for Hosting WordPress on Pantheon

Pantheon has long been hosting Drupal sites, and their entry into the WordPress hosting marketplace is quite welcome. For the most part, hosting WordPress sites on Pantheon is a dream for developers. Their command line tools and git-based development deployments, and automatic dev, test, live environments (with the ability to have multiple dev environments on some tiers) are powerful things. If you can justify the expense (and they’re not cheap), I would encourage you to check them out.

First, the good stuff:

Git-powered dev deployments

This is great. Just add their Git repo as a remote (you can still host your code on GitHub or Bitbucket or anywhere else you like), and deploying to dev is as simple as:

git push pantheon-dev master

Command-line deployment to test and live

Pantheon has a CLI tool called Terminus that can be used to issue commands to Pantheon (including giving you access to remote WP-CLI usage).

You can do stuff like deploy from dev to test:

terminus site deploy --site=YOURSITE --env=test --from=dev --cc

Or from test to live:

terminus site deploy --site=YOURSITE --env=live --from=test

Clear out Redis:

terminus site redis clear --site=YOURSITE --env=YOURENV

Clear out Varnish:

terminus site clear-caches --site=YOURSITE --env=YOURENV

Run WP-CLI commands:

terminus wp option get blogname --site=YOURSITE --env=YOURENV

Keep dev and test databases & uploads fresh

When you’re developing in dev or testing code in test before it goes to live, you’ll want to make sure things work with the latest live data. On Pantheon, you can just go to Workflow > Clone, and easily clone the database and uploads (called “files” on Pantheon) from live to test or dev, complete with rewriting of URLs as appropriate in the database.

No caching plugins

You can get rid of Batcache, W3 Total Cache, or WP Super Cache. You don’t need them. Pantheon caches pages outside of WordPress using Varnish. It just works (including invalidating URLs when you publish new content). But what if you want some control? Well, that’s easy. Just issue standard HTTP cache control headers, and Varnish will obey.

<?php

function my_pantheon_varnish_caching() {
	if ( is_user_logged_in() ) {
		return;
	}
	$age = false;

	// Home page: 30 minutes
	if ( is_home() && get_query_var( 'paged' ) < 2 ) {
		$age = 30;
	// Product pages: two hours
	} elseif ( function_exists( 'is_product' ) && is_product() ) {
		$age = 120;
	}

	if ( $age !== false ) {
		pantheon_varnish_max_age( $age );
	}
}

function pantheon_varnish_max_age( $minutes ) {
	$seconds = absint( $minutes ) * 60;
	header( 'Cache-Control: public, max-age=' . $seconds );
}

add_action( 'template_redirect', 'my_pantheon_varnish_caching' );

And now, some unclear stuff:

Special wp-config.php setup

Some things just aren’t very clear in Pantheon’s documentation, and using Redis for object caching is one of them. You’ll have to do a bit of work to set this up. First, you’ll want to download the wp-redis plugin and put its object-cache.php file into /wp-content/.

Update: apparently this next step is not needed!

Next, modify your wp-config.php with this:

// Redis
if ( isset( $_ENV['CACHE_HOST'] ) ) {
	$GLOBALS['redis_server'] = array(
		'host' => $_ENV['CACHE_HOST'],
		'port' => $_ENV['CACHE_PORT'],
		'auth' => $_ENV['CACHE_PASSWORD'],
	);
}

Boom. Now Redis is now automatically configured on all your environments!

Setting home and siteurl based on the HTTP Host header is also a nice trick for getting all your environments to play, but beware yes-www and no-www issues. So as to not break WordPress’ redirection between those variants, you should massage the Host to not be solidified as the one you don’t want:

// For non-www domains, remove leading www
$site_server = preg_replace( '#^www\.#', '', $_SERVER['HTTP_HOST'] );

// You're on your own for the yes-www version 🙂

// Set URLs
define( 'WP_HOME', 'http://'. $site_server );
define( 'WP_SITEURL', 'http://'. $site_server );

So, those environment variables are pretty cool, huh? There are more:

// Database
define( 'DB_NAME', $_ENV['DB_NAME'] );
define( 'DB_USER', $_ENV['DB_USER'] );
define( 'DB_PASSWORD', $_ENV['DB_PASSWORD'] );
define( 'DB_HOST', $_ENV['DB_HOST'] . ':' . $_ENV['DB_PORT'] );

// Keys
define( 'AUTH_KEY', $_ENV['AUTH_KEY'] );
define( 'SECURE_AUTH_KEY', $_ENV['SECURE_AUTH_KEY'] );
define( 'LOGGED_IN_KEY', $_ENV['LOGGED_IN_KEY'] );
define( 'NONCE_KEY', $_ENV['NONCE_KEY'] );

// Salts
define( 'AUTH_SALT', $_ENV['AUTH_SALT'] );
define( 'SECURE_AUTH_SALT', $_ENV['SECURE_AUTH_SALT'] );
define( 'LOGGED_IN_SALT', $_ENV['LOGGED_IN_SALT'] );
define( 'NONCE_SALT', $_ENV['NONCE_SALT'] );

That’s right — you don’t need to hardcode those values into your wp-config. Let Pantheon fill them in (appropriate for each environment) for you!

And now, some gotchas:

Lots of uploads = lots of problems

Pantheon has a distributed filesystem. This makes it trivial for them to scale your site up by adding more Linux containers. But their filesystem does not like directories with a lot of files. So, let’s consider the WordPress uploads folder. Usually this is partitioned by month. On Pantheon, if you start approaching 10,000 files in a directory, you’re going to have problems. Keep in mind that crops count towards this limit. So one upload with 9 crops is 10 files. 1000 uploads like that in a month and you’re in trouble. I would recommend splitting uploads by day instead, so the Pantheon filesystem isn’t strained. A plugin like this can help you do that.

Sometimes notices cause segfaults

I honestly don’t know what is going on here, but I’ve seen E_NOTICE errors cause PHP segfaults. Being segfaults, they produce no useful information in logs, and I’ve had to spend hours tracking down the code causing the issue. This happens reliably for given code paths, but I don’t have a reproducible example. It’s just weird. I have a ticket open with Pantheon about this. It’s something in their custom error handling. Until they get this fixed, I suggest doing something like this, in the first line of wp-config.php:

// Disable Pantheon's error handler, which causes segfaults
function disable_pantheon_error_handler() {
	// Does nothing
}

if ( isset( $_ENV['PANTHEON_ENVIRONMENT'] ) ) {
	set_error_handler( 'disable_pantheon_error_handler' );
}

This just sets a low level error handler that stops errors from bubbling up to PHP core, where the trouble likely lies. You can still use something like Debug Bar to show errors, or you could modify that blank error handler to write out to an error log file.

Have your own tips?

Do you have any tips for hosting WordPress on Pantheon? Let me know in the comments!

Ask Mark Anything

People ask me a lot of questions. About WordPress and web development for sure, but also about other topics. I’ve decided to try a little experiment: a public way to ask me questions. Zach Holman from GitHub had the idea to use a GitHub issue tracker for this very purpose, and I think it looks like a splendid idea.

Benefits:

  • Allows for more in-depth discussions than Twitter (but you can still talk to me on Twitter for quick questions).
  • Is public (as opposed to e-mail).
  • Forces me to deal with questions.

Now, note that this doesn’t mean I want you to treat me like your personal Google-searcher or WordPress code grepper! But if you think there is a WordPress (or other) topic that I am uniquely qualified to address, just ask.

Fragment Caching in WordPress

Fragment caching is useful for caching HTML snippets that are expensive to generate and exist in multiple places on your site. It’s like full page HTML caching, but more granular, and it speeds up dynamic views.

I’ve been using this fragment caching class for a few years now. I optimized it around ease of implementation. I wanted, as much as possible, to be able to identify a slow HTML-outputting block of code, and just wrap this code around it without having to refactor anything about the code inside.


<?php
/*
Usage:
$frag = new CWS_Fragment_Cache( 'unique-key', 3600 ); // Second param is TTL
if ( !$frag->output() ) { // NOTE, testing for a return of false
functions_that_do_stuff_live();
these_should_echo();
// IMPORTANT
$frag->store();
// YOU CANNOT FORGET THIS. If you do, the site will break.
}
*/
class CWS_Fragment_Cache {
const GROUP = 'cws-fragments';
var $key;
var $ttl;
public function __construct( $key, $ttl ) {
$this->key = $key;
$this->ttl = $ttl;
}
public function output() {
$output = wp_cache_get( $this->key, self::GROUP );
if ( !empty( $output ) ) {
// It was in the cache
echo $output;
return true;
} else {
ob_start();
return false;
}
}
public function store() {
$output = ob_get_flush(); // Flushes the buffers
wp_cache_add( $this->key, $output, self::GROUP, $this->ttl );
}
}

view raw

gistfile1.aw

hosted with ❤ by GitHub

Implementation is pretty easy, and you can reference the comment at the start of the code for that. The only thing to consider is that any variables that alter the output need to be build into the key. It should also be noted that this code assumes you have a persistent object cache backend.

Translating WordPress Plugins and Themes: Don’t Get Clever

When you use the WordPress translation functions to make your plugin or theme translatable, you pass in a text domain as a second parameter, like so:

<?php _e( 'Some Text String', 'my-plugin-name' ); ?>

This text domain is just a unique string (usually your plugin’s WordPress.org repository slug). Well, many plugin developers see code like this:

<?php _e( 'Another Text String', 'my-plugin-name' ); ?>
<?php _e( 'Yet Another Text String', 'my-plugin-name' ); ?>
<?php _e( 'Gosh, So Many Text Strings!', 'my-plugin-name' ); ?>

And they think to themselves “hm, I sure am typing the 'my-plugin-name' string a lot. I’ll apply the DRY (Don’t Repeat Yourself) principle and throw that string into a variable or a constant!”

Stop! You’re being too clever! That won’t work!*

See, PHP isn’t the only thing that needs to parse out your translatable strings. GNU gettext also needs to parse out the strings in order to provide your blank translation file to your translators. GNU gettext is not a PHP parser. It can’t read variables or constants. It only reads strings. So your text domain strings needs to stay hardcoded as actual quoted strings. 'my-plugin-name'.

Happy coding!

* Well, it won’t break your plugin, but it could make it harder to be used with automated translation tools. And trust me, you don’t want to be manually managing your translation files… we have a better solution coming.

How to write a WordPress plugin that I’ll use

I tend to be very fastidious about the WordPress plugins that I’ll install. I’ll often write my own simple version of a plugin rather than install one from someone else that does a bunch of stuff I don’t need. Here is my philosophy behind writing WordPress plugins, best witnessed through the plugins I’ve written lately, like Markdown on Save, Login Logo, Monitor Pages, and WP Help.

Fewer features as a feature

There are diminishing returns as you add features. That is, the more you add, the more likely you’re adding something that X % of your plugin’s users won’t ever use. Stick to the basics. I’ll often release a “0.1” version of my plugin with really obvious features missing. When I get a flurry of “You should add Y!” messages, that validates my assumption that Y is necessary. Start with the smallest version that gets the core job done. Iterate as needed.

Code the hell out of it

The best part of starting small is that you can code the hell out of the plugin. Do it right. Make each line of code beautiful. Make sure you’re using WordPress APIs properly, and while you’re at it, add i18n support (WP Help 0.2 shipped with support for Bulgarian, German, Spanish, Mexican Spanish, Macedonian, Dutch, Brazilian Portuguese, and Russian!)

Reduce UI

If you can do without UI, don’t make it. Make every bit of UI prove its necessity. As an example, look at my Login Logo plugin. It has zero UI. It looks for the presence of a file named login-logo.png in the wp-content directory. The rest is “magic.” It measures the image, generates appropriate CSS, and gives you an instantly and easily customized login screen. The plugin is invisible. It’s completely out of sight, and out of mind. Finally, UI screens are generally where plugin authors make security mistakes. By skipping them, you make it much more likely that your plugin is secure.

Code it for the future

Don’t use deprecated APIs. Plan features in future-forward ways. Implement it in such a way that a site that is using the plugin doesn’t break if the plugin suddenly goes away. One example of this is my Markdown on Save plugin, which offers per-post Markdown formatting. First, I decided that for performance reasons, I wanted to parse Markdown then the post was updated, not on display. The obvious place to store the generated HTML was in the post_content_filtered column that WordPress provides (but does not use). But then I considered what would happen if someone deactivated the plugin or deleted the plugin. The code that accessed post_content_filtered would not work. Their blog would spit out raw Markdown. And any exports they made would export raw Markdown. What if they were exporting to WordPress.com which doesn’t support Markdown? So I decided to store the Markdown in post_content_filtered, and store the generated HTML in post_content. When you edit a Markdown-formatted post, it swaps in the Markdown, so you can edit that. But if you deactivated the plugin, it would fall back to the HTML. So you can feel free to use this plugin and know that if one day you wake up and you hate Markdown, all you have to do is deactivate the plugin and all of your posts are back to HTML.

Secure it

Writing secure WordPress plugins isn’t hard. It just takes awareness. Take the time to do your research and code a plugin that will be an asset to its users, not a liability.

Developing on WordPress using Git

WordPress uses Subversion (SVN) for revision management. Before Subversion, it used CVS. Right now, Git is a hot option in the SCM category. It offers really nice features such as decentralization, speed, fast and cheap local branching, better merging, more offline capabilities, staging of commits, and lots more. It’s premature to talk about moving WordPress core and plugins to another SCM system — we have a lot invested with Subversion and Trac. But be of good cheer. You can have your Git and commit to Subversion too! Here’s how I do it.

First, tools. You’ll need Git, obviously. But you’ll also need git-svn-diff, a Bash script that generates Subversion-compatible diffs.

Download git-svn-diff, put it somewhere in your path, and make it executable. Like this:

curl -L http://rkj.me/a1 > /usr/local/bin/git-svn-diff
sudo chmod +x /usr/local/bin/git-svn-diff

Next, to enable you to do git svn-diff instead of git-svn-diff, edit ~/.gitconfig and add this:

[alias]
	svn-diff = !git-svn-diff

This next step is going to take a while. You’re going to pull down WordPress’ SVN history using Git’s SVN support.

git svn clone -t tags -b branches -T trunk http://core.svn.wordpress.org/

You might want to let that run overnight. Really. It’s going to go through each changeset.

Once you’re done, you should be in the Git master branch, which corresponds to WordPress SVN’s trunk. WordPress’ branches are in remotes/{name}

To pull in the latest changes from SVN, use git svn rebase. Important rule: never modify the SVN branches (remotes/{name}). Instead, create a new topic branch.

For example, say that I’m going to work on a ticket for trunk. I’d create a new branch from remotes/trunk like this:

git checkout -b ticket-12345 remotes/trunk

That will create a new local Git branch called ticket-12345 based on SVN’s trunk, and then check it out (i.e. switch to it).

If you’re working on a WordPress SVN branch, you can do something like this:

git checkout -b ticket-12345 remotes/3.1

Do your work in the branch you created. You can make multiple local Git commits if you want, to break up your work into smaller chunks that make sense to you.

When you’re ready to submit your patch, use git-svn-diff to produce it.

git svn-diff > ~/12345.diff

If you have commit access, you can commit to Subversion from this topic branch. But be careful! First you should do git svn rebase to bring your patch up to date. Next, you should squash your local git commits, otherwise each one of them will be individually committed to SVN (hello, flood). So rebase your commits into one commit, like so:

git rebase -i remotes/trunk

Use “reword” on the first commit. Use “fixup” on the subsequent ones. That will roll the commits up into one. You’ll then be prompted to enter your amended commit message for that commit amalgam.

Ready? You can now commit to SVN using:

git svn dcommit

Git knows which remote SVN branch it came from when you checked out your topic branch. You can verify which one it is attached to by doing:

git svn info

A few tips:

Create a .gitignore file. This lists files or directories that you want Git to ignore. First, you want Git to ignore the .gitignore file itself! Next, you want Git to ignore your local wp-config.php Finally, you want to ignore any additional plugins, must-use plugins, themes, uploads, etc. Just do a git status and add anything that you don’t want to commit to WordPress or put in your patches.

I hope you found this helpful! Let me know if you have any questions.

Just you and your thoughts

In 2007, I wrote this about the job of software:

That’s when I know WordPress is doing its job: when people aren’t even aware they’re using it because they’re so busy using it!

I cited that more as a direction, than a goal. If the job of software is to get out of the way, it never completely reaches it — it just gets closer and closer. Sort of how dividing a number in half an infinite number of times never quite gets you to zero.

Today, in 2011, I took this screenshot of the Distraction-Free Writing interface for the upcoming WordPress 3.2:

screenshot of WordPress Distraction Free Writing interface. A title, and a body.

How’s that for getting out of your way?

WordPress Q & A: Week of September 27

Ricky asks:

Thanks for your time. I’m working on a site where I’d like members to be able to submit posts, but I’d like to be able to moderate them first before they go live.

Kinda similar to what WP can do for comments, I’d like to do for posts. Is that possible?

Certainly! What you want is to open up registration and make the default role for new users “Contributor” instead of “Subscriber.” Contributors can submit posts for review, but not publish them. They’ll show up as “pending review” in the backend, and will require an Editor or Administrator to publish them. There are even plugins available to facilitate posting from the front end, such as Gravity Forms ($39 and up, GPL).

Allan asks:

I have hit an incredibly frustrating hitch with WP, and that is getting a text file with the content of my posts. I need a single file I can load into page layout software. I know about Blog Booker and Blurb but would like more layout control than those services offer.

If you need this for a bunch of posts on an ongoing basis, I’d create a custom page template and just have it do query_posts('posts_per_page=9999'); (or however many posts you want), and then do a basic loop. Look at a simple theme for inspiration on the template tags… it all depends on how you need it formatted.

Import a Vox blog into WordPress (or almost anything else)

Six Apart is closing the doors on Vox, a blogging service they launched three and a half years ago. You have until September 30th to export your content from Vox, or you’ll lose access to it. Yikes!

They helpfully included a link to WordPress.com’s importer help page. WordPress.com has a Vox importer. What isn’t immediately obvious is that you can use WordPress.com as an intermediary on your way to a final destination. That is, you can import your Vox blog to a temporary WordPress.com blog, and then do an export from WordPress.com. Now you’ll have gold: a WordPress export file. You can take this file and import it into a standalone WordPress site, or a plethora of other blogging tools or services.

I recommend that everyone who has Vox content they want to save do this. Mark your WordPress.com blog as private if you don’t want that to be its final destination — just do it (and soon!) so that you have a copy of your site in a useful and portable format.

APC Object Cache Backend for WordPress

I wrote the original APC Object Cache backend for WordPress back in 2006. Shared it via a link on the wp-hackers list, and until a few days ago, hadn’t touched it since.

It has now been updated to version 2.0.1, and should work more efficiently. It supports increment/decrement, and you can now use it to power Batcache, the whole-page caching engine used on WordPress.com! In fact, for single-server sites, it should perform a lot better than Batcache + Memcached (because Memcached is a separate application and connections from PHP have some latency).

If you have no idea what I’m talking about, read on.

WordPress has a built-in object caching API. It is usually used to store complex data objects or HTML structures which are computationally expensive to create on the fly. By default, the engine is implemented in PHP memory. That means that objects don’t persist in-between requests. The only benefit there is if you’re accessing the same objects multiple times on one request.

Enter persistent object cache backends. These backends store the data objects between page loads, which saves your server a lot of time and speeds up the user experience. APC is both an opcode cache (which speeds up PHP in general) and an in-memory key/value store which persists between page loads. If you have it installed (or can install it), I highly recommend that you do so. Unfortunately this probably won’t be available to anyone on shared hosting — VPS/dedicated only. Once you have APC up and running, as confirmed by phpinfo();, install the APC Object Cache in your WordPress content directory. It is a special kind of plugin, and will not work if placed anywhere else. It also needs to keep its name: object-cache.php

You should notice improved page load times! I’ve seen it improve page generation time 10x.

And what about Batcache? Batcache is a whole-page or “HTML output” caching engine. It stores complete web pages and saves them for later. It needs a persistent object cache backend to function. Furthermore, it needs an object cache backend that supports incrementation (counters, to measure how much traffic each URL is getting), and that does its own object expiration. Memcached has fit the bill, and is the preferred solution for multi-web-server WordPress installs. Now with the new version of the APC Object Cache, there is a preferred single server solution as well.

Note: W3 Total Cache users need not apply — it handles its own object caching, including support for APC.

Why WordPress Themes are Derivative of WordPress

In the past few days, WordPress has become entangled in a debate about WordPress theme licensing. It was specifically centered around Thesis, one of the last notable proprietary theme holdouts. Chris Pearson, who develops and sells Thesis, refuses to license Thesis under the GNU General Public License that applies to WordPress and all WordPress-derived code.

There are many aspects to the ongoing WordPress vs. Thesis brouhaha. Things like respect, copyright, freedom, GPL, ownership. There is no doubt that Thesis lifted lines of code, and even whole sections of code, directly from WordPress. The Thesis developer who did some of the lifting confessed to it (much to his credit).

Some people have changed their opinion on the matter when they learned of that wholesale code copying. That is, they changed their mind about Thesis specifically. But those revelations happened later in the fray, after Chris Pearson and Matt Mullenweg’s discussion on Mixergy about the matter. It is the position of the WordPress core developers that themes cannot be considered wholly original creations even when they don’t copy large sections of code in from WordPress. Theme code necessarily derives from WordPress and thus must be licensed under the GPL if it is distributed.

What is a WordPress theme?

WordPress themes are a collection of PHP files that are loaded together with WordPress and use WordPress functions and access WordPress core data in order to deliver HTML output. A theme may (and almost always does) include CSS files, JavaScript files, and image files. Note that WordPress theme PHP files are not “templates” or “documents” in the way that most people think of those words (though the word “template” is sometimes used, it is not strictly accurate). They are PHP script files that are parsed and run on the same exact level and by the same PHP process as all the core WordPress files.

Is a theme separate from WordPress?

There is a tendency to think that there are two things: WordPress, and the active theme. But they do not run separately. They run as one cohesive unit. They don’t even run in a sequential order. WordPress starts up, WordPress tells the theme to run its functions and register its hooks and filters, then WordPress runs some queries, then WordPress calls the appropriate theme PHP file, and then the theme hooks into the queried WordPress data and uses WordPress functions to display it, and then WordPress shuts down and finishes the request. On that simple view, it looks like a multi-layered sandwich. But the integration is even more amalgamated than the sandwich analogy suggests.

Here is one important takeaway: themes interact with WordPress (and WordPress with themes) the exact same way that WordPress interacts with itself. Give that a second read, and then we’ll digest.

The same core WordPress functions that themes use are used by WordPress itself. The same action/filter hook system that themes use is used by WordPress itself. Themes can thus disable core WordPress functionality, or modify WordPress core data. Not just take WordPress’ ultimate output and change it, but actually reach into the internals of WordPress and change those values before WordPress is finished working with them. If you were thinking that theme code is a separate work because it is contained in a separate file, also consider that many core WordPress files work the same way. They define functions, they use the WordPress hook system to insert themselves at various places in the code, they perform various functions on their own but also interact with the rest of WordPress, etc. No one would argue that these core files don’t have to be licensed under the GPL — but they operate in the same way that themes do!

It isn’t correct to think of WordPress and a theme as separate entities. As far as the code is concerned, they form one functional unit. The theme code doesn’t sit “on top of” WordPress. It is within it, in multiple different places, with multiple interdependencies. This forms a web of shared data structures and code all contained within a shared memory space. If you followed the code execution for Thesis as it jumped between WordPress core code and Thesis-specific code, you’d get a headache, because you’d be jumping back and forth literally hundreds of times. But that is an artificial distinction that you’d only be aware of based on which file contained a particular function. To the PHP parser, it is all one and the same. There isn’t WordPress core code and theme code. There is merely the resulting product, which parses as one code entity.

But it’s still in separate files!

I don’t think that matters. Linux Kernel Modules likewise are in separate files, but they’re considered by most to be derivative. If that argument doesn’t convince you, then note that the vast majority of themes derive from the original WordPress core themes. How they load different PHP subfiles, loop through posts, and get and interact with WordPress data is all covered by the original WordPress core themes, which are explicitly GPL. But I don’t think we need to resort to that argument, because of the way that themes combine with WordPress to form a modified work.

On APIs

WordPress has many external APIs that spit out data. Interacting with these APIs does not put your code on the same level as core WordPress code. These APIs include Atom, RSS, AtomPub, and XML-RPC. Something that interacts with these APIs sits entirely outside of WordPress. Google Reader doesn’t become part of WordPress by accessing your feed, and MarsEdit doesn’t become part of WordPress when you use it to publish a post on your WordPress blog. These are separate applications, running separately, on separate codebases. All they are doing is communicating. Applications that interact with WordPress this way are separate works, and the author can license them in any way they have authority to do so.

This is a wholly different model of interaction than with themes. Themes are not standalone applications. They are scripts that become part of WordPress itself, and interact with WordPress on the same level that WordPress interacts with itself.

Not just our opinion

Drupal and Joomla, both GPL PHP web publishing applications, have come to the same conclusion. To quote from Drupal:

Drupal modules and themes are a derivative work of Drupal. If you distribute them, you must do so under the terms of the GPL version 2 or later.

The Software Freedom Law Center did a specific analysis of the WordPress theme system and determined that WordPress theme code must inherit the GPL.

The PHP elements, taken together, are clearly derivative of WordPress code. The template is loaded via the include() function. Its contents are combined with the WordPress code in memory to be processed by PHP along with (and completely indistinguishable from) the rest of WordPress. The PHP code consists largely of calls to WordPress functions and sparse, minimal logic to control which WordPress functions are accessed and how many times they will be called. They are derivative of WordPress because every part of them is determined by the content of the WordPress functions they call. As works of authorship, they are designed only to be combined with WordPress into a larger work.

This is also the interpretation of the Free Software Foundation, which wrote the GPL. Here is what the FSF has to say in the GPL v2 FAQ. They wrote the license, so you should consider carefully what they have to say when trying to determine the spirit of the GPL:

If the modules are included in the same executable file, they are definitely combined in one program. If modules are designed to run linked together in a shared address space, that almost surely means combining them into one program.

The bolded part is how themes work in WordPress. The fact that themes don’t come bundled with WordPress is merely a result of WordPress’ flexible plugin/theme architecture and how non-compiled scripting languages like PHP work. The PHP opcode is compiled on the fly, so you can distribute portions of a WP+Plugins+Theme application piece by piece, and they’ll dynamically combine into one application.

FAQs

What about making money?

The GPL does not prohibit developers from charging for their code! The overwhelming majority of premium theme developers sell their themes under the GPL (and are making quite a lot of money doing it).

My JS/CSS/Images are 100% original. Do they have to be GPL?

No, they don’t. If they aren’t based on GPL’d JavaScript, CSS, or images, you are not forced to make them GPL. What you could do is offer a theme under a split license. The PHP would be under the GPL, but other static resources could be under some other license.

Won’t people just take my theme without paying for it?

You don’t have to offer it as an open download to everyone. You can put it behind a  pay wall. With regards to people getting it from a friend, or a torrent site, etc — this would happen even with a proprietary license. Do a Google search for “{proprietary theme name} torrent download” and you’ll see what I mean. What people are not able to get without paying is your support. This is a huge part of why people pay for premium themes — they have a knowledgable resource to call on if something goes wrong. That’s something that is irreplaceable, because no one knows the theme better than you do!

What about a developer license?

Theme PHP code must be GPL. You could do a split license on the CSS and JS and other non-GPL static resources. You can license those elements to one site, but let people with a developer license use them on multiple sites. Or, you could have an affiliate program that is only open to people with a developer license. You could seed betas to developers early. You could offer an exclusive developer forum. You could offer enhanced support. There are tons of possible models. The only thing you can’t do is restrict how people can use the GPL-derivative code.

Does this mean that the custom theme I developed for a client is GPL?

No. The GPL is triggered by distribution. Work-for-hire for a client is not distribution. In this case, they would have the copyright on the code. Distributing it would be up to them. As long as they didn’t distribute it, the GPL wouldn’t kick in. Your clients needn’t worry.

Can my employees take my custom theme and distribute it?

No, allowing access to GPL’d code by an employee is not distribution, and the company would have the copyright on the code. A company is considered one entity, so transferring the theme within that entity is not distribution.

What about plugins?

Everything in here applies to plugins as well as themes. The only difference between the two is that you generally have one theme active, but might have multiple plugins active. As far as interactions with WordPress go, they work the same way.

Conclusion

Theme code combines with WordPress code in a way that makes them one functional unit. This is what makes WordPress themes so powerful and flexible. Themes cannot stand by themselves, and are dependent on deeply integrating their code with WordPress core code in order to function at all. They cannot reasonably be considered original works. As such, theme PHP code (and any CSS/JS code that is derived from GPL’d code) must be licensed under the GPL if it is distributed.

If you refuse to abide by that licensing requirement, you forfeit your right to distribute WordPress extensions (themes or plugins). WordPress is copyrighted. If you do not accept WordPress’ license, you still have to respect our copyright. There are other platforms that do not have this licensing restriction (Drupal, Joomla, Movable Type Open Source, and Textpattern are all GPL, so you’d have the same issue with them).

Discussion

Please remember that this is about copyright and respecting the license that the WordPress copyright holders have chosen. It isn’t about money. Premium themes are fine. It is non-GPL themes (a.k.a. proprietary themes) that are the issue. Also realize that WordPress cannot change its license. It is forever locked to the GPL (version 2). Arguments along the lines of “WordPress should allow proprietary themes because of X,” are pointless. We are as much bound by the license as theme developers!