Let’s Talk Cron

We’ve all heard about it, and we’ve all had to deal with it: missed posts! It’s especially painful if you depend on your post content going out on time, or worse, if you’re scheduling your posts to a social media outlet of some sort! In this post I hope to show you how WP_Cron works, and how it’s kind of weird! Warning: this gets kinda devvy…brace yourself!

What exactly is WP_Cron?

WP_Cron is not a ‘cron’ at all! A cron is a time-based Unix scheduling system that runs autonomously: you set a scheduled time and a command, then leave it be, because the system handles it from there. WP_Cron, by contrast, only gets a chance to run when someone requests a page on your site. (If you’ve come here from a search engine, it’s likely that you already know WordPress’ WP_Cron is not a cron.)

So let’s start digging, shall we?

So how does it work?

This is the part most don’t go into detail about. You’re usually left wanting more, or at least I was, and I really hate feeling helpless hanging on someone else’s word that WP_Cron is just crap and I should install WP Missed Schedule to fix it. I, for one, dislike the use of plugins, because as a developer I want to know what’s wrong, why, and how to fix it. Using plugins takes away from the fun of it; granted, time is a deciding factor, but if it were a perfect world I would have infinite time.

Now I’m not going to go into how to use WP_Cron for scheduled events, but Chris has a great article titled, Using wp_cron (or “How to cron a cron that’s not really a cron”) which is where you should start if you want to learn how to use cron.

WP_Cron’s life begins in wp-settings.php, where wp-includes/cron.php is included. That settings file is in turn included by your wp-config.php to load all the stuffs! The cron.php file is the main cron API, and is where you’ll find ALL the functions you need to manipulate schedules. The main things we’ll look at here are the spawn_cron() and wp_cron() functions, because these two are where all the magic happens!

The main wp_cron() function is the calling function for spawn_cron(). It is set up to fire on init, on every page load, via an add_action() call in wp-includes/default-filters.php, which is loaded via wp-settings.php as well! We need to break down wp_cron() now. So, let’s do that!

Breaking down wp_cron()

To follow along, you’ll need to review the core code on trac. You’ll see that wp_cron() first checks to make sure the current request is not for wp-cron.php itself, and goes on to ensure the constant DISABLE_WP_CRON isn’t true. We know that right here, if we set the constant to true in our wp-config.php file, wp_cron() bails and never spawns anything. That’s something important to keep in mind.
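Here is a minimal standalone sketch of that first guard, for illustration only. The function name should_skip_wp_cron() and its parameters are made up for this example; in core the check reads the request URI and the DISABLE_WP_CRON constant directly.

```php
<?php
// Hypothetical helper mirroring the guard described above; not a core function.
function should_skip_wp_cron(string $requestUri, bool $cronDisabled): bool
{
    // Bail when wp-cron.php itself is being requested, to avoid a loop.
    if (strpos($requestUri, '/wp-cron.php') !== false) {
        return true;
    }

    // Bail when DISABLE_WP_CRON has been set to true in wp-config.php.
    return $cronDisabled;
}

var_dump(should_skip_wp_cron('/wp-cron.php?doing_wp_cron=1', false)); // bool(true)
var_dump(should_skip_wp_cron('/hello-world/', false));                // bool(false)
var_dump(should_skip_wp_cron('/hello-world/', true));                 // bool(true)
```

The constant itself is a one-line define( 'DISABLE_WP_CRON', true ); in wp-config.php.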

After that check is complete, we see that it gets the cron array via _get_cron_array(), which does a get_option() call for the ‘cron’ option name. It then loops over the entries, but only if the first key is due to be processed; otherwise it skips the entire process. To decide this, it uses PHP’s microtime( true ). Passing true makes microtime() return the current Unix time as a float, accurate down to the microsecond. To continue to understand this process, you’ll need an example of what the array looks like.

The cron option in the options table is a serialized array, so you will need to get the data yourself (or you can just use mine as an example). To get the data yourself, you’ll need to run this query in your database (assuming the default wp_ table prefix):

  select option_value from wp_options where option_name = 'cron'

You can un-serialize it online if you don’t feel like writing a quick PHP script to do it for you. I like to use functions-online for quick things like this.
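If you would rather write that quick PHP script, a sketch could look like the following. It is standalone, so it builds a stand-in value with serialize() instead of reading the database; on a real site you would paste the option_value returned by the query into $optionValue.

```php
<?php
// Stand-in for the option_value column. On a real site, replace this with
// the string returned by the query.
$optionValue = serialize(array(
    1437341283 => array(
        'wp_scheduled_delete' => array(
            '40cd750bba9870f18aada2478b24840a' => array(
                'schedule' => 'daily',
                'args'     => array(),
                'interval' => 86400,
            ),
        ),
    ),
));

// Turn the serialized string back into a PHP array. Disallowing classes is
// a precaution when un-serializing data you did not create yourself.
$crons = unserialize($optionValue, array('allowed_classes' => false));

// Print the array in readable PHP syntax.
var_export($crons);
echo PHP_EOL;
```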

Now that we have our un-serialized array, chances are it looks something like this (though maybe a bit longer):

array ( 
  1437341283 => array (
    'wp_scheduled_delete' => array (
      '40cd750bba9870f18aada2478b24840a' => array (
        'schedule' => 'daily',
        'args' => array (),
        'interval' => 86400,
      ),
    ),
  ),
  // Maybe more ....
)

The first check is run against the first array key. Cron options are stored in a keyed manner; that first key IS the Unix time stamp of the next event. PHP arrays keep their keys in the order they were stored, and WordPress sorts the array by key from lowest to highest whenever it schedules an event, so this works out perfectly. The check sees if the time stamp (the key) is due for execution by comparing it against that microtime() value I spoke of before. If that time has not passed yet, it stops right here. Since this is fired on init every time, we know we’ll hit the time eventually!
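As a rough illustration of that first-key check, here is a standalone sketch. The helper first_event_is_due() is a made-up name, and the ksort() call only stands in for the sorting WordPress does when it saves the schedule.

```php
<?php
// Hypothetical helper: true when the earliest scheduled time stamp has passed.
function first_event_is_due(array $crons, float $now): bool
{
    $keys = array_keys($crons);

    // Nothing scheduled at all, or the earliest event is still in the future.
    if (!isset($keys[0]) || $keys[0] > $now) {
        return false;
    }

    return true;
}

// Two events stored out of order, then sorted so the earliest comes first.
$crons = array(
    1437427683 => array('wp_version_check' => array()),
    1437341283 => array('wp_scheduled_delete' => array()),
);
ksort($crons, SORT_NUMERIC);

var_dump(first_event_is_due($crons, 1437341282.5)); // bool(false)
var_dump(first_event_is_due($crons, 1437341284.0)); // bool(true)
```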

At this point we’ve passed the checks, and we know that something is due to fire. So, WordPress starts to loop over the time stamps with a foreach loop. The loop first checks whether the time stamp it’s currently on is greater than the microtime() value that was set previously; if so, it breaks out of the loop, which can only happen once we’re past the first event. WordPress then loops over each of the hooks in the current time stamp. In this case there is only one, wp_scheduled_delete. At this point, aside from checking for callbacks, it doesn’t do much with the hook, but just an FYI, the hook name will match the add_action() in your own plugin. This is where spawn_cron() comes into play: it gets called for the first hook that passes, and both loops end. So, now on to spawn_cron()!
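The loop described above can be sketched in plain PHP like this. It is a simplification under my own assumptions: first_due_hook() is a hypothetical name, the callback check is left out, and instead of calling spawn_cron() it just returns the hook that would have triggered it.

```php
<?php
// Hypothetical helper: returns the first hook that is due, or null if none is.
function first_due_hook(array $crons, float $now): ?string
{
    foreach ($crons as $timestamp => $hooks) {
        // The array is sorted, so everything from here on is in the future.
        if ($timestamp > $now) {
            break;
        }

        foreach ((array) $hooks as $hook => $events) {
            // This is the point where spawn_cron() would be called,
            // after which both loops are left.
            return (string) $hook;
        }
    }

    return null;
}

$crons = array(
    1437341283 => array('wp_scheduled_delete' => array()),
    1437427683 => array('wp_version_check' => array()),
);

var_dump(first_due_hook($crons, 1437341300.0)); // string(19) "wp_scheduled_delete"
var_dump(first_due_hook($crons, 1437000000.0)); // NULL
```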

Breaking down spawn_cron()

To follow this section please look at core where you can see the source code. This may be a bumpy ride, so fasten your seat belts! spawn_cron() does a couple of things. It first makes sure the microtime() value is set for use later, then checks for the DOING_CRON constant or the doing_wp_cron URL variable and bails if either is present. This is to ensure we’re not in a loop again. Once we pass that stuff, we see that there’s a sort of locking mechanism in use via transients (more on that in a bit). It checks whether the lock time stamp is more than 10 minutes ahead of the current time. If so, it resets the lock to zero.

spawn_cron() then makes a second check, comparing the transient lock time plus sixty seconds (the default WP_CRON_LOCK_TIMEOUT) against the microtime() value I spoke about earlier. If the lock was set less than sixty seconds ago, another cron process is probably still running, so it will bail. This is so one cron run is not trying to run over top of another.
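The two lock checks can be sketched as a small standalone function. The name may_spawn_cron() and the constants are mine; the sixty-second value reflects the default lock timeout as I understand it, so treat the numbers as assumptions.

```php
<?php
const LOCK_TIMEOUT = 60;  // assumed default lock timeout, in seconds
const TEN_MINUTES  = 600; // ten minutes, in seconds

// Hypothetical helper: may a new cron run be spawned, given the stored lock?
function may_spawn_cron(float $lock, float $now): bool
{
    // A lock more than ten minutes in the future cannot be right; reset it.
    if ($lock > $now + TEN_MINUTES) {
        $lock = 0.0;
    }

    // Bail if the lock was set less than LOCK_TIMEOUT seconds ago.
    if ($lock + LOCK_TIMEOUT > $now) {
        return false;
    }

    return true;
}

$now = 1437341283.0;
var_dump(may_spawn_cron(0.0, $now));         // bool(true), no lock set
var_dump(may_spawn_cron($now - 30.0, $now)); // bool(false), locked 30s ago
var_dump(may_spawn_cron($now - 61.0, $now)); // bool(true), lock has expired
```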

At this point, WordPress grabs the cron option from the database again (yes, a second call), makes sure it’s an array, and proceeds to do pretty much the exact check that wp_cron() does already, making sure the first time stamp is due for execution. If it’s not, it will bail. If you’ve enabled ALTERNATE_WP_CRON it will come into play now. From what I can tell, all the alternate cron does is check a few extra items: if $_POST isn’t empty, or if an AJAX or XML-RPC call is in progress, the process will bail. Otherwise alternate cron will redirect you to the URL that was originally requested, and include wp-cron.php.

If you’ve not set up alternate cron, these additional checks won’t happen, and we can continue. As you can see both inside the alternate cron branch and outside of it, the lock WordPress uses is purely the doing_cron transient set to the current time stamp, taken from the microtime() value that was set prior. This code here: $doing_wp_cron = sprintf( '%.22F', $gmt_time ); formats that value as a string with 22 decimal places, so the lock is very specific, and set_transient() then stores it. When this function is called again within the lock window, the previous checks catch it, so it’s not run on top of itself.
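To see what that sprintf() call produces, here is a tiny standalone example. The wrapper make_cron_lock() is a hypothetical name used only for this illustration.

```php
<?php
// Hypothetical helper: format a microtime(true) float the same way the
// quoted sprintf() call does, with 22 digits after the decimal point.
function make_cron_lock(float $gmtTime): string
{
    return sprintf('%.22F', $gmtTime);
}

// Prints the current time stamp as a long decimal string.
$lock = make_cron_lock(microtime(true));
echo $lock, PHP_EOL;
```

Because the value is stored and passed around as a string, two cron runs spawned at even slightly different moments end up with different lock values.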

At this point we FINALLY get to look at wp-cron.php. We’ve completed all previous checks and balances, and WordPress now makes a non-blocking remote POST to itself via wp_remote_post(), specifically to the wp-cron.php file! The main thing you’ll want to take note of is the url key of the request: it carries a URL parameter named doing_wp_cron. (This is our transient time stamp, remember?)
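A rough, WordPress-free sketch of what such a request description might look like is below. The function build_cron_request(), the example.com URL, and the exact timeout value are assumptions for illustration; core builds the URL with its own helpers and sends it with wp_remote_post().

```php
<?php
// Hypothetical helper: describe the loopback request to wp-cron.php.
function build_cron_request(string $siteUrl, string $lock): array
{
    $query = http_build_query(array('doing_wp_cron' => $lock));

    return array(
        // The lock travels along as the doing_wp_cron URL parameter.
        'url'  => rtrim($siteUrl, '/') . '/wp-cron.php?' . $query,
        'key'  => $lock,
        'args' => array(
            'timeout'  => 0.01,  // assumed: give up waiting almost immediately
            'blocking' => false, // do not wait for wp-cron.php to finish
        ),
    );
}

$request = build_cron_request('https://example.com/', sprintf('%.22F', 1437341283.5));
echo $request['url'], PHP_EOL;
```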

wp-cron.php – Final Destination

This is the meat and potatoes of WP_Cron. As you can tell from the previous sections, this file is the final point for WP_Cron! To follow from here out, you will need to see the wp-cron.php file (or you can open it locally, of course).

First, you’ll see this nifty little gadget called ignore_user_abort(), which allows the script to keep running even if the client disconnects (closing the browser, for example). The second check is again to make sure we don’t end up in a loop, similar to previous checks. After that it sets the DOING_CRON constant and loads the WordPress core files. You’ll also see a function called _get_cron_lock(). This will get our lock that was set in the previous sections.

At this point, WordPress AGAIN grabs the cron array from the database, but this time, if for some reason it fails, the current script dies! To be clear, up to this point we’ve made two requests for this data; now this makes three. WordPress then checks if the first timestamp is greater than the current microtime(), and dies if it is. Not to be confused with the microtime() calls in previous sections, this one is a new variable, and therefore a new time.

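Now this is where the transient comes into play. Remember how I said the doing_cron transient IS our time stamp? This is where it gets checked against the doing_wp_cron URL key from the wp_remote_post() call in spawn_cron(). If that key is empty (for example, when wp-cron.php is called by an external job rather than by spawn_cron()), it’ll try to set a new lock in the empty checks between lines seventy and eighty. If the transient and the key still don’t match after that, the script stops.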

So we’ve reached the foreach loop at this point and passed all necessary checks. Time to FINALLY process the action we requested! The first foreach does exactly what the others do: it checks if the timestamp in the cron array is up for execution and then passes on to the sub-loop walking over each hook that’s supposed to fire. Again, this is where your hook name comes into play for your add_action() calls.

If your event is on a recurring schedule, custom or otherwise, WordPress will first re-schedule it on the interval specified. It then goes on to un-schedule the current event via wp_unschedule_event() before running it via do_action_ref_array(). After each task it re-checks the lock: if the task took so long that another cron process has taken over the lock, the script quits and leaves the remaining tasks to that newer process!

After all this magic is done, it will delete the doing_cron transient, and that’s it!

The Problem

Warning: The following statement is purely speculative, and I’m definitely open to opinions! The wp-cron.php file is where WP_Cron itself fails at times. I mean, think about it in these steps:

  1. Reschedule event
  2. Un-schedule current event
  3. NOW process the task

Logically speaking, why would you re-schedule an event, or even un-schedule an event BEFORE you know if the event completed successfully or not? You wouldn’t, would you?
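To make that ordering concrete, here is a toy simulation in plain PHP. Nothing here is WordPress code: the $queue array stands in for the cron option, run_event_core_order() is a made-up name, and an exception stands in for a task that fails (a real hang or timeout would behave differently, but the pending event would be gone just the same).

```php
<?php
// Toy scheduler: $queue maps hook name => true while an event is pending.
function run_event_core_order(array &$queue, string $hook, callable $task): void
{
    // Steps 1 and 2: the pending event is removed before the task runs.
    unset($queue[$hook]);

    // Step 3: only now is the task processed.
    $task();
}

$queue = array('publish_future_post' => true);

try {
    run_event_core_order($queue, 'publish_future_post', function () {
        throw new RuntimeException('simulated failure while publishing');
    });
} catch (RuntimeException $e) {
    echo 'Task failed: ', $e->getMessage(), PHP_EOL;
}

// The task failed, yet the event is no longer pending.
var_dump(isset($queue['publish_future_post'])); // bool(false)
```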

So here’s what happens on high-traffic sites. You get a sudden load spike of visitors, let’s say 2,000+, and your site starts to slow down. Your WordPress site has already un-scheduled your currently waiting task, and possibly re-scheduled it based on its schedule. Only then does do_action_ref_array() fire from the cron file, sending your hook name and arguments.

The Solution – I think!

do_action_ref_array(), located in wp-includes/plugin.php, uses call_user_func_array() to call the user’s function. According to the documentation for call_user_func_array() on php.net, that call also returns the return value of the called function. This is where we can get hung up, so to recap: we’re un-scheduling and re-scheduling the tasks BEFORE the task is even run! If a scheduled action hangs for some reason, we will never know, and our task has already been removed. This IS the reason for missed posts, as far as I can tell. So why not get the return value of call_user_func_array() from our do_action_ref_array() call?

Anyhow, at this point I could continue to rant on what I believe is the solution, the caveats of said solution and even the benefits, but I would REALLY love for you guys to share your thoughts. I personally believe we should be returning the return value of the function we’re calling, and leave it up to the dev to determine if their task has completed or not. What do you think?
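Just to give the discussion something to poke at, here is one way that idea could be sketched in plain PHP. This is not how core works and not a patch: run_event_checked() and the toy $queue are hypothetical, and treating a false return value as “not completed” is my own assumption about what such a convention might look like.

```php
<?php
// Hypothetical alternative: run the task first and only remove the pending
// event when the task does not report failure.
function run_event_checked(array &$queue, string $hook, callable $task, array $args = array()): bool
{
    // call_user_func_array() hands back whatever the task returns.
    $result = call_user_func_array($task, $args);

    // Assumed convention: a strict false means "did not complete".
    if ($result === false) {
        return false; // the event stays pending and can be retried later
    }

    unset($queue[$hook]);
    return true;
}

$queue = array('my_failing_hook' => true, 'my_working_hook' => true);

run_event_checked($queue, 'my_failing_hook', function () {
    return false;
});
run_event_checked($queue, 'my_working_hook', function () {
    return true;
});

var_dump(array_keys($queue)); // only 'my_failing_hook' is still pending
```

A task that hangs forever would still never return, so this only helps with failures the task itself can detect and report.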
