Khanh Hoang - Kenn
Kenn is a user experience designer and front end developer who enjoys creating beautiful and usable web and mobile experiences.
If you successfully set up Drupal cron to run regularly, hook_cron() provides a powerful, simple, and useful tool for background task processing independent of page requests. However, the cron API can be easily abused, causing performance or data integrity issues on your site.
Here are some hard-earned best practices for hook_cron():
Every call in hook_cron() should be wrapped in a simple variable check that by default will run the process. Doing this lets you disable the process without a code push just by creating and setting a Drupal variable to FALSE. This is very helpful if your process grows out of control and starts consuming system resources.
/**
 * Implements hook_cron().
 */
function example_cron() {
  if (variable_get('example_process_users_during_cron', TRUE)) {
    module_load_include('inc', 'example');
    example_process_users();
  }
}
Drupal is heavy, man: It loads every .module file of every enabled module on every page request. Only code that has to run on every page request should live in the .module file. Your cron processing code runs infrequently and should never live in the .module file. This is also a good general Drupal module development best practice.
(Hint: This is a really easy way to quickly know if a contrib module developer knows Drupal.)
Even if you do not plan to use Drush to run the cron process, you should always define a simple custom Drush command to call your process. Why? Because if (when) your process grows in size, you can disable it during cron and set up a cron job that calls your Drush command directly, on whatever schedule you need, free from the overhead of every other cron process.
Doing this is like buying a snow blower in Boston: If you don’t buy one you will need it; if you do, you won’t. (Just kidding! It snows a bunch here and you will always need it. Please send help, it’s still cold here.)
You would do something like this in a file called example.drush.inc:
/**
 * Implements hook_drush_command().
 */
function example_drush_command() {
  return array(
    'example-process-users' => array(
      'description' => dt('Process the user accounts.'),
      // Drush reads command shortcuts from the 'aliases' key.
      'aliases' => array('epu'),
    ),
  );
}

/**
 * Process user accounts.
 */
function drush_example_example_process_users() {
  module_load_include('inc', 'example');
  example_process_users();
}
When you plan your processing routine, don’t think about the amount of data you have to process today or next month; think about the amount of data you will have to process in a year. Then double it. Make sure your process is going to scale with time.
This usually means writing code that will process chunks of records at a time (set the amount as a variable so you can change this later without a code push), or even better, populate and process one record at a time from a queue.
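As a rough illustration of the chunked approach, here is one way example_process_users() could look inside example.inc. The article does not show the body of that function, so everything below is an assumption: the variable names example_process_users_limit and example_process_users_last_uid, the default chunk size of 50, and the per-user helper example_process_user() are all hypothetical.

```php
<?php
/**
 * @file
 * example.inc: cron processing code, kept out of example.module.
 */

/**
 * Processes one chunk of user accounts per call (a sketch, not the
 * article's actual implementation).
 *
 * @return int
 *   The number of accounts handled in this run.
 */
function example_process_users() {
  // The chunk size lives in a variable so it can change without a code push.
  $limit = variable_get('example_process_users_limit', 50);
  // Remember where the previous run stopped.
  $last_uid = variable_get('example_process_users_last_uid', 0);

  $uids = db_query_range(
    'SELECT uid FROM {users} WHERE uid > :uid ORDER BY uid ASC',
    0,
    $limit,
    array(':uid' => $last_uid)
  )->fetchCol();

  if (empty($uids)) {
    // Reached the end of the table; start over on the next run.
    variable_set('example_process_users_last_uid', 0);
    return 0;
  }

  foreach ($uids as $uid) {
    example_process_user($uid);
  }

  // Store the last uid handled so the next run continues after it.
  variable_set('example_process_users_last_uid', end($uids));
  return count($uids);
}

/**
 * Hypothetical placeholder for whatever needs to happen to one account.
 */
function example_process_user($uid) {
  // Real per-user work would go here.
}
```

Because each run touches at most one chunk, the cost per cron run stays roughly constant as the users table grows, and the chunk size can be tuned by setting the variable rather than deploying code.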
Drupal’s hook_cron() is a great place to handle simple, regular tasks. But if you are looking at processing large amounts of data or doing anything complex, you should use a tool designed to solve that problem. Consider Drupal 7’s Queue system or a better, non-Drupal tool like Jenkins CI.
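For the Drupal 7 Queue route, a minimal sketch might look like the following. It uses the core hook_cron_queue_info() hook and DrupalQueue class; the queue name example_process_users, the helper example_queue_users(), the 30-second time budget, and the per-user function example_process_user() (assumed to live in example.inc) are hypothetical names chosen for illustration.

```php
<?php
/**
 * Implements hook_cron_queue_info().
 *
 * Drupal's cron run claims items from this queue and hands each one to the
 * worker callback until the time budget is used up.
 */
function example_cron_queue_info() {
  return array(
    'example_process_users' => array(
      'worker callback' => 'example_process_user_item',
      // Maximum number of seconds per cron run spent on this queue.
      'time' => 30,
    ),
  );
}

/**
 * Adds one queue item per user account (hypothetical helper).
 *
 * When and how often to call this is left open here; calling it on every
 * cron run would queue the same accounts repeatedly.
 *
 * @return int
 *   The number of items queued.
 */
function example_queue_users() {
  $queue = DrupalQueue::get('example_process_users');
  $count = 0;
  // Iterate the result set row by row instead of loading it all at once.
  foreach (db_query('SELECT uid FROM {users} WHERE uid > 0') as $row) {
    $queue->createItem(array('uid' => $row->uid));
    $count++;
  }
  return $count;
}

/**
 * Queue worker: a thin wrapper so the real code can stay in example.inc.
 */
function example_process_user_item($item) {
  module_load_include('inc', 'example');
  // example_process_user() is assumed to be defined in example.inc.
  example_process_user($item['uid']);
}
```

The worker handles exactly one record per call, so a slow or failing account affects only its own queue item rather than the whole cron run.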