Writing “clean” code is something we talk about a lot in software development. Developing for the web is no different in the pursuit of code that is “clean” than any other type of programming. It’s also something that you could write a book about (and people have). Without getting into a ridiculous amount of detail, there are some pretty simple best practices to live by that will make your code cleaner right now.
What is “clean” code?
Code that is “clean” is readable, documented, and easy-to-understand. If you are staring at a block of code and you have no freaking clue what it’s doing or how it works, it’s probably not you–it’s the code. “Messy” code can come in many forms, but the result is usually the same: code that is incomprehensible to other developers (possibly even the one who wrote it).
You may be familiar with the phrase “spaghetti code.” Spaghetti code is the kind of code that you come across where, as soon as you start picking at one piece of code, everything else starts falling apart. It’s like opening your kitchen cupboard and having all the contents fall on top of you.
Spaghetti code is frequently far too complex for its own good and may have been patched together with solutions to problems that don’t really address the underlying issue. Hopefully, your code doesn’t suffer from these problems but let’s face it, we’ve probably all been there at one time or another.
What’s that smell?
Sometimes you’ll hear developers talk about code “smell”. WTH? What’s a code smell?
Code smell refers to problems in the code or the software program that belie a much deeper issue. If you’ve been using a computer for ten years or more, you’re probably familiar with this:
You’re going about your normal business on your Windows 95 or 98 PC and suddenly your screen turns blue and gives you an indecipherable error message. You are completely baffled. Not only would the error message be unhelpful in telling you what actually happened, it often seemed like you’d see this screen for no reason at all! Randomly your computer would completely freak out and blue screen…usually when you were in the middle of something extremely important and hadn’t saved yet. It’s something even Bill Gates wasn’t immune to during his historic Windows 98 unveiling.
That could be considered an example of code “smell”…you know something’s wrong, but you might not know exactly what yet, and digging into it only leads to a series of rabbit holes that suck you deeper and deeper. Code smell is when you’re looking at a piece of code and think “Hey something’s not right here.” Spaghetti code has a “smell,” but I’ve seen code that was otherwise written cleanly, and still had a smell because it was trying to do too many things.
Clean code, ideally, won’t have a “smell.” These are a few tips I’ve learned for keeping your code from being smelly.
White space is your friend
One way you can make your code immediately more readable is adding more space between blocks of code. Now, I’m not saying you need to hit return after every
single
variable,
but it’s a known fact in the world of design (I know, not a place we code monkeys spend a lot of time) that more white space makes things feel more open and readable. Compare these two code blocks:
function wds_current_user_can_view_doc($post_id,$user_id) { if (!$post_id){return false;} $post = wds_ce_get_doc($post_id); if ('publish' == $post->post_status){return true;} if (current_user_can('manage_options')){return true;} if (!$user_id){$user_id = get_current_user_id();} if ($post->post_author == $user_id){return true;} if (current_user_can('edit_others_dms_documents')){return true;} if (current_user_can('view_others_dms_documents')){return true;} return false; }
function wds_current_user_can_view_doc( $post_id, $user_id ) { if ( ! $post_id ) { return false; } $post = wds_ce_get_doc( $post_id ); if ( 'publish' == $post->post_status ) { return true; } if ( current_user_can( 'manage_options' ) ) { return true; } if ( ! $user_id ) { $user_id = get_current_user_id(); } if ( $post->post_author == $user_id ) { return true; } if ( current_user_can( 'edit_others_dms_documents' ) ) { return true; } if ( current_user_can( 'view_others_dms_documents' ) ) { return true; } return false; }
What changed? In the first example, we have no line breaks between the different conditionals, and the return values are on the same line as the condition. It saves space, sure, but it also feels cramped and it becomes difficult to read. It feels like there’s a lot of stuff going on and I don’t have the patience to go through every line and figure out what it all does.
All we did in the second example was add some space. Give your functions and conditionals some room to breathe. Here’s another example:
function do_messages( $message_id = 0, $message_string = '' ) { if ( ! $message_id && '' == $message_string ) { return; } switch ( $message_id ) { case 0 : echo '<div class="error clear"><p>' . $message_string . '</p></div>'; break; case 1 : echo '<div class="error clear"><p>' . __( 'Document not deleted. The document could not be found.', 'wds-ce-dms' ) . '</p></div>'; break; case 2 : echo '<div class="error clear"><p>' . __( 'Could not perform the requested action.', 'wds-ce-dms' ) . '</p></div>'; break; case 3 : echo '<div class="error clear"><p>' . __( 'You do not have permission to delete this document.', 'wds-ce-dms' ) . '</p></div>'; break; case 4 : echo '<div class="updated"><p>' . __( 'Document deleted successfully. The document was deleted permanently and cannot be restored.', 'wds-ce-dms' ) . '</p></div>'; break; case 5 : echo '<div class="updated"><p>' . __( 'Document trashed successfully. The document may be deleted permanently by an admin.', 'wds-ce-dms' ) . '</p></div>'; break; default: break; } }
This is a simple function that spits out a message based on a value that’s passed into it. You’ll notice that for each case, everything that is conditional to that particular case is grouped together. We can immediately identify the message that displays if a message ID of 1 is passed to the function, the message that displays if the ID of 2 is passed. It flows and it makes sense and it doesn’t need a lot of documentation to explain what it’s doing. We’re not putting line breaks between every line of code, but there are a good amount of breaks between snippets within a function.
WordPress coding standards say that when passing parameters to a function, you should always put a space between the parenthesis or brackets and the value you are passing. Take the example of the current_user_can
function. I could write a current_user_can
check like this:
current_user_can('manage_options');
That’s perfectly correct, programmatically. Even from a code smell point of view, there’s nothing wrong with that line of code. However, this becomes more readable:
current_user_can( 'manage_options' );
White space is like oxygen. Without breaks or empty space, your code will suffocate. Ever tried to read minified JavaScript or CSS? Good luck. It hurts my head just to look at it. On the other hand, looking at the raw, unminified code is like a breath of fresh air.
Think of each block of code as a paragraph of text. Each line of code is a sentence. Within each sentence, there are phrases — these are your individual functions. There’s a reason why the WordPress mantra is Code is poetry. Just like writing a blog post, your code shouldn’t be full of run on sentences, no punctuation and wordsrunningtogether. Instead, your code can be as beautiful, as elegant and as simple as a haiku.
Keep it simple, stupid
The second fundamental rule of clean code is writing small functions that do one thing. Or, at least, as few things as possible. You have a function and there’s half a dozen nested if/else conditions? You’re __doing_it_wrong()
. One rule of thumb (pulled out of the Clean Code book I linked to at the beginning of this post) is if you can’t explain what your function does in one sentence or less, it’s doing too many things.
It could be argued that the do_messages
function posted above is doing too many things because it’s using a PHP switch
. By default, simply because we’re using a switch
, we know it’s going to be doing a bunch of different things. Sometimes you can’t get around doing a couple different things inside your function. I would argue that, since that do_messages
function is immediately returning a message string and nothing else, it’s still pretty clean. My one sentence to describe the function would be “takes a message ID and returns a corresponding string of text.”
Look, I know it’s hard to write small functions. Especially if you’ve gotten into the habit of writing these sprawling functions that are super complex and do lots of things. And maybe you know exactly how they work and where x plugs into y. But as WordPress developers, we are a community, and we are quite often not the only ones looking at our code. You might understand what that 500 line function does, but I don’t, and when a single function is that big, it makes it harder to read. And the thing is, if you look at that 500 line function, or a 1000 line function, or a (God forbid) 2000 line function, you will start to see places where you could break it up. Take this chunk here and put it into its own function. Not only does it make your code cleaner, but it makes it easier to reuse blocks of code later.
Breaking large functions into smaller, simpler ones also makes it easier for other coders to come in behind you and easily grok what your code is doing. As one of our dev leads, Justin Sternberg said on our developer call just this last week “always think about the dev coming in behind you.” That person might be coming from a completely different place, with different experience, and they might know nothing about your code or what this function is doing. Do them a favor and split your code into smaller, easier-to-digest blocks.
Smaller functions are also better for debugging. Let’s say you did something stupid like what I did last week: you forget to add a !
to check a particular variable. That is going to be a huge pain to try to track down in a 2000 line function, but if your function is only 5 lines, it should be a snap.
Clean code is as minimalist and functional as your IKEA office furniture. You know what sets IKEA furniture apart from other furniture? Fewer parts, simpler construction. That’s what your code should strive to be.
Document your code and write self-documenting code
The first time I heard both these things come out of the same person’s mouth, I thought they were nuts. It seems totally contradictory, right? Either your code is documented, or it’s self-documenting. How can you have both things at the same time? Here’s the actual code from those first two code examples in a project we are working on:
/** * Function to check if the passed user should be able to view a document * * @param int $post_id The document we're editing * @param int $user_id The user we're checking * @return bool */ function wds_current_user_can_view_doc( $post_id = 0, $user_id = 0 ) { // post id is REQUIRED if ( ! $post_id ) { return false; } // get the post object for the passed post id $post = wds_ce_get_doc( $post_id ); // if the post is published, return true if ( 'publish' == $post->post_status ) { return true; } // if the current user is an admin, return true if ( current_user_can( 'manage_options' ) ) { return true; } // if no user id is passed, assume the current user if ( ! $user_id ) { $user_id = get_current_user_id(); } // if the author of the post is the passed (or current) user, they can view if ( $post->post_author == $user_id ) { return true; } // if the current user can edit other users' documents, they can view if ( current_user_can( 'edit_others_dms_documents' ) ) { return true; } // if the current user can view other users' documents, they can view // NOTE: this capability doesn't exist (yet) if ( current_user_can( 'view_others_dms_documents' ) ) { return true; } // all others return false return false; }
It’s easy to see what’s changed, right? There’s inline documentation. There’s a docblock at the top, which tells you what the function does, what the parameters are and what they’re for, and what type of value the function returns. There’s also comments for each check that say what each thing is for. You should be able to walk through the function and figure out pretty quickly what we’re doing here. None of those comments are absolutely essential. You could probably figure them out without the comments. But the comments help you along. There’s even a comment explaining a particular check against a capability that isn’t in use yet — so if and when that feature is added in the future, you (or the next developer) will know how it’s expected to work. Especially with something like this, where there’s a bunch of different conditions, inline documentation help not only future devs looking at your code, but you looking at your code two weeks from now, to understand what your code is doing.
That’s documenting your code, and that’s easy to grasp. “Self-documenting code” might be a bit more challenging to understand. And maybe how I define self-documenting code is totally different than how you think about self-documenting code. That’s okay, too.
Here’ s a super-simple example of self-documenting code:
function __return_true() { return true; }
There’s no question about what this function does. It’s called __return_true
and it returns true
. You’d be surprised if you used the __return_true
function and got a false
, right? That totally wouldn’t make any sense. Self-documenting code tells you what it does so that there should be no question. In my previous code example, my function is named wds_current_user_can_view_doc
. What does it do? Well, probably it figures out whether a user can view a document. It’s implied from the name. I could have named the function user_doc_check
, but if I did, would it be as easy to understand as current_user_can_view_doc
? Probably not.
There was a time when computers weren’t nearly as powerful as they are now, and it was frowned upon to use really long function and variable names in code because memory was limited, and each additional character used up a valuable byte. But even now, sometimes I’ll see stuff like this:
foreach ( $posts as $p ) { echo sprintf( '<li><a href="%1$s">%2$s</a> (%3$s)</li>', $p->guid, $p->post_title, $p->comment_count ); }
What’s the point of using $p
to describe your post variable than $post
? Okay, so maybe if you used $post
, specifically within a WordPress environment, it might clash with the $post
global, but that doesn’t stop you from using something unique, like, say $single_post
. Let’s look at that again using $single_post
:
foreach ( $posts as $single_post ) { echo sprintf( '<li><a href="%1$s">%2$s</a> (%3$s)</li>', $single_post->guid, $single_post->post_title, $single_post->comment_count ); }
Now I can read that block of code like a sentence. I have a group of posts, and for each post, I’m going to echo something about each single post. Variables can be as long as you want to make them. There’s no reason to shorten them.
But, Chris, using $p
is faster to type!
Okay, sure, short functions and variable names might save you a couple seconds typing, but most decent IDEs will recognize when you’ve used a function or variable name in your code before and autofill it for you, so you’re not really losing that much time. And, if you seriously need to shorten your variables to one letter to optimize your coding as much as possible, you’re going way too fast anyway. Just like it takes time to think about and write a good blog post (ahem), writing good code should take time, too. It shouldn’t be let’s crank this out as quickly as possible because that approach leads to more mistakes.
Think about the best documentation you’ve ever seen. What does it look like? Is it the manual that came with your circa-1990s VCR that told you how to program a record time to watch your favorite show? Probably not. For me, the best documentation looks like this:
The best part of this documentation is that there are no words at all. It’s self-documenting. But even so, there’s a clear destination (you know what you’re expected to do), and the steps are broken out into small pieces that are easy to understand. And yet, despite how simplistic the documentation is, you can build complex things like this:
Instead of writing $ds
and assuming you remember what that stands for in six months, be explicit in your naming conventions–write out $death_star
so it’s obvious what you’re referring to.
Get func-y
In line with keeping things simple, when writing functions, try to limit the number of parameters you pass to your function as much as possible, and always set defaults. This:
function wds_my_cool_function( $things = array(), $count = 0 )
instead of this:
function wds_my_uncool_function( $thing_1, $thing_2, $thing_3, $thing_4, $count, $types, $fruit, $animals, $kitchen_sink )
Limiting the number of parameters your functions accept will help you keep your functions simple. The rule of thumb is three is the absolute most, and even then it should be an exception. More than that and your function is probably doing too many things. If you needed to pass as many parameters as there are in wds_my_uncool_function
, you could easily strip that down to one, like this:
function wds_my_cooler_function( $args = array() )
BOOM. Now, I can pass as many variables as I want into that one $args
parameter, so there’s no reason to have a function that takes nine individual parameters.
Giving your function parameters defaults serves two purposes: it allows you to fail or break out if you have missing or weird input and it gives your parameters context. When I look at wds_current_user_can_view_doc( $post_id = 0, $user_id = 0 )
, I know that both the $post_id
and the $user_id
should be integers. I don’t need the docblock to tell me that. I know, just from the function declaration, how to deal with that type of data inside the function and I know how to check for the type of data that I should be getting. If I got a string instead of an integer in one of those variables, I know something is wrong, and this can prevent type problems in your code later.
The default values you give your variables should reflect the data type those variables will be given. This isn’t a requirement in PHP–you can assign a variable a value of one type, and then change it later and it will be perfectly fine.
$thing = 1; /* ... some code ... */ $thing = 'banana';
But this is confusing for developers and can lead to bugs when your code is expecting one type of data and you’re giving it another. However, if you have a function parameter and no value was passed to it, it will result in an error. Setting it to at least some kind of empty value will prevent errors.
The other thing you can do, since now you’re giving values to your function parameters, is set those parameters to whatever you want the default value to be. The following example, pulled from my Book Review Library plugin, is loosely based on the get_the_term_list function in WordPress core:
/** * Get genres * returns a formatted list of genres (comma-separated by default) with links to each * * @since 1.0.0 * * @param $before string string to display before the genre * @param $after string string to display after the genre (comma by default) * @param $forced boolean by default, if the item is the last in the list, the $after variable * doesn't render. If $forced is set to TRUE it will bypass this and render * it anyway (e.g. if passing $before = '<li>' / $after = '</li>') * @return $genre_list sanitized string of the results */ function get_genres( $before = null, $after = ', ', $forced = false ) { global $post; $genres = get_the_terms( $post->ID, 'genre' ); $genre_list = null; if ( $genres && !is_wp_error( $genres ) ) { $genre_out = array(); foreach ( $genres as $genre ) { $genre_out[] = sprintf( '<a href="%s">%s</a>', home_url() . '/?genre=' . $genre->slug, $genre->name ); } $count = 0; foreach ( $genre_out as $out ) { $genre_list .= $before . $out; $count++; if ( ( count( $genre_out ) > 1 ) && ( $after == ', ' ) && ( count( $genre_out ) != $count ) || $forced ) { $genre_list .= $after; } } } if ( $genre_list ) { return wp_kses_post( $genre_list ); } }
Instead of just passing empty values to the function and setting the defaults inside the function, I’m setting the defaults at the same time I’m adding the parameters to the function and maybe changing them later.
Formatting arrays and repetitive variable declarations
This one is purely cosmetic. Before joining WebDevStudios, if I was going to, for example, register a new post type, I might write something like this:
$capabilities = array( 'publish_posts' => 'publish_book-reviews', 'edit_posts' => 'edit_book-reviews', 'edit_others_posts' => 'edit_others_book-reviews', 'delete_posts' => 'delete_book-reviews', 'edit_post' => 'edit_book-review', 'delete_post' => 'delete_book-review', 'read_post' => 'read_book-review' ); $labels = array( 'name' => __( 'Book Reviews', 'book-review-library' ), 'singular_name' => __( 'Book Review', 'book-review-library' ), 'add_new' => __( 'Add New', 'book-review-library' ), 'add_new_item' => __( 'Add New Book Review', 'book-review-library' ), 'edit_item' => __( 'Edit Review', 'book-review-library' ), 'new_item' => __( 'New Book Review', 'book-review-library' ), 'view_item' => __( 'View Book Review', 'book-review-library' ), 'search_items' => __( 'Search Book Reviews', 'book-review-library' ), 'not_found' => __( 'No book reviews found', 'book-review-library' ), 'not_found_in_trash' => __( 'No book reviews found in Trash', 'book-review-library' ), 'menu_name' => __( 'Book Reviews', 'book-review-library' ), ); $args = array( 'labels' => $labels, 'hierarchical' => false, 'description' => 'Book Review', 'supports' => $supports, 'taxonomies' => array( 'genre', 'review-author' ), 'public' => true, 'show_ui' => true, 'show_in_menu' => true, 'menu_position' => 20, 'show_in_nav_menus' => true, 'publicly_queryable' => true, 'exclude_from_search' => false, 'has_archive' => true, 'query_var' => true, 'can_export' => true, 'rewrite' => true, 'capability_type' => 'book-review', 'capabilities' => $capabilities, 'map_meta_cap' => true ); register_post_type( 'book-review', $args );
Now there’s nothing wrong with this at all. But this is a lot easier to read:
$capabilities = array( 'publish_posts' => 'publish_book-reviews', 'edit_posts' => 'edit_book-reviews', 'edit_others_posts' => 'edit_others_book-reviews', 'delete_posts' => 'delete_book-reviews', 'edit_post' => 'edit_book-review', 'delete_post' => 'delete_book-review', 'read_post' => 'read_book-review' ); $labels = array( 'name' => __( 'Book Reviews', 'book-review-library' ), 'singular_name' => __( 'Book Review', 'book-review-library' ), 'add_new' => __( 'Add New', 'book-review-library' ), 'add_new_item' => __( 'Add New Book Review', 'book-review-library' ), 'edit_item' => __( 'Edit Review', 'book-review-library' ), 'new_item' => __( 'New Book Review', 'book-review-library' ), 'view_item' => __( 'View Book Review', 'book-review-library' ), 'search_items' => __( 'Search Book Reviews', 'book-review-library' ), 'not_found' => __( 'No book reviews found', 'book-review-library' ), 'not_found_in_trash' => __( 'No book reviews found in Trash', 'book-review-library' ), 'menu_name' => __( 'Book Reviews', 'book-review-library' ), ); $args = array( 'labels' => $labels, 'hierarchical' => false, 'description' => 'Book Review', 'supports' => $supports, 'taxonomies' => array( 'genre', 'review-author' ), 'public' => true, 'show_ui' => true, 'show_in_menu' => true, 'menu_position' => 20, 'show_in_nav_menus' => true, 'publicly_queryable' => true, 'exclude_from_search' => false, 'has_archive' => true, 'query_var' => true, 'can_export' => true, 'rewrite' => true, 'capability_type' => 'book-review', 'capabilities' => $capabilities, 'map_meta_cap' => true ); register_post_type( 'book-review', $args );
Similarly, if I’m assigning values to a bunch of variables, I might’ve written it like this:
$singular = $args['singular']; // required $plural = $args['plural']; // required $slug = $args['slug']; // required $show_ui = ( isset( $args['show_ui'] ) ) ? $args['show_ui'] : true; $show_in_nav_menus = ( isset( $args['show_in_nav_menus'] ) ) ? $args['show_in_nav_menus'] : true; $tagcloud = ( isset( $args['show_tagcloud'] ) ) ? $args['show_tagcloud'] : true; $hierarchical = ( isset ( $args['hierarchical'] ) ) ? $args['hierarchical'] : true; $name = ( isset( $args['use_singular_labels'] ) && $args['use_singular_labels'] ) ? $singular : $plural;
…but this looks a lot nicer:
$singular = $args['singular']; // required $plural = $args['plural']; // required $slug = $args['slug']; // required $show_ui = ( isset( $args['show_ui'] ) ) ? $args['show_ui'] : true; $show_in_nav_menus = ( isset( $args['show_in_nav_menus'] ) ) ? $args['show_in_nav_menus'] : true; $tagcloud = ( isset( $args['show_tagcloud'] ) ) ? $args['show_tagcloud'] : true; $hierarchical = ( isset ( $args['hierarchical'] ) ) ? $args['hierarchical'] : true; $name = ( isset( $args['use_singular_labels'] ) && $args['use_singular_labels'] ) ? $singular : $plural;
It goes back to the whole white space is oxygen thing. It’s much easier to read (and it just looks nicer) to write your array values or repetitive variable assignments so they are aligned the same. I’d argue that you can actually see what’s going on more with the consistent spacing. What I will usually do is write the longest variable or array key name first, so I know where my =>
or =
is going to be, and then align all my other values around that one normally with spaces.
Spaces are better than tabs for this because different IDEs will handle tabs and spaces differently; using spaces ensures that it will look the same no matter what IDE is being used. Does it take more time to do this? Sure it does. But, again, coding isn’t a race. Writing good code takes time and giving yourself time to write readable code might reveal something that could have gone unnoticed if you rush it.
Return early and often
Finally, a good rule of thumb for writing clean code is to “return early and often”. This means that if you’re doing some check, don’t make your reader have to process a whole bunch of code to get to a failure notice or a return. Take the following example:
if ( $condition_was_met ) { do_something(); } else { // condition was not met return; }
Let’s say I have a lot more going on than just do_something()
. That could be 20 lines of code that only executes if the condition was met. If we’re thinking linearly, it’s natural to want to deal with what happens when the condition is met first. To take Newton’s observation about gravity as an example, it would be counter-intuitive to think about what doesn’t happen when an apple falls from a tree.
However, if the condition isn’t met, I need to read through those 20 lines of code about something that does not apply to me if I’m only concerned with the else
condition. Consider rewriting that function like this:
if ( ! $condition_was_met ) { // return early if check failed return; } // now that we know that the condition was met, // everything that follows is stuff that happens for that condition do_something();
If I’m looking for what happens if the condition isn’t met, I get that right away. But most of the time, I’m more concerned with what happens when the condition is met. Once the failure is out of the way, I know that everything following the ! $condition_was_met
clause is stuff that’s going to happen as long as the condition is met. Additionally, since we’ve gotten what happens when the condition isn’t met out of the way, I don’t have to constantly think back to what my condition was.
Any time you’re inside a curly bracket, you have to think about what that curly bracket represents.
— Justin Sternberg
Returning early means that if the condition isn’t met, we just stop, we’re done, do not pass Go, do not collect $200, and don’t worry about the rest of the code in this function.
Get to coding!
This is by no means an exhaustive list of ways to clean up your code. But all of these are relatively simple tweaks in your process that can go a long way to writing clean code. I owe a lot to Greg Rickaby, whose onboarding in my first week at WebDevStudios forced me to think more about how my code was written, and who recommended–nay, demanded–that I get a copy of Clean Code because it would “change your life.”
In addition to that, I recommend checking out the Code Standards section of the WordPress Core Developer Handbook for further reading. In addition, 10up and the WordPress.com VIP team have written excellent best practice guides for development.