
Lesson Learned: PHP Memory Limits

If there is one thing I really love to do, it’s migrations. While the requirements for a migration vary from project to project, the purpose is generally the same: get data from point A to point B in a reasonable amount of time, and automate as much of it as possible. It wasn’t until a few weeks ago that I fully appreciated the impact of full-on automation, and the implications of forgetting about PHP’s memory.
In this post, I’m going to give you some insight into what goes into a migration and its memory implications — and hopefully you can learn from my forgetfulness!


For those of you who don’t know what goes into a migration, let me give you a little insight. You might be saying to yourself, “Getting data from one point to another should be easy!” and you would be partly right. It’s not the moving of data that makes it difficult; WordPress has a vast number of utilitarian methods to drop data into its database. To name a few well-known methods:

  * wp_insert_post()
  * wp_insert_user()
  * wp_insert_term()
  * update_post_meta()

And more!

However, the challenging part of a migration is ensuring the data you receive is readable by PHP, so that you can walk through it and import it properly into WordPress. These days, you’ll often see REST APIs that serve JSON or XML; before that, CSV was more common (and is still sometimes used today).

Walking Through Things – Recursion

The first and most formidable weapon in any migration is recursion: the ability to loop over a result set and perform actions for each result, or set of results. With a JSON feed, it’s simple: grab your feed data, json_decode() it, and then use a foreach() loop to handle each object accordingly. With a regular JSON result set, this is rarely a problem; feeds usually come in under 1MB, and only in rare cases will you see feeds exceeding 10MB.
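The decode-and-loop approach looks something like this. This is a minimal sketch — the feed shape (an array of objects with a "title" field) is assumed purely for illustration:

```php
<?php
// Stand-in for a real feed body fetched over HTTP.
$json = '[{"title":"First post"},{"title":"Second post"}]';

// json_decode() turns the raw string into an array of stdClass objects.
$items = json_decode($json);

foreach ($items as $item) {
    // In a real migration, this is where you would hand each item
    // to WordPress (e.g. via wp_insert_post()).
    echo $item->title . PHP_EOL;
}
```

Note that the entire decoded result set lives in memory for the duration of the loop — which, as we’ll see, is exactly where things go wrong with big files.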

Where I Faltered

As I said, small files aren’t a problem when you are decoding JSON data; it’s the large files that get you. Of course, what counts as “large” depends entirely on the memory available to your PHP application. In the migration script I was working on, I was getting information in the following manner:

  1. wp_remote_get() to grab the JSON feed
  2. json_decode() to turn the feed into an array
  3. A custom recursion method to walk through the data and import it into WordPress

Looks pretty straightforward, right? For small files, yes, it is.

The problem is that the file wasn’t small; it was one of those rare 100MB+ JSON files with no paging whatsoever. While not entirely evident at the time, the main problem was in steps #1 and #2.

Memory Management

With wp_remote_get() you receive an array of results from the HTTP request, which is stored in memory. When you json_decode() the response, the resulting variable is also stored in memory. Anything you assign to a variable is allocated in memory; when you instantiate a class, it’s stored in memory. This means you MUST be careful about what you store, and not blindly hold onto variables.

The simplest and most effective method is to unset() variables when you’re finished with them. For arrays, you can use array_shift() to consume the array as you walk through it, progressively shrinking its memory footprint inside your loops.
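In practice, that looks something like the sketch below — the data and the “import” step are placeholders for your real feed and import logic:

```php
<?php
// Stand-in for a real decoded feed.
$json  = '[{"id":1},{"id":2},{"id":3}]';
$items = json_decode($json, true);

$imported = 0;
while (!empty($items)) {
    // array_shift() pops the first element off, shrinking the array
    // a little more on every iteration.
    $item = array_shift($items);

    // ...do the real import work with $item here...
    $imported++;
}

// Release the (now empty) array once the loop is done.
unset($items);
```

By the time the loop finishes, the source array has been consumed entirely instead of sitting in memory alongside everything else.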

Dealing with HUGE JSON files

When dealing with JSON files, json_decode() will only get you so far. When you reach large file sizes like I did, you need another solution. Streaming is the obvious choice: parsing the JSON file either line-by-line or character-by-character.

For this, I have to give a huge shout-out to Janek Lasocki-Biczysko on GitHub for the JSON Character Input Reader. This nifty little script reads the JSON file character-by-character and allows you to parse each object individually instead of the entire file at once, which avoids loading the whole thing into memory.
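To give you a feel for the technique (this is a simplified sketch in the spirit of that reader, not its actual API), here is a character-by-character parser for a top-level JSON array of objects. It tracks brace depth and string state, and decodes one small object at a time, so the full file never sits in memory:

```php
<?php
// Reads a JSON array of objects from $handle one character at a time,
// invoking $onObject with each decoded object.
function stream_json_objects($handle, callable $onObject) {
    $depth = 0; $inString = false; $escaped = false; $buffer = '';

    while (($char = fgetc($handle)) !== false) {
        if ($inString) {
            $buffer .= $char;
            if ($escaped)           { $escaped = false; }
            elseif ($char === '\\') { $escaped = true; }
            elseif ($char === '"')  { $inString = false; }
            continue;
        }
        if ($char === '"') { $inString = true; }
        if ($char === '{') { $depth++; }
        if ($depth > 0)    { $buffer .= $char; }
        if ($char === '}') {
            $depth--;
            if ($depth === 0) {
                // Only one small object is ever decoded at a time.
                $onObject(json_decode($buffer));
                $buffer = '';
            }
        }
    }
}

// Usage: php://memory stands in here for a real 100MB+ file handle.
$handle = fopen('php://memory', 'r+');
fwrite($handle, '[{"id":1},{"id":2}]');
rewind($handle);
stream_json_objects($handle, function ($obj) {
    // ...import $obj into WordPress here...
});
fclose($handle);
```

The key design point is that memory usage is bounded by the size of a single object rather than the size of the whole file.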

Conclusion

DO NOT load a 100MB+ file into memory with json_decode(). Watch your loops and make sure you are cleaning up after yourself: unset() variables when you’re finished with them, or once you’ve moved the data to another variable. Above all, if you’re ever curious about how much memory your loop is using, drop a memory_get_usage() call in there.
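A quick way to do that last check — the data here is fake filler, purely to make the numbers move:

```php
<?php
// ~5MB of throwaway data so there is something to measure.
$rows = array_fill(0, 5, str_repeat('x', 1024 * 1024));

foreach ($rows as $i => $row) {
    // ...do the import work for $row here...
    printf("After row %d: %.2f MB\n", $i, memory_get_usage() / 1048576);
}

// memory_get_peak_usage() reports the high-water mark for the request.
echo 'Peak: ' . round(memory_get_peak_usage() / 1048576, 2) . " MB\n";
```

If the per-iteration number keeps climbing, something in your loop is holding onto data it should have released.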

If you have some extra advice for PHP memory management, by all means do share.

Comments

3 thoughts on “Lesson Learned: PHP Memory Limits”

  1. A little late to the game, but still… I have recently released a stable version of a tool which parses a JSON stream of any size (or a big JSON file) one item at a time, and it’s easily and intuitively usable via foreach. No memory leaks, and reasonable speed and stability. It uses json_decode under the hood, so it should decode the same. Hope it helps. https://github.com/halaxa/json-machine

