For those of you who don’t know what goes into a migration, let me give you a little insight. You might be saying to yourself, “Getting data from one point to another should be easy!” and you would be partly right. It’s not the moving of data that makes it difficult; WordPress has a vast number of utility methods to drop data into its database. To name a few well-known methods…
However, the challenging portion of a migration is ensuring the data you get is readable by PHP, and that you can walk through it and import it properly into WordPress. These days, you’ll often see REST APIs that use JSON or XML. Before that, CSV was more common (and is still sometimes used today).
Walking Through Things – Recursion
The first and most formidable weapon in any migration is recursion: the ability to loop over a result set and perform actions for each result, or set of results. With a JSON feed, it’s simple: grab your feed data, json_decode() the feed, and then use a foreach() loop to handle each object accordingly. With a regular JSON result set, this is never a problem; feeds usually come in at under 1MB, and only in rare cases will you see feeds that exceed 10MB.
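The decode-and-loop pattern looks something like the sketch below. The feed payload here is hypothetical sample data standing in for a real feed response:

```php
<?php
// A minimal sketch: decode a JSON feed and walk each record.
// $json is hypothetical sample data standing in for a real feed body.
$json = '[{"id":1,"title":"First"},{"id":2,"title":"Second"}]';

// Decode into an associative array (second argument = true).
$items = json_decode($json, true);

foreach ($items as $item) {
    // Handle each record accordingly, e.g. import it into WordPress.
    echo $item['id'] . ': ' . $item['title'] . PHP_EOL;
}
```

For small feeds this is all you need; the entire decoded array lives comfortably in memory for the duration of the loop.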
Where I Faltered
As I said, small files aren’t a problem when you are decoding JSON data; it is the large files that get you. Of course, that is entirely dependent on the size of your PHP application. In this migration script I was working on, I was getting information in the following manner:
1. wp_remote_get() to grab the JSON feed
2. json_decode() to put the feed into an array
3. Use a custom recursion method to walk through the data and import it into WordPress
Looks pretty straightforward, right? For small files, yes, it is.
The problem is that the file wasn’t small; rather, it was one of those rare 100MB+ JSON files with no paging whatsoever. While not entirely evident at the time, the main problem lay in points #1 and #2.
With wp_remote_get(), you receive an array of results from the HTTP request, which is stored in memory. When you json_decode() the body, the resulting variable is stored in memory as well. Anything you assign to a variable is allocated in memory. When you initialize a class, it’s stored in memory. This means you MUST be careful about what you store, and not blindly hold onto variables.
The simplest and most effective method is to unset() variables when you’re finished with them. For arrays, you can use array_shift() to clean up the array while you walk through it, progressively shrinking the memory footprint of your loops.
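A quick sketch of that cleanup pattern: instead of iterating over the full decoded array and keeping it intact, shift records off the front as you process them, so the array shrinks on every pass. The sample data is hypothetical:

```php
<?php
// Sketch: shrink the work queue as you process it, rather than holding
// the full decoded array in memory for the life of the loop.
// The JSON here is hypothetical sample data.
$records = json_decode('[{"id":1},{"id":2},{"id":3}]', true);

$imported = 0;
while (!empty($records)) {
    // array_shift() removes the first element, so the array
    // (and its memory footprint) shrinks on every iteration.
    $record = array_shift($records);
    $imported++;        // ...import the record here...
    unset($record);     // release it once you're done with it
}

echo $imported . PHP_EOL; // 3
```

Note that array_shift() reindexes the array on each call, so this trades a little CPU for a steadily shrinking memory footprint.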
Dealing with HUGE JSON files
When dealing with JSON files, json_decode() will only get you so far. When you reach large file sizes like I did, you need another solution. Streaming is the obvious choice: parsing a JSON file either line by line or character by character.
For this, I have to give a huge shout-out to Janek Lasocki-Biczysko on GitHub for the JSON Character Input Reader. This nifty little script reads the JSON file character by character and lets you parse each object individually, instead of the entire file at once, so you never have to load the whole file into memory.
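To illustrate the idea, here is a simplified stand-in for such a reader (not the actual JSON Character Input Reader): it scans a JSON-array stream one character at a time, tracks brace depth to pull out each top-level object, and decodes only that object, so memory never holds more than one record. It deliberately ignores edge cases like braces inside string values, which a real reader must handle:

```php
<?php
// Simplified character-by-character streaming sketch: extract and decode
// one top-level object at a time from a JSON-array stream.
// (Braces inside string values are not handled; a real reader is stricter.)
function parse_json_stream($handle, callable $onObject): int
{
    $depth  = 0;   // current brace nesting level
    $buffer = '';  // characters of the object currently being read
    $count  = 0;   // number of objects handed to the callback

    while (($char = fgetc($handle)) !== false) {
        if ($char === '{') {
            $depth++;
        }
        if ($depth > 0) {
            $buffer .= $char;
        }
        if ($char === '}') {
            $depth--;
            if ($depth === 0) {
                // Only this one object is ever decoded into memory.
                $onObject(json_decode($buffer, true));
                $buffer = '';
                $count++;
            }
        }
    }
    return $count;
}

// Usage: php://memory stands in for a huge file on disk.
$handle = fopen('php://memory', 'r+');
fwrite($handle, '[{"id":1},{"id":2}]');
rewind($handle);
$total = parse_json_stream($handle, function ($obj) {
    echo $obj['id'] . PHP_EOL; // import the object here
});
fclose($handle);
echo $total . PHP_EOL; // 2
```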
DO NOT load a 100MB+ file into memory with json_decode(). Watch your loops and make sure you clean up after yourself: unset() a variable when you’re finished with it, or once you’ve moved its data elsewhere. Above all, if you’re ever curious about how much memory your loop is using, drop a memory_get_usage() call in there.
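A quick demonstration of that checkpointing, using throwaway string data to simulate a loop that accumulates records:

```php
<?php
// Sketch: checkpoint memory before, during, and after a loop to see
// what your code is holding on to. str_repeat() simulates record data.
$before = memory_get_usage();

$chunks = [];
for ($i = 0; $i < 100; $i++) {
    $chunks[] = str_repeat('x', 10000); // ~1MB accumulated in total
}
$during = memory_get_usage();

unset($chunks); // release the accumulated data
$after = memory_get_usage();

echo ($during > $before ? 'grew'  : 'flat') . PHP_EOL;
echo ($after  < $during ? 'freed' : 'held') . PHP_EOL;
```

Sprinkling checkpoints like this through a migration loop quickly shows whether your unset() calls are actually releasing memory or something is still holding a reference.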
If you have some extra advice for PHP memory management, by all means do share.