Since joining WDS, I’ve had the awesome opportunity to be a part of our internal migrations team and create scripts to help migrate sites for Microsoft to WordPress. My joy for migrations is an ongoing joke, because in my initial interview I said that I wasn’t too fond of them and that plugins were my thing. Boy, was I in for a surprise: I’ve been studying and writing migration scripts for almost a year now.
This post is born out of a year of challenges, growth, and my newfound love and respect for the beast known as WordPress Migrations. Its purpose is to help those who are entering this space for the first time, or who need to refine their processes, become more efficient (and make some more money) doing migrations.
This is not an exhaustive list, but here are ten things that I learned migrating websites to WordPress:
1. Create a Migration Questionnaire
When you accept the challenge of migrating a website, you automatically become the authority of content you don’t own and/or didn’t write, but your client will look to you to ensure that they’re not forgetting or missing anything in the process. On the surface, it may seem unfair, but consider that in most migrations, your client is coming from one of those “other guys” to WordPress for the first time. This means you are the authority and must be proactive in leading the troops. The best way is kicking off your migrations with a questionnaire.
Here are a few questions to get you started:
- Will authors be imported?
- Will comments be imported?
- If so, should comments be hidden or displayed?
- Is there any data that should not be imported?
This is your moment to shine. Take the lead and show your client why they hired you to move their content!
2. Create a Data Mapping Document
If you’re like me, you don’t like taking trips and getting lost, and if you do, you want to at least have a good cell phone signal for your GPS to work. After you’ve had your client complete the questionnaire, you’ll need a map for the data you’re migrating.
Some things to consider in creating your data mapping document:
- If any of the content relates to custom post types, what content will go where?
- What tables in the source data contain users?
Keep in mind that your client should sign off on this document before you begin writing any scripts. This is your road map for not getting lost, and you’ll want an agreement from all parties involved, for clarity’s (and sanity’s) sake.
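Once the client has signed off, it can help to turn the mapping document into something your scripts can read. Here’s a minimal sketch of that idea. The source column names, the `meta:` prefix convention, and the `wds_example_*` names are all made up for illustration, so swap in whatever your own document says:

```php
<?php
// Hypothetical mapping: source column => WordPress destination.
// None of these source names come from a real project; replace them
// with the fields your signed-off mapping document lists.
$wds_example_field_map = array(
    'article_title' => 'post_title',
    'article_body'  => 'post_content',
    'published_on'  => 'post_date',
    'legacy_url'    => 'meta:_orig_url', // "meta:" marks a post meta key (our own convention).
);

/**
 * Split one source row into post fields and post meta using the map.
 * Columns that are not in the map are skipped.
 */
function wds_example_map_row( array $row, array $field_map ) {
    $mapped = array(
        'post' => array(),
        'meta' => array(),
    );

    foreach ( $field_map as $source => $destination ) {
        if ( ! array_key_exists( $source, $row ) ) {
            continue;
        }

        if ( 0 === strpos( $destination, 'meta:' ) ) {
            // Strip the "meta:" prefix to get the meta key.
            $mapped['meta'][ substr( $destination, 5 ) ] = $row[ $source ];
        } else {
            $mapped['post'][ $destination ] = $row[ $source ];
        }
    }

    return $mapped;
}
```

In an importer, the `post` part would typically be handed to `wp_insert_post()` and each entry in `meta` to `update_post_meta()`. Keeping the map in one place means the script and the document can be checked against each other.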
3. Request an OFFICIAL sitemap
Forty-eight out of fifty states require some form of car insurance or financial waiver in case of an accident. Just like driving ninety miles per hour on the highway, when you’re moving hundreds if not thousands of posts and pages, the risk of losing something along the way is high.
In day-to-day life, you never plan on being in an accident, but sometimes it just happens, and you want to be covered when it does. Even with a data mapping document and questionnaire, it’s almost inevitable that somebody’s going to bring up a secret section of their website that should have been brought over.
A sitemap is the insurance you need for your migration. Yeah, you may be able to migrate content without it, but it’s just not smart to do. Like car insurance, not only does it protect you, but it also protects the client. By requesting this upfront, you bring both parties to a consensus of what’s coming over and what’s not. Don’t migrate without it!
4. Use a staging server
If you’re not already using a staging server for your projects, this is a good time to start. For those of you that don’t know: Your staging server is a publicly accessible server that allows you to test or revise web pages before they’re made live.
In the world of migrations, your staging server is where you should run your migrations. Here are a few reasons why:
- Your local machine is too slow (period)
- If your migration contains media and you run it locally, you then have to FTP all of that data to your server
- Production is never the place to make edits (before you go live)
- Staging allows the content team to get familiar with WordPress before launch
- You need a place to QA your migration with the client
5. Decide when to “freeze” the content
So the day has arrived and you’re ready to launch, so you visit the current site and your staging environment to compare and notice fifty new posts since your last import. It’s now two hours before launch and you have to migrate those posts, their images, sync your CDN, and take another snapshot. Whew!
All of this could have been avoided if you had communicated a date for the client to stop creating new content. In your questionnaire or audit, make sure you discuss this and confirm that all parties are in agreement.
TIP: Allow the content team early access to the staging server so they can create new content there before launch.
6. Don’t forget the media
If you’ve limited the definition of media to images, then you’ve already made a huge mistake.
You left the PDFs, docs, spreadsheets, audio, video clips, etc., in some folder that’s going to be lost when your redirects start. As you already know, websites are made of more than text. Sometimes there are videos on YouTube, Vimeo, or sometimes they’re even self-hosted. If the site is older and no online forms were in place, there may be docs and PDFs scattered throughout. Your audit should account for all media types that need to be imported.
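One way to start that audit is to count the file extensions that the old content actually links to. The sketch below is only a rough first pass: `wds_example_count_media_by_extension` is a made-up name, the regular expression only looks at quoted `href` and `src` attributes, and it counts every extension it sees (including things like `.html`), so treat the result as a list of leads rather than a complete inventory.

```php
<?php
/**
 * Count linked files by extension in a chunk of HTML.
 * Hypothetical helper for a media audit; not part of WordPress.
 */
function wds_example_count_media_by_extension( $html ) {
    $counts = array();

    // Grab the value of every quoted href="..." or src="..." attribute.
    if ( ! preg_match_all( '/(?:href|src)\s*=\s*["\']([^"\']+)["\']/i', $html, $matches ) ) {
        return $counts;
    }

    foreach ( $matches[1] as $url ) {
        // Ignore the query string and host; only the path carries the extension.
        $path = parse_url( $url, PHP_URL_PATH );
        if ( ! is_string( $path ) ) {
            continue;
        }

        $extension = strtolower( pathinfo( $path, PATHINFO_EXTENSION ) );
        if ( '' === $extension ) {
            continue;
        }

        $counts[ $extension ] = isset( $counts[ $extension ] ) ? $counts[ $extension ] + 1 : 1;
    }

    ksort( $counts );

    return $counts;
}
```

Run it over the exported body of each source page and add up the totals. Any extension you did not expect is a question for the client before you write the import.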
7. Say “Goodbye” to “Hello World”
A simple but often overlooked task in migrations is removing the default data in WordPress. Before you begin your migrations, it’s good practice to delete the “Hello World” post, first comment, and “Sample Page.” While it’s not mandatory, it keeps your migrations fresh and is one less task to remember before launch. Here’s a quick snippet to help you out:
function wds_goodbye_hello_world( $blog_id = 1 ) {
    if ( is_multisite() ) {
        switch_to_blog( $blog_id );
    }

    // Remove the Hello World post (ID 1), the Sample Page (ID 2), and the first comment.
    wp_delete_post( 1, true );
    wp_delete_post( 2, true );
    wp_delete_comment( 1, true );

    if ( is_multisite() ) {
        restore_current_blog();
    }
}
8. Store original URLs for redirect
This could be the most important thing to remember.
Imagine that you’ve migrated 5,000 posts for a Fortune 100 company, and you realize that you forgot to store the original URLs.
In case you don’t realize why that’s a bad thing: Consider that when the old URLs are forwarded to your new server, those old-fashioned URLs with their extensions will 404. Now that Google has dinged you, and your site is no longer listed first, you panic, only to realize that life would have been better if you had just stored the originals. See, you could have forwarded them to the new pretty permalinks and maintained your status on Google. Yes, it’s that serious!
If you’re still not convinced, imagine what you’ll say when your client calls you because a popular bookmark within the company (yes, people still use bookmarks) is gone, and they’re panicking because the secretary (who still uses fax machines and paper memos) can’t find her bookmark to the FAQs to answer questions when people call.
In short, it’s all about communication. You want to guarantee that everyone who’s looking for you today can find you tomorrow. So make sure that you store those URLs, and while you’re at it, save the original post IDs and other meta just in case.
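Here’s a minimal sketch of what storing the original URL could look like. It assumes you want to keep only the path, without the scheme, host, query string, or leading and trailing slashes, so that it is easy to compare against the requested path later. The function name `wds_example_normalize_orig_url` and the example URLs are hypothetical; the `_orig_url` meta key is the one used by the redirect script in this post.

```php
<?php
/**
 * Reduce an old URL to a bare path, e.g. "news/2014/story.aspx".
 * Hypothetical helper; adjust the rules to match how you look URLs up.
 */
function wds_example_normalize_orig_url( $url ) {
    $path = parse_url( $url, PHP_URL_PATH );

    // parse_url() gives null when there is no path and false on badly malformed input.
    if ( ! is_string( $path ) ) {
        return '';
    }

    return trim( $path, '/' );
}

// Inside your importer, right after the post is created (WordPress must be loaded):
// update_post_meta( $post_id, '_orig_url', wds_example_normalize_orig_url( $old_url ) );
```

Whatever normalization you pick, apply the same rules when you store the URL and when you look it up, or the two will never match.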
9. Create a redirection script for WordPress
Remember, your migration is not complete when all of the content is imported, but when all of the old traffic routes to the new site properly. Since we’ve discussed the reason for storing original URLs, you need to create a simple redirection script that will actually process your old traffic. There are a few redirection plugins in the WordPress repo, but here’s a sample one we use internally (you’ll need to add your own error checking):
function wds_redirect_old_traffic() {
    // Only step in when WordPress could not find the requested URL.
    if ( ! is_404() || ! isset( $_SERVER['REQUEST_URI'] ) ) {
        return;
    }

    global $wp;
    $request = $wp->request;
    $post_id = wds_get_post_id_from_external_url( $request );

    if ( $post_id ) {
        wp_redirect( get_permalink( $post_id ), 301 );
        exit;
    }
}
add_action( 'template_redirect', 'wds_redirect_old_traffic', 1 );

function wds_get_post_id_from_external_url( $url, $blog_id = 1 ) {
    if ( is_multisite() ) {
        switch_to_blog( $blog_id );
    }

    $args = array(
        'post_type'      => 'post',
        'posts_per_page' => 1,
        'post_status'    => 'publish',
        'meta_query'     => array(
            array(
                'key'     => '_orig_url',
                'value'   => $url,
                'compare' => 'LIKE',
            ),
        ),
    );

    $query   = new WP_Query( $args );
    $post_id = $query->posts ? $query->posts[0]->ID : false;

    // Always switch back before returning, whether or not a post was found.
    if ( is_multisite() ) {
        restore_current_blog();
    }

    return $post_id;
}
10. Update internal links
Before I close, I’d like to leave you with one more suggestion. When you’ve completed a migration and the site is launched, there’s this joy and sense of relief that washes over you. You realize that you’ve been holding your breath for a few months and can now relax…until you realize you forgot to do some post-launch cleanup.
One of those important tasks is updating each internal link to point to the new host and permalink. Even though we’ve covered storing original URLs and how they can help solve all of our redirection issues, we can make the site faster and more efficient by not having internal links point to the old site.
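As a rough illustration, here’s a minimal sketch of rewriting old links inside a piece of content. It assumes you can build a map of old paths to new permalinks (for example, from the original URLs you stored during the import). The function name, hosts, and paths are all hypothetical, and it does plain string replacement, so an old URL that is a prefix of another old URL would need extra care.

```php
<?php
/**
 * Replace links to the old site with their new permalinks.
 *
 * @param string $content  Post content containing old links.
 * @param string $old_base Old site base URL, e.g. "http://old.example.com".
 * @param array  $url_map  Old path => new permalink.
 */
function wds_example_update_internal_links( $content, $old_base, array $url_map ) {
    $replacements = array();

    foreach ( $url_map as $old_path => $new_url ) {
        // Build the full old URL so only links to the old host are touched.
        $old_url                  = rtrim( $old_base, '/' ) . '/' . ltrim( $old_path, '/' );
        $replacements[ $old_url ] = $new_url;
    }

    // strtr() tries the longest keys first and never re-replaces its own output.
    return strtr( $content, $replacements );
}
```

In a cleanup script you would loop over the imported posts, pass each post’s content through a function like this, and save it back only when something changed.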
Fin!
Let me say thank you for making it through my first post for WDS! If you read this whole thing, then you now understand how important migrations are and how, by doing some simple tasks, you can become a migrations master. If you’re still not a believer, then revisit this after you’ve migrated your first major site!
I’d also like to hear from you. What are some things you’ve learned with migrations? Is there something you would like to learn, or something I touched on in this post that you want explained further?
Do you have any advice as how to best handle a redirect for images if the original URL contains spaces (“%20”)? That is the only part I am hung up on.
Hey wzy, what I would recommend is sanitizing the file name/original URL before you store it in the database, and then modifying the redirect script above so that WP_Query searches for the ‘any’ post type, which would allow it to retrieve an image. Here’s an example:
// In your function where you save the original url, sanitize the filename before you store it
$cleaned_image_filename = sanitize_file_name( $image_filename );
update_post_meta( $post_id, '_orig_url', $cleaned_image_filename );
// In the wds_redirect_old_traffic function, update the $request assignment
$request = sanitize_file_name( $wp->request );
// In the wds_get_post_id_from_external_url function, update the query arguments
$args = array(
    'post_type'      => 'any',
    'posts_per_page' => 1,
    // Attachments are stored with the 'inherit' status, so include it alongside 'publish'.
    'post_status'    => array( 'publish', 'inherit' ),
    'meta_query'     => array(
        array(
            'key'     => '_orig_url',
            'value'   => $url,
            'compare' => 'LIKE',
        ),
    ),
);
What about using automated services, like CMS2CMS ? Did you try it?
Hi Ben, I haven’t used CMS2CMS so I can’t comment on its effectiveness. We normally write our own migration scripts because, in our experience, no two systems are created equal. Also, most of our clients are using custom systems where automated processes won’t work.
Excellent and very thorough. Thank you.
Thanks David! Glad that you enjoyed!
Thanks for a well written article, Marcus.
I’ve had similar thoughts about WordPress migrations; because I’m working my way through one just now.
Obviously not the same size as Microsoft, but I’ve had challenges too.
I started by using questionnaires, but somehow I feel I might be asking the wrong questions. This is because I’m still having to do a lot of the discovery myself.
Have you found this to be the case?
Also, what type of questions have you found to work best in a migration scenario? i.e. questions that help to increase the success rate of a migration.
I find that even after painstaking investigation, a lot of tidbits still have to be moved manually (to a degree).
Just as an aside, you mentioned the use of publicly available staging servers.
I think you’re right.
This is essential, because of what you said in reason a:
“your computer is too slow (period)”. Yep. so true.
The problem that I’ve found though is that sometimes the staging server is left accessible to Google.
What then happens is that Google indexes both the staging server, as well as the live server. End result is a problematic “duplicate content” issue.
I think that staging servers should be used, but care should be taken to prevent them from entering Google’s search index. Yoast has a great post that details how this should be done.
Essentially, Google will be asked to crawl the site, but *not* include it in its search index. Works really well, from an SEO standpoint.
Thanks again for the great post. I’m tempted to write a corollary on my own blog; although I don’t think I’m qualified to do so yet.
All the best.