Migrating back to WordPress with regex and ‘tr’

I decided that I didn’t really feel like trying to manually upgrade ChintzyCMS to the latest version, so my personal site is now back running WordPress. I wasn’t interested in migrating comments, so if you’ve said something witty in the past you’ll need to look forward for new material.

There were a few tools that helped in the PostgreSQL to MySQL and import process. I’m still a bit upset that WordPress doesn’t support Postgres natively, but such is life.

  • First, I used phppgadmin to export the posts table as XML, which gives a <column> and <row> style output from the table. I deleted the header and column description  tags from the beginning and end of the file.
  • Using the guide at WordPress Codex – Importing Content, I performed the following search and replace operations based on the Importing from [X]HTML instructions:
    • For each column tag that needed to be replaced, I searched for the regex
      <column name="title">([^<]*)</column>

      and replaced it with the appropriate tag pair, such as

      <title>\1</title>
    • For each column that didn’t have an equivalent match in the WordPress database, I searched for the same regex and replaced it with an empty string.
    • I saved the resulting file out as posts.xml.
    • Using the UNIX tr utility described here, I removed all newline characters in the file:
      tr -d '\n' < posts.xml > posts_no_newline.xml

I then was able to import the posts_no_newline.xml file using the WordPress RSS Importer plugin.

There are still some posts that contain remnants from an ASCII to UTF-8 conversion (the new MySQL database stores content in UTF-8) which I’ll likely fix programmatically; when that happens, I’ll update with how that was performed.

Comments are closed.