Importing to WordPress? Try an Apple II!

When I started blogging this past December, it didn't take me long to settle on WordPress as my CMS of choice. (Besides, my hosting company provided one-click installation.)  I found it easy to configure, personalize, and expand; in no time, I'd built a site from scratch and started creating content.

Now that I have some experience with WordPress, I'm ready to tackle a more daunting task than using it to create a new site: adapting an existing, non-CMS site to WordPress.  I have a personal Web site that could be considered a blog, had the term been popular when I first launched it.  Its design hasn't changed in its six years, and even if I'm not prepared to update the content, the look could definitely stand an overhaul.  The content consists of 212 chronological news posts (saved as a single text file, manually updated and coded with a local text editor and then FTPed to the server) and 300+ static pages (software reviews, mostly).  In design, it seems perfectly suited to a CMS -- but how to get it from a folder-and-file structure into WordPress's MySQL database format?

Focusing on just the 212 news posts for now, I looked at my options.  WordPress supports importing from multiple formats: Old Blogger, Blogware, DotClear, GreyMatter, LiveJournal, Movable Type and TypePad, RSS, Textpattern, and WordPress.  There is no option for "a custom format you designed for your specific purposes which we could never anticipate."  So the challenge was to use one of the supported formats as an intermediary between my old site and WordPress.

This is one of my old site's entries:

<U>August 22, 2006</U><BR>
Revelation of the day: PS2 USB headsets (such as <A HREF =
"http://www.amazon.com/Logitech-97855021502-Playstation-USB-Microphone/dp/B0001H9L3O"
target="_blank">Logitech's</A>), generally used with games such as <A HREF =
"/ps2/karaoke.shtml">Karaoke Revolution</A>, also work fine for <A HREF =
"http://www.skype.com/helloagain.html" target="_blank">Skype</A>. (I assume the
reverse is true, for those of you looking to sing duets.)
<P>
However, the use of Skype to sing karaoke is highly unrecommended.
<P>

Pretty simple -- in fact, too simple.  For WordPress to recognize the above, I had to convert it to the format WordPress uses:

<item>
<title>August 22, 2006</title>
<link>http://www.mysite.net/news/august-22-2006/</link>
<pubDate>Tue, 22 Aug 2006 17:00:00 +0000</pubDate>
<dc:creator>kgagne</dc:creator>
<category><![CDATA[News]]></category>
<category><![CDATA[Archive]]></category>
<guid isPermaLink="false">http://www.mysite.net/news/august-22-2006/</guid>
<description></description>
<content:encoded><![CDATA[Revelation of the day: PS2 USB headsets (such as <a HREF = "http://www.amazon.com/Logitech-97855021502-Playstation-USB-Microphone/dp/B0001H9L3O" target="_blank">Logitech's</a>), generally used with games such as <a HREF = "/ps2/karaoke.shtml">Karaoke Revolution</a>, also work fine for <a HREF = "http://www.skype.com/helloagain.html" target="_blank">Skype</a>. (I assume the reverse is true, for those of you looking to sing duets.) <p> However, the use of Skype to sing karaoke is highly unrecommended. <p> ]]></content:encoded>
<wp:post_id>200</wp:post_id>
<wp:post_date>2006-08-22 12:00:00</wp:post_date>
<wp:post_date_gmt>2006-08-22 17:00:00</wp:post_date_gmt>
<wp:comment_status>closed</wp:comment_status>
<wp:ping_status>closed</wp:ping_status>
<wp:post_name>august-22-2006</wp:post_name>
<wp:status>publish</wp:status>
<wp:post_parent>0</wp:post_parent>
<wp:post_type>post</wp:post_type>
</item>

It's a big difference, beyond the ability of a simple find-and-replace -- but some basic string parsing could do the job.  A script or macro could find both the beginning of one blog post and the end of the previous one by looking for a line that begins with <U> and ends with </U><BR>, with the date in between.  Now I just had to write such a converter.
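
For readers who would rather not dust off an Apple II, here is a minimal sketch of that parsing idea in Python. It is not the Spectrum script described below, just an illustration of the same approach. The names split_entries, to_item, BASE_URL and LOCAL_TZ are made up for this example, and the fixed noon timestamp, the five-hour offset from GMT, the author, and the two categories are all assumptions copied from the sample item above. It also leaves the HTML tags in their original case.

```python
import re
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]

# A post starts at a line such as <U>August 22, 2006</U><BR>.
HEADER = re.compile(r"^<U>(\w+ \d{1,2}, \d{4})</U><BR>$")

# Assumptions taken from the sample item: every post is stamped at local
# noon, five hours behind GMT, under one site URL.
LOCAL_TZ = timezone(timedelta(hours=-5))
BASE_URL = "http://www.mysite.net/news/"


def split_entries(text):
    """Return (title, body) pairs in file order; body lines are joined by spaces."""
    entries, title, body = [], None, []
    for raw in text.splitlines():
        line = raw.strip()
        match = HEADER.match(line)
        if match:
            # A new date header closes the previous post, if there was one.
            if title is not None:
                entries.append((title, " ".join(body)))
            title, body = match.group(1), []
        elif line and title is not None:
            body.append(line)
    if title is not None:
        entries.append((title, " ".join(body)))
    return entries


def to_item(title, body, post_id):
    """Build one <item> element for a single post."""
    month, day, year = title.replace(",", "").split()
    local = datetime(int(year), MONTHS.index(month) + 1, int(day), 12,
                     tzinfo=LOCAL_TZ)
    gmt = local.astimezone(timezone.utc)
    slug = title.lower().replace(",", "").replace(" ", "-")
    url = f"{BASE_URL}{slug}/"
    # Caveat: a body containing "]]>" would end the CDATA section early.
    return "\n".join([
        "<item>",
        f"<title>{title}</title>",
        f"<link>{url}</link>",
        f"<pubDate>{format_datetime(gmt)}</pubDate>",
        "<dc:creator>kgagne</dc:creator>",
        "<category><![CDATA[News]]></category>",
        "<category><![CDATA[Archive]]></category>",
        f'<guid isPermaLink="false">{url}</guid>',
        "<description></description>",
        f"<content:encoded><![CDATA[{body}]]></content:encoded>",
        f"<wp:post_id>{post_id}</wp:post_id>",
        f"<wp:post_date>{local:%Y-%m-%d %H:%M:%S}</wp:post_date>",
        f"<wp:post_date_gmt>{gmt:%Y-%m-%d %H:%M:%S}</wp:post_date_gmt>",
        "<wp:comment_status>closed</wp:comment_status>",
        "<wp:ping_status>closed</wp:ping_status>",
        f"<wp:post_name>{slug}</wp:post_name>",
        "<wp:status>publish</wp:status>",
        "<wp:post_parent>0</wp:post_parent>",
        "<wp:post_type>post</wp:post_type>",
        "</item>",
    ])
```

Calling split_entries on the text of the old news file and then to_item on each pair would produce the items; they would still need to be wrapped in the channel header and footer of a real WordPress export file before importing.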

For that, I turned to my trusty Apple II. My programming experience is primarily in (don't laugh) Applesoft BASIC, as well as the plain-English scripting language of the telecommunications program Spectrum. I'd used the latter more recently, four years earlier, so I chose it, running under Sweet16, an Apple IIgs emulator for Mac OS X. Sure, there are Mac and Unix programming environments in which I could've accomplished this goal -- but finding and learning them would've taken longer than relying on my old friend.

The scripting was more tedious than difficult.  Since I originally wrote every line of the actual blog entries to end with a carriage return (CR), I didn't have to worry about Spectrum reading more than it could store in a single 255-byte variable.  It took about two to three hours of coding (it would have taken less were I still the programmer I was a decade ago), but I finally got everything working.  It wasn't the most elegant script, but I was feeling rusty, even with the original printed manuals handy.

The only caveat was that, when I imported the final product into WordPress, it ignored the ID numbers (wp:post_id) I'd given the 212 individual news entries and instead assigned new IDs to the posts in the order in which they were imported: that is, newest to oldest. That means that July 2007 = 1, January 2001 = 212, and my next post in August 2007 = 213. Not very consistent. However, when WordPress exports posts, it reassigns IDs based on date. So exporting the posts then re-importing them gave a more logical order to their IDs.
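
One untested alternative to the export-and-re-import round trip, assuming the importer really does number posts in the order it reads them, would be to write the items oldest first. A rough Python sketch of that ordering step follows; entry_date and number_oldest_first are hypothetical names, and the post titles are assumed to be dates in the "August 22, 2006" style shown earlier.

```python
from datetime import date

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]


def entry_date(title):
    """Turn a title such as 'August 22, 2006' into a date object."""
    month, day, year = title.replace(",", "").split()
    return date(int(year), MONTHS.index(month) + 1, int(day))


def number_oldest_first(titles):
    """Return (post_id, title) pairs sorted oldest first, with IDs starting at 1."""
    ordered = sorted(titles, key=entry_date)
    return list(enumerate(ordered, start=1))
```

Writing the items out in that order would give the oldest post the lowest position in the import file; whether WordPress then assigns matching IDs is exactly the assumption that would need checking.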

So the news posts are converted -- but that was only one file from my old site. I can use this experience to write a converter for the remaining 300 files, but I insist on preserving the old URLs somehow; they've been around the Internet forever, and I don't want to break those links or lose my Google PageRank. A question I posted in the WordPress support forum about how to do so has thus far gone unanswered. I'll continue to mull this one over; in the meantime, any suggestions?
