WordPress XML to toto

posted on 2010-10-06 - amd.im/Iphx

In my efforts to convert my blog at amdavidson.com, I wrote a little script that turns the XML file WordPress can export into text files that toto understands.

It's extremely hackish and will likely not generate 100% solid data; I had to edit ~10 of my 140 posts. Do not use this on a production system, and check the converted posts before you rely on them.
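One way to do that checking is to scan the output for files that look incomplete. This is only a rough sketch: `suspicious_articles` is a hypothetical helper, and it assumes the layout the script below writes (a `title:` line, a `date:` line in `YYYY/MM/DD` form, a blank line, then the post body) in an `articles` directory.

```ruby
# Sketch: list generated articles that are missing a title, a
# well-formed date, or a body. The name `suspicious_articles` and the
# "articles" default are illustrative, not part of toto.
def suspicious_articles(dir = "articles")
  Dir.glob(File.join(dir, "*.txt")).sort.select do |path|
    # Metadata and body are separated by the first blank line
    header, body = File.read(path).split(/\n\s*\n/, 2)
    header.to_s !~ /^title: \S/ ||
      header.to_s !~ /^date: \d{4}\/\d{2}\/\d{2}$/ ||
      body.to_s.strip.empty?
  end
end

if __FILE__ == $PROGRAM_NAME
  suspicious_articles.each { |path| puts "check by hand: #{path}" }
end
```

It only catches structural problems; mangled markup inside a post body still needs a human read-through.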

If you're still inclined, here's the gist:

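#!/usr/bin/ruby

require 'rubygems'
require 'date'      # DateTime is used below when building filenames
require 'nokogiri'

puts 'parsing xml file'
# File.read closes the export file once its contents are loaded
parsed = Nokogiri::XML(File.read("./wordpress.2010-10-06.xml"))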

# Each query returns nodes in document order, so the four arrays line up
# by index as long as every item has all four elements.
puts 'pulling titles'
title = parsed.xpath('//item/title').map { |n| n.text }

puts 'pulling dates'
date = parsed.xpath('//item/pubDate').map { |n| n.text }

puts 'pulling content'
content = parsed.xpath('//item/content:encoded').map { |n| n.text }

puts 'pulling name'
name = parsed.xpath('//item/wp:post_name').map { |n| n.text }


puts 'muxing arrays'
# Stop on a mismatch; otherwise posts stays nil and the loop below crashes
if title.length == date.length and date.length == content.length and content.length == name.length then
  posts = [title, date, content, name]
else
  abort 'length broken!'
end

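puts 'printing'
# Create the output directory if it is missing so File.open does not fail
Dir.mkdir("articles") unless File.directory?("articles")
i = 0
while i < title.length do
  filename = "articles/" + DateTime.parse(posts[1][i]).strftime("%Y-%m") + "-" + posts[3][i] + ".txt"

  # The block form closes each file as soon as it has been written
  File.open(filename, "w") do |file|
    # puts "filename: " + filename
    file.puts "title: " + posts[0][i]
    file.puts "date: " + DateTime.parse(posts[1][i]).strftime("%Y/%m/%d")
    file.puts "author: Andrew"
    file.puts "\n"
    file.puts "#{posts[2][i]}"
  end

  i += 1
end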

Note that the filenames, the output directory, and the author name are hard-coded, so be sure to update them before running.
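One way to avoid editing the script itself is to gather those values in one place and let command-line arguments override them. The sketch below is only an illustration: `DEFAULTS`, `settings_from`, `article_path`, and `article_header` are hypothetical names, not part of the script above, and the filename pattern simply mirrors what the script writes.

```ruby
require 'date'

# The values the script hard-codes, gathered in one place.
# All names here are illustrative assumptions.
DEFAULTS = {
  export:  "./wordpress.2010-10-06.xml",
  out_dir: "articles",
  author:  "Andrew"
}.freeze

# Positional arguments override the defaults: export file, output dir, author.
def settings_from(argv, defaults = DEFAULTS)
  export, out_dir, author = argv
  {
    export:  export  || defaults[:export],
    out_dir: out_dir || defaults[:out_dir],
    author:  author  || defaults[:author]
  }
end

# Builds the output path using the same "%Y-%m" prefix as the script.
def article_path(settings, pub_date, slug)
  stamp = DateTime.parse(pub_date).strftime("%Y-%m")
  File.join(settings[:out_dir], "#{stamp}-#{slug}.txt")
end

# Builds the metadata lines that precede the post body.
def article_header(settings, title, pub_date)
  day = DateTime.parse(pub_date).strftime("%Y/%m/%d")
  "title: #{title}\ndate: #{day}\nauthor: #{settings[:author]}\n"
end

if __FILE__ == $PROGRAM_NAME
  settings = settings_from(ARGV)
  puts article_path(settings, "Wed, 06 Oct 2010 12:00:00 +0000", "example-post")
end
```

With helpers like these, the script could be run as `ruby convert.rb export.xml articles "Your Name"` (`convert.rb` being whatever you saved it as), and any argument left off falls back to its default.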

about

amdavidson.com is a simple blog run by Andrew Davidson, a manufacturing engineer with a blogging habit. He sometimes posts 140-character tidbits, shares photos, and saves links. You can also see posts dating back to 2005.
