Version 1.0 of the Ackbar Blog Engine

2011-01-23

I'm pleased to announce today the 1.0 release of the Ackbar blog engine which powers my site. I think Ackbar is maturing into a genuinely useful piece of software, especially when you consider the fact that anyone can host an instance of it for free on Google App Engine. Honestly, though, I primarily wrote it for fun. I was motivated by Rob Conery's blog post about how programmers should build their own blogs, in much the same way that Jedi build their own lightsabers. I'm a sucker for Star Wars, it turns out, hence the name of the framework. I'd like to spend the rest of this post looking at the result. If you're impatient, just go look at the code.

image of Admiral Ackbar

The Basic Architecture

Ackbar is a Clojure lisp application running on Google App Engine. It relies primarily on the Ring, Compojure, Enlive, and Appengine-Magic libraries to go about its daily business. My original intention was to just build on top of Compojure, since I was assuming that it was a full-stack framework like Rails or Django. In fact, Compojure is about 250 lines of code which mostly serves to create nice request routing syntax. In fairness, a lot of the stuff that was in Compojure has been factored out into Ring (Clojure's version of Rack/WSGI) and other libraries. Enlive happens to be the go-to choice for templating, and it accomplishes it in a somewhat unusual way. Basically, you start with an HTML file. You then give Enlive a bunch of CSS selectors and functions to run when the selectors match, and Enlive transforms the file into the desired content. There's no templating language to speak of. Crazy!
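To make the selector-plus-function idea concrete, here is a rough analogy in Python rather than Enlive itself. It is only a sketch: ElementTree path expressions stand in for CSS selectors, and `render`, `set_text`, and the template are hypothetical names invented for illustration, not part of Enlive or Ackbar.

```python
import xml.etree.ElementTree as ET

# A plain markup file with placeholder content; no template syntax anywhere.
TEMPLATE = (
    '<html><body>'
    '<h1 class="title">Placeholder title</h1>'
    '<div class="body">Placeholder body</div>'
    '</body></html>'
)

def set_text(text):
    """Build a transformation that replaces a node's text (hypothetical helper)."""
    def transform(node):
        node.text = text
    return transform

def render(template, rules):
    """Apply each (selector, transformation) pair to every matching node."""
    root = ET.fromstring(template)
    for selector, transform in rules:
        for node in root.findall(selector):
            transform(node)
    return ET.tostring(root, encoding="unicode")

page = render(TEMPLATE, [
    (".//h1[@class='title']", set_text("Version 1.0 of the Ackbar Blog Engine")),
    (".//div[@class='body']", set_text("I'm pleased to announce...")),
])
print(page)
```

The template stays an ordinary markup file that a designer could open in a browser. All of the logic lives in the list of rules.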

Performance

I'm somewhat concerned about performance. App Engine isn't exactly the speediest service in the entire world, and I'm doing somewhat intensive database queries when you load the page. App Engine also isn't "always on", so if nobody's visited my site for a while, they might be waiting for an instance to warm up, which can take quite a while. I'm sort of hoping that if I get more RSS subscribers (hint hint), their clients will ping my site often enough to keep App Engine running instances for me. We'll see.

I've also gone ahead and followed the "memcache everything" strategy (possibly the most important performance lesson I learned working at Facebook). This had a positive impact on the site. It also gave me an opportunity to illustrate the power of Clojure. Here's a lisp macro I wrote in the process of implementing memcache:

(defmacro memcache
  "Takes a memcache key and an s-expression that will
   evaluate to something serializable. If the key exists in
   memcache, returns the value under the key. Otherwise,
   evaluates the s-expression and caches it under the key."
  [memcache-key value]
  `(let [key# ~memcache-key]
     (if (mc/contains? key#)
       (mc/get key#)
       ;; let keeps the computed value local; def would create a
       ;; namespace-level var on every cache miss.
       (let [value# ~value]
         (mc/put! key# value#)
         value#))))

If you don't know much lisp, I wouldn't try overly hard to understand what's going on here. Basically, one of the simplest ways of using macros is to treat them as a sort of "template language" for lisp code generation, and that's what I'm doing here. If I wrap a database query in (memcache "key" (ds/query ...)), it will only perform the query if the key isn't already in memcache. This effect would be possible to achieve with a callback function, but lisp syntax makes it much more natural. The callback function approach is certainly possible in Python, but the App Engine documentation still suggests that you repeat some boilerplate for checking memcache all over your code. I think this illustrates the advantages of lisp quite nicely.
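For comparison, here is roughly what the callback-function version could look like in Python. This is a minimal sketch under stated assumptions: `DictCache` is a hypothetical in-memory stand-in rather than the real App Engine memcache client, and `memcached` and `expensive_query` are names made up for the example.

```python
class DictCache:
    """In-memory stand-in for a memcache client (hypothetical, for illustration)."""

    def __init__(self):
        self._store = {}

    def contains(self, key):
        return key in self._store

    def get(self, key):
        return self._store[key]

    def put(self, key, value):
        self._store[key] = value


def memcached(cache, key, compute):
    """Return the cached value for key, calling compute() only on a miss."""
    if cache.contains(key):
        return cache.get(key)
    value = compute()
    cache.put(key, value)
    return value


cache = DictCache()
calls = []

def expensive_query():
    calls.append("query")  # record each real "database" hit
    return ["post-1", "post-2"]

first = memcached(cache, "recent-posts", expensive_query)
second = memcached(cache, "recent-posts", expensive_query)
print(first == second, len(calls))  # True 1
```

The caller has to hand over a function (or a lambda) instead of the expression itself. That extra wrapping is exactly what the macro removes.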

Subscription

RSS was one of the later features I added, and it ended up being pretty easy (that's what the first S in "RSS" is there for, after all). Enlive handles XML reasonably well, but it doesn't handle namespaces or CDATA. This would be a problem for replicating WordPress-style RSS, which uses both. Luckily, it's totally possible (and, I think, expected?) to just put the entire blog post in the description tag of your feed, using escaped HTML, which somewhat confusingly gets converted back into real HTML by the RSS reader. After getting everything set up, it's just a matter of plugging the result into FeedBurner, an awesome service (now owned by Google) which will syndicate the feed in a bunch of different formats, including email. It's much better than just linking to raw XML, especially since Chrome throws up when you link to an RSS feed.
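The escaped-HTML trick is easier to see in a small example. The following is a sketch in Python, not Ackbar's Enlive-based code. `build_feed`, the example.com URLs, and the sample post are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

def build_feed(title, link, posts):
    """Build a minimal RSS 2.0 document; posts are (title, url, html) tuples."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = title
    for post_title, url, html in posts:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = post_title
        ET.SubElement(item, "link").text = url
        # Assigning the HTML as text makes the serializer escape <, > and &.
        ET.SubElement(item, "description").text = html
    return ET.tostring(rss, encoding="unicode")

feed = build_feed(
    "Example blog",
    "http://example.com/",
    [("Hello", "http://example.com/hello", "<p>Hello, <b>world</b>!</p>")],
)
print(feed)

# A reader that parses the feed gets the original HTML back.
parsed = ET.fromstring(feed).find("./channel/item/description").text
print(parsed)
```

No namespaces or CDATA sections are needed. The post body travels as ordinary escaped text inside the description element.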

File Uploads

I decided to implement file uploads to the App Engine blobstore API as part of Ackbar. It wasn't in my original plans, but file hosting is kind of nice to have. It lets me embed images into blog posts, like this:

File naming screen in the Ackbar engine

That particular image, in fact, illustrates probably the most terrible part of the Ackbar engine right now. After you upload a file, you get this screen, where you have to manually type in the filename. Yes, it really is that lame. I'm going to continue trying to solve this problem in a better way, but it's actually a bit tricky. You see, I just shove the file off to the blobstore API and wait for it to give me a "blob key" so I can get it back. When App Engine is done saving the file, it does an HTTP POST to a callback URL I specified. This POST contains the blob key, but it sure doesn't contain the filename. So I have no really easy way of knowing what file we're talking about when a blobstore upload completes. The "Name File" thing is my temporary fix while I try and figure out why nobody else thinks this API is totally insane.

The Future

So, software projects are never done, but Ackbar is now working well enough for me to be able to call it "1.0". Obviously, I'll keep polishing it and adding features. One thing that it doesn't support that's a noticeable omission is a comment system. I haven't really figured out what I want to do about this yet. I kind of feel that blog comments are pretty useless. I'd much rather people discussed my blog posts on Reddit or Hacker News than on my site, after all. Still, it's sort of expected. I'll have to think about it. Overall, I'm quite pleased with the results so far.

If you liked this, you should click here to subscribe for regular updates

thurn.ca Relaunch

2010-11-21

So, as is probably apparent, I've relaunched this site with a new look. There's a new backend to go with this too. The whole thing is running on about 200 lines of Clojure code, a custom blog stack of my own design. The code is based on the Compojure web development framework, and it's hosted by Google App Engine. It's all on github, if you're curious, but this is definitely the "minimum viable product" release, Version 0. It's just the bare minimum needed to get the site working, the result of around six hours of work, half of which was just learning the Compojure framework and how to integrate the App Engine API.

In the fullness of time, I do intend to do a blog post (or series of blog posts) about how to build a site on the Compojure+App Engine stack, because I think it's very compelling. I'd like to get to at least Version 1.0 of my blog software before I do that, though. Here's a quick rundown of the features that I'm hoping to get done for Version 1.0:

  • At least two subscription options. I think RSS and Email are the best choices.
  • An archive which shows all of the posts I've made.
  • The ability to distinguish some posts as "pages" and prevent them from showing up in the feed.
  • Unlimited scrollback via JavaScript instead of pagination.
  • A general rewrite of the code to use Enlive instead of Hiccup for HTML templating. The first iteration doesn't separate logic from presentation very well.
  • Careful attention to performance, including looking into the App Engine memcache API.

Whew! All of that is going to take a while, so no promises when 1.0 will be done. Incidentally, if you're wondering what happened to most of the old content on my blog, well, I haven't figured out how to import from WordPress easily yet. I've moved five articles over by hand for testing, but the rest of my stuff appears to be trapped in WordPress-land. Hopefully I'll get that sorted out pretty soon too.

New blog engine up and running

2010-11-20

I'll fill in a more substantial post on what's going on tomorrow... I need to get to sleep now. In the meantime, here's a block of Java code that's beautifully syntax highlighted with htmlize:

package Problem1;
class Problem1 {
  public static void main(String[] args) {
    int sum = 0;
    for (int i = 1; i < 1000; i++)
    {
      if (i % 5 == 0 || i % 3 == 0)
      {
        sum += i;
      }
    }
    System.out.println(sum);
  }
}

Use Regular Expressions, Losers!

2009-02-05

Hello. As a matter of policy, I try and avoid the “angry rant” style of journalism. So I’m going to try and keep this as a friendly discussion. In fact, let me apologize for calling you a loser. I’m sure you’re a very successful individual. Now, take a moment and read the following block of gibberish: s/(<hr[^>]*)>/\1\/>/ig. Interpret it and report back to me.

If you read the preceding block and identified it as a regular expression, you’re quite correct. In fact, this is a very simple regular expression I recently employed while converting a document from HTML to XHTML compliance. Here is the full UNIX command that I actually issued in this particular circumstance: sed -r -e 's/(<hr[^>]*)>/\1\/>/ig' file.html > file.xhtml. This command was useful because, in the document I was processing, someone had used a number of HTML <hr> tags (horizontal rule), which should be written as <hr /> in XHTML. That command converts them. If it is already obvious to you why this is the case, congratulations! You’ve qualified for “patronizing lecture exempt” status. If it’s not obvious, well, read the rest of this patronizing lecture.

Do you need to understand regular expressions? Yes. You really, really, do. Anyone who uses computers for significant work, especially work with text, needs to know them. Regular expressions are a mechanism by which we can identify strings of text that match some pattern, and then optionally change that text in some way. I know for a fact that I’ve caught a great many friends of mine, programmers and non-programmers alike, performing tedious operations on text files. If you ever find yourself changing more than a few lines of a file in exactly the same way, stop! Think regular expressions.

The remainder of this article will be an explanation of the above example. I hope that by the time you finish it, you’ll at the least see how regular expressions can improve your life. First up, let me talk about sed a bit. It’s a UNIX program. If you aren’t familiar with what UNIX is, you’re probably not in my target demographic for this article. Feel free to stick around, though. So, sed is standard UNIX. That means that your new MacBook came with it installed, as did your shiny student.cs account. It stands for “stream editor” and it is the canonical way to apply a regular expression (or “regex”) to a file. You’ll also note that I supplied two flags to sed. Personally, I usually have an alias set up to apply these flags by default, but I’ll explain ’em anyway. That -r flag means “use extended regular expressions”. (That spelling is GNU sed’s; the BSD sed that ships on a Mac calls the same option -E.) Extended regular expressions could more accurately be called “non-terrible regular expressions”, since they actually support the normal syntax. You should pretty much just always use these. This is also, incidentally, why I advocate using egrep over just normal grep. The other flag is -e, which tells sed that the next argument is an expression to run (as opposed to, say, a script file given with -f). sed will let you omit it when you pass a single expression, but since supplying an expression on the command line is normally what I want to do, I also make -e a default.

Before diving into the regular expression, let me jump to the other end and explain everything else that is going on. After the regular expression supplied by the -e flag, you need to include the name of the file you are applying the regular expression to. In this case, it’s called file.html. By default, sed outputs to standard output, so if you want to store the results of the expression in another file, you need to redirect the output there, via something like the > file.xhtml bit at the end. The last thing I’ll point out before explaining the regex syntax is that I surrounded my expression in single quote characters. This is necessary to prevent the command shell (bash, in this case) from interpreting characters like <, >, ( and \ for itself, or splitting up my expression every time a space occurs. Just get in the habit of doing it.

All right, now let’s get to the meat of the matter, the s/(<hr[^>]*)>/\1\/>/ig part. All substitution regular expressions take the form s/{some expression}/{stuff to replace it with}/{modifiers}. In this case the {some expression} bit is looking for bits of text that match a pattern for HTML horizontal rules, the {stuff to replace it with} is saying to add my forward slash, and there are two modifiers being applied. Those modifiers are /i and /g. The i indicates that I want my matching to be case insensitive – that means that I want to match <hr>, <Hr>, and <HR>. The g indicates that I want my search to be global: applied to every match on a line, not just the first one. Seems simple enough.
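The effect of the two modifiers can be tried out in isolation. This is a small sketch in Python instead of sed, with the caveat that Python's re.sub replaces every match by default, so count=1 is used here to imitate sed without /g. The sample line is made up.

```python
import re

line = "<HR><hr><Hr>"

# No modifiers: case-sensitive, and only the first match is replaced.
plain = re.sub(r"<hr>", "<hr/>", line, count=1)

# Like /i alone: ignore case, but still stop after the first match.
only_i = re.sub(r"<hr>", "<hr/>", line, count=1, flags=re.IGNORECASE)

# Like /ig: ignore case and replace every match on the line.
both = re.sub(r"<hr>", "<hr/>", line, flags=re.IGNORECASE)

print(plain)   # <HR><hr/><Hr>
print(only_i)  # <hr/><hr><Hr>
print(both)    # <hr/><hr/><hr/>
```

Note that this simplified replacement throws away the original capitalization and any attributes of the tag. The real expression has more work to do.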

Let’s say my text file has a line like this: <HR width="50%" align="center">. I want this to become: <HR width="50%" align="center"/>. I need to be sure I am able to match all of the different fields that HTML’s <hr> tag supports. Let’s examine my {some expression} first: (<hr[^>]*)>. Always break down a regular expression by its bracketing. The first thing you need to know about is this: [^>]*. The square-bracketed part is called a character class, and the whole thing is saying “match as many characters as possible that are not >”. When you put characters in between square brackets in a regular expression, the regular expression knows it can match any one of the characters present there. When you start the square bracketed part with a caret (^), however, the regular expression can match any character that isn’t present. So [^>] means “match anything that is not >”. The * at the end (called a Kleene Star) means “zero or more times”. We put it after the character class to indicate that we want to match as many characters as we can that aren’t > at that point (but we’d settle for no such characters). So you can read (<hr[^>]*)> as “match a <, an h, an r, then as many characters as you can that are not >, and then a >”. This is the pattern we want to replace.
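Here is the matching half on its own, sketched in Python with a made-up input string. The second pattern, which swaps the character class for a plain dot, is an addition of mine and not part of the original command. It is there only to show why the class is written as [^>].

```python
import re

text = '<p>intro</p><HR width="50%" align="center"><p>more</p><hr>'

# The pattern from the article, minus the capturing parentheses.
tags = re.findall(r"<hr[^>]*>", text, flags=re.IGNORECASE)
print(tags)
# ['<HR width="50%" align="center">', '<hr>']

# With a greedy dot instead, the match runs on to the last > in the text.
greedy = re.findall(r"<hr.*>", text, flags=re.IGNORECASE)
print(greedy)
# ['<HR width="50%" align="center"><p>more</p><hr>']
```

The bare <hr> at the end also shows the “zero or more” part at work: the class matches no characters at all there, and the pattern still succeeds.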

Let’s go on to {stuff to replace it with}, the last part of this regular expression I have not yet explained. There’s a special trick going on here that any regex whiz really needs to know about. The question is: Now that we’ve matched the <hr> that needs modifying, what do we want to replace it with? Clearly, we just want to change the ending bit from > to />. The first thing to know is that we can’t just write /, because then sed would think we were ending the replacement string and starting with the modifiers. We need to escape the / as \/ to make it work, so we want to replace > with \/>.

We’re not quite done yet. If our replacement string were just that, we’d simply replace all of the <hr> tags with />. That’s no good. We want to say “Remember that text we matched before? Bring some of it back into the replacement string.” Well, that’s what capturing parentheses are for. I didn’t explain the parentheses in the matching expression before, because they’re actually quite irrelevant to it. The whole thing would match just fine without those there. But what they accomplish is to say “take the text that is matched between these parentheses, and store it in a variable for later use”. The variables in question are just numbered slots, with their number corresponding to the order in which parentheses were encountered from the left side of the expression. So the text that we matched between those parentheses is stored in the “1” variable. We access this variable in our replacement string by writing \1. This means “Take the content matched by the first set of parentheses (from the left) and re-insert it into the replacement string at this point”. Our full replacement string, \1\/>, therefore writes back everything from <hr up to (but not including) the closing >, and then appends />, which is exactly the change we wanted.
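Putting the pieces together, the whole substitution can be sketched in Python as follows. The function name `close_hr_tags` is hypothetical. Python has no / delimiters around the expression, so the slash in the replacement needs no escaping there.

```python
import re

def close_hr_tags(html):
    """Rewrite <hr ...> as <hr .../> using a capture group."""
    return re.sub(r"(<hr[^>]*)>", r"\1/>", html, flags=re.IGNORECASE)

original = '<HR width="50%" align="center">'

# Without the capture group, the whole tag is replaced by the bare "/>".
without_group = re.sub(r"<hr[^>]*>", "/>", original, flags=re.IGNORECASE)
print(without_group)            # />

# With \1, the matched text is written back before the new ending.
print(close_hr_tags(original))  # <HR width="50%" align="center"/>
```

One caveat worth knowing: a tag that is already written as <hr /> also matches, and comes out as <hr //>. The expression is best run on a document that has not been converted yet.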

So there you go. This wasn’t by any means a comprehensive guide to regular expression use, but I hope it was enough to really demonstrate the power of regular expressions in action. My goal is to motivate you to commit to actually learning regular expressions, instead of just thinking “well, they’d be more work to learn than just making these changes manually.” Regular expressions can be learned in short order, and once you grasp them, they’ll make your life infinitely better.

Programmer Haiku

2009-01-02

This is what I do when I get bored in class...

Python sits calmly.
Like the ninja, it waits to
duck-type my objects.

I look in the cache
– but no. RAM yields nothing. I
check the disk… not found.

Hash tables and lists,
when combined, O(1) lookup.
Who needs arrays?

P and NP
– if they are equal, good news for
traveling salesmen.

Executing well
Dereference that pointer.
Oh Shi– SEGFAULT.
