Multi-threading Grails Scripts

I maintain the Grails GWT plugin (http://grails.org/plugin/gwt), which I inherited from Peter Ledbrook.

This is a nice little plugin that does quite a lot of funky things in Grails scripts, including managing its own dependencies via Ivy.

Being a GWT integration, one of its jobs is to compile the GWT Java into JavaScript for running in a production setting.

There was already an implementation of this from the first days of the plugin, of course. It had an issue common to all GWT-based frameworks: compiling is slow and serial. So, if you have 30 modules (like I do), they will compile one after the other.

There are some tweaks you can make, of course, like the GWT worker threads option on the compiler. This purports to add parallelisation to the compilation, and indeed it does. My profiling, however, shows that the threads only get fired up in the last stage of the compilation, converting the AST into JavaScript variants. This leaves the first part of the build (which takes most of the time!) serial.
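For reference, the option in question is the GWT compiler’s -localWorkers flag; a direct invocation looks roughly like this (the classpath entries and module name are illustrative, not from the plugin):

```shell
# Compile one GWT module with 4 worker threads.
# Classpath entries and module name here are illustrative.
java -cp "gwt-dev.jar:gwt-user.jar:src" \
     com.google.gwt.dev.Compiler -localWorkers 4 com.example.MyModule
```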

If you watch your CPU (assuming you have more than one core), you’ll see one core bearing most of the load, and the others firing up occasionally, normally at the end of the build.

This was taking ages (getting to 40-50 mins on a fast dev machine), which isn’t really sustainable.

So, to speed things up, I investigated using all the CPU resources available.

I implemented a Groovy class that duplicated the current compiler invocation and added a Java 6 Executor pool that let me call the compiler once for each module and have as many in parallel as there were cores available (it auto detects this).  This works well, cutting the compile time down by at least a third, sometimes more.
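The shape of that pool is roughly the following (a minimal sketch, not the plugin’s actual code; the module names and the work done per task are placeholders):

```groovy
import java.util.concurrent.Executors
import java.util.concurrent.TimeUnit

def modules = ['com.example.ModuleA', 'com.example.ModuleB'] // placeholder module names
def cores = Runtime.runtime.availableProcessors()            // auto-detect, as the plugin does
def pool = Executors.newFixedThreadPool(cores)
def compiled = [].asSynchronized()                           // stands in for real compiler output

modules.each { module ->
    pool.submit({
        // the real code invokes the GWT compiler for one module here
        compiled << module
    } as Runnable)
}
pool.shutdown()                            // accept no new work
pool.awaitTermination(5, TimeUnit.MINUTES) // wait for all compiles to finish
```

One compiler invocation per module, with as many in flight as there are cores.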

It did leave me with duplicated code though, some in the Grails script and then a copy in the Groovy class. This made me unhappy; it’s a maintenance headache.

Really, I want to have the logic (an invocation of AntBuilder with a bunch of classpath and environment setup) in the script, as it’s used for more than just compilation; it’s also used for running the GWT client and other tasks.

So, I moved all this logic into the script and passed it in via a Closure to the class.  All works fine when using the serial mode (that you know and love).

When in parallel mode, however, it got ugly. I kept hitting weird ArrayIndexOutOfBoundsExceptions whenever I ran it, and quickly gave up trying to fix it (I have other work to do too…). I revisited it again today, as part of the prep for the next release of the plugin, and realised I’d been cavalier with my thread management.

You see, Grails scripts expect to be single threaded, none of their environment is thread safe in any way, especially the global AntBuilder instance.

Whenever I ran in parallel compilation mode, up to half a dozen threads at once would hit the global AntBuilder instance via my shared code in the script.

The solution is surprisingly simple and easily understood if you have some threading background: if you have threads, don’t share state unless you are very, very careful.

My code was originally :-

gwtJava = { Map options, body ->
  // ...some setup...
  ant.java(options, body)  // <-- throws ArrayIndexOutOfBoundsException
}

Here, we are passing a body to be evaluated by the common java run (set up in a particular way).

This is broken when run multi-threaded. Instead, we make our resources local and liberally clone any Closures we are passed, to make sure shared state is minimised. I also altered ‘body’ to accept a parameter named ‘ant’, overriding the globally accessed ant instance. It would be possible to do this by changing the Closure delegate as well.

gwtJava = { Map options, Closure body ->
  def localAnt = new AntBuilder()
  body = body.clone().curry(localAnt)
  localAnt.java(options, body)
}

The block is now safe to be accessed by as many threads as you like!
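The clone-and-curry trick is easy to see in isolation; in this sketch (the names are mine, not the plugin’s) a StringBuilder stands in for the per-call AntBuilder:

```groovy
// Each call creates its own local resource and binds it to a private
// clone of the caller's closure, so concurrent calls share no state.
def runWithLocal = { Closure body ->
    def local = new StringBuilder()       // stands in for the local AntBuilder
    def bound = body.clone().curry(local) // clone first, then bind the resource
    bound()
    local.toString()
}

def first  = runWithLocal { sb -> sb << 'first' }
def second = runWithLocal { sb -> sb << 'second' }
```

Each invocation of the cloned closure sees only its own resource, which is exactly the property that makes the gwtJava block safe under concurrency.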

Grails Scripting with Plugins

Today I’ve been writing a bunch of event scripts in groovy that are designed to manage our code as it is split into a number of plugins (more on that another time).

One of the event handlers needed to copy bean builder definitions from the grails-app/conf/spring folder into the classes dir ready for importing. No probs, see here https://gist.github.com/954008
However, if the resource groovy is in a plugin, what to do? You could hardcode each plugin you want to manage, but that seems a bit old fashioned and J2EEish.

So, I went hunting around for a while for solutions to ‘how to get the list of plugins’ from within a build script. There isn’t much obvious in the interwebs, to be sure, and the Grails documentation is generally focused on the runtime rather than the scripting elements.
So instead I checked through the grails script source to see what was available.

Lo and behold, in _GrailsSettings.groovy, ‘pluginSettings’ is being initialised with an instance of grails.util.PluginBuildSettings, magic!

This gives us precisely what we need, a list of the currently installed plugins, and importantly, their root directories.

The final altered version of the code to copy the resources around is this :-

eventCompileEnd = {
  //Copy from current app
  ant.copy(todir:"${classesDirPath}/spring"){
    fileset( dir:"grails-app/conf/spring" ){
      include(name:"*.groovy")
      exclude(name:"resources.groovy")
    }
  }

  pluginSettings.pluginInfos.each {
    def dir = it.pluginDirectory?.file
    if (dir && dir.exists() && new File(dir, "grails-app/conf/spring").exists()) {
      ant.copy(todir:"${classesDirPath}/spring"){
        fileset( dir:"${dir.absolutePath}/grails-app/conf/spring" ){
          include(name:"*.groovy")
          exclude(name:"resources.groovy")
        }
      }
    }
  }
}

Now you can add resource scripts to your plugins and they will be available on the classpath of an application that imports them.

Automated Testing of Grails domain mapping against a Legacy DB

So, it’s quite normal to develop a Grails application as a greenfield project. You get to let GORM decide how to lay out the database, maybe make a few tweaks here and there to improve things when you come to indexing, nothing major. You can be (mostly) sure that it’ll work with the database, as your integration tests will exercise it against the in-memory DB, and this translates well to the deployment DB too.

What happens though when you are making an application that is designed to sit on top of an existing database? (the dreaded ‘legacy’…)

This shouldn’t be shied away from, or ignored. It really might just not work out. You have to plan for how to get the app live, and have automatic verification that it will even communicate correctly with the database.

Now, it might not be obvious when you start to use Grails, or Hibernate even. You can incorrectly map a domain class to a table, and when the app starts up it will say exactly nothing about the situation. Hibernate does not check that your mappings are correct; it assumes they are. The only indication you will have that something is amiss is when your nice shiny new GUI goes bang as some incorrect SQL is generated for you.

This is not something we want.

So, what do we do?

Well, it’s normal practice to have some kind of semi-live environment (read: production-like database) that has data in it, for UAT, or load testing, or anything like that.

I’m going to present to you a way to validate that all your Grails domain classes will map correctly to the database, and to do this on your CI server.

So, the basic steps are :-

  • Add a new datasource entry
  • Create a database mapping test
  • Run it in isolation
  • Profit….

DataSource.groovy
Define your datasources, and then include a new entry for your data filled production environment.

 dataSource {
  pooled = true
  driverClassName = "org.hsqldb.jdbcDriver"
  username = "sa"
  password = ""
 }
 hibernate {
  cache.use_second_level_cache = true
  cache.use_query_cache = true
  cache.provider_class = 'net.sf.ehcache.hibernate.EhCacheProvider'
 }
 // environment specific settings
 environments {
  development { ... }
  test { ... }
  uat { ... }
  production { ... }
 }

So, I’ve missed out the JDBC config bits, I’m sure you can figure them out, but we’ve now got a ‘uat’ environment pointing at your data-filled database. We can use this to exercise the DB mappings of all our current and future domain classes.

Next, we need the test.

Mapping Test

This is best expressed in code. Make a new integration test (ie test/integration). We need the integration test phase as Grails runs up most of the environment, especially the spring app context and hibernate session factory.

class DatabaseManagementMappingTests extends GrailsUnitTestCase {
  def grailsApplication

  @Test
  void domainsMappedCorrectly() {
    def failed = [:]
    grailsApplication.domainClasses.each {
      try {
        it.clazz.list(max:1).each {
          println "${it}"
        }
      } catch (Exception ex) {
        failed[it.clazz] = ex
      }
    }
    if (failed) {
      fail("Domains are not mapped correctly : ${failed.keySet()}")
    }
  }
}

This queries the grailsApplication for all the domains and then the it.clazz.list(max:1) line will drag back exactly 1 item from each of your domains.

This may or may not load other bits of data via the mappings; assuming that you have data in all the corresponding tables, this will completely and automatically exercise your domain model against a real database.

Run it in isolation

Now, this is no good at all when run via grails test-app, as that will simply use the in-memory database. If you alter the datasource in use, then you must prevent the other tests from running as well, otherwise your data-filled database is likely to be wiped!

The correct command is

grails -Dgrails.env=uat test-app integration: DatabaseManagementMapping.domainsMappedCorrectly

This switches us to use the UAT datasource, and runs the single integration test specified. There’s a bit of redundancy in there, but I find it makes the intent obvious.

Profit!

At this point you should be able to get your CI server (of any flavour) to run this command and so automatically test your domain class mappings. So after every commit your mappings will be exercised against your database, flagging up schema corruption, typos, migrations not being applied promptly etc.

Scalable ajax in Grails

Went to the Groovy and Grails exchange at Skillsmatter, in London this week. Great fun. 
Got to meet some good people, see some great sessions and even be honoured to present a session on one of my favourite subjects, http messaging!

First time in a more formal setting at a tech conference.  Exciting! 

http://skillsmatter.com/podcast/java-jee/high-volume-scalable-ajax-with-grails (code resources)

It seemed to go ok. Scary, but a total blast!

David.

Grails upgrade pains and joys..

The title of this is probably unfair, as it’s not Grails itself I’m writing about. I thought I’d share with the world the recent pain of a Grails 1.2 to 1.3 app upgrade.

Leading off from my previous post, I’d just finished doing the upgrade of a mavenised Grails app. This is a fairly weighty beast, worked on by maybe 20+ peeps on and off. It’s a Grails/GWT combo, with pretty good test coverage and an extensive set of WebDriver tests (more on these further down!)

So, I start the application up, it breaks straight away, complaining about a tomcat ‘SESSIONS.ser’ file.. a quick read around indicates a tomcat version change, caused by the updated dependencies.

I forgot to clean, ah well, an inauspicious start!

Grails clean sorts me out and the app starts up.

Params

I flick to one of the main screens of the app (GWT remember), and it renders correctly. I click a button to request some data (a GET request to a controller) and nothing happens.. nada.
I trace the usual suspects: db changes, security filtering, bad http requests. All seems fine, until I notice that a query parameter going into the db looks odd. (I was an hour in by this point).
It’s there, but different. It often includes a space, from the user selecting it, e.g. ‘Some Items’. When it gets into the service it’s now ‘Some+Items’.
Strange, obviously some form of URL encoding. Not the normal kind though, which would render as ‘Some%20Items’.

I’m hiding something. I’ll admit it.

The URL being requested by GWT looked like this /workItemController/getItems/Some+Item?blah=blah

It’s the Grails params.id that is the problem. In 1.2.2 the above URL would mean that in your controller you could have ‘params.id == “Some Item”‘
In 1.3.4, you can’t. Slightly upsetting.

Changing the url to /workItemController/getItems?id=Some+Item&blah=blah means that the behaviour is the same in both cases.

Fun eh?

No, I don’t think so either. Small, minute, even, but a little annoying. Moving on.

as JSON

In the same controller there was some code that used the ‘as JSON’ construct to serialise some objects to JSON. This stopped working.
All the googling I could find said that the as JSON operator can’t be used on arbitrary objects. Well, that may be true in 1.3. It wasn’t in 1.2.
I may pluck up the energy to investigate why, but likely not… as the testing below soaked up most of it :-S

Tests

Lots of fun and games with the tests.

The unit and integration tests were mostly ok, I thought, nothing special about them. However, on running them I noticed a bunch of failures, in classes that I hadn’t seen before.
It turns out that some of the devs had felt constrained by the Grails testing infrastructure, it being JUnit 3 based. So they decided to slip some JUnit 4 tests in there for their own personal use.

Now, I like JUnit 4, it’s much nicer than JUnit 3. I do, however, think that unless tests are run in CI, they should not be in source control. If they are then they will start rotting. Rotten tests are an abomination, and cause untold headaches for whoever is the maintenance programmer, i.e. me. So, all these tests you are tempted to write, be they performance checks, static data population, or just something in a special style that doesn’t integrate well: delete them. Go on…. do it now…

It’s not big, and it’s not clever. Don’t do it again or I’ll track you down with my pet bear. He doesn’t like them either, and he is capable of ripping your arms off.

So, I think to myself: it’s a JUnit 4 test, Grails now supports JUnit 4… @Ignore??

Nope, doesn’t work. Grails then decides that it’s a JUnit 3 test suite and runs it anyway…
So I butcher up the rotten test classes and leave them with @Ignore as a marker for anyone else passing by.

Moving on to the WebDriver tests. What fun!
We’ve had lots of trouble with these, mostly boiling down to distinct differences in behaviour between an Eclipse invocation of a test and a test-app call.

Most are harmless; e.g. lots of the tests have the assert keyword in them. With Grails 1.2/Groovy 1.6 this meant that tests would produce the most useless errors I’ve seen in a unit test. The cause? The Eclipse Groovy plugin imports Groovy 1.7, making the tests useful. Ah well, fixed in the upgrade!

No, the real issue is that the Maven invocation kept on leaving Firefox windows hanging around, causing the integration server to eventually crash. This is caused by a resource leak in the handling of the WebDriver instances. We had this once before, caused by our code lazily instantiating the driver. Not a good idea.
This time, changes to the Grails test infrastructure meant that all the resource cleanup code was being ignored.

Be aware, Grails does its best to handle both JUnit 3 and 4 tests at the same time, but there are foibles. We were overriding the runBare() method in TestCase to do the resource cleanup (pulled from a heavily forked Grails WebDriver plugin, actually).
This stopped working in 1.3.4, so instead I’ve moved all the functional tests to JUnit 4 and created @BeforeClass and @AfterClass methods to manage the driver. This works well enough.
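The resulting shape is roughly this (class name, URL and scenario are illustrative, not the actual test code):

```groovy
import org.junit.AfterClass
import org.junit.BeforeClass
import org.junit.Test
import org.openqa.selenium.WebDriver
import org.openqa.selenium.firefox.FirefoxDriver

class SomeFunctionalTests {
    static WebDriver driver

    @BeforeClass
    static void startDriver() {
        driver = new FirefoxDriver()  // one browser for the whole class
    }

    @AfterClass
    static void stopDriver() {
        driver?.quit()                // always runs, so no orphaned Firefox windows
    }

    @Test
    void someScenario() {
        driver.get('http://localhost:8080/app')
        // assertions against the page...
    }
}
```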

So ended my saga…..


Manual Maven Grails update

Recently I had to upgrade a Grails application from 1.2.2 to 1.3.4 at a client site. They use maven heavily and so use the grails maven plugin.

The Grails Maven plugin integration is imperfect. If you need to upgrade to a newer version of grails, there are some manual steps required.

Firstly, edit your POM to update the versions to the required version of Grails/Groovy :-

  • groovy-all
  • grails-crud
  • grails-gorm
  • grails-bootstrap
  • grails-test

Check any other dependencies that may have shifted, especially hibernate, aspectj, javassist. This can be done by making a new grails project using the latest archetype and comparing with your current pom.

If you are lucky then mvn grails:exec -Dcommand=upgrade will now work

For me it didn’t, instead I got

C:\dev\svn\trunk>mvn grails:exec -Dcommand=upgrade
[INFO] Scanning for projects...
[INFO] ------------------------------------------------------------------------
[INFO] Building SSE NG GUI
[INFO] task-segment: [grails:exec] (aggregator-style)
[INFO] ------------------------------------------------------------------------
[INFO] [grails:exec {execution: default-cli}]
[INFO] Using Grails 1.3.4
Running pre-compiled script
Environment set to development
Clover: Using config: [on:false]

WARNING: This target will upgrade an older Grails application to 1.3.4.
Are you sure you want to continue?
(y, n)
y
[delete] Deleting directory C:\dev\svn\trunk\web-app\WEB-INF\classes
[delete] Deleting: C:\dev\svn\trunk\target\resources\web.xml
[delete] Deleting directory C:\dev\svn\trunk\target\classes
[delete] Deleting directory C:\dev\svn\trunk\target\plugin-classes
[delete] Deleting directory C:\dev\svn\trunk\target\resources
[delete] Deleting directory C:\dev\svn\trunk\web-app\gwt
[INFO] ------------------------------------------------------------------------
[ERROR] BUILD ERROR
[INFO] ------------------------------------------------------------------------
[INFO] Unable to start Grails

Embedded error: java.lang.reflect.InvocationTargetException
C:\dev\svn\trunk\null\src\war not found.
[INFO] ------------------------------------------------------------------------
[INFO] For more information, run Maven with the -e switch
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 44 seconds
[INFO] Finished at: Wed Oct 20 09:03:02 BST 2010
[INFO] Final Memory: 37M/88M
[INFO] ------------------------------------------------------------------------

Notice the \null\

This is an attempt to resolve GRAILS_HOME, which fails because the Maven plugin doesn’t use GRAILS_HOME.

A possible solution to this exists here http://tramuntanal.wikidot.com/upgrading-1-2

However, this involves updating BuildConfig.groovy to change dependency resolution. That option isn’t available to me, as we have disabled this functionality because it
duplicates the Maven dependency resolution, causing some issues. I therefore did the following to complete the upgrade :-

  • Update application.properties with the new grails version
  • Obtain the latest hibernate plugin (from a grails installation) and install it
  • Obtain the latest tomcat plugin (from a grails installation) and install it

The app then starts correctly.

I then tracked down any bugs remaining in the app (probably a post later on…)

Try/catch/finally weirdness in Groovy 1.6

Had a fun time for a few hours this week!

I was doing some Grails work for a client, carrying on the WebDriver integration, using the Grails WebDriver plugin. All seemed well, except that I was seeing a difference in the behaviour of my tests between the IDE and the command line.
Using the Maven Grails integration, I ended up with Groovy 1.6.something, standard for Grails 1.2.2.

When running the WebDriver tests (JUnit) in Eclipse, everything proceeds as normal: errors are reported, windows are cleaned up. When run from a Maven command, any test that is in error will leave a Firefox window behind.

As you might imagine, this is a bit of a nightmare for the CI server.

I eventually tracked it to a difference in behaviour between groovy 1.6 and 1.7.

Eclipse had picked up 1.7 to run against, as that’s what the Eclipse Groovy plugin gave me. Maven carried on with 1.6.

Given this Groovy code (it exists in the Grails WebDriver plugin, WebDriverTestCase.groovy, btw) :-


try {
  println "Trying..."
  throw new RuntimeException()  // stand-in for whatever goes wrong in the real code
} catch (Throwable t) {
  println "Caught ..."
  throw t
} finally {
  println "Finally!"
}
println "End"

Would you believe that in groovy 1.6 the log would read :-

Trying…
Caught…
End

Whereas in 1.7

Trying…
Caught…
Finally!
End

Groovy 1.6 will miss the finally block when a throwable is rethrown from inside a catch block.

The workaround is to either upgrade, or extract the finally to surround the try/catch. This is the approach I took.


try {
  try {
    println "Trying..."
    throw new RuntimeException()  // stand-in for whatever goes wrong in the real code
  } catch (Throwable t) {
    println "Caught ..."
    throw t
  }
} finally {
  println "Finally!"
}
println "End"

Both versions now exhibit the same behaviour.

Quality Software – Fact or Fiction?

I’m a software developer, I write software systems, big and small. I like to make things that work well, that are seen to be good. I want the software I write to be of good quality. I’m sure that any other software devs among you will agree, and those that use software would want the same out of your developers….

This seems on the surface to be a worthy goal. A good ambition for writing software; if you take the above as is, however, it isn’t.

Without much more thought about its implications, it’s a recipe for evangelistic wars about testing tools, testing levels; frameworks, architecture patterns and so many other things.

Now, you may have figured out from the title of this where I’m going. The above declaration has a bit of a problem, what is the definition of quality software?

This is something that all of us have come across, but is often only discussed at the low level. JMock versus Mockito; TDD versus BDD; code comments versus documentation.

All these are valid discussions, but they become evangelistic when removed from their context. If I were to write an article on code commenting versus self-documenting code (for the sake of argument), I would necessarily be quite abstract about the different advantages each has, and why each is good or bad. Given no context, the majority would probably side with self-documenting code as being of good quality, and commented code as being of bad quality, and they may well be right.

However, until you take this idea and drop it into a codebase; you can’t know what will work and what will not. Context is the final arbiter of what is good in that particular case.

This leads to an interesting thought. If we can’t discuss something as apparently clear cut as comments versus self-documenting code without reference to the actual system we’re going to be putting it in (the context of the problem), then software quality is intimately tied up with its context.

This means that we can’t come up with an overarching theory of what software quality is!

We can’t create a pithy phrase, define a set of tools or create a prescribed code style that the world will follow.

Bugger!

At this point, I’m going to take a leaf from other practices in the world. In other spheres, quality is not defined by how something is done (as all the examples above are), but by the end product in its context. How the end product matches up to its requirements is the measure of quality.

Applied to software, what would that look like?

Let’s imagine a fictional product, a messaging system (because I like them!)

The business requirements would look something like :-

  • Process 500 messages a minute.
  • Full audit
  • Store messages for 30 days.
  • 99.9% uptime.

These are the explicit requirements. There would be some implicit ones (commonly seen as expectations) that go with it as well, and are often overlooked in planning. For example

  • It should be quick to add new features.
  • Smaller team is better for the budget.
  • Functionality regressions must be avoided at all costs

They will change with the environment, but there are always these two sets.

Now, take the above idea: we can measure the quality of our messaging app by testing it against its requirements, both implicit and explicit.

Anything we can do to meet those requirements will improve the quality of the software, as seen by the business, and by the developers. ‘Quick to add new features’ may lead us to develop a highly modular system that is enjoyable to code, or a test driven approach; ‘Avoid regressions’ may lead us to implement a complete set of regression tests. All good things.

Given the context of this system, we can begin to piece together tools, techniques and technology that will meet all the requirements. It gives us an analysis tool that helps us to choose what will help us create quality software; because we have a common understanding of what is meant by that.

It allows us to be pragmatic in our choices, without resorting to simplistic generalisations on which tool/ technique/ whatever is best.

It does lead in interesting, and I think valid directions, however.

Take the requirement above – ‘It should be quick to add new features’. This can have several implications, only some of which impact the code itself. If the code were being implemented in an agile environment, you’d be able to schedule work into the flow very easily; and have it completed quickly. Similarly, a very junior team working on an old codebase will take longer to add new features, an experienced team will be much quicker. If you make efforts to spread knowledge widely around a team then you’ll see your features being implemented more rapidly.

So, the software becomes a higher quality piece, when tested against its requirements, only by changing the team developing it; the software itself hasn’t changed. This is weird… but an expected outworking of my assertion that software quality is intimately tied up with its context.

So, is software quality fact or fiction? Is there one true way to write code that is ‘good quality’?

I would say no. There isn’t. Software Quality when applied to the whole field of software development is a Unicorn, a creature that can’t be caught or tamed; and ultimately, doesn’t even exist…

Viewed pragmatically however, we are asking the wrong question.

Given my circumstances, given my team, given my code base as it is now; given my requirements, what would a quality piece of software look like? This is a powerful tool for evaluating all the tools, techniques and development methods that can help you on the way.

So the question becomes. What would quality look like in the FooBar Messaging App V1.2?

That’s real, and it’ll help tame the evangelists among us too!

Scalable and performant Ajax using Grails and Jetty

Ajax is a popular technique for creating highly interactive websites and web applications. It runs over HTTP, and often involves a client regularly polling a server to see if it has any updates for it. This is a well-known technique, and I won’t attempt to cover exactly how it is done, as there are lots of JavaScript libraries out there to help.

This scenario is the one I am dealing with in this article. The question is how to get great performance out of the technology? Regular browser polling, when implemented naively, can become a troublesome scalability problem very easily, and one that will only show up in some serious load tests / network fail tests (you do these, right?). How to prevent this?

Imagine the scenario, you’ve built your app, the client JavaScript is set to poll every 5 seconds, and the server responds in about 100 ms for each connection. Not bad, all seems fine. You deploy, go live and start getting users on.

You may have noted that the above scenario introduces up to 5 seconds of latency between the server knowing something and the client being informed on its next poll. Not the kind of performance we expect in a new application, surely! You try to reduce the timings, to give you better latency; now, this may or may not work, although in this scenario you’d better hope that your server keeps up with servicing the requests! If not then you’ll soon DDOS your own server (and database, and router etc) as connection requests start stacking up.

Now, a quick bit of maths. Each client polls every 5 seconds, and takes 100 ms to process, so you can deal with (averaging out), 50 clients from a single thread. Hmmm, that’s not a lot. Oh well, ramp up the thread pool, say 50 (a common default), this gives you a maximum of 2500 clients on your server, before your connection time starts to suffer solely due to thread contention (ignoring any other capacity issues).
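That arithmetic, spelled out:

```groovy
int pollIntervalMs = 5000  // the client polls every 5 seconds
int serviceTimeMs  = 100   // each request takes ~100 ms of server time
int threads        = 50    // a common servlet thread pool default

int clientsPerThread = pollIntervalMs.intdiv(serviceTimeMs) // clients one thread can service
int maxClients       = clientsPerThread * threads           // ceiling before contention bites
```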

Now, in our polling example (let’s make it an email programme), our clients will be checking for mail every 5 seconds (our ajax poll), and will more than likely be told that nothing has changed.

So, you have 2500 clients for your server, most of which will receive updates on the order of minutes, and they have the effect of hammering your server (and database, and router etc) with essentially spurious requests (check your load graphs, they won’t be pretty for an app that should be doing very little).

Come back web 1.0, all is forgiven!

I am going to describe and show two techniques that, when used together, can reduce this load down to where it should be, tracking the real activity in the application (I’ve seen reductions of two orders of magnitude) and removing the large overhead.

HTTP is a request/ response protocol; there is no way around this. Anyone who purports to have found a way around this is stretching the truth somewhat. There is however, a technique that can be used to simulate the server making a connection to the browser and pushing information when it knows a change has been made. This neatly solves the latency problem above.

The technique tends to be called long polling, or Comet, and is relatively well-known (see Cometd, Bayeux and others). Imagine an HTTP connection being established, the client sends its request, but the server doesn’t respond, and also doesn’t close the connection. If the timeout is set to a large value (say 1 minute), then it is valid that the server could wait up to 1 minute and then respond. At any point within that minute, the server could learn something that is useful to this particular client, dump the information into the response and send it. The server can be given control of when to respond.

Great! Servers can choose what information flows to clients, and when. We can have sub-second Ajax performance! Imagine the possibilities . . .

There is a problem, however. Try implementing this on a Java Servlet container and you will run out of ideas quite quickly and call Thread.sleep(5900) (or something similar). This blocks the thread, very quickly draining your thread pool. Oh dear, our 2500 concurrent clients with a possible 5 second latency have suddenly become 50 with almost no latency. This may be fine for your application. Let’s assume not.

The second is less of a technique and more of an infrastructure choice. Jetty (version 6) is a relatively well-known Java Servlet container that is mostly used when embedded in one of the larger JEE Application Servers. Standalone, however, it is perfectly capable. It also has a trick up its sleeve when it comes to implementing long polling.

This is its Continuation mechanism, which allows a servicing request thread to be detached from the request it is currently servicing, and for the request to be parked (suspended) until the app decides it wants to do something with it. Consider this for a second. Our app had a theoretical maximum of 2500 clients, due to thread contention only. All 2500 clients could connect, their requests would be put to sleep, and then the threads return to the pool. A client is only interacted with if something changes that it needs to know about. If your app is relatively low on updates, you could support tens of thousands of clients, with sub-second latency, from a single server (it depends on your app at this point: test, test, test!).
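In outline, the Jetty 6 continuation dance looks like this in a plain servlet (a sketch only; the update-checking helpers are hypothetical, and error handling is omitted):

```groovy
import javax.servlet.http.HttpServlet
import javax.servlet.http.HttpServletRequest
import javax.servlet.http.HttpServletResponse
import org.mortbay.util.ajax.ContinuationSupport

class UpdatesServlet extends HttpServlet {
    void doGet(HttpServletRequest req, HttpServletResponse resp) {
        def continuation = ContinuationSupport.getContinuation(req, this)
        if (!updatesAvailableFor(req)) {   // hypothetical check for pending updates
            // Parks the request and returns the thread to the pool. Jetty
            // replays the request when resume() is called on the continuation,
            // or when the timeout (here one minute) expires.
            continuation.suspend(60 * 1000)
        }
        resp.writer << pendingUpdatesFor(req)  // hypothetical payload
    }
}
```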

Grails (1.3.0) is my environment of choice, and neatly dovetails into Jetty for development.

Note that the code below will work on Tomcat, but will not follow the same scaling line as Jetty, because Tomcat lacks a Continuation mechanism: when a continuation suspends, it will simply hold onto the thread. It appears to be possible to implement a similar system on Tomcat using its Advanced IO / Comet support, but it would be rather more work to integrate into Grails.

A version of the Continuation mechanism is in the next version of the servlet spec, and so one would expect it to be implemented across all the servlet containers. The new specification is broadly a superset of the Continuation implementation in Jetty 6.

Now, to code!

Setup

I’m not going to go into detail on many of the steps, due to lack of space, however the Grails docs are extensive and well written.

Firstly, create a new app, remove the tomcat plugin and install the jetty plugin.

Add jetty-util to your dependencies (in BuildConfig.groovy):

dependencies {
  runtime 'org.mortbay.jetty:jetty-util:6.1.22'
}

When you call continuation.suspend(), a special exception (RetryRequest) is thrown, which the container interprets as an instruction to suspend the current request. By default, Grails catches all exceptions bubbling up from app code. We prevent this by creating our own Spring exception resolver and wiring it in resources.groovy. It selectively rethrows any RetryRequest exceptions, informing Jetty that we want to suspend the request.

import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import org.codehaus.groovy.grails.web.errors.GrailsExceptionResolver;
import org.codehaus.groovy.grails.web.servlet.mvc.exceptions.ControllerExecutionException;
import org.codehaus.groovy.runtime.InvokerInvocationException;
import org.mortbay.jetty.RetryRequest;
import org.springframework.web.servlet.ModelAndView;

public class RetryRequestExceptionResolver
    extends GrailsExceptionResolver {

  public ModelAndView resolveException(HttpServletRequest request,
      HttpServletResponse response, Object handler, Exception e) {
    // Grails wraps controller exceptions; unwrap and rethrow RetryRequest
    // so that Jetty sees it and suspends the request.
    if (e instanceof InvokerInvocationException
        || e instanceof ControllerExecutionException) {
      Throwable root = getRootCause(e);
      if (root instanceof RetryRequest) {
        throw (RetryRequest) root;
      }
    }
    return super.resolveException(request, response, handler, e);
  }
}

And in resources.groovy.

// Place your Spring DSL code here
beans = {
  // Bypass the standard Grails exception handling so that
  // RetryRequest propagates up to the container.
  exceptionHandler(osj.RetryRequestExceptionResolver)
}
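The interesting part of the resolver is just cause-chain unwrapping: dig through the wrappers Grails adds and rethrow the exception the container cares about. The same logic in a self-contained plain-Java sketch, with a stand-in RetryRequest class (the real one is org.mortbay.jetty.RetryRequest):

```java
// Plain-Java sketch of the unwrap-and-rethrow logic, using a stand-in
// RetryRequest class (the real one is org.mortbay.jetty.RetryRequest).
public class UnwrapDemo {

    static class RetryRequest extends RuntimeException { }

    // Walk the cause chain to its end, as getRootCause does.
    static Throwable rootCause(Throwable t) {
        while (t.getCause() != null && t.getCause() != t) {
            t = t.getCause();
        }
        return t;
    }

    // Rethrow if the root cause is a RetryRequest; otherwise return it
    // (standing in for delegating to super.resolveException).
    static Throwable handle(Exception wrapped) {
        Throwable root = rootCause(wrapped);
        if (root instanceof RetryRequest) {
            throw (RetryRequest) root; // let the container suspend the request
        }
        return root;
    }

    public static void main(String[] args) {
        // Two layers of wrapping, as Grails would produce.
        Exception wrapped = new RuntimeException("controller",
                new RuntimeException("invoker", new RetryRequest()));
        try {
            handle(wrapped);
            System.out.println("not rethrown");
        } catch (RetryRequest r) {
            System.out.println("RetryRequest rethrown");
        }
    }
}
```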

Now, we want a simple example of a request being put to sleep and woken up when something interesting happens. Don’t expect production code!

import org.mortbay.util.ajax.Continuation
import org.mortbay.util.ajax.ContinuationSupport

class MessageController {

  // Multiples of these should be kept in a Grails service designed to
  // look after them; a single static field keeps the example simple.
  static Continuation continuation

  def index = {
    Continuation cont = ContinuationSupport.getContinuation(request, this)
    if (!cont.isResumed()) {
      // Park the request for up to a minute. suspend() throws RetryRequest,
      // which Jetty catches, and the thread returns to the pool.
      continuation = cont
      cont.suspend(60000)
    }
    // When resume() is called, the request is replayed and lands here.
    render "Resumed at ${new Date()}"
  }

  def resume = {
    continuation?.resume()
    log.info "Resumed continuation"
    render "Attempted to resume other request at ${new Date()}"
  }
}

To test, go to http://localhost:8080/<yourapp>/message. Your browser will block for up to a minute. Open a second window, and visit http://localhost:8080/<yourapp>/message/resume. Your first browser will receive a response with some date / time corresponding to when you hit the resume URL.

Next Steps

This only demonstrates the very basics of the mechanism. There's a chunk more you would need to do, probably wrapping the management of connections in a Grails service: specifically, logic to decide what a client needs to know as soon as that information is available, and dispatching it straight away (instead of just dropping it in the DB and waiting for the next request).
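One hypothetical shape for such a service, sketched in plain Java with `CompletableFuture` standing in for the Continuation instances (the class and method names here are invented for illustration, not part of Jetty or Grails):

```java
import java.util.*;
import java.util.concurrent.*;

// Hypothetical sketch of a connection-managing service: parked clients
// are held in a map (CompletableFuture stands in for a Continuation),
// and messages for absent clients are buffered until they reconnect.
public class ClientRegistry {

    private final ConcurrentMap<String, CompletableFuture<String>> parked =
            new ConcurrentHashMap<>();
    private final ConcurrentMap<String, Queue<String>> buffered =
            new ConcurrentHashMap<>();

    // A client long-polls: hand over a buffered message at once, or park.
    public CompletableFuture<String> poll(String clientId) {
        Queue<String> queue = buffered.get(clientId);
        String pending = (queue == null) ? null : queue.poll();
        if (pending != null) {
            return CompletableFuture.completedFuture(pending);
        }
        CompletableFuture<String> f = new CompletableFuture<>();
        parked.put(clientId, f);
        return f;
    }

    // Something happened: resume the parked client, or buffer for later.
    public void deliver(String clientId, String message) {
        CompletableFuture<String> f = parked.remove(clientId);
        if (f != null) {
            f.complete(message);
        } else {
            buffered.computeIfAbsent(clientId, k -> new ConcurrentLinkedQueue<>())
                    .add(message);
        }
    }

    public static void main(String[] args) throws Exception {
        ClientRegistry registry = new ClientRegistry();
        CompletableFuture<String> waiting = registry.poll("client-1"); // parks
        registry.deliver("client-1", "hello");                         // resumes
        System.out.println(waiting.get());
    }
}
```

A real implementation would also expire stale entries and cap the buffer sizes, per the list below.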

A few things you should think about when implementing the above:-

  • Buffering of messages (clients aren’t guaranteed to be connected)
  • Implementing a message protocol, server and client side
  • Managing/expiring Continuation instances
  • OS network connection/file handle limits (these need raising to reach the maximum number of clients you want to support from one server)
  • If you want normal Ajax polls going on at the same time as the long poll, remember that browsers limit the number of concurrent HTTP connections to a single domain/port. The spec seems to say 2 (IE follows this); Firefox defaults to 6. You are only really safe with 2 concurrent connections per browser instance.

Ajax is a popular buzzword, and like any popular buzzword it should be investigated critically. It is not a panacea.

Here, however, we’ve seen the beginnings of a useful messaging infrastructure between a server and a browser. Interesting!

This could be integrated into your current infrastructure and extend your routing system all the way to the browser.

It wouldn’t be too hard to envisage creating a Spring Integration Channel or Endpoint implementation, for example, based around the Continuation instances.