Clustering Guice Java Web Applications

Considering the origin of Guice (its use in Google AdWords, one of the largest apps on the Internet), it’s fair to expect it to support clustering and session replication.

There’s not much in the way of documentation around that aspect of it.

As it turns out, there’s a good reason for that – clustering and session replication with Guice are mind-numbingly simple. In fact, if you’re doing session replication / fail-over, you probably already have it working.

This is because @SessionScoped objects in Guice are stored in the HttpSession, under consistent internal keys.

That might be obvious to some people, but it wasn’t to me. I thought that Guice used… well… MAGIC or something to keep track of objects in session scope.

Actually, delving into the Guice internals, here’s the “magical” code -

  public static final Scope SESSION = new Scope() {
    public <T> Provider<T> scope(Key<T> key, final Provider<T> creator) {
      // The binding key's string form is used as the session attribute name
      final String name = key.toString();
      return new Provider<T>() {
        public T get() {
          HttpSession session = GuiceFilter.getRequest().getSession();
          synchronized (session) {
            Object obj = session.getAttribute(name);
            if (NullObject.INSTANCE == obj) {
              return null;
            }
            @SuppressWarnings("unchecked")
            T t = (T) obj;
            if (t == null) {
              t = creator.get();
              session.setAttribute(name, (t != null) ? t : NullObject.INSTANCE);
            }
            return t;
          }
        }
        public String toString() {
          return String.format("%s[%s]", creator, SESSION);
        }
      };
    }
  };

So the GuiceFilter servlet filter is actually what provides the magic for this functionality. The actual objects are stored in the HttpSession as attributes.
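To make the pattern concrete, here is a minimal sketch of the same get-or-create lookup with a plain Map standing in for the HttpSession. The class name, method name and the "cart-key" string are my own inventions for illustration; this is not Guice code, just the shape of it.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical stand-in for the session scope: a Map plays the role of HttpSession.
final class SessionScopeSketch {
    // Sentinel stored when the creator legitimately returns null,
    // so that "created, but null" can be told apart from "not created yet".
    private static final Object NULL_MARKER = new Object();

    private final Map<String, Object> sessionAttributes = new HashMap<>();

    @SuppressWarnings("unchecked")
    synchronized <T> T getOrCreate(String key, Supplier<T> creator) {
        Object existing = sessionAttributes.get(key);
        if (existing == NULL_MARKER) {
            return null; // a previous call already produced null for this key
        }
        if (existing == null) {
            // First request for this key in this "session": create and remember it.
            T created = creator.get();
            sessionAttributes.put(key, created != null ? created : NULL_MARKER);
            return created;
        }
        return (T) existing;
    }
}
```

Calling getOrCreate("cart-key", ...) twice on the same instance hands back the same object, and the creator only runs the first time. Because the key is a stable string, any node that receives the replicated attributes can find the object again.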

So in order to support session replication you just need to make sure that your session scoped objects are Serializable, and everything is good. Normal HttpSession replication across your cluster will mean everything works as expected.

Of course, what if you have non-serializable dependencies?

Well they can be @Inject’ed into a static member variable to get around that, though I’d suggest a better solution would be to refactor the code such that the data (Serializable) component is kept in session scope, but the functional (non-serializable) part is split out into a @Singleton service class.
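Here is a rough sketch of that split, using only the JDK so it runs on its own. CartState, PricingService and ReplicationCheck are hypothetical names of mine; in a real Guice app the first would carry @SessionScoped and the second @Singleton. The round trip through Java serialization is only an approximation of what a container does when it replicates a session.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToIntFunction;

// Data half: plain Serializable state (the part that would be @SessionScoped).
class CartState implements Serializable {
    private static final long serialVersionUID = 1L;
    private final List<String> itemCodes = new ArrayList<>();

    void add(String code) { itemCodes.add(code); }
    List<String> itemCodes() { return itemCodes; }
}

// Functional half: owns the non-serializable collaborator (the part that would be @Singleton).
final class PricingService {
    private final ToIntFunction<String> priceLookup; // e.g. backed by a database; not Serializable

    PricingService(ToIntFunction<String> priceLookup) { this.priceLookup = priceLookup; }

    int totalCents(CartState cart) {
        int total = 0;
        for (String code : cart.itemCodes()) {
            total += priceLookup.applyAsInt(code);
        }
        return total;
    }
}

// Serializes and deserializes a value, roughly what session replication does to it.
final class ReplicationCheck {
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T roundTrip(T value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (T) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("value would not survive replication", e);
        }
    }
}
```

The session only ever holds CartState, so it replicates cleanly, and whichever node picks up the request passes the copy to its own PricingService. A round trip like this also makes a handy unit test for anything you plan to put in session scope.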

Posted in Java, Software Engineering | Tagged cluster, guice, Java, tips

Setting Up Memcached As A Windows Service

Memcached is an in-memory, distributed key-value store for arbitrary pieces of application data. It is useful for clustering and distributed caching, and it (along with similar tools) is becoming an increasingly common feature of large Web-based apps.

I don’t like Windows any more than the next guy when it comes to using it as a Server OS, but sometimes you’ve got no choice – maybe you work in an environment where Windows Server is the only option. *sigh*

Memcached is not something that you would generally install on Windows (not for production, anyhow), but it is possible to have it running happily as a native Windows service.

Here’s how -

  1. Download either the 32-bit or 64-bit Windows build of Memcached from NorthScale
  2. Unzip the memcached build into a server folder, say “C:\memcached”
  3. Run memcached.exe and ensure it starts. You should get a blank console window. Ctrl+C will close it, assuming all is well.
  4. To set it up as a native service we will download the Windows Server 2003 Resource Kit
  5. Install the Windows Resource Kit.
    For clarity I’ll refer to the install location as “C:\Program Files\Windows Resource Kits\Tools”; in reality it may be different, so substitute your install location as needed.
  6. Open a command prompt and change to your resource kit folder e.g. C:\Program Files\Windows Resource Kits\Tools
  7. At the prompt:
    instsrv Memcached "C:\Program Files\Windows Resource Kits\Tools\SRVANY.EXE"
  8. Open Notepad and paste the following into it -
    Windows Registry Editor Version 5.00
    
    [HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\services\Memcached\Parameters]
    "Application"="c:\\memcached\\memcached.exe"
    "AppParameters"="-m 1024"

    Adjust the path as necessary. You can add (or remove) memcached command line options with the “AppParameters” option.

    The “-m 1024” option gives memcached a 1024 MB (1 GB) cache. You can get a list of other options with “memcached -h” or on memcached.org.

  9. Save the file as “c:\memcached\configservice.reg”
  10. Double click the file in Windows Explorer and merge the settings into the Registry.
  11. Start the service, e.g. “net start Memcached” at the command prompt
  12. You’re done!
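If you want to check the service from Java once it is running, a small smoke test over memcached's plain-text protocol will do. This is only a sketch: the class and method names are mine, and it assumes memcached is listening on its default port (11211) on whatever host you pass in.

```java
import java.io.BufferedReader;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for poking a memcached instance by hand.
final class MemcachedSmokeTest {

    // Builds a text-protocol "set" request: set <key> <flags> <exptime> <bytes>\r\n<data>\r\n
    // Flags and expiry are both 0 here; the key must not contain whitespace.
    static byte[] setCommand(String key, String value) {
        byte[] data = value.getBytes(StandardCharsets.UTF_8);
        byte[] header = ("set " + key + " 0 0 " + data.length + "\r\n")
                .getBytes(StandardCharsets.US_ASCII);
        byte[] terminator = "\r\n".getBytes(StandardCharsets.US_ASCII);

        ByteArrayOutputStream request = new ByteArrayOutputStream();
        request.write(header, 0, header.length);
        request.write(data, 0, data.length);
        request.write(terminator, 0, terminator.length);
        return request.toByteArray();
    }

    // Sends the request and returns the first reply line ("STORED" when all is well).
    static String storeAndReadReply(String host, int port, String key, String value)
            throws IOException {
        try (Socket socket = new Socket(host, port)) {
            OutputStream out = socket.getOutputStream();
            out.write(setCommand(key, value));
            out.flush();
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(socket.getInputStream(), StandardCharsets.US_ASCII));
            return in.readLine();
        }
    }
}
```

With the service started, MemcachedSmokeTest.storeAndReadReply("localhost", 11211, "greeting", "hello") should come back with STORED. A connection error most likely means the service isn't running or a firewall is in the way.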

Also, there is an alternative option – you can use the Java-based clone of memcached, the (intuitively/unimaginatively) named jmemcached.

This can be used in conjunction with Java Service Wrapper to create a Java-based Windows service which operates like the native memcached.

Posted in Software Engineering | Tagged cluster, guides, memcached, servers, windows

Where Does Wicket Store Its DiskPageStore Files?

Wicket stores everything other than the current Page for a given user’s session (by default) in a second-level disk-based session cache.

I’d never looked much into how it worked, until I wanted to know where these files were located.

The answer, as it turns out, makes a lot of sense – here’s the method that does the work in DiskPageStore.java -

    private static File getDefaultFileStoreFolder()
    {
        final File dir = (File)((WebApplication)Application.get()).getServletContext()
            .getAttribute("javax.servlet.context.tempdir");
        if (dir != null)
        {
            return dir;
        }
        else
        {
            try
            {
                return File.createTempFile("file-prefix", null).getParentFile();
            }
            catch (IOException e)
            {
                throw new WicketRuntimeException(e);
            }
        }
    }

So by default it uses the servlet context’s temporary directory (the “javax.servlet.context.tempdir” attribute), with Wicket’s files in a folder underneath. If that attribute isn’t set, it falls back to the system’s temporary folder.
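The fallback trick is worth a second look: creating a throwaway temp file and asking for its parent is a simple way to find the JVM's temp directory. Below is a stand-alone sketch of the same decision without any Wicket or servlet classes. TempDirFallback and resolveStoreFolder are hypothetical names, and unlike the original I delete the probe file afterwards.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical helper mirroring the "context temp dir, else system temp dir" decision.
final class TempDirFallback {

    // servletTempDir stands in for the "javax.servlet.context.tempdir" attribute; it may be null.
    static File resolveStoreFolder(File servletTempDir) {
        if (servletTempDir != null) {
            return servletTempDir;
        }
        try {
            // createTempFile puts the file in the default temp directory (java.io.tmpdir),
            // so its parent is the folder we are after.
            File probe = File.createTempFile("file-prefix", null);
            File parent = probe.getParentFile();
            probe.delete(); // tidy up the probe; the original code leaves it behind
            return parent;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Passing null shows you where the files would land on a container that doesn't set the attribute.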

For Apache Tomcat, that means a folder under your webapps context in apache-tomcat/work/Catalina/… e.g.

Posted in Java, Software Engineering, Wicket | Tagged Java, web, wicket

5 Minute Guide to Clustering – Java Web Apps in Tomcat

I’ve been taking a break from posting for the last couple of weeks. I was starting to get a bit run down, and felt like burnout was about to set in. The kind of blog posts I do take quite a bit of time, both in terms of the technical background work and the time to write and proofread the posts. Balancing that with work, personal projects and family life, something had to take a break, and it’s not going to be work or family life :)

Anyhow, I’m back with another 5 minute guide. This time, how to set up clustering with Apache Web Server and Apache Tomcat.

For the purposes of the rest of this article, when I say “Apache” I mean the web server, and when I say “Tomcat” I mean Tomcat.

There are pretty much two ways to set up basic clustering, which use two different Apache modules. The architecture for both is the same: Apache sits in front of the Tomcat nodes and acts as a load balancer.

[Figure: Architecture of Apache and Tomcat cluster, protocols and connectivity]

Traffic is passed between Apache and Tomcat(s) using the binary AJP 1.3 protocol. The two modules are mod_jk and mod_proxy.

The “jk” in mod_jk comes from “Jakarta”, the original project under which Tomcat was developed. It is the older way of setting this up, but still has some advantages.

mod_proxy is a newer and more generic way of setting this up. The rest of this guide will focus on mod_proxy, since it ships “out of the box” with newer versions of Apache.

You should be able to follow this guide by downloading Apache and Tomcat default distributions and following the steps. No funny business required.

Clustering Background

You can cluster at the request or session level. Request level means that each request may go to a different node – this is the ideal, since the traffic is balanced across all nodes and, if a node goes down, the user has no idea. Unfortunately this requires session replication between all nodes, not just of HttpSession, but of ANY session state. For the purposes of this article I’m going to describe session-level clustering, since it is simpler to set up and works regardless of the dynamics of your application. After all, we only have 5 minutes! :)

Session-level clustering means that if your application requires a login or other forms of session state, and one or more of your Tomcat nodes goes down, the affected users will be asked to log in again on their next request, since they will hit a different node which does not have any stored session data for them.

This is still an improvement on a non-clustered environment where, if your node goes down, you have no application at all!

And we still get the benefits of load balancing across nodes, which allows us to scale our application out horizontally across many machines.

Anyhow without further ado, let’s get into the how-to.

Setting Up The Nodes

In most situations you would deploy the nodes on physically separate machines, but in this example we will set them up on a single machine, on different ports. This allows us to easily test the configuration.

Nothing much changes for the physically separate set up – just the hostnames of the nodes, as you would expect.

Oh and I’m working on Windows – but aside from the installation of Apache and Tomcat nothing is different between platforms since the configuration files are standard on all platforms.

  1. Download Tomcat .ZIP distribution, e.g.
    [Image showing download package]

  2. We’ll use a folder to install all this stuff in. Let’s say it’s “C:\cluster” for the purposes of the article.
  3. Unzip the Tomcat distro twice, into two folders -
    C:\cluster\tomcat-node-1
    C:\cluster\tomcat-node-2
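  4. Start up each of the nodes in turn, using the bin/startup.bat / bin/startup.sh scripts, and ensure each one starts. Run them one at a time for now, since both copies still use the same default ports. If they don’t start, you may need to point Tomcat to the JDK installation on your machine.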
  5. Open up the server.xml configuration on
    c:\cluster\tomcat-node-1\conf\server.xml
  6. There are two places we need to (potentially) configure -
    [Screenshot: where these lines are in server.xml]

    The first line is the connector for the AJP protocol. The “port” attribute is the important part here. We will leave this one as is, but for our second (or subsequent) Tomcat nodes, we will need to change it to a different value.

    The second part is the “Engine” element. The “jvmRoute” attribute has to be added – this configures the name of this node in the cluster. The “jvmRoute” must be unique across all your nodes. For our purposes we will use “node1” and “node2” for our two node cluster.

  7. This step is optional, but for production configs, you may want to remove the HTTP connector for Tomcat – that’s one less port to secure, and you don’t need it for the cluster to operate. To do so, comment out the HTTP/1.1 Connector element (port 8080 by default) in server.xml.

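  8. Now repeat this for C:\cluster\tomcat-node-2\conf\server.xml
    Change the jvmRoute to “node2” and the AJP connector port to “8019”. Because both nodes run on the same machine, also change node 2’s shutdown port (the “port” attribute on the Server element, 8005 by default) and, if you kept its HTTP connector, that port too (8080 by default), so they don’t clash with node 1.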

We’re done with Tomcat. Start each node up, and ensure it still works.

Setting Up The Apache Cluster

Okay, this is the important part.

  1. Download and install Apache HTTP Server.

    Use the custom option to install it into C:\cluster\apache2.2

  2. Now open up c:\cluster\apache2.2\conf\httpd.conf in your favourite text editor.
  3. Firstly, we need to uncomment the following lines (delete the ‘#’) -
    [Image: mod_proxy lines in httpd.conf to be uncommented]

    These enable the necessary mod_proxy modules in Apache (for this setup: mod_proxy, mod_proxy_ajp and mod_proxy_balancer).
  4. Finally, go to the end of the file, and add the following:
    <Proxy balancer://testcluster stickysession=JSESSIONID>
    BalancerMember ajp://127.0.0.1:8009 min=10 max=100 route=node1 loadfactor=1
    BalancerMember ajp://127.0.0.1:8019 min=20 max=200 route=node2 loadfactor=1
    </Proxy>
    
    ProxyPass /examples balancer://testcluster/examples

    The above is the actual clustering configuration.

    The first section configures a load balancer across our two nodes. The loadfactor can be modified to send more traffic to one or the other node. i.e. how much load can this member handle compared to the others?

    This allows you to balance effectively if you have multiple servers which have different hardware profiles.

    Note also the “route” setting, which must match the “jvmRoute” value in the Tomcat server.xml for each node. This, in conjunction with the “stickysession” setting, is key for a Tomcat cluster, as it configures the session management. It tells mod_proxy to look for the node’s route in the given session cookie to determine which node that session is using. This allows all requests from a given client to go to the node which is holding the session state for the client.

    The ProxyPass line configures the actual URL from Apache to the load balanced cluster. You may want this to be “/”
    e.g. “ProxyPass / balancer://testcluster/”

    In our case we’re just configuring the Tomcat /examples application for our test.

  5. Save it, and restart your Apache server.
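To get a feel for what “loadfactor” does, here is a toy balancer in Java. It is loosely modelled on the request-counting idea described in the mod_proxy_balancer documentation (each member earns credit in proportion to its load factor, and the member with the most credit takes the next request). It is a sketch of that idea, not Apache’s actual code, and WeightedBalancer is a made-up name.

```java
import java.util.ArrayList;
import java.util.List;

// Toy weighted balancer: members with a higher load factor receive more requests.
final class WeightedBalancer {

    private static final class Member {
        final String route;
        final int loadFactor;
        int status; // running credit; the highest credit takes the next request

        Member(String route, int loadFactor) {
            this.route = route;
            this.loadFactor = loadFactor;
        }
    }

    private final List<Member> members = new ArrayList<>();

    WeightedBalancer add(String route, int loadFactor) {
        members.add(new Member(route, loadFactor));
        return this;
    }

    // Picks the route that should handle the next request.
    synchronized String next() {
        if (members.isEmpty()) {
            throw new IllegalStateException("no members configured");
        }
        int total = 0;
        Member best = null;
        for (Member m : members) {
            m.status += m.loadFactor;   // every member earns its share
            total += m.loadFactor;
            if (best == null || m.status > best.status) {
                best = m;               // ties go to the member listed first
            }
        }
        best.status -= total;           // the chosen member pays for the request
        return best.route;
    }
}
```

With node1 at loadfactor 1 and node2 at loadfactor 3, every four requests are split one to node1 and three to node2. With both at 1, as in the config above, the two nodes simply alternate.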

Test It Out

With both Tomcat nodes shut down and your Apache server running, go to http://localhost/examples

You should get a 503 error page.

This is because both Tomcat nodes are down.

Start up node1 (c:\cluster\tomcat-node-1\bin\startup) and reload http://localhost/examples

You should see the examples application from the default Tomcat installation.

Shut down node1, and then start up node2. Repeat the test. You should see the same page as above. We have transparently moved from node1 to node2 since node1 went down.

Start both nodes up and your cluster is now working.
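What we just tested by hand (stick to the node named in the session ID, and fall back to another node when that one is down) can be sketched in a few lines of Java. Tomcat appends “.” plus the jvmRoute to the session ID, e.g. “A1B2C3.node1”. The StickyRouter class and the “first live node” fallback are simplifications of mine, not how mod_proxy actually picks the replacement.

```java
import java.util.List;

// Hypothetical router: prefers the node named in the session ID, else any live node.
final class StickyRouter {

    // Returns the route after the dot in a session ID such as "A1B2C3.node1", or null if none.
    static String routeOf(String sessionId) {
        if (sessionId == null) {
            return null;
        }
        int dot = sessionId.indexOf('.');
        if (dot < 0 || dot == sessionId.length() - 1) {
            return null;
        }
        return sessionId.substring(dot + 1);
    }

    // liveRoutes lists the jvmRoute names of the nodes that are currently up.
    static String chooseNode(String sessionId, List<String> liveRoutes) {
        if (liveRoutes.isEmpty()) {
            // This is the case where Apache answered with a 503 earlier.
            throw new IllegalStateException("no live nodes");
        }
        String route = routeOf(sessionId);
        if (route != null && liveRoutes.contains(route)) {
            return route; // sticky: same node as before
        }
        // New client, or the sticky node is down: fall back to some live node.
        return liveRoutes.get(0);
    }
}
```

When the fallback kicks in, the new node has no session data for that client, which is exactly the “log in again” behaviour described in the background section.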

You’re done!

Optional: Set Up Apache Balancer Manager

mod_proxy has an additional “balancer manager” component which provides a nice web interface to the load balanced cluster. It’s worthwhile setting this up if you want to remotely administer / monitor the cluster.

To do so is easy -

  1. Add the following to the bottom of your C:\cluster\apache2.2\conf\httpd.conf
    <Location /balancer-manager>
    SetHandler balancer-manager
    AuthType Basic
    AuthName "Balancer Manager"
    AuthUserFile "C:/cluster/apache2.2/conf/.htpasswd"
    Require valid-user
    </Location>

    This configures the balancer manager at http://localhost/balancer-manager

  2. We need to create a password file to secure it. At the command prompt you can use -
    c:\cluster\apache2.2\bin\htpasswd -c c:\cluster\apache2.2\conf\.htpasswd admin

    Then set a password when prompted. You will use this username (admin) and password to log in at the balancer-manager URL.

Restart your Apache web server, and go to http://localhost/balancer-manager

You should be prompted for the username/password you set earlier, and then see the balancer manager tool.

Posted in Java, Software Engineering | Tagged apache, cluster, guides, Java, open-source, performance, web