Java in the cloud: Google, Aptana, and Stax

Just as the megastars in Hollywood seem to find each other and fall in love, it was inevitable that two of the greatest buzzwords ever hatched -- "Java" and "cloud" -- would meet and begin to breed. Now that a number of companies have launched Java clouds, or begun weaving Java into their hosted development platforms, the race is on to remake the Java infrastructure in the cloud's image.

There is some irony in this turn of events because the Java infrastructure has done better than most piles of code in solving the difficult problem of getting multiple processors and multiple machines working together. Java EE (Enterprise Edition) offers a very sophisticated set of mechanisms that pass messages between machines (Java Message Service or javax.jms.*) and handle database access (the Java Persistence API and the Java Transaction API). Then there's the Enterprise JavaBean, a sophisticated tool for managing persistence on a cluster, an abstraction that's so powerful and so dangerous that it has driven as many programmers mad as it has helped.

A number of companies have repackaged the JVM (Java Virtual Machine) and turned it into a hosted service. To see how this is working out, I set up accounts at three different providers offering Java services on their cloud, built a few test applications, and bombed them with some HTTP requests.

All of them are very new. Google's App Engine just expanded to include Java and is now giving select programmers an "early look." Stax is in beta. Aptana's Cloud doesn't carry either label ("early look" or "beta") but is still adding new features. Surprisingly, Sun was not ready to let me test anything in its cloud but is expected to launch in a few months. (See the sidebar, Sun Cloud looks beyond Java, for a description of what Sun is planning.)

The most surprising element about all of these new clouds is how little they offer compared to the promise of the Java EE stack. At the core, they provide a simple servlet container, one that's stripped down and not much different from Tomcat because it is often just Tomcat. The tools do a better job of delivering a revolutionary way of purchasing computer time than they do of creating the next generation of Java flexibility.

But this may be because the creators have a slightly different goal than the creators of the original J2EE. They're not trying to create a wonderful cloud of objects that float from machine to machine, nibbling on a few cycles here, chomping on a large block of memory there. They're really just tackling the headaches of deploying a server, a process that can be maddening in many IT shops. They want to make it easy for a project to turn into a public Web site and then grow adequately if thousands or millions decide they want to tune in. The goal is to make all of this happen as automatically as possible without all of the headaches of approving purchase orders, reserving rack space, waiting for deliveries, and other time-consuming problems.

Some of the simplicity must also be because this is all very new. I wouldn't be surprised if the companies begin integrating all of the more sophisticated layers in their next generation. They're starting with Tomcat for now, and it shouldn't be too hard to catch up with Java EE.

There is a great deal of variety in the approaches and the levels of abstraction. Google's App Engine caters to a thinner, more widely parallel set of applications that can scale automatically. Aptana's tool, on the other hand, is a nice IDE that integrates deployment and purchasing into Eclipse. Stax offers something that lies roughly between the two.

Google App Engine

Google's shining new Java wing to the App Engine should be very familiar to anyone who's spent time with the first generation based on Python because many parts of the architecture are unchanged. You write a thin layer of logic that juggles the requests and then you rely upon the back end to synchronize everything. The Java applications use the same database, image processing engine, and mailer in pretty much the same way as their Python-based cousins.

While the new Java tool will be very familiar to Python programmers who used the original engine, many of the ideas will be a bit strange and new to Java programmers. The database is not MySQL, Oracle, or even Derby, the embedded database included with the JDK. It's a proprietary data store with a query language called GQL that resembles a small subset of SQL. You can't use JDBC (Java Database Connectivity) to link up with it; you need to use Google's own proprietary layer.

This is just the beginning of the changes. You can't just open up a socket and suck down a Web page; you have to use the URL fetching code. If you want to keep a cache of commonly used information, you should store your objects with the Memcache implementation that Google offers. Google's code will keep everything consistent so that the Memcache on all machines will offer the same thing when the synchronization is finished.

There are also a number of restrictions on the classes you can use. Google's version of the JVM isn't fully stocked. You won't be able to spin off a thread to handle processing and you can't write to the disk -- ever. You have to use Google's data store if you want to save the data.

All of these changes have some advantages. Google's data store is stripped down and optimized for working with many machines at once. The Memcache service saves calls to the data store -- something to be wary about because the meter is clicking whenever your servlet is churning. The image processing tool handles some of the work with native code, another advantage.
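The idea of checking the cache before touching the data store can be sketched in a few lines. This is only an illustration in plain Java: the two maps stand in for Google's Memcache and data store, and the class and method names are hypothetical rather than part of the App Engine API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical cache-aside sketch; plain maps stand in for Memcache and the data store.
class CacheAsideSketch {
    private final Map<String, String> cache = new HashMap<>();     // stand-in for Memcache
    private final Map<String, String> dataStore = new HashMap<>(); // stand-in for the data store
    private int dataStoreReads = 0;                                // counts the "metered" reads

    // Writes go to the data store; any cached copy is dropped so it cannot go stale.
    void save(String key, String value) {
        dataStore.put(key, value);
        cache.remove(key);
    }

    // Reads try the cache first and fall back to the data store on a miss.
    String load(String key) {
        String cached = cache.get(key);
        if (cached != null) {
            return cached; // cache hit: no data store call
        }
        dataStoreReads++;
        String value = dataStore.get(key);
        if (value != null) {
            cache.put(key, value); // remember it for the next request
        }
        return value;
    }

    int dataStoreReads() {
        return dataStoreReads;
    }

    public static void main(String[] args) {
        CacheAsideSketch app = new CacheAsideSketch();
        app.save("greeting", "hello");
        app.load("greeting"); // miss: reads the data store
        app.load("greeting"); // hit: served from the cache
        System.out.println("data store reads: " + app.dataStoreReads()); // prints "data store reads: 1"
    }
}
```

Two reads cost only one trip to the data store, which is the whole point when every trip spins the meter.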

For all of these reasons, I think App Engine will be most attractive to projects that need to give people shared access to several big tables filled with data. It's not really for all Java programmers, but for people who are familiar with Java and like to use it to write some glue code to wrap around a big table. You can't do too much to the data on the way in or the way out because there are limits on the amount of time that each request can spin the meter.

I think these restrictions will mean that there will be relatively few applications that just pick up and migrate to the App Engine. All of the data access will need to be rewritten and some of the common tricks that use flat files will need to be re-engineered. Moving your application out of the App Engine will probably be a bit easier, but it will require changing your mindset because the App Engine used to handle some of the scaling and synchronization issues for you. It may be technically possible to run the App Engine debugging environment on your own server, but the Terms of Service say Google is giving you a license for the "sole purpose of enabling you to use and enjoy the benefit of the Service as provided by Google, in the manner permitted by the Terms."

Google is well aware of this issue and is trying to address it as it encourages people to use the system. IBM is even offering tips on how to migrate App Engine code to its platform. It's just a matter of getting the JDO (Java Data Objects) calls to talk with IBM's DB2 instead of Google's back end. I'm guessing that IBM hopes to grab customers who build the first rev in App Engine and then decide that some threading or slow cron jobs are absolutely necessary.

I built several JSPs that deliberately sucked down a large amount of computation time, and it was pretty easy to push them hard enough to reach the limit. I don't think most Web 2.0 applications will run up against Google's CPU and memory limits, which are liberal from the standpoint of the typical Web application, but they could be a problem for anyone who wants to do much heavy processing. The image toolkit, for instance, will only work with images smaller than 1MB -- something that's a bit tight for serious photographers. My pocket camera can turn out images that take up 4MB.

These limits might squeeze an application in unexpected ways. Cron jobs are just URLs that are called at set times. That's a nice abstraction, but it's definitely a bad fit for some of the massive reports that corporations generate every evening. It's more for housekeeping than any kind of asynchronous heavy lifting.
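One way to live with a cron job that is only a short URL request is to cut the work into slices and let each scheduled hit process one slice. The sketch below is plain Java under stated assumptions: the class, the method names, and the slice size are hypothetical, and in a real App Engine application the cursor and running total would have to be kept in the data store between requests rather than in memory.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: a long job processed in small slices, one slice per scheduled request.
class SlicedJobSketch {
    private final List<Integer> pending; // work items; these would live in the data store
    private int cursor = 0;              // where the previous request stopped
    private long total = 0;              // running result carried between requests

    SlicedJobSketch(List<Integer> items) {
        this.pending = new ArrayList<>(items);
    }

    // Called once per cron-triggered request; returns true when the whole job is finished.
    boolean handleCronRequest(int maxItemsPerRequest) {
        int end = Math.min(cursor + maxItemsPerRequest, pending.size());
        for (int i = cursor; i < end; i++) {
            total += pending.get(i);
        }
        cursor = end; // remember the position for the next request
        return cursor == pending.size();
    }

    long total() {
        return total;
    }

    public static void main(String[] args) {
        SlicedJobSketch job = new SlicedJobSketch(Arrays.asList(1, 2, 3, 4, 5, 6, 7));
        int requests = 0;
        boolean done = false;
        while (!done) {
            done = job.handleCronRequest(3); // at most three items per request
            requests++;
        }
        System.out.println(requests + " requests, total " + job.total()); // prints "3 requests, total 28"
    }
}
```

This keeps each request short, but it also shows why the model suits housekeeping better than a nightly report: the bookkeeping is now the programmer's job.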

There are real advantages to what may at first seem like a straitjacket to any programmer who grew up opening sockets on a whim and writing to the file system whenever it felt good. The explicit limitations help architects create better applications that run more smoothly because they prevent overreaching the limits of the system. Many of the early adopters of Java EE found themselves pulling out their hair when one of the automatic tools would take forever to deliver the magic that the API documentation promised. Making the limitations of the architecture apparent by writing a tightly limited API is more of a gift than a curse.

If there are no joins in the data store, then it will be easier to generate massive reports because the database table will be denormalized from its inception. If the jobs can't run that long, the architect can make living documents that let the user drill down to generate the necessary information on demand. That can be much more efficient than spending the entire night pre-computing something that won't be read by many people.
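To make "denormalized from its inception" concrete, here is a small sketch in plain Java. The record layout, field names, and sample values are all hypothetical. Each order row carries its own copy of the customer's region, so the report is a single pass over one table with no join against a customer table.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: each order row copies the customer's region, so the report needs no join.
class DenormalizedReportSketch {
    static final class OrderRow {
        final String customerName;
        final String customerRegion; // duplicated from the customer record at write time
        final long amountCents;

        OrderRow(String customerName, String customerRegion, long amountCents) {
            this.customerName = customerName;
            this.customerRegion = customerRegion;
            this.amountCents = amountCents;
        }
    }

    // One pass over the rows; a TreeMap keeps the regions in alphabetical order.
    static Map<String, Long> totalsByRegion(List<OrderRow> rows) {
        Map<String, Long> totals = new TreeMap<>();
        for (OrderRow row : rows) {
            totals.merge(row.customerRegion, row.amountCents, Long::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        List<OrderRow> rows = Arrays.asList(
            new OrderRow("Ada", "East", 1500),
            new OrderRow("Bo", "West", 900),
            new OrderRow("Ada", "East", 250));
        System.out.println(totalsByRegion(rows)); // prints "{East=1750, West=900}"
    }
}
```

The price is duplicated data: if a customer moves to another region, every copy has to be updated or the old rows keep the old value.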

It's worth noting that Google has done a nice job of integrating the system with Eclipse. There is a wide variety of tools, and they do more than just upload WAR (Web archive) files to the App Engine. The standard application shell is integrated with Google Web Toolkit, the mind-bending tool that converts your Java code into JavaScript that runs on the client. The dashboard is simple but responsive. The spikes I generated in my jobs started showing up within seconds.

All of this adds up to a compelling tool for serious experimentation, the kind of monkeying around done with the hope that it will turn quickly into something that's worth launching a hundred servers for. The App Engine will scale up quickly and then stop on a dime as it follows the ebbs and flows of fortune's fickle whim automatically.

Aptana Cloud

Aptana made its name by creating a nice set of plug-ins that sit on top of Eclipse and make it simpler to develop Java, JavaScript, PHP, Ruby, and Python applications. Aptana Studio is a nice solution for many developers who want to work with all of the dominant Web programming languages, and with AJAX in particular. Now the company is expanding this set of plug-ins in a partnership with Joyent hosting to produce Aptana Cloud.

Aptana Cloud, like the Studio, is a set of Eclipse plug-ins that smooth the deployment process to Joyent's collection of servers. In one tab of Eclipse you edit your code, and in another you control how it's deployed to the server running Tomcat, MySQL, and PostgreSQL. You can also build out Web sites with Rails, Jaxer, and PHP. Python is said to be coming.

The "My Cloud" tab is pretty much a fancy front end to the standard Linux server. In one tab, you can turn the server daemons (Tomcat, MySQL, PostgreSQL, and Apache) on or off. If you want to add more computational resources, you can switch to another tab where the options let you choose one of four settings for disk space and RAM. The basic introduction setting includes 256MB of RAM and 5GB of disk space, billed at $0.027 per hour, a price that works out to $20 per month. If you want more, you can move a little lever that goes up to 2GB of RAM and 25GB of disk space for $0.359 per hour, or about $267 per month.
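The monthly figures follow from the hourly rates if the server is billed for every hour of a 31-day month (744 hours). That billing period is an assumption used here to reproduce the numbers, not a statement of Aptana's terms. A quick check in Java, with hypothetical class and method names:

```java
// Worked arithmetic for the hourly prices quoted above; assumes a 31-day month (744 hours).
class HourlyPriceSketch {
    static final int HOURS_PER_MONTH = 24 * 31; // assumption: billed for every hour of a 31-day month

    static double monthlyCost(double dollarsPerHour) {
        return dollarsPerHour * HOURS_PER_MONTH;
    }

    public static void main(String[] args) {
        // Smallest setting: 256MB of RAM and 5GB of disk at $0.027 per hour.
        System.out.printf("small: $%.2f per month%n", monthlyCost(0.027));
        // Largest setting: 2GB of RAM and 25GB of disk at $0.359 per hour.
        System.out.printf("large: $%.2f per month%n", monthlyCost(0.359));
    }
}
```

The results round to the $20 and $267 quoted above. A 30-day month would come out slightly lower.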

The service is just a pretty face on many of the standard VPS (virtual private server) tools out there, something that will be comforting and familiar to anyone with hard-won experience wrestling with the standard offerings. Access to the database is available on port 3306. Secure FTP and Subversion are also ready and running. If you want root, it's yours with a click. All of the log files are nicely presented in yet another tab. The system load and memory consumption appear in a dashboard-like tab. You never need to leave Eclipse/Aptana Studio.

While you're technically running on a single virtual machine, the servers have eight CPUs and they're set up to allow bursts of computation that can consume over 95 percent of those eight processors. This is more of a convenience that smooths over occasional busy periods than a way to get eight CPUs on the cheap.

Aptana Cloud is less revolutionary and more evolutionary. You can use all of the experience you have with the traditional tools again here. The buttons map pretty cleanly to the tasks that used to require Emacs in the shell and just simplify the process. If you need to poke around under the covers, or set up some other workflow, the opportunity is here.

Stax on Amazon EC2
