Thursday, May 15, 2008

About Predictability and Traceability

In this post I will discuss two somewhat overlooked benefits of SBA.
But first I want to explain what triggered me to write it.

A few weeks ago I read a post based on a book by Daniel Schacter, a psychology professor at Harvard University. According to Schacter, the human brain tends to generalize and categorize things so that we can easily remember and relate to them, without having to think about all the details involved. This is thought to be an evolutionary advantage, since it enables us to categorize threats and identify them very quickly (albeit not always reliably…).

For example, most people would say "Italian cars are so fun to drive" (unless they had a Fiat Multipla), or "British cuisine sucks" (no offense, my fellow Englishmen - there are very few things I like better than hot, freshly fried fish 'n' chips :) ).

Software products in general and GigaSpaces in particular are no exception to this human behavior.
Facing prospects and being involved in numerous sales opportunities, I often see people categorizing us as "a high-scalability, low latency solution" (which I'm ok with, don't get me wrong) or a "caching solution" (which I'm less ok with, as it's only part of the story). But naturally, as generalizations tend to, these labels don't tell the entire story.

In one of my recent sessions for a certain prospect, I had an interesting discussion with one of the attendees, a savvy enterprise architect. After presenting our way of thinking, he said something in the following spirit:

"Well, our JEE tier based application works just fine now. The latency is reasonable for all important use cases, my developers add and remove changes at a reasonable time, and I'm ok with the capacities I need to handle. In fact, I know I can handle about 50% more throughput than I do now.
So, it's not that I don't like your solution, but I don't really need it for now. If I wrote an order management system for a bank I would definitely give it a shot, but for my current needs it's kind of an overkill"

At first I thought that he had a good point. After all, we are defining ourselves as an XTP (eXtreme Transaction Processing) application server, so if your application is not extreme (like this architect here) you don't really need GigaSpaces, right?
But after thinking for a few more seconds, we started a discussion in the following spirit (I'm U below, for Uri, and he is P, for prospect):

U: So do you think you will not need to grow with capacity anymore?
P: I didn't say that, I said I'm good for another 50% increase or so
U: And then what? Do you think you will get to a point when this will not cut it?
P: Hopefully we will, if our business is successful enough. I'll do some testing, find where the bottleneck is, which will probably be the database or the messaging as always, and buy more hardware or better storage. After all, this system is not running on a very expensive hardware setup, so it shouldn't be too hard to get more budget for it. I might also re-architect parts of my app to make it perform better.
(at this point my brain started to make funny noises trying to compile a proper response…)
U: But can you really tell how much more hardware you will need, or how much that will cost you? Or where will patching the current architecture get you?
P: hmmm… kind of.
U: What do you mean?
(pause…)
P: I'm not sure how this will affect the performance of the database and the messaging server. It will probably improve, just not sure how much
U: Are you sure? We have a customer using <database X with clustered configuration>. Going into this configuration actually slowed things down, because now the database servers need to coordinate everything with one another…
Furthermore, how will you do capacity planning? How will you plan the budget for this?
P: I have to check it first hand before I can really tell. I'll probably start with a couple of machines or change the relevant parts in the app, try it out to see if it's good enough, and if not add more machines to the mix
U: So you will go through a complete development and performance testing cycle without knowing if and at what cost it's going to solve your problem?
P: well, now that you put it like that…
U: And another point: what if you go through all of this and get great throughput numbers, but not so great latency numbers?
P: Then I need to trace and profile my app and find where I have a bottleneck
U: How will you do that?
P: Well there are a lot of tools out there for doing just that, very good ones I might add
U: Do they show you the entire latency path? I mean, can they tell you where the bottleneck is in your mix of DB, messaging and application servers?
(pause…)
P: They sort of can…
U: So let me get this straight: to know your capacity, you need to build the entire production environment in advance, buy potentially very expensive tools in case something goes wrong, and even then you're not entirely sure it'll do the trick for you?
P (a bit aggressive): So what do you offer to solve this problem?
(Finally, this conversation is heading somewhere…)
U: Well, with SBA, the whole point is that everything happens in the same JVM. So it's much more predictable, because there are no other moving parts involved. When the application is partitioned correctly, and one JVM gives you 2000 operations/sec for example, two would give you more or less 4000, and so on. This is what linear scalability is all about, and the SBA model enables it.
And since it's one JVM, you can just attach a simple profiler or even print your own debug messages to a log file and analyze them later. It's as simple as profiling and debugging a standalone Java app.

I won't wear you down with the rest of this conversation, but hopefully you get the point. I think the above summarizes very well why GigaSpaces is not just about scalability or latency. It also gives a pretty easy way to answer the following questions, which is not so trivial with the classic tier-based approach:

  • Predictability - If I add X more machines, how much more throughput will I gain? Is there a latency penalty for that?
  • Traceability - Where the heck is my bottleneck?
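When scaling really is linear, the predictability question reduces to simple arithmetic. Here's a toy sketch in Java; the per-partition throughput figure is made up for illustration, and in practice would come from measuring a single partition:

```java
// Illustrative only: with linear scalability, capacity planning is arithmetic.
public class CapacityPlan {

    // Throughput measured for a single partition (hypothetical number).
    static final int OPS_PER_PARTITION = 2000;

    // In an SBA-style linearly scalable deployment, throughput grows
    // roughly proportionally with the number of partitions.
    static int projectedThroughput(int partitions) {
        return partitions * OPS_PER_PARTITION;
    }

    // Round up: the smallest partition count that meets the target.
    static int partitionsNeeded(int targetOps) {
        return (targetOps + OPS_PER_PARTITION - 1) / OPS_PER_PARTITION;
    }

    public static void main(String[] args) {
        System.out.println(projectedThroughput(2));  // prints 4000
        System.out.println(partitionsNeeded(7500));  // prints 4
    }
}
```

The point is not the trivial code but what it implies: with a tier-based architecture, no such formula exists, because the database and messaging tiers don't scale proportionally with added hardware.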

Finally, if you'd like to read a real-life story about how hard and expensive it was for people to use the "build, deploy, test, see what we got" methodology, and why scaling is something you need to think about in advance, here's a read well worth your time.

Sunday, April 27, 2008

To Spring or not to Spring

Just got back from a few days in the Netherlands and Belgium, where I gave technical sessions at a few knowledge events organized by our partners in the region.
One of the events was held by our Dutch affiliate, Tricode, which did a great job in setting it up and attracting quite a few developers and architects to it. We even have a Dutch GigaSpaces web site now, which the Tricode guys set up for us - so thanks guys.

One of the questions I got after the session was something in the following spirit:
"Sure, Spring is great and OpenSpaces integrates very nicely with it. But what if I don't like all this XML in my project, or just (imagine that) don't use Spring, but still want to use the OpenSpaces APIs, which are clean and nice?"

Of course, you can always "imitate" Spring and call the OpenSpaces Spring factory beans from your own code, but that's not very clean and will require you to dig through the OpenSpaces code to understand how things work.
Thankfully, you have another option - OpenSpaces configuration API, aka OpenSpaces configurers. It's fairly new (6.0.2 onwards) and therefore is still relatively unknown to many of our users.

Some (pretty recent) History

Up until version 6.0, the preferred API for using GigaSpaces was JavaSpaces with proprietary extensions. In 6.0 we introduced OpenSpaces, which greatly simplified things and made the user experience much more positive by abstracting away a lot of the API problems and inconsistencies.
From day 1, OpenSpaces has been all about Spring: It uses Spring for configuration and utilizes a lot of goodies that Spring provides such as transaction management framework, namespace based configuration, etc.
But since OpenSpaces' goal is to become the preferred Java API for the product, the fact that you could only use it through Spring was a bit limiting to some of our customers and prospects.
In addition, the trend towards code based configuration (first with Guice and now also with Spring Java Config) also made us realize that we need a way to use OpenSpaces interfaces without wiring them through XML.

Configurers Demystified
So here are a number of snippets to show you how this is done. Let's first show the Spring XML equivalent: creating a space instance and then wiring your listener on top of it using a polling container:
<os-core:space id="space" url="/./space" />

<os-core:giga-space id="gigaSpace" space="space"/>

<bean id="simpleListener" class="SimpleListener" />

<os-events:polling-container id="eventContainer" giga-space="gigaSpace">
    <os-core:template>
        <bean class="org.openspaces.example.data.common.Data">
            <property name="processed" value="false"/>
        </bean>
    </os-core:template>
    <os-events:listener>
        <os-events:annotation-adapter>
            <os-events:delegate ref="simpleListener"/>
        </os-events:annotation-adapter>
    </os-events:listener>
</os-events:polling-container>

The above XML snippet creates a space and registers a listener to be notified of all objects of type Data whose processed field equals false. Here's how this is done in Java code, via configurers:
// creating a space
IJSpace space = new UrlSpaceConfigurer("/./space").space();

// wrapping it with a GigaSpace
GigaSpace gigaSpace = new GigaSpaceConfigurer(space).gigaSpace();

// creating a polling container
Data template = new Data();
template.setProcessed(false);
SimplePollingEventListenerContainer pollingEventListenerContainer =
        new SimplePollingContainerConfigurer(gigaSpace)
                .template(template)
                .eventListenerAnnotation(new SimpleListener())
                .pollingContainer();

That's it. As you can see, it's even simpler than the XML equivalent.
A few interesting things about this:
  • All our configurers use method chaining, which makes them very intuitive to use. It's about as close as you can get to domain specific languages in pure Java :)
  • There are configurers for all of the OpenSpaces constructs, specifically: Space, GigaSpace, Event containers, Remoting Proxies and Scripting Proxies.
Where is this documented?
Every OpenSpaces docs page includes XML snippets that show the configuration.
Every such XML snippet is displayed in a tabbed pane, which typically has 3 tabs:
  • Namespace - which stands for the default configuration using Spring's namespaces support.
  • Plain XML - which shows how to configure the component via pure Spring, without GigaSpaces-specific namespaces
  • Code - which shows how to configure the component using configurers
Here's a screen shot to illustrate this:

Sunday, April 6, 2008

Impressions from MuleCon 2008

Just returned from San Francisco, where I gave a session about Mule and GigaSpaces integration at MuleCon. Although I'm still recuperating from the loooooong trip (about 18 hours each way) and the time differences (10 hours if you really want to know), I think it was worthwhile attending the conference.
I think the MuleSource guys did a great job, and the conference was quite packed.
About 200 developers and architects from around the world attended and it was interesting to see the profile of Mule users and what people are doing with it.
Many of them use Mule mainly as an integration and messaging platform - i.e. bridging between applications, but some are also using it as a hosting platform for single applications. This enables better decoupling between application components and makes for a more flexible application which is not bound to any specific middleware product. In GigaSpaces we refer to this as intra-application SOA.
I think we managed to generate nice interest with regards to the GigaSpaces-Mule integration package, which essentially lets you scale out your Mule application in a very clean and non-intrusive way.
There are 3 key value adds that GigaSpaces can provide to Mule users:
  1. High availability: making your Mule application highly available and fault tolerant, by maintaining in-memory and SEDA queues on top of the Space instead of just in memory. This allows your application to enjoy in-memory performance and still maintain resiliency and HA. It means that after failover, your application can start working from the exact point that the failure took place and not lose any pending in-memory messages.
  2. Scalable transport: Use GigaSpaces messaging capabilities to obtain high throughput, low latency and extreme scalability for your messaging layer.
  3. SLA driven grid based deployment: Use GigaSpaces service grid and SLA driven containers to enable declarative SLA, self healing and dynamic deployment capabilities for Mule applications.
Personally, the most useful part for me was the technical sessions by Mule developers and architects in which they showed the new features in Mule and other products they're working on like Galaxy, HQ and OSGi support. I especially found useful the sessions by Travis Carlson, Daniel Feist and Dan Diephouse - nice job on Galaxy and 2.0 guys :)
There are a few similarities between our own grid infrastructure (specifically what we refer to as the Grid Service Manager, or GSM for short) and the planned capabilities for Galaxy in the next Mule releases (hot deployment and netboot, to name two). We intend to cooperate with MuleSource to make sure our offerings are as aligned as possible in that regard, although it's still too early to tell how this will evolve.
Another interesting session was the enterprise infrastructure panel, in which various aspects of open source as a model for enterprise infrastructure were discussed.
The commercial open source model still has to be proven as a sustainable business model (and not as just a successful exit strategy) but there are certainly some convincing arguments raised in the session in favor of this approach.
For one, it is clear that at the end of the day, an organization seeking support and indemnification will invest money in the software (be it open or closed source) - I'm sure the MuleSource guys would agree with me on that. But according to the panel, what makes open source more appealing in that regard is that the need is pushed from the bottom, by developers and architects who just download the code and use it. So when approaching procurement to get budget for this, the fact that the open source libraries are already there and are used in the code makes a much better incentive for the Vogons at procurement to approve the budget.

That's it for now. If you want to get more details about our integration with Mule just drop me a line here and I'll get back to you with more details.

Sunday, February 10, 2008

New OpenSpaces Demos and Examples Project

One of the goals of OpenSpaces.org is to promote best practices and recommended usage patterns among the GigaSpaces developer community.
To that end, we have decided to dedicate some of our own resources to create a number of sample applications and blueprints that will help realize this goal.
We have created a dedicated project under OpenSpaces.org named OpenSpaces Demos and Examples. This project will host these applications and provide a one stop shop for developers wishing to get some ideas on how to use GigaSpaces and OpenSpaces in various scenarios.
We also encourage developers to donate their own ideas and sample applications to this project by joining it and becoming active committers.
The first two applications we posted (actually one of them is already there and the other will be in the next few days) demonstrate integration of GigaSpaces with Spring MVC based web applications.
The first one is a simple HelloWorld web application and the second is a more complex application that shows how to integrate a GigaSpaces based stock feed application with an AJAX based web client. The web client is based on the excellent open source ExtJS JavaScript library.
You're encouraged to download it and give it a shot.

Thursday, February 7, 2008

OpenSpaces SVF - Remoting on Steroids

When OpenSpaces was first released, one of its core features was Space Based Remoting.
Based on the Space's discovery, transport, load balancing and failover capabilities, this remoting mechanism provided a drop-in replacement for other remoting implementations, allowing exposed services to be highly available and redundant, and remote clients to get fault tolerance and load balancing out of the box without changing a single line of code.
With 6.5, we have decided to change the name of this feature to the Service Virtualization Framework (SVF), as we feel it has become much more than just a remoting implementation. The new name better describes the overall value it can bring to applications by virtualizing any service object across the GigaSpaces grid.
In the next few paragraphs I'll try to explain how this framework works and what are the new improvements we have added to it in GigaSpaces 6.5.

So how in fact does it work?
Underneath, the remoting mechanism relies on the space to provide all of the above.
What happens is that once the client makes a remote call, the local client side proxy (created dynamically at application startup) packages the invocation into an invocation object, and writes it to the space.
At the space side, you can choose whether you want to handle the remote call synchronously or asynchronously:
  • Handling the call synchronously means that the space will use the inbound communication thread (which was used to receive the request from the network) for the processing of the request. This is similar to most other remoting implementations. This mechanism is built on top of space filters, such that a dedicated filter delegates the invocation to the service object and returns the result back to the client. The benefit is that it's usually faster than using a separate thread for processing the request, since there are fewer steps involved.
  • Handling the call asynchronously means that you have a separate thread pool (in the form of an OpenSpaces event container) that consumes the invocation object out of the space, calls the service and then returns the result back to the space. The client proxy then picks the result from the space using a take operation and returns it back to the caller.
The benefit of that is that the model is more scalable and safer than the synchronous one, as the load at any moment depends mostly on the number of processing threads rather than the actual number of requesting clients. It also allows for true asynchronous operations by using JDK Futures and retrieving the result of the invocation asynchronously.
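The asynchronous consumption model described above can be pictured with plain JDK primitives. This is only an analogy, not the GigaSpaces implementation: a bounded thread pool stands in for the polling container's consumers, and the Future stands in for the result the client proxy takes from the space:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Analogy only: the async remoting path sketched with a JDK thread pool
// standing in for the polling container's consumer threads.
public class AsyncInvocationSketch {

    // A fixed pool of daemon workers: the processing load is bounded by
    // the number of threads, not by the number of requesting clients.
    static final ExecutorService workers =
            Executors.newFixedThreadPool(2, r -> {
                Thread t = new Thread(r);
                t.setDaemon(true);
                return t;
            });

    // Stand-in for writing an invocation object to the space: the work is
    // queued, a worker picks it up, and the caller holds a Future.
    static Future<String> submitInvocation(String title) {
        return workers.submit(() -> "Done processing: " + title);
    }

    public static void main(String[] args) throws Exception {
        Future<String> result = submitInvocation("title");
        // The client can do other work here, then block only when needed.
        System.out.println(result.get()); // prints "Done processing: title"
    }
}
```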

Why use the SVF instead of ordinary remoting mechanisms?
For both implementations (sync and async), the fact that everything is done through the space means that the framework provides the following out of the box:
  1. Automatic and transparent failover: using the space's failover capabilities, a remote call (i.e. the invocation object) is transparently routed to another node when the default node for the invocation becomes unavailable.
  2. Load balancing: using the space's load balancing capability, the remote call (i.e. the invocation object) can be routed to any one of the cluster members or even to all of them. Again, this is done in complete transparency to the calling code.
  3. Non intrusiveness: As with any other good remoting implementations, the client code is completely isolated from the underlying remoting mechanism. This is actually a very powerful yet non intrusive manner of implementing SBA. Both the client and service code can be completely independent of any GigaSpaces interface, making them truly portable across any runtime platform.
Code samples
The primary way to enable the SVF in your application is in your Spring beans file.
Let's take the example of the following business interface (assume Data and DataResult classes exist):
package org.openspaces.example.data.common;

public interface IDataProcessor {

    /**
     * Process a given data object and return a DataResult.
     */
    DataResult processData(String title, Data data);
}
It takes an instance of type Data as input and returns an instance of type DataResult.
For simplicity, let's also assume the following implementation for the service:
public class DataProcessor implements IDataProcessor {
    public DataResult processData(String title, Data data) {
        System.out.println("Processed: " + data);
        return new DataResult("Done processing: " + title);
    }
}
In order to configure this on the space side we need to determine whether we want the service to be exposed as a synchronous service, asynchronous service or both.
The configuration is part of a processing unit deployed to the GigaSpaces grid.
Here's the Spring configuration inside the processing unit's pu.xml:
<os-remoting:service-exporter id="remotingServiceExporter">
    <os-remoting:service ref="dataProcessor"/>
</os-remoting:service-exporter>
The above snippet exports the service such that it could be used as a remoting endpoint.
Now we need to configure the actual remoting mechanism.
Here's an example for a sync remoting configuration. Note that we simply take the exported object and use it as a space filter:
<os-core:space id="space" url="/./space">
    <os-core:filter-provider ref="remotingServiceExporter" />
</os-core:space>

<os-core:giga-space id="gigaSpace" space="space" tx-manager="transactionManager"/>
If we want to use async remoting we need to use either a polling or a notify container. Here's an example for a polling container configuration which uses two threads to process invocation requests:
<os-events:polling-container id="remotingContainer" giga-space="gigaSpace" concurrent-consumers="2">
    <os-events:listener ref="remotingServiceExporter" />
</os-events:polling-container>
On the client side, we should configure a proxy to be used by the client code.
In 6.0, the way to do this was to configure everything in the Spring beans XML file, as follows.
For async proxy:
<os-remoting:async-proxy id="dataProcessor" giga-space="gigaSpace"
interface="org.openspaces.example.data.common.IDataProcessor"/>
For sync proxy:
<os-remoting:sync-proxy id="dataProcessor" giga-space="gigaSpace"
interface="org.openspaces.example.data.common.IDataProcessor"/>
Now all we need to do is wire the proxy with the actual client code, as follows:
<bean id="myRemotingClient" class="org.openspaces.example.MyRemotingClient">
<property name="dataProcessor" ref="dataProcessor"/>
</bean>
The client code would simply invoke the method on the interface:
// injected via Spring
private IDataProcessor dataProcessor;
...
public void doSomethingWithRemoteProxy() {
    ...
    String title = ...
    Data data = ...
    DataResult result = dataProcessor.processData(title, data);
    ...
}
The caller is completely unaware of how the space is deployed, how many instances it has, or what the actual clustering topology is.
Behind the scenes, the space proxy will route the invocation object that the proxy generates with every request to the relevant node. For example, if you're dealing with a partitioned space, the request will by default be randomly routed to one of the partitions (this is based on the hashCode of the invocation object and can be overridden - see below). With a replicated topology, the request will go to the node to which the client is currently connected (which is determined by the load balancing policy for the cluster and can be round robin, weighted round robin, etc.).
For both topologies, in case of failure of the node to which the request was sent, the request will be sent automatically to the backup node if such exists.
In 6.5, the client side configuration is even simpler - you can simply annotate a field of the remote interface type with a @SyncProxy or @AsyncProxy annotation, and OpenSpaces will create the proxy for you and inject it into your client object. Here's an example:
@AsyncProxy(gigaSpace="gigaSpace", timeout = 15000)
private IDataProcessor dataProcessor;
In the Spring beans file, the following line has to be included to make OpenSpaces infrastructure process all beans with this annotation:
<os-remoting:annotation-support />
In the above example, OpenSpaces infrastructure will inject the client code with an Async remoting proxy, which uses a GigaSpace instance by the name of "gigaSpace" (as defined in the Spring beans configuration file) and use a call timeout of 15 seconds.

Advanced Features
Routing the call in a partitioned space topology
Many of GigaSpaces users use the partitioned topology, which is required in case you have more data on the grid than any single machine can contain. It is also very useful for cases where you want to distribute the processing load between a number of machines, each handling a different subset of the data.
When an object is written into a partitioned space, the partition is determined by the hash code of the routing field, which is designated by the user. With remoting however, a method can have more than one parameter, or no parameters at all.
By default, the routing is determined by the hash code of the entire remote invocation object.
This behavior can be overridden using an implementation of the RemoteRoutingHandler interface:
package org.openspaces.remoting;

public interface RemoteRoutingHandler<T> {
    T computeRouting(SpaceRemotingInvocation remotingEntry);
}
As you can see, this interface contains one method, computeRouting, which is given the remote invocation entry and returns a value from which the routing value will be computed (the space proxy will call its hashCode() method for that). Here's a sample implementation:
public class DataRemoteRoutingHandler implements RemoteRoutingHandler<Long> {

    public Long computeRouting(SpaceRemotingInvocation remotingEntry) {
        if (remotingEntry.getMethodName().equals("processData")) {
            Data data = (Data) remotingEntry.getArguments()[1];
            return data.getType();
        }
        return null;
    }
}

In the pu.xml file, we need the proxy to reference this object:
<os-remoting:async-proxy id="dataProcessor" giga-space="gigaSpace"
interface="org.openspaces.example.data.common.IDataProcessor">
<os-remoting:routing-handler>
<bean class="org.openspaces.example.data.feeder.support.DataRemoteRoutingHandler"/>
</os-remoting:routing-handler>
</
os-remoting:async-proxy>

With 6.5 you can do it in a much simpler fashion: you can simply add the @Routing annotation to the signature of the service interface, and the routing will be based on it!
package org.openspaces.example.data.common;

public interface IDataProcessor {
DataResult processData(@Routing String title, Data data);
}
Sending a request to more than one node - à la Map/Reduce
In many cases, you would want to use all the nodes in the network to do some processing, and then aggregate the results on the client side, à la Map/Reduce.
In that case there are two things you should do: Define the remote proxy to broadcast the call to all partitions, and define a result aggregation policy to be executed on the client side once results from all nodes have returned.
The broadcast is supported for sync proxies, and is enabled in the following way (using annotations based configuration):
@SyncProxy(broadcast = true, remoteResultReducerType = MyResultReducer.class)
private IDataProcessor dataProcessor;
The above annotation configuration references the MyResultReducer class.
This is an implementation of the RemoteResultReducer interface, which is responsible for getting the results from all the nodes and aggregating them into one object, which will be returned to the calling code. Here's this interface's definition:
package org.openspaces.remoting;

public interface RemoteResultReducer<T> {
    T reduce(SpaceRemotingResult[] results, SpaceRemotingInvocation remotingInvocation) throws Exception;
}
It receives an array of SpaceRemotingResult instances, each containing information on the invocation at a single node, such as the invocation result, whether or not an exception occurred, and where the invocation took place. It returns the final aggregated result, which is passed on to the calling code.
This enables you to grid-enable your service without affecting the calling code.
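To make this concrete, here's a sketch of what a reducer like MyResultReducer might look like, summing per-node Integer results into one total. So that the sketch compiles standalone, the SpaceRemotingResult and SpaceRemotingInvocation interfaces below are cut down to the single method used here; the real org.openspaces.remoting types carry more information (exceptions, routing, etc.):

```java
// Cut-down stand-ins for the org.openspaces.remoting interfaces,
// reduced to the one method this sketch needs.
interface SpaceRemotingResult { Object getResult(); }
interface SpaceRemotingInvocation { String getMethodName(); }

interface RemoteResultReducer<T> {
    T reduce(SpaceRemotingResult[] results, SpaceRemotingInvocation invocation) throws Exception;
}

// Aggregates the per-partition counts returned by each node into one total.
public class MyResultReducer implements RemoteResultReducer<Integer> {
    public Integer reduce(SpaceRemotingResult[] results, SpaceRemotingInvocation invocation) {
        int total = 0;
        for (SpaceRemotingResult r : results) {
            // Assumes each node returned an Integer partial result.
            total += (Integer) r.getResult();
        }
        return total;
    }
}
```

The reducer runs on the client side once all nodes have responded, so the calling code sees a single return value exactly as it would for a single-node call.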

Using futures and one-way calls
Sometimes you don't want the calling code to block and wait for the invocation to take place. This can be true if you know the calculation will take a lot of time, or if the calling thread does not require the invocation result to continue. When using async proxies, you have two options depending on the signature of the invoked method:
  • If it doesn't have a return value, you can simply declare it as "one way", which means that the client side proxy is not going to wait for its completion:
    @AsyncProxy(voidOneWay = true)
    private IDataProcessor dataProcessor;
    In the above snippet, the proxy will not wait for any method that has no return value.
  • If it does have a return value, you can use a JDK Future to get the result at a later time or from another thread. To do that, you should declare an interface that returns a Future object. The OpenSpaces infrastructure will detect that automatically and not block the proxy on the call. So our previous IDataProcessor interface will now look like this:
    public interface IDataProcessor {
        Future<DataResult> processData(String title, Data data);
    }
    Note that you can still deploy the previous interface (without the Future) on the space side, as this is purely a client side issue. So you can have clients waiting synchronously for the invocation result, and clients using a Future and retrieving the result asynchronously.
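Here's a sketch of the client-side flow with the Future-returning interface. To keep the snippet self-contained and runnable, the OpenSpaces proxy is replaced by a local executor-backed stub, and Data by a plain String; with OpenSpaces, the dataProcessor field would instead be injected via the @AsyncProxy annotation:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Minimal stand-ins so the sketch compiles on its own.
class DataResult {
    final String message;
    DataResult(String message) { this.message = message; }
}

interface IDataProcessor {
    Future<DataResult> processData(String title, String data);
}

public class FutureClientSketch {
    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        // Stand-in for the injected async proxy.
        IDataProcessor dataProcessor = (title, data) ->
                pool.submit(() -> new DataResult("Done processing: " + title));

        Future<DataResult> future = dataProcessor.processData("title", "payload");
        // The caller is free to do other work here...
        DataResult result = future.get(); // ...and block only when it needs the result.
        System.out.println(result.message); // prints "Done processing: title"
        pool.shutdown();
    }
}
```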
Remoting Aspects and MetaData
New to 6.5 is the ability to apply cross cutting concerns, similar to AOP aspects or servlet filters. You can apply your own logic on the client (before the call is made) and on the server (after the call has been intercepted and before it's delegated to the service).
This is useful to apply system wide functionality such as logging and performance measurements. Another new feature in 6.5 is the ability to piggyback the remoting invocation and send custom metadata along with it. This could be very useful when implementing security for example.
In fact, combined with the remoting aspects, it's very simple to implement a custom, non-intrusive security mechanism that would be completely transparent to the calling code.
You can read about it more here.

Summary
In this post I showed the benefits and rich set of features that are part of the OpenSpaces Service Virtualization Framework. These are all documented in full in the GigaSpaces wiki.
I encourage you to try out our new 6.5 early access version and test drive the new SVF features.
You can download the EAP version here. An initial version of the documentation for OpenSpaces SVF can be found here.

Sunday, January 13, 2008

OpenSpaces Dynamic Scripting Support

One of the things I intend to do under my new hat as developer community manager is regularly publish posts about new and cool product features. Since our Early Access Program will go live soon, I can also write about planned features so that GigaSpaces users can experience them by downloading a beta version and trying them out.
The main goal here is to make the community aware of the new product features and get feedback from developers even before the version is released.
The first new feature I wanted to introduce is the new OpenSpaces Scripting Support.
You're probably wondering what scripting has to do with distributed, ultra-scalable systems.
After all, when one hears the words Groovy, JavaScript or Ruby the almost immediate association is web applications, web browsers and HTML.
With the emergence of the likes of Groovy and JRuby, scripting is becoming more and more mainstream and used for building many types of applications. In addition, the realization that domain specific languages can be very elegant, powerful and useful in many cases also contributes to this important mind shift.

So how can a script be useful in a distributed space based application?
For many of our customers, one of the most appealing features of GigaSpaces is the ability to perform calculations and business logic on the space nodes.
This can be useful for a number of cases, such as performing aggregations on space data, validating data as it is written to space, enriching this data, etc.
Before version 6.0 was released, the only way to do that was to use Space filters and the custom query pattern. While very powerful, this was quite complex and cumbersome to use.
In 6.0, we introduced Space Based Remoting as part of the OpenSpaces framework.
This is a very powerful abstraction, enabling your application to transparently enjoy all the goodies the space can give, such as high availability, load balancing, sync/async execution and parallel processing via a Map/Reduce style API.
However, as with any remoting implementation, you have to physically deploy the remoting endpoint on the server side (in our case you define it within your processing unit).
For many applications, this wouldn't be a limitation, but some need to control what gets executed in the remote node in a more dynamic fashion - or better yet, let the client application decide what should be done on the server (space) side. This can be extremely useful for applications that need to dynamically define the execution logic, such as an algorithmic trading application that lets the trader define the trading algorithm.
And this is where OpenSpaces Scripting support fits in.
The idea is that instead of deploying the endpoint on the server, every space has a built in generic script executor. The space client can then submit the script to be executed on any of the spaces, or even on all of them simultaneously using Map/Reduce style API. So clients can actually change the execution logic dynamically.
You can think of it as a sophisticated form of a database driver - only instead of submitting SQL statements and being limited to the relational model, you can now submit an actual program to your grid, which can do almost anything Java code can do!
The client can also control whether the script will be cached or not, and whether it will execute synchronously or asynchronously.
When caching is enabled, the scripts are cached in their compiled form on the space side to support faster execution.
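The compile-once behavior can be pictured as a simple cache keyed by script name on the space side. The sketch below models it with plain Java; the class and method names are illustrative, not the product's internals:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative model of compile-once script caching (not the actual product internals)
public class ScriptCache {

    public static final AtomicInteger compileCount = new AtomicInteger();
    private static final Map<String, String> compiled = new ConcurrentHashMap<>();

    // Stand-in for compiling script source into an executable form
    private static String compile(String source) {
        compileCount.incrementAndGet();
        return "compiled[" + source + "]";
    }

    // Cached execution: compile on first use, reuse the compiled form afterwards
    public static String execute(String name, String source, boolean cache) {
        if (!cache) {
            return compile(source);        // no caching: recompile on every call
        }
        return compiled.computeIfAbsent(name, n -> compile(source));
    }
}
```

Repeated executions of a cached script hit the compiled form directly, which is where the faster execution comes from.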

And now for some code samples
Setting up a scripting client is very simple.
On the space side, if you're using our EDG edition and starting a data grid, it's already enabled automatically. If you're deploying your own processing unit, you need to include the following in your pu.xml file:
<os-core:space id="space" url="/./mySpace">
    <os-core:filter-provider ref="serviceExporter"/>
</os-core:space>

<!-- A GigaSpace instance (that can be used within scripts) -->
<os-core:giga-space id="gigaSpace" space="space"/>

<!-- The scripting executor remoting support -->
<bean id="scriptingExecutorImpl"
      class="org.openspaces.remoting.scripting.DefaultScriptingExecutor"/>

<os-remoting:service-exporter id="serviceExporter">
    <os-remoting:service ref="scriptingExecutorImpl"/>
</os-remoting:service-exporter>

<os-events:polling-container id="remotingContainer"
                             giga-space="gigaSpace"
                             concurrent-consumers="${scripting.asyncConcurrentConsumers}">
    <os-events:listener ref="serviceExporter"/>
</os-events:polling-container>
That's it. Your processing unit is now all set for script execution.
As for the client side, things are even simpler. In this post I will only show how to do it from a Spring based client application; however, this can also be done from your code using our configurers (I will cover that in a future post).
One of the nicest things here is the approach we took with regards to configuration via annotations. It follows the spirit of what the guys at SpringSource did with Spring 2.5, enabling you to configure dependency injection via annotations. Note that the client class implements the Spring framework's InitializingBean interface, which provides a convenient callback at application startup:
public class ScriptRunner implements InitializingBean {

    @AsyncScriptingExecutor
    private ScriptingExecutor asyncScriptingExecutor;

    @SyncScriptingExecutor
    private ScriptingExecutor syncScriptingExecutor;

    public void afterPropertiesSet() throws Exception {
        asyncScriptingExecutor.execute(new ResourceLazyLoadingScript()
                .cache(true)
                .name("test1")
                .type("groovy")
                .script("classpath:/TestSpace.groovy")
                .parameter("name", "Uri"));

        syncScriptingExecutor.execute(new StaticScript()
                .cache(true)
                .name("test")
                .type("groovy")
                .script("println name")
                .parameter("name", "Cohen"));
    }
}
As you can see, all the client code has to define is a field of type ScriptingExecutor which is annotated either with @AsyncScriptingExecutor or @SyncScriptingExecutor (for asynchronous or synchronous script execution).
The client Spring beans XML file would then look like this:
<os-core:space id="space" url="jini://*/*/mySpace" lookup-groups="uri"/>

<os-core:giga-space id="gigaSpace" space="space"/>

<os-remoting:annotation-support/>

<bean id="scriptRunner" class="ScriptRunner"/>

All we had to do for the annotations to get picked up is introduce the <os-remoting:annotation-support/> tag, which directs the OpenSpaces infrastructure to process all the beans in the application context and wire up those that contain the appropriate annotations.
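Under the hood, annotation-driven wiring of this kind boils down to scanning bean fields via reflection. The simplified sketch below uses a toy annotation of my own, not the OpenSpaces implementation, just to show the mechanism:

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;

// Toy version of annotation-driven injection (not the OpenSpaces implementation)
public class AnnotationWiring {

    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    public @interface InjectExecutor {}

    public static class Client {
        @InjectExecutor
        public String executor; // a String stands in for the injected executor proxy
    }

    // What <os-remoting:annotation-support/> does conceptually: scan each bean's
    // fields and inject a value wherever the annotation is present
    public static void wire(Object bean, Object value) {
        for (Field f : bean.getClass().getFields()) {
            if (f.isAnnotationPresent(InjectExecutor.class)) {
                try {
                    f.set(bean, value);
                } catch (IllegalAccessException e) {
                    throw new RuntimeException(e);
                }
            }
        }
    }
}
```

In the real framework the injected value is a remoting proxy built over the space, but the scan-and-set mechanism is the same idea.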
And finally, here's the Groovy script file referenced above:
for (i in 0..9999) {
    gigaSpace.write(new Message("blabla"))
}
println 'Done writing 10000 objects to the space'

A few more interesting things to note here:
  • The execution unit is an object which implements the org.openspaces.remoting.scripting.Script interface. Here we show two such implementations of it - one that lazily loads a script file from a location in the classpath, and another one which simply takes in the script itself as an argument.
  • We define the script type, in this case Groovy. If the space does not contain the Groovy libraries in its classpath, this will fail. As mentioned before, we support JavaScript, Groovy and JRuby out of the box. If you favor another scripting language, you can easily add support for it as well.
  • Note the method chaining approach we took with regards to Script configuration (as we did with other configuration elements of GigaSpaces). We believe it's a very elegant way of configuring your application (and probably the closest one can get to a domain specific language in Java :)).
  • The script itself references a gigaSpace variable. Every script running inside the space has a number of contextual variables that are available to it automatically, such as the space itself and the Spring ApplicationContext in which it's defined.
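The method chaining style mentioned in the bullets above is easy to reproduce: each setter returns this, so a script definition reads almost like a sentence. Here's a minimal sketch that mirrors (but does not reproduce) the StaticScript API:

```java
import java.util.HashMap;
import java.util.Map;

// Minimal fluent-builder sketch in the spirit of StaticScript (illustrative only)
public class FluentScript {

    private String name, type, script;
    private boolean cache;
    private final Map<String, Object> parameters = new HashMap<>();

    // Each setter returns 'this' so calls can be chained
    public FluentScript name(String name)     { this.name = name; return this; }
    public FluentScript type(String type)     { this.type = type; return this; }
    public FluentScript script(String script) { this.script = script; return this; }
    public FluentScript cache(boolean cache)  { this.cache = cache; return this; }

    public FluentScript parameter(String key, Object value) {
        parameters.put(key, value);
        return this;
    }

    public String describe() {
        return type + " script '" + name + "' (cache=" + cache + ", params=" + parameters + ")";
    }
}
```

A definition then chains naturally: new FluentScript().cache(true).name("test").type("groovy").script("println name").parameter("name", "Cohen").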
Summary
OpenSpaces Dynamic Scripting Support is a very powerful tool that adds a great deal of flexibility to grid applications. Let's recap the benefits of this new feature in short:
  1. It provides you a means to invoke dynamic scripts on any or all grid members, much the same way you would invoke a SQL statement or even a program on a database.
  2. It gives you the combined power of Java alongside new and powerful scripting languages such as Groovy and JRuby.
  3. It enables you to perform distributed aggregations using a Map/Reduce style API.
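To illustrate the third point, the sketch below mimics a Map/Reduce style aggregation over several partitions using plain Java parallel streams; each inner list stands in for the data held by one space partition:

```java
import java.util.List;

// Illustrative Map/Reduce aggregation: each inner list stands in for one space partition
public class MapReduceSketch {

    // "Map" phase: each partition computes a local sum in parallel;
    // "Reduce" phase: the client combines the partial results
    public static int sumAcrossPartitions(List<List<Integer>> partitions) {
        return partitions.parallelStream()                                    // fan out to all partitions
                .mapToInt(p -> p.stream().mapToInt(Integer::intValue).sum())  // local aggregation per partition
                .sum();                                                       // reduce on the client side
    }
}
```

With the scripting support, the per-partition "map" step runs as a script inside each space instead of a local stream, and the client only sees the partial results to reduce.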
I hope I managed to give you a short glimpse of what this feature can do and how powerful it is.
In one of my next posts I'll go into this a bit deeper and discuss some more interesting capabilities of this feature, such as distributed aggregations and interceptors, as well as other aspects that come up in this context, such as performance, class loading and security.


Saturday, January 12, 2008

OpenSpaces.org goes live!

This is my first blog under my new hat at GigaSpaces Technologies - Developer Community Technical Manager. I must say it's a bit of a mind shift from what I was used to, being much more focused on technology and community than on sales and customer related issues.
In the past couple of weeks we've been extremely busy setting up our new OpenSpaces.org developer community web site, which is due to launch this week. We've also been busy promoting our OpenSpaces Developer Contest, which gives you the chance to write a cool OpenSpaces plugin or application and win $10K for it - so if you think you're up for the challenge, you're welcome to give it a shot.
The main idea here is to create a place for all GigaSpaces users to share ideas and code in the form of recommended blue prints, useful plugins and even full blown applications.

This is my first time setting up such a community web site, and I must say it's been quite interesting so far.
The technical challenges in making such a collaboration platform work seamlessly are not to be taken lightly. There are many moving parts we had to glue together (Wiki, Jira, Forums, SVN, Fisheye and more) and as we at GigaSpaces like to say, having many moving parts in any system is not very easy to deal with ;)
Imagine setting these systems up for every single project, making sure they all work together in terms of permissions, user management, etc., and making all of that happen in less than a month... (P.S. This was a great team effort - thanks to all those involved).
Apart from a number of minor issues, we have found that all the products we use integrated quite well - I assume that the fact that most of them come from the same vendor (Atlassian) has something to do with it (P.S. thank you Atlassian and Jive guys for making our lives easier :) )

This is also the first chance I got to do some real hands-on work with the Spring 2.5 MVC framework (for our signup application). I really liked the annotation based controller configuration; it makes everything a lot simpler.
We took a very similar approach with OpenSpaces in the upcoming GigaSpaces 6.5 version. Basically, we will enable users to configure many aspects of their application (such as remoting, filters, etc.) without having to write XML (well, maybe just one line). I'll write more about this in one of my next posts.

The OpenSpaces.org website already contains some interesting projects and there are many more to follow. It's really an indication of what you can do with this technology and how easy it is to use.

So what next after the launch?

Apart from the user projects, we also intend to use this platform to promote best practices by providing a number of blueprint applications relevant to many of our customers and prospects.
We also intend to publish code samples for new and cool product features, thus enabling our users to interact with us and say what they think about the new features and what they would like to see in the product.
So stay tuned, there's a lot more exciting stuff to come !