Uri Cohen's Blog

Sunday, February 22, 2009

Scalable SOA with Mule & GigaSpaces

If you missed the joint Webinar Ken Yagen of MuleSource and I did last week, you can find the recorded version here. The demo application shown during the webinar can be downloaded here.
The demo application requires the following to be installed on your machine:

Please note that the Mule/GigaSpaces integration is also available in the latest GigaSpaces GA release (6.6.x), but XAP 6.6.x only supports the Mule 2.0 branch and not 2.1.
The documentation for the integration can be found here (6.6.3) and here (7.0 EA).
If you have any question please drop me a note here or use our online forums .

I would like to take the opportunity and thank Ken and the rest of the team at MuleSource for hosting this webinar and giving us the chance to present the joint solution to the community.

Uri

Tuesday, February 10, 2009

Scalable SOA - with Mule & GigaSpaces

You are welcome to join Ken Yagen, Sr. Director of Engineering at MuleSource, and myself for our joint webinar on scaling your SOA implementation with GigaSpaces and Mule. The webinar will take place on Wednesday, February 11th 2009, at 9am PT / noon ET / 6pm CET.
In this webinar we will introduce the Mule & GigaSpaces joint solution and discuss the underlying details around how this integration works. We will also discuss a real life use case and present a short demo that shows the benefits of this integration.

See you there!

Uri

Tuesday, December 30, 2008

XAP 7.0 Is on the Move!

After the release of GigaSpaces XAP 6.6, we are now deep into the planning and development of our next major release, 7.0. This release, which is due in mid 2009, will include a few major themes, such as significantly better administration and monitoring capabilities, network resources optimizations, highly optimized and flexible local cache, reduced memory footprint, improved deployment model and support for deployment on the cloud.

If you've been following us in the past few years, you know that our R&D is practicing SCRUM for quite a while now as part of the development process. Our head of R&D, Guy Nirpaz, has talked about it numerous times to both our users and the development community at large. The nature of SCRUM and the fact that it's composed of sprints enable us to share recently developed features with our community, as we've done in the past for versions 6.5 and 6.6.

Our sprints last two weeks, at the end of which we publish the results of the sprint as an early access milestone release. Our 7.0 early access program already includes the first two milestone releases, which you can download and play with. We will be happy to hear any feedback you have. The first two milestones include the following highlights:

Significantly improved cache eviction policies (most notably a new LRU implementation) which improves LRU performance by a factor of 10 to 20 times (depending on the operation) for read operations without impacting write operation performance
Support for time based window scenarios by maintaining the entry lease as part of the POJO instance. You can now annotate a field with the @SpaceLeaseExpiration annotation and the space will use it to store and retrieve the lease expiration time for the instance. This value can later be propagated to an external data source. Based on it, expired instances can be filtered out of the space when loaded from the database.
The UI now enables you to the processing unit elements and the space cluster that belong to it in the same tab, so you have a coherent and intuitive view on your application components in one location. It also contains a summary view of all the space cluster details which enables you to quickly understand how your space cluster is functioning and what state it's in, as shown below:
Ultra Fast Native local cache for XAP .Net - we have implemented our very own ConcurrentHashMap like data structure in .Net, which enables you to enjoy a performance of millions of ID based read operations per second
SQL query optimizations: the space now processes SQL queries with OR staements in parallel, taking advantage of multi-core environments for reduced overall query times
Improved CLI deployment support: your deployment command will now only return after all application instances have been provisioned, enabling you to create complex scripts to deploy multiple dependant processing units

Naturally, there's a lot more than meets the eyes, and we're continuing to work on new and exciting stuff. You can stay tuned using our 7.0 early access page.

Happy new year,
Uri

Saturday, November 29, 2008

Some Things that Kept Us Busy Lately

It's been a quite while since I last posted, but things have been more active than ever here at GigaSpaces. I'm writing this on the way back from Tokyo and HK, where I've had a few days packed with meetings with partners, prospects and customers (and of course good food and drinks…). It sometimes amazes me what people actually do and plan to do with our product, but I must admit that this time I was really blown away by some of the things I saw. Besides some very interesting meetings with number of financial institutions (many of which actually want to take advantage of the current crisis seeking to upgrade and improve their systems so that they'll be ready when things pick up), I also met companies from other sectors that are taking the product to uncharted territories, but in a good sense. Just to give you a taste of things (hopefully I will be able to discuss them in detail in a separate post), one of our customers implemented a system to share complex 3D models between designers and perform various types of simulations on it, doing real time sharing through GigaSpaces. Since they chose .Net as the implementation platform for this application, they used GigaSpaces XAP .Net to perform the model sharing, and when you see in your eyes how a complex and details 3D model is loaded from one client and gradually appears on a bunch of other machines, this is truly impressive.
It also made me proud to see that many of our new features are adopted and used widely. Which brings me to the main topic – what kept us busy lately.
First of all, we have released our 6.6 branch in September, which might well be the first version to truly offer an end-to-end application platform. Some of the things. Here's a glimpse of what it offers:

Standard JEE web application support (via the Jetty web container)
An optimized, native .Net distribution (named XAP .Net), with full SBA capabilites
A brand new task processing API (including peer classloading capabilities)
Support for dynamic language invocation across the grid
Further enhancements of our remoting capabilities
Maven integration
Improved annotation support for configuring messaging and remoting components
Asyncronous operations
Out of the box integration with the Mule ESB
Seamless interoperability between Java, .Net and C++
UI improvements
Optimizations, optimizations, optimizations…
Revamped documentation web site

I encourage you all to give it a try and provide us your feedback. You can find the complete list of changes and enhancements here.
Besides the official 6.6 release (we've just released 6.6.2 - which I'll discuss in a separate post), we have also invested significant effort in integrating GigaSpaces with cloud and virtualization vendors. The integration package with Amazon EC2 enables you in a single click to provision EC2 instances, install GigaSpaces XAP on them, start GigaSpaces containers and deploy your application to them. Furthermore, it can also provision a MySql database that sits on top of Amazon EBS, and an apache load balancer incase you deployed a web application. Monitoring is done through our stadard user interface, which runs on the cloud and is displayed automatically as a local window on your pc. And the nice part of it is that it's all done via a web application, so no special installation is required to use it. We are currently at beta stages with this offering, and will soon make it publicly available. In fact, we already have quite a few customers that are using XAP on EC2 in production.

On another front, we have completely revamped our training offering (this is a good chance to thank Tricode, our Dutch partners, for taking this on and providing high quality results).The syllabus and details will soon be made available on our web site. The general idea was to create a modular training for various target audiences besides the usual core training (which was also completely rewritten). We offer both on site and public training. We will publish the next available schedule soon.

On the performance benchmarks front, Shay Hassidim, out deputy CTO, has been conducting numerous benchmarks in the last few months, which cover many aspects such as scalability, latency, web application performance and more. These benchmarks are posted from regularly in our company blog, and we have categorized them all for your convenience under one category. We will of course keep doing these benchmarks to provide more insights to our customers and prospects about how the product behaves in various scenarios.

That's it for this time - plane is about to land and they'll take my laptop if I don't close it :)

Sunday, July 20, 2008

The Space as a Messaging Backbone - Why You Should Care

One of the main themes of GigaSpaces 6.5 XAP release (which is also true for our direction as a company in general), is the fact that we consider ourselves as a very complete application platform, and not just an IMDG solution. This has been evident in prior versions of the product, by with 6.5 we believe we got even closer to that goal, and that in many cases we can replace altogether a JEE application server.
We are taking this quite seriously, and actually invested a lot in making sure that this vision is also realized. A major part of this effort is a project we took upon ourselves to test and document the migration process from a typical JEE application to full Space Based Architecture. To keep the comparison as unbiased as possible, we delegated the actual coding and testing work to a team of professionals at Grid Dynamics (which BTW did a great job).
They first implemented a typical OLTP application using JEE (JMS, EJB, Hibernate, Spring) deployed it on a leading app server and tested the application in terms of latency, throughput, CPU utilization, etc. Next, they migrated the application in a few steps to full SBA.
Each step was documented and tested to assess its affect on the system and validate the benefits it gives. The steps were as follows:

Add 2nd level cache to Hiberante, which didn't require any code change.
Change the data acccess layer to use the space instead of the database, with data base write behind (or Persistency as a Service) in the background
Change the messaging layer to use the Space instead of the app server's JMS implementation
Implement full SBA by collocating business logic, data and messaging in the same JVM by utilizing GigaSpaces processing units

Interestingly, steps 1 and 2 provided nice improvement in latency, but almost none in the throughput front, as shown here. After analyzing these results, we realized that the main bottleneck for throughput increase was the application server's messaging infrastructure, which uses the disk to maintain resiliency for JMS "persistent messages" and is working in an "active-standby" topology (which means only one node is handling messages at any given moment).
When replacing it with the Space, we saw a major increase in throughput, as seen in steps 3 and 4 in the graph above. Another important fact which is not evident from the graph, is that GigaSpaces is much more scalable since the Space can be partitioned across multiple JVMs, and message load is shared between partitions (unlike most MOMs, which end up writing to the disk and typically use the active-standby topology).

Who Else Should Care
Besides ordinary JEE applications, this information is also very relevant to ESB-based applications. When implementing such applications, most people overlook the above, and use messaging servers or even worse, relational databases to handle the transport for their ESB based applications. For example, 65.3 % of Mule users use JDBC for transport, 48.4% use JMS, 30% use MQ Series for tarnsport (accroding to the 2007 Mule user survey).
So you see the benefit in using GigaSpaces as a transport layer instead of the traditional solutions.
With GigaSpaces 6.5, we have built in the integration with Mule into the product, so you can enjoy the GigaSpaces scalable transport (and other integration points) out of the box.
In addition, there are a number of other ESB integrations available.
You can find a very complete integration package with Apache ServiceMix in our community web site, OpenSpaces.org. Also, here's a nice integration project of JavaSpaces and Apache Camel which also enables you to use GigaSpaces as a transport layer with Camel.

Going back to the migration project I mentioned above, we plan to publish the application itself and the results we came up with in the process so people can check it our for themselves and understand how we did things. So stay tuned, there's a lot more interesting stuff to follow.

My TechTalk at TSS.com

You're all welcome to watch my talk at TSSJS Prague about the challenges in scaling web 2.0 applications. It's now online at TSS.com

Thursday, May 15, 2008

About Predictability and Traceability

In this post I will discuss two somewhat overlooked benefits of SBA.
But first I want to explain what triggered me to write it.

A few weeks ago I read a post based on a book by Daniel Schacter, a Psychology Professor at the Harvard University. According to Schacter, the human brain tends to generalize and categorize things so that we can easily remember them and relate to them, without having to think about all the details involved. This is thought to be an evolutionary advantage, since it enables us to categorize threats and identify them very quickly (albeit not always reliably…).

For example, most people would say "Italian cars are so fun to drive" (unless they had a Fiat Multipla), or "British cuisine sucks" (no offense my fellow Englishmen, there are very few things I like better than a hot, freshly fried fish n chips :) ).

Software products in general and GigaSpaces in particular are no exception to this human behavior.
Facing prospects and being involved in numerous sales opportunities, I often see people categorizing us as "a high-scalability, low latency solution" (which I'm ok with, don't get me wrong) or "caching solution" (which I'm less ok with, as it's only part of the story). But naturally, as generalizations tend to do, this doesn’t tell the entire story.

In one of my recent sessions for a certain prospect, I had an interesting discussion with one of the attendees, a savvy enterprise architect. After presenting our way of thinking, he said something in the following spirit:

"Well, our JEE tier based application works just fine now. The latency is reasonable for all important use cases, my developers add and remove changes at a reasonable time, and I'm ok with the capacities I need to handle. In fact, I know I can handle about 50% more throughput than I do now.
So, it's not that I don't like your solution, but I don't really need it for now. If I wrote an order management system for a bank I would definitely give it a shot, but for my current needs it's kind of an overkill"

At first I thought that he had a good point. After all, we are defining ourselves as an XTP (eXtreme Transaction Processing) application server, so if your application is not extreme (like this architect here) you don't really need GigaSpaces, right?
But after thinking for a few more seconds, we started a discussion in the following spirit (I'm U below, for Uri, and he is P, for prospect):

U: So do you think you will not need to grow with capacity anymore?
P: I didn't say that, I said I'm good for another 50% increase or so
U: And then what? Do you think you will get to a point when this will not cut it?
P: Hopefully we will, if our business is successful enough. I'll do some testing, find where the bottleneck is, which will probably be the database or the messaging as always, and buy more hardware or better storage. After all this system is not running on a very expensive hardware setup, so it shouldn't be too hard to get more budget for it. I might also re-architect parts of my app to make it perform better
(at this point my brain started to make funny noises trying to compile a proper response…)
U: But can you really tell how much more hardware you will need, or how much that will cost you? Or where will patching the current architecture get you?
P: hmmm… kind of.
U: What do you mean?
(pause…)
P: I'm not sure how this will affect the performance of the database and the messaging server. It will probably improve, just not sure how much
U: Are you sure? We have a customer using <database X with clustered configuration>. Going into this configuration actually slowed things down, because now the database servers need to coordinate everything with one another…
Furthermore, how will you make capacity planning? How will you plan the budget for this?
P: I have to check it first hand before I can really tell. I'll probably start with a couple of machines or change the relevant parts in the app, try it out to see if it's good enough, and if not add more machines to the mix
U: So you will go through a complete development and performance testing cycle without knowing if and at what cost it's going to solve your problem?
P: well, now that you put it like that…
U: And another point, what if you will go through all of this, and get great throughput numbers, but not so great latency numbers?
P: Then I need to trace and profile my app and find where I have a bottleneck
U: How will you do that?
P: Well there are a lot of tools out there for doing just that, very good ones I might add
U: Do they show you the entire latency path? I mean can they tell you where is the bottleneck between your mix of DB, messaging and application servers?
(pause…)
P: Some sort of can…
U: So let me get this correctly: To know your capacity, you need to build the entire production environment in advance, buy potentially very expensive tools to check in case something goes wrong, and even then you're not entirely sure if it'll do the trick for you?
P (a bit aggressive): So what do you offer to solve this problem?
(Finally, this conversation is heading somewhere…)
U: Well, with SBA, the whole point is that everything happens in the same JVM. So it's much more predictable, because there are no other moving parts involved. When the application is partitioned correctly, and one JVM gives you 2000 operations/sec for example, two would give you more or less 4000, and so on. This is what linear scalability is all about, and the SBA model enables it.
And since it's one JVM, you can just attach a simple profiler or even print you’re your own debug messages to a log file and analyze them later. It's as simple as profiling and debugging a stand alone Java app.

I won't wear you down with the rest of this conversation, but hopefully you get the point. I think the above summarizes very well why GigaSpaces is not just about scalability or latency. It also gives a pretty easy way to answer the following questions, which is not so trivial to do with the classis tier based approach:

Predictability - If I add X more machines, how much more throughput will I gain? Is there a latency penalty for that?
Traceability - Where the heck is my bottleneck?

Finally, if you'd like to read some real life story about how hard and expensive it was for people to use the "build, deploy, test, see what we got" methodology, and why scaling is something you need to think about in advance, here's a read well worth your time.