Uri Cohen's Blog: GigaSpaces


Thursday, May 10, 2012

XAP 9.0 – Geared for Real-Time BigData Stream Processing

XAP 9.0 is out the door, and I thought it would be a good opportunity to share some of the things we’ve been up to lately.

Traditionally, one of XAP’s primary use cases was large scale event processing, more recently referred to as big data stream processing or real time big data analytics. Some of our users are reliably and transactionally processing up to 200K events per second, in clusters as large as a few hundred nodes.

In a sense it’s taking Map/Reduce concepts and applying them to an online stream of events, analyzing events as they arrive rather than waiting for all the data to be available offline and only then triggering the Map/Reduce jobs.

There are many use cases that are applicable here, such as web analytics, financial trading, online index calculation, fraud detection, homeland security, guidance systems and essentially any use case that requires immediate feedback on a massive stream of events, typically tens or even hundreds of thousands of events per second.

In the last few months we’ve heard of numerous frameworks that claim to be a real time analytics silver bullet (some rightfully so, some less...), so I wanted to recap what we’ve learned here at GigaSpaces in the past few years from dealing with large scale event processing systems, and what we’ve done in XAP 9.0 to support these scenarios even better.

What It Takes to Implement Massive Scale Event Processing

There are a number of key attributes that big data event processing systems need to support. All of them are supported by XAP and are at its core:

Partitioning and distribution: Perhaps the single most important trait when it comes to scaling your system. If you can efficiently and evenly distribute the load of events across many event processing nodes (and even increase the number of nodes as your system grows), you will ensure consistent response times and avoid accumulating a backlog of unprocessed events.
XAP allows for seamless content based routing, based on properties of your events. It also supports adding more nodes on the fly if needed.
In memory data: in many cases, you need to access and even update some reference data to process incoming events. For example, if you’re calculating a financial index you may need to know if a certain event represents a change in price of a security which is part of the index, and if so you may want to update some index related information. Hitting the network and the disk for every such event is not practical at the scale we’re talking about, so storing everything in memory (the events themselves, the reference data and even calculation results) makes much more sense. An in memory data grid allows you to achieve that by implementing memory based indexing and querying. It helps you to seamlessly distribute data across any number of nodes, and takes care of high availability for you. The more powerful your in-memory indexing and querying capabilities are, the faster you can perform sophisticated calculations on the fly, without ever hitting a backend store or accessing the network. XAP’s data grid provides high availability, sophisticated indexing and querying, and a multitude of APIs and data models to choose from.
Data and processing co-location: Once you’ve stored data in memory across multiple nodes, it’s important to achieve locality of data and processing. If you process an incoming event on one node, and need to read and update data on other nodes, your processing latency and scalability will be very limited. Achieving locality requires a solid routing mechanism, that will allow you to send your events to the node most relevant to them, i.e. the one that contains the data that is needed for their processing. With XAP’s event containers, you can easily deploy and trigger event processors that are collocated with the data and make sure that you never have to cross the boundaries of a single partition when processing events.
Fault tolerance: Event processing is in many cases a multi-step process. For example, even the simplest use case of counting words on twitter entails at least 3 steps (tokenizing, filtering and aggregating). When one of your nodes fails, and it will at some point, you want your application to continue from the point in which it stopped, and not go through all the processing flow for each “in-flight” event (or even worse, completely lose track of some events). Replaying the entire event flow again can cause numerous problems: if your event processing code is not idempotent your processors will fail. And under high loads this can create a backlog of events to process, which will make your system less real time and less resilient to throughput spikes. XAP’s data grid in general, and event containers in particular, are fully transactional (you can even use Spring’s declarative transaction API for that if you’d like). Each partition is deployed with one synchronous backup by default, and transactions are only reported committed once all the updates reach the backup. When a primary fails, the backup takes over in a matter of seconds and continues the processing from the last committed point. In addition, XAP has a "self healing" mechanism, which can automatically redeploy a failed instance on existing or even new servers.
Integration with batch oriented big data backends: Real time event processing is just part of the picture. There are some insights and patterns you can only discover through intensive batch processing and long running Map/Reduce jobs. It’s important to be able to easily push data to a big data backend such as Hadoop HDFS or a NoSql database, which unlike relational databases, can deal with massive amounts of write operations. It’s also important to be able to extract data from it when needed. For example, in an ATM fraud detection system you want to push all transaction records to the backend, but also extract calculated “patterns” for each user, so you can compare his or her transactions to the generated pattern and detect fraud in real time. You can use numerous adapters to save data from XAP to NoSql data stores. XAP’s open persistency interface allows for easy integration with most of these systems.
Manageability & Cloud Readiness: Big data apps can become quite complex. You have the real time tier, the Map/Reduce / NoSql tier, a web based front end and maybe other components as well. Managing this consistently, and more so on the cloud which makes for a great foundation for big data apps, can easily become a nightmare. You need a way to manage all those nodes, scale when needed and recover from failures when they happen. Starting from XAP 9.0, XAP users can leverage all the benefits of Cloudify, to deploy and manage their big data apps in their own data center or on the cloud, with benefits like automatic machine provisioning for any cloud, consistent cluster and application-aware monitoring and automatic scaling for the entire application stack, and not just the XAP components.
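
To make the partitioning and co-location points above a bit more concrete, here is a minimal sketch of content based routing in plain Java. It is not the XAP API: the PartitionRouter class and its partitionFor method are names I made up for illustration, and hashing the routing value modulo the partition count is simply one common scheme, assumed here. The idea is that an event and the reference data it needs share a routing value, so they always map to the same partition.

```java
/** Hypothetical sketch of content based routing; illustrative only, not the XAP API. */
final class PartitionRouter {
    private final int partitionCount;

    PartitionRouter(int partitionCount) {
        if (partitionCount <= 0) {
            throw new IllegalArgumentException("partitionCount must be positive");
        }
        this.partitionCount = partitionCount;
    }

    /** Maps a routing value (e.g. a security symbol) to a partition index. */
    int partitionFor(Object routingValue) {
        // floorMod keeps the result in [0, partitionCount) even for negative hash codes.
        return Math.floorMod(routingValue.hashCode(), partitionCount);
    }

    public static void main(String[] args) {
        PartitionRouter router = new PartitionRouter(4);
        // A price update for "GOOG" and the index data keyed by "GOOG"
        // resolve to the same partition, so processing can stay local.
        int eventPartition = router.partitionFor("GOOG");
        int dataPartition = router.partitionFor("GOOG");
        System.out.println(eventPartition == dataPartition);
    }
}
```

Note that with a scheme this simple, changing the number of partitions moves most keys around, which is one reason a real data grid has to handle rebalancing for you.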

XAP 9.0 for Big Data Event Processing

Now that I covered what XAP already had to offer for big data analytics, I’d like to delve a bit into the new capabilities in XAP 9.0, our newest release, which complement nicely the already existing ones as far as big data stream processing is concerned:

FIFO groups (AKA Virtual Queues): this feature is quite unique to XAP. It allows you to group your events based on the value of a certain property, and while across groups you can process any number of events in parallel, within the same group you can only access one event at a time, ordered by time of arrival. Think of a homeland security system with multiple sensors – in many cases you want to process readings from the same sensor in order, so you can tell for example if a certain suspicious car is moving from one place to another, and not vice versa, but across sensors you want to process as many events as possible.
Storage types: Most real time analytics systems rely heavily on CPU and memory. So using them efficiently is always important. With XAP 9.0 we’ve introduced a new mechanism that allows users to annotate object properties (objects can represent both events and data) with one of 3 modes – native (meaning the property is saved on the heap as a native object), binary (the property is serialized and is only deserialized when an actual client reads it) and compressed (same as binary, with gzip compression). This allows for fine-grained control over how the memory is utilized and saves your application from doing unnecessary serialization and deserialization when accessing the data grid.
Transaction-aware durable notifications: Pub/Sub notifications are important in scenarios where you want a certain event to trigger multiple flows, or be processed in parallel on multiple servers. They are also useful when propagating processing results downstream to other applications. With XAP 9.0 we’ve enhanced our pub sub capabilities to be durable (i.e. even if a client disconnects and reconnects it will not miss an event) and provide once and only once semantics. In addition, notifications for data grid updates (e.g. event objects written or removed, other data updated) maintain transaction boundaries. That means that if multiple events were written, or multiple pieces of data updated, the subscribed clients will be notified on all of them or none at all.
Reliable, transaction-aware local view: another interesting use case when it comes to event processing is when you want your event processor to be located outside of the data grid. This gives you the benefit of scaling your event processors separately from the data grid, at the expense of accessing the data over the network. However, using the local view feature allows you to locally cache, within the processor’s process, a predefined subset of the data that you consider relevant to your processing logic. The local view mechanism will make sure it stays consistent and up to date, and that you never miss a data update even after disconnecting and reconnecting.
Web Based Data Grid Console: Understanding what’s going on with your events, what types of events are queued, and what data resides in the data grid is essential to the operation of every event processing system. XAP’s new data grid console allows you to monitor everything within the data grid from your browser using an intuitive HTML5 interface. You can view event and data counters, submit SQL queries to the data grid, and do a lot more.
Cloud Enablement: XAP 9.0 comes with Cloudify, our new open source PaaS stack built in, which allows you to manage all of the components of your big data app, including the backend Hadoop or NoSql database and the web front end.
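
To illustrate the FIFO groups semantics described in the list above, here is a small in-memory model in plain Java. It is only a sketch under my own assumptions: FifoGroups, offer, take and release are hypothetical names, not XAP API calls. It captures the two rules: events of the same group are handed out one at a time in arrival order, while different groups can be worked on at the same time.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical in-memory model of FIFO groups; illustrative only, not the XAP API. */
final class FifoGroups<E> {
    // One arrival-ordered queue per group value (e.g. per sensor id).
    private final Map<String, Deque<E>> queues = new HashMap<>();
    // Groups that currently have an event being processed.
    private final Set<String> busy = new HashSet<>();

    /** Appends an event to the end of its group's queue. */
    synchronized void offer(String group, E event) {
        queues.computeIfAbsent(group, g -> new ArrayDeque<>()).addLast(event);
    }

    /** Takes the oldest event of the group, or returns null if the group is busy or empty. */
    synchronized E take(String group) {
        Deque<E> queue = queues.get(group);
        if (queue == null || queue.isEmpty() || busy.contains(group)) {
            return null;
        }
        busy.add(group);
        return queue.pollFirst();
    }

    /** Marks the group's in-flight event as done, unlocking the next one. */
    synchronized void release(String group) {
        busy.remove(group);
    }
}
```

With two readings queued for sensor-1 and one for sensor-2, a worker that takes the first sensor-1 reading blocks the second one until it calls release, but another worker can take the sensor-2 reading right away.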

See It in Action!

You can see all of that in action with our new twitter word count example, whose code is available on github.

What’s Next?

There are a few other cool features in XAP 9.0 that you can learn about here.

We’re planning a lot more interesting features around big data analytics, so keep your ears open :)


Thursday, April 7, 2011

XAP 8.0.1 is Out!

We’ve just released XAP 8.0.1, with a lot of goodies included. 8.0.1 is the first feature and service pack on top of XAP 8.0.0. It includes many enhancements and a few exciting new features. Here’s a short recap:

Improved Web UI Dashboard with Alerts View: The dashboard view now gives you a single click view of the entire cluster, including alerts on various problematic conditions. The previous view is now available under the topology tab. This is the first stage in the new Web based UI planned for XAP. You can find more details about it here.

Elastic Deployment for Stateless and Web Processing Units: The elastic deployment model introduced in 8.0 for stateful and data grid only processing units has now been extended to support stateless and web processing units. You can scale web applications and stateless processing units up and down based on CPU, memory or available resources.
Document (Schema-Free) API support for .Net: The .Net edition now includes the all new document API, which was introduced in the Java version in 8.0.0. It enables you to maintain a completely flexible domain model without any restrictions on the entry's schema, and add/remove properties as your application evolves. It also simplifies interoperability with Java, since when using the Document API your data is no longer tied to concrete .Net and Java classes.
Improved complex object querying and indexing for .Net: The .Net edition now enables you to query and index complex object structures, including nested collections and arrays.
Deep POJO/PONO - Document Interoperability: Documents and POJOs can now be mixed interchangeably across all nesting levels. You can read a document as a POJO/PONO (assuming its type name corresponds to the POJO/PONO class name) and vice versa. The space will convert between the formats across all nesting levels, so if you have a complex Java object for example which contains a reference to a nested Java Object or a collection of nested objects, the space will convert the entire object graph to documents and sub documents. In addition, you can also define a "bag" of dynamic properties for a certain POJO/PONO so that new properties that are added via the document API to the entry are exposed in the POJO/PONO instance via this bag.
Map/Reduce and Native Query Support for JPA: The XAP JPA Implementation now supports the JPA NativeQuery facility. On top of running queries in the Space's native syntax, it also enables you to actually execute Space tasks across one or all cluster nodes and bring the power of the grid to the JPA API. Tasks can be defined using the GigaSpaces task execution interfaces or even as a dynamic language script for scripting languages that are supported as part of the JVM.
Method Level routing and result reducers for Space Based Remoting: Space Based Remoting has traditionally been a very popular facility to reliably expose scalable business services to your application clients. In 8.0.1, you can specify method level behaviors for the foundational remoting constructs such as RemoteRoutingHandlers and RemoteResultReducers via the dedicated @ExecutorRemotingMethod and @EventDrivenRemotingMethod annotations.
WAN Replication Improvements: 8.0.1 adds a number of important improvements and bug fixes to the replication over WAN module, such as better peer classloading behavior (when the classes written to the space are not part of the space's classpath), better cleanup of replicated entries, and support for replication of .Net entries.
Improved Performance of .Net Executor API: The .Net task execution API has undergone some optimization in the way that tasks are passed to the space and executed in it, which resulted in performance boosts of up to 250%.
More JPA goodies: In addition to NativeQuery support, we have also implemented a number of other changes, including better JPQL syntax support (LIKE, IS NULL), optimistic locking support and improved relationship handling.
Improved XA Transaction Support: XA transactions can now work against a partitioned space cluster as a single XA resource (via the distributed Jini transaction manager) rather than working with each partition separately.
Mule 3.1 Support: The built-in Mule ESB support has been upgraded to support Mule version 3.1.
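
For readers new to the Map/Reduce style of task execution and the result reducers mentioned in the list above, here is the general shape of it in plain Java. This is a sketch under my own assumptions and does not use the GigaSpaces task execution interfaces: PartitionedCount is a made-up class, each inner list stands in for the data held by one cluster node, and a thread pool stands in for the grid.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical map/reduce sketch; a thread pool stands in for the cluster. */
final class PartitionedCount {
    /** Counts matching entries per partition in parallel ("map"), then sums them ("reduce"). */
    static long countMatching(List<List<String>> partitions, String wanted) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, partitions.size()));
        try {
            // Map phase: one task per partition, each sees only its local data.
            List<Callable<Long>> tasks = new ArrayList<>();
            for (List<String> partition : partitions) {
                tasks.add(() -> partition.stream().filter(wanted::equals).count());
            }
            // Reduce phase: combine the partial results into a single answer.
            long total = 0;
            for (Future<Long> partial : pool.invokeAll(tasks)) {
                total += partial.get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for partitions", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a partition task failed", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

The point of the pattern is that only the small partial results travel back to the caller, while the data itself stays where it is.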

The full list of changes, improvements and bug fixes can be found in our release notes section.

You’re welcome to give it a go and let us know what you think.

Tuesday, March 8, 2011

Data Grid Querying, Revisited

There has been a great deal of talk lately about the new EHCache cache querying capabilities and the advantages of real-time analytics through in-memory cache querying. I find that rather odd since extensive querying and processing capabilities have been around for years with in memory data grids like GigaSpaces XAP, Oracle Coherence, Gemstone GemFire and more recently Hazelcast and GridGain. So I don’t really understand the big fuss around EHCache finally supporting it….

But that’s actually a great opportunity to revisit some of the work we’ve done in our recent 8.0 release in the context of querying. There are two main features we’ve introduced in 8.0 that take data grid querying to the next level.

Complex Object Querying

The first one is complex object graph querying. Simply put, you can now index and query any type of property, at any nesting level, of your data grid objects: primitive types, user defined types, or collections/arrays of either.
Think of an Author object, which has a collection of books, each of them containing a collection of user reviews.
Here are just two examples of what you can achieve with XAP 8.0:
• List all the Authors that wrote at least one sci-fi book:
Author[] authors = gigaSpace.readMultiple(new SQLQuery<Author>(Author.class,
"books[*].genre = 'Sci-Fi' ORDER BY name"), Integer.MAX_VALUE);
• List all the Authors who have at least one book on which your friend on Facebook commented:
Author[] authors = gigaSpace.readMultiple(new SQLQuery<Author>(Author.class,
"books[*].comments[*].username = ?", "myFacebookFriend"), Integer.MAX_VALUE);
This goes hand in hand with our new Document support, so you can apply the above to schema free documents as well as just plain Java or .Net objects.
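
If the [*] notation in the queries above is new to you, it may help to see what the first query means when written against plain Java collections. The sketch below is not how the data grid evaluates the query (it uses indexes rather than scanning), and the Author and Book classes are stripped-down stand-ins I made up for the example. It assumes that books[*].genre = ? matches an author when any one of their books has that genre.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical in-memory equivalent of a nested collection query; illustrative only. */
final class NestedQuerySketch {
    static final class Book {
        final String genre;
        Book(String genre) { this.genre = genre; }
    }

    static final class Author {
        final String name;
        final List<Book> books;
        Author(String name, Book... books) {
            this.name = name;
            this.books = Arrays.asList(books);
        }
    }

    /** Roughly: books[*].genre = ? ORDER BY name, returning the author names. */
    static List<String> authorsWithGenre(List<Author> authors, String genre) {
        return authors.stream()
                // An author matches if at least one nested book has the genre.
                .filter(a -> a.books.stream().anyMatch(b -> genre.equals(b.genre)))
                .map(a -> a.name)
                .sorted()
                .collect(Collectors.toList());
    }
}
```

The second query works the same way with one more level of nesting: an author matches if any comment on any of their books has the given username.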

GigaSpaces JPA

The second feature is our all new JPA support. We believe this to be a major step towards ease of use and standardization of distributed data store access.

There have been a lot of discussions in the developer community about how the “classic” querying and interaction models (such as SQL/JDBC and JPA) can be mapped to the world of distributed data stores (which in memory data grids are a part of).

When it comes down to querying, each distributed data store defines its own querying language and semantics. Some are completely proprietary (e.g. EHCache’s new query DSL, or MongoDB’s query syntax), some are modeled after standard SQL to some extent or the other (e.g. GigaSpaces’ native querying support, Coherence’s recent QL, or GridGain’s query language).
All of these seem to be solid query implementations (naturally you can guess which one I like best :) ).

But when developing to each of those products, you need to learn their interaction model, data access semantics (such as transaction management) and of course querying language.

The first take at bridging the two worlds, and IMO a pretty successful one considering the inherent difficulties in bridging the two models (see this presentation I’m giving at QCon London in two days), was actually done by Google almighty with the JPA support in AppEngine.
Although not 100% standard (actually not even close), it gives you as a developer a very easy and familiar way to implement scalable data access on top of their BigTable data store, and a very clear migration path for putting your apps on AppEngine, should you choose to do so. It also makes your code a lot more portable since JPA, even a partial version of it, is still way more standard and portable than any proprietary API.

GigaSpaces JPA – Design Principles

When we initially thought of implementing a JPA interface to our data grid, we had the following goals in mind:

Make our users’ lives easier by providing a standard way for them to interact with the data grid that will allow them to easily get up to speed with new applications, and port existing JEE applications more smoothly and quickly.
For that, we created a thin layer on top of the excellent OpenJPA framework so that configuring a GigaSpaces EntityManagerFactory is a breeze if you’re already into JPA in general and OpenJPA in particular. We also managed to support a nice set of features from the JPA specification (actually more extensive than Google’s AppEngine JPA) so that most of the stuff users actually do with JPA is covered. Naturally, this is just our first take at this huge project, and we’ll add more and more capabilities as we move forward.
Enable our users to leverage the power and scalability of the data grid to scale their data access, even when they’re using JPA (which was originally designed for a centralized data model). This means exposing content based routing and data grid indexing to their JPA domain model, and abstracting away cluster wide transactions and queries.
Protect our users from making wrong implementation decisions when modeling and accessing their distributed data. In Google AppEngine’s terminology, this means supporting Owned Relationship only such that object graphs are not scattered across many nodes and are always limited to a single node.
Allow our users to use our powerful native APIs for functions that are not covered by the JPA specification, e.g. event handling, distributed task execution, and more. So it actually means that you can use the JPA API, while still operating on the same data model and entities via our native POJO based API. This is a very powerful concept that covers not only JPA but all of our data access APIs. We call it Same Data, Any API, and it means that you can operate on the same data using a variety of APIs besides our native POJO API and JPA. For example, you can read a JPA entity as a document (in case you want to treat it in a looser, more flexible manner), or use our C++ client side API from native clients to read this data.

See for Yourself

We have just published a comprehensive tutorial that explains the principles of our JPA implementation using the good old Spring PetClinic sample app (we’ve added our own flavor to it :) ). It explains the data model considerations and shows how to take advantage of data grid features such as space based remoting to optimize the data access of the application and use our

Tuesday, December 7, 2010

TV Made Right :)

We've just created a new channel on YouTube called GigaSpaces TV. It will feature interviews and tech talks on various aspects of the product, and you also get to see me unshaved in the first episode ;). In it I give a short overview of the new APIs we're coming out with in our 8.0 release.

Stay tuned by subscribing to the channel!

Enjoy,

Uri

To Scale or Not to Scale - the Recording

The recording of my session at Devoxx 2010 is now available online at parleys.com
At this point you need a subscription to view it in full (which I highly recommend anyway since there's a lot of valuable content there), but it will be made fully public over the course of the next few months.

Enjoy,

Uri

Friday, November 5, 2010

Yes, Sql!

See below the slide deck from my session today at QCon SF titled "Yes, Sql!" (hopefully you liked it if you attended :) ). It focuses on how the classic querying models like plain SQL and JPA map to distributed data stores. It first reviews the current distributed data stores landscape, and its querying models (K/V, Column, Document), and discusses the wide range of APIs for data extraction from these data stores. It then discusses the main challenges of mapping various APIs to a distributed data model as we've done over the past few months in GigaSpaces, and the trade offs to be aware of.


The direct link is http://www.slideboom.com/presentations/232665/Yes,-Sql!

Tuesday, October 19, 2010

Our Citrix Integration Demo

This week we’re attending the Interop conference in New York to present our integrated solution with Citrix Netscaler and XenServer. The solution enables GigaSpaces XAP applications to utilize the Netscaler load balancer AND XenServer infrastructure to dynamically and automatically scale applications (e.g. a standard JEE web app) based on real time application load.

The following screen cast demonstrates how a standard JEE web application (in this case the Spring framework PetClinic demo application) is dynamically scaled on the Citrix SoftLayer cloud.

It runs on a number of virtual hosts which in turn run the Netscaler load balancer, the Web container and the MySql database.

The demo shows how the web application automatically scales out when the load increases. The scale out process includes the following stages:

The system identifies that the average load has crossed a certain threshold.
The system dynamically starts a new Xen virtual machine to host a new instance of the web application. This VM includes the GigaSpaces Agent component which enables XAP to dynamically start a new JVM to host another web application instance.
A new web application instance is provisioned to the newly started JVM
The Netscaler load balancer configuration is automatically updated to reflect the new web container and route traffic to it
The average load goes back to normal since the traffic is now evenly balanced across the old and new web containers.
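
The decision behind the stages above boils down to a simple rule, which can be sketched in a few lines of Java. This is my own simplification and not the logic of the demo itself: ScaleOutPolicy is a hypothetical name, and it assumes the load balancer spreads the total load evenly across all instances.

```java
/** Hypothetical threshold based scale-out rule; illustrative only. */
final class ScaleOutPolicy {
    /**
     * Returns how many instances are needed so that the average load per
     * instance is at or below the threshold, never fewer than the current count.
     */
    static int requiredInstances(double totalLoad, int currentInstances, double threshold) {
        if (currentInstances < 1 || threshold <= 0) {
            throw new IllegalArgumentException("need at least one instance and a positive threshold");
        }
        int instances = currentInstances;
        // Add one instance at a time, re-checking the average after each addition.
        while (totalLoad / instances > threshold) {
            instances++;
        }
        return instances;
    }
}
```

For example, a total load of 180 over 2 instances averages 90, so with a threshold of 80 the rule asks for a third instance, which brings the average down to 60.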

The demo also shows how the system automatically recovers from a forced virtual machine failure by re-instantiating the virtual machine and the GigaSpaces components on it, and then re-provisioning the missing web application instance onto it.

Enjoy,
Uri

Tuesday, October 27, 2009

Join Us for the New XAP Security Framework Webinar

You’re welcome to join our XAP security webinar on Wednesday, October 28.

In this webinar I’ll explain the motivation behind our revamped security framework and give an overview of its capabilities. The session will also include a live demo in which I’ll show how to secure the GigaSpaces runtime environment in a few simple steps, and how to enable security for a typical application without changing a single line of code or configuration.

The webinar will be held at 9am PDT, 12pm EST, 5pm CET. Please click here to register

Hoping to see you tomorrow,
Uri

Tuesday, September 15, 2009

GigaSpaces XAP 7.0.1 Is Out!

I’m happy to announce that today we’ve released our latest version, XAP 7.0.1. I would like to share with you a few details about it. Version 7.0.1 is a service pack release on top of version 7.0.0 and as such is backwards compatible with it. In addition to a number of bug fixes on top of 7.0.0, it includes some very interesting features and improvements around 3 main areas: enterprise grade security, improved usability and APIs, and better troubleshooting and monitoring capabilities. In addition to that, we’ve also worked on further optimizing the performance of our data grid implementation. Here’s a short description of the release highlights:

Security: 7.0.1 introduces a revamped security implementation for the best possible data grid security. This implementation was specifically designed to support enterprise and cloud data grid scenarios in which the data grid is accessed by multiple applications and serves as a central data repository. In terms of relevance to the cloud, it brings a new level of data security which enables you to safely store data in cloud environments without having to worry about security hazards related to storing your data off premises. In addition to transport level security for all data grid related communication (based on SSL), it includes support for users and roles, with a comprehensive permissions system to enforce authorization for every operation, starting from the management of the GigaSpaces infrastructure, processing units and space (data grid) instances and ending with content based authorization to access the data grid contents and operate on it. The new security implementation is fully supported by the various management interfaces (GUI, CLI and administration API) and also provides open APIs for integration with 3rd party user registries. For more details please refer to this page. We will also dedicate a separate blog post to this important aspect of XAP 7.0.1.
API and Usability:
- All new space based remoting implementation for XAP.NET. Space based remoting has been around since version 6.0 in XAP for Java and extends the Spring remoting stack to provide a wealth of benefits in comparison to traditional remoting implementations, such as: high availability of exposed services, transparent client side failover, location transparency, load balancing across services in the cluster, map/reduce support and asynchronous invocation (please refer to “The Service Virtualization Framework” white paper in our whitepapers section if you’d like to further learn about the benefits of space based remoting). As of version 7.0.1, it is also available for our XAP.NET users with all the goodies which are part of the Java implementation. The documentation of this feature can be found here.
- XAP.NET built-in processing unit container. In order to further ease and simplify the processing unit development and deployment experience in XAP.NET, we've implemented the Basic Processing Unit Container, which is a smart built-in implementation of the processing unit container interface. This container automatically starts and manages GigaSpaces related components for you (such as space instances, event containers and remote service endpoints), relieving you from the need to explicitly manage the life cycle of those components and allowing you to focus on your application’s business logic.
- Easier cache eviction policy configuration. This is a nice little configuration improvement that makes our users’ life easier when configuring the cache eviction policy. In previous releases, the recommended way to control the eviction policy was by using space properties. In 7.0.1, this has been made much more elegant with native XML namespace and Java code configuration support. Here’s an example of how you would have configured LRU eviction in your pu.xml and in code in 7.0.0 and 7.0.1:
- Extended Indexing at the property level: Extended indexing allows you to index properties of objects written to the space with a BTree index, thus allowing for range queries (based on the property type’s natural ordering). Prior to 7.0.1, extended indexing was only available at the class level, which means you had to either use the extended index for all the properties of a certain class or for none at all (and make do with basic indexing). In 7.0.1, we’ve enabled extended indexing at the property level, so you can now choose the right indexing scheme for each property. For example, the identifier property of a certain class would typically be indexed with a basic index, which does not have a sense of order between indexed values and is therefore more lightweight and faster than extended indexing. For a date/time or number property, you would use extended indexing if you would like to perform range queries (e.g. all the objects with date property before 1/1/2009). This can be configured via annotations, e.g. @SpaceProperty(index=IndexType.EXTENDED) or via XML, e.g.
  <property name="lastName" index="EXTENDED"/>.
Performance improvements for embedded operations. As with every release, we have dedicated quite a lot of time to further optimize areas which we knew could be improved. In this release we have focused on performance of embedded space operations and by optimizing the concurrency of internal thread and object pools we have been able to present improvements of up to 200% for embedded space operations. The following graphs show across-the-board improvements in comparison to 7.0.0 (these were measured with 8 threads concurrently accessing the space). It is important to mention that these improvements were achieved without any sacrifice to the consistency and correctness of the space operations:
Better troubleshooting and monitoring:
- More information in log files: We are continuing to improve the quality and amount of information exposed via our log files. 7.0.0 introduced some major logging improvements (per-pu log messages, improved file naming scheme, time-based rollover policies and more). In 7.0.1 we’ve added some more helpful information to help you tune and troubleshoot the system. The first one is GC awareness. You can define a GC pause upper threshold (10 seconds by default). In case the garbage collection process takes longer, the system will log this for future reference. In addition, we’ve added logging for the recovery process (which occurs when starting a backup space instance and recovering the data from the primary) so you can now tell exactly what happened during the process and how long it took.
- Replication statistics in administration API. This is an important addition to the administration API which enables you to monitor the rate at which objects are replicated and whether or not there’s an issue with the replication mechanism. For more details refer to the org.openspaces.admin.space.SpaceInstanceStatistics interface(specifically, the getReplicationStatistics() method).

As always, we would be happy to hear your feedback on the new release (you can send it directly to feedback7.0 at gigaspaces dot com). For the complete release content please consult the official 7.0.x release notes (refer to the notes relevant for 7.0.1).

Wednesday, August 5, 2009

StockDemo and GigaSpaces Petclinic for XAP 7.0 Now Available

If you’ve been following our webinars and demos in the last year or so, you’ve seen that in many of them we’ve used two applications to demonstrate the product. The first one is the stock demo, which is an AJAX enabled Spring MVC web application that communicates with the Space and displays real time stock quotes. The second one is the good old Spring PetClinic application with the data access layer accessing GigaSpaces instead of a relational database, and persisting asynchronously to a MySQL database (via Hibernate).

I’ve been planning for quite some time now to port both applications to XAP 7.0 due to the packaging and deployment improvements of this version.

I finally found the time to do this after finalizing the 7.0 release. You may now find the up to date version of both applications on our forge site, OpenSpaces.org.

Note that both applications are also available as one-click demos on our cloud framework. The 7.0-compatible version will soon be made available on this framework as well.

Enjoy,
Uri

Saturday, July 25, 2009

Join Us for the 7.0 Overview Webinar

You’re all welcome to join our 7.0 introduction webinar next Tuesday (July 28). In this technical update and live demo, I will go over the new and noteworthy in the 7.0 release.

Joining me in this webinar will be Shay Banon, our System Architect (who also runs the Compass project in his spare time).

I will explain how GigaSpaces XAP combines a robust in-memory data grid, the Jetty web container and a grid-based business logic execution framework to form a single, easy-to-use platform on which you can build and run extremely scalable enterprise applications.

Key release highlights which I’ll cover in this webinar include performance and scalability improvements, new monitoring and administration API, new data grid APIs and cloud integration. These will be illustrated through the use of live demos.

Shay will describe a real life use case in which XAP 7.0 was used to implement a distributed search functionality on a social network and drastically improve the performance of the application.

Please click here to register. Note that for your convenience the webinar will be held twice, on two separate schedules, to accommodate for different time zones.

Sunday, May 24, 2009

Building GigaSpaces Apps in Minutes with Giga Systems Builder

If you're into MDA, there is some great news for you. One of our partners, New Technology/enterprise (NT/e), has created a very impressive tool to build and deploy end-to-end GigaSpaces applications in minutes. The tool is called GigaSystemsBuilder (GSB).
GSB is an Eclipse plugin which makes it quick and easy for Java developers to model, code and deploy GigaSpaces applications.
Since it's based on well known and battle-proven patterns on top of the GigaSpaces XAP platform, it reduces learning time and on-going development effort, and also increases the likelihood of successful projects.
It covers the entire application life cycle, and enables you to model, code, build, run and even deploy the application on the cloud (using our cloud framework) in a few clicks from within your IDE.
The tool is free for development and deployment, and comes with an optional paid support package from NT/e.
A good place to start exploring this tool is this 10-minute screencast.

Enjoy,
Uri

Sunday, May 10, 2009

UI & Manageability Improvements in XAP 7.0

As GigaSpaces XAP 7.0 release is getting closer, it's time to overview some of the important new features and enhancements of this version. In this post I will focus on the manageability aspects of this release. I'll review the general concepts behind the management interfaces, namely the GUI and the new Administration API, and will also touch on these two interfaces.
I will dedicate separate posts to each of them in the future to keep things at a digestible size.

One of the main issues with previous releases was that the tooling behind the product provided a lot of information about the GigaSpaces runtime, but the data was not fully organized according to the product's domain model, and the tools did not expose all of the capabilities that you would have expected. In this release we have gone to a great deal of effort to make sure the management interfaces are comprehensive, coherent, easy to use and accurately reflect the product's domain model.

The GigaSpaces Domain Model
GigaSpaces is very simple to understand if you look at things hierarchically, starting from the physical infrastructure and ending at the application level.

The physical infrastructure level: At the bottom level, you have the physical infrastructure, which includes the physical (or virtual, if you're running on the cloud) hosts. Information about this level includes CPU, memory and network utilization, and naturally machine level properties such as operating system, processor architecture, etc.

The process/JVM level: The next level is the JVM level, which hosts three main components:

The GigaSpaces container (GSC) - this is a lightweight container, which hosts deployed application instances (e.g. a data grid instance, a Spring application or a web application instance). It's in fact a meta container, which exposes SLA and manageability capabilities on top of other lightweight containers such as Spring or Jetty.
The GigaSpaces manager (GSM) - this is the actual deployment manager. You deploy your application to the GSM, which takes care of provisioning application instances to the running GSCs and distributing the deployed application binaries to these GSCs. It is also responsible for monitoring the state of the deployed applications and using the GSCs to enforce the SLA requirements, e.g. taking action if a certain application instance fails.
The GigaSpaces agent (GSA) - New in 7.0. Think of it as a daemon or a background service which simply starts and stops the other services (GSM, GSC) and receives remote commands from the management interface (GUI or admin API).

Information about this layer includes JVM system properties, JVM stats (available heap size, number of threads, GC stats, etc.) and information specific to the above components, such as the applications managed by a specific GSM or the application instances running on a certain GSC. This layer can also be actively managed, e.g. you can start, stop and restart components.

The processing unit (application) level: The next level is the processing unit level. A processing unit is, simply put, the deployment and packaging unit. A certain application can be composed of one or more processing units. A processing unit can take multiple forms, but in general we distinguish between 3 types of processing units:

Data only, which only contains a data grid
Business logic only, which contains application code which may or may not access a data grid, but does not host the actual data grid instance
Data and business logic, which contains both the data grid and business logic, which in most cases interacts with the embedded data grid in memory to achieve the lowest possible latency and the highest level of scalability

Once deployed, the processing unit has one or more instances, e.g. a data only processing unit with 5 space partitions would have 5 instances. If we define a backup to each of the partitions then it would have 10 instances. Typically, the processing unit is configured via Spring, with custom GigaSpaces namespace bindings to reflect the SLA associated with it and the GigaSpaces components it may contain. Furthermore, in version 6.6, we have also implemented processing units which are plain JEE web applications, Mule applications or even .Net applications. This can be thought of as a mere extension to the basic processing unit model (after all, all of the above also run business logic and may or may not contain a space).
Information about this layer includes the processing unit's name, type, number of instances, its contents, e.g. space instances, web application instances, exposed remote service and so on.
It also includes runtime statistics like the throughput of space operations, web requests per second if it's a JEE web app, and many more runtime characteristics.
Operations that can be taken on this layer include deployment, undeployment, relocation from one GSC to another, restart of a certain instance, increment/decrement the number of instances, etc.
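The instance-count arithmetic from the example above (5 partitions, or 10 instances once each partition has a backup) can be written down as a tiny helper. This is only a sketch of the calculation; the class and method names are hypothetical and do not come from the product.

```java
// Hypothetical helper, not part of any GigaSpaces API.
final class InstanceCountSketch {

    // Every partition runs one primary instance plus one instance per backup.
    static int totalInstances(int partitions, int backupsPerPartition) {
        if (partitions < 1 || backupsPerPartition < 0) {
            throw new IllegalArgumentException("partitions must be >= 1 and backups >= 0");
        }
        return partitions * (1 + backupsPerPartition);
    }

    public static void main(String[] args) {
        System.out.println(totalInstances(5, 0)); // 5: data only, no backups
        System.out.println(totalInstances(5, 1)); // 10: one backup per partition
    }
}
```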

The following diagram depicts the above:

The management interfaces
Let's start with the admin API. This is a completely new API, which gives you full control over the entire lifecycle of the GigaSpaces runtime components.

Using this API, you can start and stop any GigaSpaces component (literally, at the JVM process level!) or even your own custom components, deploy and undeploy your processing units, and monitor almost any aspect you can think of using either polling or event based notifications.

Monitorable entities are all the ones mentioned above, namely the physical hosts (CPU, memory, network utilization), JVMs (number of threads, available heap memory, object count, etc.), deployed processing units (transactions committed, space data, JEE requests stats) or even the underlying network layer.

The main benefit of this API, besides its cleanliness and completeness, is that it opens a variety of options to extend the core product functionality, e.g. implement sophisticated cluster-side bootstrapping, easily integrate with any 3rd party monitoring tools, implement automatic scaling for your web application (as shown in my latest webinar) and much more. But that's something I'll cover in detail in a separate post.

As for the GUI, it also caters for all of the above, but lets you view everything graphically (and as they say, a picture is worth a thousand words). We used the excellent Balsamiq Mockup tool to quickly create sketches of the way we wanted to see things, and from there the job went fairly smoothly. The user interface is now divided into 3 main tabs:

Hosts: which gives a physical view of things, but also shows the entire component hierarchy described above
Applications: which focuses on the logical side of things, namely deployed processing units and the components they contain
Space browser: a pure data-grid only view to enable you to focus on data grid aspects

Naturally, besides viewing things you are also able to operate on the cluster and perform all the above mentioned operations using the UI (e.g. restart a certain JVM, relocate a running processing unit instance from one host to another, or even increase the number of processing unit instances).
The following image shows how we went from mockup to real implementation (in this case showing the "Hosts" view):

I encourage you to give it a try and let us know what you think. Naturally, this is milestone 8 and not the final release, so there are a few more tweaks and improvements we will add, but this can give you a general sense of things.

Cheers,
Uri

Wednesday, March 25, 2009

My Session at the Banking and Finance Technology Forum in HK

You're welcome to view my session from the annual Banking and Finance Technology Forum held last week in Hong Kong. The session is 20 minutes long, and deals with the challenges in developing and deploying applications on the cloud as opposed to your local IT environment.
I want to take the opportunity to thank our partners at Cluster Technologies for setting up a very nice booth and making all the arrangements for the conference.
As always, HK was action packed. Besides the conference, I had some very good meetings with various customers and prospects that focused on our upcoming 7.0 release and the GigaSpaces-Mule integration package which seems to be gaining quite a nice momentum these days.

Enjoy,
Uri

Sunday, February 22, 2009

Scalable SOA with Mule & GigaSpaces

If you missed the joint Webinar Ken Yagen of MuleSource and I did last week, you can find the recorded version here. The demo application shown during the webinar can be downloaded here.
The demo application requires the following to be installed on your machine:

Please note that the Mule/GigaSpaces integration is also available in the latest GigaSpaces GA release (6.6.x), but XAP 6.6.x only supports the Mule 2.0 branch and not 2.1.
The documentation for the integration can be found here (6.6.3) and here (7.0 EA).
If you have any questions please drop me a note here or use our online forums.

I would like to take the opportunity to thank Ken and the rest of the team at MuleSource for hosting this webinar and giving us the chance to present the joint solution to the community.

Uri

Tuesday, February 10, 2009

Scalable SOA - with Mule & GigaSpaces

You are welcome to join Ken Yagen, Sr. Director of Engineering at MuleSource, and myself for our joint webinar on scaling your SOA implementation with GigaSpaces and Mule. The webinar will take place on Wednesday, February 11th 2009, at 9am PT / noon ET / 6pm CET.
In this webinar we will introduce the Mule & GigaSpaces joint solution and discuss the underlying details around how this integration works. We will also discuss a real life use case and present a short demo that shows the benefits of this integration.

See you there!

Uri

Tuesday, December 30, 2008

XAP 7.0 Is on the Move!

After the release of GigaSpaces XAP 6.6, we are now deep into the planning and development of our next major release, 7.0. This release, which is due in mid 2009, will include a few major themes, such as significantly better administration and monitoring capabilities, network resources optimizations, highly optimized and flexible local cache, reduced memory footprint, improved deployment model and support for deployment on the cloud.

If you've been following us in the past few years, you know that our R&D has been practicing SCRUM for quite a while now as part of the development process. Our head of R&D, Guy Nirpaz, has talked about it numerous times to both our users and the development community at large. The nature of SCRUM and the fact that it's composed of sprints enable us to share recently developed features with our community, as we've done in the past for versions 6.5 and 6.6.

Our sprints last two weeks, at the end of which we publish the results of the sprint as an early access milestone release. Our 7.0 early access program already includes the first two milestone releases, which you can download and play with. We will be happy to hear any feedback you have. The first two milestones include the following highlights:

Significantly improved cache eviction policies (most notably a new LRU implementation), which improve LRU performance by a factor of 10 to 20 (depending on the operation) for read operations without impacting write operation performance
Support for time based window scenarios by maintaining the entry lease as part of the POJO instance. You can now annotate a field with the @SpaceLeaseExpiration annotation and the space will use it to store and retrieve the lease expiration time for the instance. This value can later be propagated to an external data source. Based on it, expired instances can be filtered out of the space when loaded from the database.
The UI now enables you to view the processing unit elements and the space cluster that belongs to it in the same tab, so you have a coherent and intuitive view of your application components in one location. It also contains a summary view of all the space cluster details which enables you to quickly understand how your space cluster is functioning and what state it's in, as shown below:
Ultra Fast Native local cache for XAP .Net - we have implemented our very own ConcurrentHashMap-like data structure in .Net, which enables you to enjoy a performance of millions of ID based read operations per second
SQL query optimizations: the space now processes SQL queries with OR statements in parallel, taking advantage of multi-core environments for reduced overall query times
Improved CLI deployment support: your deployment command will now only return after all application instances have been provisioned, enabling you to create complex scripts to deploy multiple dependent processing units
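To illustrate the time based window item in the list above, here is a minimal sketch of the filtering step: instances carry their lease expiration time as a field, and anything already expired is dropped when the data is loaded back from the database. This is plain Java under stated assumptions, not product code. The Event class, its fields and the loadLive method are hypothetical, and in a real application the expiration field would be the one marked with @SpaceLeaseExpiration.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: a plain-Java stand-in for filtering expired instances on load.
final class LeaseFilterSketch {

    // Hypothetical POJO; leaseExpirationMillis plays the role of the field
    // that would carry the @SpaceLeaseExpiration annotation.
    static final class Event {
        final String id;
        final long leaseExpirationMillis;

        Event(String id, long leaseExpirationMillis) {
            this.id = id;
            this.leaseExpirationMillis = leaseExpirationMillis;
        }
    }

    // Keeps only the rows whose lease has not yet expired at the given time.
    static List<Event> loadLive(List<Event> rowsFromDatabase, long nowMillis) {
        List<Event> live = new ArrayList<>();
        for (Event row : rowsFromDatabase) {
            if (row.leaseExpirationMillis > nowMillis) {
                live.add(row);
            }
        }
        return live;
    }

    public static void main(String[] args) {
        List<Event> rows = new ArrayList<>();
        rows.add(new Event("expired", 1_000L));
        rows.add(new Event("live", 5_000L));
        for (Event e : loadLive(rows, 2_000L)) {
            System.out.println(e.id); // live
        }
    }
}
```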

Naturally, there's a lot more than meets the eye, and we're continuing to work on new and exciting stuff. You can stay tuned using our 7.0 early access page.

Happy new year,
Uri

Saturday, November 29, 2008

Some Things that Kept Us Busy Lately

It's been quite a while since I last posted, but things have been more active than ever here at GigaSpaces. I'm writing this on the way back from Tokyo and HK, where I've had a few days packed with meetings with partners, prospects and customers (and of course good food and drinks…). It sometimes amazes me what people actually do and plan to do with our product, but I must admit that this time I was really blown away by some of the things I saw. Besides some very interesting meetings with a number of financial institutions (many of which actually want to take advantage of the current crisis to upgrade and improve their systems so that they'll be ready when things pick up), I also met companies from other sectors that are taking the product into uncharted territory, in a good sense. Just to give you a taste of things (hopefully I will be able to discuss them in detail in a separate post), one of our customers implemented a system to share complex 3D models between designers and perform various types of simulations on them, doing real time sharing through GigaSpaces. Since they chose .Net as the implementation platform for this application, they used GigaSpaces XAP .Net to perform the model sharing, and when you see with your own eyes how a complex and detailed 3D model is loaded from one client and gradually appears on a bunch of other machines, it is truly impressive.
It also made me proud to see that many of our new features are adopted and used widely. Which brings me to the main topic – what kept us busy lately.
First of all, we released our 6.6 branch in September, which might well be the first version to truly offer an end-to-end application platform. Here's a glimpse of what it offers:

Standard JEE web application support (via the Jetty web container)
An optimized, native .Net distribution (named XAP .Net), with full SBA capabilities
A brand new task processing API (including peer classloading capabilities)
Support for dynamic language invocation across the grid
Further enhancements of our remoting capabilities
Maven integration
Improved annotation support for configuring messaging and remoting components
Asynchronous operations
Out of the box integration with the Mule ESB
Seamless interoperability between Java, .Net and C++
UI improvements
Optimizations, optimizations, optimizations…
Revamped documentation web site

I encourage you all to give it a try and provide us your feedback. You can find the complete list of changes and enhancements here.
Besides the official 6.6 release (we've just released 6.6.2 - which I'll discuss in a separate post), we have also invested significant effort in integrating GigaSpaces with cloud and virtualization vendors. The integration package with Amazon EC2 enables you, in a single click, to provision EC2 instances, install GigaSpaces XAP on them, start GigaSpaces containers and deploy your application to them. Furthermore, it can also provision a MySQL database that sits on top of Amazon EBS, and an Apache load balancer in case you deployed a web application. Monitoring is done through our standard user interface, which runs on the cloud and is displayed automatically as a local window on your PC. And the nice part of it is that it's all done via a web application, so no special installation is required to use it. We are currently at beta stages with this offering, and will soon make it publicly available. In fact, we already have quite a few customers that are using XAP on EC2 in production.

On another front, we have completely revamped our training offering (this is a good chance to thank Tricode, our Dutch partners, for taking this on and providing high quality results). The syllabus and details will soon be made available on our web site. The general idea was to create a modular training for various target audiences besides the usual core training (which was also completely rewritten). We offer both on site and public training. We will publish the next available schedule soon.

On the performance benchmarks front, Shay Hassidim, our deputy CTO, has been conducting numerous benchmarks in the last few months, which cover many aspects such as scalability, latency, web application performance and more. These benchmarks are posted regularly on our company blog, and we have grouped them all under one category for your convenience. We will of course keep doing these benchmarks to provide more insights to our customers and prospects about how the product behaves in various scenarios.

That's it for this time - plane is about to land and they'll take my laptop if I don't close it :)

Sunday, July 20, 2008

The Space as a Messaging Backbone - Why You Should Care

One of the main themes of the GigaSpaces XAP 6.5 release (which is also true for our direction as a company in general) is the fact that we consider ourselves a complete application platform, and not just an IMDG solution. This has been evident in prior versions of the product, but with 6.5 we believe we got even closer to that goal, and that in many cases we can replace a JEE application server altogether.
We are taking this quite seriously, and actually invested a lot in making sure that this vision is also realized. A major part of this effort is a project we took upon ourselves to test and document the migration process from a typical JEE application to full Space Based Architecture. To keep the comparison as unbiased as possible, we delegated the actual coding and testing work to a team of professionals at Grid Dynamics (which BTW did a great job).
They first implemented a typical OLTP application using JEE (JMS, EJB, Hibernate, Spring), deployed it on a leading app server and tested the application in terms of latency, throughput, CPU utilization, etc. Next, they migrated the application in a few steps to full SBA.
Each step was documented and tested to assess its effect on the system and validate the benefits it gives. The steps were as follows:

Add 2nd level cache to Hibernate, which didn't require any code change.
Change the data access layer to use the space instead of the database, with database write behind (or Persistency as a Service) in the background
Change the messaging layer to use the Space instead of the app server's JMS implementation
Implement full SBA by collocating business logic, data and messaging in the same JVM by utilizing GigaSpaces processing units

Interestingly, steps 1 and 2 provided a nice improvement in latency, but almost none on the throughput front, as shown here. After analyzing these results, we realized that the main bottleneck for throughput increase was the application server's messaging infrastructure, which uses the disk to maintain resiliency for JMS "persistent messages" and works in an "active-standby" topology (which means only one node is handling messages at any given moment).
When replacing it with the Space, we saw a major increase in throughput, as seen in steps 3 and 4 in the graph above. Another important fact, which is not evident from the graph, is that GigaSpaces is much more scalable since the Space can be partitioned across multiple JVMs, and message load is shared between partitions (unlike most MOMs, which end up writing to the disk and typically use the active-standby topology).
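To show why partitioning spreads the message load instead of funneling it through a single active node, here is a minimal sketch of key-based routing. It assumes each message carries a routing key whose hash code, taken modulo the number of partitions, selects the target partition. That routing rule, along with the class and method names, is an assumption made for this illustration and is not taken from the product code.

```java
import java.util.Arrays;

// Illustrative only: hash-based routing of messages to partitions.
final class PartitionRoutingSketch {

    // Maps a routing key to a partition index in the range [0, partitionCount).
    static int partitionFor(Object routingKey, int partitionCount) {
        if (partitionCount < 1) {
            throw new IllegalArgumentException("partitionCount must be >= 1");
        }
        // floorMod keeps the result non-negative even for negative hash codes.
        return Math.floorMod(routingKey.hashCode(), partitionCount);
    }

    // Counts how many of the messages keyed 0..messageCount-1 land on each partition.
    static int[] countPerPartition(int messageCount, int partitionCount) {
        int[] counts = new int[partitionCount];
        for (int orderId = 0; orderId < messageCount; orderId++) {
            counts[partitionFor(orderId, partitionCount)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // 1000 messages over 4 partitions: each partition handles a quarter of the load.
        System.out.println(Arrays.toString(countPerPartition(1000, 4))); // [250, 250, 250, 250]
    }
}
```

In an active-standby topology the equivalent picture would be a single node handling all 1000 messages, which is the throughput bottleneck described above.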

Who Else Should Care
Besides ordinary JEE applications, this information is also very relevant to ESB-based applications. When implementing such applications, most people overlook the above, and use messaging servers or, even worse, relational databases to handle the transport for their ESB-based applications. For example, 65.3% of Mule users use JDBC for transport, 48.4% use JMS and 30% use MQ Series for transport (according to the 2007 Mule user survey).
So you see the benefit in using GigaSpaces as a transport layer instead of the traditional solutions.
With GigaSpaces 6.5, we have built in the integration with Mule into the product, so you can enjoy the GigaSpaces scalable transport (and other integration points) out of the box.
In addition, there are a number of other ESB integrations available.
You can find a very complete integration package with Apache ServiceMix in our community web site, OpenSpaces.org. Also, here's a nice integration project of JavaSpaces and Apache Camel which also enables you to use GigaSpaces as a transport layer with Camel.

Going back to the migration project I mentioned above, we plan to publish the application itself and the results we came up with in the process so people can check it out for themselves and understand how we did things. So stay tuned, there's a lot more interesting stuff to follow.

My TechTalk at TSS.com

You're all welcome to watch my talk at TSSJS Prague about the challenges in scaling web 2.0 applications. It's now online at TSS.com
