Friday, July 16, 2010

Enterprise Applications And Mid-tier Caching

Here's an interesting take on how an enterprise application can be made to achieve high performance, scale and high availability, using mid-tier caching.

Modern enterprise applications are being pushed to run faster and at greater scale. Consequently, the 3-tier architecture model is changing to move data closer to the applications and to integrate data from several sources.
In addition, the demand for greater scale is forcing applications to tolerate staleness and move towards weaker forms of consistency. These changes are being facilitated by:

* Hardware trends in 64-bit architecture, multi-core processors and inexpensive memory;
* Ubiquitous data 'in motion and everywhere'; and
* A growing need among enterprise applications for fast, inexpensive scaling along with resilience to failure.

Welcome to the world of 'mid-tier caching', also known as the 'cache tier' or simply 'caching'.


Next-generation applications are composite, aggregating data and business logic from sources that can be local, federated, or cloud-based. Data and applications can reside in different tiers with different semantics and access patterns. For example, data in back-end servers/clusters or in the cloud tends to be authoritative; data on the wire is message-oriented; data in the mid-tier is either data cached for performance or application session data; and data on devices could be local data or data cached from back-end sources.

As memory costs fall, fairly large caches can be configured on desktop and server machines. With 64-bit hardware now mature, 64-bit CPUs are becoming mainstream for both client and server machines. True 64-bit architecture support dramatically increases memory limits (to 2^64 bytes). For example, desktops can be configured with 16 GB (gigabytes) of RAM, and servers with up to 2 TB (terabytes) of RAM. Large in-memory caches allow data to be located close to the application, which provides significant performance benefits. In addition, in a world where hundreds of gigabytes of storage is the norm, the ability to work with most data in memory (large caches) and to shift easily from tables to trees and on to graphs of objects is key to programmer productivity for next-generation applications. While query-based access is a requirement, the data access and management requirements and the semantics of caches differ from what commercial DBMSs support.

Caching in applications
As described earlier, data can reside in different tiers (in different service boundaries) with different semantics. For example, data stored in the backend database is authoritative and requires a high degree of data consistency and integrity. Typically, there is a single authoritative source for any data instance. Most data in the mid-tier, which the business logic operates on, tends to be a copy of the authoritative data. Such copies are suitable for caching. Understanding the different types of data and their semantics in different tiers helps define the degrees of caching that are possible.

Reference data

Reference data is a version of the authoritative data. It is either a direct copy (version) of the original data or aggregated and transformed from multiple data sources. Reference data is practically immutable: changing the reference data (or the corresponding authoritative data) creates a new version of the reference data. This means that every reference data version is unique. Reference data is an ideal candidate for caching; because it does not change, it can be shared across multiple applications (users), thereby increasing scale and performance.
Consider a product catalogue application aggregating product information across multiple backend applications and data sources.

The most common operation on the catalogue data is read (or browse). A typical catalogue browse operation iterates over a large amount of product data, filters it, personalises it, and then presents the selected data to the users. Key-based and query-based access are the common forms of operation. Caching is a critical requirement for catalogue access. If the data is not cached, each operation against such an aggregate catalogue must be broken up into operations on the underlying sources: invoke the underlying operations, collect the responses, and aggregate the results into a cohesive response. Accessing large sets of backend data for every catalogue operation can be prohibitively expensive, and can significantly impact the response time and throughput of the application. Caching the backend product data closer to the catalogue application can significantly improve the performance and scalability of the application. Aggregated flight schedules are another example of reference data.
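The catalogue scenario above can be sketched as a small cache-aside wrapper. This is only an illustration under stated assumptions: `CatalogueCache`, `pricing_source` and `inventory_source` are hypothetical names invented here, the "backends" are plain functions, and the cache is an in-process dictionary rather than a distributed cache tier.

```python
from typing import Callable, Dict, List

# Hypothetical backend fetcher: maps a product ID to a partial record.
Source = Callable[[str], dict]


class CatalogueCache:
    """Cache-aside wrapper around several (hypothetical) backend sources."""

    def __init__(self, sources: List[Source]) -> None:
        self._sources = sources
        self._cache: Dict[str, dict] = {}
        self.backend_calls = 0  # counts calls made to the underlying sources

    def get(self, product_id: str) -> dict:
        # Key-based access, fast path: serve the aggregated record from memory.
        if product_id in self._cache:
            return self._cache[product_id]
        # Slow path: invoke every source, collect and merge the responses.
        record: dict = {}
        for source in self._sources:
            self.backend_calls += 1
            record.update(source(product_id))
        self._cache[product_id] = record
        return record

    def query(self, predicate: Callable[[dict], bool]) -> List[dict]:
        # Query-based access over whatever is currently cached.
        return [r for r in self._cache.values() if predicate(r)]


# Stand-ins for two backend applications (assumed, not from the article).
def pricing_source(product_id: str) -> dict:
    return {"id": product_id, "price": 10.0}


def inventory_source(product_id: str) -> dict:
    return {"id": product_id, "in_stock": True}


catalogue = CatalogueCache([pricing_source, inventory_source])
first = catalogue.get("sku-1")   # aggregates from both sources
second = catalogue.get("sku-1")  # served from the cache, no backend calls
```

Only the first `get` pays the cost of fanning out to the sources; repeated browsing of the same product is answered from memory, which is where the response-time and throughput gains described above would come from.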

Reference data is refreshed periodically from its sources, usually at pre-configured intervals, or refreshed when the authoritative data sources change. Access to reference data, though shared, is mostly read. Local updates are often performed for tagging (to better organise the data). To support a large scale, reference data can be replicated in multiple caches on different machines in a cluster. In a social networking scenario, for example, details such as a user's friends list or the username associated with a particular ID or login are reference data that is easy to cache and scales well.
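One way to picture interval-based refresh of versioned reference data is the sketch below. It is a minimal, single-process illustration: `ReferenceDataCache`, its `load` callback and the injectable `clock` are assumptions made for this example, not part of any particular caching product, and replication across machines is left out.

```python
import time
from types import MappingProxyType
from typing import Callable, Mapping


class ReferenceDataCache:
    """Holds one read-only snapshot and replaces it whole on refresh."""

    def __init__(self, load: Callable[[], dict], refresh_interval: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._load = load                  # hypothetical loader for the authoritative source
        self._interval = refresh_interval  # seconds between refreshes
        self._clock = clock
        self.version = 0
        self._snapshot: Mapping = MappingProxyType({})
        self._loaded_at = 0.0
        self.refresh()

    def refresh(self) -> None:
        # A refresh never mutates the old snapshot; it publishes a new version.
        self._snapshot = MappingProxyType(dict(self._load()))
        self._loaded_at = self._clock()
        self.version += 1

    def snapshot(self) -> Mapping:
        # Reload lazily once the configured interval has elapsed.
        if self._clock() - self._loaded_at >= self._interval:
            self.refresh()
        return self._snapshot


# Usage with a fake clock so the example does not need to sleep.
now = [0.0]
friends = {"alice": ["bob"]}  # stands in for the authoritative store
cache = ReferenceDataCache(lambda: friends, refresh_interval=60.0,
                           clock=lambda: now[0])
v1 = cache.snapshot()
friends["alice"] = ["bob", "carol"]  # the authoritative data changes
stale = cache.snapshot()             # still version 1: staleness is tolerated
now[0] = 61.0
v2 = cache.snapshot()                # interval elapsed, version 2 is published
```

Readers holding `v1` keep a consistent, unchanging view while newer readers see `v2`, which mirrors the idea that a change produces a new version of the reference data instead of modifying the old one. Note that the copy is shallow, so a real implementation would also need to guard nested values against mutation.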
