Wednesday, November 25, 2015

Hardware Sizing for Java/JEE Products [Solr/Lucene Based]

When sizing hardware for a Java/JEE product, especially one that uses indexing frameworks such as Lucene, many more attributes feed into the final sizing tabulation. [This entry does not cover Database Sizing.]

Before starting a sizing exercise, we need to understand that the following factors affect the accuracy of the metrics:

1. Decision on the Exact Versions of the Runtime, Frameworks and Servers Used
2. Experience of the Senior Engineer/Architect doing the Sizing Exercise
3. Understanding the Functional Characteristics of the System being Built
4. Agreeing upon the Non-Functional Characteristics of the System being Built
5. Appreciating Sizing Environments [Development, Testing, Production, UAT,...]
6. The Future Extensions or Possible Lifespan of the System being Built


The following are the most important standard guidelines and criteria for hardware or server sizing. Please note that these guidelines are for the server that hosts the Application. They do not include any Database Sizing guidelines.
  •  Hardware Components should operate at no more than 80% Utilization 
  •  Processor and Memory Resources should be allocated for Maximum User Load 
  •  User Think times and Network Latency should be taken into Account 
  •  Number of Potential Users and Number of Concurrent Users 
  •  Service Time and Average Response Time of your Application
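To show how these criteria combine, here is a minimal sketch using two standard queueing relations: the interactive response-time law, X = N / (R + Z) (throughput from concurrent users N, response time R and think time Z), and the utilization law, U = X × S (busy CPU from throughput and per-request CPU service time S), capped at the 80% guideline. This is an illustration, not necessarily how the table below was derived; the class name `CapacityEstimate` and every input value are hypothetical.

```java
/**
 * Sketch: estimates throughput and CPU cores from concurrent users, think time,
 * response time and per-request CPU service time, keeping utilization <= 80%.
 * Uses the interactive response-time law and the utilization law.
 * All inputs in main are hypothetical placeholders.
 */
public class CapacityEstimate {

    // Target ceiling for utilization (guideline: no more than 80%).
    static final double MAX_UTILIZATION = 0.80;

    /** Throughput X = N / (R + Z), in requests per second. */
    static double throughput(int concurrentUsers, double responseTimeSec, double thinkTimeSec) {
        return concurrentUsers / (responseTimeSec + thinkTimeSec);
    }

    /** Smallest core count such that (X * S) / cores <= MAX_UTILIZATION. */
    static int coresNeeded(double throughputPerSec, double cpuServiceTimeSec) {
        double busyCores = throughputPerSec * cpuServiceTimeSec; // utilization law: U = X * S
        return (int) Math.ceil(busyCores / MAX_UTILIZATION);
    }

    public static void main(String[] args) {
        int users = 500;   // hypothetical concurrent users (including think times)
        double r = 0.5;    // hypothetical average response time, seconds
        double z = 9.5;    // hypothetical user think time, seconds
        double s = 0.05;   // hypothetical CPU service time per request, seconds

        double x = throughput(users, r, z); // 500 / (0.5 + 9.5) = 50 requests/s
        int cores = coresNeeded(x, s);      // 50 * 0.05 = 2.5 busy cores; 2.5 / 0.8 = 3.125, rounded up to 4
        System.out.printf("Throughput: %.1f req/s, cores needed: %d%n", x, cores);
    }
}
```

The same pattern applies to memory or disk I/O: estimate the demand per request, multiply by throughput, and divide by the 80% ceiling rather than by 100%.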

If you are using Solr/Lucene or similar disk-based indexing frameworks, it is important to estimate the full potential index size from the number of documents, the number of indexed fields, the number of stored fields and the average size of each document. By adding some buffer, you can compute the disk space fairly accurately. When estimating memory requirements for Solr/Lucene, the number of unique terms per field must also be considered. In the references below, I have provided a spreadsheet (made publicly available by 'Lucidworks'). It lists all the attributes you can 'tune' for your memory sizing and disk-space requirements for Solr/Lucene, especially with respect to 'Caching of Query Terms'.
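A rough sketch of the arithmetic, assuming a simple linear model: disk ≈ documents × average document size × (stored ratio + indexed ratio) × (1 + buffer). The stored/indexed ratios are hypothetical placeholders, not Lucene constants; calibrate them by indexing a representative sample or with the size-estimator spreadsheet. The memory half uses the worst case for a Solr `filterCache` entry held as a bitset, which takes maxDoc / 8 bytes; the class and method names are illustrative only.

```java
/**
 * Rough Solr/Lucene sizing sketch.
 * diskBytes: linear model with hypothetical stored/indexed ratios plus a buffer.
 * filterCacheBytes: worst case where every cached filter is a bitset over all docs.
 */
public class IndexSizing {

    /**
     * storedRatio / indexedRatio: fraction of raw document bytes that end up on
     * disk as stored fields and as inverted-index data (hypothetical inputs).
     */
    static long diskBytes(long docs, long avgDocBytes,
                          double storedRatio, double indexedRatio, double buffer) {
        double raw = (double) docs * avgDocBytes * (storedRatio + indexedRatio);
        return (long) Math.ceil(raw * (1.0 + buffer));
    }

    /** Each bitset entry costs maxDoc / 8 bytes; multiply by the cache size. */
    static long filterCacheBytes(long maxDoc, int cacheEntries) {
        return (maxDoc / 8) * cacheEntries;
    }

    public static void main(String[] args) {
        long docs = 1_000_000L; // hypothetical document count
        long avgDoc = 4_096L;   // hypothetical average document size, bytes
        long disk = diskBytes(docs, avgDoc, 0.5, 0.3, 0.25); // hypothetical ratios, 25% buffer
        long cache = filterCacheBytes(docs, 512);            // hypothetical filterCache size
        System.out.printf("Index on disk: ~%.2f GB%n", disk / 1e9);
        System.out.printf("filterCache (worst case): ~%.1f MB%n", cache / 1e6);
    }
}
```

Solr may store small filters more compactly than a full bitset, so the cache figure is an upper bound, which suits a "Worst Case, +Buffer" entry in the table.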

During a sizing exercise, you can present a separate table for each possible environment. Alternatively, you may present a single table, stating the environment it applies to, and mention any additional constraints that apply across environments. It is important to add a buffer to each computed attribute, keeping in mind cost and future extensibility. Many conclude that "Hardware is Inexpensive these Days - We can Recommend something that is Beyond the Best Possible Maximum Load". Though this works almost always, it does not give us a "Possible Minimum Estimate with Least Cost". Arriving at estimates with the latter in mind equips us to better understand the future issues that various functional and non-functional aspects may cause, especially if we want to achieve maximum efficiency under constraints for all possible 'Loads'. For example, if we size to the minimum in the 'Development, Testing or User Acceptance Testing' environments, we may be able to point out a Memory Leak that would have manifested itself due to an Incorrect Development Practice or Deployment Strategy. With the former approach, we may also end up giving an "Inflated Estimate" for an otherwise "Size-S System", with resources that always lie unused.
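As an illustration of the memory-leak point, a minimal sketch, assuming it runs inside (or is adapted into) the JVM under test: it samples used heap after requesting a GC through the standard `MemoryMXBean` and flags a steady climb. The sample count, interval and 20% growth threshold are hypothetical, and a positive result is only a hint; a heap dump is still needed to confirm a leak.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;

/**
 * Sketch: rough leak indicator for a deliberately lean Dev/Test heap.
 * Samples used heap after a GC request and checks for monotonic growth.
 * Sample count, interval and threshold are hypothetical values.
 */
public class HeapGrowthCheck {

    /** True if no sample drops below the previous one and total growth >= minGrowthRatio. */
    static boolean isSteadilyGrowing(long[] samples, double minGrowthRatio) {
        if (samples.length < 2 || samples[0] <= 0) {
            return false;
        }
        for (int i = 1; i < samples.length; i++) {
            if (samples[i] < samples[i - 1]) {
                return false; // heap fell back at some point: not a steady climb
            }
        }
        double growth = (double) (samples[samples.length - 1] - samples[0]) / samples[0];
        return growth >= minGrowthRatio;
    }

    public static void main(String[] args) throws InterruptedException {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        int sampleCount = 5;          // hypothetical
        long intervalMillis = 1_000L; // hypothetical
        long[] samples = new long[sampleCount];
        for (int i = 0; i < sampleCount; i++) {
            memory.gc(); // GC request (a hint) so samples approximate live data
            samples[i] = memory.getHeapMemoryUsage().getUsed();
            Thread.sleep(intervalMillis);
        }
        System.out.println("Steady heap growth: " + isSteadilyGrowing(samples, 0.20));
    }
}
```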

Before I take you to the tabulation, a few terms need definition (from the textbook). Their meanings are very often assumed, and it helps to know the slight differences between them.

User Think Time: The time during which the user is not engaged in actual use of the processor (the time between requests). This is used interchangeably with User Wait Time. In real life, however, it has a slightly different impact, as it covers the 'time required by a user to think about and perform the next action in the application, whether in response to the previous result or otherwise'.
Response Time: The response time measured at the client under load (an average).
Concurrent Users: The number of users measured on the server, taken as snapshots from the Server Status or Server Console.
Service Time: The elapsed time to complete the operation, measured for a single user.
Maximum User Load: The maximum number of concurrent users that is expected, or that the system is tested for.
User Wait Times: The time elapsed between actions or clicks for a given user. This is used interchangeably with User Think Time. In real life, however, it has a slightly different impact, as it covers the 'time required by a user to analyze or read the data received between requests, and to perform other tasks such as reading email, using the telephone, chatting with a colleague or working in other applications running simultaneously'. In deeper Software Testing and Performance work, both measures can be put to great use to improve user experience and/or performance.
CPU Utilization: The average total CPU utilization, as a percentage.


The final tabulated Hardware Sizing Recommendation for the Java/JEE Product will look like the following (one table is shown here for the 'Development' environment; considerations for 'Production/UAT' environments are provided below).


Load Balancing, Data Clustering, Failover Strategy and Backup Strategy have not been planned for, given the nature of the System.

FIELD NAME                      VALUE
Type of Environment             Development [/Testing]
Type of Machines                Physical [/Virtual]
Number of Servers               1x
Operating System                Red Hat Enterprise Linux - Linux X.Y.ZZ-AAA.BB.C.eRR.xpp_bb OS
Application Server              WebLogic ??c (WebLogic ??.?.?*)
Load Balancing                  [NONE]
Data Clustering                 [NONE]
Failover Strategy               [NONE]
Database Connections            10 [maxActive], 02 [maxIdle]
Backup Strategy                 [NONE]
Processors                      4 Cores
Concurrency                     ~500 Concurrent Users [Including Think Times]
Memory / RAM                    4GB
Garbage Collection              Generational Garbage Collector [-XX:+UseG1GC]
Disk Capacity [Reasons]         ~10GB SSD [/HDD] [Logs, Indexes, Dependencies, +Buffer]
   Lucene Indexing              ~300MB [Worst Case, +Buffer]
Java Heap Size                  Dedicated Machine [-Xms??g -Xmx??.?g]
   Lucene                       ~100MB [Worst Case, +Buffer]
   Second Level Caching         ~000MB [NONE]
 
This recommendation is for the Development environment; it is best to use or emulate it for both Development and Testing. For Production or User Acceptance Testing environments, the considerations (with our recommendations in brackets) related to Storage Capacity [500GB SSD], Storage Redundancy [RAID], Processor Cores [08+], Total Memory (RAM) [08GB+] and Application Failover Strategy [Active-Active with 4x Physical Servers] should be aligned with other organizational or hardware-tier standards.
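Once a server is provisioned, it helps to confirm that the JVM actually received what the table specifies. The sketch below uses only standard JDK APIs (`Runtime` and `GarbageCollectorMXBean`) to print the processor count, the maximum and committed heap, and the active collectors; the class name `SizingCheck` is illustrative. Note that `maxMemory()` only approximates the `-Xmx` value, and collector names vary by JDK version.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/**
 * Sketch: prints what the running JVM actually received (cores, heap,
 * active collectors) so it can be compared against the sizing table.
 */
public class SizingCheck {

    /** Converts bytes to whole MiB (truncating). */
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        System.out.println("Available processors: " + rt.availableProcessors());
        System.out.println("Max heap (MiB): " + toMiB(rt.maxMemory()));         // approximately -Xmx
        System.out.println("Committed heap (MiB): " + toMiB(rt.totalMemory())); // starts near -Xms
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println("Collector: " + gc.getName()); // G1 collectors appear when -XX:+UseG1GC is in effect
        }
    }
}
```

For example, running `java -Xms2g -Xmx2g -XX:+UseG1GC SizingCheck` (heap values hypothetical, since the table leaves them open) should report a max heap close to 2048 MiB and G1 collector names.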


The following links can be used as references to get the best results:
https://docs.oracle.com/html/A90444_01/sizing.htm#1032856
https://technet.microsoft.com/en-us/library/cc181325.aspx
http://lucidworks.com/blog/2011/09/14/estimating-memory-and-storage-for-lucenesolr/
http://lucidworks.com/blog/2009/09/02/scaling-lucene-and-solr/
https://svn.apache.org/repos/asf/lucene/dev/trunk/dev-tools/size-estimator-lucene-solr.xls 



Happy Hardware Sizing for Java/JEE Products!


[Note: I am a Software Development Architect working for a US-based Software Product Company, and this write-up is based on work done as part of a Special Product Customization for a Big Logistics Customer, as well as for later use in the Product itself.]
