Employing Elasticsearch for Log Automation in Hybrid Cloud
Elasticsearch is an open-source, distributed, RESTful, real-time search and analytics engine. It is built on top of Apache Lucene, a proven, high-performance, full-featured text search engine library written entirely in Java.
Powered by Lucene, Elasticsearch reliably and securely acquires data from any given source, in any given format, and enables you to search, analyse and visualize it in real time. In this setup it is used to index and analyse log data shipped by Logstash, which parses the data with the regex-based Grok plugin. Elasticsearch accepts the JSON output of Grok and indexes it into a binary format optimized for fast distributed searches.
CoreStack Multi Cloud Operations Manager helps standardize and automate IT processes through its unique cloud-as-code approach. It provides a unified framework to operate and manage infrastructure, applications and services effectively, and through the Operations Manager it automates the operations of Elasticsearch in a hybrid cloud.
CoreStack takes care of the operations automation required to manage the log life cycle in Logstash and Elasticsearch, covering provisioning, deployment, configuration management, monitoring, backup, archival and purging.
Operations Automation in Elasticsearch
The diagram below represents the log operations that occur in the hybrid cloud setup.
There are cases where an Elasticsearch cluster needs to be set up dynamically, either for a new environment or for indexing data over a certain period of time. CoreStack offers templates to provision Elasticsearch clusters dynamically in multiple cloud environments, including AWS, OpenStack and Azure.
Managing the Logstash agent configuration on one or two servers is easy, but the complexity and effort increase when configurations must be managed across a larger number of servers. CoreStack can manage Logstash configurations for a group of servers, defined as an inventory in the system, and enables configuration management through leading tools such as Chef, Ansible or Puppet.
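To make the inventory idea concrete, the sketch below renders a per-group Logstash output stanza for every server in an inventory. The host names, group names, and the template itself are illustrative assumptions; in practice a tool such as Ansible, Chef or Puppet would push the rendered file to each server.

```python
# Illustrative sketch: render a Logstash output stanza for each server in an
# inventory, grouped so that each group writes to its own daily index.
# All names (es.internal, web-01, ...) are hypothetical.
OUTPUT_TEMPLATE = """output {{
  elasticsearch {{
    hosts => ["{es_host}:9200"]
    index => "{index}-%{{+YYYY.MM.dd}}"
  }}
}}"""

inventory = {
    "web": ["web-01", "web-02"],
    "db": ["db-01"],
}

def render_configs(inventory, es_host="es.internal"):
    """Return a {server: logstash_output_config} mapping for every server."""
    configs = {}
    for group, servers in inventory.items():
        for server in servers:
            configs[server] = OUTPUT_TEMPLATE.format(es_host=es_host, index=group)
    return configs

configs = render_configs(inventory)
print(configs["web-01"])
```

Keeping the index name derived from the server group means each group's logs land in their own index series, which simplifies the retention and snapshot policies discussed later.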
Monitoring health of Elasticsearch
CoreStack provides templates to monitor the health of the infrastructure, the services, and the performance of the Elasticsearch cluster. The monitoring tracks the availability and utilization of the infrastructure, including compute, storage and network.
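The cluster-level part of such a health check can be sketched against Elasticsearch's standard cluster health API (GET /_cluster/health), whose response includes a green/yellow/red status field. The mapping from status to action below is an illustrative assumption, not CoreStack's actual alerting logic.

```python
import json

# Minimal sketch of a health probe: classify a /_cluster/health response.
# The endpoint and response fields are standard Elasticsearch; the actions
# returned here are hypothetical stand-ins for a monitoring template's hooks.
def classify_health(health):
    """Map a /_cluster/health response to a monitoring action."""
    status = health["status"]
    if status == "green":
        return "ok"
    if status == "yellow":
        return "warn: some replica shards are unassigned"
    return "alert: primary shards missing - trigger scaling/recovery"

# A response shape as returned by GET /_cluster/health:
sample = json.loads('{"cluster_name": "logs", "status": "yellow", '
                    '"number_of_nodes": 2, "unassigned_shards": 3}')
print(classify_health(sample))
```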
Autoscaling Elasticsearch Cluster
Scaling Elasticsearch up and down is essential to meet the dynamic demands of indexing. A scaling policy, defined based on the monitoring data, performs the scaling in the cluster.
When Elasticsearch reaches a high search load, that is, when the cluster status turns yellow or red, the monitoring alert triggers an action to scale the cluster by adding an extra node, giving the system extra compute power and memory.
The atomic scaling unit for an Elasticsearch index is the shard, so increasing the size of the Elasticsearch cluster means increasing the number of shards. However, the number of primary shards cannot be changed after an index is created. Therefore, a new node is added and the number of shard replicas is increased instead. This modification to the Elasticsearch cluster can be made with a curl call to the Elasticsearch REST API endpoint.
Adding an extra node to the Elasticsearch cluster immediately distributes searches across the extra compute and memory that the new node brings. Conversely, when demand is low, the number of replica shards in use can be reduced by decreasing the number of nodes.
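The REST call the text refers to is the index settings API: a PUT to /<index>/_settings with a new number_of_replicas. The sketch below just builds that request; the index name and replica count are illustrative.

```python
import json

# Sketch of the settings call described above: after a node joins the
# cluster, raise the replica count so the new node receives shard copies.
# Index name and replica count are illustrative.
def replica_update(index, replicas):
    """Build the (method, path, body) for PUT /<index>/_settings."""
    body = {"index": {"number_of_replicas": replicas}}
    return "PUT", f"/{index}/_settings", json.dumps(body)

method, path, body = replica_update("logstash-2024.01", 2)
# Equivalent curl call against the cluster:
#   curl -XPUT 'http://<es-host>:9200/logstash-2024.01/_settings' \
#        -H 'Content-Type: application/json' \
#        -d '{"index":{"number_of_replicas":2}}'
```

Because replicas (unlike primary shards) can be changed at any time, the same call with a lower count is how the cluster is shrunk when demand drops.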
Log data shipped using Logstash
The log data collected by Rsyslog is shipped using Logstash into the Elasticsearch instance running on the private cloud. There is a catch here, however: Rsyslog collects information from various sources, and this data arrives in various formats. Before it can be indexed, the data must be parsed, and for this the regex-based Grok plugin is used. Grok parses the data into JSON, which is then sent to Elasticsearch.
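What Grok does can be sketched directly in Python, since grok patterns compile down to named-group regexes. The pattern below is a simplified stand-in for grok's built-in syslog patterns, and the sample line is invented; it shows the raw-line-to-JSON step the paragraph describes.

```python
import re
import json

# Illustrative equivalent of the Grok step: a named-group regex turns a raw
# syslog line into a JSON document ready for Elasticsearch. This pattern is
# a simplified stand-in, not grok's actual SYSLOGLINE pattern.
PATTERN = re.compile(
    r"(?P<timestamp>\w{3}\s+\d+\s[\d:]+)\s"   # e.g. "Feb  3 12:04:01"
    r"(?P<host>\S+)\s"                        # originating host
    r"(?P<program>[\w\-/]+)(?:\[(?P<pid>\d+)\])?:\s"  # program[pid]:
    r"(?P<message>.*)"                        # free-text message
)

line = "Feb  3 12:04:01 web-01 sshd[2412]: Accepted publickey for deploy"
doc = PATTERN.match(line).groupdict()
print(json.dumps(doc))
```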
Log data is essential for debugging, tracking and compliance, but it must also be managed properly to keep the system from crossing its storage threshold. Source logs must be rotated, archived and purged to control storage on the servers. CoreStack provides scripts to configure log rotation policies for the various log sources across the system.
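A rotation policy of this kind is typically expressed as a logrotate configuration. The sketch below generates one stanza per log source; the paths, retention periods, and directives chosen are illustrative assumptions, not CoreStack's shipped defaults.

```python
# Sketch of generating a logrotate policy per log source. The sources,
# retention counts, and directive choices are illustrative stand-ins for
# what a configuration script would actually deploy.
POLICY = """{path} {{
    daily
    rotate {keep}
    compress
    missingok
    notifempty
}}
"""

def rotation_policy(sources):
    """sources: {log_path: rotations_to_keep} -> one logrotate config string."""
    return "".join(POLICY.format(path=p, keep=k) for p, k in sources.items())

conf = rotation_policy({"/var/log/syslog": 14, "/var/log/nginx/*.log": 7})
print(conf)
```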
Log Snapshots and Archival
Elasticsearch creates snapshots of indices and moves them to archival storage. This strategy reduces the footprint of the data that needs to be searched within Elasticsearch. Having fewer indices or smaller index sizes requires less compute and memory, thereby lowering the cost of the solution.
CoreStack’s ES template ensures that Elasticsearch takes such snapshots at regular intervals and archives them in cloud cold storage. This keeps the backup in a separate location and ensures that compliance-related rules are adhered to.
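Two Elasticsearch snapshot API calls are involved here: registering a repository backed by cold storage, and taking a snapshot into it. The request shapes below follow the standard snapshot API; the repository, bucket, and snapshot names are illustrative.

```python
# Sketch of the two snapshot calls: register an S3-backed repository, then
# snapshot the matching indices into it. Names are hypothetical; the body
# shapes follow the Elasticsearch snapshot API.
def register_s3_repo(repo, bucket):
    """Path and body for PUT /_snapshot/<repo> (S3 repository)."""
    return f"/_snapshot/{repo}", {"type": "s3", "settings": {"bucket": bucket}}

def take_snapshot(repo, snapshot, indices):
    """Path and body for PUT /_snapshot/<repo>/<snapshot>."""
    return (f"/_snapshot/{repo}/{snapshot}",
            {"indices": ",".join(indices), "include_global_state": False})

repo_path, repo_body = register_s3_repo("cold-archive", "log-archive-bucket")
snap_path, snap_body = take_snapshot("cold-archive", "logs-2024-01",
                                     ["logstash-2024.01.*"])
print(snap_path)
```

Scheduling this PUT at regular intervals, with the snapshot name derived from the date, gives the periodic archival behaviour described above.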
Retrieval and analysis of log data
For further analysis, organizations can retrieve the log data from object storage and apply analytical tools to gain additional insights from it.
Data value decreases over time, so a life cycle needs to be in place to manage storage better. CoreStack enables you to define a custom schedule for triggering such cycles. Once data is no longer required for processing, it should be purged to manage the storage space economically. Beyond economics, legal requirements may also necessitate purging data after a specific retention period.
Snapshots of Elasticsearch indices can be purged via a normal file delete, based on the requirements. Whether the storage is local or in the cloud, CoreStack’s Elasticsearch template can delete unnecessary data from storage on a regular basis. Coupled with its AWS plugin, CoreStack can also handle the cleanup of the corresponding index files from the S3 buckets.
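The selection step of such a retention sweep can be sketched as follows: pick the snapshots older than the retention window, then issue DELETE /_snapshot/&lt;repo&gt;/&lt;name&gt; for each. The snapshot names and the 90-day window are illustrative assumptions.

```python
from datetime import datetime, timedelta

# Sketch of a retention sweep: select snapshots older than the retention
# window for deletion (each would then get DELETE /_snapshot/<repo>/<name>).
# Snapshot names and the 90-day default are illustrative.
def expired(snapshots, now, keep_days=90):
    """snapshots: [(name, taken_at)] -> names due for deletion."""
    cutoff = now - timedelta(days=keep_days)
    return [name for name, taken in snapshots if taken < cutoff]

now = datetime(2024, 6, 1)
snaps = [("logs-2024-01", datetime(2024, 1, 31)),
         ("logs-2024-05", datetime(2024, 5, 31))]
print(expired(snaps, now))  # ['logs-2024-01']
```

Running this on the schedule mentioned above keeps both the snapshot repository and the underlying S3 bucket within the retention policy.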