09. 01. 2024 Matteo Cipolletta Unified Monitoring

Reassign Elasticsearch ILM Policy with Python

Index Lifecycle Management (ILM) policies constitute a fundamental component in Elasticsearch index management. They enable users to define the life stages of an index, determining when and how specific actions, such as transitioning from a “hot” to a “cold” state or deleting obsolete indices, should occur. ILM policies empower users to ensure the optimal distribution of resources and the effective maintenance of indices over time.

Elasticsearch provides the flexibility to define customized ILM policies, allowing developers and users to adapt index management to the specific needs of their system.

You can find more details in the Elasticsearch Guide [8.10], under “ILM: Manage the index lifecycle”.

The purpose of this blog post is to show the development of a Python script that simplifies and speeds up the bulk reassignment of ILM policies in Elasticsearch. The script aims to be a practical, automated tool for managing the ILM policies of indices that may have the wrong policy assigned.

Through the use of this script, users can easily address ILM policy optimization without manual intervention on each index, saving time and resources. Its implementation is particularly valuable in scenarios where modifications need to be applied to multiple indices simultaneously, or in situations requiring high scalability.

Dependencies

To address this use case, the Python script needs the following libraries:

  • json
  • re
  • requests
  • urllib3
  • argparse

and optionally:

  • logging
  • pprint (to format Elasticsearch error output)

The core idea

To effectively switch and reassign an index’s lifecycle policy, we need to follow these steps:

  1. Check if the ILM policy we want to assign exists on Elasticsearch
  2. Check if the rollover_alias and indexing_complete fields are present
    1. If necessary, assign a rollover_alias to indices that don’t have one
  3. Remove a currently existing ILM policy from the index we want to change
  4. Assign the new ILM policy to the index

To avoid leaving indices in an unknown state, it’s imperative that we perform error handling and save the skipped indices (with problems) for later investigation.
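
The steps and the skip-on-failure rule above can be sketched as a small driver loop. This is only an illustrative outline: `reassign_policy`, `StepFailed`, and the `steps` callables are hypothetical names that do not appear in the script, and each callable stands in for one of the HTTP requests shown in the following sections.

```python
import logging


class StepFailed(Exception):
    """Raised by a step when Elasticsearch reports a problem (hypothetical helper)."""


def reassign_policy(indexes, steps):
    """Run every step on every index and return the indices that were skipped.

    `steps` is an ordered list of callables that take an index name. Each one
    stands in for a request (fetch settings, remove policy, assign policy)
    and raises StepFailed when that request does not succeed.
    """
    skipped_indices = []
    for index in indexes:
        try:
            for step in steps:
                step(index)
        except StepFailed as error:
            # Stop working on this index and remember it for later investigation
            logging.warning("Skipping index %s: %s", index, error)
            skipped_indices.append(index)
    return skipped_indices


def fake_remove(index):
    # Simulated failure for one index, only to show the skip behaviour
    if index == "broken-2024.01.09":
        raise StepFailed("remove policy failed")


print(reassign_policy(["logs-2024.01.08", "broken-2024.01.09"], [fake_remove]))
# ['broken-2024.01.09']
```

Stopping at the first failed step means an index is never left with its old policy removed and a later step attempted on top of an error.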

The steps

Be sure to use the admin certificate for authentication, as it has the necessary privileges to modify indices and assign ILM policies:

headers = {"Content-Type": "application/json"}
cert_path = "/neteye/local/elasticsearch/conf/certs/"
cert_file = "admin.crt.pem"
key_file = "private/admin.key.pem"

Then parse the arguments with argparse:

import argparse

# Arguments definition
parser = argparse.ArgumentParser(description="This script is intended to reassign ILM Policy to the indexes on Elasticsearch")
parser.add_argument('-v', '--verbose', help="enable verbose mode", action='store_true')
parser.add_argument("-p", "--policy", dest="ILMPolicy", type=str, required=True, metavar="my_policy", help="The ILM Policy that will be assigned to the matching indexes")
parser.add_argument("-i", "--indexes", dest="Indexes", nargs="+", required=True, metavar="myindex", help="The list of indexes separated by a blank space. Example: --indexes myindex1 myindex2")
# Parse the command line before reading any argument
args = parser.parse_args()
policy = args.ILMPolicy
indexes = args.Indexes
skipped_indices = []             # List of skipped indices
URL = "https://elasticsearch.neteyelocal:9200"
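
To see how these options behave, the parser can be exercised with an explicit argument list instead of the real command line. This is a minimal sketch that assumes the same option names as above; `build_parser` is a hypothetical wrapper, and the policy and index names are made up.

```python
import argparse


def build_parser():
    """Build a parser with the same options as the script (hypothetical wrapper)."""
    parser = argparse.ArgumentParser(description="Reassign an ILM policy to Elasticsearch indices")
    parser.add_argument("-v", "--verbose", action="store_true", help="enable verbose mode")
    parser.add_argument("-p", "--policy", dest="ILMPolicy", type=str, required=True, metavar="my_policy")
    parser.add_argument("-i", "--indexes", dest="Indexes", nargs="+", required=True, metavar="myindex")
    return parser


# Passing a list to parse_args() avoids reading sys.argv, which is handy for testing
args = build_parser().parse_args(["-p", "my_policy", "-i", "myindex1", "myindex2"])
print(args.ILMPolicy)  # my_policy
print(args.Indexes)    # ['myindex1', 'myindex2']
print(args.verbose)    # False
```

Because both `--policy` and `--indexes` are required, argparse exits with a usage message when either one is missing.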

Check if the ILM policy that we want to assign exists in Elasticsearch:

## Check if ILM Policy exists
http_response = requests.get(f"{URL}/_ilm/policy/{policy}",
                             headers = headers,
                             cert = (cert_path + cert_file, cert_path + key_file),
                             verify = False)
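
The request above only fetches the policy; the script still has to decide what the answer means. Here is a minimal sketch, assuming that Elasticsearch answers 200 when the policy exists and 404 when it does not. `policy_exists` is a hypothetical helper that would receive `http_response.status_code`.

```python
def policy_exists(status_code):
    """Interpret the status code of GET /_ilm/policy/<policy> (hypothetical helper).

    Assumption: 200 means the policy was found and 404 means it was not.
    Anything else (401, 403, 5xx, ...) is treated as an error, not as "missing".
    """
    if status_code == 200:
        return True
    if status_code == 404:
        return False
    raise RuntimeError(f"Unexpected status code {status_code} while checking the ILM policy")


# In the script this would be called as policy_exists(http_response.status_code),
# exiting early when it returns False so that no index is touched.
print(policy_exists(200))  # True
print(policy_exists(404))  # False
```

Treating other status codes as errors keeps an authentication problem from being mistaken for a missing policy.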

Fetch the rollover_alias and indexing_complete fields, and prepare the payload:

for index in indexes:
    # We need to preserve rollover_alias and indexing_complete, so we query Elasticsearch for the current settings first
    response = requests.get(f"{URL}/{index}/_settings",
                            headers = headers,
                            cert = (cert_path + cert_file, cert_path + key_file),
                            verify = False)
    if response.status_code == 200:
        # Some lifecycle settings can be missing, so read them defensively
        lifecycle = response.json()[index]["settings"]["index"].get("lifecycle", {})
        rollover_alias = lifecycle.get("rollover_alias")
        if not rollover_alias:
            # If rollover alias is not found, we need to assign a new one (everything before the date in the index name is the rollover_alias)
            match = re.match(r"^(.+)-\d{4}\.\d{2}\.\d{2}.*", index)
            # Index names without a date cannot provide an alias
            rollover_alias = match.group(1) if match else None
        indexing_complete = lifecycle.get("indexing_complete")
        if not rollover_alias:
            logging.warning(f"rollover_alias for index: {index} is not set, skipping since this can cause issues with ILM rollover")
            skipped_indices.append(index)
        else:
            payload = {
                "index.lifecycle.name": policy,
                "index.lifecycle.rollover_alias": rollover_alias
            }
            # Only carry indexing_complete over when the index already had it
            if indexing_complete:
                payload["index.lifecycle.indexing_complete"] = indexing_complete
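
The alias fallback and the payload rules are easier to check when they are isolated from the HTTP call. The sketch below uses the same regular expression as the script; `derive_rollover_alias` and `build_payload` are hypothetical helper names, and the index names are invented for the example.

```python
import re

# Same pattern as in the script: capture everything before "-YYYY.MM.DD"
DATE_SUFFIX = re.compile(r"^(.+)-\d{4}\.\d{2}\.\d{2}.*")


def derive_rollover_alias(index):
    """Return everything before the date in the index name, or None if there is no date."""
    match = DATE_SUFFIX.match(index)
    return match.group(1) if match else None


def build_payload(policy, index, lifecycle):
    """Build the settings payload, or return None when no rollover alias is available.

    `lifecycle` is the "lifecycle" dict taken from the index settings (possibly empty).
    """
    rollover_alias = lifecycle.get("rollover_alias") or derive_rollover_alias(index)
    if not rollover_alias:
        return None
    payload = {
        "index.lifecycle.name": policy,
        "index.lifecycle.rollover_alias": rollover_alias,
    }
    # Preserve indexing_complete only when the index already carries it
    if lifecycle.get("indexing_complete"):
        payload["index.lifecycle.indexing_complete"] = lifecycle["indexing_complete"]
    return payload


print(derive_rollover_alias("logstash-2024.01.09-000001"))  # logstash
print(derive_rollover_alias("my-index"))                    # None
print(build_payload("my_policy", "logstash-2024.01.09", {"indexing_complete": "true"}))
```

An index whose name contains no date yields no alias, so `build_payload` returns None and the caller can add that index to `skipped_indices`.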

Once we’ve prepared the payload for the new ILM policy request, we need to remove the old ILM policy from the individual indices:

response = requests.post(f"{URL}/{index}/_ilm/remove",
                         headers = headers,
                         cert = (cert_path + cert_file,cert_path + key_file),
                         verify = False)
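
Before moving on to the assignment, it helps to confirm that the removal really worked. This is a minimal sketch, assuming that a successful call returns HTTP 200 with a JSON body containing `"has_failures": false`. `ilm_removed` is a hypothetical helper that would receive `response.status_code` and `response.json()`.

```python
def ilm_removed(status_code, body):
    """Decide whether POST /<index>/_ilm/remove worked (hypothetical helper).

    Assumption: success means HTTP 200 and a body with "has_failures": false.
    A missing "has_failures" key is treated as a failure, to stay on the safe side.
    """
    return status_code == 200 and not body.get("has_failures", True)


print(ilm_removed(200, {"has_failures": False, "failed_indexes": []}))          # True
print(ilm_removed(200, {"has_failures": True, "failed_indexes": ["myindex"]}))  # False
print(ilm_removed(404, {"error": "index_not_found_exception"}))                 # False
```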

Then, if removal is successful, we can perform the assignment of the new ILM policy to the index:

response = requests.put(f"{URL}/{index}/_settings/",
                        data = json.dumps(payload),
                        headers = headers,
                        cert = (cert_path + cert_file, cert_path + key_file),
                        verify = False)

And there you have it, a script that can reassign the ILM policy in bulk, to multiple indices.

Some Notes

To keep the focus on the core functionality of the script, I’ve omitted the handling of HTTP response codes and most of the try/except error handling. Both are essential in a production-ready script; leaving them out here just keeps the code easier to read.

In a production environment, I highly recommend interpreting the HTTP response code of every request and wrapping the calls in try/except blocks, so that the script handles unexpected issues gracefully and gives meaningful feedback when something goes wrong. I assume you’ll adapt and extend the script with error management tailored to your own deployment environment and use cases.

For this reason, be sure to check response.status_code after each request, and add logging to produce more useful output. You may also want to skip the indices for which Elasticsearch returns an error and add them to the skipped_indices list with:

skipped_indices.append(index)

And then at the end of the script you can print the skipped indices so you can investigate any problems later:

# Printing the skipped indices
if skipped_indices:
    print("Skipped indices:")
    print(skipped_indices)
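
The status check, the logging, and the skip list can be combined into one small function. The sketch below is one possible shape rather than the script's actual code: `request_succeeded` is a hypothetical helper, the index names and response bodies are invented, and pprint is used only to make the Elasticsearch error body readable.

```python
import logging
import pprint

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")


def request_succeeded(index, status_code, body, skipped_indices):
    """Log the outcome of one request and record the index when it failed.

    Hypothetical helper: `status_code` and `body` would come from
    response.status_code and response.json() in the script.
    """
    if status_code == 200:
        logging.info("Request for index %s succeeded", index)
        return True
    # pprint.pformat makes nested Elasticsearch error bodies readable in the log
    logging.error("Request for index %s failed with HTTP %s:\n%s",
                  index, status_code, pprint.pformat(body))
    skipped_indices.append(index)
    return False


skipped_indices = []
request_succeeded("logs-2024.01.08", 200, {"acknowledged": True}, skipped_indices)
request_succeeded("logs-2024.01.09", 404,
                  {"error": {"type": "index_not_found_exception"}, "status": 404},
                  skipped_indices)
print(skipped_indices)  # ['logs-2024.01.09']
```

Calling it after each request and moving on to the next index when it returns False keeps a failing index from reaching the following step.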

Matteo Cipolletta

I'm an IT professional with a strong knowledge of Security Information and Event Management solutions. I have proven experience in multiple Enterprise contexts with managing, designing, and administering Security Information and Event Management (SIEM) solutions (including log source management, parsing, alerting and data visualizations), its related processes and on-premises and cloud architectures, as well as implementing Use Cases and Correlation Rules to enable SOC teams to detect and respond to cyber threats.
