This is a guest post co-written with Mike Mosher, Sr. Principal Cloud Platform Network Architect at a multi-national financial credit reporting company.
I work for a multi-national financial credit reporting company that provides credit risk, fraud, targeted marketing, and automated decisioning solutions. We’re an AWS early adopter and have embraced the cloud to drive digital transformation efforts. Our Cloud Center of Excellence (CCoE) team operates a global AWS Landing Zone, which includes a centralized AWS network infrastructure. We’re also an AWS PrivateLink Ready Partner and offer our E-Connect solution to allow our B2B customers to connect to a variety of products through private, secure, and performant connectivity.
Our E-Connect solution is a platform composed of several AWS services, including Application Load Balancer (ALB), Network Load Balancer (NLB), Gateway Load Balancer (GWLB), AWS Transit Gateway, AWS PrivateLink, AWS WAF, and third-party security appliances. All of these services and resources, along with the large amount of network traffic across the platform, generate a large volume of logs, and we needed a solution to aggregate and organize these logs for quick analysis by our operations teams when troubleshooting the platform.
Our original design consisted of Amazon OpenSearch Service, chosen for its ability to return specific log entries from extensive datasets in seconds. We complemented it with Logstash, which let us apply multiple filters to enrich and augment the data before sending it to the OpenSearch cluster, enabling a more comprehensive and insightful monitoring experience.
In this post, we share our journey, including the hurdles we faced, the alternatives we considered, and why we chose Amazon OpenSearch Ingestion pipelines to make our log management smoother.
Overview of the initial solution
We initially wanted to store and analyze the logs in an OpenSearch cluster, and decided to use the AWS-managed service for OpenSearch, Amazon OpenSearch Service. We also wanted to enrich these logs with Logstash, but because there was no AWS-managed service for Logstash, we needed to deploy the application on an Amazon Elastic Compute Cloud (Amazon EC2) server. This setup required significant server maintenance, including using AWS CodePipeline and AWS CodeDeploy to push new Logstash configurations to the server and restart the service. We also needed to handle tasks such as patching and updating the operating system (OS) and the Logstash application, and monitor server resources such as Java heap, CPU, memory, and storage.
The complexity extended to validating the network path from the Logstash server to the OpenSearch cluster, including checks on Access Control Lists (ACLs) and security groups, as well as routes in the VPC subnets. Scaling beyond a single EC2 server introduced considerations for managing an auto scaling group, Amazon Simple Queue Service (Amazon SQS) queues, and more. Maintaining the continuous functioning of our solution became a significant effort, diverting focus from the core tasks of operating and monitoring the platform.
The following diagram illustrates our initial architecture.

Possible options for us
Our team looked at several options to manage the logs from this platform. We already have a Splunk solution for storing and analyzing logs, and we evaluated it as a potential alternative to OpenSearch Service. However, we decided against it for several reasons:
- Our team is more familiar with OpenSearch Service and Logstash than with Splunk.
- Amazon OpenSearch Service, as a managed service in AWS, enables a smoother log transfer process than our on-premises Splunk solution. Transporting logs to the on-premises Splunk cluster would also incur high costs, consume bandwidth on our AWS Direct Connect connections, and introduce unnecessary complexity.
- Splunk’s pricing structure, based on storage in GBs, proved cost-prohibitive for the volume of logs we intended to store and analyze.
Initial design for an OpenSearch Ingestion pipeline solution
The AWS team approached us about a new feature they were launching: Amazon OpenSearch Ingestion. This feature offered a compelling answer to the problems we were facing with managing EC2 instances for Logstash. First, it removed the heavy lifting of managing multiple EC2 instances, scaling the servers up and down based on traffic, and monitoring both log ingestion and the resources of the underlying servers. Second, Amazon OpenSearch Ingestion pipelines supported most, if not all, of the Logstash filters we were using, which allowed us to keep the same log enrichment functionality as our existing solution.
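To give a concrete sense of what this enrichment involves, the following minimal Python sketch (illustrative only; the field names and the derived flag are hypothetical, not our production configuration) mimics the kind of per-record transformation that a Logstash filter chain, or an equivalent OpenSearch Ingestion processor, applies to a VPC flow log record:

```python
# Illustrative sketch of per-record log enrichment, similar in spirit to a
# Logstash dissect + mutate chain. Field names follow the default VPC flow
# log format; the derived field is a hypothetical example.

FLOW_LOG_FIELDS = [
    "version", "account_id", "interface_id", "srcaddr", "dstaddr",
    "srcport", "dstport", "protocol", "packets", "bytes",
    "start", "end", "action", "log_status",
]

def enrich(line: str) -> dict:
    """Split a space-delimited flow log line into named fields and add a
    derived field for easier filtering in dashboards."""
    record = dict(zip(FLOW_LOG_FIELDS, line.split()))
    # Derived field: flag rejected traffic so dashboards can filter on it
    record["is_rejected"] = record.get("action") == "REJECT"
    return record

sample = ("2 123456789012 eni-0abc 10.0.0.5 10.0.1.9 443 49152 "
          "6 10 8400 1600000000 1600000060 REJECT OK")
print(enrich(sample)["is_rejected"])  # → True
```

In a managed pipeline, the same effect is achieved declaratively with processors in the pipeline configuration rather than custom code.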
We were thrilled to be accepted into the AWS beta program, emerging as one of its earliest and largest adopters. Our journey began with ingesting VPC flow logs for our internet ingress platform, along with flow logs from the Transit Gateway connecting all VPCs in the AWS Region. Handling such a substantial volume of logs proved to be a significant task, with Transit Gateway flow logs alone reaching upwards of 14 TB per day. As we expanded our scope to include other logs, such as ALB and NLB access logs and AWS WAF logs, the scale of the solution translated into higher costs.
However, our enthusiasm was somewhat dampened by the challenges we faced initially. Despite our best efforts, we encountered performance issues with the domain. Through collaborative efforts with the AWS team, we uncovered misconfigurations in our setup: we had been using instances that were inadequately sized for the volume of data we were handling. As a result, those instances were constantly running at maximum CPU capacity, creating a backlog of incoming logs. This bottleneck cascaded into our OpenSearch Ingestion pipelines, forcing them to scale up unnecessarily, even as the OpenSearch cluster struggled to keep pace.
These challenges led to suboptimal performance from our cluster. We found ourselves unable to analyze flow logs or access logs promptly, often waiting days after their creation. Moreover, the costs associated with these inefficiencies far exceeded our initial expectations.
Nevertheless, with the support of the AWS team, we successfully addressed these issues, optimizing our setup for improved performance and cost-efficiency. This experience underscored the importance of correct configuration and collaboration in maximizing the potential of AWS services, ultimately leading to a more positive outcome for our data ingestion processes.
Optimized design for our OpenSearch Ingestion pipeline solution
We collaborated with AWS to enhance our overall solution, building one that is high performing, cost-effective, and aligned with our monitoring requirements. The solution involves selectively ingesting specific log fields into the OpenSearch Service domain by using Amazon S3 Select in the pipeline source; alternatively, selective ingestion can be performed by filtering within the pipelines themselves. You can use include_keys and exclude_keys in your sink to filter the data that is routed to the destination. We also used the built-in Index State Management feature to remove logs older than a predefined period, reducing the overall cost of the cluster.
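As a sketch of how these pieces fit together, the following OpenSearch Ingestion pipeline configuration shows an S3 source using S3 Select to project only the needed columns, with include_keys on the OpenSearch sink as the alternative filtering point. All names here (bucket notification queue, role ARN, domain endpoint, index, and field names) are placeholders, and the exact option names should be verified against the current OpenSearch Ingestion documentation:

```yaml
version: "2"
flow-log-pipeline:
  source:
    s3:
      # The pipeline learns about new log objects from an SQS queue (placeholder URL)
      notification_type: "sqs"
      sqs:
        queue_url: "https://sqs.us-east-1.amazonaws.com/111122223333/flow-log-queue"
      compression: "gzip"
      # S3 Select projects only the columns we care about before ingestion
      s3_select:
        expression: "SELECT s._4 AS srcaddr, s._5 AS dstaddr, s._13 AS action FROM S3Object s"
        input_serialization: "csv"
      aws:
        region: "us-east-1"
        sts_role_arn: "arn:aws:iam::111122223333:role/pipeline-role"
  sink:
    - opensearch:
        hosts: ["https://search-example-domain.us-east-1.es.amazonaws.com"]
        index: "vpc-flow-logs-%{yyyy.MM.dd}"
        # Alternative to S3 Select: filter fields at the sink instead
        include_keys: ["srcaddr", "dstaddr", "action"]
        aws:
          region: "us-east-1"
          sts_role_arn: "arn:aws:iam::111122223333:role/pipeline-role"
```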
The ingested logs in OpenSearch Service empower us to derive aggregate data, providing insights into trends and issues across the entire platform. For more detailed analysis of these logs, including all original log fields, we use Amazon Athena tables with partitioning to quickly and cost-effectively query Amazon Simple Storage Service (Amazon S3) for logs stored in Parquet format.
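On the Athena side, a partitioned table over the Parquet logs might look like the following sketch (the bucket, table, partition key, and field names are hypothetical, and real VPC flow log tables carry more columns):

```sql
-- Hypothetical partitioned table over VPC flow logs stored as Parquet
CREATE EXTERNAL TABLE IF NOT EXISTS vpc_flow_logs (
  srcaddr string,
  dstaddr string,
  srcport int,
  dstport int,
  action  string,
  bytes   bigint
)
PARTITIONED BY (`date` string)
STORED AS PARQUET
LOCATION 's3://example-log-bucket/vpc-flow-logs/';

-- Restricting the query to one partition keeps the scan small and cheap
SELECT srcaddr, dstaddr, SUM(bytes) AS total_bytes
FROM vpc_flow_logs
WHERE `date` = '2024-01-15'
  AND action = 'REJECT'
GROUP BY srcaddr, dstaddr
ORDER BY total_bytes DESC
LIMIT 20;
```

Because Athena charges by data scanned, the combination of Parquet (columnar) storage and partition pruning is what keeps these ad hoc deep dives inexpensive.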
This comprehensive solution significantly enhances our platform visibility, reduces the overall cost of monitoring a large log volume, and shortens the time to identify root causes when troubleshooting platform incidents.
The following diagram illustrates our optimized architecture.

Performance comparison
The following table compares the performance of the initial design with Logstash on Amazon EC2, the original OpenSearch Ingestion pipeline solution, and the optimized OpenSearch Ingestion pipeline solution.
| | Initial Design with Logstash on Amazon EC2 | Original Ingestion Pipeline Solution | Optimized Ingestion Pipeline Solution |
| --- | --- | --- | --- |
| Maintenance Effort | High: The solution required the team to manage multiple services and instances, taking effort away from managing and monitoring our platform. | Low: OpenSearch Ingestion handled most of the undifferentiated heavy lifting, leaving the team to maintain only the ingestion pipeline configuration file. | Low: OpenSearch Ingestion handled most of the undifferentiated heavy lifting, leaving the team to maintain only the ingestion pipeline configuration file. |
| Performance | High: EC2 instances with Logstash could scale up and down as needed in the auto scaling group. | Low: Due to insufficient resources on the OpenSearch cluster, the ingestion pipelines were constantly at the maximum number of OpenSearch Compute Units (OCUs), delaying log delivery by several days. | High: Ingestion pipelines can scale OCUs up and down as needed. |
| Real-time Log Availability | Medium: To pull, process, and deliver the large number of logs in Amazon S3, we needed a large number of EC2 instances. To save on cost, we ran fewer instances, which led to slower log delivery to OpenSearch. | Low: Due to insufficient resources on the OpenSearch cluster, the ingestion pipelines were constantly at max OCUs, delaying log delivery by several days. | High: The optimized solution was able to deliver a large number of logs to OpenSearch to be analyzed in near real time. |
| Cost Saving | Medium: Running multiple services and instances to deliver logs to OpenSearch increased the cost of the overall solution. | Low: Due to insufficient resources on the OpenSearch cluster, the ingestion pipelines were constantly at max OCUs, increasing the cost of the service. | High: The optimized solution was able to scale the ingestion pipeline OCUs up and down as needed, which kept the overall cost low. |
| Overall Benefit | Medium | Low | High |
Conclusion
In this post, we highlighted my journey to build a solution using OpenSearch Service and OpenSearch Ingestion pipelines. This solution allows us to focus on analyzing logs and supporting our platform, without needing to support the infrastructure that delivers logs to OpenSearch. We also highlighted the need to optimize the service in order to increase performance and reduce cost.
As our next steps, we aim to explore the recently announced Amazon OpenSearch Service zero-ETL integration with Amazon S3 (in preview). This step is intended to further reduce the solution’s costs and provide flexibility in the timing and number of logs that are ingested.
About the Authors
Navnit Shukla serves as an AWS Specialist Solutions Architect with a focus on analytics. He has a strong enthusiasm for helping customers discover valuable insights from their data, and he builds innovative solutions that empower businesses to make informed, data-driven decisions. Navnit is also the author of the book “Data Wrangling on AWS.” He can be reached via LinkedIn.
Mike Mosher is a Senior Principal Cloud Platform Network Architect at a multi-national financial credit reporting company. He has more than 16 years of experience in on-premises and cloud networking and is passionate about building new architectures in the cloud that serve customers and solve problems. Outside of work, he enjoys time with his family and traveling back home to the mountains of Colorado.
