An Early Look at Amazon Security Lake: Centralizing and Analyzing Security Data

  • Writer: Nithin Janardhanan
  • Apr 16, 2023
  • 7 min read

Amazon Security Lake is a scalable, cost-effective service for centralizing and analyzing security data. It enables organizations to store, transform, and analyze data from various AWS services, which makes security information and event management (SIEM) workflows easier to run. In this post, we'll explore the core features of Amazon Security Lake, the supported AWS services, retention management, and integration with AWS Security Hub and AWS Organizations. We'll also discuss how to use Amazon Security Lake to reduce data ingestion costs and improve your overall security posture.


Challenges with AWS Security Hub

AWS Security Hub offers valuable insights and a centralized overview of your organization's security posture. However, it has limitations that can reduce its effectiveness in specific scenarios. For example, a shorter dashboard update interval would give closer to real-time information during an incident. Security engineers might also run into data format inconsistencies between vendors.

AWS supports third-party datasets for strengthening the security of your cloud accounts, and these are conveniently integrated into the Security Hub security score (e.g., through GuardDuty). However, customers with specific datasets or a need for near real-time detection might find Security Hub's capabilities insufficient.


The Need for Data Standardization and Automation in Cybersecurity

As AI plays an increasingly important role in automating tasks, standardizing and automating data in cybersecurity becomes even more critical. Currently, many cybersecurity teams and analysts still manually clean and transform data, leading to fatigue, burnout, and reduced productivity. While automating the remediation of a Security Hub finding can be straightforward, dealing with complex attack scenarios requires a unique approach that may not be predictable. Incorporating AI-based tools can be a game-changer in these scenarios.


Amazon Security Lake Features and Functionality

Security Lake runs extract, transform, and load (ETL) jobs on raw source data, converting the data to Apache Parquet format and the OCSF schema. After processing, Security Lake stores source data in an Amazon Simple Storage Service (Amazon S3) bucket in your AWS account in the AWS Region that the data was generated in. Security Lake creates a different Amazon S3 bucket for each Region in which you enable the service. Each source gets a separate prefix in your S3 bucket, and Security Lake organizes data from each source in a separate set of AWS Lake Formation tables.
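To make the storage layout more concrete, here is a minimal Python sketch of how a per-source, per-Region object prefix could be composed. The function name, the `aws/<source>/region=.../accountId=.../eventDay=...` layout, and the sample values are assumptions for illustration only; check the Security Lake documentation for the layout the service actually writes.

```python
from datetime import date


def source_prefix(source: str, region: str, account_id: str, day: date) -> str:
    """Compose a Hive-style partitioned prefix for one source (illustrative layout)."""
    if not (account_id.isdigit() and len(account_id) == 12):
        raise ValueError("AWS account IDs are 12 digits")
    return (
        f"aws/{source}/region={region}/"
        f"accountId={account_id}/eventDay={day:%Y%m%d}/"
    )


# Hypothetical source name, Region, and account ID.
print(source_prefix("VPC_FLOW", "us-east-1", "111122223333", date(2023, 4, 16)))
```

Partitioning by Region, account, and day in this style lets a query engine skip objects that fall outside the requested account or time range.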


Utilizing Lake Formation for Enhanced Security and Data Management

AWS Lake Formation is a key component of Amazon Security Lake, providing a centralized platform for managing and securing your data lake. By integrating Lake Formation, Security Lake can offer the following benefits:

  • Fine-grained access control: Lake Formation allows you to define granular permissions for accessing your data, ensuring that only authorized users and applications can access sensitive information.

  • Data Cataloging: Lake Formation automatically catalogs your data, making it easy to discover, understand, and query your data across multiple AWS services.

  • Data Transformation: Security Lake's ETL jobs convert raw data into the queryable Apache Parquet format and the OCSF schema, and the results are organized as Lake Formation tables. This simplifies data management and reduces the need for manual data wrangling.

  • Multi-tenancy isolation: Lake Formation enables multi-tenancy isolation, allowing you to securely manage data from multiple accounts and sources within a single data lake.

Supported AWS Services

Amazon Security Lake can collect logs and events from the following natively-supported AWS services:

  • AWS CloudTrail management and data events (S3, Lambda)

  • Amazon Route 53 Resolver query logs

  • AWS Security Hub findings

  • Amazon Virtual Private Cloud (Amazon VPC) Flow Logs

Retention Management

To manage your data cost-effectively, you can specify retention settings. Since Security Lake stores your data as objects in Amazon S3 buckets, the retention settings correspond to an Amazon S3 Lifecycle configuration, allowing you to specify your preferred storage class and the time period that objects will stay in that storage class before transitioning to a different storage class.

In Security Lake, you set retention settings at the Region level. For example, you might choose to transition all objects in a specific Region to the S3 Standard-IA storage class 30 days after they're written to the data lake. The default Amazon S3 storage class is S3 Standard.
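As a rough illustration of what such a retention setting corresponds to, the sketch below builds a dictionary in the shape of an S3 Lifecycle configuration. Security Lake manages the lifecycle for you, so this is not something you would normally apply by hand. The rule ID and the 365-day expiration are assumptions added for the example; only the 30-day transition to Standard-IA comes from the text above.

```python
import json


def lifecycle_config(transition_days: int, storage_class: str, expire_days: int) -> dict:
    """Build an S3 Lifecycle configuration with one transition and one expiration."""
    if expire_days <= transition_days:
        raise ValueError("objects must expire after they transition")
    return {
        "Rules": [
            {
                "ID": "security-lake-retention-sketch",  # hypothetical rule name
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # empty prefix: apply to every object
                "Transitions": [
                    {"Days": transition_days, "StorageClass": storage_class}
                ],
                "Expiration": {"Days": expire_days},  # assumed retention period
            }
        ]
    }


config = lifecycle_config(30, "STANDARD_IA", 365)
print(json.dumps(config, indent=2))
```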


Integration with AWS Security Hub

AWS Security Hub provides a comprehensive view of your security state in AWS and helps you check your environment against security industry standards and best practices. Security Hub collects security data from across AWS accounts, services, and supported third-party partner products, enabling you to analyze your security trends and identify the highest priority security issues.

When you integrate Amazon Security Lake and Security Hub, you receive Security Hub findings in Security Lake. Security Hub findings become a source that Security Lake subscribers can consume, helping you analyze your security posture. When you enable Security Hub, it begins to consume, aggregate, organize, and prioritize findings from AWS services that you have enabled, such as Amazon GuardDuty, Amazon Inspector, and Amazon Macie.


Combining Security Hub and Security Lake

Security Hub and Security Lake are complementary services designed to strengthen your organization's security posture. While Security Hub focuses on compliance and snapshotting the overall security posture, Security Lake is centered around data management through its CONA approach (Centralize, Optimize, Normalize, Analyze).

A key difference between the two services is their data format: Security Hub uses the proprietary ASFF (AWS Security Finding Format), while Security Lake adopts the open-source OCSF. Both services can integrate with third-party providers and relieve your development team of building custom integrations with individual AWS services such as Macie, GuardDuty, or Inspector.
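To give a feel for the format difference, here is a small sketch that maps a few ASFF fields onto an OCSF-like record. Security Lake performs this conversion for you. The function, the reduced field set, and the sample finding are assumptions for illustration, not the mapping the service actually applies.

```python
# ASFF severity labels mapped to OCSF severity_id values (0 is "Unknown").
ASFF_TO_OCSF_SEVERITY = {
    "INFORMATIONAL": 1,
    "LOW": 2,
    "MEDIUM": 3,
    "HIGH": 4,
    "CRITICAL": 5,
}


def asff_to_ocsf_like(finding: dict) -> dict:
    """Map a handful of ASFF fields to a simplified, OCSF-like dictionary."""
    label = finding["Severity"]["Label"]
    return {
        "severity": label.capitalize(),
        "severity_id": ASFF_TO_OCSF_SEVERITY.get(label, 0),
        "finding": {"uid": finding["Id"], "title": finding["Title"]},
        "cloud": {"account_uid": finding["AwsAccountId"]},
    }


# Hypothetical finding, trimmed to the fields used above.
sample_finding = {
    "Id": "example-finding-id",
    "Title": "S3 bucket allows public read access",
    "AwsAccountId": "111122223333",
    "Severity": {"Label": "HIGH"},
}
print(asff_to_ocsf_like(sample_finding))
```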

Using Security Hub and Security Lake together creates a powerful combination that bolsters your security defenses. Security Hub can be seen as "Present Posture," focusing on general principles, guidelines, and security hygiene. In contrast, Security Lake represents "Future Posture," providing a foundation for predictive automations that leverage clean and homogenized data.


Multi-Account Support with AWS Organizations

You can use Amazon Security Lake to collect security logs and events from multiple AWS accounts. To help automate and streamline the management of multiple accounts, we strongly recommend that you integrate Security Lake with AWS Organizations.


Cost Benefits of Amazon Security Lake Compared to Splunk

One of the main advantages of using Amazon Security Lake is the potential for cost savings compared to traditional SIEM solutions like Splunk. Splunk is a powerful platform for log management and analysis, but it can be expensive, especially when ingesting large amounts of data. In contrast, Security Lake offers a more cost-effective solution that scales with your organization's needs.

The cost benefits of using Amazon Security Lake compared to Splunk are as follows:

  • Data Storage: Security Lake stores data in Amazon S3, which offers lower storage costs compared to Splunk's proprietary storage solution. Additionally, Security Lake uses the Apache Parquet format and the open-source OCSF schema, which allows for better data compression and reduced storage costs.

  • Data Ingestion: Splunk charges based on the volume of data ingested, which can be expensive for organizations with large amounts of data. With Security Lake, you can use AWS services like Amazon Kinesis Data Firehose, Amazon SQS, or Amazon S3 to ingest data at a lower cost.

  • Flexibility: Security Lake allows you to utilize AWS services like Amazon Athena and Amazon SageMaker for analytics and machine learning, potentially reducing the need to ingest data into Splunk for these purposes. This provides flexibility in terms of data usage and analysis while reducing costs associated with data ingestion in Splunk.

  • Retention Management: Security Lake lets you set retention policies per Region, so you can transition data to more cost-effective Amazon S3 storage classes on a schedule that fits your organization's needs, further reducing storage costs.

Querying Data in Amazon Security Lake

To query data in Amazon Security Lake, the Lake Formation data lake administrator must grant SELECT permissions on the relevant databases and tables to the IAM role that queries the data. A subscriber must also be created in Security Lake before the data can be queried. Once the permissions are granted and the subscriber exists, users can query the data with Amazon Athena, which is integrated with Security Lake.
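The permission grant described above can be sketched as a request payload. The helper below only builds the keyword arguments in the shape of the Lake Formation GrantPermissions API; the role ARN, database name, and table name are hypothetical placeholders, and the actual names in your account will differ.

```python
def select_grant_request(role_arn: str, database: str, table: str) -> dict:
    """Build keyword arguments for a Lake Formation GrantPermissions call."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": role_arn},
        "Resource": {"Table": {"DatabaseName": database, "Name": table}},
        "Permissions": ["SELECT"],
    }


request = select_grant_request(
    "arn:aws:iam::111122223333:role/SecurityAnalystRole",  # hypothetical role
    "example_security_lake_db",  # hypothetical database name
    "example_vpc_flow_table",  # hypothetical table name
)
print(request)

# With boto3 installed and credentials configured, a data lake administrator
# could submit the request along these lines:
#   boto3.client("lakeformation").grant_permissions(**request)
```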

Here are some examples of the types of data that can be queried in Amazon Security Lake and the insights they can provide:

  • CloudTrail data: Unauthorized attempts against AWS services, all IAM activity in specific accounts during a specific time range, instances where a certain credential was used

  • Route 53 Resolver query logs: A list of DNS queries and source information for a specific time range, a list of DNS queries that didn't resolve, a list of DNS queries that resolved to a certain IP address

  • Security Hub findings: All new findings with a severity of MEDIUM or higher, all non-informational findings, all findings where the resources are Amazon S3 buckets

  • Amazon VPC Flow Logs: Traffic originating from a specific IP and port, count of distinct destination IP addresses, traffic originating from a specific network range
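As one example from the list above, the sketch below builds an Athena SQL string that counts distinct destination IP addresses for traffic from a given source IP. The database and table names are hypothetical, and the `src_endpoint.ip` and `dst_endpoint.ip` columns are assumed from the OCSF network activity schema, so verify them against the tables in your own account before running anything.

```python
import ipaddress
import re

_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def distinct_destinations_query(database: str, table: str, source_ip: str) -> str:
    """Return SQL counting distinct destination IPs reached from source_ip."""
    for name in (database, table):
        if not _IDENTIFIER.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    # Parsing the address rejects malformed or malicious input early.
    ip = ipaddress.ip_address(source_ip)
    return (
        "SELECT COUNT(DISTINCT dst_endpoint.ip) AS distinct_destinations\n"
        f'FROM "{database}"."{table}"\n'
        f"WHERE src_endpoint.ip = '{ip}'"
    )


query = distinct_destinations_query(
    "example_security_lake_db",  # hypothetical database name
    "example_vpc_flow_table",  # hypothetical table name
    "10.0.0.5",
)
print(query)
```

Validating the identifiers and the IP address before formatting keeps user-supplied values from changing the structure of the query.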

What is OCSF?

The Open Cybersecurity Schema Framework (OCSF) is a collaborative, open-source initiative by AWS and industry partners aimed at establishing a common schema for security events. It also defines versioning criteria and includes a self-governance process for security log producers and consumers. OCSF's source code is publicly available on GitHub.

Security Lake automatically converts logs and events from supported AWS services to the OCSF schema. The converted data is stored in an Amazon S3 bucket, with one bucket per AWS Region in the user's account. Custom sources can also write to Security Lake, but must follow the OCSF schema and Apache Parquet format. Subscribers can interpret logs and events as generic Parquet records or use the OCSF schema event class for a more accurate interpretation of the information in a record.
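To illustrate what normalizing to a common schema means, here is a minimal sketch that parses a VPC flow log line in the default format and reshapes it into an OCSF-like record. Security Lake does this for you for supported sources. The output is a small, simplified subset of fields, and the class and category identifiers and the millisecond timestamp are my reading of the OCSF Network Activity class, so verify them against the published schema. Records with no data (fields set to "-") are not handled.

```python
FLOW_LOG_FIELDS = (
    "version account_id interface_id srcaddr dstaddr srcport dstport "
    "protocol packets bytes start end action log_status"
).split()


def flow_log_to_ocsf_like(line: str) -> dict:
    """Reshape one default-format VPC flow log line into an OCSF-like dict."""
    parts = line.split()
    if len(parts) != len(FLOW_LOG_FIELDS):
        raise ValueError(f"expected {len(FLOW_LOG_FIELDS)} space-separated fields")
    raw = dict(zip(FLOW_LOG_FIELDS, parts))
    return {
        "class_uid": 4001,  # assumed: Network Activity
        "category_uid": 4,  # assumed: Network Activity category
        "time": int(raw["start"]) * 1000,  # flow logs use epoch seconds
        "cloud": {"account_uid": raw["account_id"]},
        "src_endpoint": {"ip": raw["srcaddr"], "port": int(raw["srcport"])},
        "dst_endpoint": {"ip": raw["dstaddr"], "port": int(raw["dstport"])},
        "traffic": {"packets": int(raw["packets"]), "bytes": int(raw["bytes"])},
        "disposition": raw["action"],
    }


sample_line = (
    "2 123456789010 eni-1235b8ca123456789 172.31.16.139 172.31.16.21 "
    "20641 22 6 20 4249 1418530010 1418530070 ACCEPT OK"
)
record = flow_log_to_ocsf_like(sample_line)
print(record)
```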


Conclusion

Amazon Security Lake simplifies the management of security data across AWS accounts and services. By centralizing and analyzing your security data, you can reduce data ingestion costs, enhance your security posture, and gain greater flexibility in working with your data. Although it's still in preview at the time of writing, Amazon Security Lake promises to be a valuable addition to your security toolkit.


References:

Security Lake Presentation - Re:Invent 2022: https://www.youtube.com/watch?v=V7XwbPPjXSY

How to get started and manage Amazon Security Lake with AWS Organizations: https://www.youtube.com/watch?v=fKGhscpwN-k

© 2023 by Nithin Janardhanan.
