How to Analyze ELB Access Logs with AWS Athena

Hello. Today we introduce "How to Analyze ELB Access Logs with AWS Athena", written by 정민아 of the BESPIN GLOBAL SRE team.

Before Configuration
Configuration
Verifying Normal Operation

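1. Before Configuration

1-1. Prerequisites

Access logging must be enabled on the load balancer, and the logs must already be delivered to an S3 bucket.
An S3 bucket must exist to store the Athena query results.

1-2. Enabling Load Balancer Access Logs

Select the load balancer > Attributes > Edit > enable Access logs under Monitoring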

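What are access logs?
- Elastic Load Balancing provides access logs that capture detailed information about the requests sent to the load balancer.
- Each log entry contains the information below, which lets you analyze traffic patterns.
- [request time, client IP address, latency, request path, and server response]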
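To make the fields above concrete, here is a minimal Python sketch that pulls them out of a single log entry. The sample line is illustrative (sample values laid out in the ALB access log format, not data from this article), and the function name `summarize_entry` is hypothetical.

```python
import shlex

# Illustrative ALB access log entry (sample values, not from a real load balancer).
SAMPLE_LINE = (
    'http 2018-07-02T22:23:00.186641Z app/my-loadbalancer/50dc6c495c0c9188 '
    '192.168.131.39:2817 10.0.0.1:80 0.000 0.001 0.000 200 200 34 366 '
    '"GET http://www.example.com:80/ HTTP/1.1" "curl/7.46.0" - - '
    'arn:aws:elasticloadbalancing:us-east-2:123456789012:targetgroup/my-targets/73e2d6bc24d8a067 '
    '"Root=1-58337262-36d228ad5d99923122bbe354" "-" "-" 0 '
    '2018-07-02T22:22:48.364000Z "forward" "-" "-" "10.0.0.1:80" "200" "-" "-"'
)


def summarize_entry(line: str) -> dict:
    """Extract the request time, client IP, latency, request path, and status."""
    # shlex.split keeps each double-quoted field (request, user agent, ...) as one token.
    fields = shlex.split(line)
    client_ip, _, _client_port = fields[3].rpartition(":")
    method, url, _protocol = fields[12].split(" ", 2)
    return {
        "time": fields[1],
        "client_ip": client_ip,
        # Sum of request, target, and response processing times (seconds).
        "latency_seconds": sum(float(value) for value in fields[5:8]),
        "method": method,
        "url": url,
        "elb_status_code": int(fields[8]),
    }


summary = summarize_entry(SAMPLE_LINE)
print(summary)
```

This splitting approach is only for quick inspection of a single line; the Athena table defined later in this article relies on a regular expression instead.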

1-3. Selecting the S3 Bucket for Athena Query Results

Amazon Athena > Query editor > Settings > Manage > select the query result location

2. Configuration

2-1. Creating a Table for ALB Logs

Copy the CREATE TABLE statement below into the query editor and run it.

CREATE EXTERNAL TABLE IF NOT EXISTS alb_logs (
 type string,
 time string,
 elb string,
 client_ip string,
 client_port int,
 target_ip string,
 target_port int,
 request_processing_time double,
 target_processing_time double,
 response_processing_time double,
 elb_status_code int,
 target_status_code string,
 received_bytes bigint,
 sent_bytes bigint,
 request_verb string,
 request_url string,
 request_proto string,
 user_agent string,
 ssl_cipher string,
 ssl_protocol string,
 target_group_arn string,
 trace_id string,
 domain_name string,
 chosen_cert_arn string,
 matched_rule_priority string,
 request_creation_time string,
 actions_executed string,
 redirect_url string,
 lambda_error_reason string,
 target_port_list string,
 target_status_code_list string,
 classification string,
 classification_reason string
 )
 ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
 WITH SERDEPROPERTIES (
 'serialization.format' = '1',
 'input.regex' = 
 '([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*):([0-9]*) ([^ ]*)[:-]([0-9]*) ([-.0-9]*) ([-.0-9]*) ([-.0-9]*) (|[-0-9]*) (-|[-0-9]*) ([-0-9]*) ([-0-9]*) \"([^ ]*) (.*) (- |[^ ]*)\" \"([^\"]*)\" ([A-Z0-9-_]+) ([A-Za-z0-9.-]*) ([^ ]*) \"([^\"]*)\" \"([^\"]*)\" \"([^\"]*)\" ([-.0-9]*) ([^ ]*) \"([^\"]*)\" \"([^\"]*)\" \"([^ ]*)\" \"([^\s]+?)\" \"([^\s]+)\" \"([^ ]*)\" \"([^ ]*)\"')
 LOCATION 's3://your-alb-logs-directory/AWSLogs/<ACCOUNT-ID>/elasticloadbalancing/<REGION>/'

alb_logs: The table is registered under this name when the query runs.
LOCATION 's3://your-alb-logs-directory/AWSLogs/<ACCOUNT-ID>/elasticloadbalancing/<REGION>/': Enter the S3 bucket path where the load balancer's access logs are stored.
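Because RegexSerDe maps each capture group of 'input.regex' to one table column in order, a quick sanity check is to confirm that the two counts agree. The sketch below does this with Python's `re` module as a stand-in; Hive uses Java regular expressions, so treating the two engines as equivalent for counting groups is an assumption, and the names `COLUMNS` and `INPUT_REGEX` are only illustrative.

```python
import re

# Column names copied from the CREATE TABLE statement, in order.
COLUMNS = [
    "type", "time", "elb", "client_ip", "client_port", "target_ip",
    "target_port", "request_processing_time", "target_processing_time",
    "response_processing_time", "elb_status_code", "target_status_code",
    "received_bytes", "sent_bytes", "request_verb", "request_url",
    "request_proto", "user_agent", "ssl_cipher", "ssl_protocol",
    "target_group_arn", "trace_id", "domain_name", "chosen_cert_arn",
    "matched_rule_priority", "request_creation_time", "actions_executed",
    "redirect_url", "lambda_error_reason", "target_port_list",
    "target_status_code_list", "classification", "classification_reason",
]

# The 'input.regex' value, split into adjacent raw literals for readability.
INPUT_REGEX = (
    r'([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*):([0-9]*) ([^ ]*)[:-]([0-9]*) '
    r'([-.0-9]*) ([-.0-9]*) ([-.0-9]*) (|[-0-9]*) (-|[-0-9]*) '
    r'([-0-9]*) ([-0-9]*) \"([^ ]*) (.*) (- |[^ ]*)\" \"([^\"]*)\" '
    r'([A-Z0-9-_]+) ([A-Za-z0-9.-]*) ([^ ]*) \"([^\"]*)\" '
    r'\"([^\"]*)\" \"([^\"]*)\" ([-.0-9]*) ([^ ]*) \"([^\"]*)\" '
    r'\"([^\"]*)\" \"([^ ]*)\" \"([^\s]+?)\" '
    r'\"([^\s]+)\" \"([^ ]*)\" \"([^ ]*)\"'
)

pattern = re.compile(INPUT_REGEX)

# Each capture group should line up with exactly one column.
group_count = pattern.groups
column_count = len(COLUMNS)
print(f"capture groups: {group_count}, columns: {column_count}")
```

If the two numbers differ after editing the column list or the regex, the table definition is likely out of sync and the mismatch is worth fixing before running queries.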

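2-2. Creating a Table for ALB Logs Using Partition Projection

Query execution time grows with the number of stored objects, so you can create the table with partitions to limit how much data each query has to scan.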

CREATE EXTERNAL TABLE IF NOT EXISTS alb_logs_2023_12 (
 type string,
 time string,
 elb string,
 client_ip string,
 client_port int,
 target_ip string,
 target_port int,
 request_processing_time double,
 target_processing_time double,
 response_processing_time double,
 elb_status_code int,
 target_status_code string,
 received_bytes bigint,
 sent_bytes bigint,
 request_verb string,
 request_url string,
 request_proto string,
 user_agent string,
 ssl_cipher string,
 ssl_protocol string,
 target_group_arn string,
 trace_id string,
 domain_name string,
 chosen_cert_arn string,
 matched_rule_priority string,
 request_creation_time string,
 actions_executed string,
 redirect_url string,
 lambda_error_reason string,
 target_port_list string,
 target_status_code_list string,
 classification string,
 classification_reason string
 )
 PARTITIONED BY
 ( day STRING
 )
 ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
 WITH SERDEPROPERTIES (
 'serialization.format' = '1',
 'input.regex' = 
 '([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*):([0-9]*) ([^ ]*)[:-]([0-9]*) ([-.0-9]*) ([-.0-9]*) ([-.0-9]*) (|[-0-9]*) (-|[-0-9]*) ([-0-9]*) ([-0-9]*) \"([^ ]*) (.*) (- |[^ ]*)\" \"([^\"]*)\" ([A-Z0-9-_]+) ([A-Za-z0-9.-]*) ([^ ]*) \"([^\"]*)\" \"([^\"]*)\" \"([^\"]*)\" ([-.0-9]*) ([^ ]*) \"([^\"]*)\" \"([^\"]*)\" \"([^ ]*)\" \"([^\s]+?)\" \"([^\s]+)\" \"([^ ]*)\" \"([^ ]*)\"')
 LOCATION 's3://your-alb-logs-directory/AWSLogs/<ACCOUNTID>/elasticloadbalancing/<REGION>/'
 TBLPROPERTIES
 (
 "projection.enabled" = "true",
 "projection.day.type" = "date",
 "projection.day.range" = "2022/01/01,NOW",
 "projection.day.format" = "yyyy/MM/dd",
 "projection.day.interval" = "1",
 "projection.day.interval.unit" = "DAYS",
 "storage.location.template" = "s3://your-alb-logs-directory/AWSLogs/<ACCOUNTID>/elasticloadbalancing/<REGION>/${day}"
 )

alb_logs_2023_12: The table is registered under this name when the query runs.
LOCATION 's3://your-alb-logs-directory/AWSLogs/<ACCOUNTID>/elasticloadbalancing/<REGION>/': Enter the S3 bucket path where the load balancer's access logs are stored.
"projection.day.range" = "2022/01/01,NOW": Replace 2022/01/01 with the first date you want to analyze.
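To illustrate what the projection settings describe, the following Python sketch expands a date range into `day` values in the yyyy/MM/dd format and substitutes each one into the storage location template. It only approximates the idea of how `${day}` is resolved and is not Athena's actual implementation; the function name `projected_locations` is hypothetical, and the placeholders in the template are left exactly as in the table definition.

```python
from datetime import date, timedelta

# Template copied from the table definition; <ACCOUNTID> and <REGION> are placeholders.
TEMPLATE = (
    "s3://your-alb-logs-directory/AWSLogs/<ACCOUNTID>/"
    "elasticloadbalancing/<REGION>/${day}"
)


def projected_locations(start: date, end: date, template: str = TEMPLATE) -> list:
    """Return one S3 prefix per day between start and end, inclusive."""
    locations = []
    for offset in range((end - start).days + 1):
        # "projection.day.format" = "yyyy/MM/dd" corresponds to %Y/%m/%d here.
        day_value = (start + timedelta(days=offset)).strftime("%Y/%m/%d")
        locations.append(template.replace("${day}", day_value))
    return locations


locations = projected_locations(date(2023, 12, 30), date(2024, 1, 1))
for location in locations:
    print(location)
```

A query that filters on `day` only needs the prefixes for the matching dates, which is why the partitioned table can avoid scanning the whole bucket.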

2-3. Confirming That the AWS Athena Table Was Created

2-4. Verifying Normal Operation

2-4-1. Querying the ALB log table created with partition projection

SELECT * FROM "alb_log_db"."alb_logs_2023_12" limit 10;

The query returns 10 rows of parsed log records.

2-4-2. Checking whether traffic from the IP 8.8.8.8 arrived within a date range (filtering on the day partition column)

SELECT *
FROM "alb_log_db"."alb_logs_2023_12"
WHERE client_ip = '8.8.8.8'
AND day >= '2023/12/01'
AND day <= '2023/12/31';
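When this check is repeated for different IP addresses or months, it can help to generate the query text instead of editing it by hand. The sketch below is one way to do that in Python, assuming the table and column names used in this article; `build_day_filter_query` is a hypothetical helper, and it only builds the SQL string without submitting it to Athena.

```python
import ipaddress
from datetime import date


def build_day_filter_query(table: str, client_ip: str, start: date, end: date) -> str:
    """Build the date-range query for one client IP against the partitioned table."""
    # Reject anything that is not a valid IP address before it reaches the SQL text.
    ipaddress.ip_address(client_ip)
    if start > end:
        raise ValueError("start must not be after end")
    day_format = "%Y/%m/%d"  # matches "projection.day.format" = "yyyy/MM/dd"
    return (
        "SELECT *\n"
        f"FROM {table}\n"
        f"WHERE client_ip = '{client_ip}'\n"
        f"AND day >= '{start.strftime(day_format)}'\n"
        f"AND day <= '{end.strftime(day_format)}';"
    )


query = build_day_filter_query(
    '"alb_log_db"."alb_logs_2023_12"',
    "8.8.8.8",
    date(2023, 12, 1),
    date(2023, 12, 31),
)
print(query)
```

The table name is inserted as-is, so it should come from trusted configuration rather than user input.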

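Building on the example in 2-4-2, output only the columns you need (filtering on the time column, which also works on a table without partitions)
- Columns to output: client_ip, elb_status_code
- count(*) as count: counts the rows in each group and returns the total in a column named count.
- GROUP BY: groups the SELECT output into rows with matching values.
- ORDER BY: sorts the result set by one or more output expressions.
- ASC: ascending order; DESC: descending order.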

SELECT client_ip, elb_status_code, count(*) as count
FROM "alb_log_db"."alb_logs_2023_12" 
WHERE client_ip = '8.8.8.8'
AND time >= '2023-11-30T15:00:00'
AND time <= '2023-12-31T15:00:00'
GROUP BY client_ip, elb_status_code
ORDER BY client_ip desc
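To show what the GROUP BY and count(*) in this query produce, here is a small Python sketch that performs the same aggregation in memory. The rows are made-up sample data and `count_by_status` is a hypothetical name; it mirrors the logic of the query rather than how Athena executes it.

```python
from collections import Counter

# Hypothetical (client_ip, elb_status_code) pairs standing in for table rows.
rows = [
    ("8.8.8.8", 200),
    ("8.8.8.8", 200),
    ("8.8.8.8", 404),
    ("1.1.1.1", 200),
]


def count_by_status(rows, client_ip):
    """Mimic WHERE client_ip = ... GROUP BY client_ip, elb_status_code with count(*)."""
    # WHERE + GROUP BY: keep matching rows and count each (ip, status) pair.
    counts = Counter((ip, status) for ip, status in rows if ip == client_ip)
    grouped = [(ip, status, n) for (ip, status), n in counts.items()]
    # ORDER BY client_ip DESC: rows with the same IP keep their existing order.
    return sorted(grouped, key=lambda row: row[0], reverse=True)


result = count_by_status(rows, "8.8.8.8")
for client_ip, status, count in result:
    print(client_ip, status, count)
```

Each output row is one distinct combination of client IP and status code together with the number of requests that fell into it.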

That concludes our introduction to "How to Analyze ELB Access Logs with AWS Athena". We hope you found it useful. Thank you.

Written by 정민아 / SRE Team

BESPIN GLOBAL
