Log Review (ipynb)#

Google Log Explorer is a more than acceptable platform for reviewing logs. However, it lacks a table view of log events as well as field aggregation (think stats count by in Splunk), both of which can be useful during investigations.

The following serves only as a demonstration of how a table view and field aggregation can be achieved outside of Google Log Explorer; it is NOT meant to be used in production (or on actual cases).
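
For readers less familiar with pandas, the minimal sketch below uses a few made-up toy events (not real GCP log entries) to show roughly what a table view and a Splunk-style stats count by translate to; the remainder of this notebook applies the same idea to actual logs.

# minimal sketch using made-up toy events (NOT real GCP log entries)
import pandas as pd

toy_events = [
    {"methodName": "v1.compute.instances.insert", "principalEmail": "alice@example.com"},
    {"methodName": "v1.compute.instances.delete", "principalEmail": "alice@example.com"},
    {"methodName": "v1.compute.instances.insert", "principalEmail": "bob@example.com"},
]
toy_df = pd.DataFrame(toy_events)    # table view of events
toy_df["methodName"].value_counts()  # field aggregation, i.e. "stats count by methodName"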

Install Dependencies#

Install the dependencies ipywidgets and pandas. Skip the next cell if they have already been installed.

!pip3 install ipywidgets pandas

Imports and Configuration#

import ipywidgets as widgets
import json
import os
import pandas as pd

from IPython.display import HTML, display

# extend width of widgets
display(HTML('''<style>
    .widget-label { min-width: 18ex !important; font-weight:bold; }
</style>'''))
# extend width of cells
display(HTML("<style>.container { width:100% !important; }</style>"))
display(HTML("<style>.output_result { max-width:100% !important; }</style>"))

# extend width and max rows of pandas output
pd.set_option('display.max_colwidth', None)
pd.set_option('display.max_rows', None)
# [OPTIONAL] authenticate using your service account
!gcloud auth activate-service-account --key-file <json_key_file>

Query Logs#

Specify the following information:

| Fields | Description |
| --- | --- |
| Source Project | Project ID of the target project (the one containing the potentially compromised resource) |
| Resource Type | Resource type of the logs to review |
| Start Date | Start date of the time period of logs to review |
| Start Time | Start time of the time period of logs to review |
| End Date | End date of the time period of logs to review |
| End Time | End time of the time period of logs to review |

# create UI for user input
src_project = widgets.Text(description="Source Project: ", disabled=False)
resource_type = widgets.Dropdown(
    options=['bigquery_dataset', 'bigquery_resource', 'cloudsql_database', 'cloud_function', 'gce_backend_service', 'gce_disk', 'gce_firewall_rule', 'gce_instance', 'gce_instance_group', 'gce_instance_group_manager', 'gce_router', 'gce_snapshot', 'gcs_bucket', 'gke_cluster', 'http_load_balancer', 'k8s_cluster', 'k8s_container', 'k8s_node', 'k8s_pod', 'logging_sink', 'network_security_policy', 'project', 'vpn_gateway'], 
    value='gce_instance', 
    description="Resource Type: ", 
    disabled=False)
start_date = widgets.DatePicker(description='Start Date: ', disabled=False)
start_time = widgets.Text(value='hh:mm', description="Start Time (UTC): ", disabled=False)
end_date = widgets.DatePicker(description='End Date: ', disabled=False)
end_time = widgets.Text(value='hh:mm', description="End Time (UTC): ", disabled=False)

display(src_project, resource_type, start_date, start_time, end_date, end_time)
# set environment variables and construct query
os.environ['SRC_PROJECT'] = src_project.value
os.environ['QUERY'] = (
    f'resource.type={resource_type.value}'
    f' AND timestamp>="{start_date.value}T{start_time.value}:00Z"'
    f' AND timestamp<="{end_date.value}T{end_time.value}:00Z"'
)
# request log events that satisfy the query, limited to 100 events (adjust the limit as needed)
!gcloud logging read "$QUERY" --project $SRC_PROJECT --limit=100 --format=json > temp_logs.json

# store results into dataframe
with open('./temp_logs.json') as infile:
    log_results = json.load(infile)
log_results_df = pd.json_normalize(log_results)
display(log_results_df)
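
If the table of every event is too noisy, the rows can be narrowed down before review. The sketch below is one way to do so; it assumes the results are Cloud Audit Log entries containing the column protoPayload.authenticationInfo.principalEmail (which may be absent depending on the resource type queried), and the email address used is purely illustrative.

# [OPTIONAL] narrow the table view to events performed by a specific principal;
# the column name assumes Cloud Audit Log entries and the email value is a hypothetical example
suspect = 'attacker@example.com'
if 'protoPayload.authenticationInfo.principalEmail' in log_results_df.columns:
    display(log_results_df[log_results_df['protoPayload.authenticationInfo.principalEmail'] == suspect])
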
# aggregate values of a specified field (protoPayload.methodName in this case)
log_results_df['protoPayload.methodName'].value_counts()
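
The same idea extends to aggregating across more than one field, similar to stats count by methodName, principalEmail in Splunk. The sketch below assumes both columns are present in the query results; adjust the field names to whatever actually appears in the dataframe.

# aggregate across multiple fields (akin to "stats count by methodName, principalEmail");
# only runs if both assumed columns exist in the query results
group_cols = ['protoPayload.methodName', 'protoPayload.authenticationInfo.principalEmail']
if all(col in log_results_df.columns for col in group_cols):
    display(log_results_df.groupby(group_cols).size().sort_values(ascending=False))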