Data is often delivered from web APIs in JavaScript Object Notation (JSON) format. One tool that is helpful for quickly parsing JSON is jq. Its query syntax, however, has a learning curve that can be difficult for newcomers. In this exercise, we'll examine the ability of an LLM to analyze a JSON object and generate jq queries from natural-language descriptions.
To begin with, install jq on your VM.
sudo apt install jq
Begin by visiting the Certificate Transparency site at https://crt.sh and looking up the domain kapow.cs.pdx.edu. Doing so accesses the URL https://crt.sh/?q=kapow.cs.pdx.edu and returns the results as an HTML page. The same content can be retrieved in JSON format instead. Within your VM, use the curl command below to download a JSON version of the results to a file.
curl "https://crt.sh/?q=kapow.cs.pdx.edu&output=json" > kapow.json
The JSON returned has a schema that you would normally need to examine and understand before being able to write a jq command that pulls specific information out of it. Upload the JSON output to an LLM, or paste it into a prompt, and ask the LLM to generate a jq command that prints the certificate ID and common name of every entry.
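It can also help to peek at the schema yourself before prompting. The sketch below, run on a small hypothetical stand-in for kapow.json, shows one way to list the field names of the first entry (crt.sh returns a JSON array of objects):

```shell
# Inspect the schema: list the keys of the first element of the array.
# The input here is a hypothetical stand-in for the downloaded kapow.json.
printf '[{"id":1,"common_name":"kapow.cs.pdx.edu","not_after":"2024-01-01"}]' |
jq '.[0] | keys'
```

On the real file, substitute `kapow.json` for the piped sample (e.g. `jq '.[0] | keys' kapow.json`).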
Run the jq command that the LLM outputs on the JSON.
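For reference, a command of the kind an LLM might produce is sketched below. It assumes (as is the case for crt.sh's output) that each entry in the array carries `id` and `common_name` fields; verify this against the file you actually downloaded. It is demonstrated here on a small stand-in for kapow.json:

```shell
# Print the certificate ID and common name of every entry, tab-separated.
# The piped input is a hypothetical stand-in for the downloaded kapow.json.
printf '[{"id":1,"common_name":"kapow.cs.pdx.edu"},
         {"id":2,"common_name":"kapow.cs.pdx.edu"}]' |
jq -r '.[] | "\(.id)\t\(.common_name)"'
```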
Now, ask the LLM to output the certificate ID and common name of every entry directly, without using jq.
Next, with the help of an LLM, produce a list of all unique domain names that crt.sh reports for the domain cs.pdx.edu.
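One shape such an answer might take is sketched below. It relies on the fact that crt.sh lists the names for each certificate in the `name_value` field, one per line within the string; confirm this against the actual output. Against the live service this would look something like `curl -s "https://crt.sh/?q=cs.pdx.edu&output=json" | jq -r '.[].name_value' | sort -u`; it is demonstrated here on a small stand-in for the downloaded JSON:

```shell
# Flatten every name_value field (newline-separated names) and de-duplicate.
# The piped input is a hypothetical stand-in for the crt.sh JSON output.
printf '[{"name_value":"a.cs.pdx.edu\\nb.cs.pdx.edu"},
         {"name_value":"a.cs.pdx.edu"}]' |
jq -r '.[].name_value' | sort -u
```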
One of the benefits of chatbots is that they can provide a more natural interface to search engines. Search engines such as Google come with a rich set of features that advanced users can leverage to zero in on specific content. A summary can be found here. Examples include:
- Exact phrases ("We the people" to search for documents with this phrase)
- File types (filetype:pdf for PDF files, ext:txt for files with a txt filename extension)
- Specific sites or platforms (site:pdx.edu for documents on pdx.edu domains, @twitter for content on the social media platform Twitter)
- URLs containing specific text (inurl:security), pages with titles containing specific text (intitle:security), or pages with specific text (intext:disallow)
- Exclusions (-filetype:pdf removes all results that are PDF files)
- Boolean combinations ((psychology | computer science) & design for sites that match psychology design or computer science design)

Use an LLM to see if it can perform the same function as Google dork generators by having it generate the dorks below.
"VNC Desktop" inurl:5800
Search index restriction files (robots.txt/robot.txt) indicating sensitive directories that should be disallowed.
(inurl:"robot.txt" | inurl:"robots.txt" ) intext:disallow filetype:txt
inurl:phpmyadmin site:*.pdx.edu
SQL database backup files that have been left on a public web server
intitle:"index of" "database.sql.zip"
intitle:"index of" passwords
" -FrontPage-" ext:pwd inurl:(service | authors | administrators | users)
"Not for Public Release" ext:pdf
One of the benefits of using an LLM is its ability to use its broad knowledge base to explain code and commands that a particular user may not understand. Consider the following set of commands for configuring rules using iptables, a network firewall tool for Linux.
iptables -A INPUT -i eth0 -p tcp -m multiport --dports 22,80,443 -m state --state NEW,ESTABLISHED -j ACCEPT
iptables -A OUTPUT -o eth0 -p tcp -m multiport --sports 22,80,443 -m state --state ESTABLISHED -j ACCEPT
Prompt an LLM for a concise summary of the two rules that the commands create.
Then, prompt an LLM for a prompt that could reproduce the commands exactly. Using that prompt as a basis, create a prompt that reproduces the above commands verbatim.
Google Cloud's Compute Engine service allows one to set up virtual machines configured with a variety of operating systems and network configurations. As we have done previously for the WFP1 VM at the beginning of this lab, this can be done via the command-line interface provided by the Google Cloud SDK and its gcloud command.
gcloud compute firewall-rules create pdx-80 \
  --allow=tcp:80 --source-ranges="131.252.0.0/16" \
  --target-tags=pdx-80 --direction=INGRESS

gcloud compute instances create wfp1-vm \
  --machine-type e2-micro --zone us-west1-b \
  --image-project pdx-cs \
  --image wfp1-nofilter

gcloud compute instances add-tags wfp1-vm --tags=pdx-80 --zone us-west1-b
Consider the following nginx configuration for the web server at https://mashimaro.cs.pdx.edu.
server {
    server_name mashimaro.cs.pdx.edu;
    root /var/www/html/mashimaro;
    index index.html;

    location / {
        try_files $uri $uri/ =404;
    }

    listen 443 ssl;
    ssl_certificate /etc/letsencrypt/live/mashimaro/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/mashimaro/privkey.pem;
    include /etc/letsencrypt/options-ssl-nginx.conf;
    ssl_dhparam /etc/letsencrypt/ssl-dhparams.pem;
}
server {
    if ($host = mashimaro.cs.pdx.edu) {
        return 301 https://$host$request_uri;
    }

    server_name mashimaro.cs.pdx.edu;
    listen 80;
    return 404;
}
Prompt an LLM for a concise line-by-line summary of the configuration above.
Then, prompt an LLM for a prompt that could reproduce the configuration exactly. Using that prompt as a basis, create a prompt that reproduces the above configuration verbatim.
Terraform and other infrastructure-as-code solutions provide a way of declaratively defining infrastructure that can then be deployed in a reliable, reproducible manner. Consider the Terraform specification file below that deploys a single virtual machine on Google Cloud Platform.
provider "google" {
  credentials = file("tf-lab.json")
  project     = "YOUR_PROJECT_ID"
  region      = "us-west1"
}

resource "google_compute_address" "static" {
  name = "ipv4-address"
}

resource "google_compute_instance" "default" {
  name         = "tf-lab-vm"
  machine_type = "e2-medium"
  zone         = "us-west1-b"

  boot_disk {
    initialize_params {
      image = "ubuntu-os-cloud/ubuntu-2204-jammy-v20240501"
    }
  }

  network_interface {
    network = "default"
    access_config {
      nat_ip = google_compute_address.static.address
    }
  }
}

output "ip" {
  value = google_compute_instance.default.network_interface.0.access_config.0.nat_ip
}
Prompt an LLM for a concise line-by-line summary of the configuration above.
Then, prompt an LLM for a prompt that could reproduce the configuration exactly. Using that prompt as a basis, create a prompt that reproduces the above configuration verbatim.
Docker containers, which provide lightweight, isolated operating-system environments, are often used to deploy services in the cloud. Containers are instantiated from images that are specified and built from a Dockerfile. For beginners, parsing such a file can be difficult, and an LLM can potentially aid in understanding it. Below is a Dockerfile for a multi-stage container build.
FROM python:3.5.9-alpine3.11 as builder
COPY . /app
WORKDIR /app
RUN pip install --no-cache-dir -r requirements.txt && \
    pip uninstall -y pip && \
    rm -rf /usr/local/lib/python3.5/site-packages/*.dist-info README
FROM python:3.5.9-alpine3.11
COPY --from=builder /app /app
COPY --from=builder /usr/local/lib/python3.5/site-packages/ /usr/local/lib/python3.5/site-packages/
WORKDIR /app
ENTRYPOINT ["python3","app.py"]
Prompt an LLM for a concise summary of the configuration above.
Then, prompt an LLM for a prompt that could reproduce the configuration exactly. Using that prompt as a basis, create a prompt that reproduces the above configuration verbatim.
Kubernetes is a system for declaratively specifying infrastructure, deploying it, and maintaining its operation, often using containers and container images. Below is a simple configuration for a web application.
apiVersion: v1
kind: ReplicationController
metadata:
  name: guestbook-replicas
spec:
  replicas: 3
  template:
    metadata:
      labels:
        app: guestbook
        tier: frontend
    spec:
      containers:
      - name: guestbook-app
        image: gcr.io/YOUR_PROJECT_ID/gcp_gb
        env:
        - name: PROCESSES
          value: guestbook
        - name: PORT
          value: "8000"
        ports:
        - containerPort: 8000
---
apiVersion: v1
kind: Service
metadata:
  name: guestbook-lb
  labels:
    app: guestbook
    tier: frontend
spec:
  type: LoadBalancer
  ports:
  - port: 80
    targetPort: 8000
  selector:
    app: guestbook
    tier: frontend
Prompt an LLM for a concise summary of the configuration above.
Then, prompt an LLM for a prompt that could reproduce the configuration exactly. Using that prompt as a basis, create a prompt that reproduces the above configuration verbatim.