07.1: Exercises: Commands and Configurations

In this week's exercises, your group will try out the various tasks for command and configuration generation using LLMs. Begin by completing the setup parts of the codelab. Then, attempt the exercise your group has been assigned in the following Google Slide presentation:

Week 7 slides

Add screenshots that you can use to walkthrough how you performed the exercise. Your group will present your results for the exercise during the last hour of class. After completing the exercise you've been assigned, continue to the rest of the exercises in order to prepare for the week's homework assignment.

Data is often delivered from Web APIs via the Javascript Object Notation format or JSON. One tool that is helpful for quickly parsing JSON is jq. The syntax for the command has a learning curve that may be difficult for some to learn. In this exercise, we'll examine the ability for an LLM to properly analyze a JSON object and generate jq queries using natural language.

To begin with, install jq on your VM.

sudo apt install jq

Begin by visiting the Certificate Transparency site at https://crt.sh and lookup the domain kapow.cs.pdx.edu. Doing so will access the URL https://crt.sh/?q=kapow.cs.pdx.edu and the results will be returned as an HTML page. One can retrieve the same content, but in a JSON format. Within your VM, use the curl command below to download a JSON version of the results to a file.

curl "https://crt.sh/?q=kapow.cs.pdx.edu&output=json" > kapow.json

The JSON returned has a schema that you would need to parse and understand before being able to write a jq command to pull out specific information from it. Upload the JSON output or copy it to an LLM and prompt the LLM to generate a jq command that prints out the certificate ID and common name of every entry.

Show the jq command that the LLM outputs

Run the jq command on the JSON.

Does the command return the correct results?

Now, ask the LLM to directly output the certificate ID and common name directly.

Does it produce the same results?

Next, with the help of an LLM to produce all unique domain names that crt.sh reports for the domain cs.pdx.edu

How many unique hosts have registered certificates within this subdomain?

One of the benefits of chatbots is that they can provide a more natural interface for search engines. Search engines such as Google come with a rich set of features that advanced users can leverage to zero find specific content. A summary can be found here. Examples include:

Multi-word string search (e.g. "We the people" to search for documents with this phrase)
Content restricted searches (filetype:pdf for PDF files, ext:txt for files with a txt filename extension)
Site restricted searches (site:pdx.edu for documents on pdx.edu domains, @twitter for content on social media platform Twitter)
Text restricted searches for URLs containing specific text (inurl:security), pages with titles containing specific text (intitle:security), or pages with specific text (intext:disallow)
Negation operator (e.g. -filetype:pdf removes all results that are PDF files)
Logical operators (e.g. (psychology | computer science) & design for sites that match psychology design or computer science design)

Use an LLM to see if it can perform the same function as Google dork generators by having the LLM generate the dorks below.

Virtual desktop endpoints

"VNC Desktop" inurl:5800

Restricted directories and paths

Search index restriction files (robots.txt/robot.txt) indicating sensitive directories that should be disallowed.

(inurl:"robot.txt" | inurl:"robots.txt" ) intext:disallow filetype:txt

Exposed database administration interface

inurl:phpmyadmin site:*.pdx.edu

Unprotected database backups

SQL database backup files that have been left on a public web server

intitle:"index of" "database.sql.zip"

Credentials

intitle:"index of" passwords

" -FrontPage-" ext:pwd inurl:(service | authors | administrators | users)

Secret files

"Not for Public Release" ext:pdf

Google Cloud's Compute Engine service allows one to set up virtual machines configured with a variety of operating systems and network configurations. As we have done previously for the WFP1 VM at the beginning of this lab, this can be done via the command-line interface provided by the Google Cloud SDK and its gcloud command.

gcloud compute firewall-rules create pdx-80 \
  --allow=tcp:80 --source-ranges="131.252.0.0/16" \
  --target-tags=pdx-80 --direction=INGRESS

gcloud compute instances create wfp1-vm \
  --machine-type e2-micro --zone us-west1-b \
  --image-project pdx-cs \
  --image wfp1-nofilter

gcloud compute instances add-tags wfp1-vm --tags=pdx-80 --zone us-west1-b

Use an LLM to explain each command
Then, in a new session, generate a prompt that an LLM can be given that reproduces the commands. What LLM and prompt gets you the closest to implementing the above commands?
Would you feel comfortable having an LLM deploy infrastructure based on a user prompt?

LLMs can be used to generate configuration files for services and infrastructure, potentially obviating the need to learn the syntax of a configuration language. Consider the nginx server block configuration for the web site http://mashimaro.cs.pdx.edu below. If properly prompted, an LLM can generate this configuration based if given the appropriate description.

server {
    server_name mashimaro.cs.pdx.edu;
    listen 80;
    root /var/www/html/mashimaro;
    index index.html;

    location / {
        try_files $uri $uri/ =404;
        autoindex off;
    }

    add_header Content-Security-Policy "default-src 'none'; script-src 'self'; connect-src 'self'; img-src 'self'; style-src 'self';" always;
}

Use an LLM to explain each line of the configuration
Then, in a new session, generate a prompt that an LLM can be given that reproduces the configuration. What LLM and prompt gets you the closest to implementing the above configuration?
Would you feel comfortable having an LLM configure infrastructure based on a user prompt?

LLMs can be used to generate configuration files for Infrastructure as Code systems such as Terraform. Consider the Terraform file below for a Google Cloud Platform deployment consisting of a static IP address and a Compute Engine instance that utilizes it. As part of the deployment, the IP address is output back to the user.

provider "google" {
  credentials = file("tf-lab.json")
  project     = "YOUR_PROJECT_ID"
  region      = "us-west1"
}

resource "google_compute_address" "static" {
  name = "ipv4-address"
}

resource "google_compute_instance" "default" {
  name         = "tf-lab-vm"
  machine_type = "e2-medium"
  zone         = "us-west1-b"

  boot_disk {
    initialize_params {
      image = "ubuntu-os-cloud/ubuntu-2204-jammy-v20240501"
    }
  }

  network_interface {
    network = "default"
    access_config {
      nat_ip = google_compute_address.static.address
    }
  }
}

output "ip" {
  value = google_compute_instance.default.network_interface.0.access_config.0.nat_ip
}

Use an LLM to explain each line of the configuration
Then, in a new session, generate a prompt that an LLM can be given that reproduces the configuration. What LLM and prompt gets you the closest to implementing the above configuration?
Would you feel comfortable having an LLM configure infrastructure based on a user prompt?

Another Infrastructure as Code approach is Kubernetes. With Kubernetes, one creates a logical specification of different services that run an application and the Kubernetes controller deploys it to a cluster of machines. Consider the Kubernetes file below for a web application deployment on Google Cloud Platform. The file specifies the container image to run, the number of replicas of the container image to run, and a load balancer to route requests to the replicas.

apiVersion: v1
kind: ReplicationController
metadata:
  name: guestbook-replicas
spec:
  replicas: 3
  template:
    metadata:
      labels:
        app: guestbook
        tier: frontend
    spec:
      containers:
      - name: guestbook-app
        image: gcr.io/YOUR_PROJECT_ID/gcp_gb
        env:
        - name: PROCESSES
          value: guestbook
        - name: PORT
          value: "8000"
        ports:
        - containerPort: 8000
---
apiVersion: v1
kind: Service
metadata:
  name: guestbook-lb
  labels:
    app: guestbook
    tier: frontend
spec:
  type: LoadBalancer
  ports:
  - port: 80
    targetPort: 8000
  selector:
    app: guestbook
    tier: frontend

Use an LLM to explain each line of the configuration
Then, in a new session, generate a prompt that an LLM can be given that reproduces the configuration. What LLM and prompt gets you the closest to implementing the above configuration?
Would you feel comfortable having an LLM configure infrastructure based on a user prompt?