In this week's exercises, your group will try out the various tasks for command and configuration generation using LLMs. Begin by completing the setup parts of the codelab. Then, attempt the exercise your group has been assigned in the following Google Slides presentation:
Add screenshots that you can use to walk through how you performed the exercise. Your group will present your results for the exercise during the last hour of class. After completing the exercise you've been assigned, continue on to the rest of the exercises to prepare for the week's homework assignment.
Data is often delivered from Web APIs via the JavaScript Object Notation (JSON) format. One tool that is helpful for quickly parsing JSON is jq. The syntax for the command has a learning curve that some may find difficult. In this exercise, we'll examine the ability of an LLM to properly analyze a JSON object and generate jq queries using natural language.
To begin with, install jq on your VM.
sudo apt install jq
Begin by visiting the Certificate Transparency site at https://crt.sh and looking up the domain kapow.cs.pdx.edu. Doing so accesses the URL https://crt.sh/?q=kapow.cs.pdx.edu and returns the results as an HTML page. The same content can be retrieved in JSON format. Within your VM, use the curl command below to download a JSON version of the results to a file.
curl "https://crt.sh/?q=kapow.cs.pdx.edu&output=json" > kapow.json
The JSON returned has a schema that you would need to parse and understand before being able to write a jq command to pull out specific information from it. Upload the JSON output (or copy it) to an LLM and prompt the LLM to generate a jq command that prints out the certificate ID and common name of every entry. Take note of the jq command that the LLM outputs, then run the jq command on the JSON.
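For reference, one plausible answer is sketched below. It assumes each entry in the crt.sh output is an object with id and common_name fields; verify the field names against your own kapow.json before relying on it.

# Assumed schema: each array element has "id" and "common_name" keys
jq -r '.[] | "\(.id) \(.common_name)"' kapow.json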
Now, ask the LLM to output the certificate ID and common name of every entry directly.
Next, with the help of an LLM, produce a list of all unique domain names that crt.sh reports for the domain cs.pdx.edu.
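One way to check the LLM's answer is a short pipeline like the sketch below; it assumes each entry's name_value field holds the certificate's domain names, one per line, which is how the crt.sh JSON appears to be structured.

curl -s "https://crt.sh/?q=cs.pdx.edu&output=json" > cs_pdx.json
# name_value may contain several newline-separated names per certificate;
# -r prints them on separate lines so sort -u can deduplicate them
jq -r '.[].name_value' cs_pdx.json | sort -u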
One of the benefits of chatbots is that they can provide a more natural interface for search engines. Search engines such as Google come with a rich set of features that advanced users can leverage to zero in on specific content. A summary can be found here. Examples include:
"We the people"
to search for documents with this phrase)filetype:pdf
for PDF files, ext:txt
for files with a txt filename extension)site:pdx.edu
for documents on pdx.edu domains, @twitter
for content on social media platform Twitter)inurl:security
), pages with titles containing specific text (intitle:security
), or pages with specific text (intext:disallow
)-filetype:pdf
removes all results that are PDF files)(psychology | computer science) & design
for sites that match psychology design or computer science design)Use an LLM to see if it can perform the same function as Google dork generators by having the LLM generate the dorks below.
"VNC Desktop" inurl:5800
Search index restriction files (robots.txt/robot.txt) indicating sensitive directories that should be disallowed.
(inurl:"robot.txt" | inurl:"robots.txt" ) intext:disallow filetype:txt
inurl:phpmyadmin site:*.pdx.edu
SQL database backup files that have been left on a public web server
intitle:"index of" "database.sql.zip"
intitle:"index of" passwords
" -FrontPage-" ext:pwd inurl:(service | authors | administrators | users)
"Not for Public Release" ext:pdf
Google Cloud's Compute Engine service allows one to set up virtual machines configured with a variety of operating systems and network configurations. As we have done previously for the WFP1 VM at the beginning of this lab, this can be done via the command-line interface provided by the Google Cloud SDK and its gcloud command.
gcloud compute firewall-rules create pdx-80 \
--allow=tcp:80 --source-ranges="131.252.0.0/16" \
--target-tags=pdx-80 --direction=INGRESS
gcloud compute instances create wfp1-vm \
--machine-type e2-micro --zone us-west1-b \
--image-project pdx-cs \
--image wfp1-nofilter
gcloud compute instances add-tags wfp1-vm --tags=pdx-80 --zone us-west1-b
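To confirm that commands like these (whether typed by hand or generated by an LLM) did what you expected, one quick check is to list the resources afterwards; the filter expressions below are just one way to narrow the output.

gcloud compute firewall-rules list --filter="name=pdx-80"
gcloud compute instances list --filter="name=wfp1-vm"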
LLMs can be used to generate configuration files for services and infrastructure, potentially obviating the need to learn the syntax of a configuration language. Consider the nginx server block configuration for the web site http://mashimaro.cs.pdx.edu below. If properly prompted, an LLM can generate this configuration when given an appropriate description.
server {
    server_name mashimaro.cs.pdx.edu;
    listen 80;
    root /var/www/html/mashimaro;
    index index.html;

    location / {
        try_files $uri $uri/ =404;
        autoindex off;
    }

    add_header Content-Security-Policy "default-src 'none'; script-src 'self'; connect-src 'self'; img-src 'self'; style-src 'self';" always;
}
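To try out a configuration like this, one common workflow on a Debian/Ubuntu system is sketched below; it assumes the server block has been saved as /etc/nginx/sites-available/mashimaro, so adjust the paths for your own setup.

sudo ln -s /etc/nginx/sites-available/mashimaro /etc/nginx/sites-enabled/
sudo nginx -t                  # check the configuration for syntax errors
sudo systemctl reload nginx    # apply it without dropping connections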
LLMs can be used to generate configuration files for Infrastructure as Code systems such as Terraform. Consider the Terraform file below for a Google Cloud Platform deployment consisting of a static IP address and a Compute Engine instance that utilizes it. As part of the deployment, the IP address is output back to the user.
provider "google" {
credentials = file("tf-lab.json")
project = "YOUR_PROJECT_ID"
region = "us-west1"
}
resource "google_compute_address" "static" {
name = "ipv4-address"
}
resource "google_compute_instance" "default" {
name = "tf-lab-vm"
machine_type = "e2-medium"
zone = "us-west1-b"
boot_disk {
initialize_params {
image = "ubuntu-os-cloud/ubuntu-2204-jammy-v20240501"
}
}
network_interface {
network = "default"
access_config {
nat_ip = google_compute_address.static.address
}
}
}
output "ip" {
value = google_compute_instance.default.network_interface.0.access_config.0.nat_ip
}
Another Infrastructure as Code approach is Kubernetes. With Kubernetes, one creates a logical specification of the different services that run an application, and the Kubernetes controller deploys it to a cluster of machines. Consider the Kubernetes file below for a web application deployment on Google Cloud Platform. The file specifies the container image to run, the number of replicas of the container image to run, and a load balancer to route requests to the replicas.
apiVersion: v1
kind: ReplicationController
metadata:
  name: guestbook-replicas
spec:
  replicas: 3
  template:
    metadata:
      labels:
        app: guestbook
        tier: frontend
    spec:
      containers:
      - name: guestbook-app
        image: gcr.io/YOUR_PROJECT_ID/gcp_gb
        env:
        - name: PROCESSES
          value: guestbook
        - name: PORT
          value: "8000"
        ports:
        - containerPort: 8000
---
apiVersion: v1
kind: Service
metadata:
  name: guestbook-lb
  labels:
    app: guestbook
    tier: frontend
spec:
  type: LoadBalancer
  ports:
  - port: 80
    targetPort: 8000
  selector:
    app: guestbook
    tier: frontend
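Assuming the manifest is saved as guestbook.yaml and kubectl is configured to talk to a GKE cluster, it can be deployed and inspected as follows (the resource names come from the manifest above):

kubectl apply -f guestbook.yaml
kubectl get pods                     # should show three guestbook replicas
kubectl get service guestbook-lb     # shows the load balancer's external IP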