Secure Code Review - Path Traversal Bugs
Introduction #
- In the second part of the secure code review series, we look at path traversal bugs: simple to exploit, yet with a very high impact on vulnerable systems.
- A path (or directory) traversal bug allows an attacker to read arbitrary files on the server hosting the application. This can lead to the loss of sensitive information such as credentials and customer data.
- Let us look into some vulnerable Python code and see how this issue arises.
Sample Vulnerable Code #
- Can you spot where the issue is?
    from http.cookies import SimpleCookie
    from http.server import BaseHTTPRequestHandler

    class MyServer(BaseHTTPRequestHandler):
        def do_GET(self):
            cookies = SimpleCookie(self.headers.get('Auth-Token'))
            if cookies.get('auth_id'):
                username=open(cookies.get('auth_id').value).readlines()[0]
            else:
                username='guest'
            self.send_response(200)
            self.send_header("Content-type", "text/html")
            self.end_headers()
            self.wfile.write(bytes("<html><head><title>Welcome</title></head>", "utf-8"))
            self.wfile.write(bytes("<body>", "utf-8"))
            self.wfile.write(bytes("<h1>Hello %s</h1>" % username, "utf-8"))
            self.wfile.write(bytes("<p>This is a protected area, please provide valid token to access</p>", "utf-8"))
            self.wfile.write(bytes("</body></html>", "utf-8"))
Overview #
- Let us look at what the application does, discuss a technique for tracing user input and then identify our vulnerability.
Functionality #
- We have one class, MyServer, containing a method called do_GET.
- MyServer extends BaseHTTPRequestHandler from the built-in http.server module. According to the Python documentation, the handler parses the request and its headers, then calls a method named after the request type, so do_GET runs for every GET request.
- We begin by retrieving the value of the 'Auth-Token' header and creating a SimpleCookie object from it.
- We then check whether that header contains a cookie named auth_id by using the get() method of the SimpleCookie object. When such a cookie is present, the open function opens a file with the value of the cookie as its name, reads the first line of that file, and assigns it to the username variable. When the cookie is not present, the username variable is assigned the value guest.
Source and Sink #
- This terminology is commonly used in data flow analysis and can be applied to the analysis of code. A source is where data comes from, while a sink is where data ends up. We therefore approach our analysis by looking for any areas of the application where a user can input data, then looking at how that data is handled. Spot any dangerous functions that handle the user input, and ask whether there are ways of exploiting them.
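As a rough illustration of the sink-hunting step, the sketch below uses Python's ast module to list calls to a small set of functions treated as dangerous. The DANGEROUS_SINKS set and the find_sinks helper are hypothetical names chosen for this example, and a real review would still need to trace by hand whether user input actually reaches each call.

```python
import ast

# Hypothetical, deliberately short list of function names to flag.
DANGEROUS_SINKS = {"open", "eval", "exec"}


def find_sinks(source):
    """Return (line_number, function_name) for each call to a flagged function."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        # Only plain calls such as open(...) are matched, not obj.open(...).
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in DANGEROUS_SINKS:
                hits.append((node.lineno, node.func.id))
    return sorted(hits)


sample = (
    "cookies = SimpleCookie(headers.get('Auth-Token'))\n"
    "username = open(cookies.get('auth_id').value).readlines()[0]\n"
)
print(find_sinks(sample))  # [(2, 'open')]
```

A match only marks a candidate sink; whether it is exploitable depends on where its argument comes from.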
Vulnerability #
- Let us apply the above method to find the issue in the code.
- We have this line, where we read the Auth-Token header:
    cookies = SimpleCookie(self.headers.get('Auth-Token'))
- A user can change the value of the Auth-Token header and send the request to the server with their own data. On its own this is not helpful, because at this point the data is only parsed and is not processed by any dangerous function.
- The other part of the code where we receive user input is:
    username=open(cookies.get('auth_id').value).readlines()[0]
- Here we get the value of the auth_id cookie. This data can be modified by a user before it is sent to the server. Since this is a point of input, let us look for any dangerous functions in use.
- The data is passed to the open function, which opens a file and returns a file object. Because we can edit the cookie value, we can supply the path of a file outside the web root directory on the server; a file object will be returned, and we can read information from the server.
- In a nutshell, that is how such a bug can occur in an application. How can this be exploited?
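To make the source-to-sink flow concrete, here is a minimal sketch of the parsing step in isolation. The header value is an illustrative payload rather than anything taken from a real request; the point is only that SimpleCookie hands back the attacker's string unchanged.

```python
from http.cookies import SimpleCookie

# Illustrative header value an attacker might send.
header_value = "auth_id=../../../etc/passwd"

cookies = SimpleCookie(header_value)
tainted = cookies.get("auth_id").value

# The parser performs no path checks, so whatever is passed to open()
# next is fully attacker-controlled.
print(tainted)
```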
Attack principle #
- We modify a request being sent to the server and change the "auth_id" cookie to a value such as '../../../etc/passwd'.
- The server processes the request by parsing that value and using the open function to open the file. Each ../ sequence moves the attacker one level up the directory tree.
- The file is read and its first line is returned in the response body.
- The example below shows how the attack works in practice. Using the Python 3 IDLE shell, we can use the open function to read the /etc/passwd file as shown:
    >>> open('../../../etc/passwd').readlines()[0]
    'root:x:0:0:root:/root:/usr/bin/zsh\n'
    >>>
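The way the ../ sequences climb out of the working directory can be sketched without touching the file system. The working directory below is a hypothetical value picked for illustration; posixpath is used so the result is the same on any platform.

```python
import posixpath

# Hypothetical directory the application is assumed to run from.
working_dir = "/var/www/app"
cookie_value = "../../../etc/passwd"

# Join the relative value onto the working directory, then collapse "../".
resolved = posixpath.normpath(posixpath.join(working_dir, cookie_value))
print(resolved)  # /etc/passwd
```

Three ../ segments are enough here only because the assumed working directory is three levels deep; attackers often send more than needed, since extra ../ segments at the root have no effect.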
Impact #
- Arbitrary file read: An attacker can read files on the system with the permissions the web server is running with. This could lead to the disclosure of sensitive information.
- Remote code execution: This vulnerability commonly exists in applications that allow file uploads. An attacker could potentially upload malware to a server and use the path traversal vulnerability to reach and execute the file, gaining remote code execution on the server.
Mitigations #
- Input validation / sanitizing user input
- In this method, we patch the issue by rejecting unexpected characters that may be sent as part of the cookie, such as the ../ sequence.
- Input validation works best when combined with other security measures.
- In the example below, we reject any cookie value that contains .. or any non-alphanumeric character:
    from http.cookies import SimpleCookie
    from http.server import BaseHTTPRequestHandler

    class MyServer(BaseHTTPRequestHandler):
        def do_GET(self):
            cookies = SimpleCookie(self.headers.get('Auth-Token'))
            if cookies.get('auth_id'):
                auth_id = cookies.get('auth_id').value
                # Reject traversal sequences and any non-alphanumeric character
                if ".." in auth_id or not auth_id.isalnum():
                    self.send_response(400)
                    # ...
                    return
                username=open(auth_id).readlines()[0]
            else:
                username='stranger'
            self.send_response(200)
            # ...
- Attackers may try to get around such filters with URL-encoded characters like %2e%2e, which decodes to two dots (..), and other methods.
- Validating input for those URL-encoded characters and other potential bypasses is not enough on its own, because attackers are constantly finding new, creative ways to bypass filters.
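The sketch below illustrates one such bypass and a check that is harder to sidestep: decode the value, resolve it to a real path, and confirm the result is still inside an expected directory. BASE_DIR, naive_filter, and is_inside_base are hypothetical names for this example, and the bypass only matters if the value gets URL-decoded somewhere after the filter runs.

```python
import os
from urllib.parse import unquote

# Hypothetical directory that is assumed to hold the per-user token files.
BASE_DIR = "/srv/app/tokens"


def naive_filter(value):
    """Accept anything that lacks the literal '..' sequence."""
    return ".." not in value


def is_inside_base(value, base_dir=BASE_DIR):
    """Decode, resolve, and confirm the final path stays under base_dir."""
    base = os.path.realpath(base_dir)
    decoded = unquote(value)
    target = os.path.realpath(os.path.join(base, decoded))
    return os.path.commonpath([base, target]) == base


encoded = "%2e%2e/%2e%2e/%2e%2e/etc/passwd"
print(naive_filter(encoded))    # the encoded payload slips past the filter
print(is_inside_base(encoded))  # the resolved path falls outside BASE_DIR
print(is_inside_base("alice"))  # a plain file name stays inside BASE_DIR
```

Checking the resolved path rather than the raw string means the test does not depend on enumerating every encoding an attacker might try.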
- Whitelisting
- This makes it much harder to perform a path traversal because only a select number of paths are allowed, which prevents access to any other files. In the example below, we use a list called whitelist, which contains the file paths the user can access.
    from http.cookies import SimpleCookie
    from http.server import BaseHTTPRequestHandler

    class MyServer(BaseHTTPRequestHandler):
        def do_GET(self):
            cookies = SimpleCookie(self.headers.get('Cookie'))
            if cookies.get('session_id'):
                session_id = cookies.get('session_id').value
                whitelist = ["/path/to/file1", "/path/to/file2", "/path/to/file3"]
                # Only exact matches against the whitelist are ever opened
                if session_id not in whitelist:
                    self.send_response(404)
                    # ...
                    return
                username=open(session_id).readlines()[0]
            else:
                username='stranger'
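A possible variation on the whitelist, sketched below, is to keep file paths out of the cookie entirely and map an opaque identifier to a server-side path. The SESSION_FILES mapping, its keys, and the resolve_session_file helper are hypothetical and only illustrate the idea.

```python
# Hypothetical mapping from opaque session IDs to server-side files.
SESSION_FILES = {
    "a1b2c3": "/path/to/file1",
    "d4e5f6": "/path/to/file2",
}


def resolve_session_file(session_id):
    """Return the mapped path, or None when the ID is not on the allow-list."""
    return SESSION_FILES.get(session_id)


print(resolve_session_file("a1b2c3"))               # /path/to/file1
print(resolve_session_file("../../../etc/passwd"))  # None
```

With this layout the client never supplies a path at all, so a traversal payload simply fails the lookup.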
- Principle of least privilege
- Ensure the web server user is running with only the minimum necessary privileges. This reduces the impact of the vulnerability.
- Sandboxing
- Sandbox environments are hardened environments that create a strong boundary between the running programs and the operating system.
- Even if an attacker were to gain access, they would not be able to reach other areas of the system, such as the network or the underlying OS.
Conclusion #
- The path traversal bug is very simple to exploit and still occurs in modern applications. We discussed a few cases, but there are still many possible scenarios that may lead to this bug.
- This is reflected on the CVE Details website, with cases reported as recently as 26th January 2023.