Identifying XML External Entity: How Tenable.io Web Application Scanning Can Help
XML External Entity (XXE) flaws present unique mitigation challenges and remain a common attack path. Learn how XXE flaws arise, why some common attack paths are so challenging to mitigate and how Tenable.io Web Application Scanning can help.
Modern applications increasingly process user-supplied data, directly or indirectly. That processing opens the door for attackers to exploit XML External Entity (XXE), a complex vulnerability with many vectors.
An XXE is a web vulnerability that allows an attacker to interfere with a feature that performs XML processing. If exploited, it would allow an attacker to read files on the system and to interact with other systems with which the application itself can interact. An XXE attack can have devastating effects and is often considered critical.
Although XXE was ranked No. 4 in the 2017 OWASP Top 10, and even had a category named after it, this vulnerability has lost some of its relevance in the intervening years as the libraries used to parse XML have become increasingly robust. In the 2021 OWASP Top 10, XXE is no longer a standalone category: it was merged into the “Security Misconfiguration” category, which ranks No. 5. This merger reflects how an increasing reliance on third-party libraries and software to perform processing, together with the lax configuration of these libraries, has become a leading cause of XXE vulnerabilities.
What is XXE?
XXE is a vulnerability that allows an attacker to abuse an application's XML parser by sending a malicious document or by modifying a request that already contains XML.
XXE vulnerabilities are most commonly used to read files on a system. However, this vulnerability can also be exploited for Denial of Service (DoS) or Server-Side Request Forgery (SSRF) attacks.
An XXE can be used by an attacker to retrieve resources, such as configuration files containing credentials and other secrets, or to access internal services that may be sensitive. The exploitation of this vulnerability can therefore allow an attacker to pivot from XXE to SSRF.
Before understanding how to detect an XXE, it is important to understand the syntax of an XML envelope.
The XML envelope represents the payload sent. At the beginning of this envelope is the DOCTYPE which defines the set of rules and properties that the XML document must follow.
This is where an attacker can define either an internal or an external entity to be called.
- An Internal Entity: If the value of an entity is written directly inside its declaration in the Document Type Definition (DTD), it is called an internal entity.
  - Syntax: <!ENTITY entity_name "entity_value">
- An External Entity: If the declaration points to content stored outside the DTD, it is called an external entity (identified by the SYSTEM keyword).
  - Syntax: <!ENTITY entity_name SYSTEM "entity_value">
While the external entity is at the heart of the exploit, an internal entity is still the easiest way for an attacker, or a security practitioner, to detect a possible injection.
Exploiting an XXE is mainly possible because of DTDs, which in some cases are enabled by default.
Java seems to be the language with the most libraries that enable DTDs by default. In our experience, it’s common for an attacker to detect an XXE in a Java application that uses an XML parser (such as the javax.xml.bind library). In many other languages, DTDs must be explicitly enabled. Many sites are therefore vulnerable because of a default configuration, or because the site owner is unaware that the library in use enables this option by default.
How to detect XXE
To detect an XXE, an attacker declares an internal entity and references it in the document. When the server parses the document, the reference is replaced with the value of the entity.
This technique is harmless in itself: it simply confirms that DTDs are enabled. It only works, however, when the server parsing the request returns the result in its response to the user.
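The internal entity check described above can be reproduced locally. The following is a minimal sketch, assuming Python's standard-library xml.etree.ElementTree parser stands in for the server-side parser. The entity name "probe" and its value are arbitrary choices made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical probe document: the internal entity "probe" is declared in the
# DOCTYPE and referenced once in the body.
PROBE_DOCUMENT = (
    '<!DOCTYPE foo [<!ENTITY probe "xxe-probe-ok">]>'
    "<foo>&probe;</foo>"
)


def reflected_value(document: str) -> str:
    """Parse the document and return the text of the root element."""
    return ET.fromstring(document).text


if __name__ == "__main__":
    # If the parser processes the DTD, the entity reference is replaced by
    # the declared value before the application ever sees the text.
    print(reflected_value(PROBE_DOCUMENT))
```

If the value comes back in place of the reference, the parser processed the DTD. Against a real target, the same observation is made in the HTTP response rather than in a return value.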
The complexity of an XXE lies in the many possible, and sometimes exotic, injection points.
A page that allows a user to send a Microsoft Office document, for example, is the kind of thing that immediately alerts an attacker to the possible presence of the vulnerability.
Behind the extension '.docx' or '.xlsx' hides a simple zip archive containing many files in XML format.
In the above example, it would be enough for an attacker to modify the file “xl/workbook.xml” to add their XXE payload, and then recompress the files into a zip archive with the .xlsx extension.
The payload would be executed during the processing of the XML file(s).
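From the defender's side, the same property can be used to inspect an upload before it is processed. The sketch below is only an illustration under stated assumptions: the archive is built in memory to mimic the layout of an .xlsx file, the helper names are hypothetical, and the byte search for a DOCTYPE is a simple heuristic (it would miss, for instance, a part encoded in UTF-16).

```python
import io
import zipfile


def build_demo_archive() -> bytes:
    """Build a tiny in-memory zip that mimics the layout of an .xlsx file."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("[Content_Types].xml", "<Types/>")
        # The workbook part carries a harmless internal-entity probe.
        archive.writestr(
            "xl/workbook.xml",
            '<!DOCTYPE workbook [<!ENTITY probe "xxe-probe-ok">]>'
            "<workbook>&probe;</workbook>",
        )
    return buffer.getvalue()


def members_with_doctype(archive_bytes: bytes) -> list:
    """Return the XML parts of the archive that contain a DOCTYPE."""
    flagged = []
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        for name in archive.namelist():
            is_xml_part = name.lower().endswith((".xml", ".rels"))
            if is_xml_part and b"<!DOCTYPE" in archive.read(name):
                flagged.append(name)
    return flagged


if __name__ == "__main__":
    print(members_with_doctype(build_demo_archive()))
```

A DOCTYPE inside an Office document part is unusual enough that flagging it for review, or rejecting the upload outright, is a reasonable design choice in this sketch.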
Less obvious cases also exist. With a file upload feature, for example, an attacker will commonly try to send an arbitrary file that triggers Cross-Site Scripting (XSS) or installs a web shell.
But, in the case of a photo (taken directly from a camera or a smartphone), an application could try to parse the Extensible Metadata Platform (XMP) data of the image, which is actually XML, and can therefore also contain a payload.
Three common exploits for XXE
XXE provides attackers with multiple exploitation options. Three examples of common attack paths are:
- Read arbitrary files on a server
  - Direct output in the target application response
  - Via an out-of-band interaction (blind injection)
- Perform a DoS
- Perform an SSRF through XXE
Read arbitrary files on a server
Going back to our example above, now that an attacker has confirmed the injection, they can try to read arbitrary files on the server.
The attacker defines an external entity using the SYSTEM keyword and the identifier 'file:///etc/passwd', which is resolved when the XML document is parsed. The end result is that the contents of the file named in the payload are returned to the user. An attacker could exploit this vulnerability to read configuration files and obtain additional access in order to compromise an organization.
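The shape of such a payload, and the behavior of a parser that does not resolve external entities, can be sketched as follows. This assumes Python's standard-library xml.etree.ElementTree, which is documented as raising a ParseError on an external entity reference instead of fetching it. The entity name "xxe" and the helper name are illustrative only.

```python
import xml.etree.ElementTree as ET

# External entity pointing at a local file, as described in the text above.
FILE_READ_PAYLOAD = (
    '<!DOCTYPE foo [<!ENTITY xxe SYSTEM "file:///etc/passwd">]>'
    "<foo>&xxe;</foo>"
)


def is_rejected(document: str) -> bool:
    """Return True when the parser refuses the document with a ParseError."""
    try:
        ET.fromstring(document)
    except ET.ParseError:
        return True
    return False


if __name__ == "__main__":
    # ElementTree never opens the file: the reference to the external
    # entity is reported as an error instead of being expanded.
    print(is_rejected(FILE_READ_PAYLOAD))
```

A vulnerable parser would instead substitute the file contents for the reference, which is what makes the response worth inspecting during a test.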
An alternative to this technique is to force an error in the application's XML parser to make it display the contents of a file.
An attacker would force the third party application to load an external DTD that contains the malicious payload, which will force an error in the application by asking it for a file that does not exist.
When the error is displayed, the application will show the contents of the file requested by the attacker.
One common mitigation technique is to apply strict network filtering to prevent outbound connections, which stops the application from fetching a DTD hosted by the attacker. This filtering can be bypassed, however. The bypass is technically complicated; without diving into the details in this post, suffice it to say that it is possible to use a DTD already present on the system. Software such as Tomcat, VMware or Nmap ships with DTD files, so an attacker can reference one of them in the payload and redefine, on the fly, parameter entities inside the loaded DTD.
In this example, our vulnerable machine has Nmap installed which includes the “/usr/share/nmap/nmap.dtd” file, which can be used for the attack.
We can also use the "out-of-band" (blind) technique to retrieve the content of a document and send it to an external server. This is useful in cases where the attacker is not able to elicit a response or error message from a server because the request is then processed by another service which returns a generic response.
Rather than confirming the possibility of injection with an internal entity, we will directly use an external entity and try to perform an HTTP call on our remote server.
Once confirmed, we will rely on the call of an external DTD which will contain the payload to execute.
The call to this external DTD will be executed by the server and will allow it to retrieve the content of a file and to send it to the attacker's server.
Denial of Service
Denial of service via XXE is specifically called XML Entity Expansion because this attack takes advantage of entities that call other entities, level after level, in a recursive fashion.
As recursion is a costly operation at CPU level, the more entities are called, the longer the computation time will be.
The objective of the attack is to make a large number of recursive calls in order to crash the application or the server itself.
Example:
- The entity “a2” recursively calls “a1”, which itself calls “a0”. Through “a2” we call “a1” 15 times.
  - The operation here is inexpensive in CPU time and is executed quickly, causing no problems.
- Now “a6” calls “a5”, which calls “a4”, and so on recursively.
  - The operation becomes expensive in CPU time, and the application starts to lag and takes time to respond.
This attack is therefore very easy to automate to crash the server in a loop.
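The growth described in the example can be made concrete with a little arithmetic. The sketch below assumes that every level references the previous one 15 times, as in the “a2” case, and that “a0” holds a short literal. The helper names and the seed value are hypothetical, and only the small two-level document is actually parsed.

```python
import xml.etree.ElementTree as ET


def expansion_count(levels: int, fanout: int = 15) -> int:
    """Number of copies of a0 produced when the top-level entity is expanded."""
    return fanout ** levels


def build_payload(levels: int, fanout: int = 15, seed: str = "lol") -> str:
    """Build a document in which entity a<i> references a<i-1> `fanout` times."""
    declarations = [f'<!ENTITY a0 "{seed}">']
    for level in range(1, levels + 1):
        references = f"&a{level - 1};" * fanout
        declarations.append(f'<!ENTITY a{level} "{references}">')
    return "<!DOCTYPE r [" + "".join(declarations) + f"]><r>&a{levels};</r>"


if __name__ == "__main__":
    # Two levels stay tiny: 15 * 15 = 225 copies of the seed.
    small = ET.fromstring(build_payload(2)).text
    print(len(small))
    # Six levels are only computed here, never parsed.
    print(expansion_count(6))
```

Each extra level multiplies the work by the same factor, so a payload of a few hundred bytes can force the parser to produce millions of copies. This is why limits on entity expansion, or disabling DTDs altogether, matter.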
Server-Side Request Forgery
SSRF attacks have already been explained in a previous blog post, so without diving into technical details the important takeaway is that it is possible to perform an SSRF through XXE.
Instead of trying to read a file on the system, it is possible to specify the URL of an internal server in order to make the application perform a request to it.
When parsing the XML envelope, the server itself will make the request to the target server. An attacker can therefore gain access to internal resources or folders that would not be available from the outside, such as the vulnerable management interface of an internal web application framework, which could lead to remote code execution (RCE).
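One way to see what a document would ask the server to fetch, without fetching anything, is to list the external identifiers it declares. The following is a sketch under stated assumptions: it uses Python's standard-library expat bindings, which do not load external entities unless a handler is installed for that purpose, and the host name internal.example is a placeholder rather than a real target.

```python
from xml.parsers import expat

# Same structure as a file-read payload, but the identifier is an internal URL.
SSRF_PAYLOAD = (
    '<!DOCTYPE foo [<!ENTITY xxe SYSTEM "http://internal.example/admin">]>'
    "<foo>&xxe;</foo>"
)


def declared_external_targets(document: str) -> list:
    """Return (entity name, system identifier) pairs declared in the DTD."""
    targets = []

    def on_entity_decl(name, is_parameter, value, base,
                       system_id, public_id, notation):
        # Internal entities carry a value; external ones carry a system id.
        if system_id is not None:
            targets.append((name, system_id))

    parser = expat.ParserCreate()
    parser.EntityDeclHandler = on_entity_decl
    parser.Parse(document, True)
    return targets


if __name__ == "__main__":
    print(declared_external_targets(SSRF_PAYLOAD))
```

Logging these identifiers makes it easier to tell a file-read attempt (a file: identifier) from an SSRF attempt (an http: identifier aimed at an internal host).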
Prevention and mitigation strategies
Multiple best practices help protect web applications and users against vulnerabilities like XXE, but the safest way to prevent XXE is always to disable DTDs completely, which also disables external entities.
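As an illustration of refusing DTDs outright, the sketch below rejects any document that contains a DOCTYPE before handing it to the real parser. It is a minimal example using only Python's standard library, not a recommendation for a specific product. The exception and function names are hypothetical, and where a parser offers its own switch for disabling DTDs, that switch is the better choice.

```python
import xml.etree.ElementTree as ET
from xml.parsers import expat


class DoctypeForbidden(ValueError):
    """Raised when a document carries a DOCTYPE declaration."""


def parse_without_dtd(document: str) -> ET.Element:
    """Parse the document only if it contains no DOCTYPE at all."""

    def on_doctype(name, system_id, public_id, has_internal_subset):
        raise DoctypeForbidden(f"DOCTYPE {name!r} is not allowed")

    # First pass: a bare expat parser whose only job is to spot a DOCTYPE.
    checker = expat.ParserCreate()
    checker.StartDoctypeDeclHandler = on_doctype
    checker.Parse(document, True)

    # Second pass: the document is DTD-free, so build the tree as usual.
    return ET.fromstring(document)


if __name__ == "__main__":
    print(parse_without_dtd("<order><id>7</id></order>").find("id").text)
    try:
        parse_without_dtd('<!DOCTYPE foo [<!ENTITY a "b">]><foo>&a;</foo>')
    except DoctypeForbidden as error:
        print("rejected:", error)
```

Parsing twice costs some time, but it keeps the check independent of the parser that builds the tree. Because internal and external entities can only be declared inside a DOCTYPE, rejecting the declaration removes both at once.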
If it is not possible to disable DTDs completely, then external entities and external document type declarations must be disabled in the way that's specific to each parser.
If this is not possible, it is necessary to put in place defense-in-depth mechanisms such as:
- Sanitize user-supplied inputs: In general, it is always recommended to filter user input. If the web application can only send a request to identified and trusted applications, one possible countermeasure is to apply the allowlist approach. In the event that the applications in question are not known, input validation techniques can be added to ensure that the input string respects the expected format.
- If possible, verify if the data received has the valid and expected format. When possible, validation should be done through available libraries, because regexes for complex formats are difficult to maintain and are error-prone.
- Filter outgoing traffic to not allow loading of external DTDs.
- Do not display application error messages to the user but rather display generic messages.
Use Tenable.io Web Application Scanning to detect XML External Entity flaws
Tenable.io Web Application Scanning helps identify XXE vulnerabilities through its classic scan and API scan features, including the following dedicated plugins:
- Plugin 98113 can detect generic XXE (blind and non-blind) issues and helps identify commonly associated XXE vulnerabilities.
- Plugins 98886 and 98897 can detect various Apache Solr XXE vulnerabilities.
- Plugin 113199 can detect an XXE vulnerability in Jolokia, a JMX-HTTP bridge.
Get more information
- Tenable.io Web App Scanning
- OWASP - Server Side Request Forgery Prevention Cheat Sheet
- All WAS XXE Plugins
All Icons are made by www.flaticon.com