url_ripper – URL Extraction
The url_ripper module is designed to be a comprehensive toolkit for DNS-based content correlation. The url_ripper functions as a normal DNS block list (DNSBL) on the connecting IP address as well as a right-hand side block list (RHSBL) on the domain of the envelope sender, the domains of email addresses in any headers, the URLs contained in the email body (even in transfer encoded content), and content that has been obfuscated with HTML encoding.
Additionally, the URLs and domains found can be resolved to IP addresses and those IP addresses will be looked up in the DNSBL.
Note
To use the url_ripper module, you must choose to install the Policy Tools suite during installation.
The following is an example configuration:
url_ripper "url_ripper1" { base = "multi.surbl.org" bits = [ 0.0.0.1 = "list1_hits" 0.0.0.2 = "list2_hits" 0.0.0.4 = "list3_hits" 0.0.0.8 = "list4_hits" 0.0.0.16 = "list5_hits" 0.0.0.32 = "list6_hits" 0.0.0.64 = "list7_hits" 0.0.1.0 = "list8_hits" ] values = [ 127.0.0.2 = "simple_hits" ] address_headers = ( "Return-Path" "From" "Sender" "Reply-To" "Errors-To") }
Note
This module no longer supports the checklist_suppress_hostnames
and checklist_suppress_ips
options, which were dependent upon the deprecated checklist
module. You can replace this functionality with Lua datasource functions. For more information, see “ds_core - Datasource Query Core” and msys.dp_config.whitelist.
The following are the configuration options defined within this module:
- address_headers
-
Explicitly specifies headers from which emails (and in turn mailbox domains) should be extracted.
- base
-
Describes the base domain under which prospects should be resolved.
- bits
-
Allows for multi-value lists to be used. If a bitwise AND between the provided key and the list-resolved IP address in question is non-zero, then the key is considered a match and the context key associated with the value is incremented by one.
- forward
-
If set to
false
, this option disables the conversion of hostnames to IPs and the DNSBL lookups of those IPs. Default value istrue
. - max_lookups
-
Limits the number of DNSBL lookups acting as a brake in case of a possible denial of service attack. Default value is
100
. - values
-
Traditional "exact match" style list check. If the list-resolved IP address in question exactly matches the key, the context key associated with the value is incremented by one.
Use of the url_ripper module is complicated, mostly due to abusers using complicated methods to avoid detection. As such, an example of operation is warranted. In this example, multi.surbl.org will be the DNSBL base domain.
<<< 220 server ESMTP ecelerity 1.2 (r4169) Mon, 16 May 2005 09:45:48 -0400 >>> EHLO sender.example.com <<< 250-edge.omniti.com says EHLO to 192.0.2.100 <<< 250-8BITMIME <<< 250 PIPELINING
At connect, the module will resolve 100.2.0.192.multi.surbl.org to an IP address. This resulting IP address will be processed through the configuration, and the appropriate context variables will be updated to reflect any matches.
>>> MAIL FROM:<sender@mail.example.com>
<<< 250 MAIL FROM accepted
Now the mailbox domain mail.example.com is found in the envelope sender. The base domain example.com is mapped to example.com.multi.surbl.org, is resolved to an IP, and is processed. Additionally, mail.example.com is resolved to an IP address, and that IP is reversed and looked up, just as was the connecting IP address.
>>> RCPT TO:<test@test.omniti.com> <<< 250 RCPT TO accepted. >>> DATA <<< 354 continue. finished with "\r\n.\r\n" >>> From: "Abuser" <superabuser@superabuser.com> Subject: Abuse! Content-Type: text/html Content-Transfer-Encoding: base64 PGh0bWw+Cjxib2R5Pgo8YSBocmVmPSJodHRwOi8vd3d3LmNvdmVydGFidXNlci5jby51ayI+Q2xp Y2sgaGVyZSB0byBidXkgc29tZXRoaW5nPC9hPi4KPC9ib2R5Pgo8L2h0bWw+Cg== .
At this point, a large amount of information is extracted from the above message part. First, superabuser.com is extracted from the From: header. Next the body is base64 decoded to:
<html>
<body>
<a href="http://www.covertabuser.co.uk">Click here to buy something</a>.
</body>
</html>
And www.covertabuser.co.uk is extracted.
Both superabuser.com and www.covertabuser.co.uk are resolved via A records to IP addresses (192.0.2.10 and 192.0.2.20, respectively) and inverted. Next, they are normalized to RHSBL format (superabuser.com and covertabuser.co.uk, respectively).
The list is consulted by resolving A records for:
-
10.2.0.192.multi.surbl.org
-
20.2.0.192.multi.surbl.org
-
superabuser.com.multi.surbl.org
-
covertabuser.co.uk.multi.surbl.org
Any A records found are checked against the configuration file, and the local message context is updated to reflect any matches.
In the example above, if forward
was set to false
, the EHLO hostname (sender.example.com) would not be converted to an IP address and queried against DNSBL, and mail.example.com from the MAIL FROM and test.omniti.com from RCPT TO would only be looked up as domains. Also, there would be no lookup for 10.2.0.192.multi.surbl.org or 20.2.0.192.multi.surbl.org from the body, just superabuser.com.multi.surbl.org and covertabuser.co.uk.multi.surbl.org.