
url_ripper – URL Extraction

Last updated March 2020

The url_ripper module is a comprehensive toolkit for DNS-based content correlation. It functions as a normal DNS block list (DNSBL) check on the connecting IP address and as a right-hand side block list (RHSBL) check on the domain of the envelope sender, the domains of email addresses in any headers, and the URLs contained in the email body, including URLs in transfer-encoded content and in content that has been obfuscated with HTML encoding.

Additionally, the URLs and domains found can be resolved to IP addresses, and those IP addresses are then looked up in the DNSBL.


To use the url_ripper module, you must select the Policy Tools suite during installation.


The following is an example configuration:

url_ripper "url_ripper1" {
  base = ""
  bits = [
    = "list1_hits"
    = "list2_hits"
    = "list3_hits"
    = "list4_hits"
    = "list5_hits"
    = "list6_hits"
    = "list7_hits"
    = "list8_hits"
  ]
  values = [
    = "simple_hits"
  ]
  address_headers = ( "Return-Path" "From" "Sender" "Reply-To" "Errors-To" )
}



This module no longer supports the checklist_suppress_hostnames and checklist_suppress_ips options, which were dependent upon the deprecated checklist module. You can replace this functionality with Lua datasource functions. For more information, see “ds_core - Datasource Query Core” and msys.dp_config.whitelist.

The following are the configuration options defined within this module:


address_headers

Explicitly specifies the headers from which email addresses (and in turn mailbox domains) should be extracted.


base

Specifies the base domain under which prospects should be resolved.


bits

Allows multi-value lists to be used. If a bitwise AND between the provided key and the list-resolved IP address in question is non-zero, the key is considered a match and the context key associated with the value is incremented by one.


forward

If set to false, this option disables the conversion of hostnames to IP addresses and the DNSBL lookups of those IP addresses. Default value is true.


Limits the number of DNSBL lookups, acting as a brake in case of a possible denial-of-service attack. Default value is 100.


values

Traditional "exact match" style list check. If the list-resolved IP address in question exactly matches the key, the context key associated with the value is incremented by one.
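The difference between the bitwise and exact-match checks described above can be sketched in Python. This is an illustration only, not the module's implementation: the helper name update_context, the key format, and the choice to treat both the key and the list-resolved address as 32-bit IPv4 integers for the AND are all assumptions, and the table keys are made up.

```python
import ipaddress
from collections import Counter

def update_context(resolved_ip, bits, values, context=None):
    """Increment context keys for one list-resolved IPv4 address.

    Assumption: keys in both tables are IPv4 address strings, and the
    bitwise AND is taken over the full 32-bit address.
    """
    context = Counter() if context is None else context
    ip = int(ipaddress.IPv4Address(resolved_ip))
    # Bitwise table: any overlapping bit between key and address is a match.
    for key, name in bits.items():
        if int(ipaddress.IPv4Address(key)) & ip:
            context[name] += 1
    # Exact-match table: only an identical address is a match.
    for key, name in values.items():
        if int(ipaddress.IPv4Address(key)) == ip:
            context[name] += 1
    return context

# Hypothetical tables; these keys are not taken from the documentation.
bits = {"0.0.0.1": "list1_hits", "0.0.0.2": "list2_hits", "0.0.0.4": "list3_hits"}
values = {"127.0.0.5": "simple_hits"}
ctx = update_context("127.0.0.5", bits, values)
```

With these made-up tables, the last octet 5 (binary 101) overlaps the first and third bitwise keys but not the second, so one list answer can increment several context keys at once, while the exact-match table increments at most one key per identical address.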

Operational Example

Use of the url_ripper module is complicated, mostly due to abusers using complicated methods to avoid detection. As such, an example of operation is warranted. In this example, will be the DNSBL base domain.

<<< 220 server ESMTP ecelerity 1.2 (r4169) Mon, 16 May 2005 09:45:48 -0400
>>> EHLO
<<< says EHLO to
<<< 250-8BITMIME

At connect, the module will resolve to an IP address. This resulting IP address will be processed through the configuration, and the appropriate context variables will be updated to reflect any matches.

>>> MAIL FROM:<>
<<< 250 MAIL FROM accepted

Now the mailbox domain is found in the envelope sender. The base domain is mapped to, is resolved to an IP, and is processed. Additionally, is resolved to an IP address, and that IP is reversed and looked up, just as was the connecting IP address.
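The two kinds of query name mentioned here, a reversed IP address under the base domain and a domain name under the base domain, can be sketched as follows. The function names, the base domain bl.example.net, and the sample address and domain are hypothetical placeholders following common DNSBL/RHSBL conventions; they are not values from this example.

```python
import ipaddress

def dnsbl_name(ip, base):
    """Reverse the IPv4 octets and append the list's base domain."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + base

def rhsbl_name(domain, base):
    """Append the base domain to a lowercased domain name."""
    return domain.rstrip(".").lower() + "." + base

BASE = "bl.example.net"  # hypothetical base domain
ip_query = dnsbl_name("192.0.2.10", BASE)            # 10.2.0.192.bl.example.net
domain_query = rhsbl_name("Mail.Example.org", BASE)  # mail.example.org.bl.example.net
```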

>>> RCPT TO:<>
<<< 250 RCPT TO accepted.
>>> DATA
<<< 354 continue.  finished with "\r\n.\r\n"

From: "Abuser" <>
Subject: Abuse!
Content-Type: text/html
Content-Transfer-Encoding: base64


At this point, a large amount of information is extracted from the above message part. First, is extracted from the From: header. Next the body is base64 decoded to:

<a href="">Click here to buy something</a>.

And is extracted.
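The extraction steps just described (mailbox domain from an address header, then hostnames from the decoded body) can be approximated with the Python standard library. This is a rough sketch under stated assumptions, not how url_ripper is implemented: the names HrefCollector and extract_domains are invented for illustration, only href attributes are inspected, and the sample addresses and URL are placeholders.

```python
import base64
from email import message_from_string
from email.utils import getaddresses
from html.parser import HTMLParser
from urllib.parse import urlsplit

class HrefCollector(HTMLParser):
    """Collect hostnames from href attributes in an HTML body."""
    def __init__(self):
        super().__init__()
        self.hosts = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser has already unescaped character references in values.
        for name, value in attrs:
            if name == "href" and value:
                host = urlsplit(value).hostname
                if host:
                    self.hosts.append(host)

def extract_domains(raw, headers=("From",)):
    msg = message_from_string(raw)
    found = []
    # Mailbox domains from the selected address headers.
    header_values = [v for h in headers for v in msg.get_all(h, [])]
    for _, addr in getaddresses(header_values):
        if "@" in addr:
            found.append(addr.rsplit("@", 1)[1].lower())
    # get_payload(decode=True) undoes the Content-Transfer-Encoding.
    body = msg.get_payload(decode=True).decode("utf-8", errors="replace")
    parser = HrefCollector()
    parser.feed(body)
    return found + parser.hosts

# Placeholder message shaped like the one in the walkthrough.
html = '<a href="http://shop.example.com/buy">Click here to buy something</a>.'
raw = (
    'From: "Abuser" <abuser@mail.example.org>\n'
    "Subject: Abuse!\n"
    "Content-Type: text/html\n"
    "Content-Transfer-Encoding: base64\n"
    "\n"
    + base64.b64encode(html.encode()).decode() + "\n"
)
domains = extract_domains(raw)  # ['mail.example.org', 'shop.example.com']
```

Because the parser unescapes character references in attribute values, a link written with HTML entities in place of some letters yields the same hostname as the plain form.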

Both and are resolved via A records to IP addresses ( and, respectively) and inverted. Next, they are normalized to RHSBL format ( and, respectively).

The list is consulted by resolving A records for:





Any A records found are checked against the configuration file, and the local message context is updated to reflect any matches.
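A minimal sketch of this consultation step follows, with a cap on the number of lookups in the spirit of the lookup limit described in the configuration options. The function name consult_list and the parameter name max_lookups are hypothetical; the default of 100 mirrors the documented default. The resolver is injectable so the sketch can be exercised without network access.

```python
import socket

def consult_list(names, resolve=None, max_lookups=100):
    """Resolve each query name and return {name: [A records]} for hits.

    `resolve` is a hypothetical hook; by default the system resolver is used.
    """
    if resolve is None:
        def resolve(name):
            try:
                return socket.gethostbyname_ex(name)[2]
            except OSError:  # resolution failure: treat as "not listed"
                return []
    hits = {}
    for count, name in enumerate(names):
        if count >= max_lookups:  # brake against lookup floods
            break
        records = resolve(name)
        if records:
            hits[name] = records
    return hits

# Stand-in for the block list's DNS zone; all names are placeholders.
fake_zone = {"10.2.0.192.bl.example.net": ["127.0.0.2"]}
hits = consult_list(
    ["10.2.0.192.bl.example.net", "mail.example.org.bl.example.net"],
    resolve=lambda name: fake_zone.get(name, []),
)
```

Each returned A record would then be compared against the configured keys to update the message context.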

In the example above, if forward was set to false, the EHLO hostname ( would not be converted to an IP address and queried against DNSBL, and from the MAIL FROM and from RCPT TO would only be looked up as domains. Also, there would be no lookup for or from the body, just and
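The effect of the forward option on the set of generated queries can be illustrated with a small sketch. The function queries_for and the resolve_a hook are hypothetical names, the domain, address, and base domain are placeholders, and the real module may order or deduplicate its lookups differently.

```python
def queries_for(domains, base, forward, resolve_a):
    """List the query names generated for a set of extracted domains.

    resolve_a(domain) returns that domain's IPv4 addresses (hypothetical hook).
    """
    names = []
    for domain in domains:
        names.append(f"{domain}.{base}")  # domain-style lookup, always made
        if forward:
            for ip in resolve_a(domain):  # forward resolution to IP
                reversed_ip = ".".join(reversed(ip.split(".")))
                names.append(f"{reversed_ip}.{base}")  # IP-style lookup
    return names

# Placeholder A records standing in for real DNS answers.
a_records = {"mail.example.org": ["192.0.2.10"]}
lookup = lambda d: a_records.get(d, [])

with_forward = queries_for(["mail.example.org"], "bl.example.net", True, lookup)
without_forward = queries_for(["mail.example.org"], "bl.example.net", False, lookup)
```

With forward enabled, each domain produces both a domain-style and an IP-style query; with it disabled, only the domain-style query remains.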
