
Posting raw e-mail addresses on a website is a guaranteed way to get an inbox full of spam. If your website can be picked up by search engine indexing bots, spambots can also find you. In this post, I present several solutions to this dilemma and the actual code to solve it using Ruby on Rails.

The Problem

E-mail harvesters use spambots to crawl websites and collect e-mail addresses to add to bulk e-mail lists. Spambots scan HTML looking for strings that match a regular expression for an e-mail address. Fortunately, spammers are easy to deceive because 99% of them are complete idiots. Let’s face it: if spammers were intelligent people with legitimate programming skills, they would have real jobs and be making a lot more money writing software that’s actually useful.

The Simple Solution – Obscuring e-mail addresses via html entity encoding

Before I begin, you will need to understand what an html entity is. These articles do a good job of explaining them: HTML Character Entities, Entities

So what is obscuring an e-mail address via HTML encoding? It simply means that instead of writing your e-mail address in normal ASCII characters, you write each character as its equivalent numeric HTML entity. Web browsers display the entities as the normal characters, so the end user never knows the difference. For example, an e-mail address like email@web.com would look like the following in the page source using HTML entities:

&#101;&#109;&#97;&#105;&#108;&#64;&#119;&#101;&#98;&#46;&#99;&#111;&#109;

According to a study by CDT in March of 2003, "obscuring an e-mail address with html entities is an effective way to avoid spam from harvesters." They claim that HTML entity encoding was 100% effective in their study. I would imagine this is because spammers are too lazy, or simply can’t figure out how, to detect HTML entities and convert them back to normal ASCII characters before running a regular expression over the page. Regardless, any claim that something is 100% effective should be looked at critically, so I decided to see for myself.
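To make the mechanism concrete, here is a minimal Ruby sketch of why a regex-only harvester misses an encoded address. The pattern and the decode_numeric_entities helper are assumptions made up for illustration; real harvesters may work differently.

```ruby
# Naive address pattern, standing in for whatever a harvester really uses.
EMAIL_PATTERN = /[\w.+-]+@[\w-]+(?:\.[\w-]+)+/

plain   = 'Contact: email@web.com'
encoded = 'Contact: &#101;&#109;&#97;&#105;&#108;&#64;&#119;&#101;&#98;&#46;&#99;&#111;&#109;'

# Hypothetical decoding step: turn decimal references such as &#101;
# back into the characters they stand for.
def decode_numeric_entities(html)
  html.gsub(/&#(\d+);/) { Regexp.last_match(1).to_i.chr(Encoding::UTF_8) }
end

plain_hits   = plain.scan(EMAIL_PATTERN)                            # raw address is found
encoded_hits = encoded.scan(EMAIL_PATTERN)                          # no literal "@", so nothing matches
decoded_hits = decode_numeric_entities(encoded).scan(EMAIL_PATTERN) # found again after decoding
```

The last line shows that the encoding is easy to reverse, so the technique only works as long as harvesters skip the decoding step.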

I have set up two e-mail addresses: obscured@truespire.com and un-obscured@truespire.com. If you look at the source code for this post, you will notice that the first e-mail address has been encoded and the second has not. For the sake of this test, I have disabled all spam protection for both accounts so that every possible spam message can get through. I will check back in a few weeks to see which account has more spam and post my findings here.

Findings Update: See comments at the end of this article.

Some Overkill Solutions Using JavaScript

If my simple solution above isn’t enough for you, try these:

  1. Obscuring e-mail addresses via JavaScript:
    This approach is a bit more complex. It involves putting e-mail links in your HTML that are essentially broken (e.g. <span class="e-mail">"Clark Kent" &lt;clark [at] dailyplanet [dot] com&gt;</span>) and then using JavaScript to parse each address back into a usable form. This approach works well but seems like overkill to me. I won’t post the code for this solution because you can find it here: http://nicknotfound.com/2008/12/12/e-mail-address-obfuscation/
  2. Obscuring e-mail addresses via encrypted JavaScript:
    If you really hate spam and are willing to go to the utter extreme to stop it, then go with a solution like this: http://hivelogic.com/enkoder

Ruby To The Rescue

Alright, time to get our hands dirty and obfuscate some e-mail addresses using Ruby. The basic method, posted below, takes an e-mail string as a parameter. First, it creates two arrays, one of lowercase letters and one of uppercase letters. Please note that this method does not convert numbers, dashes, periods and so on, but converting the letters is enough to fake out the spambots (and it would be pretty easy to add the rest). Each letter's index in its array is used to calculate the HTML entity number: lowercase letters start at character code 97 and uppercase letters at 65, so the index plus that offset gives the code. See the HTML ISO-8859-1 Reference for more information on entities.
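As a quick illustration of that arithmetic (a sketch for IRB, not part of the method itself), the letter "e" sits at index 4 in both arrays:

```ruby
lower = ('a'..'z').to_a
upper = ('A'..'Z').to_a

lower_code = lower.index('e') + 97 # index 4 plus the lowercase offset
upper_code = upper.index('E') + 65 # index 4 plus the uppercase offset

# Wrap the number in &# and ; to form the entity.
entity = "&##{lower_code};"
```

Both results agree with what 'e'.ord and 'E'.ord return, which is why the offsets 97 and 65 work.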

The method iterates through the e-mail string, mapping each character to its index in either the lowercase or the uppercase array and calculating the entity number from it. Before the map block returns the calculated value, it wraps it in &# and ; to make the entity syntactically correct for HTML. It also checks for the @ sign and inserts the entity for @ by hand. Any other character passes through unchanged. Before the method returns, it joins the array back into a string. Done, that was easy.

    def obscure_email(email)
        return nil if email.nil? # Don't bother if the parameter is nil.
        lower = ('a'..'z').to_a
        upper = ('A'..'Z').to_a
        email.split('').map { |char|
            # Array index plus ASCII offset gives the entity number ('a' => 0 + 97).
            output = lower.index(char) + 97 if lower.include?(char)
            output = upper.index(char) + 65 if upper.include?(char)
            # Letters become &#NN;, '@' becomes &#0064;, anything else passes through.
            output ? "&##{output};" : (char == '@' ? '&#0064;' : char)
        }.join
    end
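If you do want numbers, dashes and periods encoded as well, one possible variant is sketched below. The name obscure_email_fully is made up for this example, and it swaps the array lookup for String#ord, which returns the character code directly.

```ruby
# Hypothetical variant of obscure_email: encodes every character,
# including digits, dashes and periods, using String#ord.
def obscure_email_fully(email)
  return nil if email.nil? # Same nil guard as the original method.
  email.each_char.map { |char| "&##{char.ord};" }.join
end

encoded = obscure_email_fully('email@web.com')
```

The trade-off is a longer page source, since every character grows to five or six bytes.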

Options For Using This Method In Rails

We have several ways to plug this method into the Rails framework and make it usable in your views:

  1. Create a helper method to be used in your Rails views
    This is probably the approach most Rails programmers would prefer.
  2. Use the method in your Active Record model so you can call it directly on your model
    All you would have to change is to grab the e-mail from the AR object (email = self[:email]) instead of passing it in as a parameter.
  3. Make the method available on any string object
    This approach would allow you to call the method on any string from anywhere in Rails (including views, controllers and models). To do this, add the above method to a module in a file under config/initializers; you can name the file whatever you want. This essentially adds the method to Rails’ ActiveSupport String Inflections module, so in Rails 2.x, where that module exists, the module would be declared like this: module ActiveSupport::CoreExtensions::String::Inflections. The only other change is how you grab the e-mail string. Since you just want to return a string and not manipulate it in place, make email = self.clone the first line of the method and remove the parameter. You can then call it directly in your views, for example: @person.email.obscure_email. Some Rails programmers will certainly frown on this approach, as the method is only relevant to certain strings and not all strings. Thus, I would say no to this approach. I simply posted it because it’s a slick way to work with Rails and Ruby.
  4. If you can think of another good approach, I welcome comments.
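Options 1 and 2 might be wired up roughly as follows. This is only a sketch under stated assumptions: EmailHelper, Person and obscured_email are hypothetical names, Person is a plain Struct standing in for an Active Record model so the code runs outside Rails, and the helper encodes every character with String#ord rather than using the letter arrays. Rails 3 and later escape strings in views by default, so the helper marks its result html_safe when that method is available.

```ruby
# Option 1: a view helper (hypothetical module name).
module EmailHelper
  def obscure_email(email)
    return nil if email.nil?
    # Every character becomes a numeric entity, so no raw markup survives.
    encoded = email.to_s.each_char.map { |char| "&##{char.ord};" }.join
    # html_safe only exists when ActiveSupport is loaded.
    encoded.respond_to?(:html_safe) ? encoded.html_safe : encoded
  end
end

# Option 2: a model method. Person is a stand-in for an Active Record
# model with an email attribute.
Person = Struct.new(:email) do
  include EmailHelper

  def obscured_email
    obscure_email(email)
  end
end

person = Person.new('a@b.c')
result = person.obscured_email
```

In a real view, the helper version would be called as obscure_email(@person.email) and the model version as @person.obscured_email.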

Feel free to try out the above method from your Interactive Ruby shell (IRB) and of course you have my permission to steal this code.

3 Responses to “Obfuscating Email Addresses In Ruby on Rails”

  1. And, what happened? Did it work?
    Best, Joris

  2. Wesley Gooch says:

    Joris,

    Sorry about not posting the results… I totally forgot about it. I just looked at the inboxes for both accounts, and each had only one spam message, identical in both inboxes. I would imagine that someone sent that e-mail manually.

    I looked into why both accounts were lacking any spam, and it turns out that although I had turned off the spam protection for those individual accounts, I had completely forgotten that I had server-wide spam protection turned on. So the test was doomed to fail from day one. At least I know my spam settings are working!

    I can, however, attest that my obscure_email method above is working on some of our commercial projects that run on different servers. Hope the article helped anyway.

    Wes

  3. Wesley Gooch says:

    Joris,

    Here is a link to someone who ran the same test as I did: http://blog.macromates.com/2006/obfuscating-email-addresses/

    His findings were 286 to 1. That is, 286 spam messages were received at the un-encoded address and only 1 at the encoded address. So it seems that other people are also having great results with this approach.

    I hope that helps,
    Wes
