<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>HostingFu &#187; comment</title>
	<atom:link href="http://hostingfu.com/tag/comment/feed" rel="self" type="application/rss+xml" />
	<link>http://hostingfu.com</link>
	<description>Web Hosting Blog by a Software Developer</description>
	<lastBuildDate>Mon, 19 Jul 2010 09:27:08 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0</generator>
		<item>
		<title>Fighting Comment Spams &#8211; There Gotta Be A Better Way</title>
		<link>http://hostingfu.com/article/fighting-comment-spams-there-gotta-be-better-way</link>
		<comments>http://hostingfu.com/article/fighting-comment-spams-there-gotta-be-better-way#comments</comments>
		<pubDate>Sat, 07 Jun 2008 11:35:21 +0000</pubDate>
		<dc:creator>scotty</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[comment]]></category>
		<category><![CDATA[mail]]></category>
		<category><![CDATA[mollum]]></category>
		<category><![CDATA[spam]]></category>

		<guid isPermaLink="false">http://hostingfu.com/?p=161</guid>
		<description><![CDATA[People usually associate spam with unsolicited commercial emails that try to sell you the &#8220;little blue pill&#8221;, or with Nigerians phishing for your bank account details. There are many techniques for fighting email spam, either on the server side or in your email client. However, if you run a blog or a forum on the Internet, [...]]]></description>
			<content:encoded><![CDATA[<p><img src="http://hostingfu.com/files/images/spam-food.jpg" width="250" height="188" alt="SPAM!" style="float:right;margin:0 0 1ex 1ex;padding:3px;border:#ccc solid 1px;"/> People usually associate spam with <a href="http://en.wikipedia.org/wiki/E-mail_spam">unsolicited commercial emails</a> that try to sell you the &#8220;little blue pill&#8221;, or with Nigerians phishing for your bank account details. There are many techniques for fighting email spam, either on the server side or in your email client. However, if you run a blog or a forum on the Internet, you will also have experienced fighting <a href="http://en.wikipedia.org/wiki/Spam_in_blogs">comment spam</a> (unless, of course, you run a spam blog yourself :). I have been blogging since 2001 and have employed various techniques to keep the spam at bay. Some of them worked well, at the beginning, but sooner or later the spammers smartened up, and now they can <em>almost</em> always slip in a few spammy comments.</p>
<p>When I launched this blog 2 years ago, it was running <a href="http://akismet.com/">Akismet</a> for Drupal, and I recently changed to <a href="http://mollom.com/">Mollom</a>, one of <a href="http://buytaert.net/">Dries</a>&#8217; startup projects. It has been <em>effective</em> (except for the last few days). Mollom is similar to Akismet in that it (1) uses a classifier to estimate how likely an incoming comment is to be spam, and (2) acts as a centralised database for collaboratively identifying spam. Mollom does a few extra things when a comment is in a &#8220;not-so-sure&#8221; state, but discussing those is beyond the scope of this blog post.</p>
<p><span id="more-161"></span></p>
<p>Another interesting feature of Mollom is its <a href="http://flex.org/">Flex</a>-based statistics panel, which shows the number of spam comments versus the number of legitimate ones. This is mine over the last 12 days:</p>
<p style="text-align:center"><img src="http://hostingfu.com/files/images/spam-comment.png" width="554" height="369" alt="Spam comments from Mollom" style="padding:3px;border:#ccc solid 1px;"/></p>
<p>As you can see, the ratio of noise to signal is <b>huge</b>: there are many more spam comments than real ones. By the way, even for many of the real comments I am still not so sure about their legitimacy, especially the generic one-liners. As you can see, there was also a big jump today, because quite a few spam comments slipped through.</p>
<p>Although the ratio seems to be in line with most studies online, it still surprises me when I compare it with the signal-to-noise ratio of my email. I run my main MTA at home with Postfix 2.4, and spam is filtered with <a href="http://dspam.nuclearelephant.com/">DSPAM</a>, a fast and lightweight email classification system that yields pretty good results (99.12% currently for my account). Here is the analysis graph over the same period of time (generated by dspam-web).</p>
<p style="text-align:center"><img src="http://hostingfu.com/files/images/spam-email.png" width="500" height="200" alt="Spam emails from DSPAM" style="padding:3px;border:#ccc solid 1px;"/></p>
<p>As you can see from the graph, there are only around <b>20% more</b> spam emails than legitimate emails! That is nothing like the huge gap between this blog&#8217;s spam and legitimate comments.</p>
<p>Now, do we actually get relatively less email spam than comment spam? Not really. What we do have is better email spam-fighting techniques that block <em>most</em> spam emails before they reach the classifying system (DSPAM in this case). What DSPAM gets is therefore an already filtered subset of all the incoming spam. A few techniques I have deployed:</p>
<ul>
<li><a href="http://en.wikipedia.org/wiki/Greylisting">Greylisting</a> (see my <a href="http://hostingfu.com/article/greylisting-spams-with-postfix-gld">previous article</a> on this subject). I actually don&#8217;t put greylisting on my primary MX anymore due to the undesirable delay. However, I found that putting greylisting on my secondary MX is just about as effective, as most spambots pick the last MX entry in DNS to send spam to.</li>
<li>DNS-related filtering, for example ensuring that the sender has an FQDN and a valid hostname. I am surprised to see how many spam emails carry invalid sender addresses.</li>
</ul>
<p>That&#8217;s about it, but many people I know also employ <a href="http://en.wikipedia.org/wiki/DNSBL">RBL</a>, DomainKeys, etc. At least in my case these techniques have effectively reduced the work for the classifier, which also results in less spam getting into my INBOX in the end.</p>
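<p>For readers who have not met greylisting before, here is a rough Python sketch of the idea. It is a toy, not how gld or Postfix implement it: the <code>Greylist</code> class, the 300-second delay, and the in-memory dictionary are all assumptions made up for illustration. The first delivery attempt for a new (client IP, sender, recipient) triplet gets a temporary rejection; a retry after the delay is accepted, which is something most spambots never bother to do.</p>

```python
import time


class Greylist:
    """Toy in-memory greylist keyed on (client IP, sender, recipient)."""

    def __init__(self, min_delay=300, now=time.time):
        self.min_delay = min_delay  # seconds a new triplet must wait (assumed value)
        self.now = now              # injectable clock, handy for testing
        self.first_seen = {}        # triplet -> time of first delivery attempt

    def check(self, client_ip, sender, recipient):
        """Return an SMTP-style verdict for one delivery attempt."""
        triplet = (client_ip, sender.lower(), recipient.lower())
        current = self.now()
        # Remember when this triplet was first seen; later attempts reuse that time.
        first = self.first_seen.setdefault(triplet, current)
        if current - first >= self.min_delay:
            return "250 OK"
        # Temporary failure: a well-behaved MTA queues the mail and retries later.
        return "450 Try again later"
```

<p>A real deployment would also expire old triplets and whitelist known good senders, which this sketch leaves out.</p>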
<p>There gotta be a better way to fight comment spam: something between the web server and the actual application that filters out the obvious spam. <a href="http://www.bad-behavior.ioerror.us/">Bad-Behavior</a>? <a href="http://www.modsecurity.org/">Mod_Security</a>? They also increase undesirable false positives though.</p>
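<p>To make the idea of a cheap filter sitting in front of the application a bit more concrete, here is a minimal Python sketch. It does not reflect how Bad-Behavior or Mod_Security actually work; the function name <code>is_obvious_spam</code>, the two checks, and the link threshold are hypothetical choices of mine. The point is only that some requests can be turned away before they ever reach the classifier.</p>

```python
import re

# Counts URL-like prefixes in the comment text.
LINK_RE = re.compile(r"https?://", re.IGNORECASE)


def is_obvious_spam(headers, comment_body, max_links=3):
    """Return True when a comment POST fails cheap, classifier-free checks."""
    # HTTP header names are case-insensitive, so normalise them first.
    normalized = {name.lower(): value.strip() for name, value in headers.items()}
    if not normalized.get("user-agent"):
        return True  # assumption: a missing User-Agent suggests a crude bot
    if len(LINK_RE.findall(comment_body)) > max_links:
        return True  # assumption: link-stuffed comments are rarely legitimate
    return False
```

<p>Anything this function lets through would still go to Mollom or Akismet. Every extra rule here is also another chance to turn away a real visitor, so the thresholds need care.</p>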
<p>Any more thoughts?</p>
]]></content:encoded>
			<wfw:commentRss>http://hostingfu.com/article/fighting-comment-spams-there-gotta-be-better-way/feed</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
	</channel>
</rss>
