 Spam Filters
 by Sam Holden, in Category Reviews - Sat, Aug 23rd 2003 00:00 PDT

Spam is a growing problem for email users, and many solutions have been proposed, from a postage fee for email to Turing tests to simply not accepting email from people you don't know. Spam filtering is one way to reduce the impact of the problem on the individual user (though it does nothing to reduce the effect of the network traffic generated by spam). In its simplest form, a spam filter is a mechanism for classifying a message as either spam or not spam.


There are many techniques for classifying a message. It can be examined for "spam-markers" such as common spam subjects, known spammer addresses, known mail forwarding machines, or simply common spam phrases. The header and/or the body can be examined for these markers. Another method is to classify all messages not from known addresses as spam. Another is to compare with messages that others have received, and find common spam messages. And another technique, probably the most popular at the moment, is to apply machine learning techniques in an email classifier.

Bayesian Filtering

Paul Graham kicked off a flood of mail filters implementing Bayesian filtering with his "A Plan for Spam" article in August 2002, though it was far from a new concept. In fact, ifile has used a Naive Bayes classification algorithm since August 1996 to automatically file mail into folders. In academic circles, Bayesian methods have been used in text classification for many years, and for spam detection prior to Graham, as evidenced by the 1998 workshop paper A Bayesian Approach to Filtering Junk E-Mail by Sahami, et al.

In a nutshell, the approach is to tokenize a large corpus of spam and a large corpus of non-spam. Certain tokens will be common in spam messages and uncommon in non-spam messages, and certain other tokens will be common in non-spam messages and uncommon in spam messages. When a message is to be classified, we tokenize it and see whether the tokens are more like those of a spam message or those of a non-spam message. How we determine this similarity is what the math is all about. It isn't complicated, but it has a number of variations.
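The tokenize-and-compare idea can be sketched in a few lines of Python. This is only a minimal illustration of a Naive Bayes style score, not the formula used by any of the filters reviewed here; the tokenizer, the add-one smoothing, and the tiny example corpora are all assumptions made for the sketch.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split a message into lowercase word-like tokens (one simple choice of many)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def train(messages):
    """Count how many messages in a corpus contain each token."""
    counts = Counter()
    for message in messages:
        counts.update(set(tokenize(message)))
    return counts


def spam_log_odds(message, spam_counts, ham_counts, n_spam, n_ham):
    """Sum per-token log-likelihood ratios; a positive total leans towards spam."""
    score = math.log(n_spam / n_ham)  # prior odds taken from the corpus sizes
    for token in set(tokenize(message)):
        # Add-one smoothing (an assumption) so unseen tokens never give log(0).
        p_spam = (spam_counts[token] + 1) / (n_spam + 2)
        p_ham = (ham_counts[token] + 1) / (n_ham + 2)
        score += math.log(p_spam / p_ham)
    return score


# Invented miniature corpora, far too small for real use.
spam = ["Buy cheap pills now", "cheap loans, act now"]
ham = ["Seminar announcement for Friday", "Notes from the staff meeting"]
spam_counts, ham_counts = train(spam), train(ham)

print(spam_log_odds("cheap pills", spam_counts, ham_counts, len(spam), len(ham)))
print(spam_log_odds("staff seminar", spam_counts, ham_counts, len(spam), len(ham)))
```

The first score comes out positive (spam-like) and the second negative (non-spam-like). Real filters differ mainly in how they tokenize and how they combine the per-token evidence.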

There's a lot more to it than that (Bayesian methods are used a lot in the AI field, for example, in machine learning and user modelling), but that's all we need to know.

Some Spam Filters

In order to compare some spam filters, a number of filters had to be selected from the large list that is the Freshmeat Topic :: Communications :: Email :: Filters category.

The selection was restricted by only considering free software and only filters that didn't use network resources in their classification. The filters were further restricted to those that could be executed as standalone programs, read a message from standard input, and indicate via their output or their exit value whether it was spam or not.
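To make the "standalone program" restriction concrete, here is a rough Python sketch of how such a filter can be driven: the message goes in on standard input and the exit value is read back. The function name, the stand-in filter command, and the convention that exit status 0 means "spam" are assumptions for illustration; each real filter documents its own convention.

```python
import subprocess
import sys


def classify_with_filter(command, message_bytes, spam_exit_code=0):
    """Pipe one message to a filter command and interpret its exit status.

    spam_exit_code is an assumption; check each filter's documentation.
    """
    result = subprocess.run(
        command,
        input=message_bytes,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == spam_exit_code


# Stand-in "filter" for demonstration only: exits 0 if the message
# mentions "free money", 1 otherwise.
toy_filter = [
    sys.executable,
    "-c",
    "import sys; sys.exit(0 if b'free money' in sys.stdin.buffer.read().lower() else 1)",
]

print(classify_with_filter(toy_filter, b"Subject: FREE MONEY inside\n\nClick here."))
print(classify_with_filter(toy_filter, b"Subject: seminar on Friday\n\nRoom 12."))
```

Running every filter through the same kind of wrapper is what makes it possible to record their classifications in a uniform way.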

Several filters satisfying the restrictions were downloaded, and a few of those were removed due to problems with installation or execution. In the end, seven filters were used, five of which were Bayesian.

The version of each filter that was available for download on the third of July 2003 was used. Although the email was filtered in bulk in August, it was actually received during July, so it made sense to filter it with July's versions of the programs. The filters are:

Bayesian Mail Filter
A Bayesian filter that aims to be smaller, faster, and more versatile than other Bayesian filters. Version 0.9.4 was used.
Bogofilter
A Bayesian filter designed for speed, for use at sites which process a large amount of mail. Version 0.13.7.2 was used.
dbacl
A digramic Bayesian filter, not restricted to just spam and non-spam. This mail filter will classify a message into one of many categories. Version 1.3.1 was used.
Quick Spam Filter
Another small, fast Bayesian filter. Version 0.5.9 was used.
SpamAssassin
A filter which uses a wide range of heuristic tests to determine whether a message is spam, each test adding or subtracting from a score. Messages over a threshold score are declared spam. Version 2.55 was used.
SpamProbe
The final Bayesian filter. Version 0.8b was used.
SPASTIC
A collection of procmail recipes which tag a message as spam if it matches any of a number of heuristic tests. This filter is not really a standalone program, but since a previous comparison with SpamAssassin was criticized, I provided a procmail wrapper so it could be included in the comparison. Version 3.0 was used.

The Email Data

The email used in the testing consisted of my email from the month of July 2003. The mail consisted of 1,273 messages, of which 1,073 were spam. For the Bayesian filters, a training set of 68 spam messages and 68 non-spam messages was used (my email from the second half of June, with a random sample of spam messages from the same period).

The messages used were all hand-classified as spam or non-spam.

Methodology

Each program was installed according to its documentation. For the filters that required training, the training set data was supplied. Each filter was then taken in turn and executed once for each email in the spam and non-spam sets, and the classification it gave was recorded.

Default options were used for the filters in all cases.

The aim was to examine the filtering abilities of the packages. Hence, whitelists were not used, even though, in practice, they probably would be. Some analysis was done to see how much performance would be improved by whitelists.

Results

The standard metrics for text classification are recall and precision. For spam filtering, we are trying to correctly classify spam messages as spam and to avoid incorrectly classifying non-spam messages as spam. Spam classified as non-spam is known as a false negative. Non-spam classified as spam is known as a false positive.

Precision is the percentage of messages that were classified as spam that actually are spam. High precision is essential to prevent the messages we want to read being classified as spam. A low precision indicates that there are many false positives.

Recall is the percentage of actual spam messages that were classified as spam messages. High recall is necessary in order to prevent our inbox filling with spam. A low recall indicates that there are many false negatives.

False positives are generally considered far worse than false negatives. Viewing a spam is better than not getting an important message. Hence, precision is a more important measure than recall, though, of course, a low recall makes a filter useless.
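As a quick illustration of the two metrics, the following Python sketch computes them from the error counts. The function name and the example counts are invented for illustration; they are not taken from the experiments.

```python
def precision_recall(total_spam, false_positives, false_negatives):
    """Return (precision, recall) as fractions between 0 and 1."""
    true_positives = total_spam - false_negatives  # spam correctly flagged
    flagged = true_positives + false_positives     # everything flagged as spam
    precision = true_positives / flagged if flagged else 0.0
    recall = true_positives / total_spam if total_spam else 0.0
    return precision, recall


# Hypothetical run: 1,000 spam messages, 10 good messages wrongly flagged,
# 200 spam messages missed.
precision, recall = precision_recall(1000, 10, 200)
print(f"precision {precision:.1%}, recall {recall:.1%}")
```

Note that only false positives pull precision down, and only false negatives pull recall down.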

Experiment One

For the first test of the filters, the 68 spam and 68 non-spam training messages were used to train the filters that required training. Then, the set of 1,273 messages was classified by each of the filters, the results of which are shown in Table 1:

Table 1: Experiment One
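| Filter               | Precision | Recall | False Positives | False Negatives | Correct Classifications |
|----------------------|-----------|--------|-----------------|-----------------|-------------------------|
| SpamProbe            | 100.0%    | 47.9%  | 0               | 559             | 714                     |
| Bogofilter           | 100.0%    | 34.4%  | 0               | 704             | 569                     |
| Bayesian Mail Filter | 100.0%    | 11.0%  | 0               | 955             | 318                     |
| SpamAssassin         | 99.9%     | 80.0%  | 1               | 215             | 1057                    |
| dbacl                | 99.0%     | 47.0%  | 5               | 569             | 699                     |
| Quick Spam Filter    | 97.0%     | 56.5%  | 19              | 467             | 787                     |
| SPASTIC              | 89.0%     | 46.0%  | 61              | 579             | 633                     |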

SpamAssassin is the only filter that has a recall rate worth using.

I think it's reasonably clear that the Bayesian filters did not have large enough training sets, and hence are only achieving low recall rates.

Experiment Two

For the second test, the training data consisted of the original 68 spam and 68 non-spam training messages, plus the first 100 non-spam messages and the first 500 spam messages of the email data.

All the filters were run on the remaining email data, 100 non-spam messages and 573 spam messages, producing the results shown in Table 2:

Table 2: Experiment Two
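| Filter               | Precision | Recall | False Positives | False Negatives | Correct Classifications |
|----------------------|-----------|--------|-----------------|-----------------|-------------------------|
| Bogofilter           | 100.0%    | 81.7%  | 0               | 98              | 538                     |
| SpamProbe            | 99.8%     | 97.2%  | 1               | 15              | 620                     |
| SpamAssassin         | 99.8%     | 78.7%  | 1               | 114             | 521                     |
| Bayesian Mail Filter | 99.6%     | 93.8%  | 2               | 33              | 601                     |
| dbacl                | 99.2%     | 89.0%  | 4               | 59              | 573                     |
| Quick Spam Filter    | 94.9%     | 79.1%  | 23              | 112             | 501                     |
| SPASTIC              | 88.5%     | 43.3%  | 30              | 304             | 302                     |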

Those results are more along the lines of how Bayesian filters are expected to perform. Quick Spam Filter and Bogofilter have noticeably lower recall than the other Bayesian filters, and Quick Spam Filter's precision is too low to be useful.

SpamAssassin is now showing a significantly lower recall rate than most of the Bayesian filters. It should be noted that, in practice, SpamAssassin will likely use a few more metrics (using network resources), and hence should do a little better than these results indicate. Also, SpamAssassin has a Bayesian classifier built in, but it wasn't used in these tests, since having five was enough.

That SpamAssassin is not better than the bulk of the other filters is a good sign for email filtering. Bayesian filters are reasonably easy to implement and require no knowledge of what differentiates spam from other email. SpamAssassin's rules, on the other hand, need to be developed by people and probably account for most of the work in creating the software.

SPASTIC has both significantly lower precision and recall than the other filters. Since people actually do use it to filter mail, it must be suitable for some email profiles, but for my email, it isn't usable.

Examining the False Positives

SpamProbe and SpamAssassin both generated one false positive, and it was caused by the same message. That message was essentially an advertisement for a conference, and many people would classify it as spam. However, I attended the previous conference, and I don't mind this showing up in my inbox. It has a number of spam-like properties. "HTML only" is a big one. It is also generically addressed ("Dear Friends"). The From: address looks like it might be auto-generated due to some digits (icce2003@...). Basically, it's spam that I didn't mind receiving. The address it's from could easily be entered into a whitelist to solve the problem, but it could also be argued that it should be classified as spam. I actually didn't read it when it turned up in my inbox in real life (I don't bother with HTML-only email), though it did remind me of the conference.

Bayesian Mail Filter also misclassified the message discussed above, as well as a message from my Web hosting provider announcing a server move and a little resulting downtime. Clearly, that is a message I want to receive. However, it was sent from the email address of my hosting provider, an address from which I expect to receive mail I want and which could easily be entered into a whitelist. In fact, it's the type of address that should be put on a whitelist, since valid commercial messages look a lot like unsolicited commercial messages.

dbacl gave four false positives, one of which was the conference advertisement mentioned above. Another was a message detailing administrative responsibilities of staff. It was from someone who doesn't send spam, and that address could easily be added to a whitelist.

It also flagged a forwarded IBM PhD Program nomination advertisement. This is another message that is essentially spam, but it was intentionally sent to a list I am on by a staff member. Again, a whitelist would catch this. The final false positive was a second copy of the IBM PhD Program email, this time forwarded by someone else to another list I am on.

Quick Spam Filter produced 23 false positives. These included the conference announcement and the hosting provider announcement mentioned above. A dozen or so newsletters were flagged as spam, as were a few commercial messages that were not unsolicited and a couple of messages from my wife. Whitelists can solve these problems quickly and easily. The false positives that are not easily fixed are the real problem, so I'll focus on those.

An email bounce notice was flagged as spam. A whitelist can't solve this without a fair amount of effort, since the address is determined by the machine on which I happen to run the "netfile" command.

A message requesting I contact a person about something which "needs urgent attention" was flagged as spam. This is what spam filtering nightmares are made of, especially when the email originates from an Associate Dean. Whitelists don't help, since Associate Deans change and I had never heard of this person before I received this message. The reply to my reply to this message was also flagged as spam.

Four seminar announcements were flagged as spam. Since the sender is often different, a whitelist won't fix this.

SPASTIC produced 30 false positives. The vast majority of these were newsletters, solicited commercial messages, and "calendar" reminder messages (which have no subjects), all of which cause problems easily solved by a whitelist.

SPASTIC also flagged an important message as spam, this time from my supervisor with the subject "URGENT". Putting my supervisor in a whitelist is reasonable, I guess, but this highlights the problem with SPASTIC's method of tagging a message as spam if any single test for spam succeeds. This particular message was not spam-like in any way, except for the subject.
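The difference between tagging on any single matching test and adding up scores against a threshold can be sketched as follows. The rules, scores, and threshold here are invented for illustration and are not taken from SPASTIC or SpamAssassin.

```python
# (name, test, score) tuples: all three rules and their scores are hypothetical.
RULES = [
    ("subject_urgent", lambda m: "urgent" in m["subject"].lower(), 1.5),
    ("html_only", lambda m: m["html_only"], 2.0),
    ("generic_greeting", lambda m: "dear friend" in m["body"].lower(), 2.5),
]
THRESHOLD = 5.0  # illustrative cut-off for the scoring approach


def any_rule_flags(message):
    """Tag as spam if any single test matches."""
    return any(test(message) for _, test, _ in RULES)


def score_flags(message, threshold=THRESHOLD):
    """Tag as spam only if the summed scores of matching tests reach the threshold."""
    total = sum(score for _, test, score in RULES if test(message))
    return total >= threshold


supervisor_mail = {
    "subject": "URGENT",
    "body": "Please see me about your thesis draft.",
    "html_only": False,
}
conference_ad = {
    "subject": "Urgent: call for papers",
    "body": "Dear Friends, registration is open.",
    "html_only": True,
}

print(any_rule_flags(supervisor_mail), score_flags(supervisor_mail))
print(any_rule_flags(conference_ad), score_flags(conference_ad))
```

With these made-up rules, the supervisor's message is tagged by the any-rule approach because of its subject alone, but stays well under the score threshold. The spam-like advertisement is tagged by both.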

Two more messages were tagged as spam which were not spam, but not from people I would put on a whitelist, since I wouldn't expect email from them.

So, allowing for whitelists, we generate the false positives shown in Table 3:

Table 3: With Whitelists
| Filter               | False Positives |
|----------------------|-----------------|
| Bogofilter           | 0               |
| SpamProbe            | 0               |
| SpamAssassin         | 0               |
| Bayesian Mail Filter | 0               |
| dbacl                | 0               |
| SPASTIC              | 3               |
| Quick Spam Filter    | 6               |

Experiment Three

For the third test, the 1,273 pieces of July's mail were used as the training set. The testing set was the first week of August's mail: 252 mails, 210 of which were spam. The results are shown in Table 4. The low SpamAssassin and SPASTIC recalls indicate that my spam was quite different from what they expect spam to look like.

Table 4: Experiment Three
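| Filter               | Precision | Recall | False Positives | False Negatives | Correct Classifications |
|----------------------|-----------|--------|-----------------|-----------------|-------------------------|
| Bayesian Mail Filter | 100.0%    | 99.0%  | 0               | 2               | 250                     |
| SpamProbe            | 100.0%    | 98.1%  | 0               | 4               | 248                     |
| Bogofilter           | 100.0%    | 86.2%  | 0               | 29              | 223                     |
| SpamAssassin         | 100.0%    | 59.0%  | 0               | 86              | 166                     |
| dbacl                | 99.3%     | 64.8%  | 1               | 74              | 177                     |
| Quick Spam Filter    | 98.3%     | 85.8%  | 7               | 67              | 439                     |
| SPASTIC              | 84.8%     | 31.9%  | 12              | 143             | 97                      |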

Experiment Four

For the fourth experiment, the 200 non-spam messages from July's mail were combined with 200 spam messages randomly selected from July's mail to make the training set. The testing set was the same as in the previous experiment. SpamAssassin and SPASTIC were therefore not tested again; since they don't use the training data, they would have the same results as in Table 4.

Table 5: Experiment Four
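| Filter               | Precision | Recall | False Positives | False Negatives | Correct Classifications |
|----------------------|-----------|--------|-----------------|-----------------|-------------------------|
| Bayesian Mail Filter | 100.0%    | 91.0%  | 0               | 19              | 233                     |
| SpamProbe            | 100.0%    | 87.1%  | 0               | 27              | 225                     |
| Bogofilter           | 100.0%    | 67.1%  | 0               | 69              | 183                     |
| Quick Spam Filter    | 95.7%     | 63.8%  | 6               | 76              | 170                     |
| dbacl                | 99.2%     | 55.7%  | 1               | 93              | 158                     |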

The results in Table 5 show that all the Bayesian filters do worse than they did in Experiment Three, so a training set with a large amount of spam is better than a smaller, balanced training set. This conflicts with the documentation for sa-learn, SpamAssassin's Bayesian classifier (not used in these tests), which says, "You should aim to train with at least the same amount (or more if possible!) of ham data [as] spam."

Experiment Five

None of the previous experiments has been very scientific; they have merely indicated how the various filters performed on various data sets. In order to produce some numbers with which it may be possible to objectively compare the filters, we will follow the methodology used in a technical report by Androutsopoulos, et al.: Learning to Filter Spam E-Mail: A Comparison of a Naive Bayesian and a Memory-Based Approach.

The data set used was all my email from the month of July. This was partitioned randomly into ten equally-sized sets, each containing 107 spam messages and 20 non-spam messages. Three spam messages were left over and were discarded. For each of the ten sets, the other nine sets were combined to make up the training set, and the filter was then tested on the held-out set. Hence, each filter was run ten times. The average precision and recall of the filters over those ten tests are shown in Table 6:
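The partitioning can be sketched in Python as below. This is a minimal reconstruction of the procedure as described, not the script actually used; the function name, the fixed random seed, and the placeholder messages are assumptions.

```python
import random


def ten_fold_splits(spam, ham, folds=10, seed=0):
    """Yield (training_set, test_set) pairs, one per fold.

    Each fold gets the same number of spam and non-spam messages;
    any leftovers that do not divide evenly are discarded.
    """
    rng = random.Random(seed)
    spam, ham = spam[:], ham[:]  # shuffle copies, not the caller's lists
    rng.shuffle(spam)
    rng.shuffle(ham)
    spam_per_fold = len(spam) // folds
    ham_per_fold = len(ham) // folds
    parts = [
        spam[i * spam_per_fold:(i + 1) * spam_per_fold]
        + ham[i * ham_per_fold:(i + 1) * ham_per_fold]
        for i in range(folds)
    ]
    for i, test_set in enumerate(parts):
        training_set = [m for j, part in enumerate(parts) if j != i for m in part]
        yield training_set, test_set


# Placeholder messages standing in for 1,073 spam and 200 non-spam emails.
spam_messages = [("spam", n) for n in range(1073)]
ham_messages = [("ham", n) for n in range(200)]

splits = list(ten_fold_splits(spam_messages, ham_messages))
print(len(splits), len(splits[0][0]), len(splits[0][1]))
```

With 1,073 spam and 200 non-spam messages, each test set holds 127 messages and each training set 1,143, and the three spare spam messages are never used.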

Table 6: Experiment Five Precision and Recall
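| Filter               | Precision (Mean) | Precision (Std. Dev.) | Recall (Mean) | Recall (Std. Dev.) |
|----------------------|------------------|-----------------------|---------------|--------------------|
| Bogofilter           | 99.9%            | 0.291%                | 94.4%         | 1.564%             |
| SpamAssassin         | 99.9%            | 0.370%                | 80.0%         | 2.871%             |
| dbacl                | 99.4%            | 0.844%                | 79.2%         | 4.541%             |
| SpamProbe            | 99.1%            | 0.720%                | 99.0%         | 0.882%             |
| Bayesian Mail Filter | 98.9%            | 1.221%                | 98.9%         | 0.815%             |
| Quick Spam Filter    | 96.5%            | 1.935%                | 89.5%         | 3.894%             |
| SPASTIC              | 89.1%            | 3.640%                | 46.1%         | 4.053%             |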

For our objective analysis, we will use the metrics defined in the technical report linked above. Some measure of the relative cost of false positives to false negatives is needed in order to do this. Androutsopoulos, et al. suggest using a measure in which each non-spam is treated as equivalent to a number of spam messages. That number can be tweaked to represent just how bad false positives are to the user. We'll call this weight FPW (false positive weight).

The variables we will define are:

FPW:False Positive Weight.
CCNS:Correctly Classified Non-Spam Messages.
CCS:Correctly Classified Spam Messages.
ICNS:Incorrectly Classified Non-Spam Messages.
ICS:Incorrectly Classified Spam Messages.
NS:Non-spam messages.
S:Spam Messages.

The Weighted Accuracy of the filter is then defined as: (FPW*CCNS + CCS)/(FPW*NS + S). It will be expressed as a percentage.

The Total Cost Ratio (please see the technical report for the justification of this metric) is then defined as: S/(FPW*ICNS + ICS). If the Total Cost Ratio is less than one, the filter is doing worse than no filter at all. The higher the Total Cost Ratio, the better.
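Both metrics are simple enough to write down as Python functions, using the variable names defined above and taking Weighted Accuracy as (FPW*CCNS + CCS)/(FPW*NS + S). The example counts are invented for illustration, and returning infinity for a run with no mistakes is my own choice for the sketch.

```python
def weighted_accuracy(fpw, ccns, ccs, ns, s):
    """(FPW*CCNS + CCS) / (FPW*NS + S), as a fraction between 0 and 1."""
    return (fpw * ccns + ccs) / (fpw * ns + s)


def total_cost_ratio(fpw, icns, ics, s):
    """S / (FPW*ICNS + ICS); infinite when the filter made no mistakes."""
    cost = fpw * icns + ics
    return float("inf") if cost == 0 else s / cost


# Hypothetical filter run: 200 non-spam and 1,000 spam messages,
# 1 non-spam misclassified and 50 spam misclassified, with FPW = 9.
fpw, ns, s, icns, ics = 9, 200, 1000, 1, 50
print(f"{weighted_accuracy(fpw, ns - icns, s - ics, ns, s):.1%}")
print(f"{total_cost_ratio(fpw, icns, ics, s):.2f}")
```

Raising FPW makes each false positive count for more, which lowers the Total Cost Ratio of any filter that makes even a few of them.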

Tables 7, 8, 9, and 10 show the results for four values of FPW. The Weighted Accuracy and Total Cost Ratio were calculated by summing all the variables across all ten runs, and not by calculating them ten times, then averaging. Doing this prevents infinite Total Cost Ratio scores (when no mistakes are made by a filter on one run).
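A small Python sketch shows why the counts are pooled before the ratio is taken. The per-run counts below are invented for illustration; the function name is hypothetical.

```python
def pooled_total_cost_ratio(runs, fpw):
    """Sum the counts over all runs first, then take a single ratio.

    runs is a list of (spam_in_test_set, false_positives, false_negatives).
    """
    total_spam = sum(spam for spam, _, _ in runs)
    total_cost = sum(fpw * fp + fn for _, fp, fn in runs)
    return float("inf") if total_cost == 0 else total_spam / total_cost


# Three invented runs; the first one contains no mistakes at all.
runs = [(107, 0, 0), (107, 0, 2), (107, 1, 1)]

# A per-run ratio would divide by zero for the first run, so an average
# of per-run ratios is undefined. Pooling gives one finite figure.
print(pooled_total_cost_ratio(runs, fpw=9))
```

Here the pooled ratio is 321 spam messages divided by a total cost of 12, while the first run on its own would have no finite ratio.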

Table 7: False Positive Weight: 1
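| Filter               | Weighted Accuracy | Total Cost Ratio |
|----------------------|-------------------|------------------|
| SpamProbe            | 98.3%             | 51.0             |
| Bayesian Mail Filter | 98.1%             | 45.0             |
| Bogofilter           | 95.2%             | 17.5             |
| Quick Spam Filter    | 88.4%             | 7.28             |
| SpamAssassin         | 83.1%             | 4.98             |
| dbacl                | 82.0%             | 4.69             |
| SPASTIC              | 49.8%             | 1.68             |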

Table 8: False Positive Weight: 9
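| Filter               | Weighted Accuracy | Total Cost Ratio |
|----------------------|-------------------|------------------|
| Bogofilter           | 97.6%             | 15.5             |
| SpamProbe            | 96.5%             | 10.6             |
| Bayesian Mail Filter | 95.8%             | 8.92             |
| SpamAssassin         | 92.2%             | 4.80             |
| dbacl                | 90.7%             | 3.99             |
| Quick Spam Filter    | 85.1%             | 2.51             |
| SPASTIC              | 60.8%             | 0.95             |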

Table 9: False Positive Weight: 99
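| Filter               | Weighted Accuracy | Total Cost Ratio |
|----------------------|-------------------|------------------|
| Bogofilter           | 99.2%             | 6.73             |
| SpamAssassin         | 98.5%             | 3.42             |
| dbacl                | 96.6%             | 1.49             |
| SpamProbe            | 95.2%             | 1.07             |
| Bayesian Mail Filter | 94.3%             | 0.89             |
| Quick Spam Filter    | 82.9%             | 0.30             |
| SPASTIC              | 68.3%             | 0.16             |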

Table 10: False Positive Weight: 999
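| Filter               | Weighted Accuracy | Total Cost Ratio |
|----------------------|-------------------|------------------|
| Bogofilter           | 99.5%             | 1.01             |
| SpamAssassin         | 99.4%             | 0.88             |
| dbacl                | 97.4%             | 0.21             |
| SpamProbe            | 95.0%             | 0.11             |
| Bayesian Mail Filter | 94.0%             | 0.09             |
| Quick Spam Filter    | 82.5%             | 0.03             |
| SPASTIC              | 69.4%             | 0.02             |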

If the Total Cost Ratio is greater than 1, the filter is worth using if the False Positive Weight is an accurate representation of the relative costs of errors. A False Positive Weight of 1 is only realistic for the case in which email is being marked by the filter, but still placed in your inbox for manual removal. If that is how you plan to use a filter, SpamProbe or Bayesian Mail Filter are the best options, according to Table 7.

A False Positive Weight of 9 might be appropriate if you are filtering spam messages to a folder which you check every day. In that case, Bogofilter, SpamProbe, and Bayesian Mail Filter all look reasonable, according to Table 8.

A False Positive Weight of 99 might be an accurate representation for someone who checks the spam folder each week for false positives. In this case, Bogofilter and SpamAssassin are the most worthwhile filters.

A False Positive Weight of 999 would represent a set-and-forget spam filter which sends spam to the bit bucket. In this case, Bogofilter is the only option, and it isn't any better than no filter.

Personally, I check my spam folder a few times each day. It only takes a second to glance at the new subjects and check the sender for the subjects that look like they might not be spam. So, for me, a False Positive Weight of 9 is appropriate.

The graph below gives an indication of how the filters compare at a range of False Positive Weights:

[Figure: FPW Graph, comparing the filters across a range of False Positive Weights]

It's important to note that the Total Cost Ratio isn't a perfect metric. It gives classifying a forwarded joke from an annoying coworker as spam exactly the same score as classifying an urgent message from your boss or partner as spam.

Conclusion

The Bayesian filters, after training, offer better recall than the two heuristic filters. Catching a higher proportion of spam is clearly good, since that is the reason people use them. With insufficient training, however, the Bayesian filters perform poorly in comparison with SpamAssassin in terms of recall.

Based upon the results for my email, SpamProbe and Bayesian Mail Filter have usable recall percentages and acceptable precision. Four spam messages a week is much more bearable than 210, and well worth the minor effort involved in setting up one of these filters. If false positives are especially bad to you, Bogofilter is the best choice, according to my email.

SPASTIC is useless for my email, since it lets through far too much spam and marks some legitimate messages as spam messages. SpamAssassin is better; it lets through more spam than the Bayesian filters, but has enough precision to at least not hide wanted email.

Quick Spam Filter performs poorly when compared with the other Bayesian filters. I suspect it will improve in future versions, since clearly the underlying mechanism (Bayesian filtering) isn't the problem.

dbacl is similar to SpamAssassin in performance. However, it should be noted that dbacl can classify into multiple folders, not just spam and non-spam. This extra functionality may cause its performance to be less than that of the other Bayesian filters, but if you use that functionality, the tradeoff might be worthwhile.

Recommendations

If you want to filter spam out of your email, I strongly suggest not automatically deleting messages. File the spam away, just in case you get false positives. Any spam which isn't picked up by your filters should be manually moved to the spam folder, not deleted. The same is true for your real email; instead of deleting it, move it to another folder. That way, you'll build a collection of spam and non-spam messages, which will come in handy for training filters.

Start by filtering with SpamAssassin. The Bayesian filters don't work well if you don't train them, and you can't train them without having a collection of your past email (both spam messages and non-spam messages). A non-learning filter makes it easy to build this collection.

Watch for false positives. You really do need to scan the spam folder every so often to check for items that shouldn't have been flagged as spam, especially if you ever move to a learning filter. Otherwise, it will learn that some valid messages are spam messages. If your filter supports whitelists (if not, you can always add a whitelist to a chain of filters), use them. If friends' email gets flagged as spam, add them to the whitelist. It will save you time and lost messages in the end. If you can find the inclination, adding people to your whitelist preemptively should help avoid false positives.
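Adding a whitelist in front of a filter that lacks one takes very little code. The following Python sketch is one way to do it; the addresses, the function names, and the stand-in content filter are all hypothetical. Bear in mind that From: addresses can be forged, so a whitelist like this trades a little safety for convenience.

```python
from email import message_from_string
from email.utils import parseaddr

# Hypothetical addresses; in practice this would be loaded from a file.
WHITELIST = {"supervisor@example.edu", "support@hosting.example.com"}


def is_spam(raw_message, content_filter, whitelist=WHITELIST):
    """Let whitelisted senders straight through; ask the filter about the rest."""
    message = message_from_string(raw_message)
    _, sender = parseaddr(message.get("From", ""))
    if sender.lower() in whitelist:
        return False
    return content_filter(raw_message)


def paranoid_filter(raw_message):
    """Stand-in content filter that flags everything, for demonstration only."""
    return True


urgent = "From: Boss <Supervisor@example.edu>\nSubject: URGENT\n\nSee me today."
stranger = "From: deals@shop.example.net\nSubject: URGENT\n\nBuy now."

print(is_spam(urgent, paranoid_filter))
print(is_spam(stranger, paranoid_filter))
```

The whitelisted message never reaches the content filter, so even a filter that trips on the "URGENT" subject can't misfile it.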

Once you have enough spam messages and non-spam messages correctly classified, you can think about using a Bayesian filter. You really want a few hundred of each type, preferably more. You also want to make sure there isn't an unintended identifying feature of the spam messages or non-spam messages. For example, don't use non-spam messages from the past 6 months and only the last month of spam messages; the learning algorithm might decide that messages with old dates are non-spam messages and messages with new dates are spam messages. Don't try to pad the numbers with duplicates; it will overtrain the filter on the features in those messages.

Moving to a learning filter is a good thing, since keeping up-to-date with the latest rules isn't necessary. The learning algorithm won't get worse with time, since it will learn the ever-changing look of spam. (At least until spammers make their spam look very much like non-spam messages.)

Once you are using a learning filter, you must remember to train it every so often. If you don't, the performance will deteriorate as your email usage changes. Of course, deteriorating performance is a great reminder to do some training. Training will be easy, since you will have a nice collection of classified spam messages and non-spam messages, and you will have corrected by hand any misclassifications the filter makes. Don't just blindly feed the filter's own classifications back in as training data; it will reinforce any mistakes. Another option is to train it only on the messages it misclassified (the false positives and false negatives), to correct the mistakes.
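Training only on the mistakes can be sketched as a simple loop over hand-classified mail. The toy word-counting filter below merely stands in for a real learning filter, and its interface (train and classify) is an assumption for the sketch; the real programs each have their own commands for this.

```python
from collections import Counter


class TinyFilter:
    """Toy learning filter: compares word counts seen in spam and non-spam."""

    def __init__(self):
        self.spam_words = Counter()
        self.ham_words = Counter()

    def train(self, text, is_spam):
        target = self.spam_words if is_spam else self.ham_words
        target.update(text.lower().split())

    def classify(self, text):
        words = text.lower().split()
        spam_hits = sum(self.spam_words[w] for w in words)
        ham_hits = sum(self.ham_words[w] for w in words)
        return spam_hits > ham_hits  # ties count as non-spam


def train_on_errors(spam_filter, hand_classified):
    """Retrain only on messages the filter got wrong; return how many there were."""
    corrections = 0
    for text, is_spam in hand_classified:
        if spam_filter.classify(text) != is_spam:
            spam_filter.train(text, is_spam)  # use the hand-checked label
            corrections += 1
    return corrections


mail = [
    ("cheap pills now", True),
    ("staff meeting notes", False),
    ("cheap loans now", True),
]
tiny = TinyFilter()
print(train_on_errors(tiny, mail))
print(train_on_errors(tiny, mail))
```

The important detail is that the label passed to train always comes from the hand classification, never from the filter's own guess.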

Try spam filtering. It puts the joy back into email.


Author's bio:

Sam Holden is a seemingly eternal student who is expecting his first child in mere weeks, and hence will actually be finishing university and getting a "real job" Real Soon Now.


 Referenced categories

Topic :: Communications :: Email :: Filters

 Referenced projects

Apache SpamAssassin - An extensible email filter that is used to identify spam.
Bayesian Mail Filter - A fast and efficient Bayesian mail filter.
bogofilter - A Bayesian spam filter.
dbacl - A system for statistically classifying text into user-defined categories.
ifile - A Bayesian email filter.
Quick Spam Filter - A fast statistical spam filter.
SpamProbe - A content-based spam detection program.
SPASTIC - Simple Procmail Anti-Spam Templates (improved code).

 Comments

[»] Excellent Article
by Dan - Apr 12th 2006 05:54:21

I know I'm very late in here, but just wanted to say well done on an excellent article Sam! Even though your article is now very old it still has a lot of relevance today!

--
Dan Field ClearMyMail Ltd. dan.field@clearmymail.com Guaranteed 100% successful spam filter. + Requires no client software + Works on any platform (Unix, Windows, Mac etc) + Clears all spam e-mail, for good! + Guaranteed success + Affiliate programme now available!



[»] Banning foreign sites - No SPAM now
by Kevin B Evv - Feb 21st 2006 08:47:54

At my little community-oriented Internet site, I have banned almost all (the ones I could find) foreign IP addresses from Europe, Asia, South America, etc. I was getting 3 spams per day to my system accounts, and now I get fewer than 3 spams per month. I also have far fewer assaults via the network on my web server, email server, etc. This solution won't fit many of your situations, but it would fit some. Do you really care if someone in China cannot send you e-mail or browse your website? I'm sure you wouldn't care if someone in Europe could not remotely admin your Internet site. You may send email directly to me if you would like my list of foreign IPs or a detailed description of how I do this. No marketing, just my free tech info. Kevin

--
http://www.evvpages.com http://www.bachcottage.com



[»] Antispam relay!
by vint - Jan 22nd 2006 05:07:52

Thanks for this article, it helped me too. I prefer SpamAssassin + ClamAV + mailfilter. I configured an SMTP relay server, which kills 99% of spam. Other companies sell comparable hardware spam filters for $9K, but we can get it almost for free!



[»] Checked and marked very useful.
by Micky Willow - Apr 8th 2005 08:18:45

Thanks for that great article. I work at a small webhosting company, and your text helped answer most of the major questions about implementing a spam filter for our mail servers. I hope you will write more articles like this one in the future.

--
Just asking ... http://www.basharama.com



[»] Bookmarked
by sid007 - Dec 4th 2004 11:54:31

Thank you so much for this much needed article! Spam :Grr:

Looking for other alternatives to fight spam, and trying to implement a new anti-spam measure in addition to SpamAssassin at the hosting site.

Page bookmarked!

--
FavWebLinks Affordable Web Hosting Free Web Templates



[»] Why not just Stop Spam on the servers - period.
by telnetfusion - Aug 12th 2004 02:53:45

Hi All,

With all the Spam as well as security scare around emails and the need for an anti-virus software to protect from those Spam as well as worms, trojans, and viruses that spread thru email, why not simply add a simple XML based "RSS feed" like feature on top of the email server.

A user then simply selects their server as an RSS Feed Source that they subscribe to.

A sligthly modified "email client" with XML parser could then be built on top of an RSS Feed Reader / Aggregator that would stop emails and Spams from being downloaded and pushed down to clients like in POP3 but users will get only the email subject and headers like "news-feeds" so they could decide and select which ones they need to read (download from the server) and the rest can be nuked straight at the server ?

I think - once most spams can be stopped this way in their tracks on the servers - and end-users (many now-a-days PC Neophytes) stop helping in its propagation over the net - the tidal wave of spam can be greatly reduced and managed.

Sounds like quite a simple solution isn't it ? Can any one see anything wrong with this ?

- Syed Ahmad

--
Resistance is Futile, You will be assimilated"



    [»] Re: Not that easy
    by Melvin - Oct 17th 2004 23:15:03

    It ain't that easy, there are many messages and addresses NOT SPAM that could be trapped in those rules, I have for example, an old e-mail address (since 1997) that have been used by some infected servers to send spam, but it wasn't in fact me or my PC at all. An blacklist server would classify my e-mail as spam but in fact I never sent one of those e-mails and it would get banned. This spam crap is getting very annoying. Every e-mail should be signed by default to track spammers.



    [»] Re: Why not just Stop Spam on the servers - period.
    by modbot - Dec 8th 2004 10:47:48

    Syed, Interesting point and this would be much easier to add with a Virtual Private Server

    --
    eYell.com :: Where the fun never stops Web Hosting Reviews



    [»] Re: Why not just Stop Spam on the servers - period.
    by Kevin - Dec 20th 2004 10:58:50

    I just now stumbled across this great article. Syed, I wanted to respond to your post about RSS feed.

    I'm using an outsourced spam/virus filter: Sentinare PostGuard that uses both Bayesian Filtering and SpamAssassin as well as 'greylisting', and 'tarpitting' a multi pronged approach that really works great.

    But the main feature is the web based quarantine for training the filter. Really quite impressive, and as you mentioned, they have an RSS feed of your quarantined items so you dont have to login to the quarantine to check for any false positives. Even though the accuracy is like 99.87% for me, its nice to have the RSS feed anyways.. and add in IMAP and TLS support, geeez. The best. Sentinare knows email!


    > Hi All,
    >
    > With all the Spam as well as security
    > scare around emails and the need for an
    > anti-virus software to protect from
    > those Spam as well as worms, trojans,
    > and viruses that spread thru email, why
    > not simply add a simple XML based
    > "RSS feed" like feature on top
    > of the email server.
    >
    > A user then simply selects their server
    > as an RSS Feed Source that they
    > subscribe to.
    >
    > A slightly modified "email
    > client" with XML parser could then
    > be built on top of an RSS Feed Reader /
    > Aggregator that would stop emails and
    > Spams from being downloaded and pushed
    > down to clients like in POP3 but users
    > will get only the email subject and
    > headers like "news-feeds" so
    > they could decide and select which ones
    > they need to read (download from the
    > server) and the rest can be nuked
    > straight at the server ?
    >
    > I think - once most spams can be stopped
    > this way in their tracks on the servers
    > - and end-users (many now-a-days PC
    > Neophytes) stop helping in its
    > propagation over the net - the tidal
    > wave of spam can be greatly reduced and
    > managed.
    >
    > Sounds like quite a simple solution
    > isn't it ? Can any one see anything
    > wrong with this ?
    >
    > - Syed Ahmad



[»] Spamassassin
by sk6307 - Jul 20th 2004 10:20:25

While this is an excellent article about spam filtering software, it should be noted that SpamAssassin is not just a spam filter. It is a frontend to a number of different filters, RBL checks, and other tests for spam.

For example, SpamAssassin can be configured to check the Razor, Pyzor, and DCC content-based blacklists for the message, and it can also run other spam filters as part of the filtering process.

It is very easy to extend using Perl scripting.

Users are creating a number of additional checks for SpamAssassin that keep it up to date with modern spamming techniques, e.g. Rules du Jour and Rules Emporium.

Also, it is possible to configure SpamAssassin to use any of the filters mentioned in the article (and most others) as part of the filtering process. I personally recommend dspam or crm114.

SpamAssassin also integrates very well with MTAs, and can run in a high-performance daemon mode.

--
sk6307@btinternet.com



    [»] Re: Spamassassin
    by Henrik - Sep 14th 2004 07:34:38

    Agreed, I also recommend dspam.

    Henrik
    autofaaaaaaahn.com



    [»] Re: Spamassassin
    by modbot - Dec 8th 2004 10:31:04

    Yes, this was an excellent article about spam filtering software. Along with SpamAssassin and a module plugin sid you can have the ultimate spam protection.

    --
    eYell.com :: Where the fun never stops Web Hosting Reviews



    [»] Re: Spamassassin
    by julia12 - Apr 27th 2005 00:34:17


    > While this is an excellent article about
    > spam filtering software, it should be
    > noted that spamassassin is not just a
    > spam filter. It is a frontend to a
    > number of different filters and RBL
    > checks and other tests for spam.
    >
    > For example, spamassassin can be
    > configured to check the razor, pyzor,
    > and dcc content based blacklists for
    > the message, it can also run other spam
    > filters as part of the filtering
    > process.
    >
    > It is very easy to extend using perl
    > scripting.
    >
    > There are a number of additional checks
    > that are being created by users for
    > spamassassin which keeps it up to date
    > with modern spamming techniques, e.g.
    > Rules du Jour and
    > rules emporium.
    >
    > Also, it is possible to configure
    > spamassassin to use any of the filters
    > mentioned in the article (and most
    > others) as part of the filtering
    > process. I personally recommend dspam or
    > crm114.
    >
    > Spamassassin also integrates very well
    > with MTAs, and can run in a high
    > performance daemon mode.


    I have installed SpamAssassin on FreeBSD. Yes, SpamAssassin is very handy, but in my case it takes a lot of processor time and overloads the system.

    --
    Anti Spam Tips from Julia



[»] Bayesian observations and applications
by Mat Farrington - May 5th 2004 19:49:22

1. I noticed a steady drop-off in the performance of my bogofilter after a period of excellent performance.

After some experimentation I had to conclude that my problem was with using too much email for training. (Would it be correct to refer to this as over-training?)

I decided to freshen up my email, and train bogofilter using only the most recent 6 months' worth (and I wrote a script to automate and maintain this window).

Performance has gone to excellent again.

2. I am experimenting with using Bayesian filtering for general mailbox direction. Instead of training with only the classes "spam" and "non-spam", I am also using my work, mailing list, domain-rego, etc. mailboxes to train, so that the filter can redirect mail into those boxes too.

There are probably a wide number of tasks that can benefit from this approach.
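The six-month window described in point 1 can be sketched roughly as follows. This is not the poster's script, which isn't shown; it only illustrates selecting the messages whose Date header falls inside a sliding window, using Python's standard library. The function names, the 183-day default, and the choice to skip undated mail are all assumptions made for illustration. Feeding the selected messages to bogofilter is left out.

```python
from datetime import datetime, timedelta, timezone
from email.message import EmailMessage
from email.utils import format_datetime, parsedate_to_datetime


def within_window(msg, now, days=183):
    """Return True if the message's Date header falls in the last `days` days."""
    raw = msg.get("Date")
    if raw is None:
        return False  # assumption: undated mail is left out of the training set
    try:
        sent = parsedate_to_datetime(raw)
    except (TypeError, ValueError):
        return False  # unparseable date: skip rather than guess
    if sent.tzinfo is None:
        sent = sent.replace(tzinfo=timezone.utc)  # assume UTC when no zone is given
    return now - timedelta(days=days) <= sent <= now


def training_window(messages, now, days=183):
    """Keep only the messages recent enough to train on."""
    return [m for m in messages if within_window(m, now, days)]


def make_message(subject, sent):
    """Build a small dated message, used here only to exercise the sketch."""
    msg = EmailMessage()
    msg["Subject"] = subject
    msg["Date"] = format_datetime(sent)
    msg.set_content("body")
    return msg
```

Run periodically, something like this would rebuild the training set from scratch each time, so that old mail drops out of the window instead of accumulating.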



[»] Wow, great article.
by Buy Gifts - Apr 10th 2004 00:15:13

That is just an amazing amount of information. I've been reading up a lot about spam lately; it is becoming more and more of an issue for everyone. Thanks again for a great article.

--
Digital Cameras - Biz



    [»] Re: Wow, great article.
    by Polesoft Inc. - Jun 22nd 2004 23:55:07

    Has anyone tried Lockspam from www.polesoft.com? It works with all POP3 mail clients and is free to use. You can choose to order the Pro version, but you can also keep using the Free version forever.



      [»] Re: Wow, great article.
      by Free_Anti_Spam - Sep 4th 2004 04:29:31

      Thanks for your message! We've tested your Lockspam Free and it's really impressive! We've recommended it at our www.free-anti-spam.com. Please have a look there. Have a good day! Jerry
      > Has anyone tried Lockspam from
      > www.polesoft.com? It will work with all
      > POP3 mail clients and is free to use.
      > You could choose to order a Pro version,
      > but you could still choose to use the
      > Free version for ever.
      >
      >



    [»] Another way to stop Spam
    by BozMo - Jul 23rd 2004 02:43:33

    The system I have sends a challenge back in response to the first email from any source and, if the challenge is answered correctly, adds the source to a whitelist. It seems to work pretty well.

    --
    BozMo



[»] spamd
by Jacek Artymiak - Mar 4th 2004 11:58:54

When you find out that you get far too much spam from one or more addresses, you might use spamd. It uses fewer resources and is more effective than spam filters. I'd call it a spammer filter :-)

You can learn more about it from:

Using spamd

--
Jacek Artymiak freelance consultant & writer http://www.artymiak.com e-mail: jacek|at|artymiak|com



[»] Better Bayes
by lomedhi - Feb 5th 2004 11:02:21

I think we're overlooking a possibility that would combine the effectiveness of rule-based and Bayes filtering. Words and phrases need not be the only Bayes tokens scored. Any property of a message, i.e. the result of evaluating any rule, can be a Bayes token. POPFile implements this to some degree with "pseudowords", but I would really like to see what would happen if a large ruleset like SpamAssassin's were fully tokenized. Forget about manual scoring -- why not let Bayes figure it out?
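A rough sketch of the idea, assuming two made-up rules and a deliberately tiny naive Bayes scorer (this is not how POPFile or SpamAssassin are implemented): each rule that fires contributes a pseudo-token, and the classifier weighs it exactly like a word. The rule names, the `RULE:` prefix, and the add-one smoothing are illustrative choices only, and the scorer looks only at tokens that are present in the message.

```python
import math
import re
from collections import Counter

# Hypothetical rules, invented for illustration; a real ruleset would be far larger.
RULES = {
    "RULE:SUBJECT_ALL_CAPS": lambda subject, body: subject.isupper(),
    "RULE:MANY_EXCLAMATIONS": lambda subject, body: body.count("!") >= 3,
}


def tokenize(subject, body):
    """Return word tokens plus one pseudo-token per rule that fires."""
    tokens = set(re.findall(r"[a-z']+", (subject + " " + body).lower()))
    tokens.update(name for name, rule in RULES.items() if rule(subject, body))
    return tokens


class NaiveBayes:
    """Minimal two-class scorer; rule tokens and word tokens are treated alike."""

    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.totals = {"spam": 0, "ham": 0}

    def train(self, label, subject, body):
        self.counts[label].update(tokenize(subject, body))
        self.totals[label] += 1

    def spam_log_odds(self, subject, body):
        """Positive means spam is more likely; uses add-one smoothing."""
        score = math.log((self.totals["spam"] + 1) / (self.totals["ham"] + 1))
        for tok in tokenize(subject, body):
            p_spam = (self.counts["spam"][tok] + 1) / (self.totals["spam"] + 2)
            p_ham = (self.counts["ham"][tok] + 1) / (self.totals["ham"] + 2)
            score += math.log(p_spam / p_ham)
        return score
```

With this arrangement nobody assigns a score to a rule by hand: a rule that fires mostly on spam ends up with a high spam probability simply because its token was seen mostly in spam during training.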



[»] SpamBully - Bayesian spam filter
by Gregory - Dec 3rd 2003 03:51:34

I'm an experienced spam filter user. I used to use SpamInspector, and also used SpamArrest (online server) to add a challenge email step to one of my accounts...
Recently I discovered another filter at www.spambully.com.
SpamBully was rated BEST BUY in the October 2003 WIRED Magazine:
http://www.wired.com/wired/archive/11.10/play.html?pg=10

I have downloaded the trial, and love it!
SpamBully does it all - I have even been able to delete all of my old, manually entered spam-filtering rules - the Bayesian wizard is phenomenal out-of-the-box! Now that mine is trained, I am seeing 99%+ accuracy (in fact, for the past 24 hours, I've seen 100% accuracy!)



    [»] Re: SpamBully - Bayesian spam filter
    by Hal F. Gottfried - Dec 20th 2003 19:08:51

    I used SpamBayes, which is free and works well, but I also use Spam Arrest and I must say it is worth its weight in gold. It stops spam at the source, unlike some other services that say they do the same thing. I'm going to try your SpamBully; we'll see if it does better than the free one.


    > I'm an experienced spam filter
    > user. I used to use SpamInspector, plus
    > used SpamArrest (online server) to add a
    > challenge email step to one of my
    > accounts...
    >
    > recently I discovered another filter at
    > www.spambully.com.
    > SpamBully was rated BEST BUY in October
    > 2003 WIRED Magazine:
    > http://www.wired.com/wired/archive/11.10/play.html?pg=10
    >
    >
    > I have downloaded trial, and love it!
    > SpamBully does it all - I have even been
    > able to delete all of my old, manually
    > entered spam-filtering rules - the
    > Bayesian wizard is phenomenal
    > out-of-the-box! Now mine is trained, I
    > am seeing 99%+ accuracy (in fact, for
    > the past 24 hours, I've seen 100%
    > accuracy!)
    >



    [»] Re: SpamBully - Bayesian spam filter
    by Vitalie Esanu - Mar 16th 2004 07:05:16

    Yes, I also like SpamBully. It works fine for me, and I have tested it on around 50000 messages.

    > I'm an experienced spam filter
    > user. I used to use SpamInspector, plus
    > used SpamArrest (online server) to add a
    > challenge email step to one of my
    > accounts...
    >
    > recently I discovered another filter at
    > www.spambully.com.
    > SpamBully was rated BEST BUY in October
    > 2003 WIRED Magazine:
    > http://www.wired.com/wired/archive/11.10/play.html?pg=10
    >
    >
    > I have downloaded trial, and love it!
    > SpamBully does it all - I have even been
    > able to delete all of my old, manually
    > entered spam-filtering rules - the
    > Bayesian wizard is phenomenal
    > out-of-the-box! Now mine is trained, I
    > am seeing 99%+ accuracy (in fact, for
    > the past 24 hours, I've seen 100%
    > accuracy!)
    >



[»] Virus and virus bounces...
by Sam - Oct 4th 2003 09:15:00

I was planning on rerunning the tests with more data last month (when I would have more mail). However, I've been flooded with what I think are viruses, virus bounces, "we detected a virus" notices, and fake Microsoft patches.

For August there are 1335 emails in my junk folder. For September there are 23231. 22MB for August, 2902MB for September.

It's probably too late for anyone to read the comments on this article (being a couple of months old) but just in case I'd like to ask for opinions on what should be done with those emails?

Should they be eliminated from the tests, since they aren't spam? Should they be classified as spam? Should they be classified as non-spam?

Or should "junk" filters be tested as opposed to "spam" filters?

After all not having to wade through 20,000 virus/worm/whatever emails a month is of more value to me than not having to wade through 'only' 1,300 spams...




[»] Quick Spam Filter's accuracy
by Andrew Wood - Aug 30th 2003 05:24:31

Just thought I'd point out that since this article was written, QSF has got a lot more accurate - version 0.9.0 now comes in about third in the final test above, according to the article's author.



[»] Details(?)
by antrik - Aug 27th 2003 17:49:11

Thanks for the article -- you have taken considerable measures to get a useful comparison, and except for the omission of SpamAssassin with learning and for a few wrong conclusions (not important enough to go into), it's really good.

However, two people already asked about products not included, and I'm the third one to do so: What about ifile? This was the one I was most interested in. (It was the only one I have found so far that does more than binary classification. The one in this test performs too poorly to be useful.) I understand that you didn't want to fiddle with programs that didn't want to work out of the box; but at least you could mention which ones failed, and why.

I really hope you can get some more programs running, and repeat the tests. (With an even bigger test set, if possible.)



    [»] Re: Details(?)
    by Sam - Aug 27th 2003 21:45:33


    > Thanks for the article -- you have
    > taken considerable measures to get a
    > useful comparison, and except for the
    > omission of SpamAssassin with learning
    > and for a few wrong conclusions (not
    > important enough to go into), it's
    > really good.

    There wasn't enough training data for spamassassin to turn on its learning algorithm anyway - and I tested all of them without any tweaking of config options.


    > However, two people already asked about
    > products not included, and I'm the third
    > one to do so: What about ifile? This was
    > the one I was most interested in. (As it
    > was the only one I found so far doing
    > not only binary classification. The one
    > in this test performs too poor to be
    > useful.) I understand that you didn't
    > want to fiddle with programs that didn't
    > want to work out of the box; but at
    > least you could mention which ones
    > failed, and why.

    It should be noted that my use of dbacl wasn't perfect. I didn't tell it I was classifying email (via a command line option) and hence its email-specific parsing features weren't used. I've exchanged mail with the author of dbacl about this and I'll enable those options next time. I reran the final experiment for dbacl with those options and the precision improved (though not significantly) and the recall dropped by almost half. I suspect I did something wrong :)

    As for ifile I didn't use it because for some reason I couldn't find the freshmeat entry for it (I swear I searched for it...). Obviously it has one, since it is linked from the article now :)

    I'll include it next time. I plan to rerun the tests in a month or so. I'll post the results summary here, but won't do another freshmeat article since it would be a bit redundant and boring. I'll probably stick a longer write up on the web somewhere.


    >
    > I really hope you can get some more
    > programs running, and repeat the tests.
    > (With an even bigger test set, if
    > possible.)

    I plan to. I should have a couple more months of email so the data set will be three times the size. I was pointed to the SpamAssassin corpus as well, which I'll use (separately). It will suffer from the same SpamAssassin problem - the blacklists can't be used for old emails...




[»] Use multiple filters
by Tom Van Vleck - Aug 25th 2003 09:09:32

(Evaluating SpamAssassin without its Bayes feature enabled is probably non-useful. Who would do that? The question is, what slips past SA with Bayes turned on?)

I run four spam filters. The combination gets just about everything. First is a short blacklist of repeated spammers. Then SpamBouncer (www.spambouncer.org), SpamAssassin with Bayes enabled, and simple filtering on the last hop before the mail was delivered to my ISP. (90% of all spam is sent from servers that either have no reverse DNS or are identified as a dialup, cable, or DSL line.)

Every day, I get some spam that is caught by only one of these filters. Since I got the last three set up, I get one or two false negatives a month: i.e. spam seen as legitimate. I see a few pieces of mail a month that get false positives until whitelisted. Last month I got over 17,000 mail messages, 95% spam, and the percentage is gradually rising every month.
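The "caught by only one of these filters" observation can be checked mechanically. Below is a minimal sketch, assuming each filter is reduced to a yes/no function over a message; the three stand-in filters and the dictionary message format are invented for illustration and are not the actual SpamBouncer, SpamAssassin, or last-hop checks.

```python
from collections import Counter


def combine(filters, message):
    """Return the names of every filter that flags the message as spam."""
    return [name for name, is_spam in filters if is_spam(message)]


def caught_by_exactly_one(filters, messages):
    """Count, per filter, the messages that only that filter caught."""
    sole = Counter()
    for message in messages:
        hits = combine(filters, message)
        if len(hits) == 1:
            sole[hits[0]] += 1
    return sole


# Hypothetical stand-ins for the stages; the real ones are external programs.
BLACKLIST = {"spammer@example.com"}
FILTERS = [
    ("blacklist", lambda m: m["from"] in BLACKLIST),
    ("keywords", lambda m: "viagra" in m["body"].lower()),
    ("last_hop", lambda m: m["last_hop_rdns"] is None),  # no reverse DNS
]
```

A message counts as spam if `combine` returns any name at all. A filter that never shows up in the `caught_by_exactly_one` tally is redundant for that mail stream; one that shows up often is pulling its weight.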

The spammers and the filters are in a continual arms race. Whoever commits first loses: you can filter any fixed kind of spam, and you can design messages to evade any fixed filter. But the delay in designing filters will leave some incentive for spammers, and the rising cost of filtering may eventually cause some people to give up.

As one of the first creators of an email program I am saddened by its misuse. See www.multicians.org/thvv/mail-history.html for what I remember.



[»] Pre-server spam filtering to be reviewed in Network World
by Joel Snyder - Aug 24th 2003 08:41:21

As an interesting corollary to this article, I have submitted a review to Network World which will be published in two weeks on 16 enterprise-sized spam filters, including their spam filtering performance and speed, as well as a host of other features. I'll come back and post a URL for reference when it is available on the web.

I can't share the results before publication, but I can say that we looked at a very different set of products, specifically those which take SMTP in and feed out SMTP, so these would be considered prefilters before an enterprise mail server. Because Network World is aimed at corporate network managers, I didn't directly review any open source products, but several of the commercial products, of course, have open source cores. I also considered a very different set of requirements. For example, in a network with 10,000 users, individual training of filters wouldn't be practical except in a whitelist/blacklist sense.

Anyway, if you found this interesting but think that you need a more commercial answer to spam's problems for large numbers of users, then I'd recommend you take a look at that article when it comes out.



    [»] Re: Pre-server spam filtering to be reviewed in Network World
    by Joel Snyder - Sep 15th 2003 07:36:27

    The review is now published and can be read at:

    http://www.nwfusion.com/reviews/2003/0915spam.html

    Joel Snyder



      [»] Re: Pre-server spam filtering to be reviewed in Network World
      by David Walker - May 13th 2004 02:06:52


      > The review is now published and can be
      > read at:
      >
      >
      >
      > http://www.nwfusion.com/reviews/2003/0915spam.html
      >
      >
      > Joel Snyder


      I have used several spam filters, but the most consistently effective one I have run into is MailMate. I'm very happy with it and would like to recommend it to anyone.

      You can find it at: Here

      --
      David Walker



        [»] Re: Pre-server spam filtering to be reviewed in Network World
        by andreo - Feb 18th 2006 21:06:58


        > % The review is now published and can be
        > % read at:
        > %
        > % http://www.nwfusion.com/reviews/2003/0915spam.html
        > %
        > % Joel Snyder
        >
        > I have used several spam filters, but
        > the most consistently effective one I
        > have run into is MailMate. I'm very
        > happy with it and would like to
        > recommend it to anyone.
        >
        > You can find it at: Here
        >

        Thanks for the link. I had tried many spam filters but didn't have any success. I tried this one and it looks good for now.

        --
        andreo



    [»] Re: Pre-server spam filtering to be reviewed in Network World
    by lagasek - Jan 15th 2004 02:12:49


    > ...
    > enterprise mail server. Because Network
    > World is aimed at a corporate networks
    > managers, I didn't directly review any
    > open source products, but several of the
    > commercial products, of course, have
    > open source cores...
    >

    Hmm... I'm a corporate "network" manager, if one
    considers system management to be network
    management. Anyway, I just wanted to point out that
    you aren't doing the readership of a supposedly
    technical magazine any favors when leaving out open-
    source in comparison. You might be doing the
    advertisers a favor but the fact is that most managers
    are going to be familiar with open-source
    implementations before commercial ones in nearly all
    cases.

    Including SpamAssassin directly is the obvious
    benchmark in my book but probably not a favorable
    comparison to knockoffs when price is factored in.



[»] suggestion
by Jonas Bofjall - Aug 24th 2003 01:07:56

Thank you for an interesting article. I had hoped
to see crm114.sourceforge.net included as well. It
uses a superset of Bayesian classification and is
said to perform better.



    [»] Re: suggestion
    by Sam - Aug 24th 2003 04:58:43


    > Thank you for an interesting article. I
    > had hoped
    > to see crm114.sourceforge.net included
    > as well. It
    > uses a superset of Bayesian
    > classification and is
    > said to perform better.
    >

    It was one of the packages that didn't compile/install/run on my first attempt, and hence wasn't used.



[»] You're asking way too much out of a spam filter comparison
by Nuclear Elephant - Aug 23rd 2003 21:07:56

I think a lot of you are asking far too much out of a spam filter comparison article. Yes, I think a feature matrix would be nice (client-side, server-side, trainable, etc.), but outside of a feature matrix, any test run on spam filters is going to be specific to the user's email behavior. There's no tried and true way to get effective tests for any spam tool unless you try it yourself. There are just way too many variables:

- How predictable is the email?
- Since training is an ongoing process, when does the initial "training" stop and the measurements start?
- At what rate are spams and innocent messages interleaved (spam ratio)?
- How are false positives handled by each system? (Some simply re-learn, others re-learn violently.)
- How many spammy sounding newsletters or lists is the user subscribed to?
- Is the user a member of any forums that may border on spammy sounding mail (such as a fetish list, coupon collectors' list, etcetera)?
- At what time interval does each user submit messages into the system for re-learning? (This directly affects the messages that arrive in between.)
- How is the spam tool tuned? Several have knobs to adjust.
- How many total messages does the user receive?

... just to name a few.

Bottom line, even the best of reports you see around the net today should still be labeled with "your mileage may vary". Unless you're subscribed to the same mailing lists and receive the same emails as the person running the test, your results will be different (at least they will if your spam tool is any good at what it does). Any spam tool worth its web page will be accurate in the high 90%s; once you get beyond that, it's a matter of "well, this worked better for me."



[»] Training Questions
by Nuclear Elephant - Aug 23rd 2003 19:16:19

I'm curious to know if the initial training is all that was performed. As you know, Bayesian filters learn from their mistakes, so I would like to know if false positives were also put back into the system to be retrained by any of the tools that supported it. I would also be interested in different reports based on different training thresholds: while the minimum threshold for a particular filter might be x, if you train to x2, what difference does it make? Since most of these tools are in it for the long haul, it would be very interesting to see how much more effective they became over different periods of time. Some tools may be very ineffective at 1000 emails and 100% accurate at 2000 emails. Measuring the ramp-up cycle would be nice; your graphs do something of the sort, but I don't see any hard data.
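One way the ramp-up measurement could be set up is sketched below, under stated assumptions: each message is classified before the filter is trained on it, every message (mistakes included) is fed back, and accuracy is reported per block of messages. The `ramp_up` function and the toy `WordMemory` classifier are invented for illustration; a real run would wrap an actual filter's train and classify commands.

```python
def ramp_up(train, classify, stream, checkpoint=2):
    """Classify each message before training on it; report accuracy per block.

    `stream` is a list of (message, is_spam) pairs in arrival order.
    Returns a list of (messages_seen, accuracy_in_block) tuples.
    """
    results, correct = [], 0
    for seen, (message, is_spam) in enumerate(stream, start=1):
        if classify(message) == is_spam:
            correct += 1
        train(message, is_spam)  # every message, including mistakes, is fed back
        if seen % checkpoint == 0:
            results.append((seen, correct / checkpoint))
            correct = 0
    return results


class WordMemory:
    """Toy classifier, invented for illustration: flags words seen only in spam."""

    def __init__(self):
        self.spam_words, self.ham_words = set(), set()

    def train(self, message, is_spam):
        target = self.spam_words if is_spam else self.ham_words
        target.update(message.lower().split())

    def classify(self, message):
        words = set(message.lower().split())
        return bool(words & (self.spam_words - self.ham_words))
```

Plotting the returned pairs for each filter would give the hard data behind a ramp-up curve, and changing `checkpoint` changes how finely the curve is sampled.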





[»] IMAP support?
by Dig Dug - Aug 23rd 2003 17:27:30

It seems that every spam filter review misses a very important detail -- does it support IMAP and/or SSL? Every spam filter I've tried requires plaintext (as in insecure) POP3 support on the server -- something I'm not willing to use.

A spam filter that supports IMAP and can file e-mail into different IMAP folders would be greatly appreciated.



    [»] Re: IMAP support?
    by Sam - Aug 23rd 2003 18:23:52


    > It seems that every spam filter review
    > misses a very important detail -- does
    > it support IMAP and/or SSL? Every spam
    > filter I've tried requires plaintext (as
    > in insecure) POP3 support on the server
    > -- something I'm not willing to use.
    > A spam filter that supports IMAP and can
    > file e-mail into different IMAP folders
    > would be greatly appreciated.

    I restricted it to filters which run as standalone programs reading a single email from stdin (or a file) and indicating whether it was spam or not. I did that because that is the kind of filter I personally used, and because it made testing practical (no need for an actual MTA, MUA, or POP or IMAP server). Spastic was an exception, since it relies on procmail, but since it had been compared with SpamAssassin before, I thought it worthwhile to include it and hence wrote a script around procmail.

    Any decent MUA will provide a way to run a filter over an email and put it in a folder based on the result. Or something like procmail can be used on the server end.
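A test harness for that kind of filter might look like the sketch below. It is not the script used for the article. It assumes the filter signals "spam" with exit status 0, which is a convention chosen here for illustration (each real filter defines its own), and it uses a one-line Python program as a stand-in for a real filter binary.

```python
import subprocess
import sys


def is_spam(command, raw_email):
    """Feed one message to `command` on stdin; exit status 0 is taken to mean spam."""
    result = subprocess.run(command, input=raw_email, capture_output=True)
    return result.returncode == 0


def precision_recall(command, labelled):
    """`labelled` is a list of (raw_email_bytes, truly_spam) pairs."""
    tp = fp = fn = 0
    for raw_email, truly_spam in labelled:
        flagged = is_spam(command, raw_email)
        tp += flagged and truly_spam          # spam correctly flagged
        fp += flagged and not truly_spam      # legitimate mail flagged
        fn += truly_spam and not flagged      # spam that slipped through
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Stand-in filter for demonstration: a one-line Python program that exits 0
# when the message contains "viagra". A real filter command would go here.
FAKE_FILTER = [
    sys.executable, "-c",
    "import sys; sys.exit(0 if b'viagra' in sys.stdin.buffer.read().lower() else 1)",
]
```

Precision here is the fraction of flagged messages that really were spam, and recall is the fraction of all spam that got flagged. Because the harness only needs stdin and an exit status, no MTA, MUA, or mail server is involved.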



    [»] Re: IMAP support?
    by dustwun - Aug 23rd 2003 18:27:54

    The filters don't do this because it's not the job of the filter. It is the job of the SMTP/LMTP server, procmail, etc. Since these are server-side filters, the email server is what makes or breaks the security.

    > It seems that every spam filter review
    > misses a very important detail -- does
    > it support IMAP and/or SSL? Every spam
    > filter I've tried requires plaintext (as
    > in insecure) POP3 support on the server
    > -- something I'm not willing to use.
    > A spam filter that supports IMAP and can
    > file e-mail into different IMAP folders
    > would be greatly appreciated.



[»] All spam filters fail in comparison...
by Gunfighter - Aug 23rd 2003 12:49:45

... to TMDA. It's not a filter, rather a whitelist/blacklist based challenge/response system. I installed it on my ISP's mailserver and customers are bombarding our phones asking us to install it for them (especially w/ the SoBig.F virus making its rounds).

http://tmda.net/

or

http://freshmeat.net/projects/tmda/

-- Gun



    [»] Re: All spam filters fail in comparison...
    by Marc Merlin - Aug 23rd 2003 14:50:37


    > ... to TMDA. It's not a filter, rather a
    > whitelist/blacklist based
    > challenge/response system. I installed
    > it on my ISP's mailserver and customers
    > are bombarding our phones asking us to
    > install it for them (especially w/ the
    > SoBig.F virus making its rounds).

    Argh....
    This is the worst and most rude solution.

    I do not want to have to ack your TMDA mails. I automatically block you from all my mail servers as soon as I receive a confirmation mail, and as a listmaster I've done the same on any site where people sent TMDA-like ack messages.

    Think of a mail admin sending a message and getting back 10,000+ TMDA messages (for instance, for a monthly mailman password reminder), except that the biggest list server I administered had 400,000 users...



      [»] Re: All spam filters fail in comparison...
      by Gunfighter - Aug 23rd 2003 15:23:55


      > % ... to TMDA. It's not a filter, rather a
      > % whitelist/blacklist based
      > % challenge/response system. I installed
      > % it on my ISP's mailserver and customers
      > % are bombarding our phones asking us to
      > % install it for them (especially w/ the
      > % SoBig.F virus making its rounds).
      >
      > Argh....
      > This is the worst and most rude
      > solution.
      >
      > I do not want to have to ack your TMDA
      > mails. I automatically block you from
      > all my mail servers as soon as I receive
      > a confirmation mail, and as a listmaster
      > I've done the same on any site where
      > people sent TMDA-like ack messages.
      >
      > Think of a mail admin sending a message
      > and getting back 10,000+ TMDA messages
      > (for instance, for a monthly mailman
      > password reminder), except that the
      > biggest list server I administered had
      > 400,000 users...

      When used properly, it's very effective. This includes list subscriptions like your mailman newsletters. If you just throw it up and don't take care of it by actively (and proactively) managing your whitelist, it's useless.

      Also, you'd better get used to those confirmations. Until some major, radical change is enacted with a protocol (like SMTP) to combat spam, these challenge/response systems are going to be more and more commonplace.

      TMDA responds to the "return-path" header address. I personally use ezmlm to manage my lists, and the return-path header automatically removes these persons with the "ack" emails coming back to the system.

      -- Gun



        [»] Re: All spam filters fail in comparison...
        by simmons75 - Aug 23rd 2003 21:51:44


        > % % ... to TMDA. It's not a filter, rather a
        > % % whitelist/blacklist based
        > % % challenge/response system. I installed
        > % % it on my ISP's mailserver and customers
        > % % are bombarding our phones asking us to
        > % % install it for them (especially w/ the
        > % % SoBig.F virus making its rounds).
        > %
        > % Argh....
        > % This is the worst and most rude
        > % solution.
        > %
        > % I do not want to have to ack your TMDA
        > % mails. I automatically block you from
        > % all my mail servers as soon as I receive
        > % a confirmation mail, and as a listmaster
        > % I've done the same on any site where
        > % people sent TMDA-like ack messages.
        > %
        > % Think of a mail admin sending a message
        > % and getting back 10,000+ TMDA messages
        > % (for instance, for a monthly mailman
        > % password reminder), except that the
        > % biggest list server I administered had
        > % 400,000 users...
        >
        > When used properly, it's very effective.
        > This includes list subscriptions like
        > your mailman newsletters. If you just
        > throw it up and don't take care of it by
        > actively (and proactively) managing your
        > whitelist, it's useless.
        >
        > Also, you'd better get used to those
        > confirmations. Until some major, radical
        > change is enacted with a protocol (like
        > SMTP) to combat spam, these
        > challenge/response systems are going to
        > be more and more commonplace.
        >
        > TMDA responds to the "return-path"
        > header address. I personally use ezmlm
        > to manage my lists, and the return-path
        > header automatically removes these
        > persons with the "ack" emails coming
        > back to the system.
        >
        > -- Gun

        I've got to agree with Marc. Challenge-Response is
        rude. Really rude. Want me to do business with
        you? Don't make me do challenge-response. If I
        have to jump through hoops, I'm going elsewhere.
        Ditto if you just want to communicate via email. If
        you can't make it as simple as firing off an email and
        getting a response, it's a waste of my time. This is
        10x more annoying (and more of a waste of time)
        than voicemail menus.

        If you see Challenge-Response as any sort of solution
        to a spam problem, please, just go away. I don't
        want to deal with you. You're beneath notice.



          [»] Re: All spam filters fail in comparison...
          by Tombstone0 - Aug 25th 2003 07:10:57


          > I've got to agree with Marc. Challenge-Response is
          > rude. Really rude. Want me to do business with
          > you? Don't make me do challenge-response. If I
          > have to jump through hoops, I'm going elsewhere.
          > Ditto if you just want to communicate via email. If
          > you can't make it as simple as firing off an email and
          > getting a response, it's a waste of my time. This is
          > 10x more annoying (and more of a waste of time)
          > than voicemail menus.
          >
          > If you see Challenge-Response as any sort of solution
          > to a spam problem, please, just go away. I don't
          > want to deal with you. You're beneath notice.
          >

          There are no hoops.
          I realize this is very difficult but if you look up at the
          top of your e-mail client you'll see a little reply button.
          Press it then press send (or just press r in pine).
          You see, no hoops.



            [»] Re: All spam filters fail in comparison...
            by antrik - Aug 27th 2003 17:24:25


            > There are no hoops.
            > I realize this is very difficult but if you look up at the
            > top of your e-mail client you'll see a little reply button.
            > Press it then press send (or just press r in pine).
            > You see, no hoops.

            You have no right to annoy the whole world with your stupid challenge messages just because you do not want to be annoyed by SPAM.

            Answering one such request is considerably more work than deleting a dozen SPAMs -- and what's worse, it's work you force on others.

            Some people seem to get fanatical about SPAM. I know it's annoying, but maybe you should do a reality check. You may notice a "little" discrepancy between motivation and means.



              [»] Re: All spam filters fail in comparison...
              by Goat Tosser - Dec 29th 2003 04:34:21



              > You have no right to annoy the whole
              > world with your stupid challenge
              > messages just because you do not want to
              > be annoyed by SPAM.

              Interesting. Someone does not have the "right" to determine who they wish to talk to, but everyone else has the "right" to send what they wish to whomever they wish, when they wish. What "gives you the right" to determine who can or cannot have rights?

              A reality check might be in order - there are no rights in nature, merely consequences. Rights are man made and they go hand in hand with responsibility.



            [»] Re: All spam filters fail in comparison...
            by minorgod - Sep 25th 2004 10:06:46


            > There are no hoops.
            > I realize this is very difficult but if you look up at the
            > top of your e-mail client you'll see a little reply button.
            > Press it then press send (or just press r in pine).
            > You see, no hoops.

            It's not about jumping through hoops, it's about pissing off potential customers and clients who send a message expecting it to be read in a timely fashion, only to return to their computers the next day to discover a stupid f#!@*ng challenge message sitting in their inbox telling them that not only has their message been detained for questioning, but it now requires that a second message be sent just to free the first one. It's like mail purgatory.

            Challenge/response is a horrific alternative to decent Bayesian spam filtering and is basically a way for a mail administrator to say, "those mean old spammers have gotten the best of me and I'm going home to get some sleep and let my users deal with the problem." Anyone with the skill to set up a challenge/response system on their server could easily set up a better system using SpamAssassin and/or a series of open-source Bayesian and white/blacklist filters.

            Challenge-response systems will never gain widespread adoption simply because too many people do business online. When you buy something from an online store, an automated system usually sends you a receipt with your order info. An automated system is not going to respond to your idiot challenge message just to send you your order receipt. And if an automated system DID respond to your challenge, then it would, by definition, be defeating your spam filter.

            You really think a spammer can't write a script to fake out your challenge-response system? I'm not a particularly gifted programmer, but I could do it in a few minutes, just by looking at the format of your challenge messages and writing a few regular expressions. Hell, what's to stop someone from writing a plug-in or script that auto-replies to all challenge messages? A Bayesian filter would have no problem identifying such challenge messages with 98% accuracy and replying to each and every one. If challenge-response systems gained widespread popularity, that's exactly what you'd see. Someone would write a filter to auto-reply to such challenge messages. Then spammers would simply start fishing for addresses using this method, and at the same time they'd be priming the accounts to accept their spam unimpeded. In fact, over the long term, I wouldn't be surprised if the only people such challenge-response systems don't annoy are the spammers themselves, who pretty much live to break your rules and will always manage to script their way around any obstacles you put up.

            That's why the only really viable solution is for ISPs to stop spammers from sending spam, for webmasters to run server-based filters, and for e-mail users to run their own Bayesian filters. It's a 3-tiered approach that doesn't offload all the responsibility to the end user as challenge-response systems do. As someone who relies almost entirely on e-mail for business communication (and has for many years now), I can say with absolute certainty that challenge-response systems are part of the problem, not the solution. I always tell my clients not to use them. I simply point my clients to spambayes.sourceforge.net and they are generally very pleased with the results. It's not the absolute best, but it's damned good and it's free.

            --
            Nowhere does science promise emancipation.



        [»] Re: All spam filters fail in comparison...
        by rusty - Dec 18th 2004 12:36:52


        > %
        > % % ... to TMDA. It's not a filter, rather a
        > % % whitelist/blacklist based
        > % % challenge/response system.
        > %
        > % Argh....
        > % This is the worst and most rude
        > % solution.
        > %
        > % I do not want to have to ack your TMDA
        > % mails, I automatically
        > % block you from all my mail servers as
        > % soon as I receive a confirmation mail,
        >
        > When used properly, it's very effective.
        > This includes list subscriptions like
        > your mailman newsletters. If you just
        > throw it up and don't take care of it by
        > actively (and proactively) managing your
        > whitelist, it's useless.
        >
        > Also, you'd better get used to those
        > confirmations. Until some major, radical
        > change is enacted with a protocol (like
        > SMTP) to combat spam, these
        > challenge/response systems are going to
        > be more and more commonplace.
        >
        > TMDA responds to the "return-path"
        > header address. I personally use ezmlm
        > to manage my lists, and the return-path
        > header automatically removes these
        > persons with the "ack" emails coming
        > back to the system.
        >
        > -- Gun

        Sorry Gun, after a couple of painful "challenge|response|fail" message sequences, I tend to agree with Marc. I have not had the pleasure of encountering a user who has made good use of a challenge|response system.

        If I ever get an e-mail challenge from someone I know personally, I will advise them the next time I see them that they need to add me to their whitelist themselves; I will not play the challenge|response game. If I ever again encounter a challenge from someone who has given me their e-mail address (or rather given it to an organization I am a part of, with the understanding that someone will be sending them e-mail), I will block that e-mail address at my server. If I get a challenge in response to my reply to a message they sent, it tells me that they have their filter set up incorrectly, and I will treat that e-mail address the same way as if I had received a bounce from them.

        -Rusty



    [»] Re: All spam filters fail in comparison...
    by Sam - Aug 23rd 2003 18:41:09


    > ... to TMDA. It's not a filter, rather a
    > whitelist/blacklist based
    > challenge/response system. I installed
    > it on my ISP's mailserver and customers
    > are bombarding our phones asking us to
    > install it for them (especially w/ the
    > SoBig.F virus making its rounds).

    I thought SoBig sent emails that claimed to be from addresses it found in address books on infected users' machines. In which case a whitelist-style filter would likely let a lot of them through, since they would appear to be from people on the whitelist.

    But since I don't use Windows for mail, I admit I don't know much about the workings of these things.
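The concern above can be illustrated with a minimal sketch. It assumes a whitelist that trusts whatever address appears in the From header; this is not TMDA's actual logic, and the addresses and the `passes_whitelist` helper are made up for illustration.

```python
from email import message_from_string
from email.utils import parseaddr

# Hypothetical whitelist; these addresses are invented for the example.
WHITELIST = {"alice@example.org", "bob@example.org"}


def passes_whitelist(raw_message: str) -> bool:
    """Return True if the address in the From header is on the whitelist."""
    msg = message_from_string(raw_message)
    _, address = parseaddr(msg.get("From", ""))
    return address.lower() in WHITELIST


# A worm that harvests an address book can put a trusted address in From.
forged = (
    "From: Alice <alice@example.org>\n"
    "To: victim@example.org\n"
    "Subject: Re: Details\n"
    "\n"
    "Please see the attached file for details.\n"
)
# The same message, but claiming to come from an unknown sender.
stranger = forged.replace("alice@example.org", "mallory@example.net")

print(passes_whitelist(forged))    # header check alone cannot detect the forgery
print(passes_whitelist(stranger))
```

A check based only on the claimed sender accepts the forged message and rejects the unknown one, which is the weakness being pointed out.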



    [»] Re: All spam filters fail in comparison...
    by Nuclear Elephant - Aug 23rd 2003 20:17:24


    > ... to TMDA. It's not a filter, rather a
    > whitelist/blacklist based
    > challenge/response system. I installed
    > it on my ISP's mailserver and customers
    > are bombarding our phones asking us to
    > install it for them (especially w/ the
    > SoBig.F virus making its rounds).
    >
    > http://tmda.net/
    >
    > or
    >
    > http://freshmeat.net/projects/tmda/
    >
    > -- Gun

    This approach is relatively easy to evade in the light of stolen accounts and automated scripts. Not to mention it's a pain for the people sending legitimate emails. Until we have a digital certificate registry for SMTP or something of that sort, approaches to spam like this will only last a short while before the black hats write scripts around them. On top of this, the tool probably added to the email load by sending thousands of challenges back to people whose email addresses were forged. This approach is great for mailing lists and the like, but apart from all my other reasons for not liking it, it's just too darn annoying.



    [»] Re: All spam filters fail in comparison...
    by Hal F. Gottfried - Dec 20th 2003 19:11:05

    Yes, but in a world where new customers are contacting you to purchase your product or service, they don't want to be greeted by a challenge and response. You lose business that way.


    > ... to TMDA. It's not a filter, rather a
    > whitelist/blacklist based
    > challenge/response system. I installed
    > it on my ISP's mailserver and customers
    > are bombarding our phones asking us to
    > install it for them (especially w/ the
    > SoBig.F virus making its rounds).
    >
    > http://tmda.net/
    >
    > or
    >
    > http://freshmeat.net/projects/tmda/
    >
    > -- Gun



[»] Individual vs. systemwide use?
by Nathan Neulinger - Aug 23rd 2003 12:15:30

That is a VERY important characteristic of your installation that is not reflected in these tests. Which of these engines is most suitable for an 'all users' installation, as opposed to an individual user having their own Bayes DBs and their own finely tuned filtering?



    [»] Re: Individual vs. systemwide use?
    by Gilgongo - Aug 23rd 2003 16:09:26


    > That is a VERY important characteristic
    > of your installation that is not
    > reflected in these tests.

    I totally agree. Making a distinction between the two types of installation is CRITICAL to real-world usage.

    The main reason for this is that in most organisations, site-wide rulesets must be so general as to be almost meaningless - there being such a heavy administration penalty for false positives that any attempt to really clamp down is just not worth it.

    If you don't believe me, try setting up something like SpamAssassin site-wide for 100 users with a default score of 5.0. The result is complete chaos. One man's spam is another man's legitimate marketing message I'm afraid.
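One way to picture the problem with a single site-wide cutoff is the sketch below, which applies a different threshold per user to a score that has already been written into the message. It assumes the scanner adds an `X-Spam-Status` header carrying `score=` (or `hits=` in older releases); the user names, thresholds, and the `is_spam_for` helper are hypothetical.

```python
import re
from email import message_from_string

# Matches "score=6.1" or "hits=6.1" inside an X-Spam-Status header value.
SCORE_RE = re.compile(r"\b(?:score|hits)=(-?\d+(?:\.\d+)?)")

# Hypothetical per-user cutoffs; 5.0 is the site-wide default from the post.
USER_THRESHOLDS = {"sales": 8.0, "sysadmin": 3.5}
DEFAULT_THRESHOLD = 5.0


def is_spam_for(user: str, raw_message: str) -> bool:
    """Apply the user's own cutoff to the score already in the headers."""
    msg = message_from_string(raw_message)
    match = SCORE_RE.search(msg.get("X-Spam-Status", ""))
    if match is None:
        return False  # untagged mail is passed through
    threshold = USER_THRESHOLDS.get(user, DEFAULT_THRESHOLD)
    return float(match.group(1)) >= threshold


tagged = (
    "X-Spam-Status: Yes, score=6.1 required=5.0 tests=HTML_MESSAGE\n"
    "Subject: Special offer\n"
    "\n"
    "Limited time only.\n"
)
for user in ("sales", "sysadmin", "someone_else"):
    print(user, is_spam_for(user, tagged))
```

The same message is junk for one user and wanted mail for another, which is why a single score of 5.0 for everyone generates complaints.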

    Added to this is that most users have at best a limited interest in training Bayesian filters. In about 99% of cases the user's only access to the mail server is via Outlook, which severely limits the options when it comes to training.

    These tests all assume the user is a command-line toting geek.

    --
    Gone are the days when you could say "Those were the days."



      [»] Re: Individual vs. systemwide use?
      by alrubin - Oct 2nd 2006 21:05:17

      At present there are a lot of individual spam filters based on Bayesian filtering for Outlook clients. The most popular are Spam Bully, Inboxer, and Outlook Spam Filter. My favorite Outlook spam filter is Spam Reader.



    [»] Re: Individual vs. systemwide use?
    by Sam - Aug 23rd 2003 18:37:23


    > That is a VERY important characteristic
    > of your installation that is not
    > reflected in these tests. Which of these
    > engines is most suitable for an 'all
    > users' installation, as opposed to an
    > individual user having their own Bayes
    > DBs and their own finely tuned filtering?

    That would be useful. However, I think in this test I was very upfront about the fact that the tests were all run on a single user's email. Hence the results are only of use for single-user setups.

    I don't have access to a set of hand-classified emails from a large number of users in order to run a group-type test. Privacy issues make getting such a set of data very difficult.

    For "all users" the Bayesian filters will be of less use, since one of their great benefits is that they can reduce false positives by learning what the "spam-like" non-spams look like for the user. Over a group of users that won't be as useful.

    SpamAssassin set to mark the message via the headers seems a good solution for a group. The savvy users can then autofilter marked mails in their mail clients, and the users who don't care for such things can just bang the delete button...
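A minimal sketch of the client-side half of that setup, assuming the server adds an `X-Spam-Flag: YES` header to messages it considers spam and delivers everything. The folder names and the `target_folder` helper are hypothetical; a real mail client would express the same rule in its own filter settings.

```python
from email import message_from_string


def target_folder(raw_message: str) -> str:
    """Pick a folder based only on the tag the server added."""
    msg = message_from_string(raw_message)
    flag = msg.get("X-Spam-Flag", "").strip().upper()
    return "Junk" if flag == "YES" else "INBOX"


tagged = "X-Spam-Flag: YES\nSubject: Cheap pills\n\nBuy now.\n"
clean = "Subject: Meeting notes\n\nSee attached.\n"

print(target_folder(tagged))
print(target_folder(clean))
```

Nothing is discarded on the server, so a false positive costs the user a look in the junk folder rather than a lost message.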



      [»] Re: Individual vs. systemwide use?
      by pellaeon - Aug 24th 2003 07:48:39


      >
      > SpamAssassin set to mark the message via
      > the headers seems a good solution for a
      > group. The savvy users can then
      > autofilter marked mails in their mail
      > clients, and the users who don't care
      > for such things can just bang the delete
      > button...
      >
      >

      That is _exactly_ what I ended up doing for my network of ~180 users. Tagging email and sending it through is the only viable option, since doing it 'the other way' drove me nuts inside a month. I ended up intercepting job offers to a member of our staff! And it took me about 1 hour every day to clear up the false positive mess.



      [»] Re: Individual vs. systemwide use?
      by iRude - Dec 14th 2004 00:38:36

      Hmmm . . . very confusing. Just kidding. I don't think the Bayesian filters are that bad though.

      --
      My :2cents: are not Rude



[»] spambayes?
by PerlChild - Aug 23rd 2003 12:08:42

Can I ask why this fine Python filter wasn't tested?
(My smarthost has bogofilter, SpamAssassin AND spambayes running from a .procmailrc, and spambayes is the one that has impressed me most so far, although SpamAssassin's network resources guarantee it has the best results.)



[»] popfile
by Andres Mauricio Mujica - Aug 23rd 2003 11:19:28

Hi, it's a great article with a good analysis, but what about popfile? It's one of the best spam filters available for a user.

Besides, it works on Win, Mac, *nix!!



    [»] Re: popfile
    by Hal F. Gottfried - Dec 20th 2003 19:18:21

    I agree POPFile is a great tool, even its Outlook counterpart, but I see it more as a mail filing tool than a spam tool. Yes, you can use it to filter spam, but it doesn't have the real filters that make up an anti-spam program.


    > Hi, it's a great article with a
    > good analysis, but what about popfile?
    > It's one of the best spam filters
    > available for a user.
    >
    > Besides, it works on Win, Mac, *nix!!
    >
    >



[»] Thanks
by Jeff Flowers - Aug 23rd 2003 07:53:38

Not too long ago, I submitted a question via Ask
Slashdot, seeking to learn if any comparative testing of
Bayesian filters had been done, but no one
seemed to be aware of any. I think that this article
does show (as I expected) that all Bayesian filters are
not created equal.

You might, however, want to include DSPAM in your
next testing cycle.

Thanks,

Jeff Flowers



    [»] DSPAM Stats
    by Nuclear Elephant - Aug 23rd 2003 15:44:47

    If this helps any for your report...

    It is Sat Aug 23 18:57:03 US/Eastern 2003
    DSPAM has caught 7947 spams
    ...learned 406 spams
    ...scanned 40474 innocent emails
    ...with 0 false positives
    Your SPAM Ratio is 17.11%
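The reported ratio can be reproduced from the counts above, assuming it is the share of spam (caught plus learned) in all mail seen. That formula is a guess that happens to match the figure; it is not taken from DSPAM's documentation.

```python
# Counts copied from the stats above.
caught = 7947      # spams caught by the filter
learned = 406      # spams learned, counted here as spam too
innocent = 40474   # innocent emails scanned

spam_total = caught + learned
spam_ratio = spam_total / (spam_total + innocent)

print(f"{spam_total} spams out of {spam_total + innocent} messages")
print(f"SPAM ratio: {spam_ratio:.2%}")
```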



[»] bogofilter is good enough :)
by aldem - Aug 23rd 2003 07:45:56

As for my experience... The first (and only) filter I have tried so far is bogofilter.

When I first installed it, I didn't actually pre-train it; I just followed the recommendation: when I see a message which is spam, I mark it as spam, and when I see a message which is actually non-spam but classified as spam, I re-mark it as non-spam.
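The correct-only-the-mistakes routine described above can be sketched with a toy classifier. This is an illustration under stated assumptions, not bogofilter's actual algorithm or interface: `TinyBayes` and `train_on_error` are hypothetical names, and the scoring is a plain word-count Naive Bayes with add-one smoothing.

```python
import math
from collections import Counter


class TinyBayes:
    """Toy word-count classifier; not bogofilter's real algorithm."""

    def __init__(self):
        self.words = {"spam": Counter(), "ham": Counter()}
        self.messages = {"spam": 0, "ham": 0}

    def train(self, text, label):
        self.words[label].update(text.lower().split())
        self.messages[label] += 1

    def classify(self, text):
        vocab = len(set(self.words["spam"]) | set(self.words["ham"])) + 1
        total_msgs = sum(self.messages.values())
        scores = {}
        for label in ("spam", "ham"):
            # Smoothed log prior plus smoothed log likelihood of each word.
            score = math.log((self.messages[label] + 1) / (total_msgs + 2))
            total = sum(self.words[label].values())
            for word in text.lower().split():
                score += math.log((self.words[label][word] + 1) / (total + vocab))
            scores[label] = score
        # Ties go to "ham", so an untrained filter lets everything through.
        return "spam" if scores["spam"] > scores["ham"] else "ham"


def train_on_error(classifier, mailbox):
    """Feed back only the messages the filter got wrong; return how many."""
    corrections = 0
    for text, true_label in mailbox:
        if classifier.classify(text) != true_label:
            classifier.train(text, true_label)
            corrections += 1
    return corrections


mailbox = [
    ("cheap pills buy now", "spam"),
    ("meeting notes attached", "ham"),
    ("buy cheap pills today", "spam"),
    ("notes from the meeting", "ham"),
]
bayes = TinyBayes()
print("corrections on first pass:", train_on_error(bayes, mailbox))
print("corrections on second pass:", train_on_error(bayes, mailbox))
```

The design choice is that the user only intervenes when the filter is wrong, so the training effort shrinks as the filter improves.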

So the results