
Does Gmail do a good job of filtering spam?

Dave, I'm very tempted to move from Yahoo Mail to Google's Gmail, but I'm still unsure whether Gmail does a good job of filtering spam. I get lots of spam (darn it!) and would love to hear that Gmail does a great job of filtering all of this junk out of my mailbox. What's your opinion?

I've actually been quite impressed with both the functionality and the spam filtering in Gmail. Without my paying much attention, my spam folder fills up every day, and only once in a blue moon is there a real, legitimate message buried in the junk. Similarly, not much spam makes it into my main mailbox either, so it appears to me that Google has figured out both sides of this problem.

My friend and colleague Aaron Dragushan of Wondermill has a pretty fascinating theory about how Gmail's spam filter works:

I noticed a pattern in the delivery of spam to my Gmail inbox: I would receive more spam if I left the account logged in (and refreshing itself) than when I logged in once a day. Since Gmail is simply magnificent at handling spam, I wondered what might be going on. Here's an idea, but first a little background. Most web email providers use a "This is Spam!" button so that users can help them identify spam.
For example, if you sent 50,000 messages to AOL users and 10% identified it as spam, they might block your IP address from sending mail to AOL. To reduce the load on users, they could hold back most of the messages and pass along only the first 5,000 to see what people think. A trial run, if you will. Based on that small sample they can still tell whether it's spam and take appropriate action.

Gmail adds a twist and a dramatic improvement. Using the same example, Gmail would deliver all 50,000 messages and then watch what happens. Let's say they see that 250 people have actually viewed the message, and 23.7% said it was spam. Their anti-spam system might say, "Good enough, we can act on that." And this is where something special happens: Gmail quietly reaches into the inboxes of the other 49,750 people and gently nudges the message into the spam folder.

Their key insight was that because they're a web-based email provider, they don't so much "deliver" mail as make it available for viewing. All the people who haven't logged in or refreshed their browser window haven't actually "received" the message yet. If you make it disappear, they'll never know it was there, and they don't have to trip over it. Huzzah! And that's how Google harnesses the power of the group to help everyone.

I can't attest that this is really how the Gmail spam algorithm works, but given that a typical junk message is sent to dozens, hundreds, or even thousands of users, and that it's not hard to match the same message across multiple mailboxes (I'd check the Message-ID and From address, personally), it certainly makes intuitive sense. Anyone want to test this theory? Just stay logged in to Gmail for a day and count how many spam messages are automatically routed to your spam folder versus how many end up in your regular inbox. Then, on another day, log in just once at the beginning and once at the end of the day, and compare the filtered versus unfiltered spam counts.
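Aaron's theory can be sketched as a small simulation. This is a minimal sketch under stated assumptions, not Gmail's actual filter: the class and function names, the `MIN_VIEWERS` and `MIN_REPORT_RATE` thresholds, and the example addresses are all hypothetical. Copies are matched by Message-ID and From address, as Dave suggests; once enough early readers have flagged a message, every copy nobody has opened yet is moved to the spam folder, while copies that were already viewed stay where the reader left them.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical thresholds, loosely based on Aaron's example
# (about 250 viewers, roughly a quarter of whom clicked "This is Spam!").
MIN_VIEWERS = 250
MIN_REPORT_RATE = 0.10

Fingerprint = tuple[str, str]


def fingerprint(message_id: str, sender: str) -> Fingerprint:
    """Match copies of one bulk message by Message-ID and From address."""
    return (message_id.strip().lower(), sender.strip().lower())


@dataclass
class Copy:
    folder: str = "inbox"
    viewed: bool = False


class RetroactiveSpamFilter:
    def __init__(self) -> None:
        self.copies: dict[Fingerprint, dict[str, Copy]] = defaultdict(dict)  # keyed by user
        self.viewers: dict[Fingerprint, int] = defaultdict(int)
        self.reports: dict[Fingerprint, int] = defaultdict(int)
        self.flagged: set[Fingerprint] = set()

    def deliver(self, message_id: str, sender: str, user: str) -> None:
        fp = fingerprint(message_id, sender)
        # Copies that arrive after the verdict go straight to spam.
        folder = "spam" if fp in self.flagged else "inbox"
        self.copies[fp][user] = Copy(folder=folder)

    def view(self, message_id: str, sender: str, user: str, report_spam: bool = False) -> None:
        fp = fingerprint(message_id, sender)
        copy = self.copies[fp][user]
        if not copy.viewed:
            copy.viewed = True
            self.viewers[fp] += 1
        if report_spam and copy.folder != "spam":
            copy.folder = "spam"  # the reader clicked "This is Spam!"
            self.reports[fp] += 1
        self._maybe_reclassify(fp)

    def _maybe_reclassify(self, fp: Fingerprint) -> None:
        if fp in self.flagged or self.viewers[fp] < MIN_VIEWERS:
            return  # already decided, or the sample is still too small
        if self.reports[fp] / self.viewers[fp] >= MIN_REPORT_RATE:
            self.flagged.add(fp)
            # Quietly move every copy that nobody has looked at yet.
            for copy in self.copies[fp].values():
                if not copy.viewed:
                    copy.folder = "spam"


filt = RetroactiveSpamFilter()
msg_id, sender = "<bulk-123@example.com>", "promo@example.com"
for i in range(50_000):
    filt.deliver(msg_id, sender, f"user{i}")
# The first 250 recipients read it; 59 of them (about 24%) report it as spam.
for i in range(250):
    filt.view(msg_id, sender, f"user{i}", report_spam=i < 59)

copies = filt.copies[fingerprint(msg_id, sender)].values()
moved_unseen = sum(1 for c in copies if c.folder == "spam" and not c.viewed)
print(moved_unseen)  # 49750
```

With these numbers the verdict arrives on the 250th view, and the early readers' copies stay put (59 in spam, 191 in the inbox). The sketch also reproduces the pattern Aaron noticed: a reader whose inbox is open and constantly refreshing is more likely to be one of those early viewers, and so sees spam that a once-a-day reader never does.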
In any case, yes, Gmail is very elegant and very well designed. The only limitation is that you still need to get an invitation to join, but fortunately I have quite a few I'm happy to share with Ask Dave Taylor readers: How do I invite people to join Gmail? Note: this article was updated to reflect a useful refinement Aaron sent me regarding how he believes Gmail works.
Categorized:
Computer and Internet Basics
(Article 4191)
Laughing Squid, who hosts my lame website (along with two misanthropemanor.com email addresses), uses SpamAssassin to, well, assassinate spam, with thus far 100% accuracy. In the five months [11/05] that Squid has been my host, I've had 0 false alarms, 0 junk messages labeled as good - and ALL spam has come to the public address that's a sitting duck out on every page of the site, ripe for the trollbots' taking; the other ID, given to only a select few who don't use Microsoft Windows and/or Outlook and have a clue what "BCC" is used for, has received no spam at all. By comparison, my Comcast address - set up only so I could subscribe to Usenet and NEVER USED for email or displayed in newsgroup postings - with the "spam filter" engaged, lets through around 25 spam messages a week. (In other words, 100% of the mail I receive at Comcast is spam, and it's coming to an address that's never been given out.) At my two AT&T WorldNet addresses - which have been abandoned - around 75 messages per week slip through the "spam filter." Two GMail addresses - one for friends and family who *do* use MS Windows or Outlook, and one for subscriptions, registrations and any other places that might share my ID with third parties - receive practically NO spam. As for "dictionary attacks," I've reused the same user names across every commercial domain at which I have an email address; that is, myname@att.net, myname@comcast.net, myname@gmail.com. Both AT&T and Comcast use Brightmail for filtering, and since both ISPs are set to just toss those messages which they actually deem to be spam, I have no idea what the total amount of spam received is. I don't know whether GMail uses a third-party product or in-house, proprietary filtering, but whatever it is, it works pretty damn well.
My GMail accounts are set to save local copies of spam, and over time only a handful of messages have landed in the filtered folders. Judging from the simple rules/analysis shown in the sample below (SpamAssassin's report attached to garden-variety spam received at this address), it seems like it wouldn't take six NASA scientists working around the clock for a week to come up with the magic algorithms to separate the good from the bad (or the bad from the good), with maybe some minor end-user white list tweaking to allow access for those Microsoft Outlook and AOHell friends and family who insist on using (or forwarding) a screaming rainbow of HTML crap, in-line graphics and animated smileys in every email message. Why, then, can't AOHell, MSN, WorldNet, SBC/Yahoo!, Comcast et al. figure out how to at least minimize the amount of spam that lands in everyone's In Baskets?
Running a Windows enterprise was like working in the emergency room of Cook County Memorial. Working on Linux was like being a Maytag repairman.

I have been using Gmail for a while now, and I have NEVER had any spam land in my inbox!! (unlike Hotmail)
Posted by: Hugh at November 11, 2006 11:39 AM

Gmail used to be very good at separating spam from real mail, but now... I get about 3 spam emails a day. It's the kind of spam with random words (e.g. "and then volcano ducks she said no wisdom") or some crap like that, plus attachments. It's not doing a good job for me anymore!!!
Posted by: M at December 19, 2006 10:56 AM

I've been using Gmail for about a year and I'm not getting spam. Not because of Gmail's spam filter, but because I'm not getting spam at all. However, I've had important emails marked as spam way too MANY times, and only God knows how many such emails I missed because they were deleted from the Spam folder before I noticed them. The worst thing is that there is no option to turn off the Spam folder or to set up a POP download of all emails, including those Gmail marks as spam! So the only thing I can do is log in to the web client often and check the Spam folder... GMAIL SUCKS! MY IMPORTANT EMAILS WERE SENT TO SPAM, WHICH I DELETED.
Posted by: Jonathan Lee at April 13, 2007 5:20 PM

Try using Gmail for more than a year - you'll notice about 1 spam email per day making its way into your mailbox. I'm tired of people saying Gmail has such an amazing spam filter; it's not that great, people. Just today I got an email with "viagra" in the subject line - how Gmail didn't filter that is beyond me...
Posted by: J.C. Biggums at August 21, 2007 10:50 AM

I am a CS professor who has been using Gmail since early on, and I find its spam treatment incredibly good. Yesterday, as an in-class demo of setting up qmail, I sent some emails to various addresses of mine, all of which go to different places but end up being forwarded to Gmail.
They all ended up in spam, so it will be a bit interesting to sort out how. The originating address was one I had just created on a machine we were configuring. I have followed your work since the elm days. Keep it up.
Posted by: Douglas Harris at October 9, 2007 9:27 AM

I think Gmail works very well. But I don't understand the following: my spam folder currently holds 9 messages received in the last 20 days from apparently the same spam sender, e.g.: marcel.janssen@gmail.com (Joanne Anthony). Eventually Gmail will delete them, and that's OK, but THE QUESTION IS: since Gmail's software knows the address the spam comes from, why doesn't Gmail block that address? And to avoid errors, it could simply give me access to the list of recently banned senders whose spam I received.
Posted by: mk1 at October 28, 2007 1:38 AM

Gmail has been deleting SPAM in my account... dunno why; when my SPAM folder reaches ~900 it drops back to 800... and I keep receiving SPAM.
Posted by: foobar at December 4, 2007 1:47 PM

GMail rules! In fact, Google rules. So much of what they do I am not only interested in but use on a regular basis, and I am constantly amazed at how they are improving their systems. GMail is just a case in point. I was just bitching about how many spam messages I get, but forgot to bear in mind that ALL of the 400+ messages (accumulated over a month) in the spam box were spam. Only 3 had managed to find their way into my normal inbox. But keeping the system working relies on all of us doing our part and notifying Google of any undetected spam. And remember, as long as you haven't selected 'Delete Forever', anything you delete accidentally is stored in your Bin folder for 30 days. Pi
Posted by: Seaniepie at April 24, 2008 4:06 AM

One MAJOR problem: I'm reading my Gmail mail with MS Outlook. My only real concern over the spam is that I wish the messages that are legitimately spam would simply not go to my spam filter, period.
It is nice that the messages are deleted after 30 days, but those of us who don't check it every once in a while end up with hundreds of spam messages, and going through them is painstaking work that I simply don't enjoy. I would rather have the occasional spam go to the filter than have the filter fill itself hourly as it does now.
Posted by: Ben at September 30, 2008 6:18 AM
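The commenter above who relies on SpamAssassin argues that its analysis is built from fairly simple rules, and that idea can be illustrated with a toy scorer. SpamAssassin adds up the scores of every rule a message triggers and treats the message as spam once the total reaches a threshold (5.0 by default). Everything else here is a hypothetical stand-in, a sketch rather than SpamAssassin's implementation: the rule names, patterns, and weights are invented for illustration, and the `allowlist` parameter approximates the "white list tweaking" suggested for HTML-happy friends and family.

```python
import re
from dataclasses import dataclass

# Hypothetical rules: (name, regex, message field, score).
# These are NOT SpamAssassin's real rule names or weights.
RULES = [
    ("SUBJ_PHARMA", r"(?i)\bviagra\b", "subject", 2.5),
    ("SUBJ_EXCLAIM", r"!{3,}", "subject", 1.0),
    ("HTML_BODY", r"(?i)<html", "body", 0.8),
    ("CLICK_HERE", r"(?i)click here", "body", 1.2),
    ("FREE_MONEY", r"(?i)free (money|cash)", "body", 2.0),
]
REQUIRED_SCORE = 5.0  # mirrors SpamAssassin's default spam threshold


@dataclass
class Message:
    sender: str
    subject: str
    body: str


def score(msg: Message, allowlist: frozenset[str] = frozenset()) -> tuple[float, list[str]]:
    """Sum the scores of every rule the message triggers."""
    if msg.sender.lower() in allowlist:
        return 0.0, []  # allowlisted senders skip the rules entirely
    total, hits = 0.0, []
    for name, pattern, field, points in RULES:
        if re.search(pattern, getattr(msg, field)):
            total += points
            hits.append(name)
    return total, hits


def is_spam(msg: Message, allowlist: frozenset[str] = frozenset()) -> bool:
    return score(msg, allowlist)[0] >= REQUIRED_SCORE


spam = Message("deals@example.net", "Cheap VIAGRA!!!", "<html>Click here for FREE money</html>")
mom = Message("mom@example.org", "Photos of the grandkids", "<html>Click here to see them :)</html>")
fwd = Message("aunt@example.org", "FW: Viagra joke!!!", "<html>FREE money for everyone, click here :)</html>")
family = frozenset({"aunt@example.org", "mom@example.org"})

print(round(score(spam)[0], 1), is_spam(spam))  # 7.5 True
print(is_spam(mom))                             # False (0.8 + 1.2 = 2.0)
print(is_spam(fwd), is_spam(fwd, family))       # True False
```

The HTML note from mom trips two rules and still stays well under the threshold, while the aunt's forwarded chain letter scores like real spam until her address is allowlisted.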