The Grey Area

Full disclosure: Duncan Riley was one of the founders of b5media, my current employer. He is no longer with the company. That said, he left before I was hired, and I don’t know him, other than through what information is publicly available. This post is written by me as an individual WordPress developer, not as a b5media employee.

Matt Mullenweg, co-founder and one of the lead developers of WordPress:

Vanilla, the popular open source forum software, is now embedding sponsored links in every download so when you install it they’re on your site. This strikes me as a bad idea the same way sponsored themes are, except worse because it’s in the core code.

Photo Matt » Vanilla Sponsored Links

Duncan Riley responded at TechCrunch with How Grey Is Your Valley: Making Money From Open Source. The main point of the post seemed to be that since Matt (via Automattic) is making money off of Open Source software, he has no room to criticize others for making money from Open Source software.

This is a complete straw man. Matt wasn’t criticizing people for making money off of Open Source software, he was criticizing the way they were making money in this specific instance: namely, by participating in search engine spamming schemes. Search engine spamming isn’t a grey area — it’s a black area. Duncan tries to lump all money-making activities that leverage Open Source software into a grey area — a position that morally equates spamming with a legitimate enterprise like Automattic. The entire premise of the article is incorrect: Matt wasn’t criticizing the idea of profiting from Open Source software, and the case he was specifically addressing was not a grey area… it was spam.

Duncan makes a few other remarks that deserve a response.

Akismet is a service that relies on the failure of the WordPress code to be able to natively deal with comment spam.

This assumes that an Open Source application which, until recently, was only updated roughly once every 6-9 months, should be able to deal with the comment/trackback spam issue in a way that doesn’t require a lot of maintenance or unduly burden users.

One method of web spam prevention is a CAPTCHA (you know, one of those distorted letters-in-an-image things). Problems with CAPTCHA:

  1. Cheap human labor can be used to solve them and enable spamming
  2. Computers can technically solve them
  3. Disabled users (or even those with slightly less than perfect eyesight) can’t solve them. Heck, sometimes I can’t solve them, and I have perfect vision!
  4. They place a burden on your users
  5. They do not apply to Trackback/Pingback spam

Another way is by using a complicated anti-spam plugin like Spam Karma 2. Problems:

  1. It’s really complicated
  2. It requires a lot of tweaking
  3. Spammers can see how it works, and can thus circumvent it

And both these solutions share a common flaw: they’re Club solutions. That is, the more people who use them, the less effective they become. If you’re operating the only blog in the world with CAPTCHA and Spam Karma 2, you’re going to be fairly spam-free. But once enough people use those measures to make it worth spammers’ while to figure out a way to beat them, they’ll spend the effort. So these methods require constant updates (trickier CAPTCHA, better Spam Karma 2 rules) to work.

Akismet works for three reasons. First, spam reporting is communal, so when one site reports spam, all other sites benefit. Second, its sauce is secret. Spammers don’t know why a particular comment was blocked, so they can’t easily learn how to work around that block. Third, Akismet is a remote service, and thus is being constantly updated for all users. No one needs to upgrade their blog or a plugin in order to get the latest level of Akismet spam protection.
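
To make the communal part concrete, here is a toy sketch in Python. To be clear, this is not how Akismet actually works (its logic is secret, which is the point), and it is not Akismet's API. The class name, the fingerprinting scheme, and the one-report threshold are all assumptions I made up for illustration.

```python
import hashlib


class SharedSpamService:
    """Toy stand-in for a remote, communal spam service.

    Hypothetical: this is NOT Akismet's API or algorithm.
    """

    def __init__(self, threshold=1):
        # Number of distinct reporting sites needed before a comment
        # is treated as spam (an arbitrary, made-up setting).
        self.threshold = threshold
        # Maps a comment fingerprint to the set of sites that reported it.
        self._reports = {}

    @staticmethod
    def _fingerprint(comment):
        # Crude normalization: lowercase and collapse whitespace, then hash.
        normalized = " ".join(comment.lower().split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def report_spam(self, site, comment):
        # One site's report is stored centrally, so every site benefits.
        self._reports.setdefault(self._fingerprint(comment), set()).add(site)

    def is_spam(self, comment):
        sites = self._reports.get(self._fingerprint(comment), set())
        return len(sites) >= self.threshold


service = SharedSpamService()

# Blog A reports a spam comment...
service.report_spam("blog-a.example", "Buy CHEAP pills   now")

# ...and Blog B, which has never seen it, is protected without upgrading anything.
blocked_on_b = service.is_spam("buy cheap pills now")
clean_on_b = service.is_spam("Great post, thanks!")
```

Because the state lives on the service rather than in each blog's code, a report from one site takes effect everywhere at once, and the matching rules can change without anyone installing a new plugin version.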

Akismet was created because Open Source locally-run anti-spam measures have shortcomings, and because Matt knew that he could build a service that didn’t have those shortcomings. If a blog-level anti-spam solution without such shortcomings presented itself, I have no doubt that Matt would have it in WordPress in an instant. And if he didn’t, well, he’s not the only one with commit access to WordPress. I know I’d put it in, and I don’t have the conflict of interest Matt is accused of having.

It was not that long ago that Mullenweg was sprung for including in excess of 150,000 spam pages on; it was an honest mistake but as they say, people who live in glass houses…

Matt made a huge mistake by allowing that. I was disappointed in him at the time, both personally and professionally. But he’s learned from that mistake. More than that, he’s led efforts to warn others about that kind of behavior. That’s what the whole “sponsored themes” thing was about. That’s what the Vanilla comment was about.

I just remain unconvinced that those offering the odd paid link on a WordPress template is any different or worse than Mullenweg, who not only stuffs links to his own blog in every standard install of WordPress […]

The default blogroll is a source of contention, to be sure. Personally, I’d like to replace those links with links to WordPress documentation and resources, and stick the WordPress “credits” and “props” inside the admin, where they won’t be indexed by search engines. But even so, there is a huge difference between paid commercial links and unpaid non-commercial links. Also consider that the default blogroll is “legacy.” That is, it was created in a much more innocent time, when people didn’t consider the SEO ramifications of such an inclusion. To hear Matt, me and several other WP developers weigh in on this issue, watch this video (timecode 109:55).

11 thoughts on “The Grey Area”

  1. i agree with your assessment of duncan’s article. i have serious problems with your logic and supplemental points.

    1) how is spam karma a “club solution”? it’s my understanding that he’s improving it (manually) based on user reports. sure, akismet does this automagically, but if humans are used to create spam, humans can also fight spam. even if SK2 alone is a club solution (i don’t agree that it is), SK2 + the akismet plugin for SK2 uses SK2’s far better algorithms to teach akismet, making it a lojack.

    1a) the idea that closed source is more secure is bunk, and if you believed it, you wouldn’t be working on wordpress. most of the tricks that SK2 uses to determine if a comment is spam have to do with the mechanics of when and from where the comment was posted. those characteristics are inherent in the nature of mechanical spam generation. making them publicly known does not provide a workaround.

    1b) spam karma 2 is not complicated, nor does it necessarily require tweaking. it is developed under a different model from wordpress. where wordpress seeks to provide solid defaults, and leave the options to plugins, SK2 (and the flavors of linux i prefer) attempt to give users fine grained control, even if it’s a bit overwhelming at first. nonetheless, if you just activated SK2, it would start emailing you daily, and would work just fine, even if you never visited the options panel.

    2) calling the default blogroll legacy implies that it is still serving some function, or providing some sort of backwards compatibility. Obviously, neither of those are true. matt doesn’t want to remove _himself_ from the blogroll, so he’s not removing any of the ex-developers either.

    3) i agree that the hot nacho incident is irrelevant to the ethics of what lossumo is doing. i don’t agree that it’s irrelevant to duncan’s post: it’s valid to infer from that incident that matt should not be leading the charge (i would disagree, but only by degrees (trust is always subjective). the question it raises is valid).

  2. Mark J, what do you think of the email that Text Link Ads sent out yesterday to its affiliates:
    “As a Text Link Ads affiliate we’re pleased to let you know that we’ve begun using to shorten and secure our affiliate referral links.”

    At least Text Link Ads and PayPerPost are now consistent between themselves in their games of hide and seek with Google.

  3. First off, I don’t want you getting the idea that I’m down on Spam Karma. I contributed code to Spam Karma 1 and was an avid user of Spam Karma until about 18 months ago. I’m arguing against inclusion of such code into core. Duncan’s argument was that code was being omitted from core that could work as a blog-level spam prevention system. I say that’s not the case. Spam Karma is a very workable anti-spam solution for people willing to keep it updated (version upgrades) and tweaked (filter settings).

    how is spam karma a “club solution”?

    You answered this yourself: “he’s improving it (manually) based on user reports.” Users report what the spammers are doing to get around it. The software adapts (new club), and then the spammers get around it.

    the idea that closed source is more secure is bunk, and if you believed it, you wouldn’t be working on wordpress

    I didn’t say anything about security. Obviously I think security by obscurity is a bad policy. This isn’t security, this is spam prevention. It’s trying to make a binary determination on a spam-to-ham continuum. The edge cases can end up in the wrong camp, and if the logic for determining this is exposed, spammers can learn how to exploit those edge cases.

    those characteristics are inherent in the nature of mechanical spam generation. making them publicly known does not provide a workaround.

    Then why does Spam Karma need to be updated?

    spam karma 2 is not complicated, nor does it necessarily require tweaking.

    SK2 (and the flavors of linux i prefer) attempt to give users fine grained control, even if it’s a bit overwhelming at first

    Overwhelming (even at first) = too complicated. Which may be fine for a plugin, but it’s the goal of WordPress to avoid such complications in the core functionality.

    calling the default blogroll legacy implies that it is still serving some function, or providing some sort of backwards compatibility.

    You’re choosing a very specific definition of “legacy” (specifically “legacy code”). I meant “legacy” in the normal dictionary definition: something that has been handed down. It’s there now because it was there before. But I see from Lloyd’s comment that it’s going away, in the manner I had hoped, so the legacy has come to an end.

    it’s valid to infer from that incident that matt should not be leading the charge

    “Converts” can be the best advocates. They’ve seen things from both sides, and they can articulate why the side they switched to is superior.

  4. calling it a club solution seemed pretty down on it.
    which i still disagree with, SK2 doesn’t attempt to deter by its very presence (like a captcha), it’s a silent killer, and if everyone used it, it would decrease spam across the board.

    if your definition of club/lojack were true, then akismet wouldn’t need user reports, and wouldn’t need to “learn”. club/lojack is a difficult enough metaphor without you misusing it.

    i don’t think SK2 belongs in core in its present state either. the UI is bad, but i like what it does, so i put up with it.

    also, converts often become zealots. zealots (for any cause) are bad. i think matt’s in danger of going too far.

  5. this is something i’ve been thinking on for a while. just because akismet uses machine learning to defeat spam does not necessarily mean that it’s a good thing. it still ratchets up the arms race. the next logical step is for machine learning to be used on the spammers end. it’s an evolutionary situation. i would almost prefer to see human-generated spam, since humans can be convinced that spamming is bad. if we leave it to businesses (advertisers weighing the cost of spam production against sales) and machines, there will always be spam.

  6. I think we all know that if Matt had written Spam Karma, it would be in core. And that if Matt had not invented Akismet, it would not be bundled. Since there is no such thing as a 100% effective spam-prevention plugin, Matt’s only method of deciding which is the best is remembering which one he wrote 😉 Saying that he leaves WP open to spam so he can profit from Akismet is just silly, because the only way not to get spam is to switch off commenting and delete all comment-related files.

    I don’t see anything about changing the default blogroll on Trac or on the ideas thread so to be honest I’ll believe it when I see it.

  7. Mark, thanks for sharing your thoughts.

    When we last looked at using parts of SK2 in core, there were license issues as it is not GPL. Not sure if that’s still a problem. If I were to imagine things in core though, the only one that seems plausible would be something like the old Hashcash, but I’m not sure if making JS a requirement for comments is a good decision.

    that girl again, there are platforms now building Akismet into the core, not as a plugin, that I have nothing to do with. That, plus the dozens of plugins and modules available for other systems, should indicate that Akismet is a fair-market, long-term solution to this problem.

  8. I don’t intend “club solution” to be derogatory. Club solutions can provide some level of protection. And not 100% of spammers will adapt to a club, which means that even if a club becomes widely used and widely overcome by spammers, it does still provide an additional hurdle that will slow down spammers. And if you have enough time or talent to keep your club updated to the latest version (or better yet, make your own custom club), you’ll do better than most people.

    And yes, SK2 is still under a custom license that forbids commercial redistribution, making it incompatible with the GPL.

    Hashcash is a decent idea, as it presents a fairly big hurdle for spammers (JS processing capabilities), but it’s not an insurmountable hurdle, it only deals with comment spam — we’re still stuck with content analysis for pings, and it makes Javascript a requirement for commenting.

    This is not all to say that there aren’t ways to improve how we deal with spam. I have an idea for an “Akismet Plus” plugin that would improve Akismet’s spam identification rate. And there are definite UI improvements that could be made to WP’s backend for dealing with despamming (I’ve been whiteboarding this for 2.4, and I think Happy Cog is working on it too). Bottom line: the idea that there’s a killer locally-run spam-killing solution out there that has been held back from WP core in order to benefit Automattic is false.
