I was Googling for an old Banditry post yesterday, as part of a discussion about that new ‘people lie about their drinking’ study. Eventually I found it, only to discover that I’d linked to a (London) Times article, which meant the paywall had ruined the whole thing (curiously, even though the Times now shows unregistered users the headline, lede and first sentence of new articles, it completely screws up on old ones). So I more or less gave up on the post [*].
While Googling, I was rather surprised to discover how much content I’d apparently written about the availability, acquisition and applications of various medicinal substances (link will hopefully die in a few weeks as Google updates itself). I briefly considered the possibility that in a fit of poverty and/or drunkenness I’d decided to set up my own online pharmacy, then remembered that I’m based in a country with some of the tightest controls on prescription drugs in the world, so that would be rather silly. No: I’d been hacked.
I’ve been blogging for more than a decade now, so this isn’t the first pharmaceutical spam attack I’ve experienced, but it is the most insidious.
The hacked pages are tainted only for Google’s crawler. If you or I or anyone else in the world who isn’t Google’s crawler clicks through to them, they appear as originally intended, both in the browser and in the source code. So the spam merchant gets to benefit from my PageRank without doing suspicious things to my traffic stats or making suspicious links appear on my actual site, which is what gave away previous hacks. They also, cleverly, didn’t go for an out-and-out hack of every page, so if you google for “johnband.org” or search the site for a specific term that isn’t drug-related, you’ll get the correct result, with no indication that some of the pages (mostly tag pages, category pages and monthly archives) exist, as far as Google is concerned, only as pharmaceutical billboards.
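For anyone who wants a rough first check before reaching for Google’s own tools, the idea can be sketched in Python: fetch a page once as an ordinary browser and once claiming to be Googlebot, then look for spam words that only the ‘crawler’ copy contains. The User-Agent strings and the keyword list below are illustrative assumptions, not anything the hack is known to use.

```python
import urllib.request

# Illustrative User-Agent strings (an assumption; check Google's own crawler
# documentation for the current Googlebot string).
BROWSER_UA = "Mozilla/5.0 (Windows NT 6.1; rv:19.0) Gecko/20100101 Firefox/19.0"
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

# Hypothetical keyword list: substitute the usual suspects from your own spam.
SPAM_WORDS = ["viagra", "cialis", "levitra", "propecia", "pharmacy"]


def fetch(url: str, user_agent: str) -> str:
    """Download a page while presenting the given User-Agent header."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


def crawler_only_words(visitor_html: str, crawler_html: str, words=SPAM_WORDS) -> list:
    """Return the spam words that appear in the crawler's copy but not the visitor's."""
    visitor = visitor_html.lower()
    crawler = crawler_html.lower()
    return [w for w in words if w in crawler and w not in visitor]
```

Call it as crawler_only_words(fetch(url, BROWSER_UA), fetch(url, GOOGLEBOT_UA)) on one of the suspect tag or archive pages. An empty list proves very little, though: a hack that identifies Google by IP address rather than by User-Agent will still serve a spoofed request the clean page.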
Conveniently, Google has a funky-cool Fetch As Google tool, described by Google engineer Matt Cutts, which lets you see exactly what the Googlebot sees when it crawls any page on your site. Sticking the affected pages into the tool confirmed that Google was still seeing them as pharmaceutically compromised, and that they’d been that way since last July or August.
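One plausible reason a tool run from Google’s side is needed at all: a server can tell the real Googlebot from an impostor by its IP address. Google documents a reverse-then-forward DNS check for verifying its crawler, and a minimal Python sketch of that check looks like this. The function names are my own, and gethostbyname_ex only covers IPv4.

```python
import socket

# Domains that Google says its crawler's reverse DNS will resolve to.
GOOGLE_CRAWLER_DOMAINS = (".googlebot.com", ".google.com")


def is_google_hostname(host: str) -> bool:
    """True if a reverse-DNS hostname sits under one of Google's crawler domains."""
    return host.lower().rstrip(".").endswith(GOOGLE_CRAWLER_DOMAINS)


def is_real_googlebot(ip: str) -> bool:
    """Reverse-resolve the IP, check the domain, then forward-resolve to confirm it."""
    try:
        host, _aliases, _addrs = socket.gethostbyaddr(ip)
        if not is_google_hostname(host):
            return False
        _name, _aliases, forward_ips = socket.gethostbyname_ex(host)  # IPv4 only
    except OSError:  # socket.herror and socket.gaierror both subclass OSError
        return False
    return ip in forward_ips
```

Cloaking code could run a check like this on each request and serve spam only when it passes, which would explain why a spoofed User-Agent from your own machine comes back clean while Fetch As Google does not.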
So, I junked my evening plans and settled in for a night of Fun With WordPress, PHP, MySQL, Unix Permissions And Google. Which is my favourite sort of fun, obviously.
Hope, cruelly dashed
The top Google hit on the pharma hack, from blogger Chris Pearson, was an extremely well-written summary which described a problem identical to mine. “Result!”, I thought. So I followed Chris’s steps, only to discover that absolutely none of them worked. The trouble is, the pharma spammers are cleverer bastards than I’d thought: once the tricks of their trade are visible to anyone with a quick Google, they’re at a disadvantage, so they change them. And Chris’s post dates from April 2010. Three years of malware evolution later, his macro-level points are still worth a read, but the actual techniques he described were way obsolete.
So I Googled a bit more, mostly finding sites that repeated Chris’s solution, but eventually happening upon a couple of write-ups that were closer to my problem, at least in the sense that their authors also found none of the things Chris describes, none of the obvious hacks I’d experienced before (such as a doctored .htaccess file or dodgy-looking access permissions), and no changes to the main WordPress database, or at least none of the changes that anyone has noted online.
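For anyone who wants to rule out those older, more obvious hacks first, here is a rough Python sketch of two such checks: looking for .htaccess rules that rewrite requests based on a search-engine User-Agent, and listing files anyone on the server can write to. The regex is my own guess at a common cloaking pattern, not an exhaustive signature, and world-writable files are only one kind of dodgy permission.

```python
import os
import re
import stat

# A common shape for cloaking rewrites (an assumption, not a complete list).
BOT_UA_RULE = re.compile(
    r"RewriteCond\s+%\{HTTP_USER_AGENT\}.*(googlebot|bing|slurp|msnbot)",
    re.IGNORECASE,
)


def suspicious_htaccess_lines(text: str) -> list:
    """Return .htaccess lines whose rewrite condition targets a search-engine bot."""
    return [line.strip() for line in text.splitlines() if BOT_UA_RULE.search(line)]


def world_writable(root: str) -> list:
    """List files and directories under root that any user on the server can write to."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                mode = os.lstat(path).st_mode
            except OSError:
                continue  # file vanished or unreadable; skip it
            # Symlinks always report rwx for everyone, so ignore them.
            if mode & stat.S_IWOTH and not stat.S_ISLNK(mode):
                found.append(path)
    return sorted(found)
```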
The most comprehensive, although perhaps the least comprehensible unless you’re ultra-techie, was a February 2012 post by Shaun Green. Short version: the current incarnation of the hack creates PHP files with names that sound like they should be real WordPress files, and distributes them throughout your WordPress install, especially in the wp-includes folder, so that they’re almost impossible to find and tell apart from real WordPress files without doing extremely nerdy things.
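One common heuristic for spotting disguised PHP files (not necessarily Shaun’s method) is to look for code that decodes a string and executes it in one go. A rough Python sketch follows; the list of function names is my assumption rather than a complete signature set, and legitimate code occasionally matches, so treat hits as leads rather than verdicts.

```python
import os
import re

# eval()/assert() wrapped directly around a decoder is a classic malware shape.
OBFUSCATION = re.compile(
    r"\b(eval|assert|create_function)\s*\(\s*(base64_decode|gzinflate|gzuncompress|str_rot13)\s*\(",
    re.IGNORECASE,
)


def suspicious_php(root: str) -> list:
    """Return (path, matched text) pairs for PHP files that decode and execute a payload."""
    hits = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if not name.endswith(".php"):
                continue
            path = os.path.join(dirpath, name)
            with open(path, "r", encoding="utf-8", errors="replace") as fh:
                match = OBFUSCATION.search(fh.read())
            if match:
                hits.append((path, match.group(0)))
    return sorted(hits)
```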
I’m not really a deep-level coder, so following all of Shaun’s steps sounded rather painful. And my install didn’t contain the specific filenames he lists (https.php and class-sftp.php), so I would have had to literally retrace his steps rather than just following his conclusions.
Instead, I went for a slightly lower-tech option. Everything in the wp-includes folder is a standard WordPress file, which shouldn’t have changed since installation. The same is true for everything in the wp-admin folder, and for everything in the WordPress root folder except for wp-config.php (which I’d already checked to make sure it wasn’t compromised). So I downloaded a vanilla version of WordPress 3.5.1, deleted everything from my install except for the wp-content folder (where themes, plugins and pictures are stored) and wp-config.php, and then copied the untainted files across.
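I replaced everything wholesale, but if you’d like to see what the hack actually changed before deleting it, the same logic can be run as a comparison. This is a hypothetical Python sketch: point it at your install and at a freshly unzipped copy of the same WordPress version, and it hashes every file outside wp-content and wp-config.php and reports what is extra, modified or missing.

```python
import hashlib
import os

# The two things kept during the reinstall; everything else should match a clean copy.
KEEP = ("wp-content", "wp-config.php")


def file_hashes(root: str) -> dict:
    """Map each file's path (relative to root) to its SHA-256 digest, skipping KEEP."""
    hashes = {}
    for dirpath, dirnames, filenames in os.walk(root):
        rel_dir = os.path.relpath(dirpath, root)
        if rel_dir == ".":
            # Prune wp-content so os.walk never descends into it.
            dirnames[:] = [d for d in dirnames if d not in KEEP]
        for name in filenames:
            rel = os.path.normpath(os.path.join(rel_dir, name))
            if rel in KEEP:
                continue
            with open(os.path.join(dirpath, name), "rb") as fh:
                hashes[rel] = hashlib.sha256(fh.read()).hexdigest()
    return hashes


def compare_to_vanilla(install: str, vanilla: str) -> dict:
    """Report files that are extra, modified or missing relative to a clean WordPress copy."""
    mine, clean = file_hashes(install), file_hashes(vanilla)
    return {
        "extra": sorted(set(mine) - set(clean)),
        "modified": sorted(p for p in mine.keys() & clean.keys() if mine[p] != clean[p]),
        "missing": sorted(set(clean) - set(mine)),
    }
```

Anything listed under "extra" inside wp-includes is a prime suspect, in the same spirit as the https.php and class-sftp.php files Shaun found. Note that this says nothing about themes or plugins, which live in the wp-content folder it deliberately skips.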
One quick check on Fetch As Google later and – hurrah! – the pharmaceuticals had all disappeared. Now all I need to do is wait for Google to update its cache, and everything should be back to normal.
While the problem was solved in the short term, it clearly wasn’t solved in the long term: I’d started with an uncorrupted WP installation, and someone had managed to corrupt it. So, after changing all the relevant passwords (obviously), I installed Wordfence and Better WP Security. If you host your own WordPress blog (anything that isn’t on wordpress.com), then so should you. Wordfence is the equivalent of an antivirus program for your WordPress install; Better WP Security automates a whole bunch of handy lockdown and obfuscation tricks. Wordfence threw up a few vaguely suspicious files associated with some of the themes that were installed, so I deleted them; everything was then fine.
I’ve also set up Google Alerts that notify me if any new content appears on johnband.org containing various spammy keywords (the usual suspects), which obviously won’t be much use until the current spam-buggered content is removed, but will then allow me to kill any future infections before they’ve completely ruined my search results. I’ll update this post in the event that anything else occurs. If I remember, I’ll update it in a couple of months if nothing else has occurred, since zero is sometimes a helpful data point.
TL;DR: It was quite painful, but could have been much worse. If this happens to you, I definitely recommend the “for every folder which shouldn’t have changed since WP was installed, delete the folder and reinstall” approach, although do check the database and fix any issues there first. And set up the security plugins even if this hasn’t happened to you yet, because it probably will.
[*] Short version of the post I was going to write: epidemiological studies into alcohol-related harm are also based on self-reported consumption, so while it’s likely that everyone drinks more than they say, it’s also likely that alcohol is correspondingly less bad for you than those studies have shown, by about the same margin, unless we can come up with valid reasons why people would underestimate in one sort of study but not the other. Also, News Corporation are still unimaginably bad at digital strategy.
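The arithmetic behind that, under two simplifying assumptions of my own (a linear dose-response, and a single under-reporting factor k that applies equally to everyone): write r for reported units, a for units actually drunk, and β for the harm per reported unit that a study estimates.

```latex
a = k\,r \quad (k > 1), \qquad H \approx \beta\,r = \beta\,\frac{a}{k} = \frac{\beta}{k}\,a
```

So if, hypothetically, people drank half as much again as they report (k = 1.5), the harm per unit actually drunk would be β/1.5, about two-thirds of the published per-unit figure.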