Category Archives: Statistrickery

The Facebook decline paper is a disgrace to Princeton’s name

The obvious answer to the question “why won’t Facebook decline by 80% by the end of December this year” is “because obviously it won’t, what kind of idiot would even claim it would?”. It’s the leading social network in all age groups, and between July and December 2013 total user numbers only fell by 3%.

However, if you’re reading the papers today, you might be forgiven for thinking otherwise. The Daily Mail is the worst offender, because obviously the Daily Mail is the worst offender, but plenty of derp is being thrown left, right and centre. I’m quoting the Mail piece, because hell, why not:

Faebook is heading for a catastrophic decline and could lose 80% of its users by 2015, researchers warned today.

(yes, Faebook in the lede is the Daily Mail’s typo. QUALITY JERNALISMS!)

The researchers in question are proper academics, more or less: they’re two PhD candidates at Princeton, John Cannarella and Joshua A Spechler. They’ve written a paper which takes a standard epidemiology model, the SIR (susceptible, infectious and removed) model, and tries to apply this to the spread of social networks. It’s not a bad choice in theory: it’s generally accepted that social networks spread virally; and the SIR model applies to diseases which are fatal or immunising (so once you’ve got over it, you can’t get it again, like measles [*]) – most people who give up on a network don’t come back, so fair play.
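For anyone who hasn't met it, here is a minimal sketch of the textbook SIR model. It is the standard form rather than anything taken from the paper, and the population size, the rates `beta` and `gamma`, and the crude Euler step are all made-up values chosen only to produce a visible curve.

```python
def simulate_sir(s0, i0, r0, beta, gamma, days, dt=0.1):
    """Integrate the textbook SIR equations with a simple Euler step.

    dS/dt = -beta * S * I / N
    dI/dt =  beta * S * I / N - gamma * I
    dR/dt =  gamma * I
    """
    n = s0 + i0 + r0
    s, i, r = float(s0), float(i0), float(r0)
    history = [(0.0, s, i, r)]
    steps = int(round(days / dt))
    for step in range(1, steps + 1):
        new_infections = beta * s * i / n * dt  # susceptible -> infectious
        new_removals = gamma * i * dt           # infectious -> removed
        s -= new_infections
        i += new_infections - new_removals
        r += new_removals
        history.append((step * dt, s, i, r))
    return history


if __name__ == "__main__":
    # Hypothetical numbers, not fitted to any real network or disease.
    run = simulate_sir(s0=990, i0=10, r0=0, beta=0.5, gamma=0.1, days=100)
    peak_time, _, peak_infected, _ = max(run, key=lambda row: row[2])
    print(f"infected peaks at about {peak_infected:.0f} around day {peak_time:.0f}")
```

The point to notice is the shape: the infectious group rises, peaks and then falls away as people move into the removed group and never return, which is the "users join, then leave for good" story the paper wants to tell.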

There are a couple of obvious [**] early alarm bells: the paper is not peer-reviewed, and Cannarella and Spechler are studying for PhDs neither in the epidemiology department nor the digital cultures department. They are mechanical and aeronautical engineers. Working entirely outside your discipline doesn’t necessarily disqualify you from doing good work… but it makes the need for review by someone who does know the discipline even more important than usual.

The global headlines are based on our stupid typo

But what does it say? Well, the paper does make the claim reported in the Daily Mail, on page 6 of the full document:

Extrapolating the best fit into the future shows that Facebook is expected to undergo rapid decline in the upcoming years, shrinking to 20% of its maximum size by December 2014.

Unfortunately, this claim is solely due to the paper not undergoing peer review, or apparently proof-reading, before being made publicly available. Page 7 says:

Extrapolating the best fit model into the future suggests that Facebook will undergo a rapid decline in the coming years, losing 80% of its peak user base between 2015 and 2017.

This second conclusion fits with the charts and data presented in the paper. So nobody at all is actually predicting the 80% decline by December 2014; the journalists reporting on it are gibbering halfwits, and the writers are monumentally half-arsed for failing to spot such a basic and disastrous mistake in such a short piece of work.

But also, the premise of what we’re doing is stupid

What about the “losing 80% of peak user base by 2017” conclusion, then? This is indeed what the authors’ model predicts.

Unfortunately, the authors’ model is not entirely robust.

My TL;DR summary of the paper’s methodology is “we modelled MySpace’s growth and decline against the number of Google searches for MySpace, and then applied the same model to the number of Google searches for Facebook”.

If you think this is a ridiculous way of doing things, given the niche, geographically and age-group limited status of MySpace versus the universality of Facebook, and given the different corporate natures of the two organisations, you are correct.

There is an excellent piece in The Week which covers these flaws in the paper’s central conceit very well (keywords: no Murdoch; profitable; less spam; universal; vast corporate cash war chest).

But also also, we’ve completely juked the stats

However, if the models line up, then – subject to critiquing the assumptions – there might be something of value in the paper, right? Well, no. This is where things move from “hmm, I’m not sure this fits with existing research on epidemiology or social networking” to “oh, go and stick your heads in a fire”.

The model used is not actually the SIR model. It is a model called irSIR, which the authors have invented (page 3). They have used this because the SIR model doesn’t work. They don’t cite any epidemiology research when justifying their irSIR model, just a “common-sense” theory about how social network users behave, coupled with a couple of descriptive papers about online network usage.

They don’t use any of the work on social ties that digital cultures theorists have spent the last 20 years developing. Nor do they use any of the work on epidemiology beyond the SIR model as detailed in first-year undergraduate classes. Because hell, where would be the fun in that?

Strangely enough, the model they have custom-built to fit their data on MySpace’s decline fits their data on MySpace’s decline almost perfectly.

However, there’s a new problem. The decline thesis doesn’t really fit the data on Google searches for ‘Facebook’, which remain at 2011 levels and don’t show much of a declining trend at all (the dotted bit is Google’s projection; feel free to ignore everything after January 2014 if you’re sceptical):
[Image: facebook_google_trends]

The authors get past this problem in a way that is truly ingenious: despite not having any evidence that the increase in October 2012 is fake, they scale back all post-October data by 0.8x. As a result, they end up with this beautiful chart, which not only matches the shape of the MySpace curve, but does so over a similar time period and is even steeper:
[Image: facebook_curve_rigged]

Strangely enough, following the modification to make their data on Facebook line up almost exactly with the data on MySpace, the projected decline for Facebook lines up almost exactly with the recorded decline for MySpace.
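To see why that 0.8x adjustment matters so much, here is a toy sketch. The series is entirely hypothetical (twelve made-up, perfectly flat monthly values, not Google's data), and `rescale_after` is just an illustrative helper, not the authors' code.

```python
def rescale_after(series, cutoff_index, factor):
    """Multiply every value from cutoff_index onwards by factor."""
    return [
        value * factor if index >= cutoff_index else value
        for index, value in enumerate(series)
    ]


def percent_change(series):
    """Percentage change from the first value to the last."""
    return (series[-1] - series[0]) / series[0] * 100


# Hypothetical search-interest index: flat at 100 for twelve months.
flat = [100.0] * 12

# Scale everything from month 6 onwards by 0.8, as the paper does
# to its post-October figures.
adjusted = rescale_after(flat, cutoff_index=6, factor=0.8)

print(f"raw series change:      {percent_change(flat):+.1f}%")
print(f"adjusted series change: {percent_change(adjusted):+.1f}%")
```

A series with no trend at all comes out with a 20% drop, purely from the adjustment. Fit a decline curve to that and it will duly find a decline.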

In short, this paper is incredibly sloppy, is based on a flawed premise, and only works because the data has been tortured until it confessed.

If the authors apply the same principles to mechanical and aeronautical engineering that they apply to social media uptake, then I’d be fucking reluctant to get in a plane that either of them had had anything to do with.

[*] A small proportion of people who get diseases like measles are at risk of getting them again, which more complicated models have been built by actual epidemiologists to allow for.
[**] If you are used to reading academic papers. Not, apparently, if you are a journalist.

How To Calibrate A Booze Up So You’re Halfway Likely To Die

So Dan Nolan was wondering how much beer it would take to kill you.

It turns out the answer (LD50) is 42.5 cans in an hour, or 61 cans in a 24-hour day for a normal drinker, or 96.5 cans in a 24-hour day for a heavy drinker who hasn’t yet developed serious liver damage.

But don’t take my word for it, the model is here for your edification: Too Much Beer Will Kill You Just As Sure As None At All.xlsx

Thought on Nate Silver and election projection

This is technically true (random quote from blog commenter, but one which reflects a lot of educated-people-who-know-about-stats opinion on the Silver model):

Silver’s analysis (which I happen to accept) won’t be contradicted (or proven) in any way by tomorrow’s outcome. Either result is accounted for in the model. People seem not to understand that.

However, it’s a silly thing to say. If you craft a model in such a way that you are publicly on record as saying that one candidate in a two-horse race has a 90% chance of winning, and he loses, then you will find it very hard to avoid looking like a tit, even if your stats were absolutely correct and the result is just a one-in-ten piece of bad luck for your model.

The only way in which you could plausibly avoid the tail-risk of looking like a tit would be to focus a sizeable part of your commentary on that tail-risk, why your model shouldn’t be taken as an out-and-out prediction, and why you might be wrong, rather than focusing on the reasons that you think are underlying the 90%-likely outcome.

Mr Silver has gone very strongly for the “focusing on the underlying reasons” option, presumably because he’d much rather take a 90% chance of being The Awesome Pollster Who Correctly Tipped The Election with a 10% chance of being That Tit, than a 100% chance of being That Boring Wonk Who Explained Why We Shouldn’t Pay Too Much Attention To His Numbers.

Which is entirely rational, given the risk/reward matrix he faces, but does mean that anyone who suggests we should refrain from calling him That Tit if the 10% scenario comes through is missing the point.

(tenuously relatedly, I’m delighted to see Ezra Klein dredging up this fine work of speculative psephology and poll-bludgeoning)

The sun is, most likely, still gonna shine in November

After a massively high-spending recall campaign, a controversial Republican state governor has held onto power with a slightly increased majority (while losing control of the state senate). Naturally, the oh-so-left-wing US media are spinning this as Terrible Democrat Defeat, Disaster Due for November, etc.

To highlight the fact that this spin is absolute dingoes’ kidneys, it’s been pointed out that the Walker campaign spent $7 for every $1 his opponent could muster, which is not really a feasible plan for the November election (no, not even for someone with Mitt Romney’s wallet).

This figure is slightly unfair: the gap was narrower among interest groups which didn’t donate directly to the campaign. Not much narrower overall, though. Some quick-and-dirty analysis on CNN’s handy “who gave what” piece shows that we have:

Walker $30.5m
Named R lobby groups $16.9m
Estimated R lobby groups* $0.8m
Total R $48.2m (71% of total)

Barrett $3.9m
Named D lobby groups $14.9m
Estimated D lobby groups* $0.7m
Total D $19.4m (29% of total)

So the Republicans only need to manage to outspend the Democrats by 2.5:1 in November. That’ll be nice and easy for them.

They’ll also need a charismatic candidate who’s become popular among independents (17% of Walker voters currently say they’ll go for Obama in November) through being fiscally conservative but avoiding the social culture war. That’ll be nice and easy for them.

* The “estimate” is where I’ve split the outside donations that aren’t named to specific groups between the parties according to the split of named groups. If you ignore it instead, you get 72%/28%.
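If you want to check the sums, here is a quick sketch using the figures from the table above. The variable names are mine. Note that the rounded Democrat components add up to $19.5m rather than the $19.4m in the table, presumably because of rounding in the underlying figures; it makes no difference to the percentages or the ratio.

```python
# Figures in $m, copied from the table above.
republican = {
    "Walker": 30.5,
    "Named R lobby groups": 16.9,
    "Estimated R lobby groups": 0.8,
}
democrat = {
    "Barrett": 3.9,
    "Named D lobby groups": 14.9,
    "Estimated D lobby groups": 0.7,
}


def share(a, b):
    """Return a's percentage share of a + b."""
    return a / (a + b) * 100


total_r = sum(republican.values())
total_d = sum(democrat.values())

# Same calculation, ignoring the estimated split of unnamed groups.
named_r = total_r - republican["Estimated R lobby groups"]
named_d = total_d - democrat["Estimated D lobby groups"]

print(f"R share of all spending: {share(total_r, total_d):.0f}%")
print(f"R:D spending ratio:      {total_r / total_d:.1f}:1")
print(f"R share, named only:     {share(named_r, named_d):.0f}%")
```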

Alcohol-related stupidity

Alcohol is famous for its ability to cause stupidity. As with most other drugs, this property doesn’t solely apply to chronic abusers – it also applies to policymakers and opinion writers, even the sober ones. Drugs and alcohol are second only to immigration as a leading cause of utterly stupid articles.

Now, I’ve written plenty on this blog in the past about how nannyist fools lie about the levels of drink-related violence and disease, and adopt completely the wrong policies for cutting alcohol consumption even if it were a good idea to do so.

So, in the interests of balance, today I’m looking at a piece from Harry’s Place that opposes a minimum price for alcohol. Now, there’s nothing wrong with opposing a minimum price for alcohol, mostly because it’s an attempt to solve a problem that doesn’t exist. But the piece in question manages to seize upon all the stupidest grounds for doing so that it possibly could.

Its starting point is that alcohol is price-inelastic:

Certain products – the classic example being alcohol – do not respond in the typical way to price changes in the market. A price increase does not lead to a significant drop in demand. People simply grin and bear the price increase.

There’s only one small problem: this is bollocks. According to actual evidence (Table 7), the price elasticity for alcohol is around -1; in other words, a 1% rise in price leads to a 1% fall in consumption. While the various studies vary in terms of total magnitude, all show that price elasticity is significant. A rise in the price of alcohol does, empirically, lead to a cut in alcohol consumption.
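As a minimal sketch of what an elasticity of -1 means in practice (using the usual small-change approximation, with a function name of my own invention):

```python
def consumption_change(price_change_pct, elasticity=-1.0):
    """Approximate % change in quantity bought for a small % change in price."""
    return elasticity * price_change_pct


# Elasticity of about -1, as in the evidence cited above.
print(consumption_change(1.0))

# What the quoted article assumes: demand that barely responds to price,
# taken to the extreme of an elasticity of zero.
print(consumption_change(1.0, elasticity=0.0))
```

The first case gives a 1% fall in consumption for a 1% price rise; the second gives no change at all, which is not what the studies find.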

Impressively, the article goes on to get worse:

Far from reducing alchol-related social ills, arguably, it may even have the opposite effect. It will make social drinking at pubs even more expensive relative to wholesale drinking. People will end up drinking more at home, quaffing back the artificially inflated (but still cheaper) supermarket booze in the environment most likely to encourage them to destroy their livers, beat up their spouses and neglect their children, and to cause accidents at work even more than before.

The problem here is that alcohol minimum pricing proposals that have been made for the UK by even vaguely serious organisations have been talking about a minimum price to the consumer.

Let’s assume the minimum price at retail is set at 50p a unit. If I’m a manufacturer of gin, I don’t have to worry whether Tesco are paying me 50p a unit when they buy a truckload of gin from me to sell in their shops, and I don’t have to worry whether Mitchells & Butlers are paying me 50p a unit when they buy a truckload of gin from me to sell in their pubs. Rather, it’s Tesco’s responsibility not to sell you a bottle of gin for less than GBP14, and it’s M&B’s responsibility not to sell you a shot of gin for less than 50p.

Now, at the moment you can buy a bottle of gin for way under GBP14 in any supermarket, but you certainly can’t get a shot of gin for under 50p in any pub. The same would apply to beer as well: a 50p/unit minimum price would ban pubs from charging less than GBP1.25 for a pint of Kronenbourg, which none of them currently do, while banning supermarkets from charging less than GBP1 for a tin of Kronenbourg, which all of them currently do.
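The gin numbers can be checked with a short sketch. It relies on one UK unit being 10ml of pure alcohol; the bottle size (70cl), shot size (25ml) and strength (40%) are my assumptions rather than figures stated above, and the function names are purely illustrative.

```python
def units(volume_ml, abv_percent):
    """UK alcohol units: one unit is 10 ml of pure alcohol."""
    return volume_ml * abv_percent / 1000


def minimum_price_pence(volume_ml, abv_percent, pence_per_unit=50):
    """Lowest legal retail price under a per-unit minimum."""
    return units(volume_ml, abv_percent) * pence_per_unit


# Assumed: a 70 cl bottle and a 25 ml pub measure of 40% gin.
bottle = minimum_price_pence(700, 40)
shot = minimum_price_pence(25, 40)

print(f"bottle of gin: {units(700, 40):.0f} units, floor price {bottle / 100:.2f} pounds")
print(f"shot of gin:   {units(25, 40):.0f} unit, floor price {shot / 100:.2f} pounds")
```

The beer figures above correspond to roughly 2.5 units a pint and 2 units a tin; the exact floor price depends on the strength and can size you assume.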

In other words, there’d be a significant impact on supermarket prices, but no impact on pub prices. So there’d be a significant decline in home consumption, but no decline in pub consumption. Which, if you believe that there’s a binge drinking problem with evil effects that are made worse by drinking at home (not, of course, that any evidence is produced for this one either), would be a good outcome.

Rather depressingly, Tim cites the HP piece as an example of lefties understanding economics. Which I suppose is true, in that it’s using the cargo-cult sense of economics that glibertoonians often base their arguments on – relying solely on half-remembered theory from the sixth form, missing obvious theoretical points out (whether because they’re inconvenient or because you’re slapdash, who can say?), not testing your theory against empirical data because you can’t be bothered, not doing sums because they’re hard, and coming up with clownish bullshit that even a GCSE economics teacher would grade as “F minus, see me”. In that sense, it’s absolutely spot on.

No, the Old Spice campaign hasn’t failed

There seems to be a meme floating around the social marketing world at the moment that the super-notorious Old Spice mass media and viral ad campaign has failed to drive sales, despite grabbing mindshare and winning awards. This seems to be based on a Brandweek article that isn’t available on their website (w00t new media marketing excellence, not), but that has been excerpted here. It says:

[S]ales of the featured product—Red Zone After Hours Body Wash—aren’t necessarily tracking with that consumer appeal: In the 52 weeks ended June 13, sales of the brand have dropped 7 percent according to SymphonyIRI. (That amount excludes those rung up at Walmart.) P&G execs were not available to comment.

SymphonyIRI get their sales data direct from the tills in all major US supermarkets except Walmart (who figure they’re big enough that they’ve got more to lose than to gain from sharing their data with competitors), so it’s pretty reliable. I wish I had access myself – I have done for projects in the past, and damn it’s good, but a subscription costs millions of dollars…

However, even without access to the data, we can easily show that the Brandweek piece is absolutely irrelevant. First, a quote from Forbes last Thursday:

Total sales for Old Spice body wash at supermarkets, drugstores and mass market retailers excluding Wal-Mart were up 16.7% in the 52-week period ending June 13, according to SymphonyIRI Group, a Chicago-based market research firm.

In other words, assuming both articles are accurate, a specific sub-brand of Old Spice has fallen in sales, but the overall brand has risen in sales. Since the campaign primarily promotes Old Spice as a master brand (I didn’t even know it was plugging Red Zone After Hours Body Wash, and nor did you), the Brandweek article is somewhere between misled and misleading in its selective data usage.

Even if Forbes has somehow got its numbers wrong, and the Brandweek data is representative of the brand’s overall performance, this still wouldn’t show the campaign had failed. The IRI data covers a 52 week period – it’s comparing Jul 2009-Jun 2010 to Jul 2008-Jun 2009. The interesting comparisons for a breaking campaign (the ads started in February, and the social campaign’s been building since) are week-on-week (wk20 2010 vs wk20 2009) and month-on-month (Jun 2010 vs Jun 2009), not averages for the whole year. If sales fell in the second half of 2009 and were gradually revived this year by the campaign, the 52 week data wouldn’t show this at all.
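A toy example shows how a 52-week comparison can hide a recovery. The weekly sales figures below are entirely hypothetical (flat last year, a slump for 32 weeks, then 20 stronger weeks once a campaign kicks in); they are not IRI data.

```python
def pct_change(new, old):
    """Percentage change from old to new."""
    return (new - old) / old * 100


# Hypothetical weekly sales index, 52 weeks per year.
previous_year = [100] * 52
latest_year = [80] * 32 + [110] * 20  # slump, then campaign-driven revival

annual = pct_change(sum(latest_year), sum(previous_year))
latest_week = pct_change(latest_year[-1], previous_year[-1])

print(f"52-week total vs previous 52 weeks:      {annual:+.1f}%")
print(f"latest week vs same week a year earlier: {latest_week:+.1f}%")
```

On these made-up numbers the 52-week figure is down by about 8% while the most recent week is up 10% on a year earlier, so the annual number says nothing useful about a campaign that only started part-way through the period.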

The most awesome thing about IRI data if you’re a marketing-stats-data-geek (guilty) is that it’s updated daily. So Procter & Gamble and its agency, Wieden & Kennedy, will know exactly, day on day, how sales have reacted. They (well, they plus SymphonyIRI, Unilever, Colgate, and their respective marketing agencies) are the only ones currently in a position to say whether the campaign has worked. Until and unless they, or SymphonyIRI, or a naughty leaker working for a company with access to IRI’s database, tell us what the week-on-week comparisons are, we’ve got little idea whether or not the campaign has succeeded.

Well, except that Old Spice had been in decline as a brand for a very long time – so if there has been a 17% rise in 52-week sales as the Forbes piece suggests, then that’s a good indication that the rise in sales since the campaign launched in February is larger still.

Lesson: while everyone wants smug marketers to fail (yes, of course you do), a campaign that captures the public imagination to the degree that Old Spice has is bloody unlikely to fail to drive sales, at the very least in the short term. Relatedly, most people don’t understand data.

Missing the point on booze marketing, again

So there’s yet another alcohol-bashing study out. This one says [*] that sports stars’ drunk behaviour has no impact on young adults’ drinking behaviour (that’s ‘over 18s’, or ‘legally responsible adults’), but that alcohol marketing does.

This isn’t surprising. Of course alcohol marketing makes people drink more of the brand being marketed, otherwise people wouldn’t do it. But we need people to research things that seem obvious from time to time, because sometimes we find out that what we think we know is wrong. So, decent study, worth funding, all good.

But:

“There’s always been a link made between alcohol and sport… the detrimental effects of that, in the same way as there was previously between cigarettes and sport,” Professor Kolt said.

Err, no. The difference is that smoking, full stop, is harmful. Alcohol consumption below 30 units (300ml of alcohol; 15 pints of bitter) a week has not been demonstrated to do harm, even compared to not drinking at all, and you need to get up to 50+ units before the risks of morbidity or mortality are substantially higher than for non-drinkers.

Unless the study shows that the impact of alcohol marketing is to encourage people aged 18-22 to drink more than 30 units a week, then it’s only of interest to alcohol marketers, and not to policymakers. And if they had found that, they’d most certainly have put it in the press release…

The problem with this kind of alcohol research (i.e. social science on consumption behaviour, rather than epidemiological science on health outcomes) is that nearly all the work commissioned and published by public bodies is carried out by miserable puritans who hate the concept of anyone ever having any kind of fun. This is because researchers who don’t hate the concept of anyone ever having any kind of fun work for drinks companies instead: they pay better, you get a free bar after work, and you don’t have to hang out with people from the first group.

But drinks companies tend to keep their studies private, because they don’t want their rivals to see them…

Therefore, the general pattern in the public arena is that some people will create a report which actually shows mildly interesting things about how people like to consume alcohol – but because of the prejudices of the people who’re writing it, the abstract and the PR make groundless accusations about negative impacts on disorder and health. And then the media reports the groundless accusations as “a study has concluded that”, and the public debate is ratcheted slightly further towards miserable puritanism.

[*] I have no idea what the study says. The above is what the press release says; the press release features quotes from and has been approved by the study’s main authors, and is what will shape the public debate.

Worried about stabbings? Don’t be

Here’s a nice report by a real statistician on how London’s low murder rate is nothing to worry about unless you’re a gibbering paranoid ignorant fool.

Unsurprisingly, it’s received almost no media play at all. I mean, what news value is there in a study proving that the ‘OMG t3H knife crime!!!!’ narrative is bollocks and that there’s nothing to worry about…?

Fraser Nelson: ignorance and paranoia, in one simple package

RBS:

As part of our implementation of FSA guidelines around Anti-Money Laundering activities, we introduced questions on Politically Exposed Persons as part of our account opening procedures.

Genius financial columnist Nelson:

what on earth is a Politically Exposed Person?

The FSA anti-money-laundering guidelines, which have been in force for three years:

customers who, by virtue of their position in public life, are vulnerable to corruption

This isn’t earth-shattering stuff; any professional or financial services firm has to ask those questions of its clients, and RBS would be remiss for not doing so. Taking UK political party membership as an indication of PEP status was technically incorrect, but fair enough I reckon – if you’re a member of a UK political party, you’re either corrupt, stupid or massively over-optimistic…

Nelson then goes off into an insanely paranoid rant about banks asking people whether they occupy any political offices. Oh noes! A majority-government-owned institution is asking me whether I’m a political party member. The gulags surely do beckon…

If you’re worried about this, you’re a moron.

Dumbing down, and that’s just the fogeyish commentators

In England and Wales up until 1986, there were two sorts of exam a child could take at the compulsory school leaving age of 16.

CSEs weren’t very academically rigorous and were aimed at kids who weren’t planning to take further academic qualifications. O-levels were more academically rigorous; they were aimed at kids who were planning to take A-levels at 18, and possibly head on to university.

After 1986, the system was – officially – changed to have a single qualification at 16 encompassing both streams of kids, known as the GCSE.

However, almost all GCSEs are divided into two tiers of paper, Foundation and Higher, which are separate exams based on related but different syllabuses [*]. All kids who would have taken an O-level in a particular subject, and some who would have taken a CSE, take the Higher paper; the highest grade possible in the Foundation paper is a C.

(I note, in passing, that 1986 fulfils the criterion of ‘being a year that happened after the current generation of crusty old farts finished compulsory education’.)

For more or less the following 23 years non-stop, commentators born before 1970 have failed to point out the fact that nothing has really changed – whether this is due to their idiocy or their dishonesty is not quite clear – and instead have had great fun comparing Foundation GCSE (i.e. rebadged CSEs) papers with O-level papers. They discover, shock-horror-ish-ly, that the paper the thick kids take is harder than the paper the bright kids used to take.

For example, this post from the Libertoonian Alliance links to a GCSE physics paper, fails to point out what the foundation/higher distinction means, and labels it ‘for intelligent 16 year olds’. Since the foundation questions, answered by the dumb kids, make up the first 16 pages of the test, most readers will end up skimming those, assuming they represent ‘bright kid’ questions, and hence that the paper is moronic.

But it isn’t, for the ‘higher’ section at least [**]. It’s a reasonable test for bright-ish kids of theoretical knowledge on energy and electricity, with a couple of fluffier ‘social impact of science’ questions thrown in for a total of approximately 2 marks.

So, a fail, then. But the attempts to mislead get better. In the comments section, the post author, a (but not the, one would assume) David Davis, says:

In the “further” physics papers (only for the really bright people doing what is called “separate” sciences

This is what is known as ‘grossly misleading’.

Later, he says:

Yes [this linked paper is the equivalent of an O-level paper]. Passing this, is what those who will go on to do “A” levels in the sciences will have to do.

This is what is known as ‘grossly misleading’.

In a final and epic demonstration of his elite science-y skills, he says:

I don’t know [why the paper spells sulfur correctly]. It has sort of crept in the last year or so. I thought like you do that these people were supposed to hate America, but they adopt its spelling of “sulfur”.

If you’re going to mouth off about the ignorance and evils of people setting science exams, perhaps you might want to check the conventions actual scientists have agreed with each other on how to spell words. Or not.

There might, possibly, be an actual case that there has been a dumbing-down in educational standards (damn unlikely given relative skill and literacy measures by cohort, but possible). This paper, dramatically and massively, certainly doesn’t present it, and the dishonest codgers putting it forward do themselves no favours.

[*] English words don’t require Latin plurals. Fact.

[**] with the exception of the rather bizarre question 4D (the actual answer, given the levels of ignorance of more or less everything held by people who object to planning applications for wind turbines, is ‘all of the above’; but I’ve no idea what answer the exam board wants).