Thursday 8 January 2015

Citizen Four (2): Big Data Is Even More Useless To The NSA Than To Tesco

Security consultants have a vested interest in scaring you, your employer's IT department and the politicians. Mo' fear means mo' money. There are a few good guys out there, but none of them are working for McAfee, Norton, the Big Consultancies or the big software support companies. Those people are in it for the money. Here's a quick test for anyone who claims to be a computer security consultant. Ask them if you need McAfee or Norton running all the time on your computer. If they say YES, thank them for their time, show them the door and check you still have your watch and all your fingers when they've left. (Why you should is the subject of another post.)

Though the security consultants often seem to be against the sigint community, their interests are more or less exactly aligned. The sigint guys want the Bad Guys off the Internet because the Bad Guys are lost in the noise there, so they spread stories about how the Internet and phone service are their bitch. The security consultants want to sell you their stuff, so they spread stories about how the Internet and phone service are anyone's bitch, but especially the sigint guys'.

(Don't get me wrong. Banks, medical companies and government departments that deal in personal data need to have secure communications and computers and data. They should vet their staff and make it difficult for even employees to sign on to their networks. You need to practice safe computing at home and in cafes, and run your OS firewalls. But like all security, this is to deter amateurs and up the cost of hacking you as against the next person. If the pros want access to your computers, they will get it.)

The hype says that the sigint agencies can search amongst all this data to find "patterns". There are two kinds of pattern. The first kind comes from looking at who is contacting whom, and who visits what websites, sometimes called "traffic analysis". The idea is that the agencies have certain kinds of pattern-archetypes they prepared earlier, and go looking for those in new traffic records, thus finding terrorists, drug dealers, illegal gambling lines and all sorts of other illegal activity. Because terrorists and drug dealers don't learn and are creatures of habit. This is more-or-less nonsense. Traffic analysis works when listening in on radio traffic between armed forces engaged in industrial-scale warfare (which is where it came from), but unless it's used in conjunction with a list of "numbers (or URLs) of interest", it's more or less useless on a retail scale. The second kind of pattern is about content: word use, photographs and the like. In business, this is known as predictive modelling, and there's a huge problem with it.

Predictive modelling is used to identify people who have a higher probability of doing whatever it is you're selling, supplying or trying to prevent: using certain kinds of social services, taking insurance or loans, making insurance claims, defaulting on payments (that's a huge industry in the financial sector called "credit risk"), committing crimes, or redeeming coupons for Pampers. These are almost always events with a very low incidence - very few people do them each month - and a fairly low prevalence - the stock of people who have done them is less than 10%.

A blinding glimpse of the obvious is that if you want to predict a rare event with high probability, it must be with a bunch of indicators which line up just right almost equally rarely. In business, it can be acceptable to use a method that over-predicts wildly, as long as it over-predicts less wildly than the previous method. If you can send only half as many leaflets and get twice the response rate from those leaflets, you've halved your marketing costs and kept the same revenues. In business that counts as a result. In espionage, that's awful: you have far too many false positives.
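To put rough numbers on the false-positive problem, here is a minimal sketch in Python. Every figure in it is made up for illustration: the population, the number of real targets, and the detector's hit and false-alarm rates are assumptions, not anything an agency or a marketing department has published. The function names are hypothetical too. It just applies Bayes' rule to a rare event, then redoes the leaflet arithmetic.

```python
def flagged_counts(population, targets, sensitivity, false_positive_rate):
    """Return (true hits, false alarms) for a detector run over everyone."""
    true_hits = targets * sensitivity
    false_alarms = (population - targets) * false_positive_rate
    return true_hits, false_alarms


def precision(true_hits, false_alarms):
    """Share of flagged people who really are targets."""
    return true_hits / (true_hits + false_alarms)


def responses(leaflets, response_rate):
    """Expected number of replies to a mailing."""
    return leaflets * response_rate


# Assumed, illustrative numbers only.
POPULATION = 60_000_000      # roughly a UK-sized population
TARGETS = 3_000              # hypothetical count of real bad guys
SENSITIVITY = 0.99           # detector catches 99% of real targets
FALSE_POSITIVE_RATE = 0.01   # and wrongly flags 1% of everyone else

hits, alarms = flagged_counts(POPULATION, TARGETS, SENSITIVITY, FALSE_POSITIVE_RATE)
share_real = precision(hits, alarms)

print(f"real targets flagged: {hits:,.0f}")
print(f"innocent people flagged: {alarms:,.0f}")
print(f"share of flagged who are real: {share_real:.2%}")

# The business case: half the leaflets at twice the response rate.
old_replies = responses(100_000, 0.002)
new_replies = responses(50_000, 0.004)
print(f"replies before: {old_replies:.0f}, after: {new_replies:.0f}")
```

With these assumed inputs, a detector that sounds superb on paper still flags hundreds of thousands of innocent people for a few thousand real ones, so well under 1% of the flagged list is worth anyone's time. The mailing, meanwhile, gets the same number of replies for half the postage, which is why the same over-prediction is a win for Tesco and a disaster for a spy.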

The other blinding glimpse of the obvious is that you need enough examples of people doing whatever it is to find and prove patterns with statistical techniques. There just aren't enough terrorists in the UK, and there haven't been enough bombings, to gather that amount of data.

The holy grail of predictive modelling is a private process with a public choke-point: everyone who does X must do Y or Z, almost the only reason for doing Y or Z is X, and Y and Z are both easily observable. Seeing someone come out of a branch of William Hill is pretty good evidence that they laid a bet. As far as anyone knows, there’s no equivalent of William Hill’s for terrorists and other nasties. And even if there were, it wouldn’t last for long, as they would change methods on an erratic basis. This is basic tradecraft that’s been practised since Sun Tzu ran spies, and it’s not rocket science. You think that bit in The Wire where the bad guys sent each other photographs of clocks wasn’t based on a real example?

No. Nobody is using Big Data techniques to spot malfeasors and terrorists. They might be trying, but you can rest easy that they will fail. The benefits of Big (commercial) Data are mostly hype, and the benefits of Big (Intelligence) Data are total vapourware. Except, and this is crucial, when the agencies have a bona fide target and can get that target’s phone numbers and other comms identities. That takes humint, not Big Data. Business has had Big Data for a long time, and the best it can do is improve the efficiency of its mail order shots from, oh, 0.2% to 0.6%.

Collecting data on “everyone” is so obviously pointless, uneconomic and silly that if the NSA and GCHQ are doing it, or heading that way, the people in charge should be fired. I don’t think the people who run these agencies are stupid. I don’t think they are really doing what the FUD-meisters in the security business suggest they are doing. But I do think they don’t mind the security FUD-meisters saying that they can, and that they are.

So was Edward Snowden actually planted on us by the NSA to spread the fear? I don’t think so. Though it would explain why his location wasn’t found within an hour by an operator looking at hotel security footage from across the world, and why he wasn’t shot the next evening by a special forces sniper flown out to Hong Kong on a Gulfstream and guided by imagery of the hotel bedroom taken from one of the smartphone cameras that was turned on automatically from half-way across the world. Because that’s what the NSA and CIA can really do. Right?

Oh. And the scene in Citizen Four where the bullies from GCHQ make the Guardian journalists grind and drill holes in the hard drives to destroy the data? Pure hype. On a modern terabyte-storing 3.5" drive, a single pass writing zeros over the whole disk will eradicate the data past all restoring, just as securely as some fancy 7-pass US DoD wiping algorithm. The forensic guys can deal with lightly damaged disks, disks that have lost their controllers and stuff like that, but once you do a standard disk wipe, it's gone. Hit it with a hammer a few times afterwards if you like. But the guys from GCHQ would prefer you believed that they can see past a data shred, so that you didn't bother in the first place. Then they could "recover" the data.
