The Economist: AI "enlists" in the intelligence world, leaving you nowhere to hide (Part 2)
Yet what is possible in public health is not always so easy in national security. Western intelligence agencies must contend with laws governing how private data may be gathered and used. In its paper, GCHQ says that it will be mindful of systemic bias, such as whether voice-recognition software is more effective with some groups than others, and transparent about margins of error and uncertainty in its algorithms. American spies say, more vaguely, that they will respect "human dignity, rights, and freedoms". These differences may need to be ironed out. One suggestion made by a recent task-force of former American spooks in a report published by the Centre for Strategic and International Studies (CSIS) in Washington was that the "Five Eyes" intelligence alliance—America, Australia, Britain, Canada and New Zealand—create a shared cloud server on which to store data.
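GCHQ's pledge has two parts. It will check whether voice-recognition software works better for some groups than for others, and it will be open about margins of error. In practice that means measuring accuracy separately for each group and attaching an uncertainty range to each figure. The sketch below illustrates this under stated assumptions and is not GCHQ's actual method. The group labels and test results are invented, `accuracy_by_group` and `wilson_interval` are hypothetical names, and the 95% Wilson score interval is only one common choice of error margin among several.

```python
import math
from collections import defaultdict


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a proportion; z=1.96 gives roughly 95% coverage."""
    if n == 0:
        return (0.0, 1.0)  # no data: the proportion could be anything
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return (centre - half, centre + half)


def accuracy_by_group(results):
    """Map each group to (accuracy, lower bound, upper bound).

    `results` is an iterable of (group_label, was_correct) pairs.
    """
    counts = defaultdict(lambda: [0, 0])  # group -> [correct, total]
    for group, correct in results:
        counts[group][0] += int(correct)
        counts[group][1] += 1
    report = {}
    for group, (ok, total) in counts.items():
        low, high = wilson_interval(ok, total)
        report[group] = (ok / total, low, high)
    return report


# Invented test results for two hypothetical speaker groups.
results = [("group_a", True)] * 90 + [("group_a", False)] * 10
results += [("group_b", True)] * 70 + [("group_b", False)] * 30

report = accuracy_by_group(results)
for group, (acc, low, high) in sorted(report.items()):
    print(f"{group}: accuracy {acc:.2f}, 95% interval {low:.2f} to {high:.2f}")
```

With these made-up numbers the two intervals do not overlap, which is the kind of gap such an audit is meant to surface. With smaller samples the intervals widen, and the same point estimates could then be mere noise.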
In any case, the constraints facing AI in intelligence are as much practical as ethical. Machine learning is good at spotting patterns—such as distinctive patterns of mobile-phone use—but poor at predicting individual behaviour. That is especially true when data are scarce, as in counterterrorism. Predictive-policing models can crunch data from thousands of burglaries each year. Terrorist attacks are much rarer, and therefore harder to learn from.
That rarity creates another problem, familiar to medics pondering mass-screening programmes for rare diseases. Any predictive model will generate false positives, in which innocent people are flagged for investigation. Careful design can drive the false-positive rate down. But because the "base rate" is lower still—there are, mercifully, very few terrorists—even a well-designed system risks sending large numbers of spies off on wild-goose chases.
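Bayes' rule explains why a low base rate swamps even a good detector. The share of flagged people who are genuine threats is (sensitivity × base rate) / (sensitivity × base rate + false-positive rate × (1 − base rate)). The sketch below plugs in purely illustrative numbers, which are assumptions and not figures from the article. It screens 1m people, 100 of whom are genuine threats, with a detector that catches 99% of them, at false-positive rates of 1% and 0.1%.

```python
def screening_outcome(population, true_cases, sensitivity, false_positive_rate):
    """Expected counts from screening a population for a rare trait.

    Returns (true positives, false positives, share of flags that are genuine).
    """
    innocents = population - true_cases
    true_positives = true_cases * sensitivity
    false_positives = innocents * false_positive_rate
    precision = true_positives / (true_positives + false_positives)
    return true_positives, false_positives, precision


# Illustrative assumptions only: 1m people screened, 100 genuine threats,
# and a detector that catches 99% of them.
POPULATION, THREATS, SENSITIVITY = 1_000_000, 100, 0.99

for fpr in (0.01, 0.001):
    tp, fp, precision = screening_outcome(POPULATION, THREATS, SENSITIVITY, fpr)
    print(
        f"false-positive rate {fpr:.1%}: {tp:,.0f} real threats flagged, "
        f"{fp:,.0f} innocents flagged, {precision:.1%} of flags genuine"
    )
```

With these assumed inputs, a 1% false-positive rate flags about 10,000 innocent people alongside 99 genuine threats, so only about 1% of flags are real. Cutting the false-positive rate tenfold still leaves roughly nine in ten flags pointing at innocents.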
And those data that do exist may not be suitable. Data from drone cameras, reconnaissance satellites and intercepted phone calls, for instance, are not currently formatted or labelled in ways that are useful for machine learning. Fixing that is a "tedious, time-consuming, and still primarily human task exacerbated by differing labelling standards across and even within agencies", notes the CSIS report. That may not be quite the sort of work that would-be spies signed up for.
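Much of that tedium lies in reconciling labels, because the same object may be tagged differently by different agencies, or inconsistently within one. The sketch below is a hypothetical illustration only. The agency names, the label vocabularies and the `harmonise` function are all invented, since real agencies' labelling schemes are not public. It maps each (source, label) pair onto a shared vocabulary and sends anything it cannot map to a human reviewer, which shows why the work keeps falling back on people.

```python
# Hypothetical per-source vocabularies; real agencies' schemas are not public.
LABEL_MAP = {
    ("agency_a", "MBT"): "tank",
    ("agency_a", "APC"): "armoured_personnel_carrier",
    ("agency_b", "tank"): "tank",
    ("agency_b", "personnel carrier"): "armoured_personnel_carrier",
}


def harmonise(records):
    """Split (source, raw_label, item_id) records into mapped and needs-review lists."""
    mapped, needs_review = [], []
    for source, raw_label, item_id in records:
        key = (source, raw_label.strip())  # tolerate stray whitespace only
        if key in LABEL_MAP:
            mapped.append((item_id, LABEL_MAP[key]))
        else:
            needs_review.append((source, raw_label, item_id))  # a person must decide
    return mapped, needs_review


records = [
    ("agency_a", "MBT", "img-001"),
    ("agency_b", "tank", "img-002"),
    ("agency_b", "Tank ", "img-003"),  # inconsistent casing within one source
    ("agency_a", "T-72?", "img-004"),  # free-text label with no agreed mapping
]
mapped, needs_review = harmonise(records)
print("mapped:", mapped)
print("needs review:", [item_id for _, _, item_id in needs_review])
```

Here even a capital letter sends a record to review. A real pipeline might normalise case automatically, but each rule of that kind is itself a human judgment that someone has to agree on across agencies.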