The Economist: Molecular biology, the mysteries of proteins (part 3)
One common protein-contact prediction is that, if the side chain of one member of a pair of amino acids brought close together by folding is long, then that of the other member will be short, and vice versa.
In other words, the sum of the two lengths is constant.
If you have but a single protein sequence available, knowing this is not much use.
Recent developments in genomics, however, mean that the DNA sequences of lots of different species are now available.
Since DNA encodes the amino-acid sequences of an organism’s proteins, the composition of those species’ proteins is now known, too.
That means slightly different versions, from related species, of what is essentially the same protein can be compared.
The latest version of Rosetta does so, looking for co-variation (e.g., in this case, two places along the length of the proteins’ chains where a shortening of the amino acid’s side chain at one place is always accompanied by a lengthening of the side chain at the other).
In this way, it can identify parts of the folded structure that are close together.
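The co-variation search described above can be illustrated with a toy sketch. It is not Rosetta's actual algorithm: it simply takes aligned versions of one protein from related species, converts each amino acid to a rough side-chain size (here the number of non-hydrogen side-chain atoms, an assumed proxy for "length"), and flags pairs of positions whose sizes are strongly anti-correlated. The function names, the threshold of -0.9, the minimum separation of three positions and the five toy sequences are all assumptions made for illustration.

```python
from itertools import combinations
from statistics import mean, pstdev

# Approximate side-chain size: number of non-hydrogen atoms in each side chain.
# Used here only as a crude stand-in for side-chain "length".
SIDE_CHAIN_SIZE = {
    "G": 0, "A": 1, "S": 2, "C": 2, "V": 3, "T": 3, "P": 3,
    "D": 4, "N": 4, "I": 4, "L": 4, "M": 4, "E": 5, "Q": 5, "K": 5,
    "H": 6, "F": 7, "R": 7, "Y": 8, "W": 10,
}


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists; 0.0 if either is constant."""
    sx, sy = pstdev(xs), pstdev(ys)
    if sx == 0 or sy == 0:
        return 0.0  # a position that never varies carries no co-variation signal
    mx, my = mean(xs), mean(ys)
    cov = mean((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / (sx * sy)


def compensating_pairs(alignment, threshold=-0.9, min_separation=3):
    """Return (i, j, r) for position pairs whose side-chain sizes are anti-correlated.

    `alignment` is a list of equal-length, gap-free sequences (hypothetical input).
    """
    length = len(alignment[0])
    # One list of side-chain sizes per alignment column.
    columns = [[SIDE_CHAIN_SIZE[seq[k]] for seq in alignment] for k in range(length)]
    hits = []
    for i, j in combinations(range(length), 2):
        if j - i < min_separation:
            continue  # neighbours along the chain are close anyway
        r = pearson(columns[i], columns[j])
        if r <= threshold:
            hits.append((i, j, round(r, 3)))
    return hits


# Toy alignment: the "same" protein from five imaginary species.
# Positions 1 and 6 vary so that their side-chain sizes always sum to 10.
toy_alignment = [
    "MGLKTAWE",
    "MSLKTAYE",
    "MVLKTAFE",
    "MDLKTAHE",
    "MELKTAKE",
]

hits = compensating_pairs(toy_alignment)
print(hits)
```

In the toy alignment only positions 1 and 6 (counting from zero) are flagged, because whenever the side chain at one shrinks the other grows by the same amount. With a single sequence every column is constant and nothing can be flagged, which is why the approach depends on having many related sequences.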
Though it is still early days, the method seems to work.
None of the 614 structures Dr Baker modelled most recently has yet been elucidated by crystallography or NMR, but six of the previous 58 have.
In each case the prediction closely matched reality.
Moreover, when used to “hindcast” the shapes of 81 proteins with known structures, the protein-contact-prediction version of Rosetta got them all right.
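One simple way to score a hindcast of this kind, not necessarily the measure Dr Baker's group used, is to check what fraction of predicted contacts are real contacts in the known structure. The sketch below assumes one 3D coordinate per amino acid, treats two amino acids as touching when they lie less than 8 ångströms apart (a cutoff chosen here as an assumption), and ignores pairs fewer than three positions apart along the chain. The coordinates and predicted pairs are invented for illustration.

```python
from math import dist  # Python 3.8+


def true_contacts(coords, cutoff=8.0, min_separation=3):
    """Pairs (i, j) of residues closer than `cutoff` in the known structure."""
    contacts = set()
    for i in range(len(coords)):
        for j in range(i + min_separation, len(coords)):
            if dist(coords[i], coords[j]) < cutoff:
                contacts.add((i, j))
    return contacts


def contact_precision(predicted, coords, cutoff=8.0, min_separation=3):
    """Fraction of predicted contact pairs that are contacts in the known structure."""
    if not predicted:
        return 0.0
    observed = true_contacts(coords, cutoff, min_separation)
    correct = sum(1 for pair in predicted if pair in observed)
    return correct / len(predicted)


# Invented "known structure": eight residues forming a hairpin,
# two strands 5 units apart with 3.8 units between neighbours.
known_coords = [
    (0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0),
    (11.4, 5.0, 0.0), (7.6, 5.0, 0.0), (3.8, 5.0, 0.0), (0.0, 5.0, 0.0),
]

# Invented predictions: three pairs across the hairpin and one wrong pair.
predicted_pairs = [(0, 7), (1, 6), (2, 5), (3, 7)]

precision = contact_precision(predicted_pairs, known_coords)
print(precision)  # 0.75: three of the four predicted pairs are real contacts
```

A score like this only measures the contact predictions themselves; judging whether a full folded model "closely matched reality" would need a comparison of the whole predicted shape with the experimental one.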
There is a limitation, though.
Of the genomes well-enough known to use for this trick, 88,000 belong to bacteria, the most speciose type of life on Earth.
Only 4,000 belong to eukaryotes, the branch of life made of complex cells, which includes plants, fungi and animals.
There are, then, not yet enough relatives of human beings in the mix to look for the co-variation Dr Baker’s method relies on.
Others think they have an answer to that problem.
They are trying to extend protein-contact prediction to look for relationships between more than two amino acids in a chain.
This would reduce the number of related proteins needed to draw structural inferences and might thus bring human proteins within range of the technique.
But to do so, you need a different computational approach.
Those attempting it are testing out the branch of artificial intelligence known as deep learning.
Deep learning employs pieces of software called artificial neural networks to fossick out otherwise-abstruse patterns.
It is the basis of image- and speech-recognition programs, and also of the game-playing programs that have recently beaten human champions at Go and poker.
Jianlin Cheng, of the University of Missouri, in Columbia, who was one of the first to apply deep learning in this way, says such programs should be able to spot correlations between three, four or more amino acids, and thus need fewer related proteins to predict structures.
Jinbo Xu, of the Toyota Technological Institute in Chicago, claims to have achieved this already.
He and his colleagues published their method in PLOS Computational Biology, in January, and it is now being tested.
If the deep-learning approach to protein folding lives up to its promise, the number of known protein structures should multiply rapidly.
More importantly, so should the number that belong to human proteins.
That will be of immediate value to drug makers.
It will also help biologists understand better the fundamental workings of cells, and thus what, at a molecular level, it truly means to be alive.