Optimizing the weighting factors for a linear combination of terms in a scoring function is a crucial step in developing a successful threading algorithm. Usually, weighting factors are optimized to yield the highest success rate on a training dataset, and the resulting constant values are then used for any target sequence. Here we explore completely different approaches to handling weighting factors in a threading scoring function. Throughout this study we use a model system of gapless threading with a scoring function in which two terms, a main-chain angle potential and a residue contact potential, are combined by a weighting factor. First, we demonstrate that the optimal weighting factor for recognizing the native structure differs from target sequence to target sequence. Then, we present three novel threading methods that circumvent training dataset-based optimization of the weighting factor. The basic idea of the three methods is to employ a range of weighting factor values and then select a template structure for a target sequence by examining characteristics of the distribution of scores computed with those different values. Interestingly, the success rate of our approaches is comparable to that of the conventional threading method, in which the weighting factor is optimized on a training dataset. Moreover, when the training set available for the conventional threading method is small, our approach often performs better. In addition, we use an artificial neural network to predict a target-specific optimal weighting factor from features of the target sequence. Finally, we show that our novel methods can be used to assess the confidence of a prediction made by conventional threading with an optimized constant weighting factor, by considering the consensus between the two predictions. Implications for the underlying energy landscape of protein folding are discussed.
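The idea of scoring templates under several weighting factor values, rather than one trained constant, can be sketched as follows. This is only an illustration under stated assumptions, not the method of the study: the combined score is assumed to take the form angle + w * contact with lower scores being better, and the selection rule (take the template whose best score stands out most as a Z-score within its score distribution) is a hypothetical stand-in for the distribution characteristics examined in the work. All function names are invented for this example.

```python
import numpy as np


def combined_scores(angle, contact, w):
    """Combine the two terms for every template.

    ASSUMPTION: total = angle + w * contact, lower is better.
    The study does not state the exact functional form here.
    """
    angle = np.asarray(angle, dtype=float)
    contact = np.asarray(contact, dtype=float)
    return angle + w * contact


def scan_weights(angle, contact, weights):
    """For each weight, report (weight, best template index, Z-score of best)."""
    results = []
    for w in weights:
        total = combined_scores(angle, contact, w)
        best = int(np.argmin(total))
        std = total.std()
        # Z-score of the best template within the score distribution;
        # defined as 0.0 when all templates score identically.
        z = (total[best] - total.mean()) / std if std > 0 else 0.0
        results.append((float(w), best, float(z)))
    return results


def pick_template(angle, contact, weights):
    """HYPOTHETICAL rule: choose the template with the most negative Z-score
    over all scanned weights."""
    return min(scan_weights(angle, contact, weights), key=lambda r: r[2])[1]


if __name__ == "__main__":
    # Toy per-template energies (made-up numbers, three templates).
    angle = [1.0, 0.5, 2.0]
    contact = [0.0, -3.0, 1.0]
    print(pick_template(angle, contact, weights=[0.0, 0.5, 1.0]))
```

No training set is needed to run such a scan, which is the property the abstract emphasizes; the quality of the result depends entirely on the chosen distribution criterion.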
Cite this work
Researchers should cite this work as follows:
- Yifeng, D., Kihara, D. (2013). Threading Without Optimizing Weighting Factor for the Scoring Function Proteins. Purdue University Research Repository. doi:10.4231/D3PK0719W
The Angle Potential dataset file contains 40 lines, two for each of the 20 amino acid types. Each odd-numbered line gives the amino acid type, and the even-numbered line that follows stores the angle potentials for that amino acid type. Angle potentials are sampled on a 36*36*36 grid over the (tau, theta, tau) dimensions. An angle triad (tau1, theta1, tau2) corresponds to the kth number on the line, where k = 36*36*int(theta1/5) + 36*int(tau2/10) + int(tau1/10).
The Contact Potential dataset file contains 210 lines. Each line stores the potential for one specific amino acid pair.
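The indexing rule for the Angle Potential file can be sketched in Python as below. Several points are assumptions, not statements from the dataset description: angles are taken to be in degrees with tau in [0, 360) and theta in [0, 180), k is treated as a 0-based position on the line, and the values on each even-numbered line are taken to be whitespace-separated. The function names are hypothetical.

```python
def angle_potential_index(tau1, theta1, tau2):
    """Return k for an angle triad, following
    k = 36*36*int(theta1/5) + 36*int(tau2/10) + int(tau1/10).

    ASSUMPTIONS: degrees, tau in [0, 360), theta in [0, 180), 0-based k.
    """
    if not (0 <= tau1 < 360 and 0 <= tau2 < 360):
        raise ValueError("tau angles are assumed to lie in [0, 360)")
    if not (0 <= theta1 < 180):
        raise ValueError("theta is assumed to lie in [0, 180)")
    return 36 * 36 * int(theta1 / 5) + 36 * int(tau2 / 10) + int(tau1 / 10)


def parse_angle_potentials(text):
    """Parse the file content into {amino_acid_type: [potentials...]}.

    ASSUMPTION: odd-numbered lines hold the type label and even-numbered
    lines hold whitespace-separated numbers; blank lines are skipped.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    table = {}
    for name_line, value_line in zip(lines[0::2], lines[1::2]):
        table[name_line.strip()] = [float(v) for v in value_line.split()]
    return table


def lookup(table, amino_acid, tau1, theta1, tau2):
    """Fetch the potential for one amino acid type and one angle triad."""
    return table[amino_acid][angle_potential_index(tau1, theta1, tau2)]
```

With these assumptions, each even-numbered line would be expected to hold 36*36*36 = 46656 values, so checking the length of each parsed list is a quick way to confirm that the file was read as intended.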