

InterviewSolution
This section includes InterviewSolutions, each offering curated multiple-choice questions to sharpen your knowledge and support exam preparation. Choose a topic below to get started.
1. |
Which of the following is untrue about SCOP?(a) It is a database for comparing and classifying protein structures(b) It is constructed almost entirely based on manual examination of protein structures(c) The proteins are grouped into hierarchies of classes, folds, superfamilies, and families(d) The SCOP families consist of proteins having low sequence identity (>30%) |
Answer» The correct answer is (d) The SCOP families consist of proteins having low sequence identity (>30%) To explain I would say: The SCOP families consist of proteins having high sequence identity (>30%). Thus, the proteins within a family clearly share close evolutionary relationships and normally have the same functionality. The protein structures at this level are also extremely similar. |
|
2. |
Which of the following statements about SCOP is incorrect regarding its features?(a) Proteins with the same shapes but having little sequence or functional similarity are placed in different super families, and are assumed to have only a very distant common ancestor(b) Proteins having the same shape and some similarity of sequence and/or function are placed in ‘families’, and are assumed to have a closer common ancestor(c) SCOP was created in 1994 in the Centre of Protein Engineering and the University College London(d) It aims to determine the evolutionary relationship between proteins |
Answer» The correct answer is (c) SCOP was created in 1994 in the Centre of Protein Engineering and the University College London The best explanation: SCOP, Structural Classification of Proteins, was created in 1994 in the Centre of Protein Engineering and the Laboratory of Molecular Biology. It was maintained by Alexey G. Murzin and his colleagues in the Centre for Protein Engineering until its closure in 2010 and subsequently at the Laboratory of Molecular Biology in Cambridge, England. |
|
3. |
While analysing motif sequences, what is the major disadvantageous feature of PROSITE?(a) The database constructs profiles to complement some of the sequence patterns(b) The functional information of these patterns is primarily based on published literature(c) Some of the sequence patterns are too short to be specific(d) Lack of specificity about probability and variation and relation between them |
Answer» The correct answer is (c) Some of the sequence patterns are too short to be specific The best explanation: The major pitfall with the PROSITE patterns is that some of the sequence patterns are too short to be specific. Rest of the options are advantages. The problem with these short sequence patterns is that the resulting match is very likely to be a result of random events. Overall, PROSITE has a greater than 20% error rate. Thus, either a match or non-match in PROSITE should be treated with caution. |
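To illustrate why short patterns are unspecific, the sketch below translates a PROSITE-style pattern into a Python regular expression and scans a random sequence for chance matches. The N-glycosylation pattern N-{P}-[ST]-{P} is used only as an example of a short pattern; the helper `prosite_to_regex`, the uniform residue frequencies, and the random sequence are assumptions made for illustration, not part of PROSITE itself.

```python
import random
import re

def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE-style pattern into a Python regex.

    Supports only the basic syntax: x (any residue), [..] (allowed set),
    {..} (forbidden set) and (n) or (n,m) repeat counts.
    """
    parts = []
    for element in pattern.split("-"):
        element = element.replace("{", "[^").replace("}", "]")
        element = element.replace("(", "{").replace(")", "}")
        parts.append(element.replace("x", "."))
    return "".join(parts)

regex = prosite_to_regex("N-{P}-[ST]-{P}")  # gives "N[^P][ST][^P]"

# Scan a random (unrelated) protein-like sequence of 1000 residues.
random.seed(0)
amino_acids = "ACDEFGHIKLMNPQRSTVWY"
random_seq = "".join(random.choice(amino_acids) for _ in range(1000))
chance_hits = [m.start() for m in re.finditer(regex, random_seq)]

# Expected chance matches per position, assuming uniform residue frequencies.
expected_rate = (1 / 20) * (19 / 20) * (2 / 20) * (19 / 20)
print(regex, len(chance_hits), round(expected_rate * 1000, 1))
```

Under the uniform-frequency assumption, a four-position pattern like this is expected to match a handful of times per 1000 random residues, which is why a match on its own should be treated with caution.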
|
4. |
Which of the following about the Gibbs sampler is untrue?(a) It is a statistical method for finding motifs in sequences(b) It is dissimilar to the principle of the EM method(c) It searches for the statistically most probable motifs(d) It can find the optimal width and number of given motifs in each sequence |
Answer» The correct answer is (b) It is dissimilar to the principle of the EM method Explanation: Another statistical method for finding motifs in sequences is the Gibbs sampler. The method is similar in principle to the EM method, but the algorithm is different. A combination of the Gibbs sampler and MOTIF may be used to make blocks at the BLOCKS Web site. |
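A minimal sketch of the Gibbs sampling idea for motif finding is given below. It is a simplified site sampler, not the code of any published program: it assumes exactly one motif of fixed width per sequence, a DNA alphabet, a pseudocount of 1, and a uniform background. The function name `gibbs_motif_search` and all parameters are hypothetical.

```python
import random

def gibbs_motif_search(seqs, width, iterations=200, seed=0, alphabet="ACGT"):
    """Return one sampled motif start position per sequence."""
    rng = random.Random(seed)
    # Start from random motif positions.
    positions = [rng.randrange(len(s) - width + 1) for s in seqs]
    for _ in range(iterations):
        for held_out, seq in enumerate(seqs):
            # Build a column-wise profile from all sequences except one.
            profile = []
            for col in range(width):
                counts = {r: 1.0 for r in alphabet}  # pseudocounts
                for k, (s, p) in enumerate(zip(seqs, positions)):
                    if k != held_out:
                        counts[s[p + col]] += 1
                total = sum(counts.values())
                profile.append({r: c / total for r, c in counts.items()})
            # Weight every possible start in the held-out sequence.
            weights = []
            for start in range(len(seq) - width + 1):
                w = 1.0
                for col in range(width):
                    w *= profile[col][seq[start + col]]
                weights.append(w)
            # Sample the new position in proportion to its weight.
            positions[held_out] = rng.choices(range(len(weights)), weights=weights)[0]
    return positions

example_seqs = ["GGCGTATAATGC", "CCTATAATGGCG", "GCGGCCTATAAT", "TATAATCGCGGC"]
example_positions = gibbs_motif_search(example_seqs, width=6)
```

Because the new position is sampled rather than chosen greedily, the procedure can escape poor starting alignments, which is the main algorithmic difference from EM.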
|
5. |
In which of the following multipurpose packages is the Gibbs sampling algorithm used?(a) Consensus(b) BEST(c) AlignACE(d) PhyloCon |
Answer» Correct choice is (c) AlignACE The best explanation: The Gibbs sampling algorithm can identify multiple motifs in a sequence set using an iterative masking procedure. It is used in AlignACE, whereas BEST is a suite of four motif discovery tools integrated in a graphical user interface. The Consensus program finds motifs in a set of unaligned sequences, and PhyloCon builds on this framework by modeling conservation across orthologous genes from multiple species. |
|
6. |
If a good sampling of sequences is _______ the number of sequences is _________ and the motif structure is ________ it should, in principle, be possible to obtain frequencies highly representative of the same motif in other sequences also.(a) available, sufficiently large, not too complex(b) unavailable, sufficiently large, not too complex(c) unavailable, sufficiently small, not too complex(d) available, sufficiently large, too complex |
Answer» The correct option is (a) available, sufficiently large, not too complex To elaborate: The more abundant amino acids already found in that column would probably be favored. Thus, if a good sampling of sequences is available, the number of sequences is sufficiently large, and the motif structure is not too complex, it should, in principle, be possible to obtain frequencies highly representative of the same motif in other sequences also (Henikoff and Henikoff 1996). |
|
7. |
Which of the following is incorrect about evolution?(a) The macromolecules can be considered molecular fossils that encode the history of millions of years of evolution(b) The building blocks of these biological macromolecules, nucleotide bases, and amino acids form linear sequences that determine the primary structure of the molecules(c) DNA and proteins are products of evolution(d) The molecular sequences barely undergo changes |
Answer» The correct choice is (d) The molecular sequences barely undergo changes Easiest explanation: During this time period, the molecular sequences undergo random changes, some of which are selected during the process of evolution. As the selected sequences gradually accumulate mutations and diverge over time, traces of evolution may still remain in certain portions of the sequences to allow identification of the common ancestry. |
|
8. |
For what type of sequences is Gibbs sampling used?(a) Closely related sequences(b) Distantly related sequences(c) Distantly related sequences that share common motifs(d) Closely related sequences that share common motifs |
Answer» The correct option is (c) Distantly related sequences that share common motifs For explanation I would say: Often, distantly related sequences that share common motifs cannot be readily aligned. For example, the sequences for the helix-turn-helix motif in transcription factors can be subtly different enough that traditional multiple sequence alignment approaches fail to generate a satisfactory answer. For detecting such subtle motifs, more sophisticated algorithms such as expectation maximization (EM) and Gibbs sampling are used. |
|
9. |
In the web-based program MEME, the computation is a _____ step procedure.(a) one(b) two(c) three(d) four |
Answer» The correct option is (b) two For explanation I would say: In constructing a probability matrix, it allows multiple starting alignments and does not assume that there are motifs in every sequence. Also, the computation is a two-step procedure that includes generation of sequence motif and finding highest score. |
|
10. |
Gibbs is a web-based program that uses the Gibbs sampling approach to look for _____ gap-free segments for either DNA or protein sequences.(a) short, partially conserved(b) long, partially conserved(c) long, conserved(d) short, not conserved |
Answer» Right choice is (a) short, partially conserved To explain I would say: Gibbs uses the Gibbs sampling approach to look for short, partially conserved gap-free segments in either DNA or protein sequences. To ensure accuracy, more than twenty sequences of the exact same length should be used. |
|
11. |
One type of probability analysis is to calculate the odds score for one event OR a second event, or for a series of alternative events. In this case, the odds scores are _______(a) multiplied(b) subtracted(c) added and multiplied(d) added |
Answer» Right option is (d) added To explain: An example is the calculation of the odds score for a given sequence alignment using a series of alternative PAM scoring matrices. The alignment scores are calculated in log odds units and then converted into odds scores. |
|
12. |
If the purpose is to calculate the probability of one event AND a second event, the odds scores for the events are _________(a) added(b) multiplied(c) multiplied and added(d) subtracted |
Answer» Right choice is (b) multiplied To elaborate: An example is the calculation of the odds of an alignment of two sequences from the alignment scores for each of the matched pairs of bases or amino acids in the alignment. The odds scores for the pairs are multiplied. Usually, the log odds score for the first pair is added to that for the second, etc., until the scores for every pair have been added. |
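The AND and OR rules from the last two questions can be sketched numerically. The odds values below are made-up numbers chosen for easy arithmetic, and base-2 logarithms (bits) are an assumption; real scoring matrices may use other units.

```python
import math

# Hypothetical odds scores for three matched pairs in an ungapped alignment.
pair_odds = [4.0, 0.5, 2.0]

# AND: the odds of the whole alignment are the product of the per-pair odds.
alignment_odds = math.prod(pair_odds)

# Equivalent shortcut: add the log odds scores instead of multiplying odds.
pair_log_odds = [math.log2(o) for o in pair_odds]   # [2.0, -1.0, 1.0]
alignment_log_odds = sum(pair_log_odds)

# OR: scores of the same alignment under alternative scoring matrices
# (given here as hypothetical log odds) are converted to odds, then added.
alternative_log_odds = [3.0, 1.0]
combined_odds = sum(2 ** s for s in alternative_log_odds)

print(alignment_odds, alignment_log_odds, combined_odds)
```

Adding log odds is preferred in practice because it avoids multiplying many small numbers together.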
|
13. |
Which of the following features of Bayesian methods is a disadvantage?(a) A length and distance that gives the highest overall probability may be determined(b) They are used to calculate evolutionary distance(c) Computationally Bayesian methods are better(d) A specific mutational model is required |
Answer» The correct choice is (d) A specific mutational model is required To elaborate: One disadvantage of the Bayesian approach is that a specific mutational model is required, whereas other methods, such as the maximum likelihood approach, can be used to estimate the best mutational model as well as the distance. Computationally, however, the Bayesian method is much more practical. |
|
14. |
By whom and when were the Bayesian methods applied first?(a) Smith-Waterman, 1981(b) Agarwal and States, 1996(c) Smith-Waterman, 1996(d) Agarwal and States, 1981 |
Answer» Right choice is (b) Agarwal and States, 1996 The best I can explain: Agarwal and States, in 1996, applied Bayesian methods to provide the best estimate of the evolutionary distance between two DNA sequences, for example, sequences of the same length that have a certain level of mismatches. |
|
15. |
Which of the following is not a base of RNA?(a) Thymine (T)(b) Adenine (A)(c) Cytosine (C)(d) Guanine (G) |
Answer» Right option is (a) Thymine (T) To explain I would say: RNA structures can be described at three levels as in proteins: primary, secondary, and tertiary. The primary structure is the linear sequence of RNA, consisting of four bases, adenine (A), cytosine (C), guanine (G), and uracil (U). |
|
16. |
The pre-alignment independent programs fare __________ for predicting long sequences.(a) slightly better(b) much better(c) a bit worse(d) much worse |
Answer» Correct answer is (d) much worse To explain I would say: For small RNA sequences such as tRNA, both subtypes can achieve very high accuracy (up to 100%). This illustrates that the comparative approach is consistently more accurate than the ab initio one. |
|
17. |
The number of possible global alignments between two sequences of length N is _____(a) \(\frac{2^N}{\sqrt{\pi N}}\)(b) \(\frac{2^{2N}}{\sqrt{\pi N}}\)(c) \(\frac{2^{(N-1)}}{\sqrt{\pi N}}\)(d) \(\frac{2^{2N}}{\sqrt{N}}\) |
Answer» Right choice is (b) \(\frac{2^{2N}}{\sqrt{\pi N}}\) For explanation I would say: Counting all possible arrangements of residues and gaps, option (b) gives the approximate number of possible global alignments between two sequences of length N. For two sequences of 250 residues this is about \(10^{149}\). |
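The figure quoted for 250 residues can be checked with a few lines of Python. The sketch works in log10 space to keep the numbers manageable; the function name is hypothetical.

```python
import math

def log10_global_alignments(n: int) -> float:
    """log10 of 2^(2n) / sqrt(pi * n), the approximate alignment count."""
    return 2 * n * math.log10(2) - 0.5 * math.log10(math.pi * n)

exponent = log10_global_alignments(250)
print(f"about 10^{exponent:.1f} alignments for N = 250")
```

The exponent comes out close to 149, in agreement with the value in the explanation, and shows why exhaustive enumeration of alignments is not feasible.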
|
18. |
In CATH, structural domain separation is carried out by ___________(a) manual comparison only(b) computer programs only(c) human expertise only(d) a combined effort of a human expert and computer programs |
Answer» Correct choice is (d) a combined effort of a human expert and computer programs Explanation: CATH classifies proteins based on the automatic structural alignment program SSAP as well as manual comparison. Structural domain separation is carried out also as a combined effort of a human expert and computer programs. Individual domain structures are classified at five major levels: class, architecture, fold/topology, homologous superfamily, and homologous family. |
|
19. |
___________ of covariation can be ___________ to the RNA structure and functions.(a) Any lack, deleterious(b) Any lack, benign(c) Any abundance, deleterious(d) Any inadequacy, advantageous |
Answer» Correct answer is (a) Any lack, deleterious For explanation I would say: Based on this rule, algorithms can be written to search for the covariation patterns after a set of homologous RNA sequences are properly aligned. The detected correlated substitutions help to determine conserved base pairing in a secondary structure. |
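One common way to search for covariation patterns in an alignment is to compute the mutual information between pairs of columns. The sketch below assumes this measure (the explanation above does not name a specific one) and uses made-up alignment columns; `mutual_information` is a hypothetical helper.

```python
import math
from collections import Counter

def mutual_information(col_i: str, col_j: str) -> float:
    """Mutual information (in bits) between two alignment columns."""
    n = len(col_i)
    freq_i = Counter(col_i)
    freq_j = Counter(col_j)
    freq_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), count in freq_ij.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((freq_i[a] / n) * (freq_j[b] / n)))
    return mi

# Each string is one column read down eight aligned RNA sequences.
paired_left = "GGCCAAUU"
paired_right = "CCGGUUAA"   # always complementary to paired_left
conserved = "GGGGGGGG"      # invariant column

print(mutual_information(paired_left, paired_right))  # high: columns covary
print(mutual_information(conserved, paired_right))    # zero: no covariation
```

Columns with high mutual information are candidates for conserved base pairs; an invariant column carries no covariation signal even if it is paired.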
|
20. |
Which of the following is untrue regarding Ab Initio–Based Methods?(a) This type of method predicts the secondary structure based on a single query sequence(b) This type of method predicts the secondary structure based on a multiple query sequence(c) It measures the relative propensity of each amino acid belonging to a certain secondary structure element(d) The propensity scores are derived from known crystal structures |
Answer» Right option is (b) This type of method predicts the secondary structure based on a multiple query sequence Best explanation: Examples of ab initio prediction are the Chou–Fasman and Garnier, Osguthorpe, Robson (GOR) methods. The ab initio methods were developed in the 1970s when protein structural data were very limited. The statistics derived from the limited data sets can therefore be rather inaccurate. However, the methods are simple enough that they are often used to illustrate the basics of secondary structure prediction. |
|
21. |
RNA structures can be experimentally determined using _____(a) X-ray crystallography techniques only(b) NMR techniques only(c) X-ray crystallography or NMR techniques(d) Gel electrophoresis |
Answer» The correct answer is (c) X-ray crystallography or NMR techniques Explanation: However, the two approaches are extremely time consuming and expensive. As a result, computational prediction has become an attractive alternative. Option Gel electrophoresis, here, becomes irrelevant as it comes to the structure of RNA. |
|
22. |
If prediction accuracy can be represented using a ______ the ______ programs score roughly 20% to 60% depending on the length of the sequences.(a) multiple parameter, ab initio–based(b) single parameter, ab initio–based(c) multiple parameter, comparative–based(d) single parameter, comparative–based |
Answer» Right answer is (b) single parameter, ab initio–based Easiest explanation: As mentioned, the scores depend on the length of the sequences. Generally speaking, the programs perform better for shorter RNA sequences than for longer ones. |
|
23. |
Which of the following is untrue about Ab initio prediction?(a) The limited knowledge of protein folding forms the basis of ab initio prediction(b) The ab initio prediction method attempts to produce all-atom protein models based on sequence information alone without the aid of known protein structures(c) The ab initio prediction method attempts to produce all-atom protein models based on sequence information alone with some aid of known protein structures(d) The perceived advantage of this method is that predictions are not restricted by known folds and that novel protein folds can be identified |
Answer» The correct choice is (c) The ab initio prediction method attempts to produce all-atom protein models based on sequence information alone with some aid of known protein structures Easy explanation: Alongside the advantages, because the physicochemical laws governing protein folding are not yet well understood, the energy functions used in the ab initio prediction are, at present, rather inaccurate. The folding problem remains one of the greatest challenges in bioinformatics today. |
|
24. |
In dynamic programming, in ab initio methods, the use of a dot plot can be effective in finding a ____ secondary structure in a ________ molecule.(a) multiple, large(b) single, large(c) single, small(d) multiple, small |
Answer» The correct answer is (c) single, small The explanation: The use of a dot plot can be effective in finding a single secondary structure in a small molecule. However, if a large molecule contains multiple secondary structure segments, choosing the combination that is energetically most stable among a large number of possibilities can be a daunting task. |
|
25. |
The presence of ______ signal peptides can significantly compromise the prediction _______ because the programs tend to confuse hydrophobic signal peptides with membrane helices.(a) hydrophobic, accuracy(b) hydrophobic, error(c) hydrophilic, accuracy(d) hydrophilic, error |
Answer» Correct option is (a) hydrophobic, accuracy Explanation: Predicting transmembrane helices is relatively easy. The accuracy of some of the best prediction programs, such as TMHMM or HMMTOP, can exceed 70%. To minimize errors, the presence of signal peptides can be detected using a number of specialized programs and then manually excluded. |
|
26. |
Because residues in the same aligned position are assumed to have the same secondary structure, any inconsistencies or errors in prediction of individual sequences can be corrected using a majority rule.(a) True(b) False |
Answer» The correct answer is (a) True Explanation: By aligning multiple sequences, information of positional conservation is revealed. This homology based method has helped improve the prediction accuracy by another 10% over the second-generation methods. |
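The majority rule can be sketched as a per-column vote over the predictions for aligned sequences. The three prediction strings are invented for illustration (H = helix, E = strand, C = coil, "-" = gap), and `consensus_structure` is a hypothetical helper rather than the method of any particular program.

```python
from collections import Counter

def consensus_structure(predictions):
    """Majority vote per aligned column; gap characters do not vote."""
    consensus = []
    for column in zip(*predictions):
        votes = Counter(state for state in column if state != "-")
        # On a tie, most_common keeps the state that was seen first.
        consensus.append(votes.most_common(1)[0][0] if votes else "-")
    return "".join(consensus)

aligned_predictions = [
    "HHHHCCEEE",
    "HHHCCCEEE",
    "HHHHCEEEE",
]
print(consensus_structure(aligned_predictions))
```

Here the second sequence's early coil call and the third sequence's early strand call are each outvoted, so the consensus is HHHHCCEEE.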
|
27. |
The secondary structure prediction methods can be either ab initio based, which make use of single sequence information only, or homology based, which make use of multiple sequence alignment information.(a) True(b) False |
Answer» The correct choice is (a) True To explain I would say: The ab initio methods, which belong to early generation methods, predict secondary structures based on statistical calculations of the residues of a single query sequence. The homology-based methods do not rely on statistics of residues of a single sequence, but on common secondary structural patterns conserved among multiple homologous sequences. |
|
28. |
Which of the following is incorrect regarding sequence homology?(a) Two sequences can have a homologous relationship even if they do not have a common origin(b) It is an important concept in sequence analysis(c) When two sequences are descended from a common evolutionary origin, they are said to have a homologous relationship(d) When two sequences are descended from a common evolutionary origin, they are said to share homology |
Answer» Right choice is (a) Two sequences can have a homologous relationship even if they do not have a common origin For explanation I would say: Sequences are homologous only when they share a common evolutionary origin. A related but different term is sequence similarity, which is the percentage of aligned residues that are similar in physicochemical properties such as size, charge, and hydrophobicity. |
|
29. |
Which of the following doesn’t describe PAM matrices?(a) This family of matrices lists the likelihood of change from one amino acid to another in homologous protein sequences during evolution(b) There is presently no other type of scoring matrix that is based on such sound evolutionary principles as are these matrices(c) Even though they were originally based on a relatively small data set, the PAM matrices remain a useful tool for sequence alignment(d) It stands for Percent Altered Mutation |
Answer» Right option is (d) It stands for Percent Altered Mutation Easy explanation: PAM stands for Percent Accepted Mutation. In this, each matrix gives the changes expected for a given period of evolutionary time, evidenced by decreased sequence similarity as genes encoding the same protein diverge with increased evolutionary time. |
|
30. |
Which of the following statements about the SUPERFAMILY database is incorrect regarding its features?(a) Sequences can be submitted in raw or FASTA format(b) Sequences must be submitted in FASTA format only(c) It searches the database using a superfamily, family, or species name plus a sequence, SCOP, PDB or HMM IDs(d) It has generated GO annotations for evolutionarily close domains and distant domains |
Answer» Correct option is (b) Sequences must be submitted in FASTA format only Best explanation: SUPERFAMILY is a database of structural and functional annotation for all proteins and genomes. It classifies amino acid sequences into known structural domains, especially into SCOP super families. Sequences can be amino acids, a fixed frame nucleotide sequence, or all frames of a submitted nucleotide sequence. Up to 1000 sequences can be run at a time. |
|
31. |
Which of the following is untrue about protein substitution matrices?(a) They are significantly more complex than DNA scoring matrices(b) They are N x N matrices of the amino acids(c) Protein substitution matrices have quite an important role in evolutionary studies(d) They are significantly less complex than DNA scoring matrices |
Answer» The correct answer is (d) They are significantly less complex than DNA scoring matrices For explanation I would say: Protein substitution matrices are significantly more complex than DNA scoring matrices. Proteins are composed of twenty amino acids, and the physicochemical properties of individual amino acids vary considerably. A protein substitution matrix can be based on any property of amino acids: size, polarity, charge, hydrophobicity. |
|
32. |
Which of the following does not describe PAM matrices?(a) These matrices are used in optimal alignment scoring(b) It stands for Point Altered Mutations(c) It stands for Point Accepted Mutations(d) It was first developed by Margaret Dayhoff |
Answer» Correct option is (b) It stands for Point Altered Mutations To explain I would say: PAM stands for Point Accepted Mutations. PAM matrices are calculated by observing the differences in closely related proteins. One PAM unit (PAM1) specifies one accepted point mutation per 100 amino acid residues, i.e. 1% change and 99% remains as such. |
|
33. |
Which of the following is wrong in case of substitution matrices?(a) They determine likelihood of homology between two sequences(b) They use system where substitutions that are more likely should get a higher score(c) They use system where substitutions that are less likely should get a lower score(d) BLOSUM-X type uses logarithmic identity to find similarity |
Answer» The correct choice is (d) BLOSUM-X type uses logarithmic identity to find similarity The best explanation: In BLOSUM-X, the X is a percent identity threshold, not a logarithmic identity: the matrix is derived from blocks of aligned sequences in which sequences more than X% identical are clustered together, so BLOSUM62 is built from sequences clustered at 62% identity. These matrices are popular in bioinformatics due to their speed and accuracy. |
|
34. |
Which of the following is false in case of the database SMART and its algorithm?(a) Contains HMM profiles constructed from manually refined protein domain alignments(b) Alignments in the database are built based on tertiary structures whenever available or based on PSI-BLAST profiles(c) Alignments are further checked but not refined by human annotators before HMM profile construction(d) SMART stands for Simple Modular Architecture Research Tool |
Answer» Correct choice is (c) Alignments are further checked but not refined by human annotators before HMM profile construction To elaborate: Alignments are further checked and refined by human annotators before HMM profile construction. Protein functions are also manually curated. Thus, the database may be of better quality than Pfam, with more extensive functional annotations. Compared to Pfam, the SMART database contains an independent collection of HMMs, with emphasis on signaling, extracellular, and chromatin-associated motifs and domains. Sequence searching in this database produces a graphical output of domains with well-annotated information with respect to cellular localization, functional sites, superfamily, and tertiary structure. |
|
35. |
Which of the following is not an advantage of Statistical models’ methods in analyzing protein motifs?(a) Sequence information is preserved from a multiple sequence alignment and expresses it with probabilistic models(b) Statistical models allow partial matches and compensate for unobserved sequence patterns using pseudo-counts(c) Statistical models have stronger predictive power than the regular expression based approach, even when they are derived from a limited set of sequences(d) The comparative flexibility is less in case of these methods when compared to regular expressions methods |
Answer» Right choice is (d) The comparative flexibility is less in case of these methods when compared to regular expressions methods The best explanation: The major limitation of regular expressions is that this method does not take into account sequence probability information about the multiple alignment from which it is modeled making them less flexible. If a regular expression is derived from an incomplete sequence set, it has less predictive power because many more sequences with the same type of motifs are not represented. Unlike regular expressions, position-specific scoring matrices (PSSMs), profiles, and HMMs preserve the sequence information from a multiple sequence alignment and express it with probabilistic models. |
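The role of pseudocounts can be shown with a small position-specific scoring matrix. This is a sketch under stated assumptions: a DNA alphabet is used to keep the example short, the aligned motif instances are invented, the background is uniform, and `build_pssm` and `score_segment` are hypothetical helpers.

```python
import math
from collections import Counter

def build_pssm(motifs, alphabet="ACGT", pseudocount=1.0):
    """Log-odds PSSM from equal-length aligned motifs, with pseudocounts."""
    background = 1 / len(alphabet)
    pssm = []
    for column in zip(*motifs):
        counts = Counter(column)
        total = len(column) + pseudocount * len(alphabet)
        pssm.append({
            r: math.log2(((counts[r] + pseudocount) / total) / background)
            for r in alphabet
        })
    return pssm

def score_segment(pssm, segment):
    """Sum the log-odds scores of a segment, one matrix column per residue."""
    return sum(column[residue] for column, residue in zip(pssm, segment))

motifs = ["TATAAT", "TATGAT", "TACAAT", "TATACT"]
pssm = build_pssm(motifs)

print(score_segment(pssm, "TATAAT"))  # strong match
print(score_segment(pssm, "TAAAAT"))  # partial match: unseen A at position 3
print(score_segment(pssm, "GCGCGC"))  # poor match
```

A regular expression built from the same four motifs would reject TAAAAT outright, whereas the pseudocounts give the unobserved residue a finite penalty, so the partial match still receives a graded score.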
|
36. |
Motifs that can form α/β horseshoes conformation are rich with which protein residue?(a) Proline(b) Arginine(c) Valine(d) Leucine |
Answer» Right answer is (d) Leucine To elaborate: With a specific pattern of leucine residues, the strands form a curved sheet with helices on the outside. Leucine-rich repeats (LRRs) are 20-29-residue sequence motifs present in a number of proteins with diverse functions. The primary function of these motifs appears to be to provide a versatile structural framework for the formation of protein-protein interactions. |
|
37. |
Which of the common structural motifs are described wrongly?(a) β-hairpin – adjacent antiparallel strands(b) Greek key – 4 adjacent antiparallel strand(c) β-α-β – 2 parallel strands connected by helix(d) β-α-β – 2 antiparallel strands connected by helix |
Answer» Right choice is (d) β-α-β – 2 antiparallel strands connected by helix To elaborate: In the β-α-β motif, two adjacent parallel β strands are connected by an α helix from the C-terminus of strand 1 to the N-terminus of strand 2. Most protein structures that contain parallel β-sheets are built up from combinations of such β-α-β motifs. |
|
38. |
In terminologies related to regular expressions which of the following is false about terms and operators?(a) Terms are strings or substrings(b) Operators combine terms and expressions(c) Operators do not have precedence(d) Operators have precedence like arithmetic operators |
Answer» Correct choice is (c) Operators do not have precedence Easy explanation: For consistent, error-free functioning of the matching process, operators have precedence, which sets the order in which the operations are carried out during matching. |
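Operator precedence can be seen directly with Python's `re` module, used here as one concrete regular expression dialect. The helper `fully_matches` and the test strings are illustrative assumptions.

```python
import re

def fully_matches(pattern: str, text: str) -> bool:
    """True if the whole text matches the pattern."""
    return re.fullmatch(pattern, text) is not None

# Concatenation binds tighter than alternation:
# "AB|CD" means (AB) or (CD), not A, then B or C, then D.
print(fully_matches(r"AB|CD", "AB"))     # True
print(fully_matches(r"AB|CD", "ABD"))    # False
print(fully_matches(r"A(B|C)D", "ABD"))  # True: parentheses override precedence

# Repetition binds tighter than concatenation:
# "AB+" means A followed by one or more B, not one or more AB.
print(fully_matches(r"AB+", "ABBB"))     # True
print(fully_matches(r"AB+", "ABAB"))     # False
print(fully_matches(r"(AB)+", "ABAB"))   # True
```

As with arithmetic, parentheses are the way to force a grouping that differs from the default precedence.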
|
39. |
Which of the following best defines regular expressions?(a) They are made up of terms, operators and modifiers(b) They describe string or set of strings to find matching patterns(c) They are strictly restricted to alignment and corresponding score(d) They consist of set of rules for the connotations of various amino acid residues |
Answer» The correct option is (b) They describe string or set of strings to find matching patterns For explanation: Regular expressions are a powerful notational algebra that describes a string or set of strings in order to find matching patterns. The outcome of pattern matching is either true or false. It is true that regular expressions are made up of terms, operators and modifiers, but these are components used in the matching process rather than a definition. |
|
40. |
For motif scanning which of the following programs or databases is for regulated sites curated from scientific literature?(a) ENSEMBL(b) ORegAnno(c) MAST(d) Clover |
Answer» Right choice is (b) ORegAnno Best explanation: ORegAnno (Open Regulatory Annotation) is a database of regulatory sites curated from the scientific literature. Clover identifies overrepresented motifs in DNA sequences, whereas MAST allows users to scan different databases for matches to motifs. ENSEMBL is an online genomic sequence repository that also includes online tools for data mining as well as BLAST searches. |
|
41. |
The matrices PAM250 and BLOSUM62 contain _______(a) positive and negative values(b) positive values only(c) negative values only(d) neither positive nor negative values, just the percentage |
Answer» Right choice is (a) positive and negative values For explanation I would say: These matrices contain positive and negative values, reflecting the likelihood of each amino acid substitution in related proteins. Using these tables, an alignment of a sequential set of amino acid pairs with no gaps receives an overall score that is the sum of the positive and negative log odds scores for each individual amino acid pair in the alignment. |
|
42. |
Which of the following are not related to Needleman-Wunsch alignment algorithm?(a) Global alignment programs use this algorithm(b) The output is a positive number(c) Small changes in the scoring system can produce a different alignment(d) Changes in the scoring system can produce the same alignment |
Answer» Right answer is (d) Changes in the scoring system can produce the same alignment To elaborate: In general, global alignment programs use the Needleman-Wunsch alignment algorithm and a scoring system that scores the average match of an aligned nucleotide or amino acid pair as a positive number. Hence, the score of the alignment of random or unrelated sequences grows proportionally to the length of the sequences. In addition, there are many possible different global alignments depending on the scoring system chosen, and small changes in the scoring system can produce a different alignment. |
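The sensitivity of a global alignment to the scoring system can be demonstrated with a compact Needleman-Wunsch sketch. The linear gap penalty, the default scores, and the tie-breaking order in the traceback (diagonal first, then a gap in the second sequence) are assumptions of this example, not properties of any particular program.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Return (score, aligned_a, aligned_b) for a global alignment."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def pair(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill the dynamic programming matrix.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(i, j),  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,             # gap in b
                score[i][j - 1] + gap,             # gap in a
            )

    # Trace back from the bottom-right corner.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))

print(needleman_wunsch("AC", "CA"))          # (-2, 'AC', 'CA')
print(needleman_wunsch("AC", "CA", gap=-1))  # (-1, '-AC', 'CA-')
```

Changing only the gap penalty from -2 to -1 switches the optimal alignment from two mismatches to one match flanked by gaps, which is the behaviour described in the explanation.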
|
43. |
Who were the inventors of the dot-matrix (dot-plot) method?(a) Smith-Waterman(b) Margaret Preston(c) Gibbs and McIntyre(d) Needleman-Wunsch |
Answer» Correct answer is (c) Gibbs and McIntyre The best explanation: The first computer-aided sequence comparison is called “dot-matrix analysis” or simply dot-plot. The first published account of this method is by Gibbs and McIntyre (1970, The diagram, a method for comparing sequences. Eur. J. Biochem. 16: 1-11). |
|
44. |
Which of the following is true for EMBOSS Dottup?(a) Allows you to specify threshold(b) Doesn’t allow you to specify threshold(c) Doesn’t allow you to specify window size(d) If all cells in the window are identity, it colors in some specific cells in the window |
Answer» Correct choice is (b) Doesn’t allow you to specify threshold The explanation: The EMBOSS Dottup doesn’t allow you to specify threshold but allows you to specify window size. Also, if all cells in the window are identity, it colors in all the cells in the window. |
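The window-based colouring described above can be sketched as a simple exact-match dot plot. This is not the EMBOSS implementation; the function names and the parameter `word_size` are assumptions chosen to mirror the behaviour in the explanation, where a window is coloured in full only when every cell in it is an identity.

```python
def dot_plot(seq_a, seq_b, word_size=2):
    """Boolean grid: True where a cell lies inside an identical window."""
    n, m = len(seq_a), len(seq_b)
    grid = [[False] * m for _ in range(n)]
    for i in range(n - word_size + 1):
        for j in range(m - word_size + 1):
            # No threshold: the window must match exactly.
            if seq_a[i:i + word_size] == seq_b[j:j + word_size]:
                for k in range(word_size):
                    grid[i + k][j + k] = True  # colour every cell in the window
    return grid

def render(grid):
    """Text rendering with '*' for coloured cells."""
    return "\n".join("".join("*" if cell else "." for cell in row) for row in grid)

self_plot = dot_plot("ACGT", "ACGT", word_size=2)
print(render(self_plot))
```

Comparing a sequence with itself gives an unbroken main diagonal; a larger window removes more of the isolated chance matches at the cost of missing short similarities.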
|
45. |
Many studies have demonstrated that the distribution of similarity scores assumes a peculiar shape that resembles a highly skewed normal distribution with a long tail on one side. The distribution matches the _______(a) Gumbel elective value distribution(b) Gumbel extreme void distribution(c) Gumbel end value distribution(d) Gumbel extreme value distribution |
Answer» Right answer is (d) Gumbel extreme value distribution To explain I would say: The distribution pattern described matches the Gumbel extreme value distribution, for which a mathematical expression is available. This means that, given a sequence similarity score, the statistical significance can be accurately estimated using the formula for the extreme value distribution. |
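The tail probability of the Gumbel extreme value distribution can be computed in a few lines. The formula used is the standard one, P(S >= x) = 1 - exp(-exp(-lambda (x - mu))); the location `mu` and scale `lam` values below are hypothetical and would normally be fitted to scores from unrelated or shuffled sequences.

```python
import math

def gumbel_p_value(score: float, mu: float, lam: float) -> float:
    """Probability of a chance score at least this high under a Gumbel model."""
    # -expm1(-y) equals 1 - exp(-y) but is more accurate for tiny y.
    return -math.expm1(-math.exp(-lam * (score - mu)))

mu, lam = 30.0, 0.25  # assumed parameters for illustration only
for s in (30, 50, 80):
    print(s, gumbel_p_value(s, mu, lam))
```

A score equal to `mu` is unremarkable (probability about 0.63), while scores far into the long right tail receive very small probabilities.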
|
46. |
Which of the following cannot be related to multiple sequence alignment?(a) Many conserved and functionally critical amino acid residues can be identified in a protein multiple alignment(b) Multiple sequence alignment is also an essential prerequisite to carrying out phylogenetic analysis of sequence families and prediction of protein secondary and tertiary structures(c) Multiple sequence alignment also has applications in designing degenerate polymerase chain reaction (PCR) primers based on multiple related sequences(d) This method does not contribute much to the design of degenerate polymerase chain reaction (PCR) primers |
Answer» Right answer is (d) This method does not contribute much to the design of degenerate polymerase chain reaction (PCR) primers. The explanation: Multiple sequence alignment has applications in designing degenerate PCR primers based on multiple related sequences. In practice, heuristic approaches are most often used to build the alignment. |
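To make the primer application concrete, here is a minimal sketch that collapses each column of a DNA alignment into a standard IUPAC degenerate base code. It assumes a gap-free alignment of equal-length sequences containing only A, C, G, and T; the function name is hypothetical, and real primer design involves further criteria (melting temperature, 3' end stability, and so on) that are not modeled here.

```python
# Standard IUPAC nucleotide codes, keyed by the sorted set of bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "AG": "R", "CT": "Y", "CG": "S", "AT": "W", "GT": "K", "AC": "M",
    "CGT": "B", "AGT": "D", "ACT": "H", "ACG": "V", "ACGT": "N",
}


def degenerate_consensus(aligned):
    """Return (consensus, degeneracy) for a gap-free DNA alignment.

    degeneracy is the number of distinct oligonucleotides the consensus encodes.
    """
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("sequences must be non-empty and of equal length")
    consensus = []
    degeneracy = 1
    for column in zip(*aligned):
        bases = "".join(sorted(set(column)))
        consensus.append(IUPAC[bases])  # KeyError for gaps or non-ACGT symbols
        degeneracy *= len(bases)
    return "".join(consensus), degeneracy


if __name__ == "__main__":
    print(degenerate_consensus(["ACGT", "ACAT", "ATGT"]))
```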
|
47. |
Which of the following is untrue regarding T-Coffee?(a) It stands for Tree-based Consistency Objective Function for alignment Evaluation(b) It performs progressive sequence alignments as in Clustal(c) The global pairwise alignment is not performed using the Clustal program(d) The local pairwise alignment is generated by the Lalign program, from which the ten top-scoring alignments are selected |
Answer» Right choice is (c) The global pairwise alignment is not performed using the Clustal program. The explanation: The global pairwise alignment is performed using the Clustal program. The main difference from Clustal is that, in processing a query, T-Coffee performs both global and local pairwise alignments for all possible sequence pairs. The local and global alignments are pooled to form a library, and the consistency of the alignments in that library is then evaluated. |
|
48. |
Which of the following is a part of the statistical test of sequences?(a) An optimal alignment between two chosen sequences is obtained at the end(b) Unrelated sequences of the same length are then generated through a randomization process(c) Unrelated sequences of different lengths are then generated through a randomization process(d) Related sequences of the same length are then generated through a randomization process |
Answer» Correct option is (b) Unrelated sequences of the same length are then generated through a randomization process. To explain I would say: Unrelated sequences of the same length are generated through a randomization process in which one of the two sequences is randomly shuffled. In the next step, a new alignment score is computed for the shuffled sequence pair. |
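The shuffling procedure can be sketched as below. This is an illustration under stated assumptions, not a reference implementation: `local_score` is a small stand-in local-alignment scorer with made-up scoring values, and `shuffle_test`, `n_shuffles`, and `seed` are hypothetical names. The empirical p-value uses the common add-one correction.

```python
import random
import statistics


def local_score(a, b, match=2, mismatch=-1, gap=-2):
    """Stand-in scorer: best local alignment score with a linear gap penalty."""
    best = 0
    prev = [0] * (len(b) + 1)
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            cell = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


def shuffle_test(a, b, score_fn, n_shuffles=200, seed=0):
    """Score a against shuffled copies of b; return (observed, z, p).

    z is None when all shuffled scores are identical (zero spread).
    """
    rng = random.Random(seed)
    observed = score_fn(a, b)
    letters = list(b)
    null_scores = []
    for _ in range(n_shuffles):
        rng.shuffle(letters)  # same length and composition, order randomized
        null_scores.append(score_fn(a, "".join(letters)))
    mean = statistics.mean(null_scores)
    spread = statistics.pstdev(null_scores)
    z = (observed - mean) / spread if spread > 0 else None
    p = (1 + sum(s >= observed for s in null_scores)) / (n_shuffles + 1)
    return observed, z, p


if __name__ == "__main__":
    print(shuffle_test("ACGTACGTAC", "ACGTACGTAC", local_score))
```

Shuffling preserves length and residue composition, so the null scores reflect what unrelated sequences of the same makeup would achieve by chance.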
|
49. |
Progenitor sequences represented by the ______ branches of the tree are derived by alignment of the _______ sequences.(a) outer, outermost(b) inner, outermost(c) inner, innermost(d) outer, innermost |
Answer» The correct answer is (b) inner, outermost For explanation I would say: Progenitor sequences represented by the inner branches of the tree are derived by alignment of the outermost sequences. These inner branches will have uncertainties where positions in the outermost sequences are dissimilar. |
|
50. |
Which of the following is untrue about DCA?(a) It stands for Divide-and-Conquer Alignment(b) It works by breaking each of the sequences into two smaller sections(c) The breaking points during the process are determined based on regional similarity of the sequences(d) If the sections are not short enough, further divisions are restricted as well |
Answer» Right choice is (d) If the sections are not short enough, further divisions are restricted as well. The best I can explain: DCA is a web-based program that is in fact semi-exhaustive, because certain steps of the computation are reduced to heuristics. If the sections are not short enough, further divisions are carried out. When the lengths of the sequences fall to a predefined threshold, dynamic programming is applied to align each set of subsequences. The resulting short alignments are joined together head to tail to yield a multiple alignment of the entire length of all sequences. |
|