116 + Interview Questions in GENERAL QA IN BIOINFORMATICS

1.	Which of the following is untrue about SCOP?(a) It is a database for comparing and classifying protein structures(b) It is constructed almost entirely based on manual examination of protein structures(c) The proteins are grouped into hierarchies of classes, folds, superfamilies, and families(d) The SCOP families consist of proteins having low sequence identity (>30%)
Answer» The correct answer is (d) The SCOP families consist of proteins having low sequence identity (>30%) To explain I would say: The SCOP families consist of proteins having high sequence identity (>30%). Thus, the proteins within a family clearly share close evolutionary relationships and normally have the same functionality. The protein structures at this level are also extremely similar.

2.	Which of the following statements about SCOP is incorrect regarding its features?(a) Proteins with the same shapes but having little sequence or functional similarity are placed in different super families, and are assumed to have only a very distant common ancestor(b) Proteins having the same shape and some similarity of sequence and/or function are placed in ‘families’, and are assumed to have a closer common ancestor(c) SCOP was created in 1994 in the Centre of Protein Engineering and the University College London(d) It aims to determine the evolutionary relationship between proteins
Answer» The correct answer is (c) SCOP was created in 1994 in the Centre of Protein Engineering and the University College London The best explanation: SCOP, Structural Classification of Proteins, was created in 1994 in the Centre of Protein Engineering and the Laboratory of Molecular Biology. It was maintained by Alexey G. Murzin and his colleagues in the Centre for Protein Engineering until its closure in 2010 and subsequently at the Laboratory of Molecular Biology in Cambridge, England.

3.	While analysing motif sequences, what is the major disadvantageous feature of PROSITE?(a) The database constructs profiles to complement some of the sequence patterns(b) The functional information of these patterns is primarily based on published literature(c) Some of the sequence patterns are too short to be specific(d) Lack of specificity about probability and variation and relation between them
Answer» The correct answer is (c) Some of the sequence patterns are too short to be specific The best explanation: The major pitfall with the PROSITE patterns is that some of the sequence patterns are too short to be specific. Rest of the options are advantages. The problem with these short sequence patterns is that the resulting match is very likely to be a result of random events. Overall, PROSITE has a greater than 20% error rate. Thus, either a match or non-match in PROSITE should be treated with caution.

4.	Which of the following about the Gibbs sampler is untrue?(a) It is a statistical method for finding motifs in sequences(b) It is dissimilar to the principle of the EM method(c) It searches for the statistically most probable motifs(d) It can find the optimal width and number of given motifs in each sequence
Answer» The correct answer is (b) It is dissimilar to the principle of the EM method Explanation: Another statistical method for finding motifs in sequences is the Gibbs sampler. The method is similar in principle to the EM method, but the algorithm is different. A combination of the Gibbs sampler and MOTIF may be used to make blocks at the BLOCKS Web site.

5.	Which of the following multipurpose packages uses the Gibbs sampling algorithm?(a) Consensus(b) BEST(c) AlignACE(d) PhyloCon
Answer» Correct choice is (c) AlignACE The best explanation: The Gibbs sampling algorithm can identify multiple motifs in a sequence set using an iterative masking procedure. It is used in AlignACE, whereas BEST is a suite of four motif discovery tools integrated in a graphical user interface. The Consensus program finds motifs in a set of unaligned sequences, and PhyloCon builds on this framework by modeling conservation across orthologous genes from multiple species.

6.	If a good sampling of sequences is _______ the number of sequences is _________ and the motif structure is ________ it should, in principle, be possible to obtain frequencies highly representative of the same motif in other sequences also.(a) available, sufficiently large, not too complex(b) unavailable, sufficiently large, not too complex(c) unavailable, sufficiently small, not too complex(d) available, sufficiently large, too complex
Answer» The correct option is (a) available, sufficiently large, not too complex To elaborate: The more abundant amino acids already found in that column would probably be favored. Thus, if a good sampling of sequences is available, the number of sequences is sufficiently large, and the motif structure is not too complex, it should, in principle, be possible to obtain frequencies highly representative of the same motif in other sequences also (Henikoff and Henikoff 1996).

7.	Which of the following is incorrect about evolution?(a) The macromolecules can be considered molecular fossils that encode the history of millions of years of evolution(b) The building blocks of these biological macromolecules, nucleotide bases, and amino acids form linear sequences that determine the primary structure of the molecules(c) DNA and proteins are products of evolution(d) The molecular sequences barely undergo changes
Answer» The correct choice is (d) The molecular sequences barely undergo changes Easiest explanation: During this time period, the molecular sequences undergo random changes, some of which are selected during the process of evolution. As the selected sequences gradually accumulate mutations and diverge over time, traces of evolution may still remain in certain portions of the sequences to allow identification of the common ancestry.

8.	For what type of sequences is Gibbs sampling used?(a) Closely related sequences(b) Distantly related sequences(c) Distantly related sequences that share common motifs(d) Closely related sequences that share common motifs
Answer» The correct option is (c) Distantly related sequences that share common motifs For explanation I would say: Often, distantly related sequences that share common motifs cannot be readily aligned. For example, the sequences for the helix-turn-helix motif in transcription factors can be subtly different enough that traditional multiple sequence alignment approaches fail to generate a satisfactory answer. For detecting such subtle motifs, more sophisticated algorithms such as expectation maximization (EM) and Gibbs sampling are used.

9.	In the web-based program MEME, the computation is a _____ step procedure.(a) one(b) two(c) three(d) four
Answer» The correct option is (b) two For explanation I would say: In constructing a probability matrix, it allows multiple starting alignments and does not assume that there are motifs in every sequence. Also, the computation is a two-step procedure that includes generation of sequence motif and finding highest score.

10.	Gibbs is a web-based program that uses the Gibbs sampling approach to look for _____ gap-free segments for either DNA or protein sequences.(a) short, partially conserved(b) long, partially conserved(c) long, conserved(d) short, not conserved
Answer» Right choice is (a) short, partially conserved To explain I would say: The Gibbs program uses the Gibbs sampling approach to look for short, partially conserved gap-free segments in either DNA or protein sequences. To ensure accuracy, more than twenty sequences of the exact same length should be used.

11.	One type of probability analysis calculates the odds score for one event OR a second event, or for a series of events. In this case, the odds scores are _______(a) multiplied(b) subtracted(c) added and multiplied(d) added
Answer» Right option is (d) added To explain: An example is the calculation of the odds score for a given sequence alignment using a series of alternative PAM scoring matrices. The alignment scores are calculated in log odds units and then converted into odds scores.

12.	If the purpose is to calculate the probability of one event AND a second event, the odds scores for the events are _________(a) added(b) multiplied(c) multiplied and added(d) subtracted
Answer» Right choice is (b) multiplied To elaborate: An example is the calculation of the odds of an alignment of two sequences from the alignment scores for each of the matched pairs of bases or amino acids in the alignment. The odds scores for the pairs are multiplied. Usually, the log odds score for the first pair is added to that for the second, etc., until the scores for every pair have been added.
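
The add-versus-multiply rule in the two questions above can be checked numerically. The following is a minimal sketch; the odds values are made up for illustration and do not come from any real scoring matrix.

```python
import math

# Hypothetical odds scores for three aligned residue pairs (illustrative values only).
pair_odds = [4.0, 0.5, 2.0]

# AND: the odds of the whole alignment are the product of the per-pair odds.
alignment_odds = math.prod(pair_odds)

# Equivalent computation in log-odds units: add the logs, then convert back.
log_odds_sum = sum(math.log2(o) for o in pair_odds)
alignment_odds_from_logs = 2 ** log_odds_sum

# OR: odds scores of alternative events (for example, the same alignment
# scored under several alternative matrices) are added, not multiplied.
alternative_odds = [4.0, 3.0, 1.0]
combined_odds = sum(alternative_odds)
```

Adding log odds and multiplying odds give the same result, which is why alignment programs can simply sum matrix entries.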

13.	Which of the following features of Bayesian methods is a disadvantage?(a) A length and distance that gives the highest overall probability may be determined(b) They are used to calculate evolutionary distance(c) Computationally Bayesian methods are better(d) A specific mutational model is required
Answer» The correct choice is (d) A specific mutational model is required To elaborate: One disadvantage of the Bayesian approach is that a specific mutational model is required, whereas other methods, such as the maximum likelihood approach, can be used to estimate the best mutational model as well as the distance. Computationally, however, the Bayesian method is much more practical.

14.	By whom and when were the Bayesian methods applied first?(a) Smith-Waterman, 1981(b) Agarwal and States, 1996(c) Smith-Waterman, 1996(d) Agarwal and States, 1981
Answer» Right choice is (b) Agarwal and States, 1996 The best I can explain: Agarwal and States, in 1996, applied Bayesian methods to provide the best estimate of the evolutionary distance between two DNA sequences, for example between sequences of the same length that have a certain level of mismatches.

15.	Which of the following is not a base of RNA?(a) Thymine (T)(b) Adenine (A)(c) Cytosine (C)(d) Guanine (G)
Answer» Right option is (a) Thymine (T) To explain I would say: RNA structures can be described at three levels as in proteins: primary, secondary, and tertiary. The primary structure is the linear sequence of RNA, consisting of four bases, adenine (A), cytosine (C), guanine (G), and uracil (U).

16.	The pre-alignment independent programs fare __________ for predicting long sequences.(a) slightly better(b) much better(c) a bit worse(d) much worse
Answer» Correct answer is (d) much worse To explain I would say: For small RNA sequences such as tRNA, both subtypes can achieve very high accuracy (up to 100%). This illustrates that the comparative approach is consistently more accurate than the ab initio one.

17.	The number of possible global alignments between two sequences of length N is _____(a) \(\frac{2^{N}}{\sqrt{\pi N}}\)(b) \(\frac{2^{2N}}{\sqrt{\pi N}}\)(c) \(\frac{2^{(N-1)}}{\sqrt{\pi N}}\)(d) \(\frac{2^{2N}}{\sqrt{N}}\)
Answer» Right choice is (b) \(\frac{2^{2N}}{\sqrt{\pi N}}\) For explanation I would say: Counting all possible arrangements of aligned positions and gaps gives approximately \(\frac{2^{2N}}{\sqrt{\pi N}}\) possible global alignments between two sequences of length N. For two sequences of 250 residues this is about \(10^{149}\).
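
The 250-residue figure can be sanity-checked with a few lines of arithmetic. This is a minimal sketch that evaluates the formula in log10 space to avoid overflow for large N; the function name is chosen here for illustration.

```python
import math

def log10_global_alignments(n: int) -> float:
    """Return log10 of 2^(2N) / sqrt(pi * N) for sequence length N."""
    return 2 * n * math.log10(2) - 0.5 * math.log10(math.pi * n)

# Exponent for two sequences of 250 residues each.
exponent = log10_global_alignments(250)
```

The exponent comes out just above 149, consistent with the order of magnitude quoted in the answer.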

18.	In CATH, structural domain separation is carried out by ___________(a) manual comparison only(b) computer programs only(c) human expertise only(d) a combined effort of a human expert and computer programs
Answer» Correct choice is (d) a combined effort of a human expert and computer programs Explanation: CATH classifies proteins based on the automatic structural alignment program SSAP as well as manual comparison. Structural domain separation is carried out also as a combined effort of a human expert and computer programs. Individual domain structures are classified at five major levels: class, architecture, fold/topology, homologous superfamily, and homologous family.

19.	___________ of covariation can be ___________ to the RNA structure and functions.(a) Any lack, deleterious(b) Any lack, benign(c) Any abundance, deleterious(d) Any inadequacy, advantageous
Answer» Correct answer is (a) Any lack, deleterious For explanation I would say: Based on this rule, algorithms can be written to search for the covariation patterns after a set of homologous RNA sequences are properly aligned. The detected correlated substitutions help to determine conserved base pairing in a secondary structure.

20.	Which of the following is untrue regarding Ab Initio–Based Methods?(a) This type of method predicts the secondary structure based on a single query sequence(b) This type of method predicts the secondary structure based on a multiple query sequence(c) It measures the relative propensity of each amino acid belonging to a certain secondary structure element(d) The propensity scores are derived from known crystal structures
Answer» Right option is (b) This type of method predicts the secondary structure based on a multiple query sequence Best explanation: Examples of ab initio prediction are the Chou–Fasman and Garnier, Osguthorpe, Robson (GOR) methods. The ab initio methods were developed in the 1970s when protein structural data were very limited. The statistics derived from the limited data sets can therefore be rather inaccurate. However, the methods are simple enough that they are often used to illustrate the basics of secondary structure prediction.

21.	RNA structures can be experimentally determined using _____(a) X-ray crystallography techniques only(b) NMR techniques only(c) X-ray crystallography or NMR techniques(d) Gel electrophoresis
Answer» The correct answer is (c) X-ray crystallography or NMR techniques Explanation: However, the two approaches are extremely time consuming and expensive. As a result, computational prediction has become an attractive alternative. Gel electrophoresis is not relevant here because it does not determine the structure of RNA.

22.	If prediction accuracy can be represented using a ______ the ______ programs score roughly 20% to 60% depending on the length of the sequences.(a) multiple parameter, ab initio–based(b) single parameter, ab initio–based(c) multiple parameter, comparative–based(d) single parameter, comparative–based
Answer» Right answer is (b) single parameter, ab initio–based Easiest explanation: As mentioned, the scores depend on the length of the sequences. Generally speaking, the programs perform better for shorter RNA sequences than for longer ones.

23.	Which of the following is untrue about Ab initio prediction?(a) The limited knowledge of protein folding forms the basis of ab initio prediction(b) The ab initio prediction method attempts to produce all-atom protein models based on sequence information alone without the aid of known protein structures(c) The ab initio prediction method attempts to produce all-atom protein models based on sequence information alone with some aid of known protein structures(d) The perceived advantage of this method is that predictions are not restricted by known folds and that novel protein folds can be identified
Answer» The correct choice is (c) The ab initio prediction method attempts to produce all-atom protein models based on sequence information alone with some aid of known protein structures Easy explanation: Alongside the advantages, because the physicochemical laws governing protein folding are not yet well understood, the energy functions used in the ab initio prediction are, at present, rather inaccurate. The folding problem remains one of the greatest challenges in bioinformatics today.

24.	In dynamic programming, in ab initio methods, the use of a dot plot can be effective in finding a ____ secondary structure in a ________ molecule.(a) multiple, large(b) single, large(c) single, small(d) multiple, small
Answer» The correct answer is (c) single, small The explanation: The use of a dot plot can be effective in finding a single secondary structure in a small molecule. However, if a large molecule contains multiple secondary structure segments, choosing a combination that is energetically most stable among a large number of possibilities can be a daunting task.

25.	The presence of ______ signal peptides can significantly compromise the prediction _______ because the programs tend to confuse hydrophobic signal peptides with membrane helices.(a) hydrophobic, accuracy(b) hydrophobic, error(c) hydrophilic, accuracy(d) hydrophilic, error
Answer» Correct option is (a) hydrophobic, accuracy Explanation: Predicting transmembrane helices is relatively easy. The accuracy of some of the best prediction programs, such as TMHMM or HMMTOP, can exceed 70%. To minimize errors, the presence of signal peptides can be detected using a number of specialized programs and then manually excluded.

26.	Because residues in the same aligned position are assumed to have the same secondary structure, any inconsistencies or errors in prediction of individual sequences can be corrected using a majority rule.(a) True(b) False
Answer» The correct answer is (a) True Explanation: By aligning multiple sequences, information of positional conservation is revealed. This homology based method has helped improve the prediction accuracy by another 10% over the second-generation methods.
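
The majority rule mentioned in the question can be sketched as a column-wise vote. This is a minimal illustration, assuming the per-sequence predictions are already aligned, have equal length, and use the letters H (helix), E (strand), and C (coil); the example predictions are invented.

```python
from collections import Counter

def consensus_secondary_structure(predictions):
    """Return the majority state at each aligned position.

    predictions: equal-length strings over H/E/C, one per aligned sequence.
    Ties are broken by whichever state appears first in the column.
    """
    consensus = []
    for column in zip(*predictions):
        most_common_state, _ = Counter(column).most_common(1)[0]
        consensus.append(most_common_state)
    return "".join(consensus)

# Hypothetical per-sequence predictions for three aligned homologs.
predictions = [
    "HHHHCCEEE",
    "HHHCCCEEE",
    "HHHHCCECE",
]
corrected = consensus_secondary_structure(predictions)
```

Positions where a single sequence disagrees with the other two are overruled by the majority.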

27.	Secondary structure prediction methods can be either ab initio based, which make use of single sequence information only, or homology based, which make use of multiple sequence alignment information.(a) True(b) False
Answer» The correct choice is (a) True To explain I would say: The ab initio methods, which belong to early generation methods, predict secondary structures based on statistical calculations of the residues of a single query sequence. The homology-based methods do not rely on statistics of residues of a single sequence, but on common secondary structural patterns conserved among multiple homologous sequences.

28.	Which of the following is incorrect regarding sequence homology?(a) Two sequences can have a homologous relationship even if they do not have a common origin(b) It is an important concept in sequence analysis(c) When two sequences are descended from a common evolutionary origin, they are said to have a homologous relationship(d) When two sequences are descended from a common evolutionary origin, they are said to share homology
Answer» Right choice is (a) Two sequences can have a homologous relationship even if they do not have a common origin For explanation I would say: By definition, sequences are homologous only when they share a common evolutionary origin. A related but different term is sequence similarity, which is the percentage of aligned residues that are similar in physicochemical properties such as size, charge, and hydrophobicity.

29.	Which of the following doesn’t describe PAM matrices?(a) This family of matrices lists the likelihood of change from one amino acid to another in homologous protein sequences during evolution(b) There is presently no other type of scoring matrix that is based on such sound evolutionary principles as are these matrices(c) Even though they were originally based on a relatively small data set, the PAM matrices remain a useful tool for sequence alignment(d) It stands for Percent Altered Mutation
Answer» Right option is (d) It stands for Percent Altered Mutation Easy explanation: PAM stands for Percent Accepted Mutation. In this, each matrix gives the changes expected for a given period of evolutionary time, evidenced by decreased sequence similarity as genes encoding the same protein diverge with increased evolutionary time.

30.	Which of the following statements about SUPERFAMILY database is incorrect regarding its features?(a) Sequences can be submitted raw or FASTA format(b) Sequences must be submitted in FASTA format only(c) It searches the database using a superfamily, family, or species name plus a sequence, SCOP, PDB or HMM ID’s(d) It has generated GO annotations for evolutionarily closed domains and distant domains
Answer» Correct option is (b) Sequences must be submitted in FASTA format only Best explanation: SUPERFAMILY is a database of structural and functional annotation for all proteins and genomes. It classifies amino acid sequences into known structural domains, especially into SCOP super families. Sequences can be amino acids, a fixed frame nucleotide sequence, or all frames of a submitted nucleotide sequence. Up to 1000 sequences can be run at a time.

31.	Which of the following is untrue about protein substitution matrices?(a) They are significantly more complex than DNA scoring matrices(b) They have the N x N matrices of the amino acids(c) Protein substitution matrices have quite an important role in evolutionary studies(d) They are significantly less complex than DNA scoring matrices
Answer» The correct answer is (d) They are significantly less complex than DNA scoring matrices For explanation I would say: Protein substitution matrices are significantly more complex than DNA scoring matrices. Proteins are composed of twenty amino acids, and the physicochemical properties of individual amino acids vary considerably. A protein substitution matrix can be based on any property of amino acids: size, polarity, charge, hydrophobicity.

32.	Which of the following does not describe PAM matrices?(a) These matrices are used in optimal alignment scoring(b) It stands for Point Altered Mutations(c) It stands for Point Accepted Mutations(d) It was first developed by Margaret Dayhoff
Answer» Correct option is (b) It stands for Point Altered Mutations To explain I would say: PAM stands for Point Accepted Mutations. PAM matrices are calculated by observing the differences in closely related proteins. One PAM unit (PAM1) specifies one accepted point mutation per 100 amino acid residues, i.e. 1% change and 99% remains as such.

33.	Which of the following is wrong in case of substitution matrices?(a) They determine likelihood of homology between two sequences(b) They use system where substitutions that are more likely should get a higher score(c) They use system where substitutions that are less likely should get a lower score(d) BLOSUM-X type uses logarithmic identity to find similarity
Answer» The correct choice is (d) BLOSUM-X type uses logarithmic identity to find similarity The best explanation: The X in BLOSUM-X is the percent identity threshold used to cluster sequences in the aligned blocks from which the matrix is derived; for example, BLOSUM62 is built from blocks in which sequences sharing more than 62% identity are clustered together. It does not denote a logarithmic identity. These matrices are popular in bioinformatics due to their speed and accuracy.

34.	Which of the following is false in case of the database SMART and its algorithm?(a) Contains HMM profiles constructed from manually refined protein domain alignments(b) Alignments in the database are built based on tertiary structures whenever available or based on PSI-BLAST profiles(c) Alignments are further checked but not refined by human annotators before HMM profile construction(d) SMART stands for Simple Modular Architecture Research Tool
Answer» Correct choice is (c) Alignments are further checked but not refined by human annotators before HMM profile construction To elaborate: Alignments are further checked and refined by human annotators before HMM profile construction. Protein functions are also manually curated. Thus, the database may be of better quality than Pfam, with more extensive functional annotations. Compared to Pfam, the SMART database contains an independent collection of HMMs, with emphasis on signaling, extracellular, and chromatin-associated motifs and domains. Sequence searching in this database produces a graphical output of domains with well-annotated information with respect to cellular localization, functional sites, superfamily, and tertiary structure.

35.	Which of the following is not an advantage of Statistical models’ methods in analyzing protein motifs?(a) Sequence information is preserved from a multiple sequence alignment and expresses it with probabilistic models(b) Statistical models allow partial matches and compensate for unobserved sequence patterns using pseudo-counts(c) Statistical models have stronger predictive power than the regular expression based approach, even when they are derived from a limited set of sequences(d) The comparative flexibility is less in case of these methods when compared to regular expressions methods
Answer» Right choice is (d) The comparative flexibility is less in case of these methods when compared to regular expressions methods The best explanation: The major limitation of regular expressions is that this method does not take into account sequence probability information about the multiple alignment from which it is modeled making them less flexible. If a regular expression is derived from an incomplete sequence set, it has less predictive power because many more sequences with the same type of motifs are not represented. Unlike regular expressions, position-specific scoring matrices (PSSMs), profiles, and HMMs preserve the sequence information from a multiple sequence alignment and express it with probabilistic models.
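
The contrast between a rigid pattern and a probabilistic model can be illustrated with a small position-specific scoring matrix (PSSM). This is a minimal sketch under several assumptions: a DNA alphabet is used for brevity (the question concerns protein motifs, where the alphabet would have twenty letters), the background is uniform, the pseudocount is 1, and the aligned motif instances are invented.

```python
import math
from collections import Counter

def build_pssm(aligned_motifs, alphabet="ACGT", pseudocount=1.0):
    """Build log-odds scores per column from gap-free aligned motif instances."""
    n = len(aligned_motifs)
    background = 1.0 / len(alphabet)  # assumed uniform background frequency
    pssm = []
    for column in zip(*aligned_motifs):
        counts = Counter(column)
        scores = {}
        for symbol in alphabet:
            # The pseudocount keeps unobserved symbols from getting zero probability.
            freq = (counts[symbol] + pseudocount) / (n + pseudocount * len(alphabet))
            scores[symbol] = math.log2(freq / background)
        pssm.append(scores)
    return pssm

def score(pssm, segment):
    """Sum the log-odds scores of a segment with the same width as the PSSM."""
    return sum(column[symbol] for column, symbol in zip(pssm, segment))

# Hypothetical aligned motif instances.
motifs = ["TATAAT", "TATGAT", "TACAAT", "TATAAT"]
pssm = build_pssm(motifs)
```

A regular expression derived from these four instances would reject `TATCAT` outright because `C` was never observed at the fourth position, whereas the PSSM still assigns it a finite, reduced score.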

36.	Motifs that can form α/β horseshoes conformation are rich with which protein residue?(a) Proline(b) Arginine(c) Valine(d) Leucine
Answer» Right answer is (d) Leucine To elaborate: These motifs contain a specific pattern of leucine residues, and the strands form a curved sheet with helices on the outside. Leucine-rich repeats (LRRs) are 20-29-residue sequence motifs present in a number of proteins with diverse functions. The primary function of these motifs appears to be to provide a versatile structural framework for the formation of protein-protein interactions.

37.	Which of the common structural motifs is described wrongly?(a) β-hairpin – adjacent antiparallel strands(b) Greek key – 4 adjacent antiparallel strands(c) β-α-β – 2 parallel strands connected by helix(d) β-α-β – 2 antiparallel strands connected by helix
Answer» Right choice is (d) β-α-β – 2 antiparallel strands connected by helix To elaborate: In the β-α-β motif, two adjacent parallel β strands are connected by an α helix from the C-terminus of strand 1 to the N-terminus of strand 2. Most protein structures that contain parallel beta-sheets are built up from combinations of such β-α-β motifs.

38.	In terminologies related to regular expressions which of the following is false about terms and operators?(a) Terms are strings or substrings(b) Operators combine terms and expressions(c) Operators do not have precedence(d) Operators have precedence like arithmetic operators
Answer» Correct choice is (c) Operators do not have precedence Easy explanation: For consistent, efficient, and error-free functioning of the matching process, operators have precedence, which sets the priority of the operations carried out during matching.

39.	Which of the following best defines regular expressions?(a) They are made up of terms, operators and modifiers(b) They describe string or set of strings to find matching patterns(c) They are strictly restricted to alignment and corresponding score(d) They consist of set of rules for the connotations of various amino acid residues
Answer» The correct option is (b) They describe string or set of strings to find matching patterns For explanation: Regular expressions are a powerful notational algebra that describes a string or set of strings in order to find matching patterns. The outcome of pattern matching is either true or false. It is true that they are made up of terms, operators, and modifiers, but these are the components used in the matching process rather than a definition.
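
Both points about regular expressions, operator precedence and matching a set of strings, can be seen with Python's standard `re` module. This is a minimal sketch; the protein sequence is invented, and the motif is the PROSITE N-glycosylation pattern `N-{P}-[ST]-{P}` rewritten in regular expression syntax.

```python
import re

# Alternation (|) has the lowest precedence: "AB|CD" means (AB)|(CD).
alternation = re.compile(r"AB|CD")
# Grouping changes the meaning to A, then B or C, then D.
grouped = re.compile(r"A(B|C)D")
# Repetition (*) binds tighter than concatenation: "AB*" is A followed by zero or more B.
repetition = re.compile(r"AB*")

# PROSITE-style pattern N-{P}-[ST]-{P} as a regular expression:
# N, any residue except P, then S or T, then any residue except P.
n_glycosylation = re.compile(r"N[^P][ST][^P]")

# Hypothetical protein sequence used only for illustration.
sequence = "MKNATPLNPSAANGSV"
hits = [(m.start(), m.group()) for m in n_glycosylation.finditer(sequence)]
```

Here `hits` contains a single match, `NGSV` at index 12; the candidate `NATP` at index 2 is rejected because the last position is a proline.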

40.	For motif scanning, which of the following programs or databases contains regulatory sites curated from scientific literature?(a) ENSEMBL(b) ORegAnno(c) MAST(d) Clover
Answer» Right choice is (b) ORegAnno Best explanation: ORegAnno (Open Regulatory Annotation) is a database of regulatory sites curated from the scientific literature. Clover identifies overrepresented motifs in a set of sequences, whereas MAST allows users to scan different databases for matches to motifs. ENSEMBL is an online genomic sequence repository that also includes online tools for data mining as well as BLAST searches.

41.	The matrices PAM250 and BLOSUM62 contain _______(a) positive and negative values(b) positive values only(c) negative values only(d) neither positive nor negative values, just the percentage
Answer» Right choice is (a) positive and negative values For explanation I would say: These matrices contain positive and negative values, reflecting the likelihood of each amino acid substitution in related proteins. Using these tables, an alignment of a sequential set of amino acid pairs with no gaps receives an overall score that is the sum of the positive and negative log odds scores for each individual amino acid pair in the alignment.

42.	Which of the following is not related to the Needleman-Wunsch alignment algorithm?(a) Global alignment programs use this algorithm(b) The output is a positive number(c) Small changes in the scoring system can produce a different alignment(d) Changes in the scoring system can produce the same alignment
Answer» Right answer is (d) Changes in the scoring system can produce the same alignment To elaborate: In general, global alignment programs use the Needleman-Wunsch alignment algorithm and a scoring system that scores the average match of an aligned nucleotide or amino acid pair as a positive number. Hence, the score of the alignment of random or unrelated sequences grows proportionally to the length of the sequences. In addition, there are many possible different global alignments depending on the scoring system chosen, and small changes in the scoring system can produce a different alignment.
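
The global alignment procedure described above can be sketched as a small dynamic programming routine. This is a minimal illustration of the Needleman-Wunsch idea with a simple linear gap penalty; the scoring values (match +1, mismatch -1, gap -2) are assumptions chosen for the example, not values taken from any particular program.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Return (score, aligned_a, aligned_b) for a global alignment of a and b."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    # First row and column: aligning a prefix against nothing costs only gaps.
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    # Fill the matrix: each cell takes the best of diagonal, up, and left moves.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + pair,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    aligned_a, aligned_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        pair = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair:
            aligned_a.append(a[i - 1]); aligned_b.append(b[j - 1])
            i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            aligned_a.append(a[i - 1]); aligned_b.append("-")
            i -= 1
        else:
            aligned_a.append("-"); aligned_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(aligned_a)), "".join(reversed(aligned_b))

best_score, top, bottom = needleman_wunsch("ACGT", "AGT")
```

With these parameters the call aligns `ACGT` against `A-GT` with a score of 1. Because the traceback depends on the match, mismatch, and gap values, rerunning with a different scoring system is an easy way to see how the reported alignment can change.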

43.	Who were the inventors of the dot-matrix (dot-plot) method of sequence comparison?(a) Smith-Waterman(b) Margaret Preston(c) Gibbs and McIntyre(d) Needleman-Wunsch
Answer» Correct answer is (c) Gibbs and McIntyre The best explanation: The first computer-aided sequence comparison method is called “dot-matrix analysis” or simply dot-plot. The first published account of this method is by Gibbs and McIntyre (1970. The diagram, a method for comparing sequences. Eur. J. Biochem. 16: 1-11).

44.	Which of the following is true for EMBOSS Dottup?(a) Allows you to specify threshold(b) Doesn’t allow you to specify threshold(c) Doesn’t allow you to specify window size(d) If all cells in the window are identity, it colors in some specific cells in the window
Answer» Correct choice is (b) Doesn’t allow you to specify threshold The explanation: The EMBOSS Dottup doesn’t allow you to specify threshold but allows you to specify window size. Also, if all cells in the window are identity, it colors in all the cells in the window.
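
The window behaviour described in the answer (no threshold, and every cell of a fully identical window is coloured) can be sketched as follows. This is an illustration of the idea only, not the EMBOSS implementation; the function names and the text rendering are assumptions made for the example.

```python
def dot_plot(seq_a, seq_b, window=3):
    """Mark cells belonging to windows in which every position is an identity."""
    n, m = len(seq_a), len(seq_b)
    grid = [[False] * m for _ in range(n)]
    for i in range(n - window + 1):
        for j in range(m - window + 1):
            # No threshold: the whole window must match exactly.
            if seq_a[i:i + window] == seq_b[j:j + window]:
                # Colour every cell along the diagonal of the matching window.
                for k in range(window):
                    grid[i + k][j + k] = True
    return grid

def render(grid):
    """Render the grid as text, with '*' for a dot and '.' for an empty cell."""
    return "\n".join("".join("*" if cell else "." for cell in row) for row in grid)

grid = dot_plot("ACGT", "ACGT", window=2)
```

Comparing a sequence with itself produces the main diagonal; increasing `window` suppresses short chance matches at the cost of missing short true ones.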

45.	Many studies have demonstrated that the distribution of similarity scores assumes a peculiar shape that resembles a highly skewed normal distribution with a long tail on one side. The distribution matches the _______(a) Gumbel elective value distribution(b) Gumbel extreme void distribution(c) Gumbel end value distribution(d) Gumbel extreme value distribution
Answer» Right answer is (d) Gumbel extreme value distribution To explain I would say: The distribution pattern described matches the Gumbel extreme value distribution, for which a mathematical expression is available. This means that, given a sequence similarity score, the statistical significance can be accurately estimated using the mathematical formula for the extreme value distribution.
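
The significance estimate mentioned in the answer can be sketched with the standard extreme value tail formula, P(S ≥ x) = 1 − exp(−e^(−λ(x − u))). This is a minimal illustration; the location `u` and scale `λ` used below are hypothetical values, since in practice they are fitted to the scores of unrelated or randomized sequences.

```python
import math

def gumbel_p_value(score, location, scale_lambda):
    """Probability of seeing a score at least this high by chance.

    Uses P(S >= x) = 1 - exp(-exp(-lambda * (x - u))), computed with expm1
    so that very small probabilities keep their precision.
    """
    return -math.expm1(-math.exp(-scale_lambda * (score - location)))

# Hypothetical fitted parameters (illustrative only).
u, lam = 25.0, 0.3
p_low = gumbel_p_value(30.0, u, lam)
p_high = gumbel_p_value(50.0, u, lam)
```

The higher score receives the much smaller probability, which is what makes the long right tail of the distribution useful for judging significance.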

46.	Which of the following cannot be related to multiple sequence alignment?(a) Many conserved and functionally critical amino acid residues can be identified in a protein multiple alignment(b) Multiple sequence alignment is also an essential prerequisite to carrying out phylogenetic analysis of sequence families and prediction of protein secondary and tertiary structures(c) Multiple sequence alignment also has applications in designing degenerate polymerase chain reaction (PCR) primers based on multiple related sequences(d) This method does not contribute much to degenerate polymerase chain reaction (PCR) primers creation
Answer» Right answer is (d) This method does not contribute much to degenerate polymerase chain reaction (PCR) primers creation. The explanation: Multiple sequence alignment has applications in designing degenerate PCR primers based on multiple related sequences, so statement (d) is the one that does not apply. In practice, heuristic approaches are most often used to build the alignment.
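
One way to picture the primer-design application is to collapse each column of an aligned region into an IUPAC degenerate nucleotide code. The sketch below assumes a gap-free alignment of equal-length DNA sequences; the function name is hypothetical, and real primer design would also weigh melting temperature, degeneracy limits, and other constraints not shown here.

```python
# Standard IUPAC nucleotide ambiguity codes, keyed by the set of bases covered.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y",
    frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}

def degenerate_consensus(aligned):
    """Collapse equal-length, gap-free aligned DNA sequences into one
    degenerate sequence, column by column."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    consensus = []
    for column in zip(*aligned):
        bases = frozenset(base.upper() for base in column)
        if bases not in IUPAC:
            raise ValueError(f"unsupported column: {sorted(bases)}")
        consensus.append(IUPAC[bases])
    return "".join(consensus)
```

For example, the three aligned sequences `ACGT`, `ATGT`, and `ACGA` collapse to `AYGW`.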

47.	Which of the following is untrue regarding T-Coffee?(a) It stands for Tree-based Consistency Objective Function for alignment Evaluation(b) It performs progressive sequence alignments as in Clustal(c) The global pairwise alignment is not performed using the Clustal program(d) The local pairwise alignment is generated by the Lalign program, from which the top ten scored alignments are selected
Answer» Right choice is (c) The global pairwise alignment is not performed using the Clustal program. The explanation: The global pairwise alignment is performed using the Clustal program. The main difference from Clustal is that, in processing a query, T-Coffee performs both global and local pairwise alignment for all possible pairs involved. The collection of local and global sequence alignments is pooled to form a library, and the consistency of the alignments is then evaluated.

48.	Which of the following is a part of the statistical test of sequences?(a) An optimal alignment between two chosen sequences is obtained at the end(b) Unrelated sequences of the same length are then generated through a randomization process(c) Unrelated sequences of different lengths are then generated through a randomization process(d) Related sequences of the same length are then generated through a randomization process
Answer» Correct option is (b) Unrelated sequences of the same length are then generated through a randomization process. To explain I would say: Unrelated sequences of the same length are generated through a randomization process in which one of the two sequences is randomly shuffled. In the next step, a new alignment score is computed for the shuffled sequence pair.
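
The shuffle-and-rescore procedure can be sketched as below. The `identity_score` function is only a stand-in (it counts identical positions without gaps) so that the example is self-contained; a real test would plug in an actual alignment score. The function names, `n_shuffles`, and `seed` are all assumptions made for this illustration.

```python
import random

def identity_score(a, b):
    """Stand-in score: number of identical positions, no gaps allowed."""
    return sum(x == y for x, y in zip(a, b))

def shuffle_test(seq_a, seq_b, score_fn=identity_score, n_shuffles=100, seed=0):
    """Return (observed_score, empirical_p_value) from a shuffling test."""
    rng = random.Random(seed)
    observed = score_fn(seq_a, seq_b)
    letters = list(seq_b)
    hits = 0
    for _ in range(n_shuffles):
        # Shuffling keeps the length and composition but destroys the order.
        rng.shuffle(letters)
        if score_fn(seq_a, "".join(letters)) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (hits + 1) / (n_shuffles + 1)
```

A small empirical p-value suggests that the observed score is unlikely to arise from unrelated sequences of the same length and composition.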

49.	Progenitor sequences represented by the ______ branches of the tree are derived by alignment of the _______ sequences.(a) outer, outermost(b) inner, outermost(c) inner, innermost(d) outer, innermost
Answer» The correct answer is (b) inner, outermost For explanation I would say: Progenitor sequences represented by the inner branches of the tree are derived by alignment of the outermost sequences. These inner branches will have uncertainties where positions in the outermost sequences are dissimilar.

50.	Which of the following is untrue about DCA?(a) It stands for Divide-and-Conquer Alignment(b) It works by breaking each of the sequences into two smaller sections(c) The breaking points during the process are determined based on regional similarity of the sequences(d) If the sections are not short enough, further divisions are restricted as well
Answer» Right choice is (d) If the sections are not short enough, further divisions are restricted as well. The best I can explain: DCA is a web-based program that is in fact semi-exhaustive because certain steps of computation are reduced to heuristics. If the sections are not short enough, further divisions are carried out. When the lengths of the sequences reach a predefined threshold, dynamic programming is applied to align each set of subsequences. The resulting short alignments are joined together head to tail to yield a multiple alignment of the entire length of all sequences.

In which of the following multipurpose packages is the Gibbs sampling algorithm used?(a) Consensus(b) BEST(c) AlignACE(d) PhyloCon

For what type of sequences is Gibbs sampling used?(a) Closely related sequences(b) Distinctly related sequences(c) Distinctly related sequences that share common motifs(d) Closely related sequences that share common motifs

In the web-based program MEME, the computation is a _____ step procedure.(a) one(b) two(c) three(d) four

Gibbs is a web-based program that uses the Gibbs sampling approach to look for _____ gap-free segments for either DNA or protein sequences.(a) short, partially conserved(b) long, partially conserved(c) long, conserved(d) short, not conserved

In one type of probability analysis, the aim is to calculate the odds score for one event OR a second event, or for a series of events. In this case, the odds scores are _______(a) multiplied(b) subtracted(c) added and multiplied(d) added

If the purpose is to calculate the probability of one event AND a second event, the odds scores for the events are _________(a) added(b) multiplied(c) multiplied and added(d) subtracted

By whom and when were the Bayesian methods applied first?(a) Smith-Waterman, 1981(b) Agarwal and States, 1996(c) Smith-Waterman, 1996(d) Agarwal and States, 1981

Which of the following is not a base of RNA?(a) Thymine (T)(b) Adenine (A)(c) Cytosine (C)(d) Guanine (G)

The pre-alignment independent programs fare __________ for predicting long sequences.(a) slightly better(b) much better(c) a bit worse(d) much worse

The number of possible global alignments between two sequences of length N is _____(a) \(\frac{2^N}{\sqrt{\pi N}}\)(b) \(\frac{2^{2N}}{\sqrt{\pi N}}\)(c) \(\frac{2^{(N-1)}}{\sqrt{\pi N}}\)(d) \(\frac{2^{2N}}{\sqrt{N}}\)

In CATH, structural domain separation is carried out by ___________(a) manual comparison only(b) computer programs only(c) human expertise only(d) a combined effort of a human expert and computer programs

___________ of covariation can be ___________ to the RNA structure and functions.(a) Any lack, deleterious(b) Any lack, benign(c) Any abundance, deleterious(d) Any inadequacy, advantageous

RNA structures can be experimentally determined using _____(a) X-ray crystallography techniques only(b) NMR techniques only(c) X-ray crystallography or NMR techniques(d) Gel electrophoresis

If prediction accuracy can be represented using a ______, the ______ programs score roughly 20% to 60% depending on the length of the sequences.(a) multiple parameter, ab initio–based(b) single parameter, ab initio–based(c) multiple parameter, comparative–based(d) single parameter, comparative–based

In dynamic programming, in ab initio methods, the use of a dot plot can be effective in finding a ____ secondary structure in a ________ molecule.(a) multiple, large(b) single, large(c) single, small(d) multiple, small

The presence of ______ signal peptides can significantly compromise the prediction _______ because the programs tend to confuse hydrophobic signal peptides with membrane helices.(a) hydrophobic, accuracy(b) hydrophobic, error(c) hydrophilic, accuracy(d) hydrophilic, error

Which of the following does not describe PAM matrices?(a) These matrices are used in optimal alignment scoring(b) It stands for Point Altered Mutations(c) It stands for Point Accepted Mutations(d) It was first developed by Margaret Dayhoff

Motifs that can form the α/β horseshoe conformation are rich in which amino acid residue?(a) Proline(b) Arginine(c) Valine(d) Leucine

Which of the common structural motifs is described incorrectly?(a) β-hairpin – adjacent antiparallel strands(b) Greek key – 4 adjacent antiparallel strands(c) β-α-β – 2 parallel strands connected by helix(d) β-α-β – 2 antiparallel strands connected by helix

In terminologies related to regular expressions which of the following is false about terms and operators?(a) Terms are strings or substrings(b) Operators combine terms and expressions(c) Operators do not have precedence(d) Operators have precedence like arithmetic operators

For motif scanning which of the following programs or databases is for regulated sites curated from scientific literature?(a) ENSEMBL(b) ORegAnno(c) MAST(d) Clover

The matrices PAM250 and BLOSUM62 contain _______(a) positive and negative values(b) positive values only(c) negative values only(d) neither positive nor negative values, just the percentage