119+ Interview Questions on Sequence Alignment in Bioinformatics (InterviewSolution)

1. Unlike the commonly used methods for aligning a pair of sequences, the Bayesian method _______ using a particular scoring matrix or designated gap penalties.
(a) does not depend on
(b) depends on
(c) is based on
(d) involves
Topic: Bayesian Statistics
Answer» The correct answer is (a) does not depend on. Explanation: Because the Bayesian method does not depend on these choices, there is no need to select a particular scoring system or gap penalty. Instead, a number of different scoring matrices and a range of block numbers up to some reasonable maximum are examined, and the most probable alignments are determined. The Bayesian method provides a distribution of alignments weighted according to probability and can also provide an estimate of the evolutionary distance between the sequences that is independent of scoring matrix and gaps.

2. Who suggested that global alignment scores between unrelated protein sequences follow the extreme value distribution, similar to local alignment scores, and when?
(a) Abagyan and Batalov, in 1981
(b) Chvátal and Lipman, in 1984
(c) Abagyan and Batalov, in 1997
(d) Chvátal and Sankoff, in 1995
Topic: Assessing the Significance of Sequence Alignments
Answer» The correct answer is (c) Abagyan and Batalov, in 1997. Explanation: Abagyan and Batalov made this observation in 1997. However, because the scoring system they used favored local alignments, the alignments they produced may have been local rather than global. Unfortunately, there is no theory for analyzing global alignment scores equivalent to the one that exists for local alignment scores.

3. The higher the score of an alignment, _________
(a) the more significant is the alignment
(b) the less it resembles alignments in related proteins
(c) the less significant is the alignment
(d) the less it aligns with the related protein sequence
Topic: Dynamic Programming Algorithm for Sequence Alignment
Answer» The correct answer is (a) the more significant is the alignment. Explanation: In the scoring system, the higher the score, the more significant the alignment, or the more it resembles alignments in related proteins. Also, the score given for gaps in aligned sequences is negative, because such misaligned regions should be uncommon in sequences of related proteins. Such a score reduces the score obtained from an adjacent matching region upstream in the sequences.

4. Isolated dots that are not on the diagonal represent exact matches.
(a) True
(b) False
Topic: Dot Matrix Sequence Comparison
Answer» The correct answer is (b) False. Explanation: Isolated dots represent random matches. Dots on the diagonal represent the perfect alignment, and diagonals shifted vertically or horizontally indicate insertions and deletions.
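To make the dot-matrix idea concrete, here is a minimal sketch that marks every position where two sequences share the same residue. The function names `dot_matrix` and `render` are illustrative only and do not correspond to any particular dot-plot program.

```python
def dot_matrix(seq_a, seq_b):
    """Return a grid where cell [i][j] is True when seq_a[i] == seq_b[j]."""
    return [[a == b for b in seq_b] for a in seq_a]


def render(matrix):
    """Draw the grid as text: '*' for a match, '.' for a mismatch."""
    return "\n".join(
        "".join("*" if cell else "." for cell in row) for row in matrix
    )


if __name__ == "__main__":
    # Comparing a sequence with itself: the main diagonal is fully dotted,
    # and the scattered off-diagonal dots are chance matches.
    print(render(dot_matrix("GATTACA", "GATTACA")))
```

Real dot-plot tools usually add a window or word size to suppress the isolated chance matches; that filtering is left out here for brevity.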

5. Which of the following is false in case of the database SMART and its algorithm?
(a) Contains HMM profiles constructed from manually refined protein domain alignments
(b) Alignments in the database are built based on tertiary structures whenever available or based on PSI-BLAST profiles
(c) Alignments are further checked but not refined by human annotators before HMM profile construction
(d) SMART stands for Simple Modular Architecture Research Tool
Topic: Motif and Domain Databases Using Statistical Models
Answer» The correct answer is (c) Alignments are further checked but not refined by human annotators before HMM profile construction. Explanation: Alignments are further checked and refined by human annotators before HMM profile construction. Protein functions are also manually curated. Thus, the database may be of better quality than Pfam, with more extensive functional annotations. Compared to Pfam, the SMART database contains an independent collection of HMMs, with emphasis on signaling, extracellular, and chromatin-associated motifs and domains. Sequence searching in this database produces a graphical output of domains with well-annotated information on cellular localization, functional sites, superfamily, and tertiary structure.

6. Which of the following is not an advantage of the Needleman-Wunsch algorithm?
(a) New algorithmic improvements as well as increasing computer capacity make it possible to align a query sequence against a large database in a few minutes
(b) Similar sequence regions are in the same order and orientation
(c) It does not help in determining evolutionary relationships
(d) If two genes are already understood to be closely related, this type of algorithm can be used to study them in further detail
Topic: Global Sequence Alignment
Answer» The correct answer is (c) It does not help in determining evolutionary relationships. Explanation: The Needleman-Wunsch algorithm is used when two genes are already understood to be closely related and need to be examined in further detail. This is quite helpful in finding orthologs, paralogs, and homologs in evolutionary studies.

7. _______ analyzed the distribution of scores among 100 vertebrate nucleic acid sequences and compared these scores with randomized sequences prepared in different ways.
(a) Lipman, in 1984
(b) Batalov, in 1964
(c) Waterman, in 1987
(d) Lipman, in 1967
Topic: Assessing the Significance of Sequence Alignments
Answer» The correct answer is (a) Lipman, in 1984. Explanation: When the randomized sequences were prepared by shuffling the sequence to conserve base composition, as was done by Dayhoff and others, the standard deviation of their scores was approximately one-third less than that of the scores of the natural sequences. Thus, natural sequences are more variable than randomized ones, and using such randomized sequences for a significance test may lead to an overestimation of the significance.
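The shuffling test described above can be sketched as follows. This is only an illustration: `identity_score` is a deliberately simple stand-in for a real alignment score, and the function names, trial count, and seed are assumptions made for the example. As the explanation warns, a z-score from composition-preserving shuffles may overstate significance.

```python
import random
import statistics


def identity_score(a, b):
    """Toy score: number of identical residues at the same positions."""
    return sum(x == y for x, y in zip(a, b))


def shuffled(seq, rng):
    """Return a random permutation of seq, preserving its composition."""
    chars = list(seq)
    rng.shuffle(chars)
    return "".join(chars)


def shuffle_z_score(a, b, trials=200, seed=0):
    """Compare the observed score with scores against shuffled copies of b."""
    rng = random.Random(seed)
    observed = identity_score(a, b)
    null = [identity_score(a, shuffled(b, rng)) for _ in range(trials)]
    mean = statistics.mean(null)
    sd = statistics.pstdev(null)
    # A zero spread means every shuffle scored the same; report infinity.
    return (observed - mean) / sd if sd > 0 else float("inf")
```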

8. Which of the following doesn’t describe PAM matrices?
(a) This family of matrices lists the likelihood of change from one amino acid to another in homologous protein sequences during evolution
(b) There is presently no other type of scoring matrix that is based on such sound evolutionary principles as are these matrices
(c) Even though they were originally based on a relatively small data set, the PAM matrices remain a useful tool for sequence alignment
(d) It stands for Percent Altered Mutation
Topic: Use of Scoring Matrices and Gap Penalties in Sequence Alignments
Answer» The correct answer is (d) It stands for Percent Altered Mutation. Explanation: PAM stands for Percent Accepted Mutation. Each matrix gives the changes expected for a given period of evolutionary time, evidenced by decreased sequence similarity as genes encoding the same protein diverge with increased evolutionary time.

9. In scoring matrices, for convenience, odds scores are converted to log odds scores.
(a) True
(b) False
Topic: Use of Scoring Matrices and Gap Penalties in Sequence Alignments
Answer» The correct answer is (a) True. Explanation: The odds scores are converted to log odds scores so that the values for amino acid pairs in an alignment may be summed to obtain the log odds score of the alignment. In this case, the logarithms are calculated to the base 2 and multiplied by 2 to give values designated as half-bits (a bit is the unit of an odds score that has been converted to a logarithm to the base 2). A value of 4 half-bits indicates that the aligned amino acid pair is 2^(4/2) = 4-fold more likely than expected by chance.
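The half-bit conversion can be checked with a few lines of arithmetic. The helper names below are illustrative, not taken from any library.

```python
import math


def half_bit_score(odds):
    """Convert an odds ratio to a log-odds score in half-bits: 2 * log2(odds)."""
    return 2 * math.log2(odds)


def odds_from_half_bits(score):
    """Invert the conversion: odds = 2 ** (score / 2)."""
    return 2 ** (score / 2)


if __name__ == "__main__":
    # A score of 4 half-bits corresponds to odds of 2 ** (4 / 2) = 4.
    print(odds_from_half_bits(4))
    # Summing log-odds scores corresponds to multiplying the odds.
    print(odds_from_half_bits(half_bit_score(4) + half_bit_score(2)))
```

Because logarithms turn products into sums, the score of a whole alignment is simply the sum of its per-pair scores.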

10. Which of the following does not describe k-tuple methods?
(a) k-tuple methods are best known for their implementation in the database search tools FASTA and the BLAST family
(b) They are also known as word methods
(c) They are basically heuristic methods to find local alignments
(d) They are useful in small-scale databases
Topic: Local Sequence Alignment
Answer» The correct answer is (d) They are useful in small-scale databases. Explanation: k-tuple or word methods are especially useful in large-scale database searches where a large proportion of stored sequences will have essentially no significant match with the query sequence. They are heuristic methods that are not guaranteed to find an optimal alignment but are significantly more efficient than the Smith-Waterman algorithm.
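The core of a word method is a lookup table of short words. The sketch below shows only that first step, under simplifying assumptions (exact word matches, no scoring or extension); `index_words` and `word_hits` are hypothetical names and do not reproduce how FASTA or BLAST are actually implemented.

```python
from collections import defaultdict


def index_words(seq, k):
    """Map every k-letter word in seq to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def word_hits(query, target, k):
    """Return (query_pos, target_pos) pairs where a k-letter word matches."""
    index = index_words(target, k)
    hits = []
    for q in range(len(query) - k + 1):
        for t in index.get(query[q:q + k], []):
            hits.append((q, t))
    return hits


if __name__ == "__main__":
    # Hits that share the same offset (target_pos - query_pos) lie on one
    # diagonal and are candidates for extension into a local alignment.
    print(word_hits("ACGT", "TTACGTT", 2))
```

Sequences with no shared words are never examined further, which is where the speed advantage in large databases comes from.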

11. Which of the following is not a site on the internet for alignment of sequence pairs?
(a) BLASTX
(b) BLASTN
(c) SIM
(d) BCM Search Launcher
Topic: Dynamic Programming Algorithm for Sequence Alignment
Answer»

12. Which of the following does not describe the global alignment algorithm?
(a) In the initialization step, the first row and first column are subject to the gap penalty
(b) The score can be negative
(c) In the traceback step, the path begins at the cell at the lower right of the matrix and ends at the top left cell
(d) The first row and first column are set to zero
Topic: Global Sequence Alignment
Answer» The correct answer is (d) The first row and first column are set to zero. Explanation: The initialization and scoring of the Smith–Waterman and Needleman-Wunsch algorithms are quite different. In global alignment, the first row and first column are subject to the gap penalty and are not set to 0.
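A minimal sketch of the global (Needleman-Wunsch style) score matrix shows the initialization in question. The match, mismatch, and linear gap values are arbitrary choices for the example, and the function name is illustrative.

```python
def global_alignment_matrix(a, b, match=1, mismatch=-1, gap=-2):
    """Fill a global alignment score matrix with a linear gap penalty."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # Global initialization: the first column and first row accumulate
    # gap penalties instead of being left at zero.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    return score


if __name__ == "__main__":
    matrix = global_alignment_matrix("ACGT", "AGT")
    print(matrix[0])       # first row: multiples of the gap penalty
    print(matrix[-1][-1])  # the optimal global score is in the lower-right cell
```

The traceback (not shown) would start from the lower-right cell and walk back to the top-left cell, as option (c) states.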

13. Zhu et al. (1998) devised a computer program called the Bayes block aligner, which in effect slides ____ sequences along each other to find the ______ ungapped regions or blocks.
(a) two, least scoring
(b) two, highest scoring
(c) multiple, highest scoring
(d) multiple, least scoring
Topic: Bayesian Statistics
Answer» The correct answer is (b) two, highest scoring. Explanation: These blocks are then joined in various combinations to produce alignments. There is no need for gap penalties because only the aligned sequence positions in blocks are scored. Instead of using a given substitution matrix and gap scoring system to find the highest-scoring alignment, a Bayesian statistical approach is used.

14. When random or unrelated sequences are compared using a global alignment method, they can have ____________, reflecting the tendency of the global algorithm to match as many characters as possible.
(a) very low scores
(b) very high scores
(c) moderate scores
(d) low scores
Topic: Assessing the Significance of Sequence Alignments
Answer» The correct answer is (b) very high scores. Explanation: The significance of global alignment scores is more difficult to determine. Using the Needleman-Wunsch algorithm and a suitable scoring system, there are many ways to produce a global alignment between any pair of sequences, and the scores of many different alignments may be quite similar. Hence the scores obtained might be unusually high.

15. Gaps are added to the alignment because it ______
(a) increases the matching of identical amino acids at subsequent portions in the alignment
(b) increases the matching of dissimilar amino acids at subsequent portions in the alignment
(c) reduces the overall score
(d) enhances the area of the sequences
Topic: Dynamic Programming Algorithm for Sequence Alignment
Answer» The correct answer is (a) increases the matching of identical amino acids at subsequent portions in the alignment. Explanation: In the alignment process, gaps are added in a manner that increases the matching of identical or similar amino acids at subsequent portions of the alignment. Ideally, when two similar protein sequences are aligned, the alignment should have long regions of identical or related amino acid pairs and very few gaps. As the sequences become more distant, more mismatched amino acid pairs and gaps should appear.

16. Which of the following is true for EMBOSS Dottup?
(a) Allows you to specify threshold
(b) Doesn’t allow you to specify threshold
(c) Doesn’t allow you to specify window size
(d) If all cells in the window are identical, it colors in some specific cells in the window
Topic: Dot Matrix Sequence Comparison
Answer» The correct answer is (b) Doesn’t allow you to specify threshold. Explanation: EMBOSS Dottup doesn’t allow you to specify a threshold but does allow you to specify the window size. Also, if all cells in the window are identical, it colors in all the cells in the window.

17. Who were the inventors of the dot-matrix method?
(a) Smith-Waterman
(b) Margaret Preston
(c) Gibbs and McIntyre
(d) Needleman-Wunsch
Topic: Dot Matrix Sequence Comparison
Answer» The correct answer is (c) Gibbs and McIntyre. Explanation: The first computer-aided sequence comparison is called “dot-matrix analysis” or simply dot plot. The first published account of this method is by Gibbs and McIntyre (1970), “The diagram, a method for comparing sequences”, Eur. J. Biochem. 16: 1-11.

18. Which of the following does not describe BLOSUM matrices?
(a) It stands for BLOcks SUbstitution Matrix
(b) It was developed by Henikoff and Henikoff
(c) The year it was developed was 1992
(d) These matrices are logarithmic identity values
Topic: Local Sequence Alignment
Answer» The correct answer is (d) These matrices are logarithmic identity values. Explanation: BLOSUM entries are log-odds substitution scores, not logarithms of identity values. The number in the name is a percent-identity threshold: BLOSUM62, for example, is derived from blocks in which sequences sharing 62% or more identity are clustered together before substitutions are counted.

19. A length and distance that gives the highest overall probability may then be determined. Such alignments are initially found using ________
(a) a particular scoring matrix only
(b) an alignment algorithm only
(c) an alignment algorithm and a particular scoring matrix
(d) dot method
Topic: Bayesian Statistics
Answer» The correct answer is (c) an alignment algorithm and a particular scoring matrix. Explanation: Analysis of the yeast and C. elegans genomes for such repeats has underscored the importance of using a range of DNA scoring matrices, such as PAM1 to PAM120, if most repeats are to be found. Applying the Bayesian analysis allows the probability distributions to be determined as a function of both the length of the repeated region and the evolutionary distance.

20. Suppose the alignment parameters are varied so that gaps are penalized heavily. Then the best-scoring local alignment between the sequences will be one that optimizes the score between matches and mismatches, without any gaps.
(a) True
(b) False
Topic: Use of Scoring Matrices and Gap Penalties in Sequence Alignments
Answer» The correct answer is (a) True. Explanation: If both mismatches and gaps are heavily penalized, the resulting alignment will also be a local alignment, one that contains the longest region of exact matches. In both cases (heavy gap penalties alone, or heavy gap and mismatch penalties), the score of the highest-scoring local alignment increases as the logarithm of the length of the sequences. Under these same conditions, the score of the corresponding global alignment between the sequences will be negative.

21. Which of the following is not a description of the dynamic programming algorithm?
(a) A method of sequence alignment
(b) A method that can take gaps into account
(c) A method that requires a manageable number of comparisons
(d) This method doesn’t provide an optimal (highest scoring) alignment
Topic: Dynamic Programming Algorithm for Sequence Alignment
Answer» The correct answer is (d) This method doesn’t provide an optimal (highest scoring) alignment. Explanation: Sequence alignment by dynamic programming provides an optimal (highest scoring) alignment as its output. The quality of the alignment between two sequences is calculated using a scoring system that favors the matching of related or identical amino acids and penalizes poorly matched amino acids and gaps.

22. Use of the dynamic programming method requires a scoring system for the comparison of symbol pairs, and a scheme for gap penalties.
(a) True
(b) False
Topic: Dynamic Programming Algorithm for Sequence Alignment
Answer» The correct answer is (a) True. Explanation: The dynamic programming method requires a scoring system for the comparison of symbol pairs (nucleotides for DNA sequences and amino acids for protein sequences) and a scheme for insertion/deletion (gap) penalties. Once those parameters have been set, the resulting alignment for two sequences should always be the same.

23. Software for dot plot analysis performs several tasks. Which of the following is not one of them?
(a) Gap open penalty
(b) Gap extend penalty
(c) Expectation threshold
(d) Change or mutate residues
Topic: Dot Matrix Sequence Comparison
Answer» The correct answer is (d) Change or mutate residues. Explanation: The gap penalties mentioned above are used to determine the score of the aligned sequences. Residues are rarely changed, because other software exists for that purpose and the main objective here is to find the score of the alignment.

24. Which of the following is untrue regarding BLAST and FASTA?
(a) FASTA is faster than BLAST
(b) FASTA is the most accurate
(c) BLAST has limited choices of databases
(d) FASTA is more sensitive for DNA-DNA comparisons
Topic: Local Sequence Alignment
Answer» The correct answer is (a) FASTA is faster than BLAST. Explanation: BLAST is faster than FASTA and most other tools. The speed and relatively good accuracy of BLAST are the main reasons it is the most popular bioinformatics search tool.

25. In a dot plot, repeating elements appear as small crosses.
(a) True
(b) False
Topic: Dot Matrix Sequence Comparison
Answer» The correct answer is (b) False. Explanation: Repeating elements appear as parallel lines in a repetitive pattern. The more exact the repetition, the cleaner the parallel lines. Intersections, in turn, indicate palindromic sequences.

26. By whom and when were the Bayesian methods applied first?
(a) Smith-Waterman, 1981
(b) Agarwal and States, 1996
(c) Smith-Waterman, 1996
(d) Agarwal and States, 1981
Topic: Bayesian Statistics
Answer» The correct answer is (b) Agarwal and States, 1996. Explanation: Agarwal and States, in 1996, applied Bayesian methods to provide the best estimate of the evolutionary distance between two DNA sequences, for example sequences of the same length that have a certain level of mismatches.

27. Analysis of the alignment scores of random sequences reveals that the scores follow a distribution different from the normal distribution, called the _________
(a) Gumbel equal value distribution
(b) Gumbel extreme value distribution
(c) Gumbel end value distribution
(d) Gumbel distribution
Topic: Assessing the Significance of Sequence Alignments
Answer» The correct answer is (b) Gumbel extreme value distribution. Explanation: Originally, the significance of sequence alignment scores was evaluated on the assumption that alignment scores followed a normal statistical distribution. If sequences are randomly generated in a computer by a Monte Carlo or sequence shuffling method, as in generating a sequence by picking marbles representing four bases or 20 amino acids out of a bag, the distribution may look normal at first glance. Further analysis, however, shows that the scores follow the Gumbel extreme value distribution.
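For local alignment scores, the extreme value distribution is usually applied through the Karlin-Altschul formula, where the expected number of chance alignments scoring at least S is E = K·m·n·e^(−λS) and the P-value is 1 − e^(−E). The sketch below assumes that formula; `lam` and `k` are parameters that must be estimated for the scoring system in use, and the function names are illustrative.

```python
import math


def expected_hits(score, m, n, lam, k):
    """Expected number of chance alignments with at least this score.

    m and n are the lengths of the two sequences (or query and database).
    """
    return k * m * n * math.exp(-lam * score)


def p_value(score, m, n, lam, k):
    """Probability of at least one chance alignment with this score or better."""
    # -expm1(-E) equals 1 - exp(-E) but stays accurate when E is tiny.
    return -math.expm1(-expected_hits(score, m, n, lam, k))
```

Because the tail of this distribution decays more slowly than a normal tail, assuming normality would overstate the significance of high scores.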

28. _______ is an interactive program for generating sequence logos.
(a) EMBOSS
(b) WebLogo
(c) LOGOLY
(d) BLAST
Topic: Motif Discovery in Unaligned Sequences
Answer» The correct answer is (b) WebLogo. Explanation: In WebLogo, a user enters the sequence alignment in FASTA format so that the program can compute the logos. A graphic file is returned to the user as a result.

29. The overall height of a logo position reflects how conserved the position is, and the _____ of each letter in a position reflects the _______ of the residue in the alignment.
(a) height, relative frequency
(b) width, relative frequency
(c) height, amplitude
(d) width, amplitude
Topic: Motif Discovery in Unaligned Sequences
Answer» The correct answer is (a) height, relative frequency. Explanation: The overall height expresses how conserved the position is, and the height of each letter shows the relative frequency of that particular residue. Amplitude is an irrelevant option here.
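One common way to compute these heights is from the information content of each alignment column: the total height is log2(alphabet size) minus the column's Shannon entropy, and each letter receives a share proportional to its frequency. The sketch below follows that scheme and omits the small-sample correction that logo programs typically apply; the function name is illustrative.

```python
import math
from collections import Counter


def logo_heights(column, alphabet_size=4):
    """Return (total_bits, {residue: letter_height}) for one alignment column.

    alphabet_size is 4 for DNA and would be 20 for proteins.
    """
    counts = Counter(column)
    total = len(column)
    freqs = {res: n / total for res, n in counts.items()}
    entropy = -sum(f * math.log2(f) for f in freqs.values())
    info = math.log2(alphabet_size) - entropy
    return info, {res: f * info for res, f in freqs.items()}


if __name__ == "__main__":
    print(logo_heights("AAAA"))  # fully conserved DNA column: 2 bits, all for A
    print(logo_heights("AACC"))  # two residues at equal frequency: 1 bit, split evenly
```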

30. Which of the following statements about ProtoNet is incorrect regarding its features?
(a) Protein relatedness is defined by the P-values from the BLAST alignments
(b) The most closely related sequences are grouped into the lowest level clusters
(c) More distant protein groups are merged into higher levels of clusters
(d) The outcome of this cluster merging is a tree-like structure of functional categories
Topic: Protein Family Databases
Answer» The correct answer is (a) Protein relatedness is defined by the P-values from the BLAST alignments. Explanation: ProtoNet is a database of clusters of homologous proteins similar to COG. Protein relatedness is defined by the E-values from the BLAST alignments. The database further provides gene ontology information for the protein clusters at each level, as well as keywords from InterPro domains for functional prediction.

31. Which of the following features of Bayesian methods is a disadvantage?
(a) A length and distance that gives the highest overall probability may be determined
(b) They are used to calculate evolutionary distance
(c) Computationally Bayesian methods are better
(d) A specific mutational model is required
Topic: Bayesian Statistics
Answer» The correct answer is (d) A specific mutational model is required. Explanation: One disadvantage of the Bayesian approach is that a specific mutational model is required, whereas other methods, such as the maximum likelihood approach, can be used to estimate the best mutational model as well as the distance. Computationally, however, the Bayesian method is much more practical.

32. In the GCG and FASTA program suites, the scoring matrix itself is formatted in a way that includes default ______
(a) gap additions
(b) alignment scores
(c) score penalties
(d) gap penalties
Topic: Use of Scoring Matrices and Gap Penalties in Sequence Alignments
Answer» The correct answer is (d) gap penalties. Explanation: These program suites include default gap penalties. When deciding gap penalties for local alignment programs, one consideration is that the penalties should be large enough to provide a local alignment of the sequences.

33. A gap opening penalty for any gap (g) and a gap extension penalty for each element in the gap (r) are most often used, to give a total gap score wx, according to the equation ______
(a) wx – rx = -g
(b) wx = g – rx
(c) wx = g + rx
(d) wx + g + rx = 0
Topic: Use of Scoring Matrices and Gap Penalties in Sequence Alignments
Answer» The correct answer is (c) wx = g + rx. Explanation: In wx = g + rx, x is the length of the gap. In some formulations of the gap penalty, the equation wx = g + r(x – 1) is used instead, so that the gap extension penalty is not added to the gap opening penalty until the gap size is 2.
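Both formulations of the gap penalty can be written as one small function. The name `gap_penalty` and the `charge_first` flag are hypothetical, introduced only to contrast the two equations.

```python
def gap_penalty(x, g, r, charge_first=True):
    """Total penalty for a gap of length x.

    g is the gap opening penalty and r the per-element extension penalty.
    charge_first=True  gives w_x = g + r * x
    charge_first=False gives w_x = g + r * (x - 1)
    """
    if x <= 0:
        return 0
    return g + r * x if charge_first else g + r * (x - 1)


if __name__ == "__main__":
    # With g = 10 and r = 1, a gap of length 3 costs 13 or 12 depending
    # on whether the first gap element is also charged an extension.
    print(gap_penalty(3, 10, 1), gap_penalty(3, 10, 1, charge_first=False))
```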

34. Dayhoff PAM matrices are based on an evolutionary model of protein change, whereas BLOSUM matrices are designed to identify members of the same family.
(a) True
(b) False
Topic: Dynamic Programming Algorithm for Sequence Alignment
Answer» The correct answer is (a) True. Explanation: There is a very large number of amino acid scoring matrices in use, some much more popular than others, and these scoring matrices are designed for different purposes. Some, such as the Dayhoff PAM matrices, are based on an evolutionary model of protein change, whereas others, such as the BLOSUM matrices, are designed to identify members of the same family. Alignments between DNA sequences require similar kinds of considerations.

35. Which of the following is wrong in case of substitution matrices?
(a) They determine the likelihood of homology between two sequences
(b) They use a system where substitutions that are more likely should get a higher score
(c) They use a system where substitutions that are less likely should get a lower score
(d) BLOSUM-X type uses logarithmic identity to find similarity
Topic: Global Sequence Alignment
Answer» The correct answer is (d) BLOSUM-X type uses logarithmic identity to find similarity. Explanation: The X in BLOSUM-X is a percent-identity threshold, not a logarithmic identity. The matrix is derived from blocks in which sequences sharing X% or more identity are clustered together, so BLOSUM62, for example, corresponds to a 62% clustering threshold. These matrices are widely used in bioinformatics.

36. Which of the following statements about the SUPERFAMILY database is incorrect regarding its features?
(a) Sequences can be submitted raw or in FASTA format
(b) Sequences must be submitted in FASTA format only
(c) It searches the database using a superfamily, family, or species name plus a sequence, SCOP, PDB or HMM IDs
(d) It has generated GO annotations for evolutionarily closed domains and distant domains
Topic: Protein Family Databases
Answer» The correct answer is (b) Sequences must be submitted in FASTA format only. Explanation: SUPERFAMILY is a database of structural and functional annotation for all proteins and genomes. It classifies amino acid sequences into known structural domains, especially into SCOP superfamilies. Sequences can be amino acids, a fixed-frame nucleotide sequence, or all frames of a submitted nucleotide sequence. Up to 1000 sequences can be run at a time.

37. Point out the wrong or irrelevant mathematical method in motif analysis.
(a) Enumeration
(b) Probabilistic Optimization
(c) Deterministic Optimization
(d) Literature mining
Topic: Motif and Domain Databases Using Statistical Models
Answer» The correct answer is (d) Literature mining. Explanation: All the other options are valid and proven mathematical methods with efficient algorithms for finding motifs in protein sequences. Literature mining is not a mathematical algorithm or tool for identifying motifs, although it is certainly part of the research involved in finding the function of protein sequences.

38. Which of the following is false in case of the database Pfam and its algorithm?
(a) Each motif or domain is represented by an HMM profile generated from the seed alignment of a number of conserved homologous proteins
(b) Since the probability scoring mechanism is more complex in HMM than in a profile-based approach, the use of HMM yields further increases in sensitivity of the database matches
(c) Pfam-B only contains sequence families not covered in Pfam
(d) The functional annotation of motifs in Pfam-A is often related to that in UNIPROT
Topic: Motif and Domain Databases Using Statistical Models
Answer» The correct answer is (d) The functional annotation of motifs in Pfam-A is often related to that in UNIPROT. Explanation: Pfam is a database of protein domain alignments derived from sequences in SWISS-PROT and TrEMBL. The Pfam database is composed of two parts, Pfam-A and Pfam-B. Pfam-A involves manual alignments, and Pfam-B involves automatic alignment in a way similar to ProDom. The functional annotation of motifs in Pfam-A is often related to that in PROSITE. Because it is generated automatically, Pfam-B has much larger coverage but is also more error prone, because some HMMs are generated from unrelated sequences.

Discussion

39.	When did Smith–Waterman first describe the algorithm for local alignment? (a) 1950 (b) 1970 (c) 1981 (d) 1925
This question was addressed to me by my school principal while I was bunking the class. The query is from Local Sequence Alignment topic in division Sequence Alignment of Bioinformatics.
Answer» Correct option is (c) 1981. Explanation: The algorithm was first proposed by Temple F. Smith and Michael S. Waterman in 1981. The Smith–Waterman algorithm performs local sequence alignment; that is, it determines similar regions between two nucleic acid sequences or two protein sequences.
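To make the idea of local alignment concrete, here is a minimal sketch of the Smith–Waterman scoring recurrence. The match, mismatch, and linear gap values are illustrative assumptions, not values given in the answer above, and the function returns only the best score rather than the aligned region.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between strings a and b.

    The scoring values are illustrative assumptions (linear gap penalty).
    """
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] is the best score of a local alignment ending at a[i-1], b[j-1].
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = H[i - 1][j] + gap
            left = H[i][j - 1] + gap
            # The floor at 0 lets an alignment restart anywhere; this is what
            # makes the alignment local rather than global.
            H[i][j] = max(0, diag, up, left)
            best = max(best, H[i][j])
    return best


if __name__ == "__main__":
    # The shared region "ACG" is found even though the flanks differ.
    print(smith_waterman_score("TTACGTT", "GGACGGG"))
```

Because cell values never drop below zero, dissimilar flanking regions do not penalise a well-matched internal region.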

Discussion

40.	Which of the following does not describe the global alignment algorithm? (a) It attempts to align every residue in every sequence (b) It is most useful when the aligning sequences are similar and of roughly the same size (c) It is useful when the aligning sequences are dissimilar (d) It can use the Needleman-Wunsch algorithm
The question was asked in an internship interview. This interesting question is from Global Sequence Alignment in chapter Sequence Alignment of Bioinformatics.
Answer» Correct answer is (c) It is useful when the aligning sequences are dissimilar. The best I can explain: Global alignment is most useful when the sequences being aligned are similar and of roughly the same size. It is therefore best suited to finding similarities among organisms that are closely related in evolutionary time.
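As a rough illustration of why global alignment suits similar sequences of similar length, the sketch below computes a Needleman-Wunsch score. The scoring values (+1 match, -1 mismatch, -1 per gap position) are assumptions chosen for the example, not values from the text.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-1):
    """Return the optimal global alignment score of strings a and b.

    Scoring values are illustrative assumptions (linear gap penalty).
    """
    rows, cols = len(a) + 1, len(b) + 1
    F = [[0] * cols for _ in range(rows)]
    # Every residue must be aligned, so leading gaps are charged in full.
    for i in range(1, rows):
        F[i][0] = i * gap
    for j in range(1, cols):
        F[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            diag = F[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            F[i][j] = max(diag, F[i - 1][j] + gap, F[i][j - 1] + gap)
    # The score is read from the bottom-right cell: the end of both sequences.
    return F[-1][-1]


if __name__ == "__main__":
    print(needleman_wunsch_score("GATTACA", "GATTACA"))  # similar sequences
    print(needleman_wunsch_score("AAAA", "TTTT"))        # dissimilar sequences
```

Unlike the local method, there is no floor at zero, so every mismatch or gap along the full length lowers the score; dissimilar sequences therefore score poorly.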

Discussion

41.	Which of the following is untrue in case of the database BLOCKS? (a) The alignments are automatically generated using the same data sets used for deriving the BLOSUM matrices (b) The derived ungapped alignments, called ‘blocks’, are usually longer than motifs and are subsequently converted to PSSMs (c) A weighting scheme and pseudo counts are subsequently applied to the PSSMs to account for underrepresented and unobserved residues in alignments (d) The functional annotation of blocks is not consistent with that for the motifs
The question was posed to me in a job interview. The query is from Motif and Domain Databases Using Statistical Models in chapter Sequence Alignment of Bioinformatics.
Answer» Right choice is (d) The functional annotation of blocks is not consistent with that for the motifs. To elaborate: BLOCKS is a database that uses multiple alignments derived from the most conserved, ungapped regions of homologous protein sequences. Because blocks often encompass motifs, the functional annotation of blocks is consistent with that for the motifs. A query sequence can be aligned with the precomputed profiles in the database to select the highest-scoring matches. Because of the weighting scheme, the signal-to-noise ratio is improved relative to PRINTS.
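The role of pseudo counts in a PSSM can be sketched as follows. This is a simplified illustration, not the BLOCKS procedure itself: it assumes a DNA alphabet, a uniform background frequency, and a fixed pseudocount of 1 per residue, and it omits the sequence weighting step. The function name `build_pssm` is hypothetical.

```python
import math
from collections import Counter


def build_pssm(block, alphabet="ACGT", pseudocount=1.0):
    """Build a log-odds PSSM from equal-length, ungapped sequences.

    Assumptions for illustration: uniform background frequencies and a
    constant pseudocount added to every residue count.
    """
    n_seqs = len(block)
    background = 1.0 / len(alphabet)
    denom = n_seqs + pseudocount * len(alphabet)
    pssm = []
    for column in zip(*block):  # iterate over alignment columns
        counts = Counter(column)
        scores = {}
        for residue in alphabet:
            # The pseudocount keeps unobserved residues from getting
            # frequency 0, which would give a log-odds of minus infinity.
            freq = (counts[residue] + pseudocount) / denom
            scores[residue] = math.log2(freq / background)
        pssm.append(scores)
    return pssm


if __name__ == "__main__":
    for position, scores in enumerate(build_pssm(["ACG", "ACG", "ACT"])):
        print(position, {r: round(s, 2) for r, s in scores.items()})
```

Residues never seen in a column still receive a finite (negative) score, so a query containing them is penalised rather than ruled out entirely.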

Discussion

42.	Which of the following statements about COG is incorrect regarding its features? (a) Currently, there are 4,873 clusters in the COG databases derived from unicellular organisms (b) It is constructed by comparing protein sequences encoded in forty-three completely sequenced genomes, which are mainly from prokaryotes, representing thirty major phylogenetic lineages (c) The interface for sequence searching in the COG database is the COGnitor program, which is based on gapped BLAST (d) It is a protein family database based on structural classification
The question was asked in a unit test. The question is from Protein Family Databases in division Sequence Alignment of Bioinformatics.
Answer» The correct option is (d) It is a protein family database based on structural classification. The explanation is: COG, which stands for Clusters of Orthologous Groups, is a protein family database based on phylogenetic classification. Because orthologous proteins shared by three or more lineages are considered to have descended through a vertical evolutionary scenario, if the function of one member is known, the function of the other members can be assigned.

Discussion

43.	Which of the following is not a member database of InterPro? (a) SCOP (b) HAMAP (c) PANTHER (d) Pfam
The question was asked in an online quiz. This is a very interesting question from Protein Family Databases topic in chapter Sequence Alignment of Bioinformatics.
Answer» Correct option is (a) SCOP. To elaborate: The signatures in InterPro come from 11 member databases, viz. CATH-Gene3D, HAMAP, PANTHER, Pfam, PIRSF, PRINTS, ProDom, PROSITE, SMART, SUPERFAMILY, and TIGRFAMs.

Discussion

44.	Which of the following is true regarding the assumptions in the method of constructing the Dayhoff scoring matrix? (a) it is assumed that each amino acid position is equally mutable (b) it is assumed that each amino acid position is not equally mutable (c) it is assumed that each amino acid position is not mutable at all (d) sites do not vary in their degree of mutability
The question was asked in a quiz. Question is from Use of Scoring Matrices and Gap Penalties in Sequence Alignments topic in chapter Sequence Alignment of Bioinformatics.
Answer» Correct choice is (a) it is assumed that each amino acid position is equally mutable. Explanation: The method first assumes that each amino acid position is equally mutable, whereas, in fact, sites vary considerably in their degree of mutability. Mutagenesis hot spots are well known in molecular genetics, as are variations in the mutability of different amino acid sites in proteins.

Discussion

45.	Pfam is available at four locations around the world. Which of the following is not one of them? (a) UK (b) Sweden (c) US (d) Japan
This question was posed to me in an online interview. My question comes from Protein Family Databases topic in division Sequence Alignment of Bioinformatics.
Answer» Correct option is (d) Japan. Explanation: Pfam is available at four locations around the world, each providing a core set of functionality for accessing each family. They are the US, the UK, Sweden, and France. Documentation on the content and use of Pfam is available via the web.

Discussion

46.	The matrices PAM250 and BLOSUM62 contain _______ (a) positive and negative values (b) positive values only (c) negative values only (d) neither positive nor negative values, just the percentage
I had been asked this question during an interview for a job. My doubt stems from Dynamic Programming Algorithm for Sequence Alignment topic in section Sequence Alignment of Bioinformatics.
Answer» Right choice is (a) positive and negative values. For explanation I would say: These matrices contain positive and negative values, reflecting the likelihood of each amino acid substitution in related proteins. Using these tables, an alignment of a sequential set of amino acid pairs with no gaps receives an overall score that is the sum of the positive and negative log odds scores for each individual amino acid pair in the alignment.
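The summation described above can be sketched in a few lines. The scores in `TOY_LOG_ODDS` are hypothetical values made up for the example; they are not taken from PAM250 or BLOSUM62, and the function name is likewise an assumption.

```python
# Hypothetical log odds scores for a three-letter alphabet (illustration only).
TOY_LOG_ODDS = {
    ("A", "A"): 4, ("W", "W"): 11, ("L", "L"): 4,
    ("A", "W"): -3, ("A", "L"): -1, ("L", "W"): -2,
}


def ungapped_alignment_score(seq1, seq2, matrix=TOY_LOG_ODDS):
    """Sum the log odds score of each aligned residue pair (no gaps)."""
    if len(seq1) != len(seq2):
        raise ValueError("an ungapped alignment needs equal-length sequences")
    total = 0
    for x, y in zip(seq1, seq2):
        # The matrix is symmetric, so look the pair up in sorted order.
        total += matrix[tuple(sorted((x, y)))]
    return total


if __name__ == "__main__":
    # Pairs: A/A (+4), W/A (-3), L/L (+4)
    print(ungapped_alignment_score("AWL", "AAL"))
```

Positive entries reward substitutions seen more often than chance in related proteins, while negative entries penalise unlikely ones, so the total can rise or fall as pairs are added.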

Discussion

47.	After the derivation, the ratios that are output by the dynamic programming are called even scores. (a) True (b) False
This question was posed to me during a job interview. I want to ask this question from Dynamic Programming Algorithm for Sequence Alignment topic in chapter Sequence Alignment of Bioinformatics.
Answer» The correct choice is (b) False. The explanation: After the derivation, the ratios that are output are called odds scores. The ratios are transformed to logarithms of odds scores, called log odds scores, so that the scores of sequential pairs can be added to reflect the overall odds of a real alignment versus a chance alignment. This is done in the Dayhoff PAM250 and BLOSUM62 matrices.
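A small numeric sketch shows why the logarithm is convenient. The symbols are assumptions introduced for the example: `q_ij` is the frequency with which residues i and j are observed aligned in related sequences, and `p_i`, `p_j` are their background frequencies; base 2 is one common choice of logarithm.

```python
import math


def odds(q_ij, p_i, p_j):
    """Odds of a real (related) pairing versus a chance pairing."""
    return q_ij / (p_i * p_j)


def log_odds(q_ij, p_i, p_j, base=2.0):
    """Log odds score; adding these corresponds to multiplying odds."""
    return math.log(odds(q_ij, p_i, p_j), base)


if __name__ == "__main__":
    # Two aligned pairs with made-up frequencies.
    first = (0.04, 0.1, 0.1)     # odds of about 4, log odds of about 2
    second = (0.005, 0.1, 0.1)   # odds of about 0.5, log odds of about -1
    product_of_odds = odds(*first) * odds(*second)
    sum_of_log_odds = log_odds(*first) + log_odds(*second)
    print(product_of_odds, sum_of_log_odds)
```

Pairs that occur more often than chance get positive log odds and pairs that occur less often get negative log odds, which is why the resulting matrices contain both signs.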

Discussion

48.	When was this method (dot matrix sequence comparison) first described? (a) 1959 (b) 1966 (c) 1970 (d) 1982
The question was asked in an exam. I'd like to ask this question from Dot Matrix Sequence Comparison topic in chapter Sequence Alignment of Bioinformatics.
Answer» Correct answer is (c) 1970. The best I can explain: This method was first described in 1970. Briefly, it involves constructing a matrix with one of the sequences to be compared running horizontally across the bottom, and the other running vertically along the left-hand side.
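The matrix construction just described can be sketched as below. This is a minimal version that marks every identical residue pair; it applies no window or stringency filter, and the function names are hypothetical. Rows are printed top to bottom in the order of the vertical sequence.

```python
def dot_matrix(seq_x, seq_y):
    """Return a grid with 1 wherever a residue of seq_x equals one of seq_y.

    seq_x runs along the horizontal axis and seq_y along the vertical axis.
    """
    return [[1 if x == y else 0 for x in seq_x] for y in seq_y]


def render(matrix):
    """Draw the grid as text, using '*' for a dot and '.' for an empty cell."""
    return "\n".join("".join("*" if cell else "." for cell in row) for row in matrix)


if __name__ == "__main__":
    print(render(dot_matrix("GATTACA", "GATTACA")))
```

Comparing a sequence with itself produces an unbroken main diagonal, with additional off-diagonal dots wherever the same residue happens to recur.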

Discussion

49.	Conserved positions have _____ residues and bigger symbols. (a) fewer (b) more (c) maximum (d) minimum
The question was posed to me in a job interview. Query is from Motif Discovery in Unaligned Sequences in portion Sequence Alignment of Bioinformatics.
Answer» Right option is (a) fewer. The explanation: The options "maximum" and "minimum" do not fit the context of an alignment. Conserved positions have fewer residues and bigger symbols, whereas less conserved positions have a more heterogeneous mixture of smaller symbols stacked together. In general, a sequence logo provides a clearer description of a consensus sequence.
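One common way to quantify why conserved positions get bigger symbols is the information content of each alignment column, measured in bits. The sketch below assumes a DNA alphabet of four letters and leaves out the small-sample correction that logo tools typically apply; the function name is hypothetical.

```python
import math
from collections import Counter


def column_information(column, alphabet_size=4):
    """Information content (bits) of one alignment column.

    Computed as log2(alphabet_size) minus the Shannon entropy of the
    residue frequencies observed in the column.
    """
    total = len(column)
    counts = Counter(column)
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return math.log2(alphabet_size) - entropy


if __name__ == "__main__":
    for col in ("AAAA", "AACC", "ACGT"):
        print(col, round(column_information(col), 2))
```

A column with a single residue type carries the most information and is drawn tallest, while a column with an even mix of residues carries almost none and appears as a short stack of small symbols.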

Discussion

50.	MEME stands for _____________ (a) Multiple Expectation Maximization for Motif Elicitation (b) Multiple Expectation Maximization for Motif Extraction (c) Mega Expectation Maximization for Motif Elicitation (d) Micro Expectation Maximization for Motif Extraction
I had been asked this question during a job interview. Question is taken from Motif Discovery in Unaligned Sequences topic in portion Sequence Alignment of Bioinformatics.
Answer» The correct option is (a) Multiple Expectation Maximization for Motif Elicitation. Easy explanation: Multiple Expectation Maximization for Motif Elicitation is a web-based program that uses the EM algorithm to find motifs in either DNA or protein sequences. It uses a modified EM algorithm to avoid the local optimum problem.

Discussion
