79+ Interview Questions in Multiple Sequence Alignment in Bioinformatics (InterviewSolution)

1.	GDE (Genetic Data Environment) provides a general interface on UNIX machines for sequence analysis, sequence alignment editing, and display.
(a) True
(b) False
Topic: Position
Answer: (a) True
Explanation: GDE is available from several anonymous FTP sites. The interface requires communication with a host UNIX machine running the Genetics Computer Group software. It can be used from MS-DOS or Macintosh computers if they are equipped with the appropriate X Window client software.


2.	In the intermediate steps of the EM algorithm, the number of each base in each column is determined and then converted to fractions.
(a) True
(b) False
Topic: Statistical Methods for Aiding Alignment
Answer: (a) True
Explanation: For example, if there are four Gs in the first column of the 10 sequences, then the frequency of G in the first column of the site is f_SG = 4/10 = 0.4. This procedure is repeated for each base and each column.
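The counting step described in the explanation can be sketched in a few lines of Python. This is only an illustration: the function name `column_frequencies` and the ten 3-base sites are made up for the example, chosen so that four of the ten sequences have G in the first column.

```python
from collections import Counter

def column_frequencies(sites):
    """Return one dict per column mapping each base to its fraction of sequences."""
    n = len(sites)
    freqs = []
    for column in zip(*sites):              # walk the aligned columns
        counts = Counter(column)            # count each base in this column
        freqs.append({base: counts.get(base, 0) / n for base in "ACGT"})
    return freqs

# Hypothetical data: ten aligned sites, four of which start with G.
sites = ["GAT", "GCT", "GAT", "GTT", "AAT", "CAT", "TAT", "AAC", "CAT", "TAT"]
freqs = column_frequencies(sites)
print(freqs[0]["G"])   # 0.4, i.e. 4/10
```

The same loop handles every base and every column, which is the repetition the explanation refers to.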


3.	A dotplot is a visual and qualitative technique, whereas sequence alignment is an exact and quantitative measure of similarity of alignments.
(a) True
(b) False
Topic: Needleman
Answer: (a) True
Explanation: Sequence alignment is an exact and quantitative measure of similarity. It involves constructing the best alignment between the sequences and assessing the similarity from that alignment.
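To make the "visual and qualitative" side concrete, here is a minimal dotplot sketch. The function name `dotplot` and the two short sequences are illustrative assumptions; real dotplot tools usually add a sliding window and a threshold to reduce noise.

```python
def dotplot(seq_a, seq_b):
    """Return rows of a dot matrix: '*' where residues are identical, '.' elsewhere."""
    return ["".join("*" if a == b else "." for b in seq_b) for a in seq_a]

# Hypothetical sequences; similar regions show up as diagonal runs of '*'.
for row in dotplot("GATTACA", "GATCACA"):
    print(row)
```

The plot only suggests where similarity lies; it does not produce a score, which is why an alignment is still needed for a quantitative answer.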


4.	Of the two repeated steps in the EM algorithm, step 2 is ________
(a) the maximization step
(b) the minimization step
(c) the optimization step
(d) the normalization step
Topic: Statistical Methods for Aiding Alignment
Answer: (a) the maximization step
Explanation: In step 2, the maximization step, the new counts of bases or amino acids for each position in the site found in step 1 are substituted for the previous set. Step 1 is then repeated using these new counts. The cycle is repeated until the algorithm converges on a solution that does not change with further cycles. At that point, the best location of the site in each sequence and the best estimate of the residue composition of each column in the site are available.


5.	In a multidimensional search matrix, aligning N sequences requires an (N+2)-dimensional matrix to be filled with alignment scores.
(a) True
(b) False
Topic: Exhaustive Algorithms
Answer: (b) False
Explanation: In a multidimensional search matrix, aligning N sequences requires an N-dimensional matrix to be filled with alignment scores. For instance, three sequences require a three-dimensional matrix to account for all possible alignment scores. Back-tracking is then applied through the three-dimensional matrix to find the highest-scoring path, which represents the optimal alignment.
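The three-sequence case can be sketched directly as a three-dimensional dynamic programming table. This is a minimal sketch under stated assumptions: the function name `three_way_score` is made up, columns are scored by sum of pairs with +1 for a match, -1 for a mismatch, and -1 for a residue paired with a gap, and only the optimal score is returned (no back-tracking).

```python
from itertools import product

def three_way_score(a, b, c, match=1, mismatch=-1, gap=-1):
    """Optimal sum-of-pairs score for aligning three sequences (score only)."""
    def pair(x, y):
        if x == "-" and y == "-":
            return 0                      # assumption: gap-gap pairs cost nothing
        if x == "-" or y == "-":
            return gap
        return match if x == y else mismatch

    la, lb, lc = len(a), len(b), len(c)
    neg_inf = float("-inf")
    # One axis per sequence: an N-dimensional table for N = 3.
    table = [[[neg_inf] * (lc + 1) for _ in range(lb + 1)] for _ in range(la + 1)]
    table[0][0][0] = 0
    # Seven possible moves: each sequence either consumes a residue (1) or takes a gap (0).
    moves = [m for m in product((0, 1), repeat=3) if any(m)]

    for i, j, k in product(range(la + 1), range(lb + 1), range(lc + 1)):
        if i == j == k == 0:
            continue
        best = neg_inf
        for di, dj, dk in moves:
            pi, pj, pk = i - di, j - dj, k - dk
            if pi < 0 or pj < 0 or pk < 0:
                continue
            x = a[pi] if di else "-"
            y = b[pj] if dj else "-"
            z = c[pk] if dk else "-"
            column = pair(x, y) + pair(x, z) + pair(y, z)
            best = max(best, table[pi][pj][pk] + column)
        table[i][j][k] = best
    return table[la][lb][lc]

print(three_way_score("AC", "AC", "AC"))
```

The table has (len(a)+1) x (len(b)+1) x (len(c)+1) cells, so memory and time grow multiplicatively with each added sequence, which is why exhaustive methods are limited to a few short sequences.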


6.	An alternative method is to produce an odds scoring matrix calculated by dividing each base frequency by the background frequency of that base.
(a) True
(b) False
Topic: Statistical Methods for Aiding Alignment
Answer: (a) True
Explanation: In this method, the probability of each location is found by multiplying the odds scores from each column. An even simpler method is to use log odds scores in the matrix, in which case the column scores are simply added. The summed log odds scores must then be converted back to odds scores before position probabilities are calculated.
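The relationship between the two scoring schemes can be checked with a short sketch. The function names, the two-column frequency table, and the uniform background are all assumptions made for illustration; base-2 logarithms are used, so converting back means raising 2 to the summed score.

```python
import math

def odds_matrix(freqs, background):
    """Divide each column frequency by the background frequency of that base."""
    return [{base: col[base] / background[base] for base in col} for col in freqs]

def site_odds(site, odds):
    """Multiply the odds scores of the bases observed in each column."""
    score = 1.0
    for base, col in zip(site, odds):
        score *= col[base]
    return score

def site_log_odds(site, odds):
    """Add the log2 odds scores instead of multiplying the odds."""
    return sum(math.log2(col[base]) for base, col in zip(site, odds))

# Hypothetical two-column site model and uniform background (no zero frequencies).
freqs = [
    {"A": 0.5, "C": 0.25, "G": 0.125, "T": 0.125},
    {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
]
background = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
odds = odds_matrix(freqs, background)

print(site_odds("AC", odds))            # 2.0
print(2 ** site_log_odds("AC", odds))   # 2.0 after converting back from log odds
```

A frequency of zero would make the logarithm undefined, which is one practical reason pseudocounts are added before a log odds matrix is built.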


7.	The resulting tree is then used to guide the alignment of the most closely related sequences and groups of sequences. The resulting alignment is a global alignment produced by the Needleman-Wunsch algorithm.
(a) True
(b) False
Topic: Progressive Methods of Multiple Sequence Alignment
Answer: (a) True
Explanation: The very first sequences to be aligned are the most closely related on the sequence tree. If these sequences align very well, there will be few errors in the initial alignments. However, the more distantly related these sequences are, the more errors will be made, and these errors will be propagated to the MSA. There is no simple way to circumvent this problem. A second problem with the progressive alignment method is the choice of suitable scoring matrices and gap penalties that apply to the set of sequences.


8.	Which of the following is untrue regarding the block analysis method?
(a) Blocks represent a conserved region in the MSA
(b) Blocks differ from profiles in lacking insert and delete positions in the sequences
(c) Every column includes only matches and mismatches
(d) Blocks may be made by searching for a section of an MSA alignment that is poorly conserved
Topic: Localized Alignments in Sequences
Answer: (d) Blocks may be made by searching for a section of an MSA alignment that is poorly conserved
Explanation: Like profiles, blocks may be made by searching for a section of an MSA alignment that is highly conserved. However, aligned regions may also be found by searching each sequence in turn for similar patterns of the same length. These patterns may include a region with one or a few matching characters, followed by a short spacer region of unmatched characters, then another set of a few matching characters, and so on, until the sequences start to differ.


9.	The second step in the Genetic Algorithm consists of scoring the 100 initial MSAs by the sum of pairs method.
(a) True
(b) False
Topic: Iterative Methods of Multiple Sequence Alignment
Answer: (a) True
Explanation: The 100 initial MSAs are scored by the sum of pairs method, except that both natural and quasi-natural gap-scoring schemes are used. Recall that the best SSP score for an MSA is the minimum one and the one that is closest to the sum of the pair-wise sequence alignments. Standard amino acid scoring matrices and gap opening and extension penalties are used.


10.	Which of the following is untrue about the Needleman-Wunsch algorithm?
(a) It is an example of dynamic programming
(b) The basic idea is to build up the best alignment by using optimal alignments of larger subsequences
(c) It was first used by Saul Needleman and Christian Wunsch
(d) It was first used in 1970
Topic: Needleman
Answer: (b) The basic idea is to build up the best alignment by using optimal alignments of larger subsequences
Explanation: In the Needleman-Wunsch algorithm, the basic idea is to build up the best alignment by using optimal alignments of smaller subsequences. It is based on dynamic programming, a discipline invented by Richard Bellman in 1953.
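The "smaller subsequences" idea is visible in the recurrence itself: each cell of the score table depends only on three neighbouring cells that cover shorter prefixes. The following is a minimal sketch, not a full implementation; the function name and the scoring values (+1 match, -1 mismatch, -1 per gap position) are assumptions, and it returns only the optimal score without the traceback.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-1):
    """Optimal global alignment score of a and b with a linear gap penalty."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    # First row and column: a prefix aligned entirely against gaps.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    for i in range(1, rows):
        for j in range(1, cols):
            # Each cell reuses the optimal scores of three smaller subproblems.
            diagonal = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap      # residue of a aligned to a gap
            left = score[i][j - 1] + gap    # residue of b aligned to a gap
            score[i][j] = max(diagonal, up, left)
    return score[-1][-1]

print(needleman_wunsch_score("GATTACA", "GATACA"))   # 5: six matches and one gap
```

Keeping a second table of which neighbour produced each maximum would allow the alignment itself to be recovered by tracing back from the bottom-right cell.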


11.	In the EM algorithm, as an example, suppose that there are 10 DNA sequences having very little similarity with each other, each about 100 nucleotides long and thought to contain a binding site near the middle 20 residues, based on biochemical and genetic evidence. The following steps would be used by the EM algorithm to find the most probable location of the binding sites in each of the ______ sequences.
(a) 30
(b) 10
(c) 25
(d) 20
Topic: Statistical Methods for Aiding Alignment
Answer: (b) 10
Explanation: With the EM program MEME, the size and number of binding sites, the location in each sequence, and whether or not the site is present in each sequence do not necessarily have to be known. For the present example, the EM algorithm is used to find the most probable location of the binding sites in each of the 10 sequences.


12.	Although the MOTIF program is used successfully for making the BLOCKS database, it is limited in the pattern sizes that can be found.
(a) True
(b) False
Topic: Localized Alignments in Sequences
Answer»


13.	Which of the following is incorrect regarding PRRP?
(a) The program PRRP uses iterative methods to produce an alignment
(b) An initial pair-wise alignment is made to predict a tree
(c) Only one cycle is performed
(d) The whole process is repeated until there is no further increase in the alignment score
Topic: Iterative Methods of Multiple Sequence Alignment
Answer: (c) Only one cycle is performed
Explanation: An initial pair-wise alignment is made to predict a tree, and the tree is used to produce weights for making alignments in the same manner as MSA, except that the sequences are analyzed for the presence of aligned regions that include gaps rather than being globally aligned. These regions are iteratively recalculated to improve the alignment score. The best-scoring alignment is then used in a new cycle of calculations to predict a new tree, new weights, and new alignments.


14.	Profiles are found by performing the _____ MSA of a group of sequences and then removing the _______ regions in the alignment into a smaller MSA.
(a) local, more highly conserved
(b) global, poorly conserved
(c) global, more highly conserved
(d) local, poorly conserved
Topic: Localized Alignments in Sequences
Answer: (c) global, more highly conserved
Explanation: Profiles are found by performing the global MSA of a group of sequences and then removing the more highly conserved regions in the alignment into a smaller MSA. A scoring matrix for the MSA, called a profile, is then made. The profile is composed of columns much like a mini-MSA and may include matches, mismatches, insertions, and deletions.


15.	The Genetic Algorithm method has been recently adapted for MSA (Multiple Sequence Alignment) by Corpet (1998).
(a) True
(b) False
Topic: Iterative Methods of Multiple Sequence Alignment
Answer: (b) False
Explanation: The genetic algorithm is a general type of machine-learning algorithm that has no direct relationship to biology and that was invented by computer scientists. The method was adapted for MSA by Notredame and Higgins (1996) in a computer program package called SAGA (Sequence Alignment by Genetic Algorithm).


16.	Even if many pseudocounts are added in comparison to real sequence counts, the amino acid frequencies will not be affected.
(a) True
(b) False
Topic: Position
Answer: (b) False
Explanation: Knowing how many counts to add is a difficult but fortunately solvable problem. On the one hand, if too many pseudocounts are added in comparison to real sequence counts, the pseudocounts will become the dominant influence on the amino acid frequencies, and searches using the motif will not work. On the other hand, if there are relatively few real counts, many amino acid variations may not be present because of the small sample of sequences.
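The trade-off can be seen numerically with a small sketch. It assumes the common form f = (count + B * background) / (N + B), where B is the total number of pseudocounts; the function name, the four-residue toy alphabet, and the uniform background are illustrative choices, not taken from the text above.

```python
def frequencies_with_pseudocounts(counts, background, pseudo_total):
    """Column frequencies after adding pseudo_total pseudocounts split by background."""
    n = sum(counts.values())
    return {
        residue: (counts.get(residue, 0) + pseudo_total * background[residue])
        / (n + pseudo_total)
        for residue in background
    }

# Hypothetical column: ten sequences, all with isoleucine (I); toy uniform background.
counts = {"I": 10}
background = {"I": 0.25, "L": 0.25, "V": 0.25, "M": 0.25}

for total in (0, 1, 1000):
    freqs = frequencies_with_pseudocounts(counts, background, total)
    print(total, round(freqs["I"], 3), round(freqs["L"], 3))
```

With no pseudocounts the unseen residues get frequency zero; with a thousand of them the column is almost indistinguishable from the background, so the real observations no longer shape the motif.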


17.	The initial alignments used to produce the guide tree may be obtained by various methods. Which of the following is not one of them?
(a) Fast k-tuple
(b) pattern-finding approach similar
(c) FASTA
(d) Faster, full dynamic programming method
Topic: Progressive Methods of Multiple Sequence Alignment
Answer: (d) Faster, full dynamic programming method
Explanation: The methods used may be a fast k-tuple or pattern-finding approach similar to FASTA, which is useful for many sequences, or the full dynamic programming method. Option (d) is incorrect because the full dynamic programming method is slower than the other methods listed.


18.	Clustal is a progressive multiple alignment program available either as a stand-alone or on-line program.
(a) True
(b) False
Topic: Heuristic Algorithms
Answer: (a) True
Explanation: Probably the most well-known progressive alignment program is Clustal. The stand-alone program, which runs on UNIX and Macintosh, has two variants, Clustal W and Clustal X. The W version provides a simple text-based interface, and the X version provides a more user-friendly graphical interface.


19.	Given a multiple alignment of three sequences, the sum of scores is calculated as the sum of the dissimilarity scores of every pair of sequences at each position.
(a) True
(b) False
Topic: Exhaustive Algorithms
Answer: (b) False
Explanation: Given a multiple alignment of three sequences, the sum of scores is calculated as the sum of the similarity scores of every pair of sequences at each position. The scoring is based on the BLOSUM62 matrix. A total alignment score of 5 means that the alignment is 2^5 = 32 times more likely to occur among homologous sequences than by random chance.


20.	Which of the following is untrue regarding the Progressive Alignment Method?
(a) Progressive alignment depends on the stepwise assembly of multiple alignments and is heuristic in nature
(b) It speeds up the alignment of multiple sequences through a multistep process
(c) It first conducts pair-wise alignments for each possible pair of sequences using the Needleman-Wunsch global alignment method and records the similarity scores from the pair-wise comparisons
(d) Its drawback is that it slows down the alignment of multiple sequences through a single-step process
Topic: Heuristic Algorithms
Answer: (d) Its drawback is that it slows down the alignment of multiple sequences through a single-step process
Explanation: Progressive alignment speeds up the alignment of multiple sequences through a multistep process. The scores recorded from the pair-wise comparisons can be either percent identity or similarity scores based on a particular substitution matrix. Both kinds of score correlate with the evolutionary distances between sequences.


21.	Which of the following scores is not considered while calculating the SP score?
(a) All possible pair-wise matches
(b) All possible mismatches
(c) All possible gap costs
(d) Number of gap penalties
Topic: Exhaustive Algorithms
Answer: (d) Number of gap penalties
Explanation: In calculating the SP score, each column is scored by summing the scores for all possible pair-wise matches, mismatches, and gap costs. The score of the entire alignment is the sum of all of the column scores. Option (d) is therefore the irrelevant choice.
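The column-by-column summation can be sketched as follows. The function names and the scoring values are assumptions for illustration (+1 match, -1 mismatch, -1 for a residue paired with a gap, 0 for two gaps); a real implementation would take match and mismatch scores from a substitution matrix.

```python
from itertools import combinations

def pair_score(x, y, match=1, mismatch=-1, gap=-1):
    """Score one pair of characters taken from the same alignment column."""
    if x == "-" and y == "-":
        return 0                      # assumption: gap-gap pairs are not penalized
    if x == "-" or y == "-":
        return gap
    return match if x == y else mismatch

def sp_score(alignment):
    """Sum-of-pairs score: add every pair-wise score in every column."""
    total = 0
    for column in zip(*alignment):
        total += sum(pair_score(x, y) for x, y in combinations(column, 2))
    return total

# Hypothetical three-sequence alignment.
alignment = ["AC-T", "ACGT", "A-GT"]
print(sp_score(alignment))   # 4: column scores are 3, -1, -1, 3
```

Each column of three sequences contributes three pair-wise terms, and the alignment score is simply the total over columns.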


22.	The alignment score is the sum of substitution scores and gap penalties in this type of algorithm.
(a) True
(b) False
Topic: Needleman
Answer: (a) True
Explanation: A simple scoring scheme for this type of method uses +1 as the reward for a match and -1 as the penalty for a mismatch, and ignores gaps. In general, the alignment score is the sum of the substitution scores and the gap penalties.


23.	Which of the following is not an objective of sequence comparison?
(a) To observe patterns of conservation
(b) To find the common motifs present in both sequences
(c) To study the physical properties of molecules
(d) To study evolutionary relationships
Topic: Needleman
Answer: (c) To study the physical properties of molecules
Explanation: Sequence comparison is required to assess whether two sequences are likely to have evolved from the same sequence. It is also carried out to find which sequences in a database are similar to the sequence at hand.


24.	Which of the following is not true regarding BLOCKS?
(a) The BLOCKS server can extract a conserved, ungapped region from an MSA to produce a sequence block
(b) The server can also find blocks in a set of unaligned input sequences and maintains a large database of blocks based on an analysis of proteins in the Prosite catalog
(c) Blocks are found by the PROTOMAT program
(d) The program MOTIF doesn't locate spaced patterns
Topic: Localized Alignments in Sequences
Answer: (d) The program MOTIF doesn't locate spaced patterns
Explanation: Blocks are found in two steps. First, the program MOTIF is used to locate spaced patterns. The second step takes the best and most consistent patterns found in step 1 and uses the program PROTOMAT to merge overlapping triplets and extend them, order the resulting blocks, and choose those that are in the largest subset of sequences.


25.	Which of the following is not true about iterative methods?
(a) The Genetic Algorithm is a method used under this category
(b) Hidden Markov Models are used for Multiple Sequence Alignment
(c) The objective is to improve the overall alignment score
(d) MultAlin recalculates global scores
Topic: Iterative Methods of Multiple Sequence Alignment
Answer: (d) MultAlin recalculates global scores
Explanation: MultAlin (Corpet 1988) recalculates pair-wise scores during the production of a progressive alignment. In addition, it uses these scores to recalculate the tree, which is then used to refine the alignment in an effort to improve the score.


26.	For the 100-residue DNA sequence example, there are _______ possible starting sites for a 20-residue-long site.
(a) 30
(b) 21
(c) 81
(d) 60
Topic: Statistical Methods for Aiding Alignment
Answer: (c) 81
Explanation: For the 100-residue DNA sequence example, there are 100 - 20 + 1 = 81 possible starting sites for a 20-residue-long site. The first site starts at position 1 and ends at position 20, and the last starts at position 81 and ends at position 100 (there is not enough sequence for a 20-residue-long site beyond position 81).
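The count of 100 - 20 + 1 starting sites can be confirmed by enumerating them. The helper name `start_positions` is made up for this sketch, and positions are 1-based to match the explanation.

```python
def start_positions(seq_len, site_len):
    """1-based start positions of every full-length window in a sequence."""
    return list(range(1, seq_len - site_len + 2))

starts = start_positions(100, 20)
print(len(starts), starts[0], starts[-1])   # 81 1 81
print(starts[-1] + 20 - 1)                  # 100, the end of the last window
```

In general a sequence of length L contains L - w + 1 windows of width w, which is the number of candidate locations the EM algorithm has to weigh in each sequence.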


27.	Progressive alignment methods use the dynamic programming method to build an MSA starting with the most related sequences and then progressively adding less related sequences or groups of sequences to the initial alignment.
(a) True
(b) False
Topic: Progressive Methods of Multiple Sequence Alignment
Answer: (a) True
Explanation: Progressive alignment methods use the dynamic programming method. Relationships among the sequences are modeled by an evolutionary tree in which the outer branches, or leaves, are the sequences. The tree is based on pair-wise comparisons of the sequences using one of the phylogenetic methods.


28.	Like other alignment programs, CLUSTAL uses a null score for opening a gap in a sequence alignment and a penalty for extending the gap by one residue.
(a) True
(b) False
Topic: Progressive Methods of Multiple Sequence Alignment
Answer: (b) False
Explanation: CLUSTAL uses a penalty for opening a gap in a sequence alignment and an additional penalty for extending the gap by one residue. These penalties are user-defined. Gaps found in the initial alignments remain fixed. New gaps introduced as more sequences are added also receive this same gap penalty, even when they occur within an existing gap, but the gap penalties for alignment are then modified according to the average match value in the substitution matrix, the percent identity between the sequences, and the sequence lengths.


29.	Which of the following is untrue regarding the progressive alignment method?
(a) The program also applies a weighting scheme to increase the reliability of aligning divergent sequences (sequences with less than 25% identity)
(b) This is done by down-weighting redundant and closely related groups of sequences in the alignment by a certain factor
(c) This scheme is useful in enhancing similar sequences from dominating the alignment
(d) This scheme is useful in preventing similar sequences from dominating the alignment
Topic: Heuristic Algorithms
Answer: (c) This scheme is useful in enhancing similar sequences from dominating the alignment
Explanation: The scheme is useful in preventing similar sequences from dominating the alignment, not in enhancing their dominance. The weight factor for each sequence is determined by its branch length on the guide tree. The branch lengths are normalized by how many times sequences share a basal branch from the root of the tree.


30.	The quality and quantity of information provided by the PSSM also varies for ________ in the motif.
(a) each row
(b) each column
(c) rows and columns
(d) neither the rows nor the columns
Topic: Position
Answer: (b) each column
Explanation: The quality and quantity of information provided by the PSSM also varies for each column in the motif, and this variation profoundly influences the matches found with sequences. This situation can be accurately described by information theory, and the results can be displayed by a colored graph called a sequence logo.
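One common way to quantify the information in a column is the maximum possible entropy minus the observed Shannon entropy, measured in bits; this is the quantity usually plotted as the column height in a sequence logo. The sketch below assumes that definition, uses a made-up function name, and omits the small-sample correction that logo software often applies. The default alphabet size of 4 is for DNA; 20 would be used for proteins.

```python
import math

def column_information(freqs, alphabet_size=4):
    """Information content in bits: log2(alphabet size) minus Shannon entropy."""
    entropy = -sum(p * math.log2(p) for p in freqs.values() if p > 0)
    return math.log2(alphabet_size) - entropy

# Hypothetical DNA columns, from fully conserved to completely uninformative.
print(column_information({"A": 1.0}))                                      # 2.0
print(column_information({"A": 0.5, "G": 0.5}))                            # 1.0
print(column_information({"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}))    # 0.0
```

A fully conserved column carries the maximum information and dominates the matches found, while a column that mirrors a uniform background contributes nothing.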


31.	In the program DIALIGN, pairs of sequences are aligned to locate aligned regions that do not include gaps, much like continuous diagonals in a dot matrix plot.
(a) True
(b) False
Topic: Iterative Methods of Multiple Sequence Alignment
Answer: (a) True
Explanation: The program DIALIGN finds an alignment by a different iterative method. Pairs of sequences are aligned to locate aligned regions that do not include gaps, much like continuous diagonals in a dot matrix plot. Diagonals of various lengths are identified. A consistent collection of weighted diagonals that provides an alignment with a maximum sum of weights is then found.
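The idea of gap-free diagonals can be illustrated with a deliberately simplified sketch. It is not DIALIGN's actual procedure: the function name is made up, only runs of identical residues are reported, and the weighting and consistency steps described above are left out.

```python
def ungapped_diagonals(a, b, min_len=3):
    """Return (start_a, start_b, length) for maximal runs of identical residues
    along diagonals of the dot matrix of a and b (0-based starts)."""
    hits = []
    for i in range(len(a)):
        for j in range(len(b)):
            if a[i] != b[j]:
                continue
            # Skip cells that continue a run already started one step up the diagonal.
            if i > 0 and j > 0 and a[i - 1] == b[j - 1]:
                continue
            length = 0
            while (i + length < len(a) and j + length < len(b)
                   and a[i + length] == b[j + length]):
                length += 1
            if length >= min_len:
                hits.append((i, j, length))
    return hits

# Hypothetical sequences sharing one ungapped segment, GATTACA.
print(ungapped_diagonals("XXGATTACAYY", "ZZGATTACAWW"))   # [(2, 2, 7)]
```

A fuller treatment would score each diagonal with a substitution matrix, assign it a weight, and then select a mutually consistent set of diagonals with the largest total weight.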


32.	Which of the following is untrue about the PILEUP program?
(a) It is the MSA program that is a part of the Genetics Computer Group package of sequence analysis programs
(b) The package has been owned since 1997 by Oxford Molecular Group, and the program is widely used due to the popularity and availability of this package
(c) It uses a method for MSA that is the polar opposite of CLUSTALW
(d) The sequences are aligned pair-wise using the Needleman-Wunsch dynamic programming algorithm
Topic: Progressive Methods of Multiple Sequence Alignment
Answer: (c) It uses a method for MSA that is the polar opposite of CLUSTALW
Explanation: PILEUP uses a method for MSA that is very similar to CLUSTALW. The sequences are aligned pair-wise using the Needleman-Wunsch dynamic programming algorithm, and the scores are used to produce a tree by the unweighted pair-group method using arithmetic averages (UPGMA). The resulting tree is then used to guide the alignment of the most closely related sequences and groups of sequences. The resulting alignment is a global alignment produced by the Needleman-Wunsch algorithm.


33.	Two considerations arise in trying to tune the PSSM so that it adequately represents the training sequences. Which of the following does not describe them?
(a) If a given column in 20 sequences has only isoleucine, it is not very likely that a different amino acid will be found in other sequences with that motif, because the residue is probably important for function
(b) If a given column in 20 sequences has only isoleucine, it is very likely that a different amino acid will be found in other sequences with that motif, because the residue is probably important for function
(c) If the number of sequences with the found motif is large and reasonably diverse, the sequences represent a good statistical sampling of all sequences that are ever likely to be found with that same motif
(d) Another column in the motif from the 20 sequences may have several amino acids, and some amino acids may not be represented at all
Topic: Position
Answer: (b) If a given column in 20 sequences has only isoleucine, it is very likely that a different amino acid will be found in other sequences with that motif, because the residue is probably important for function
Explanation: The PSSM is constructed by a simple logarithmic transformation of a matrix giving the frequency of each amino acid in the motif. A column containing only isoleucine is unlikely to show a different amino acid in other sequences with the motif. Where a column already contains several amino acids, even more variation may be expected at that position in other sequences, although the more abundant amino acids already found in that column would probably be favored.


34.	Which of the following about MEME is untrue?
(a) It is a Web resource for performing local MSAs (Multiple Sequence Alignments) by the expectation maximization method
(b) It stands for Multiple EM for Motif Elicitation
(c) It was developed at the University of California at San Diego Supercomputing Center
(d) The Web page has multiple versions for searching blocks by an EM algorithm
Topic: Statistical Methods for Aiding Alignment
Answer: (d) The Web page has multiple versions for searching blocks by an EM algorithm
Explanation: The Web page offers two versions of MEME: ParaMEME, a Web program that searches for blocks by an EM algorithm, and a similar program, MetaMEME, which searches for profiles using HMMs. The Motif Alignment and Search Tool (MAST) is also provided for searching through databases for matches to motifs.


35.	The Expectation Maximization algorithm has been used to identify conserved domains in unaligned proteins only.
(a) True
(b) False
Topic: Statistical Methods for Aiding Alignment
Answer: (b) False
Explanation: This algorithm has been used to identify both conserved domains in unaligned proteins and protein-binding sites in unaligned DNA sequences (Lawrence and Reilly 1990), including sites that may include gaps (Cardon and Stormo 1992). The input is a set of sequences that are expected to have a common sequence pattern, which may not be easily recognizable by eye.


36.	Block analysis methods use substitution matrices such as the PAM and BLOSUM matrices to score matches.
(a) True
(b) False
Topic: Localized Alignments in Sequences
Answer: (b) False
Explanation: These methods do not use substitution matrices such as the PAM and BLOSUM matrices to score matches. Rather, they are based on finding exact matches that have the same spacing in at least some of the input sequences and that may be repeated in a given sequence.


37.	Global sequence alignment is suitable when the two sequences are of dissimilar length, with a negligible degree of similarity throughout.
(a) True
(b) False
Topic: Needleman
Answer: (b) False
Explanation: Global sequence alignment is suitable when the two sequences are of similar length, with a significant degree of similarity throughout. It gives the best alignment over the entire length of the two sequences.


38.	Which of the following is untrue about PRRN?
(a) PRRN is a web-based program that uses a double nested iterative strategy for multiple alignment
(b) It performs multiple alignments through two sets of iterations: inner iteration and outer iteration
(c) In the outer iteration, an initial random alignment is generated that is used to derive a UPGMA tree
(d) In the inner iteration, the sequences are randomly divided into multiple groups
Topic: Heuristic Algorithms
Answer: (d) In the inner iteration, the sequences are randomly divided into multiple groups
Explanation: In the inner iteration, the sequences are randomly divided into two groups. Randomized alignment is used for each group in the initial cycle, after which the alignment positions in each group are fixed. The two groups, each treated as a single sequence, are then aligned to each other using global dynamic programming. The process is repeated through many cycles until the total SP score no longer increases. At this point, the resulting alignment is used to construct a new UPGMA tree.

39.	Which of the following is untrue regarding T-Coffee? (a) It stands for Tree-based Consistency Objective Function for alignment Evaluation (b) It performs progressive sequence alignments as in Clustal (c) The global pairwise alignment is not performed using the Clustal program (d) The local pairwise alignment is generated by the Lalign program, from which the top ten scored alignments are selected This question was addressed to me by my college professor while I was bunking the class. The question is from the Heuristic Algorithms topic in the Multiple Sequence Alignment chapter of Bioinformatics.
Answer» Right choice is (c) The global pairwise alignment is not performed using the Clustal program. The explanation: The global pairwise alignment is performed using the Clustal program. The main difference from Clustal is that, in processing a query, T-Coffee performs both global and local pairwise alignment for all possible pairs involved. The collection of local and global sequence alignments is pooled to form a library, and the consistency of the alignments is then evaluated.

40.	Progenitor sequences represented by the ______ branches of the tree are derived by alignment of the _______ sequences. (a) outer, outermost (b) inner, outermost (c) inner, innermost (d) outer, innermost I have been asked this question in a job interview. This question is from the Progressive Methods of Multiple Sequence Alignment topic in the Multiple Sequence Alignment division of Bioinformatics.
Answer» The correct answer is (b) inner, outermost. For explanation I would say: Progenitor sequences represented by the inner branches of the tree are derived by alignment of the outermost sequences. These inner branches will have uncertainties where positions in the outermost sequences are dissimilar.

41.	Which of the following is untrue about DCA? (a) It stands for Divide-and-Conquer Alignment (b) It works by breaking each of the sequences into two smaller sections (c) The breaking points during the process are determined based on regional similarity of the sequences (d) If the sections are not short enough, further divisions are restricted as well This question was addressed to me in a job interview. The question is from the Exhaustive Algorithms topic in the Multiple Sequence Alignment portion of Bioinformatics.
Answer» Right choice is (d) If the sections are not short enough, further divisions are restricted as well. The best I can explain: DCA is a web-based program that is in fact semi-exhaustive, because certain steps of the computation are reduced to heuristics. If the sections are not short enough, further divisions are carried out. When the lengths of the sequences reach a predefined threshold, dynamic programming is applied for aligning each set of subsequences. The resulting short alignments are joined together head to tail to yield a multiple alignment of the entire length of all sequences.

42.	If the data set is _______ then unless the motif has __________ amino acids in each column, the column frequencies in the motif may not be highly representative of all other occurrences of the motif. (a) small, distinct (b) small, almost identical (c) large, almost identical (d) large, distinct This question was addressed to me in homework. This question is from the Position topic in the Multiple Sequence Alignment chapter of Bioinformatics.
Answer» Correct answer is (b) small, almost identical. The explanation is: The number of sequences for producing the motif may be small, highly diverse, or complex, giving rise to a second level of consideration. If the data set is small, then unless the motif has almost identical amino acids in each column, the column frequencies in the motif may not be highly representative of all other occurrences of the motif. In such cases, it is desirable to improve the estimates of the amino acid frequencies by adding extra amino acid counts, called pseudocounts, to obtain a more reasonable distribution of amino acid frequencies in the column.
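
To make the pseudocount idea concrete, here is a minimal Python sketch. It assumes the simplest possible scheme, in which the same pseudocount is added for every amino acid; the function name `column_frequencies` and the value 0.1 are illustrative choices, not taken from any particular program.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def column_frequencies(column, pseudocount=0.1, alphabet=AMINO_ACIDS):
    """Residue frequencies for one motif column, smoothed with a uniform pseudocount.

    Assumption: every residue receives the same pseudocount. Real tools may
    weight pseudocounts by background frequencies or substitution scores.
    """
    counts = Counter(column)
    total = len(column) + pseudocount * len(alphabet)
    return {aa: (counts.get(aa, 0) + pseudocount) / total for aa in alphabet}

# One motif column observed in only five sequences.
column = "LLLLI"
raw = column_frequencies(column, pseudocount=0.0)
smoothed = column_frequencies(column, pseudocount=0.1)
```

With no pseudocounts, any residue that was never observed (valine, for example) gets a frequency of exactly zero, which is rarely believable for a five-sequence sample. With a small pseudocount it gets a small nonzero frequency, while leucine still dominates the column.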

43.	The program Profilemake can be used to produce a profile from an MSA. (a) True (b) False This question was posed to me by my school principal while I was bunking the class. The question is from the Localized Alignments in Sequences topic in the Multiple Sequence Alignment section of Bioinformatics.
Answer» The correct answer is (a) True. Best explanation: A version of the Profilesearch program, which performs a database search for matches to a profile, is available at the University of Pittsburgh Supercomputer Center. A special grant application may be needed to use this facility. Profile-generating programs are available by FTP and are included in the Genetics Computer Group suite of programs.

44.	CLUSTALW is a more recent version of CLUSTAL with the W standing for ________ (a) weakening (b) winding (c) weighting (d) wiping I got this question at a job interview. My question is taken from the Progressive Methods of Multiple Sequence Alignment topic in the Multiple Sequence Alignment chapter of Bioinformatics.
Answer» The correct option is (c) weighting. Explanation: The W in CLUSTALW stands for ‘weighting’ to represent the ability of the program to provide weights to the sequence and program parameters. CLUSTAL has been around for more than 10 years and many improvements to the program have been made.

45.	There are two types of matrices involved in the study: score matrices and trace matrices. (a) True (b) False This question was addressed to me in a final exam. My query is from the Needleman topic in the Multiple Sequence Alignment portion of Bioinformatics.
Answer» Correct option is (a) True. To explain: The Needleman-Wunsch algorithm consists of three steps in which these matrices play their roles: (i) initialization of the score matrix; (ii) calculation of scores and filling of the traceback matrix; (iii) deducing the alignment from the traceback matrix.
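
The three steps can be sketched in Python as follows. This is a minimal illustration rather than a reference implementation: the scoring values (match +1, mismatch -1, linear gap penalty -2) and the function name are assumptions chosen for the example, and ties are broken in favour of the diagonal move.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment of a and b using a score matrix and a traceback matrix."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    trace = [[None] * (m + 1) for _ in range(n + 1)]

    # Step i: initialize the first row and column with cumulative gap penalties.
    for i in range(1, n + 1):
        score[i][0] = i * gap
        trace[i][0] = "up"
    for j in range(1, m + 1):
        score[0][j] = j * gap
        trace[0][j] = "left"

    # Step ii: fill the score matrix and record the move that produced each cell.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap      # gap in b
            left = score[i][j - 1] + gap    # gap in a
            best = max(diag, up, left)
            score[i][j] = best
            trace[i][j] = "diag" if best == diag else ("up" if best == up else "left")

    # Step iii: follow the traceback matrix from the bottom-right corner.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        move = trace[i][j]
        if move == "diag":
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif move == "up":
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))

total, top, bottom = needleman_wunsch("ACGT", "AGT")
# total is 1; top is "ACGT" and bottom is "A-GT"
```

The alignment score returned is the sum of the substitution scores and gap penalties along the traced path: here three matches (+3) and one gap (-2).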

46.	Which of the following is untrue about protein substitution matrices? (a) They are significantly more complex than DNA scoring matrices (b) They are N x N matrices of the amino acids (c) Protein substitution matrices have quite an important role in evolutionary studies (d) They are significantly less complex than DNA scoring matrices The question was posed to me in an exam. This question is from the Needleman topic in the Multiple Sequence Alignment chapter of Bioinformatics.
Answer» The correct answer is (d) They are significantly less complex than DNA scoring matrices. For explanation I would say: Protein substitution matrices are significantly more complex than DNA scoring matrices. Proteins are composed of twenty amino acids, and the physico-chemical properties of individual amino acids vary considerably. A protein substitution matrix can be based on any property of amino acids: size, polarity, charge, hydrophobicity.

47.	Which of the following is untrue about DIALIGN2? (a) It is a web-based program designed to detect local similarities (b) It is designed to detect global similarities (c) It does not apply gap penalties and thus is not sensitive to long gaps (d) The method breaks each of the sequences down to smaller segments and performs all possible pairwise alignments between the segments I have been asked this question during an interview. This question is from the Heuristic Algorithms topic in the Multiple Sequence Alignment section of Bioinformatics.
Answer» The correct answer is (b) It is designed to detect global similarities. To elaborate: High-scoring segments, called blocks, among different sequences are compiled in a progressive manner to assemble a full multiple alignment. The method places emphasis on block-to-block comparison rather than residue-to-residue comparison. The sequence regions between the blocks are left unaligned. The program has been shown to be especially suitable for aligning divergent sequences with only local similarity.

48.	Which of the following is not a feature of editors and formatters? (a) provision for displaying the sequence on a color monitor with residue colors to aid in a clear visual representation of the alignment (b) recognition of the multiple sequence format that was output by the MSA (Multiple Sequence Alignment) program (c) maintenance of the alignment in a suitable format when the editing is completed (d) disallowing shading conserved residues in the alignment This question was addressed to me at a job interview. My query is from the Position topic in the Multiple Sequence Alignment portion of Bioinformatics.
Answer» The correct choice is (d) disallowing shading conserved residues in the alignment. The best I can explain: Besides the features listed in (a) to (c), another feature is provision of a suitable windows interface that allows use of the mouse to add, delete, or move sequence, followed by an updated display of the alignment. In addition, other types of editing are commonly performed on MSAs (multiple sequence alignments), for example shading conserved residues in the alignment.

49.	Analysis of MSAs for conserved blocks of sequence leads to production of the position-specific scoring matrix. (a) True (b) False The question was posed to me during an interview. My question is from the Position topic in the Multiple Sequence Alignment section of Bioinformatics.
Answer» Correct choice is (a) True. To explain I would say: The analysis of MSAs (multiple sequence alignments) for conserved blocks of sequence leads to production of the position-specific scoring matrix, or PSSM. The PSSM may be used to search a sequence to obtain the most probable location or locations of the motif represented by the PSSM. Alternatively, the PSSM may be used to search an entire database to identify additional sequences that also have the same motif.
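
As a rough illustration of both uses, the Python sketch below builds a log-odds PSSM from a small ungapped block and slides it along a sequence to find the best-scoring position. The DNA alphabet, the uniform background frequencies, the pseudocount of 1 and the function names are all assumptions made for the example.

```python
import math

def build_pssm(block, alphabet="ACGT", pseudocount=1.0):
    """Log-odds PSSM (one dict per column) from an ungapped block of equal-length sites.

    Assumption: uniform background frequencies and a uniform pseudocount.
    """
    background = 1.0 / len(alphabet)
    pssm = []
    for pos in range(len(block[0])):
        column = [site[pos] for site in block]
        total = len(column) + pseudocount * len(alphabet)
        pssm.append({
            ch: math.log2(((column.count(ch) + pseudocount) / total) / background)
            for ch in alphabet
        })
    return pssm

def best_match(pssm, sequence):
    """Return (score, start) of the highest-scoring window in sequence."""
    width = len(pssm)
    windows = (
        (sum(pssm[k][sequence[start + k]] for k in range(width)), start)
        for start in range(len(sequence) - width + 1)
    )
    return max(windows)

# A small conserved block taken from a hypothetical alignment.
block = ["TATAAT", "TATGAT", "TACAAT", "TATATT"]
pssm = build_pssm(block)
score, start = best_match(pssm, "GGCGTATAATGCGC")
```

Searching a database amounts to running `best_match` over every sequence and keeping those whose best score exceeds a chosen threshold.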

50.	Iterative methods involve repeatedly realigning subgroups of the sequences and then aligning these subgroups into a local alignment of all of the sequences. (a) True (b) False The question was posed to me in an online interview. My query is from the Iterative Methods of Multiple Sequence Alignment topic in the Multiple Sequence Alignment chapter of Bioinformatics.
Answer» Right choice is (b) False. Explanation: Subgroups are aligned into a global alignment of all of the sequences. The objective is to improve the overall alignment score, such as a sum-of-pairs score. Selection of these groups may be based on the ordering of the sequences on a phylogenetic tree predicted in a manner similar to that of progressive alignment, on separation of one or two of the sequences from the rest, or on a random selection of the groups.
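
For reference, a sum-of-pairs score can be computed as in the Python sketch below. The scoring values (match +1, mismatch -1, residue against gap -2, gap against gap 0) are illustrative assumptions; real programs typically use a substitution matrix and affine gap penalties.

```python
from itertools import combinations

def sum_of_pairs(alignment, match=1, mismatch=-1, gap=-2):
    """Sum-of-pairs score of a multiple alignment given as equal-length strings."""
    score = 0
    for column in zip(*alignment):
        for x, y in combinations(column, 2):
            if x == "-" and y == "-":
                continue              # gap against gap contributes nothing
            if x == "-" or y == "-":
                score += gap
            elif x == y:
                score += match
            else:
                score += mismatch
    return score

alignment = ["ACGT", "A-GT", "ACGA"]
sp = sum_of_pairs(alignment)  # 3 - 3 + 3 - 1 = 2
```

An iterative method would keep a realignment of the subgroups only when it raises this score, and stop once no further increase is found.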
