79+ Interview Questions in Multiple Sequence Alignment in Bioinformatics (Page 2)

51. Match-Box compares segments of some of the nine residues of possible pairwise alignments. (a) True (b) False [Topic: Heuristic Algorithms]
Answer» (b) False. Explanation: Match-Box compares segments of every nine residues of all possible pairwise alignments. It is a web-based server that aims to identify conserved blocks (or boxes) among sequences. If the similarity of particular segments is above a certain threshold across all sequences, they are used as anchors to assemble multiple alignments; residues between blocks are left unaligned.

52. MACAW is a local multiple sequence alignment program only. (a) True (b) False [Topic: Position]
Answer» (b) False. Explanation: MACAW is both a local multiple sequence alignment program and a sequence editing tool. Given a set of sequences, the program finds ungapped blocks in the sequences and gives their statistical significance. Later versions of the program find blocks by one of three user-chosen methods.

53. If a good sampling of sequences is _______, the number of sequences is _________, and the motif structure is ________, it should, in principle, be possible to obtain frequencies highly representative of the same motif in other sequences also. (a) available, sufficiently large, not too complex (b) unavailable, sufficiently large, not too complex (c) unavailable, sufficiently small, not too complex (d) available, sufficiently large, too complex [Topic: Position]
Answer» (a) available, sufficiently large, not too complex. Explanation: The more abundant amino acids already found in a column would probably be favored. Thus, if a good sampling of sequences is available, the number of sequences is sufficiently large, and the motif structure is not too complex, it should, in principle, be possible to obtain frequencies highly representative of the same motif in other sequences also (Henikoff and Henikoff 1996).
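
As a minimal sketch of what "frequencies" means here, the snippet below tallies the residue frequency in each column of a set of aligned motif instances. The function name `column_frequencies` and the four example sites are hypothetical and chosen only for illustration; no pseudocounts or sequence weighting are applied.

```python
from collections import Counter

def column_frequencies(aligned_motifs):
    """Return one {residue: frequency} dict per column of equal-length motif instances."""
    length = len(aligned_motifs[0])
    if any(len(m) != length for m in aligned_motifs):
        raise ValueError("all motif instances must have the same length")
    n = len(aligned_motifs)
    freqs = []
    for column in zip(*aligned_motifs):
        counts = Counter(column)
        freqs.append({res: c / n for res, c in counts.items()})
    return freqs

# Hypothetical aligned sites, one per sequence.
motifs = ["ACGT", "ACGA", "TCGT", "ACGT"]
freqs = column_frequencies(motifs)
```

With only four sites the estimates are coarse, which is why the answer stresses a sufficiently large sample.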

54. Which of the following about the Gibbs sampler is untrue? (a) It is a statistical method for finding motifs in sequences (b) It is dissimilar to the principle of the EM method (c) It searches for the statistically most probable motifs (d) It can find the optimal width and number of given motifs in each sequence [Topic: Statistical Methods for Aiding Alignment]
Answer» (b) It is dissimilar to the principle of the EM method. Explanation: The Gibbs sampler is another statistical method for finding motifs in sequences. The method is similar in principle to the EM method, but the algorithm is different. A combined approach using the Gibbs sampler and MOTIF may be used to make blocks at the BLOCKS Web site.

55. The scoring of gaps in an MSA (Multiple Sequence Alignment) has to be performed in a different manner from scoring gaps in a pairwise alignment. (a) True (b) False [Topic: Progressive Methods of Multiple Sequence Alignment]
Answer» (a) True. Explanation: As more sequences are added to a profile of an existing MSA, gaps accumulate and influence the alignment of further sequences. CLUSTALW calculates gaps in a novel way designed to place them between conserved domains.

56. There are two approaches, exhaustive and heuristic, used in multiple sequence alignment. (a) True (b) False [Topic: Exhaustive Algorithms]
Answer» (a) True. Explanation: The exhaustive alignment method involves examining all possible aligned positions simultaneously. Dynamic programming for pairwise alignment uses a two-dimensional matrix to search for an optimal alignment; to use dynamic programming for multiple sequence alignment, extra dimensions are needed to take all possible ways of sequence matching into consideration.

57. Two commonly encountered multiple sequence alignment formats are the Genetics Computer Group's MSF format and the CLUSTALW ALN format. (a) True (b) False [Topic: Position]
Answer» (a) True. Explanation: Because these formats follow a precise outline, one may be readily converted to another by computer programs. READSEQ by D.G. Gilbert at Indiana University at Bloomington is one such program.

58. The pattern searching method type of analysis was performed on groups of related proteins, and the amino acid patterns that were located may be found in the Prosite catalog. (a) True (b) False [Topic: Localized Alignments in Sequences]
Answer» (a) True. Explanation: The Prosite catalog groups proteins that have similar biochemical functions on the basis of amino acid patterns such as those in the active site. Subsequently, these families were searched for amino acid patterns by the MOTIF program (Smith et al. 1990), which finds patterns of the type aa1 d1 aa2 d2 aa3, where aa1, aa2, and aa3 are conserved amino acids and d1 and d2 are stretches of intervening sequence up to 24 amino acids long.
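
A pattern of the form aa1 d1 aa2 d2 aa3 can be illustrated with a regular expression. This is only a sketch of the pattern shape, not the MOTIF program itself; the helper `motif_regex`, the fixed spacer lengths, and the test sequence are assumptions made for the example.

```python
import re

def motif_regex(aa1, d1, aa2, d2, aa3):
    """Build a regex for three conserved residues separated by
    exactly d1 and d2 arbitrary residues (hypothetical helper)."""
    return re.compile(f"{aa1}.{{{d1}}}{aa2}.{{{d2}}}{aa3}")

# Pattern C-x(2)-C-x(3)-H on a made-up protein fragment.
pattern = motif_regex("C", 2, "C", 3, "H")
hits = [m.start() for m in pattern.finditer("MKCAACDEFHLL")]
```

Here `hits` holds the 0-based start positions of every non-overlapping occurrence of the pattern.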

59. An approach for obtaining a higher-scoring MSA (Multiple Sequence Alignment) by rearranging an existing alignment uses a probability approach called simulated annealing. (a) True (b) False [Topic: Iterative Methods of Multiple Sequence Alignment]
Answer» (a) True. Explanation: The program MSASA (Multiple Sequence Alignment by Simulated Annealing) starts with a heuristic MSA. It then changes the alignment by following an algorithm designed to identify changes that increase the alignment score.

60. Which of the following is untrue about the CLUSTAL program? (a) CLUSTAL performs a global multiple sequence alignment by a different method than MSA (Multiple Sequence Alignment) (b) The initial heuristic alignment obtained by MSA is calculated in a different way (c) The initial step includes performing pairwise alignments of all of the sequences (d) The intermediate step includes using the alignment scores to produce a phylogenetic tree [Topic: Progressive Methods of Multiple Sequence Alignment]
Answer» (b) The initial heuristic alignment obtained by MSA is calculated in a different way. Explanation: CLUSTAL performs a global multiple sequence alignment by a different method than MSA, although the initial heuristic alignment obtained by MSA is calculated the same way. Options (c) and (d) describe the first two steps; the last step is aligning the sequences sequentially, guided by the phylogenetic relationships indicated by the tree.

61. The number of possible global alignments between two sequences of length N is _____ (a) \(\frac{2^N}{\sqrt{\pi N}}\) (b) \(\frac{2^{2N}}{\sqrt{\pi N}}\) (c) \(\frac{2^{(N-1)}}{\sqrt{\pi N}}\) (d) \(\frac{2^{2N}}{\sqrt{N}}\) [Topic: Needleman]
Answer» (b) \(\frac{2^{2N}}{\sqrt{\pi N}}\). Explanation: The count is the central binomial coefficient \(\binom{2N}{N}\), and Stirling's approximation gives \(\binom{2N}{N} \approx \frac{2^{2N}}{\sqrt{\pi N}}\), which is option (b). For two sequences of 250 residues this is about \(10^{149}\).
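
The figure for N = 250 can be checked numerically. The sketch below compares the exact central binomial coefficient with the approximation from option (b), working in log10 to avoid floating-point overflow; the function names are hypothetical.

```python
import math

def log10_exact_count(n):
    """log10 of the central binomial coefficient C(2n, n)."""
    return math.log10(math.comb(2 * n, n))

def log10_approx_count(n):
    """log10 of 2^(2n) / sqrt(pi * n), the approximation in option (b)."""
    return 2 * n * math.log10(2) - 0.5 * math.log10(math.pi * n)

n = 250
log10_exact = log10_exact_count(n)
log10_approx = log10_approx_count(n)
```

Both values fall just above 149, in line with the order of magnitude quoted in the explanation.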

62. Which of the following is not among the methods for finding localized sequence similarity? (a) Profile Analysis (b) Block Analysis (c) Extraction of Blocks from a Global or Local MSA (d) Pattern blocking [Topic: Localized Alignments in Sequences]
Answer» (d) Pattern blocking. Explanation: Pattern searching is the correct name of the method for finding localized sequence similarity. This type of analysis was performed on groups of related proteins, and the amino acid patterns that were located may be found in the Prosite catalog.

63. The HMM is a statistical model that considers few combinations of matches and gaps to generate an alignment of a set of sequences. (a) True (b) False [Topic: Iterative Methods of Multiple Sequence Alignment]
Answer» (b) False. Explanation: The HMM is a statistical model that considers all possible combinations of matches, mismatches, and gaps to generate an alignment of a set of sequences. A localized region of similarity, including insertions and deletions, may also be modeled by an HMM.

64. The first step in the Genetic Algorithm is arranging the sequences to be aligned in rows. (a) True (b) False [Topic: Iterative Methods of Multiple Sequence Alignment]
Answer» (a) True. Explanation: The sequences to be aligned are written in rows, as on a page, except that they are made to overlap by a random amount of sequence, up to 50 residues long for sequences about 200 in length. The ends are then padded with gaps. A typical population of 100 of these MSAs is made, although other numbers may be set.

65. In the Needleman-Wunsch algorithm, the gaps are scored -2. (a) True (b) False [Topic: Needleman]
Answer» (b) False. Explanation: In the simple scoring scheme this question refers to, gaps are ignored, so the gap penalty is zero rather than -2; in general the gap penalty is a parameter of the algorithm. A gap corresponds to an insertion or a deletion of a residue.
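
To make the role of the gap score concrete, here is a minimal sketch of global alignment scoring by dynamic programming with the gap penalty exposed as a parameter. The match and mismatch values and the function name are assumptions for illustration; `gap=0` reproduces the "gaps are ignored" scheme described in the answer.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=0):
    """Return the optimal global alignment score of strings a and b.
    Scores are illustrative defaults, not values from the original paper."""
    # prev[j] holds the best score for aligning a[:i-1] with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]

score_free_gaps = needleman_wunsch_score("AC", "C", gap=0)
score_penalized = needleman_wunsch_score("AC", "C", gap=-2)
```

Changing only the `gap` argument shows how the same pair of sequences scores differently once gaps are penalized.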

66. There is a unique advantage of multiple sequence alignment because it reveals more biological information than many pairwise alignments can. (a) True (b) False [Topic: Exhaustive Algorithms]
Answer» (a) True. Explanation: This is a genuine advantage of multiple sequence alignment. For example, it allows the identification of conserved sequence patterns and motifs in the whole sequence family, which are not easy to detect by comparing only two sequences.

67. Which of the following is untrue regarding the Expectation Maximization algorithm? (a) An initial guess is made as to the location and size of the site of interest in each of the sequences, and these parts of the sequence are aligned (b) The alignment provides an estimate of the base or amino acid composition of each column in the site (c) The column-by-column composition of the site already available is used to estimate the probability of finding the site at any position in each of the sequences (d) The row-by-column composition of the site already available is used to estimate the probability [Topic: Statistical Methods for Aiding Alignment]
Answer» (d) The row-by-column composition of the site already available is used to estimate the probability. Explanation: The EM algorithm consists of two steps, which are repeated consecutively. In step 1, the expectation step, the column-by-column (not row-by-column) composition of the site already available is used to estimate the probability of finding the site at any position in each of the sequences. These probabilities are used in turn to provide new information as to the expected base or amino acid distribution for each column in the site. In step 2, the maximization step, the new counts for each position in the site replace the previous set.
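
The expectation step can be sketched as follows: given the current column-by-column composition of the site and a background composition, compute the relative probability that the site starts at each position of a sequence. This is a simplified illustration under stated assumptions (a fixed floor of 0.01 for unseen residues, a uniform background, a hypothetical function name), not a full EM implementation.

```python
def site_probabilities(seq, column_freqs, background, floor=0.01):
    """Normalized probability of the site starting at each position of seq.
    column_freqs: list of {residue: frequency} dicts, one per site column.
    background: {residue: frequency} for residues outside the site."""
    width = len(column_freqs)
    weights = []
    for start in range(len(seq) - width + 1):
        odds = 1.0
        for k in range(width):
            res = seq[start + k]
            # Ratio of site-column frequency to background frequency.
            odds *= column_freqs[k].get(res, floor) / background[res]
        weights.append(odds)
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical two-column site that favors "A" then "C".
cols = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
]
bg = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
probs = site_probabilities("TTACTT", cols, bg)
best_start = probs.index(max(probs))
```

In a full EM run these probabilities would weight the residue counts used to update `cols` in the maximization step.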

68. As the amount of computational time and memory space required increases exponentially with the number of sequences, the multidimensional search matrix method is computationally prohibitive for a large data set. (a) True (b) False [Topic: Exhaustive Algorithms]
Answer» (a) True. Explanation: This is indeed the drawback of that method. For this reason, full dynamic programming is limited to small datasets of fewer than ten short sequences. For the same reason, few multiple alignment programs employing this "brute force" approach are publicly available.
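
As a rough illustration of the exponential growth, assume a full dynamic programming matrix with one axis per sequence and one extra cell per axis for the empty prefix. The helper `dp_cells` is hypothetical and counts cells only; it ignores the additional work done per cell.

```python
def dp_cells(lengths):
    """Number of cells in a full DP matrix with one axis per sequence."""
    cells = 1
    for n in lengths:
        cells *= n + 1
    return cells

two_seqs = dp_cells([100, 100])
three_seqs = dp_cells([100, 100, 100])
ten_seqs = dp_cells([100] * 10)
```

Each added sequence of length 100 multiplies the matrix size by 101, so ten such sequences already need on the order of 10^20 cells.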

69. The scoring function for multiple sequence alignment is based on the concept of sum of pairs (SP). (a) True (b) False [Topic: Exhaustive Algorithms]
Answer» (a) True. Explanation: Multiple sequence alignment arranges sequences so that a maximum number of residues from each sequence are matched up according to a particular scoring function, which is based on the concept of sum of pairs (SP). As the name suggests, it is the sum of the scores of all possible pairs of sequences in a multiple alignment based on a particular scoring matrix.
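
A minimal sketch of the SP score follows. The match/mismatch scorer, the gap score of -1, and the choice to score a gap-gap pair as zero are assumptions made for the example; real programs use a substitution matrix and their own gap handling.

```python
from itertools import combinations

def sum_of_pairs(alignment, score, gap=-1):
    """Sum pairwise scores over every column of an MSA (equal-length rows)."""
    total = 0
    for column in zip(*alignment):
        for x, y in combinations(column, 2):
            if x == "-" and y == "-":
                continue           # gap-gap pair contributes nothing (assumption)
            if x == "-" or y == "-":
                total += gap       # residue aligned against a gap
            else:
                total += score(x, y)
    return total

def simple_score(x, y):
    """Toy scorer: +1 for identical residues, -1 otherwise."""
    return 1 if x == y else -1

msa = ["AC-T", "ACGT", "A-GT"]
sp = sum_of_pairs(msa, simple_score)
```

The two fully conserved columns contribute 3 each and the two gapped columns contribute -1 each, for a total of 4.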

70. In the Genetic Algorithm, in the mutation process _______ (a) sequence is changed (b) gaps are not inserted (c) sequence is not changed (d) gaps are not rearranged [Topic: Iterative Methods of Multiple Sequence Alignment]
Answer» (c) sequence is not changed. Explanation: In the mutation process, the sequence is not changed (else it would no longer be an alignment), but gaps are inserted and rearranged in an attempt to create a better-scoring MSA. In the gap insertion process, the sequences in a given MSA are divided into two groups based on an estimated phylogenetic tree, and gaps of random length are inserted into random positions in the alignment.
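
The gap insertion step might be sketched as below. Here the two groups are supplied by the caller rather than derived from a phylogenetic tree, and the function name, maximum gap length, and example alignment are all hypothetical. The point of the sketch is the invariant: the ungapped sequences never change.

```python
import random

def gap_insertion(alignment, group_a, rng, max_gap=3):
    """Insert one gap of random length into every row: at one random
    position for rows in group_a, at another for the remaining rows."""
    width = len(alignment[0])
    gap = "-" * rng.randint(1, max_gap)
    pos_a = rng.randrange(width + 1)
    pos_b = rng.randrange(width + 1)
    mutated = []
    for i, row in enumerate(alignment):
        pos = pos_a if i in group_a else pos_b
        mutated.append(row[:pos] + gap + row[pos:])
    return mutated

rng = random.Random(0)              # seeded for repeatability
msa = ["ACGT-A", "AC-TTA", "ACGTTA"]
mutated = gap_insertion(msa, {0, 1}, rng)
```

Because every row receives a gap of the same length, the rows stay equal in length and the result is still a valid alignment.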

71. CLUSTALX provides a graphic interface. (a) True (b) False [Topic: Progressive Methods of Multiple Sequence Alignment]
Answer» (a) True. Explanation: Two examples of programs that use progressive methods are CLUSTALW and the Genetics Computer Group program PILEUP. CLUSTALX provides a graphic interface. The refinements in CLUSTALW provide more realistic alignments that should reflect the evolutionary changes in the aligned sequences and a more appropriate distribution of gaps between conserved domains.

72. In the initial step of the EM algorithm, the 20-residue-long binding motif patterns in each sequence are aligned as an initial guess of the motif. (a) True (b) False [Topic: Statistical Methods for Aiding Alignment]
Answer» (a) True. Explanation: The base composition of each column in the aligned patterns is then determined. The composition of the flanking sequence on each side of the site provides the surrounding base or amino acid composition for comparison. Each sequence is assumed to be the same length and to be aligned by the ends.

73. In the method of extraction of blocks from a global or local MSA, a global MSA of related protein sequences usually includes regions that have been aligned without gaps in any of the sequences. (a) True (b) False [Topic: Localized Alignments in Sequences]
Answer» (a) True. Explanation: A global MSA of related protein sequences usually includes regions that have been aligned without gaps in any of the sequences. These ungapped patterns may be extracted from the aligned regions and used to produce blocks. Blocks found in this manner are only as good as the MSA from which they are derived.
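
Extracting ungapped regions from an MSA can be sketched as a scan for runs of gap-free columns. The function name, the minimum block width of 3, and the example alignment are assumptions for illustration.

```python
def ungapped_blocks(alignment, min_width=3):
    """Return (start, end) column ranges, end exclusive, where no row has a gap."""
    blocks = []
    start = None
    width = len(alignment[0])
    # Iterate one column past the end so a trailing run is closed too.
    for col in range(width + 1):
        gap_free = col < width and all(row[col] != "-" for row in alignment)
        if gap_free and start is None:
            start = col
        elif not gap_free and start is not None:
            if col - start >= min_width:
                blocks.append((start, col))
            start = None
    return blocks

msa = ["MKV-LLAGT", "MKVALLA-T", "MRV-LLAGT"]
blocks = ungapped_blocks(msa)
```

The single gap-free column at the end is discarded because it is narrower than `min_width`.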

74. The substitution matrices are rarely used in this type of matching. (a) True (b) False [Topic: Needleman]
Answer» (b) False. Explanation: Substitution matrices are quite commonly used in this type of matching. Residue substitution costs can be expressed concisely with an N x N matrix, where N is 4 for DNA (4 nucleotides) and 20 for proteins (20 amino acid residues).
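
For DNA, the N x N matrix has N = 4. The sketch below builds one as a nested dictionary; the specific scores (match 1, transition -1, transversion -2) are hypothetical values chosen only to show the structure.

```python
DNA = "ACGT"
PURINES = {"A", "G"}

def build_dna_matrix(match=1, transition=-1, transversion=-2):
    """Return a 4 x 4 substitution matrix as matrix[x][y] (illustrative scores)."""
    matrix = {}
    for x in DNA:
        matrix[x] = {}
        for y in DNA:
            if x == y:
                score = match
            elif (x in PURINES) == (y in PURINES):
                score = transition    # purine<->purine or pyrimidine<->pyrimidine
            else:
                score = transversion  # purine<->pyrimidine
            matrix[x][y] = score
    return matrix

matrix = build_dna_matrix()
```

A protein matrix has the same shape with 20 rows and columns, usually taken from a published table rather than built from a rule.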

75. Which of the following is untrue about the iterative approach? (a) The iterative approach is based on the idea that an optimal solution can be found by repeatedly modifying existing suboptimal solutions (b) Because the order of the sequences used for alignment is different in each iteration (c) This method is also heuristic in nature and does not have guarantees for finding the optimal alignment (d) This method is not based on heuristic methods [Topic: Heuristic Algorithms]
Answer» (d) This method is not based on heuristic methods. Explanation: This method is based on heuristic methods. The procedure starts by producing a low-quality alignment and gradually improves it by iterative realignment through well-defined procedures until no more improvements in the alignment scores can be achieved.

76. Which of the following is not a drawback of the progressive alignment method? (a) The progressive alignment method is not suitable for comparing sequences of different lengths because it is a global alignment-based method (b) With the use of affine gap penalties in this method, long gaps are not allowed, and, in some cases, this may limit the accuracy of the method (c) With the use of affine gap penalties in this method, long gaps are allowed, and, in some cases, this may limit the accuracy of the method (d) The final alignment result is also influenced by the order of sequence addition [Topic: Heuristic Algorithms]
Answer» (c) With the use of affine gap penalties in this method, long gaps are allowed, and, in some cases, this may limit the accuracy of the method. Explanation: Another major limitation is the "greedy" nature of the algorithm: it depends on the initial pairwise alignments. Once gaps are introduced in the early steps of alignment, they are fixed, so the final alignment could be far from optimal. The problem can be more glaring when dealing with divergent sequences.

77. The major drawback of the progressive and iterative alignment strategies is that they are largely global alignment based and may therefore fail to recognize conserved domains and motifs among highly divergent sequences of varying lengths. (a) True (b) False [Topic: Heuristic Algorithms]
Answer» (a) True. Explanation: For such divergent sequences that share only regional similarities, a local alignment-based approach has to be used. The strategy identifies a block of ungapped alignment shared by all the sequences, hence the name block-based local alignment strategy.

78. Related sequences are identified through database similarity searching, and as the process generates multiple matching sequence pairs, it is often necessary to convert the numerous pairwise alignments into a single alignment. (a) True (b) False [Topic: Exhaustive Algorithms]
Answer» (a) True. Explanation: A natural extension of pairwise alignment is multiple sequence alignment, which aligns multiple related sequences to achieve optimal matching of the sequences. Related sequences are identified through database similarity searching. As the process generates multiple matching sequence pairs, it is often necessary to convert the numerous pairwise alignments into a single alignment, which arranges sequences in such a way that evolutionarily equivalent positions across all sequences are matched.

79. Which of the following cannot be related to multiple sequence alignment? (a) Many conserved and functionally critical amino acid residues can be identified in a protein multiple alignment (b) Multiple sequence alignment is also an essential prerequisite to carrying out phylogenetic analysis of sequence families and prediction of protein secondary and tertiary structures (c) Multiple sequence alignment also has applications in designing degenerate polymerase chain reaction (PCR) primers based on multiple related sequences (d) This method does not contribute much to the creation of degenerate polymerase chain reaction (PCR) primers [Topic: Exhaustive Algorithms]
Answer» (d) This method does not contribute much to the creation of degenerate polymerase chain reaction (PCR) primers. Explanation: Multiple sequence alignment has applications in designing degenerate PCR primers based on multiple related sequences, so statement (d) does not hold. In practice, heuristic approaches are most often used.
