116 + Interview Questions in GENERAL QA IN BIOINFORMATICS Page 2 InterviewSolution

51.	Which of the following is untrue about threading and fold recognition? (a) It assesses the compatibility of an amino acid sequence with a known structure in a fold library (b) If the protein fold to be predicted does not exist in the fold library, the method won’t necessarily fail (c) If the protein fold to be predicted does not exist in the fold library, the method will fail (d) Threading and fold recognition do not generate fully refined atomic models for the query sequences
Answer» Correct answer is (b) If the protein fold to be predicted does not exist in the fold library, the method won’t necessarily fail To explain: Threading and fold recognition depend on a library of known folds, so if the fold of the query protein is not in the library, the method fails. A further disadvantage compared to homology modeling is that threading and fold recognition do not generate fully refined atomic models for the query sequences. This is because accurate alignment between distant homologs is difficult to achieve. Instead, threading and fold recognition procedures only provide a rough approximation of the overall topology of the native structure.

52.	In regular expressions, which of the following patterns is wrongly matched with its significance? (a) [ ] – Or (b) { } – Not (c) ( ) – Repeats (d) Z – Any
Answer» The correct answer is (d) Z – Any The explanation: Each symbol has its own significance in the regular expression notation used for sequence patterns. For example, [GA] means ‘G or A’, {V,P} means ‘not V or P’, and x(4) means ‘xxxx’. Likewise, X (not Z) denotes any character.
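
As an illustration of the symbols above, here is a minimal Python sketch that translates such a pattern into Python's own regex syntax. It assumes a PROSITE-style layout in which elements are separated by hyphens; the function name `prosite_to_regex` and the example pattern are hypothetical and not part of the original question.

```python
import re

def prosite_to_regex(pattern: str) -> str:
    """Translate a hyphen-separated, PROSITE-style pattern into a Python regex."""
    parts = []
    for element in pattern.split("-"):
        # Split off an optional repeat count such as "(4)" or "(2,3)".
        m = re.fullmatch(r"(.+?)(?:\((\d+(?:,\d+)?)\))?", element)
        core, repeat = m.group(1), m.group(2)
        if core in ("x", "X"):
            core = "."                                   # x  -> any residue
        elif core.startswith("{") and core.endswith("}"):
            # {V,P} -> not V or P (commas are tolerated and dropped)
            core = "[^" + core[1:-1].replace(",", "") + "]"
        # [GA] already means "G or A" in Python regex syntax.
        if repeat:
            core += "{" + repeat + "}"                   # x(4) -> .{4}
        parts.append(core)
    return "".join(parts)

regex = prosite_to_regex("[GA]-x(4)-{V,P}")
print(regex)
print(bool(re.fullmatch(regex, "GAAAAL")))
```

The translated pattern accepts `GAAAAL` but rejects `GAAAAP`, because the last position must not be V or P.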

53.	Which of the following wrongly describes protein domains?(a) They are made up of one secondary structure(b) Defined as independently foldable units(c) They are stable structures as compared to motifs(d) They are separated by linker regions
Answer» Right choice is (a) They are made up of one secondary structure To elaborate: Protein domains are made up of two or more motifs, i.e. combinations of secondary structure elements, which form stable, folded 3-D structures. They are conserved parts of the protein sequence and can evolve, function, and exist independently of the rest of the protein chain.

54.	What does this representation mean: R.L.[EQD]? (a) An arginine- Amino acid- Leucine- Amino acid- Either Aspartic acid, glutamic acid or glutamine (b) An arginine- Leucine- Either Aspartic acid, glutamic acid or glutamine (c) An arginine- Leucine- Amino acid- Either Aspartic acid, glutamic acid or glutamine (d) An arginine- Leucine- Aspartic acid and glutamic acid and glutamine
Answer» Correct option is (a) An arginine- Amino acid- Leucine- Amino acid- Either Aspartic acid, glutamic acid or glutamine For explanation I would say: This is an example of a PEXEL motif. Here, the ‘.’ represents any amino acid, as mentioned in the answer, and [ ] means ‘or’, i.e. any one of the listed residues is present at the given position.
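
The pattern can be tried directly with Python's `re` module, where `.` and `[ ]` carry the same meaning as in the explanation above. This is only a sketch; the test sequence and the helper name `find_motif` are made up for illustration.

```python
import re

# R, any residue, L, any residue, then one of E, Q or D.
MOTIF = re.compile(r"R.L.[EQD]")

def find_motif(seq: str):
    """Return (start, matched_text) for every non-overlapping match in seq."""
    return [(m.start(), m.group()) for m in MOTIF.finditer(seq)]

# Hypothetical sequence: "RTLSE" at index 2 fits the motif.
print(find_motif("MKRTLSEAARLLD"))
```

The trailing `RLLD` is not reported, because after R, any residue, L and any residue there is no fifth position left to hold E, Q or D.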

55.	If the data set is _______ then unless the motif has __________ amino acids in each column, the column frequencies in the motif may not be highly representative of all other occurrences of the motif.(a) small, distinct(b) small, almost identical(c) large, almost identical(d) large, distinct
Answer» Correct answer is (b) small, almost identical The explanation is: The number of sequences for producing the motif may be small, highly diverse, or complex, giving rise to a second level of consideration. If the data set is small, then unless the motif has almost identical amino acids in each column, the column frequencies in the motif may not be highly representative of all other occurrences of the motif. In such cases, it is desirable to improve the estimates of the amino acid frequencies by adding extra amino acid counts, called pseudocounts, to obtain a more reasonable distribution of amino acid frequencies in the column.
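
The effect of pseudocounts can be sketched in a few lines of Python. The example below assumes the simplest scheme, in which the same pseudocount is added for each of the 20 standard amino acids; the explanation above does not say how the extra counts are distributed, so this uniform choice and the name `column_frequencies` are assumptions.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def column_frequencies(column: str, pseudocount: float = 1.0) -> dict:
    """Estimate amino acid frequencies in one motif column.

    Assumes the column holds only the 20 standard residues and that the same
    pseudocount is added for every amino acid (a simplifying assumption).
    """
    counts = Counter(column)
    total = len(column) + pseudocount * len(AMINO_ACIDS)
    return {aa: (counts.get(aa, 0) + pseudocount) / total for aa in AMINO_ACIDS}

# A column taken from only four sequences.
raw = column_frequencies("LLLI", pseudocount=0.0)
smoothed = column_frequencies("LLLI", pseudocount=1.0)
print(raw["V"], smoothed["V"])
```

Without pseudocounts, valine gets a frequency of exactly zero just because it was absent from four sequences; with them it keeps a small, non-zero frequency.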

56.	Local alignments are not appropriate when _____________ (a) There are totally similar and equal length sequences (b) Dissimilar sequences are suspected to contain regions of similarity (c) Similar sequence motif with larger sequence context (d) Partially similar, different length and conserved region containing sequences
Answer» Right choice is (a) There are totally similar and equal length sequences To explain: The given description is suitable for global alignment, which attempts to align the sequences over their entire length, unlike local alignment, where partially similar sequences are analyzed.

57.	Which of the following does not describe global alignment algorithm?(a) Score can be negative in this method(b) It is based on dynamic programming technique(c) For two sequences of length m and n, the matrix to be defined should be of dimensions m+1 and n+1(d) For two sequences of length m and n, the matrix to be defined should be of dimensions m and n
Answer» The correct answer is (d) For two sequences of length m and n, the matrix to be defined should be of dimensions m and n The explanation: For two sequences of length m and n, the matrix to be defined should be of dimensions m+1 and n+1, so that there is a leading row and column (a margin) from which scores can be added along the diagonal. The score of the alignment is then accumulated cell by cell and read at the end of the matrix.
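
The matrix dimensions can be seen in a short Python sketch of global alignment scoring by dynamic programming (the Needleman-Wunsch recurrence). The scoring values and the linear gap penalty are arbitrary choices made here for illustration, not values taken from the question.

```python
def global_alignment_score(a: str, b: str, match=1, mismatch=-1, gap=-2):
    """Fill the global alignment score matrix and return (score, matrix)."""
    m, n = len(a), len(b)
    # (m+1) x (n+1) matrix: row 0 and column 0 hold the all-gap prefixes.
    F = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        F[i][0] = i * gap
    for j in range(1, n + 1):
        F[0][j] = j * gap
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            diag = F[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            F[i][j] = max(diag, F[i - 1][j] + gap, F[i][j - 1] + gap)
    # The global score is the bottom-right cell and may be negative.
    return F[m][n], F

score, F = global_alignment_score("AAAA", "TTTT")
print(score, len(F), len(F[0]))
```

For two sequences of length 4 the matrix has 5 rows and 5 columns, and the score for two completely dissimilar sequences is negative, which matches options (a) and (c).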

58.	Which of the following statements about PANTHER and TIGRFAMs databases is incorrect regarding its features?(a) TIGRFAMs provides a tool for identifying functionally related proteins based on sequence homology(b) TIGRFAMs is a collection of protein families, featuring curated multiple sequence alignments, hidden Markov models (HMMs) and annotation(c) Hidden Markov models (HMMs) are not used in PANTHER(d) PANTHER is a large collection of protein families that have been subdivided into functionally related subfamilies, using human expertise
Answer» Right option is (c) Hidden Markov models (HMMs) are not used in PANTHER Easy explanation: In PANTHER the subfamilies model the divergence of specific functions within protein families, allowing more accurate association with function (human-curated molecular function and biological process classifications and pathway diagrams), as well as inference of amino acids important for functional specificity. Hidden Markov models (HMMs) are built for each family and subfamily for classifying additional protein sequences.

59.	Which of the following statements about ProtoNet is incorrect regarding its features? (a) Protein relatedness is defined by the P-values from the BLAST alignments (b) The most closely related sequences are grouped into the lowest level clusters (c) More distant protein groups are merged into higher levels of clusters (d) The outcome of this cluster merging is a tree-like structure of functional categories
Answer» Correct choice is (a) Protein relatedness is defined by the P-values from the BLAST alignments The best I can explain: ProtoNet is a database of clusters of homologous proteins similar to COG. Protein relatedness is defined by the E-values from the BLAST alignments. The database further provides gene ontology information for the protein clusters at each level, as well as keywords from InterPro domains for functional prediction.

60.	Which of the following is not a feature of editors and formatters?(a) provision for displaying the sequence on a color monitor with residue colors to aid in a clear visual representation of the alignment(b) recognition of the multiple sequence format that was output by the MSA (Multiple Sequence Alignment) program(c) maintenance of the alignment in a suitable format when the editing is completed(d) disallowing shading conserved residues in the alignment
Answer» The correct choice is (d) disallowing shading conserved residues in the alignment The best I can explain: In addition to the listed features, a suitable windows interface that allows use of the mouse to add, delete, or move sequence, followed by an updated display of the alignment, is also a feature. Other types of editing are commonly performed on MSAs (multiple sequence alignments) as well, for example shading conserved residues in the alignment.

61.	The overall height of a logo position reflects how conserved the position is, and the _____ of each letter in a position reflects the _______ of the residue in the alignment.(a) height, relative frequency(b) width, relative frequency(c) height, amplitude(d) width, amplitude
Answer» The correct option is (a) height, relative frequency For explanation: The overall height of a position expresses the extent of conservation of that position, and the height of each letter shows the relative frequency of that particular residue. Amplitude is not a relevant quantity here.
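
A rough Python sketch of how these heights are commonly computed: the overall height of a position is its information content in bits, and each letter takes a share of that height proportional to its relative frequency. The small-sample correction used by real logo programs is left out here, and the function name `logo_heights` is hypothetical.

```python
import math
from collections import Counter

def logo_heights(column: str, alphabet_size: int = 20):
    """Return (overall_height_in_bits, {residue: letter_height}) for one column.

    Simplified: information content = log2(alphabet size) - Shannon entropy,
    with no small-sample correction (an assumption of this sketch).
    """
    counts = Counter(column)
    n = len(column)
    freqs = {res: c / n for res, c in counts.items()}
    entropy = -sum(f * math.log2(f) for f in freqs.values())
    info = math.log2(alphabet_size) - entropy
    return info, {res: f * info for res, f in freqs.items()}

# DNA examples (alphabet of 4): fully conserved, mixed, and fully random columns.
for col in ("AAAA", "AAAC", "ACGT"):
    info, heights = logo_heights(col, alphabet_size=4)
    print(col, round(info, 3), {r: round(h, 3) for r, h in heights.items()})
```

A fully conserved DNA column reaches the maximum of 2 bits with a single tall letter, while a column with all four bases in equal proportion has a height of zero.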

62.	_______ is an interactive program for generating sequence logos.(a) EMBOSS(b) WebLogo(c) LOGOLY(d) BLAST
Answer» Correct choice is (b) WebLogo To elaborate: In WebLogo, a user needs to enter the sequence alignment in FASTA format to allow the program to compute the logos. A graphic file is returned to the user as a result.

63.	Which of the following is untrue about PRRN?(a) PRRN is a web-based program that uses a double nested iterative strategy for multiple alignment(b) It performs multiple alignments through two sets of iterations: inner iteration and outer iteration(c) In the outer iteration, an initial random alignment is generated that is used to derive a UPGMA tree(d) In the inner iteration, the sequences are randomly divided into multiple groups
Answer» Right answer is (d) In the inner iteration, the sequences are randomly divided into multiple groups To explain: In the inner iteration, the sequences are randomly divided into two groups. Randomized alignment is used for each group in the initial cycle, after which the alignment positions in each group are fixed. The two groups, each treated as a single sequence, are then aligned to each other using global dynamic programming. The process is repeated through many cycles until the total SP score no longer increases. At this point, the resulting alignment is used to construct a new UPGMA tree.

64.	Gaps are added to the alignment because it ______ (a) increases the matching of identical amino acids at subsequent portions in the alignment (b) increases the matching of dissimilar amino acids at subsequent portions in the alignment (c) reduces the overall score (d) enhances the area of the sequences
Answer» The correct choice is (a) increases the matching of identical amino acids at subsequent portions in the alignment Explanation: In the alignment process, gaps are added to the alignment in a manner that increases the matching of identical or similar amino acids at subsequent portions in the alignment. Ideally, when two similar protein sequences are aligned, the alignment should have long regions of identical or related amino acid pairs and very few gaps. As the sequences become more distant, more mismatched amino acid pairs and gaps should appear.

65.	Which of the following statements about COG is incorrect regarding its features?(a) Currently, there are 4,873 clusters in the COG databases derived from unicellular organisms(b) It is constructed by comparing protein sequences encoded in forty-three completely sequenced genomes, which are mainly from prokaryotes, representing thirty major phylogenetic lineages(c) The interface for sequence searching in the COG database is the COGnitor program, which is based on gapped BLAST(d) It is a protein family database based on structural classification
Answer» The correct option is (d) It is a protein family database based on structural classification The explanation is: COG, which stands for Clusters of Orthologous Groups, is a protein family database based on phylogenetic classification. Because orthologous proteins shared by three or more lineages are considered to have descended through a vertical evolutionary scenario, if the function of one of the members is known, the function of the other members can be assigned.

66.	Which of the following is true regarding the assumptions in the method of constructing the Dayhoff scoring matrix?(a) it is assumed that each amino acid position is equally mutable(b) it is assumed that each amino acid position is not equally mutable(c) it is assumed that each amino acid position is not mutable at all(d) sites do not vary in their degree of mutability
Answer» Correct choice is (a) it is assumed that each amino acid position is equally mutable Explanation: In this process, first, it is assumed that each amino acid position is equally mutable, whereas, in fact, sites vary considerably in their degree of mutability. Mutagenesis hot spots are well known in molecular genetics, and variations in mutability of different amino acid sites in proteins are well known.

67.	Which of the following is not a member database of InterPro?(a) SCOP(b) HAMAP(c) PANTHER(d) Pfam
Answer» Correct option is (a) SCOP To elaborate: The signatures from InterPro come from 11 member databases viz. CATH-Gene3D, HAMAP, PANTHER, Pfam, PIRSF, PRINTS, ProDom, PROSITE, SMART, SUPERFAMILY, TIGRFAMs.

68.	A length and distance that gives the highest overall probability may then be determined. Such alignments are initially found using ________(a) a particular scoring matrix only(b) an alignment algorithm only(c) an alignment algorithm and a particular scoring matrix(d) dot method
Answer» The correct answer is (c) an alignment algorithm and a particular scoring matrix Explanation: Analysis of the yeast and C. elegans genomes for such repeats has underscored the importance of using a range of DNA scoring matrices such as PAM1 to PAM120 if most repeats are to be found. The application of the above Bayesian analysis allows a determination of the probability distributions as a function of both length of the repeated region and evolutionary distance.

69.	If the two sequences share significant similarity, it is extremely ______ that the extensive similarity between the two sequences has been acquired randomly, meaning that the two sequences must have derived from a common evolutionary origin.(a) unlikely(b) possible(c) likely(d) relevant
Answer» Right answer is (a) unlikely To explain I would say: Sequence alignment provides inference for the relatedness of two sequences under study. Regions that are aligned but not identical represent residue substitutions; regions in which residues from one sequence correspond to nothing in the other represent insertions or deletions that have taken place in one of the sequences during evolution.

70.	Conserved positions have _____ residues and bigger symbols.(a) fewer(b) more(c) maximum(d) minimum
Answer» Right option is (a) fewer The explanation: The options ‘maximum’ and ‘minimum’ do not fit here, because the statement compares positions within an alignment. Conserved positions have fewer residues and bigger symbols, whereas less conserved positions have a more heterogeneous mixture of smaller symbols stacked together. In general, a sequence logo provides a clearer description of a consensus sequence.

71.	In ab initio approach, generally, when a base pairing is formed, the energy of the molecule is _________ because of attractive interactions between the two strands.(a) lowered(b) increased(c) multiplied(d) kept stable
Answer» Right answer is (a) lowered The explanation: Here, the algorithms can be designed to search for a stable RNA structure with the lowest free energy. Thus, to search for a most stable structure, ab initio programs are designed to search for a structure with the maximum number of base pairs.
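
The idea of searching for the structure with the maximum number of base pairs can be sketched with a Nussinov-style dynamic program. This is a simplified stand-in chosen for illustration: it only counts pairs and does not use the free energy parameters that actual ab initio programs rely on. The minimum hairpin loop size of 3 and the inclusion of G-U wobble pairs are assumptions of this sketch.

```python
def max_base_pairs(rna: str, min_loop: int = 3) -> int:
    """Maximum number of nested base pairs (Nussinov-style recurrence)."""
    pairs = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
    n = len(rna)
    if n == 0:
        return 0
    # N[i][j] = best pair count for the subsequence rna[i..j]; short spans stay 0.
    N = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = N[i + 1][j]                       # case 1: base i stays unpaired
            for k in range(i + min_loop + 1, j + 1):
                if (rna[i], rna[k]) in pairs:        # case 2: base i pairs with base k
                    inside = N[i + 1][k - 1]
                    outside = N[k + 1][j] if k < j else 0
                    best = max(best, 1 + inside + outside)
            N[i][j] = best
    return N[0][n - 1]

# Hypothetical hairpin: three stacked pairs closing an AAA loop.
print(max_base_pairs("GGGAAAUCC"))
```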

72.	The attractive interactions lead to _________ energy.(a) increased(b) higher(c) lower(d) no change in
Answer» Right choice is (c) lower To explain: If a base pair is next to other base pairs, the base pairs tend to stabilize each other through attractive stacking interactions between aromatic rings of the base pairs. The attractive interactions lead to even lower energy. Parameters for calculating the co-operativity of the base-pair formation have been determined and can be used for structure prediction.

73.	What is used to generate parameters for the extreme distribution?(a) The pool of alignment scores from the shuffled sequences(b) A single score of a shuffled sequence(c) The pool of alignment scores from the unshuffled sequences(d) The basic optimal score computed at the beginning of the test
Answer» The correct choice is (a) The pool of alignment scores from the shuffled sequences To explain I would say: Maximum scores are obtained through repeated shuffling. Then the pool of alignment scores from the shuffled sequences is used to generate parameters for the extreme distribution. The original alignment score is then compared against the distribution of random alignments to determine whether the score is beyond random chance.

74.	When did Smith–Waterman first describe the algorithm for local alignment?(a) 1950(b) 1970(c) 1981(d) 1925
Answer» Correct option is (c) 1981 Explanation: The algorithm was first proposed by Temple F. Smith and Michael S. Waterman in 1981. The Smith–Waterman algorithm performs local sequence alignment; that is, it determines similar regions between two nucleic acid sequences or protein sequences.

75.	Which of the following is not correct about the X-ray Crystallography?(a) In x-ray protein crystallography, proteins need to be grown into large crystals in which their positions are fixed in a repeated, ordered fashion(b) The protein crystals are illuminated with an intense x-ray beam(c) The x-rays are deflected by the electron clouds surrounding the atoms in the crystal producing a regular pattern of diffraction(d) The protein crystals are illuminated with an intense infrared beam
Answer» The correct option is (d) The protein crystals are illuminated with an intense infrared beam The explanation: The diffraction pattern is composed of thousands of tiny spots recorded on an x-ray film. The diffraction pattern can be converted into an electron density map using a mathematical procedure known as Fourier transform. Interpreting a three-dimensional structure from two-dimensional electron density maps requires solving the phases in the diffraction data.

76.	Analysis of the alignment scores of random sequences reveals that the scores follow a distribution different from the normal distribution, called the _________ (a) Gumbel equal value distribution (b) Gumbel extreme value distribution (c) Gumbel end value distribution (d) Gumbel distribution
Answer» Correct choice is (b) Gumbel extreme value distribution The best explanation: Originally, the significance of sequence alignment scores was evaluated on the basis of the assumption that alignment scores followed a normal statistical distribution. If sequences are randomly generated in a computer by a Monte Carlo or sequence shuffling method, as in generating a sequence by picking marbles representing four bases or 20 amino acids out of a bag, the distribution may look normal at first glance. On further analysis, however, the scores were found to follow the Gumbel extreme value distribution.

77.	Which of the following is not correct about FASTA? (a) It stands for FAST ALL (b) It was in fact the first database similarity search tool developed, preceding the development of BLAST (c) FASTA uses a ‘hashing’ strategy to find matches for a short stretch of identical residues with a length of k (d) The string of residues is known as blocks
Answer» Correct answer is (d) The string of residues is known as blocks The explanation: The string of residues is known as ktuples or ktups, which are equivalent to words in BLAST, but are normally shorter than the words. Typically, a ktup is composed of two residues for protein sequences and six residues for DNA sequences.
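
The hashing step can be sketched in Python as a lookup table from each ktup to its positions, followed by counting identical ktup matches per diagonal. This is only an illustration of the idea, not FASTA's actual implementation; the function names and example sequences are made up.

```python
from collections import defaultdict

def ktup_index(seq: str, k: int) -> dict:
    """Hash every k-residue word (ktup) of seq to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index

def diagonal_hits(query: str, subject: str, k: int = 2) -> dict:
    """Count identical ktup matches on each diagonal (subject pos - query pos)."""
    index = ktup_index(subject, k)
    diagonals = defaultdict(int)
    for i in range(len(query) - k + 1):
        for j in index.get(query[i:i + k], []):
            diagonals[j - i] += 1
    return dict(diagonals)

# ktup = 2, the typical value for proteins mentioned above.
print(diagonal_hits("ACDE", "XXACDE", k=2))
```

All three matching ktups fall on the same diagonal (offset 2), which is the kind of signal that marks a candidate region for alignment.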

78.	The type of algorithm that _____ predefined alignment is ______ for reasonably conserved sequences.(a) doesn’t require, more successful(b) requires, less successful(c) doesn’t require, relatively successful(d) requires, relatively successful
Answer» The correct option is (d) requires, relatively successful Best explanation: The requirement for using this type of program is an appropriate set of homologous sequences that have to be similar enough to allow accurate alignment, but divergent enough to allow covariations to be detected. If this condition is not met, correct structures cannot be inferred.

79.	BLAST uses a _______ to find matching words, whereas FASTA identifies identical matching words using the _____(a) substitution matrix, hashing procedure(b) substitution matrix, blocks(c) hashing procedure, substitution matrix(d) ktups, substitution matrix
Answer» The correct choice is (a) substitution matrix, hashing procedure Easy explanation: BLAST and FASTA have been shown to perform almost equally well in regular database searching; however, there are some notable differences between the two approaches. The major difference is in the seeding step: BLAST uses a substitution matrix to find matching words, whereas FASTA identifies identical matching words using the hashing procedure.

80.	In SW algorithm, to align two sequences of lengths of m and n _________ time is required.(a) O(mn)(b) O(m^2n)(c) O(m^2n^3)(d) O(mn^2)
Answer» Right option is (b) O(m^2n) Explanation: The Smith–Waterman algorithm is quite demanding of time. As originally described, with a general gap penalty, aligning two sequences of lengths m and n requires O(m^2n) time. With a linear or affine gap penalty, the number of calculation steps drops to O(mn).

81.	FASTA and BLAST are __________ but __________ for larger datasets.(a) faster, more sensitive(b) faster, less sensitive(c) slower, less sensitive(d) slower, more sensitive
Answer» Correct option is (b) faster, less sensitive Easy explanation: Empirical tests have indeed shown that the exhaustive method produces superior results over heuristic methods like BLAST and FASTA. However, heuristic methods are faster and more practical for searching larger datasets, at the cost of comparatively lower sensitivity.

82.	In a dot matrix, two sequences to be compared are written in the _____________ of the matrix.(a) horizontal and vertical axes(b) 2 parallel horizontal axes(c) 2 parallel vertical axes(d) horizontal axis (one preceding another)
Answer» The correct option is (a) horizontal and vertical axes To elaborate: The comparison is done by scanning each residue of one sequence for similarity with all residues in the other sequence. If a residue match is found, a dot is placed within the graph. Otherwise, the matrix positions are left blank.
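
The scanning procedure described above can be sketched in Python as follows. One sequence runs along the horizontal axis and the other along the vertical axis. No sliding window or scoring threshold is applied, which is a simplification compared with real dot plot programs.

```python
def dot_matrix(seq_x: str, seq_y: str):
    """Rows follow seq_y (vertical axis), columns follow seq_x (horizontal axis).

    A cell is True (a dot) when the two residues are identical.
    """
    return [[a == b for a in seq_x] for b in seq_y]

def render(matrix) -> str:
    """Draw dots as '*' and leave non-matching positions blank."""
    return "\n".join("".join("*" if cell else " " for cell in row) for row in matrix)

print(render(dot_matrix("GATTACA", "GATTACA")))
```

Comparing a sequence with itself puts an unbroken line of dots on the main diagonal; the remaining dots come from residues that recur elsewhere in the sequence.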

83.	For significantly aligning sequences what is the resulting structure on the plot?(a) Intercrossing lines(b) Crosses everywhere(c) Vertical lines(d) A diagonal and lines parallel to diagonal
Answer» Right option is (d) A diagonal and lines parallel to diagonal Explanation: If the sequences align well, a prominent diagonal is visible on the plot. If the alignment is imperfect, for example because of insertions or deletions, the diagonal is broken to some extent and forms shorter lines parallel to it.

84.	Which of the following is not a software for dot plot analysis?(a) SIMMI(b) DOTLET(c) DOTMATCHER(d) LALIGN
Answer» The correct option is (a) SIMMI Easiest explanation: Various software tools are currently available for dot plot analysis. Among them, SIM is used for this kind of alignment through the dot plot method; ‘SIMMI’ is a wrong rendering of its name.

85.	For palindromic sequences, what is the structure of the dot plot?(a) 2 intersecting diagonal lines at the midpoint(b) One diagonal(c) Two parallel diagonals(d) No diagonal
Answer» Right option is (a) 2 intersecting diagonal lines at the midpoint To elaborate: For perfectly aligned sequences, the dot plot shows a single diagonal. For palindromic sequences, i.e. sequences that are symmetrical about their midpoint, there are 2 intersecting diagonals on the plot.

86.	Software tools for dot plot analysis perform several tasks. Which one of the following is not performed by them? (a) Gap open penalty (b) Gap extend penalty (c) Expectation threshold (d) Change or mutate residues
Answer» Right answer is (d) Change or mutate residues Easiest explanation: The gap penalties mentioned above are used to determine the score of the aligned sequences. Residues are not changed, since other software exists for that purpose and the main objective here is to find the score of the alignment.

87.	In sequence alignment by BLAST, each word from query sequence is typically _______ residues for protein sequences and _______ residues for DNA sequences.(a) ten, eleven(b) three, three(c) three, eleven(d) three, ten
Answer» The correct option is (c) three, eleven For explanation: The first step is to create a list of words from the query sequence. Each word is typically three residues for protein sequences and eleven residues for DNA sequences. The list includes every possible word extracted from the query sequence. This step is also called seeding.

88.	In Rosetta, the segments with assigned _______ structures are subsequently assembled into a ______ dimensional configuration. (a) primary, three (b) secondary, three (c) secondary, two (d) primary, three
Answer» Correct answer is (b) secondary, three To explain I would say: Through random combinations of the fragments, a large number of models are built and their overall energy potentials calculated. The conformation with the lowest global free energy is chosen as the best model.

89.	Which of the following is not correct about BLAST? (a) The BLAST web server has been designed in such a way as to simplify the task of program selection (b) The programs are organized based on the type of query sequences (c) The programs are organized based on the type of nucleotide sequences, or nucleotide sequence to be translated (d) BLAST is not based on heuristic searching methods
Answer» Correct choice is (d) BLAST is not based on heuristic searching methods The explanation: BLAST and FASTA are based on heuristic searching methods. In addition, programs for special purposes are grouped separately; for example, bl2seq, immunoglobulin BLAST, and VecScreen, a program for removing contaminating vector sequences.

90.	Which of the following is incorrect about protein structure comparison?(a) The comparative approach is important in finding remote protein homologs(b) Protein structures have a much higher degree of conservation than the sequences(c) Protein structures have a much lesser degree of conservation than the sequences(d) Proteins can share common structures even without sequence similarity
Answer» The correct choice is (c) Protein structures have a much lesser degree of conservation than the sequences To explain I would say: Structure comparison is one of the fundamental techniques in protein structure analysis. Structure comparison can often reveal distant evolutionary relationships between proteins, which is not feasible using the sequence-based alignment approach alone. In addition, protein structure comparison is a prerequisite for protein structural classification into different fold classes.

91.	Which of the following is untrue regarding PHD?(a) It stands for Profile network from Heidelberg(b) It is a web-based program that combines neural network only(c) It first performs a BLASTP of the query sequence against a non redundant protein sequence database(d) In initial steps it finds a set of homologous sequences, which are aligned with the MAXHOM program (a weighted dynamic programming algorithm performing global alignment)
Answer» Correct answer is (b) It is a web-based program that combines neural network only The best explanation: It is a web-based program that combines neural network with multiple sequence alignment. After the initial steps, the resulting alignment in the form of a profile is fed into a neural network that contains three hidden layers. The first hidden layer makes raw prediction based on the multiple sequence alignment by sliding a window of thirteen positions.

92.	Which of the following is untrue about CLUSTAL program?(a) CLUSTAL performs a global-multiple sequence alignment by a different method than MSA (Multiple Sequence Alignment)(b) The initial heuristic alignment obtained by MSA is calculated in a different way(c) The initial step includes performing pair-wise alignments of all of the sequences(d) The intermediate step includes use the alignment scores to produce a phylogenetic tree
Answer» The correct option is (b) The initial heuristic alignment obtained by MSA is calculated in a different way Explanation: CLUSTAL performs a global multiple sequence alignment by a different method than MSA (Multiple Sequence Alignment), although the initial heuristic alignment obtained by MSA is calculated the same way. The options mentioned describe the first two steps; the last step is aligning the sequences sequentially, guided by the phylogenetic relationships indicated by the tree.

93.	Which of the following is untrue about homology modeling?(a) Homology modeling predicts protein structures based on sequence homology with known structures(b) It is also known as comparative modeling(c) The principle behind it is that if two proteins share a high enough sequence similarity, they are likely to have very similar three-dimensional structures(d) It doesn’t involve the evolutionary distances anywhere
Answer» The correct option is (d) It doesn’t involve the evolutionary distances anywhere The explanation: As the name suggests, homology modeling predicts protein structures based on sequence homology with known structures. Homology modeling produces an all-atom model based on alignment with template proteins.

94.	Which of the following is untrue about the PRSS program?(a) It stands for Probability of Random Shuffles(b) It is a web-based program that can be used to evaluate the statistical significance of DNA or protein sequence alignment(c) It first aligns two sequences using the Needleman-Wunsch algorithm and calculates the score(d) It holds one sequence in its original form and randomizes the order of residues in the other sequence.
Answer» Right option is (c) It first aligns two sequences using the Needleman-Wunsch algorithm and calculates the score To explain: It first aligns two sequences using the Smith–Waterman algorithm and calculates the score. It then holds one sequence in its original form and shuffles the other. The shuffled sequence is realigned with the unshuffled sequence, and the resulting alignment score is recorded. This process is iterated many (normally 1,000) times to generate data for fitting the Gumbel distribution.
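
The procedure can be approximated in Python as below. This is a sketch in the spirit of PRSS, not its actual code: the scoring values, the linear gap penalty, and the method-of-moments fit of the Gumbel parameters are all assumptions made for illustration.

```python
import math
import random
import statistics

def sw_score(a: str, b: str, match=2, mismatch=-1, gap=-2) -> int:
    """Best Smith-Waterman local alignment score with a linear gap penalty."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best

def shuffle_test(a: str, b: str, n_shuffles: int = 1000, seed: int = 0):
    """Keep `a` fixed, shuffle `b`, and fit a Gumbel distribution to the scores."""
    rng = random.Random(seed)
    observed = sw_score(a, b)
    residues = list(b)
    scores = []
    for _ in range(n_shuffles):
        rng.shuffle(residues)                    # randomize residue order only
        scores.append(sw_score(a, "".join(residues)))
    # Method-of-moments estimates of the Gumbel scale (beta) and location (mu).
    beta = statistics.stdev(scores) * math.sqrt(6) / math.pi
    mu = statistics.mean(scores) - 0.5772156649 * beta
    if beta == 0:
        p_value = 1.0 if observed <= mu else 0.0
    else:
        z = (observed - mu) / beta
        # P(score >= observed) under the fitted extreme value distribution.
        p_value = 1.0 - math.exp(-math.exp(min(-z, 700.0)))
    return observed, mu, beta, p_value

# Hypothetical sequences: aligning a sequence with itself should look significant.
seq = "MKVLATGHWE" * 2
observed, mu, beta, p = shuffle_test(seq, seq, n_shuffles=200, seed=1)
print(observed, round(mu, 2), round(beta, 2), p)
```

Shuffling preserves the composition and length of the second sequence, so the fitted distribution reflects what scores arise by chance for sequences of that makeup.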

95.	Which of the following is not a drawback of the progressive alignment method? (a) The progressive alignment method is not suitable for comparing sequences of different lengths because it is a global alignment–based method (b) Because of the use of affine gap penalties, long gaps are not allowed, and, in some cases, this may limit the accuracy of the method (c) Because of the use of affine gap penalties, long gaps are allowed, and, in some cases, this may limit the accuracy of the method (d) The final alignment result is also influenced by the order of sequence addition
Answer» Correct choice is (c) Because of the use of affine gap penalties, long gaps are allowed, and, in some cases, this may limit the accuracy of the method To explain I would say: Another major limitation is the “greedy” nature of the algorithm: it depends on the initial pairwise alignments. Once gaps are introduced in the early steps of alignment, they are fixed, so the final alignment could be far from optimal. The problem can be more glaring when dealing with divergent sequences.

96.	Which of the following is untrue regarding the progressive alignment method?(a) The program also applies a weighting scheme to increase the reliability of aligning divergent sequences (sequences with less than 25% identity)(b) The weighting is done by down-weighting redundant and closely related groups of sequences in the alignment by a certain factor(c) This scheme is useful in enhancing similar sequences from dominating the alignment(d) This scheme is useful in preventing similar sequences from dominating the alignment
Answer» The correct answer is (c) This scheme is useful in enhancing similar sequences from dominating the alignment. To explain: This scheme is useful in preventing, not enhancing, similar sequences from dominating the alignment. Further, the weight factor for each sequence is determined by its branch length on the guide tree. The branch lengths are normalized by how many times sequences share a basal branch from the root of the tree.
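
To make the branch-length weighting concrete, here is a minimal sketch under one reading of the explanation above: each branch length is divided by the number of sequences below that branch, and a sequence's weight is the sum of those shares along its path from the root. The tuple layout `(label, branch_length, children)`, the function names, and the example tree with its branch lengths are all hypothetical.

```python
def leaf_names(node):
    """Return the names of all sequences (leaves) under a node."""
    label, _length, children = node
    if not children:
        return [label]
    names = []
    for child in children:
        names.extend(leaf_names(child))
    return names

def sequence_weights(node, inherited=0.0, weights=None):
    """Sum each branch length, shared equally among the leaves below it."""
    if weights is None:
        weights = {}
    label, length, children = node
    share = inherited + length / len(leaf_names(node))
    if not children:
        weights[label] = share
    for child in children:
        sequence_weights(child, share, weights)
    return weights

# Hypothetical guide tree: A is divergent, B and C are close relatives.
tree = ("root", 0.0, [
    ("A", 0.4, []),
    ("X", 0.2, [("B", 0.1, []), ("C", 0.1, [])]),
])

print(sequence_weights(tree))
```

In this toy tree the closely related pair B and C each end up with a smaller weight than the divergent sequence A, which is the down-weighting effect the explanation refers to.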

Discussion

97.	Which of the following is untrue about Backbone Model Building Step?(a) Once optimal alignment is achieved, residues in the aligned regions of the target protein can assume a similar structure as the template proteins(b) Coordinates of the corresponding residues of the template proteins can be simply copied onto the target protein(c) If the two residues differ, everything other than the backbone atoms can be copied(d) If the two aligned residues are identical, coordinates of the side chain atoms are copied along with the main chain atoms
Answer» Correct choice is (c) If the two residues differ, everything other than the backbone atoms can be copied. The best I can explain: Options (a) and (b) describe the same step: residues in the aligned regions of the target can assume a structure similar to the template, so the template coordinates are simply copied onto the target. If the two residues differ, only the backbone atoms can be copied. The side chain atoms are rebuilt in a subsequent procedure. In backbone modeling, it is simplest to use only one template structure. The structure with the best quality and highest resolution is normally chosen if multiple options are available.
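
The copying rule above can be illustrated with a short sketch. It assumes an ungapped aligned region, so target and template residues pair up one to one, and it uses a made-up in-memory representation (residue name plus a dictionary of atom coordinates) rather than any real structure-file parser. The coordinates are placeholders.

```python
BACKBONE_ATOMS = {"N", "CA", "C", "O"}

def build_backbone(target_residues, template_residues):
    """Copy template coordinates onto the target for an aligned region.

    target_residues: list of residue names for the target.
    template_residues: list of (residue name, {atom name: (x, y, z)}).
    """
    model = []
    for target_name, (template_name, atoms) in zip(target_residues, template_residues):
        if target_name == template_name:
            # Identical residues: side chain atoms are copied with the main chain.
            copied = dict(atoms)
        else:
            # Different residues: keep only the backbone; side chains are rebuilt later.
            copied = {name: xyz for name, xyz in atoms.items() if name in BACKBONE_ATOMS}
        model.append((target_name, copied))
    return model

# Placeholder template fragment with invented coordinates.
template = [
    ("SER", {"N": (0.0, 0.0, 0.0), "CA": (1.5, 0.0, 0.0), "C": (2.0, 1.4, 0.0),
             "O": (1.2, 2.3, 0.0), "CB": (2.0, -0.8, 1.2), "OG": (3.4, -0.8, 1.3)}),
    ("LEU", {"N": (3.3, 1.6, 0.0), "CA": (3.9, 2.9, 0.0), "C": (5.4, 2.8, 0.0),
             "O": (6.0, 1.7, 0.0), "CB": (3.5, 3.7, 1.2), "CG": (4.0, 5.1, 1.2)}),
]
model = build_backbone(["SER", "VAL"], template)
for name, atoms in model:
    print(name, sorted(atoms))
```

The identical SER keeps its side chain atoms, while the VAL that replaces LEU receives only the four backbone atoms.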

Discussion

98.	Which of the following is untrue about specialized programs for loop modeling?(a) PETRA is a web server that models loops using the database approach(b) FREAD is a web server that models loops using the database approach(c) CODA is a web server that uses a consensus method based on the prediction results from FREAD and PETRA(d) For loops of three to eight residues, CODA uses consensus conformation of both methods
Answer» Right answer is (a) PETRA is a web server that models loops using the database approach To explain I would say: PETRA is a web server that uses the ab initio method to model loops. For nine to thirty residues, CODA uses FREAD prediction only.

Discussion

99.	Which of the following is untrue about Sequence Alignment Step?(a) Once the structure with the highest sequence similarity is identified as a template, the full-length sequences of the template and target proteins need to be realigned using refined alignment algorithms to obtain optimal alignment(b) The realignment is the most critical step in homology modeling(c) The realignment directly affects the quality of the final model(d) Errors made in the alignment step can be corrected in the following modeling steps
Answer» Right choice is (d) Errors made in the alignment step can be corrected in the following modeling steps. Best explanation: Incorrect alignment at this stage leads to incorrect designation of homologous residues and therefore to incorrect structural models. Errors made in the alignment step cannot be corrected in the following modeling steps. Therefore, the best possible multiple alignment algorithms, such as Praline and T-Coffee, should be used for this purpose.

Discussion

100.	Which of the following is not correct about the Coils and Loops?(a) They are regular structures(b) They are irregular structures(c) The loops are often characterized by sharp turns or hairpin-like structures(d) If the connecting regions are completely irregular, they belong to random coils
Answer» The correct choice is (a) They are regular structures To explain I would say: Residues in the loop or coil regions tend to be charged and polar and located on the surface of the protein structure. They are often the evolutionarily variable regions where mutations, deletions, and insertions frequently occur. They can be functionally significant because these locations are often the active sites of proteins.

Discussion

In regular expressions, which of the following pair of pattern is wrongly matched with its significance?(a) [ ] – Or(b) { } – Not(c) ( ) – Repeats(d) Z – Any

Which of the following wrongly describes protein domains?(a) They are made up of one secondary structure(b) Defined as independently foldable units(c) They are stable structures as compared to motifs(d) They are separated by linker regions

If the data set is _______ then unless the motif has __________ amino acids in each column, the column frequencies in the motif may not be highly representative of all other occurrences of the motif.(a) small, distinct(b) small, almost identical(c) large, almost identical(d) large, distinct

The overall height of a logo position reflects how conserved the position is, and the _____ of each letter in a position reflects the _______ of the residue in the alignment.(a) height, relative frequency(b) width, relative frequency(c) height, amplitude(d) width, amplitude

_______ is an interactive program for generating sequence logos.(a) EMBOSS(b) WebLogo(c) LOGOLY(d) BLAST

Which of the following is not a member database of InterPro?(a) SCOP(b) HAMAP(c) PANTHER(d) Pfam

A length and distance that gives the highest overall probability may then be determined. Such alignments are initially found using ________(a) a particular scoring matrix only(b) an alignment algorithm only(c) an alignment algorithm and a particular scoring matrix(d) dot method

If the two sequences share significant similarity, it is extremely ______ that the extensive similarity between the two sequences has been acquired randomly, meaning that the two sequences must have derived from a common evolutionary origin.(a) unlikely(b) possible(c) likely(d) relevant

Conserved positions have _____ residues and bigger symbols.(a) fewer(b) more(c) maximum(d) minimum

In ab initio approach, generally, when a base pairing is formed, the energy of the molecule is _________ because of attractive interactions between the two strands.(a) lowered(b) increased(c) multiplied(d) kept stable

The attractive interactions lead to _________ energy.(a) increased(b) higher(c) lower(d) no change in

What is used to generate parameters for the extreme distribution?(a) The pool of alignment scores from the shuffled sequences(b) A single score of a shuffled sequence(c) The pool of alignment scores from the unshuffled sequences(d) The basic optimal score computed at the beginning of the test

When did Smith and Waterman first describe the algorithm for local alignment?(a) 1950(b) 1970(c) 1981(d) 1925

Analysis of the alignment scores of random sequences will reveal that the scores follow a distribution different from the normal distribution, called the _________(a) Gumbel equal value distribution(b) Gumbel extreme value distribution(c) Gumbel end value distribution(d) Gumbel distribution

The type of algorithm that _____ predefined alignment is ______ for reasonably conserved sequences.(a) doesn’t require, more successful(b) requires, less successful(c) doesn’t require, relatively successful(d) requires, relatively successful

BLAST uses a _______ to find matching words, whereas FASTA identifies identical matching words using the _____(a) substitution matrix, hashing procedure(b) substitution matrix, blocks(c) hashing procedure, substitution matrix(d) ktups, substitution matrix

In SW algorithm, to align two sequences of lengths of m and n _________ time is required.(a) O(mn)(b) O(m^2n)(c) O(m^2n^3)(d) O(mn^2)

FASTA and BLAST are __________ but __________ for larger datasets.(a) faster, more sensitive(b) faster, less sensitive(c) slower, less sensitive(d) slower, more sensitive

In a dot matrix, two sequences to be compared are written in the _____________ of the matrix.(a) horizontal and vertical axes(b) 2 parallel horizontal axes(c) 2 parallel vertical axes(d) horizontal axis (one preceding another)

For significantly aligning sequences what is the resulting structure on the plot?(a) Intercrossing lines(b) Crosses everywhere(c) Vertical lines(d) A diagonal and lines parallel to diagonal

Which of the following is not a software for dot plot analysis?(a) SIMMI(b) DOTLET(c) DOTMATCHER(d) LALIGN

For palindromic sequences, what is the structure of the dot plot?(a) 2 intersecting diagonal lines at the midpoint(b) One diagonal(c) Two parallel diagonals(d) No diagonal

Software for dot plot analysis performs several tasks. Which one of the following is not performed by it?(a) Gap open penalty(b) Gap extend penalty(c) Expectation threshold(d) Change or mutate residues

In sequence alignment by BLAST, each word from query sequence is typically _______ residues for protein sequences and _______ residues for DNA sequences.(a) ten, eleven(b) three, three(c) three, eleven(d) three, ten

In Rosetta, the segments with assigned _______ structures are subsequently assembled into a ______ dimensional configuration.(a) primary, three(b) secondary, three(c) secondary, two(d) primary, three

Which of the following is not correct about the Coils and Loops?(a) They are regular structures(b) They are irregular structures(c) The loops are often characterized by sharp turns or hairpin-like structures(d) If the connecting regions are completely irregular, they belong to random coils
