40+ Interview Questions on Gene and Promoter Prediction in Bioinformatics

1.	Which of the following is true regarding the methods of gene prediction?
(a) They consist solely of ab initio–based methods
(b) The ab initio–based approach predicts genes based on the given sequence alone
(c) The ab initio–based approach predicts genes based on the given sequence and relative homology data
(d) They consist solely of homology-based approaches
Topic: Categories of Gene Prediction Programs
Answer» (b) The ab initio–based approach predicts genes based on the given sequence alone.
Explanation: Current gene prediction methods fall into two major categories, ab initio–based and homology-based approaches. The ab initio–based approach predicts genes based on the given sequence alone.

2.	The putative ORF can be translated into a protein sequence, which is then used to search against a protein database.
(a) True
(b) False
Topic: Gene Prediction in Prokaryotes
Answer» (a) True.
Explanation: The putative frame is further confirmed manually by the presence of other signals, such as a start codon and a Shine–Dalgarno sequence. Detection of homologs from the database search is probably the strongest indicator of a protein-coding frame.

3.	Which of the following is untrue about EST2Genome?
(a) It is a web-based program purely based on the sequence alignment approach to define intron–exon boundaries
(b) It compares an EST (or cDNA) sequence with a genomic DNA sequence containing the corresponding gene
(c) The alignment is rarely done using a dynamic programming–based algorithm
(d) An advantage of the approach is the ability to find very small exons and alternatively spliced exons that are very difficult to predict with any ab initio–type algorithm
Topic: Gene Prediction in Eukaryotes
Answer» (c) The alignment is rarely done using a dynamic programming–based algorithm.
Explanation: The alignment is done using a dynamic programming–based algorithm. Another advantage is that no model training is needed, which provides much more flexibility for gene prediction. The limitation is that EST or cDNA sequences often contain errors, or even introns if the transcripts are not completely spliced before reverse transcription.

4.	GENSCAN is a web-based program that makes predictions based on fifth-order HMMs.
(a) True
(b) False
Topic: Gene Prediction in Eukaryotes
Answer» (a) True.
Explanation: GENSCAN combines hexamer frequencies with coding signals (initiation codons, TATA box, cap site, poly-A, etc.) in prediction. Putative exons are assigned a probability score (P) of being a true exon, and only predictions with P > 0.5 are deemed reliable. The program is trained for sequences from vertebrates, Arabidopsis, and maize, and it has been used extensively in annotating the human genome.

5.	The use of Markov models in gene finding exploits the fact that oligonucleotide distributions in the coding regions are different from those for the noncoding regions.
(a) True
(b) False
Topic: Gene Prediction in Prokaryotes
Answer» (a) True.
Explanation: These distributions can be represented with Markov models of various orders. A fixed-order Markov chain describes the probability of a particular nucleotide given the previous k nucleotides, so the longer the oligomer unit, the more non-randomness can be described for the coding region. In principle, therefore, a higher-order Markov model can predict a gene more accurately, provided enough training sequence is available to estimate its parameters.

6.	The presence of candidate start codons at the beginning of the frame _____ give a clear indication of the translation initiation site.
(a) always
(b) does not necessarily
(c) does not
(d) never
Topic: Gene Prediction in Prokaryotes
Answer» (b) does not necessarily.
Explanation: Because there may be multiple ATG, GTG, or TTG codons in a frame, the presence of these codons at the beginning of the frame does not necessarily give a clear indication of the translation initiation site. Instead, other features associated with translation are used to help identify the initiation codon.

7.	The ab initio–based approaches rely on two major features associated with genes, one of which is gene content, a statistical description of coding regions.
(a) True
(b) False
Topic: Categories of Gene Prediction Programs
Answer» (a) True.
Explanation: It has been observed that the nucleotide composition and statistical patterns of coding regions tend to vary significantly from those of non-coding regions. These unique features can be detected with probabilistic models, such as Markov models or hidden Markov models, to help distinguish coding from non-coding regions.

8.	RBSfinder is a UNIX program that uses the prediction output from Glimmer and searches for Shine–Dalgarno sequences in the vicinity of predicted start sites.
(a) True
(b) False
Topic: Categories of Gene Prediction Programs
Answer» (a) True.
Explanation: If a high-scoring site is found by the intrinsic probabilistic model, the start codon is confirmed. Otherwise, the program moves to other putative translation start sites and repeats the process.

9.	Which of the following is untrue about homology-based programs?
(a) They are based on the fact that exon structures and exon sequences of related species are less conserved
(b) This approach assumes that the database sequences are correct
(c) This is a reasonable assumption, given that many of the homologous sequences used for comparison are derived from cDNA or expressed sequence tags (ESTs) of the same species
(d) Potential coding frames in a query sequence are translated and aligned with the closest protein homologs found in databases
Topic: Gene Prediction in Eukaryotes
Answer» (a) They are based on the fact that exon structures and exon sequences of related species are less conserved.
Explanation: Homology-based programs are based on the fact that exon structures and exon sequences of related species are highly conserved. When potential coding frames in a query sequence are translated and aligned with the closest protein homologs found in databases, near-perfectly matched regions can be used to reveal the exon boundaries in the query.

10.	Which of the following is a wrong statement?
(a) Prokaryotes include bacteria and Archaea
(b) Prokaryotes have relatively large genomes
(c) Prokaryotes have relatively small genomes
(d) In prokaryotes, the gene density in the genomes is high, with more than 90% of a genome sequence containing coding sequence
Topic: Gene Prediction in Prokaryotes
Answer» (b) Prokaryotes have relatively large genomes.
Explanation: Prokaryotes have relatively small genomes, with sizes ranging from 0.5 to 10 Mbp (1 Mbp = 10^6 bp). Each prokaryotic gene is composed of a single contiguous ORF coding for a single protein or RNA, with no interruptions within the gene.

11.	Which of the following is wrong about HMMgene?
(a) It is also an HMM-based web program
(b) It uses a criterion called the conditional maximum likelihood to discriminate coding from non-coding features
(c) HMM prediction is unbiased toward the locked region
(d) If a sequence already has a sub-region identified as a coding region, which may be based on similarity with cDNAs or proteins in a database, that region is locked as a coding region
Topic: Gene Prediction in Eukaryotes
Answer» (c) HMM prediction is unbiased toward the locked region.
Explanation: An HMM prediction is subsequently made with a bias toward the locked region and is extended from the locked region to predict the rest of the gene coding regions and even neighboring genes. The program is in a way a hybrid algorithm that uses both ab initio–based and homology-based criteria.

12.	Because different prediction programs have different levels of sensitivity and specificity, it makes sense to combine the results of multiple programs based on consensus. This idea has prompted the development of consensus-based algorithms.
(a) True
(b) False
Topic: Gene Prediction in Eukaryotes
Answer» (a) True.
Explanation: These programs work by retaining common predictions agreed on by most programs and removing inconsistent predictions. Such an integrated approach may improve specificity by correcting false positives and the problem of overprediction. However, since this procedure punishes novel predictions, it may lead to lowered sensitivity and missed predictions.
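The consensus idea in question 12 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the program names, the exon coordinates, and the rule "keep an exon reported by at least `min_votes` programs with identical boundaries" are hypothetical choices made here, not the method of any specific consensus tool.

```python
from collections import Counter

def consensus_exons(predictions, min_votes):
    """Keep exons (start, end) reported by at least `min_votes` programs.

    `predictions` maps a program name to its list of predicted exons.
    Exact boundary agreement is assumed here for simplicity.
    """
    votes = Counter()
    for exons in predictions.values():
        votes.update(set(exons))  # each program votes once per exon
    return sorted(exon for exon, n in votes.items() if n >= min_votes)

# Hypothetical outputs from three gene finders on the same sequence.
predictions = {
    "program_a": [(100, 250), (400, 520), (900, 1010)],
    "program_b": [(100, 250), (400, 520)],
    "program_c": [(100, 250), (700, 780)],
}

majority = consensus_exons(predictions, min_votes=2)
print(majority)
```

With a majority threshold, the exons predicted by only one program, (700, 780) and (900, 1010), are dropped. If either were a real but novel exon, it would be missed, which is the loss of sensitivity the explanation describes.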

13.	Which of the following is untrue about transcription and transcript processing?
(a) The first event is capping at the 5′ end of the transcript, which involves methylation at the initial residue of the RNA
(b) The splicing process involves a large RNA-protein complex called the spliceosome
(c) The second event is splicing, which is the process of removing exons and joining introns
(d) The second event is splicing, which is the process of removing introns and joining exons
Topic: Gene Prediction in Eukaryotes
Answer» (c) The second event is splicing, which is the process of removing exons and joining introns.
Explanation: Splicing removes introns and joins exons. The reaction requires intermolecular interactions between a pair of nucleotides at each end of an intron and the RNA component of the spliceosome. To make matters even more complex, some eukaryotic genes can have their transcripts spliced and joined in different ways to generate more than one transcript per gene. This is the phenomenon of alternative splicing.

14.	In ab initio–based programs for gene prediction, gene content refers to coding statistics, which include nonrandom nucleotide distribution, amino acid distribution, synonymous codon usage, and hexamer frequencies.
(a) True
(b) False
Topic: Gene Prediction in Eukaryotes
Answer» (a) True.
Explanation: Among these features, hexamer frequencies appear to be the most discriminative for coding potential. To derive an assessment for this feature, HMMs can be used, which require proper training. In addition to HMMs, neural network–based algorithms are also common in the gene prediction field.

15.	TwinScan is also a similarity-based gene-finding server, and it is similar to GenomeScan in that it uses GENSCAN to predict all possible exons from the genomic sequence.
(a) True
(b) False
Topic: Gene Prediction in Eukaryotes
Answer» (a) True.
Explanation: The putative exons are used for BLAST searching to find the closest homologs. The putative exons and the homologs from BLAST searching are aligned to identify the best match. Only the closest match from a genome database is used as a template for refining the previous exon selection and exon boundaries.

16.	The drawback of the homology-based approach is its reliance on the presence of homologs in databases.
(a) True
(b) False
Topic: Gene Prediction in Eukaryotes
Answer» (a) True.
Explanation: If homologs are not available in the database, the method cannot be used. Novel genes in a new species cannot be discovered without matches in the database. A number of publicly available programs use this approach.

17.	Which of the following is untrue?
(a) Eukaryotic nuclear genomes are much larger than prokaryotic ones
(b) They tend to have a very high gene density
(c) Eukaryotic nuclear genome sizes range from 10 Mbp to 670 Gbp (1 Gbp = 10^9 bp)
(d) They tend to have a very low gene density
Topic: Gene Prediction in Eukaryotes
Answer» (b) They tend to have a very high gene density.
Explanation: Eukaryotic genomes tend to have a very low gene density. In humans, for instance, only about 3% of the genome codes for genes, with about 1 gene per 100 kbp on average. The space between genes is often very large and rich in repetitive sequences and transposable elements.

18.	Which of the following is a wrong statement regarding the conventional determination of open reading frames?
(a) Without the use of specialized programs, prokaryotic gene identification can rely on manual determination of ORFs and major signals related to prokaryotic genes
(b) Prokaryotic DNA is first subject to conceptual translation in all six possible frames, two frames forward and four frames reverse
(c) A stop codon occurs in about every twenty codons by chance in a noncoding region
(d) Prokaryotic DNA is first subject to conceptual translation in all six possible frames, three frames forward and three frames reverse
Topic: Gene Prediction in Prokaryotes
Answer» (b) Prokaryotic DNA is first subject to conceptual translation in all six possible frames, two frames forward and four frames reverse.
Explanation: Prokaryotic DNA is first subject to conceptual translation in all six possible frames, three frames forward and three frames reverse. Because a stop codon occurs in about every twenty codons by chance in a noncoding region, a frame longer than thirty codons without interruption by stop codons is suggestive of a gene coding region, although the threshold for an ORF is normally set even higher, at fifty or sixty codons.
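The six-frame scan in question 18 can be sketched as follows. This is a minimal illustration, not the procedure of any particular program: the function names, the 0-based coordinates on the scanned strand, and the default threshold of 30 codons (taken from the explanation) are assumptions made here. For context, 3 of the 64 codons are stops, so with uniform random base composition a stop is expected about once every 64/3 ≈ 21 codons, consistent with the "about every twenty codons" figure.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def open_frames(seq, min_codons=30):
    """Yield (strand, frame, start, end, n_codons) for every stop-free
    stretch of at least `min_codons` codons in the six reading frames.

    Coordinates are 0-based on the scanned strand, and `end` is exclusive.
    """
    seq = seq.upper()
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            run_start = frame
            for i in range(frame, len(s) - 2, 3):
                if s[i:i + 3] in STOP_CODONS:
                    n = (i - run_start) // 3
                    if n >= min_codons:
                        yield (strand, frame, run_start, i, n)
                    run_start = i + 3  # next stretch begins after the stop
            # Stretch that runs to the end of the sequence without a stop.
            last = frame + ((len(s) - frame) // 3) * 3
            n = (last - run_start) // 3
            if n >= min_codons:
                yield (strand, frame, run_start, last, n)

# Toy sequence: a start codon, 40 alanine codons, and a stop codon.
toy = "ATG" + "GCT" * 40 + "TAA"
frames = list(open_frames(toy, min_codons=30))
for hit in frames:
    print(hit)
```

On a sequence this short, several frames pass the 30-codon threshold purely by chance, which illustrates why the explanation recommends a higher cutoff and why additional signals such as start codons are needed to pick the real frame.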

19.	Which of the following is untrue about FGENES?
(a) It stands for FindGenes
(b) It is a web-based program that uses LDA
(c) It is used to determine whether a signal is an exon
(d) It does not make use of HMMs
Topic: Gene Prediction in Eukaryotes
Answer» (d) It does not make use of HMMs.
Explanation: In addition to FGENES, there are many variants of the program. Some, such as FGENESH, make use of HMMs. Others, such as FGENESH_C, are similarity based. Some, such as FGENESH+, combine both ab initio and similarity-based approaches.

20.	GRAIL is a web-based program based on a neural network algorithm that is trained on several statistical features such as splice junctions, start and stop codons, poly-A sites, promoters, and CpG islands.
(a) True
(b) False
Topic: Gene Prediction in Eukaryotes
Answer» (a) True.
Explanation: The program scans the query sequence with windows of variable lengths, scores them for coding potential, and finally produces a list of exon candidates as output. The program is currently trained for human, mouse, Arabidopsis, Drosophila, and Escherichia coli sequences.

21.	There are ____ possible stop codons, identification of which is straightforward.
(a) five
(b) two
(c) ten
(d) three
Topic: Gene Prediction in Prokaryotes
Answer» (d) three.
Explanation: At the end of the protein coding region is a stop codon that causes translation to stop. There are three possible stop codons, identification of which is straightforward. Many prokaryotic genes are transcribed together as one operon.

22.	In bacteria, the majority of genes have the start codon ATG (or AUG in mRNA; because prediction is done at the DNA level, T is used in place of U), which codes for methionine.
(a) True
(b) False
Topic: Gene Prediction in Prokaryotes
Answer» (a) True.
Explanation: Occasionally, GTG and TTG are used as alternative start codons, but methionine is still the amino acid inserted at the first position.

23.	The conventional methods for determining open reading frames identify only typical genes and tend to miss atypical genes in which the rule of codon bias is not strictly followed.
(a) True
(b) False
Topic: Gene Prediction in Prokaryotes
Answer» (a) True.
Explanation: These statistical methods, which are based on empirical rules, examine the statistics of a single nucleotide (either G or C). To improve prediction accuracy, the new generation of prediction algorithms uses more sophisticated statistical models.

24.	The homology-based method makes predictions based on significant matches of the query sequence with sequences of known genes.
(a) True
(b) False
Topic: Categories of Gene Prediction Programs
Answer» (a) True.
Explanation: For instance, if a translated DNA sequence is found to be similar to a known protein or protein family from a database search, this can be strong evidence that the region codes for a protein. Alternatively, when possible exons of a genomic DNA region match a sequenced cDNA, this also provides experimental evidence for the existence of a coding region.

25.	Which of the following is untrue about GeneMark?
(a) It is a suite of gene prediction programs based on fifth-order HMMs
(b) The main program is trained on a number of complete microbial genomes
(c) A GeneMark heuristic program can be used to improve accuracy
(d) If the sequence to be predicted is from a non-listed organism, the most closely related organism can be chosen as the basis for computation
Topic: Categories of Gene Prediction Programs
Answer» (c) A GeneMark heuristic program can be used to improve accuracy.
Explanation: Another option for predicting genes from a new organism is to use the self-trained program GeneMarkS, as long as the user can provide at least 100 kbp of sequence on which to train the model. If the query sequence is shorter than 100 kbp, a GeneMark heuristic program can be used, with some loss of accuracy. In addition to predicting prokaryotic genes, GeneMark also has a variant for eukaryotic gene prediction using HMMs.

26.	Most vertebrate genes use __________ as the translation start codon and have a uniquely conserved flanking sequence called a Kozak sequence (CCGCCATGG).
(a) AAG
(b) ATG
(c) AUG
(d) AGG
Topic: Gene Prediction in Eukaryotes
Answer» (b) ATG.
Explanation: In addition, most of these genes have a high density of CG dinucleotides near the transcription start site. This region is referred to as a CpG island (p refers to the phosphodiester bond connecting the two nucleotides), which helps to identify the transcription initiation site of a eukaryotic gene. The poly-A signal can also help locate the final coding sequence.
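The "high density of CG dinucleotides" in question 26 can be quantified with a short Python sketch. The observed/expected CpG ratio used below is a widely used measure, but it and the function name are brought in here as assumptions; the quiz itself does not specify how CpG islands are scored, and no cutoff values are implied.

```python
def cpg_stats(window):
    """Return (gc_fraction, cpg_observed_over_expected) for a DNA window.

    Observed/expected = (number of CG dinucleotides * window length)
                        / (number of C * number of G).
    Returns 0.0 for the ratio when the window has no C or no G.
    """
    window = window.upper()
    n = len(window)
    c = window.count("C")
    g = window.count("G")
    cg = window.count("CG")  # "CG" cannot overlap itself, so count() is exact
    gc_fraction = (c + g) / n if n else 0.0
    obs_exp = cg * n / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp

print(cpg_stats("CGCGCGCG"))   # CpG-rich toy window
print(cpg_stats("ATATATAT"))   # window with no C or G
```

In practice the function would be applied to sliding windows upstream of candidate genes, and windows with both a high GC fraction and a high ratio would be flagged for inspection.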

27.	GenomeScan is a web-based server that combines GENSCAN prediction results with BLASTX similarity searches.
(a) True
(b) False
Topic: Gene Prediction in Eukaryotes
Answer» (a) True.
Explanation: The user provides genomic DNA and protein sequences from related species. The genomic DNA is translated in all six frames to cover all possible exons. The translated exons are then compared with the user-supplied protein sequences.

28.	The main issue in prediction of eukaryotic genes is the identification of exons, introns, and splicing sites.
(a) True
(b) False
Topic: Gene Prediction in Eukaryotes
Answer» (a) True.
Explanation: From a computational point of view, this is a very complex and challenging problem. Because of split gene structures, alternative splicing, and very low gene densities, finding genes in such an environment is likened to finding a needle in a haystack.

29.	Which of the following is a wrong statement regarding gene prediction using Markov models and hidden Markov models?
(a) Markov models and HMMs can be very helpful in providing a finer statistical description of a gene
(b) A Markov model describes the probability of the distribution of nucleotides in a DNA sequence
(c) In a Markov model, the conditional probability of a particular sequence position depends on k alternate positions
(d) A zero-order Markov model assumes each base occurs independently with a given probability
Topic: Gene Prediction in Prokaryotes
Answer» (c) In a Markov model, the conditional probability of a particular sequence position depends on k alternate positions.
Explanation: In a Markov model, the conditional probability of a particular sequence position depends on the k previous positions, where k is the order of the model. A zero-order Markov model assumes each base occurs independently with a given probability, which is often the case for noncoding sequences. A first-order Markov model assumes that the occurrence of a base depends on the base preceding it. A second-order model looks at the preceding two bases to determine which base follows, which is more characteristic of codons in a coding sequence.
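The k-th order Markov model in question 29 can be sketched as a pair of small Python functions: one estimates P(base | previous k bases) from training sequences, and one scores a query by the log-odds of a coding model against a noncoding model. The function names, the add-one pseudocount, and the toy training strings are assumptions for illustration only; real gene finders train on large annotated sets and typically use frame-aware models.

```python
import math
from collections import defaultdict
from itertools import product

def train_markov(sequences, k, pseudocount=1.0):
    """Estimate P(base | previous k bases) from upper-case ACGT strings.

    A pseudocount is added so that unseen transitions keep a nonzero
    probability (an assumption made for this sketch).
    """
    counts = defaultdict(lambda: defaultdict(float))
    for seq in sequences:
        for i in range(k, len(seq)):
            counts[seq[i - k:i]][seq[i]] += 1
    model = {}
    for ctx in map("".join, product("ACGT", repeat=k)):
        total = sum(counts[ctx].values()) + 4 * pseudocount
        model[ctx] = {b: (counts[ctx][b] + pseudocount) / total for b in "ACGT"}
    return model

def log_odds(seq, coding, noncoding, k):
    """Sum of log(P_coding / P_noncoding) over all positions with k bases of context."""
    score = 0.0
    for i in range(k, len(seq)):
        ctx, base = seq[i - k:i], seq[i]
        score += math.log(coding[ctx][base] / noncoding[ctx][base])
    return score

# Toy training data (hypothetical, far too small for real use).
K = 2
coding_model = train_markov(["ATGGCGGCGGCGGCG"], K)
noncoding_model = train_markov(["ATATATATATATATAT"], K)

print(log_odds("GCGGCGGCG", coding_model, noncoding_model, K))  # positive: coding-like
print(log_odds("ATATATAT", coding_model, noncoding_model, K))   # negative: noncoding-like
```

Setting `K = 0` reduces the model to independent base frequencies, the zero-order case described in option (d).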

30.	Which of the following is untrue about ab initio–based programs for gene prediction?
(a) The goal of the ab initio gene prediction programs is to discriminate exons from noncoding sequences
(b) The goal is joining exons together in the correct order
(c) The main difficulty is correct identification of exons
(d) To predict exons, the algorithms rely solely on gene signals
Topic: Gene Prediction in Eukaryotes
Answer» (d) To predict exons, the algorithms rely solely on gene signals.
Explanation: To predict exons, the algorithms rely on two features, gene signals and gene content. Signals include gene start and stop sites, putative splice sites, and recognizable consensus sequences such as poly-A sites.

31.	Which of the following is untrue about prediction using discriminant analysis for gene prediction?
(a) QDA draws a curved line based on a quadratic function
(b) LDA works by drawing a diagonal line that best separates coding signals from noncoding signals based on knowledge learned from training data sets of known gene structures
(c) Some gene prediction algorithms rely on discriminant analysis, either LDA or quadratic discriminant analysis (QDA), to improve accuracy
(d) LDA works by plotting a three-dimensional graph of coding signals versus all potential 3′ splice site positions
Topic: Gene Prediction in Eukaryotes
Answer» (d) LDA works by plotting a three-dimensional graph of coding signals versus all potential 3′ splice site positions.
Explanation: LDA plots a two-dimensional graph, not a three-dimensional one, of coding signals versus potential 3′ splice site positions and draws a straight diagonal line to separate them. QDA draws a curved line based on a quadratic function instead of a straight line to separate coding and noncoding features. This strategy is designed to be more flexible and to provide a better separation between the data points.

32.	Which of the following is untrue about prediction using neural networks for gene prediction?
(a) A neural network is constructed with multiple layers: the input, output, and hidden layers
(b) The input is the gene sequence with intron and exon signals
(c) The model is not fed with a sequence of known gene structure
(d) The output is the probability of an exon structure
Topic: Gene Prediction in Eukaryotes
Answer» (c) The model is not fed with a sequence of known gene structure.
Explanation: Between input and output, there may be one or several hidden layers where the machine learning takes place. The machine learning process starts by feeding the model with a sequence of known gene structure. During training, the gene structure information is separated into several classes of features, such as hexamer frequencies, splice sites, and GC composition. The weight functions in the hidden layers are adjusted during this process to recognize the nucleotide patterns and their relationship with known structures.

33.	Which of the following is a wrong statement regarding the TESTCODE method?
(a) It is based on the nucleotide composition of the third position of a codon
(b) In practice, because genes can be in any of the six frames, the statistical patterns are computed for all possible frames
(c) It is implemented in the commercial GCG package
(d) It exploits the fact that the third codon nucleotides in a coding region fail to repeat themselves
Topic: Gene Prediction in Prokaryotes
Answer» (d) It exploits the fact that the third codon nucleotides in a coding region fail to repeat themselves.
Explanation: In a coding sequence, it has been observed that the third codon position has a preference for G or C over A or T. By plotting the GC composition at this position, regions with values significantly above the random level can be identified, which are indicative of the presence of ORFs. The TESTCODE method exploits the fact that the third codon nucleotides in a coding region tend to repeat themselves.
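The GC composition at the third codon position, as described in the explanation to question 33, can be computed with a short Python sketch. This illustrates only the third-position GC statistic, not the TESTCODE algorithm itself; the function names and the window and step sizes are assumptions chosen for the example.

```python
def gc3(seq, frame=0):
    """Fraction of G or C at the third codon position in the given frame (0, 1, or 2)."""
    third = seq.upper()[frame + 2::3]
    if not third:
        return 0.0
    return sum(base in "GC" for base in third) / len(third)

def gc3_profile(seq, frame=0, window=90, step=30):
    """Return [(start, gc3)] for sliding windows along one frame.

    `window` and `step` are in nucleotides and should be multiples of 3
    so that every window stays in the same reading frame.
    """
    return [
        (start, gc3(seq[start:start + window]))
        for start in range(frame, len(seq) - window + 1, step)
    ]

# Toy sequence: a GC3-rich stretch followed by an AT3-rich stretch.
toy = "GCC" * 30 + "ATA" * 30
for start, value in gc3_profile(toy, window=90, step=90):
    print(start, value)
```

Because a gene can lie in any of the six frames, the profile would be computed for frames 0, 1, and 2 on both strands, and regions where one frame rises well above the others would be examined as candidate ORFs.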

34.	The Shine–Dalgarno sequence is a stretch of purine-rich sequence complementary to 16S rRNA in the ribosome.
(a) True
(b) False
Topic: Gene Prediction in Prokaryotes
Answer» (a) True.
Explanation: It is located immediately downstream of the transcription initiation site and slightly upstream of the translation start codon. In many bacteria, it has a consensus motif of AGGAGGT. Identification of this ribosome binding site can help locate the start codon.
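Using the ribosome binding site to support a candidate start codon, as described in question 34, can be sketched in Python. The consensus AGGAGGT comes from the explanation above; the 20-nucleotide upstream window, the ungapped match count used as a score, and the function name are assumptions made for this illustration and do not reproduce the scoring of RBSfinder or any other program.

```python
SD_CONSENSUS = "AGGAGGT"

def best_sd_match(seq, start_codon_pos, upstream=20):
    """Find the best ungapped match to the Shine-Dalgarno consensus in the
    `upstream` nucleotides preceding a candidate start codon.

    Returns (position, n_matches) for the best-scoring site, or None if
    the upstream region is shorter than the consensus.
    """
    region_start = max(0, start_codon_pos - upstream)
    region = seq.upper()[region_start:start_codon_pos]
    best = None
    for i in range(len(region) - len(SD_CONSENSUS) + 1):
        site = region[i:i + len(SD_CONSENSUS)]
        matches = sum(a == b for a, b in zip(site, SD_CONSENSUS))
        if best is None or matches > best[1]:
            best = (region_start + i, matches)
    return best

# Toy sequence: a perfect consensus site a few bases upstream of ATG.
toy = "TTTT" + "AGGAGGT" + "TTTTTT" + "ATG" + "AAA"
start = toy.index("ATG")
print(best_sd_match(toy, start))
```

A candidate start codon with a strong upstream match would be preferred over one without, and a program could move on to the next candidate start when no convincing site is found.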

35.	The ab initio–based approaches rely on two major features associated with genes, one of which is the existence of gene signals, which include start and stop codons, intron splice signals, transcription factor binding sites, etc.
(a) True
(b) False
Topic: Categories of Gene Prediction Programs
Answer» (a) True.
Explanation: Gene signals also include ribosomal binding sites and polyadenylation (poly-A) sites. In addition, the triplet codon structure limits the coding frame length to multiples of three, which can be used as a condition for gene prediction.

36.	Which of the following is untrue about SGP-1?
(a) The program translates all potential exons in each sequence and does pairwise alignment of the translated protein sequences using a dynamic programming approach
(b) The near-perfect matches at the protein level define coding regions
(c) It is a similarity-based web program that aligns two genomic DNA sequences from distinctly related organisms
(d) It stands for Syntenic Gene Prediction
Topic: Gene Prediction in Eukaryotes
Answer» (c) It is a similarity-based web program that aligns two genomic DNA sequences from distinctly related organisms.
Explanation: SGP-1 aligns two genomic DNA sequences from closely related organisms. As with EST2Genome, no training is needed. The limitation is the need for two homologous sequences having similar genes with similar exon structures; if this condition is not met, a gene escapes detection in one sequence when there is no counterpart in the other.

37.	Because a protein-encoding gene is composed of nucleotides in triplets as codons, more effective Markov models are built in sets of three nucleotides, describing nonrandom distributions of trimers, hexamers, and so on.
(a) True
(b) False
Topic: Gene Prediction in Prokaryotes
Answer» (a) True.
Explanation: The parameters of a Markov model have to be trained using a set of sequences with known gene locations. Once the parameters of the model are established, it can be used to compute the nonrandom distributions of trimers or hexamers in a new sequence to find regions that are compatible with the statistical profiles in the learning set.

38.	Which of the following is untrue about prediction using neural networks for gene prediction?
(a) A neural network is a statistical model with a special architecture for pattern recognition and classification
(b) It is composed of a network of mathematical variables
(c) It resembles ab initio approaches
(d) The variables in a neural network resemble the biological nervous system, with variables or nodes connected by weighted functions that are analogous to synapses
Topic: Gene Prediction in Eukaryotes
Answer» (c) It resembles ab initio approaches.
Explanation: The aspect of the model that makes it look like a biological neural network is its ability to "learn" and then make predictions after being trained. The network is able to process information and modify the parameters of the weight functions between variables during the training stage. Once trained, it can make automatic predictions about the unknown. This is quite different from the ab initio methods.

39.	FGENESB is a web-based program that is also based on fifth-order HMMs for detecting coding regions.
(a) True
(b) False
Topic: Categories of Gene Prediction Programs
Answer» (a) True.
Explanation: The program is specifically trained for bacterial sequences. It uses the Viterbi algorithm to find an optimal match for the query sequence with the intrinsic model. A linear discriminant analysis (LDA) is used to further distinguish coding signals from non-coding signals.

40.	Which of the following is untrue about Glimmer?
(a) It stands for Gene Locator and Interpolated Markov Modeler
(b) It is a UNIX program from TIGR
(c) It does not necessarily use the IMM algorithm
(d) It is used to predict potential coding regions
Topic: Categories of Gene Prediction Programs
Answer» (c) It does not necessarily use the IMM algorithm.
Explanation: Glimmer uses the interpolated Markov model (IMM) algorithm. The computation consists of two steps, model building and gene prediction. Model building involves training on the input sequence, which optimizes the parameters of the model. In the actual gene prediction, overlapping frames are "flagged" to alert the user for further inspection. Glimmer also has a variant, GlimmerM, for eukaryotic gene prediction.
