1.

Accession numbers are not present for FASTA sequence files. If you parse a FASTA sequence format file with Bio::SeqIO, the sequences won't have the accession number. What to do?

Answer»

All the data is in $seq->display_id; it just needs to be parsed out. Here is some code to set the accession number, assuming a GenBank-style ID of the form gi|<gi>|gb|<accession>|<locus>.

my ($gi,$acc,$locus);

(undef,$gi,undef,$acc,$locus) = split(/\|/,$seq->display_id);

$seq->accession_number($acc);

Why don't we just do this automatically? For one, we don't make any assumptions about the format of the ID part of the sequence. Perhaps the parser code could try to detect whether it is a GenBank-formatted ID and then set the accession number field. It would be trivial to do; no one has volunteered the time yet. Put it on the project priority list if you think it is important, or better yet, volunteer the code patch!
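
As a rough illustration of the detection idea mentioned above, here is a minimal sketch. The helper name accession_from_id is hypothetical and not part of BioPerl; it assumes a GenBank-style ID whose first field is the literal "gi", whose second field is a numeric GI, and whose fourth field is the accession. Any other ID is left alone.

```perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl function): returns the accession from a
# GenBank-style ID of the form "gi|<gi>|<db>|<accession>|<locus>", or undef
# if the ID does not look like that format.
sub accession_from_id {
    my ($id) = @_;
    return unless defined $id;

    # The pipe must be escaped; an unescaped /|/ would split on every character.
    my @fields = split /\|/, $id;

    # Require the "gi" tag, a numeric GI, a database tag, and an accession field.
    return unless @fields >= 4;
    return unless $fields[0] eq 'gi' && $fields[1] =~ /^\d+$/;
    return unless length $fields[3];

    return $fields[3];
}

# Usage with a sequence object read via Bio::SeqIO (assumed to be in $seq):
#   my $acc = accession_from_id($seq->display_id);
#   $seq->accession_number($acc) if defined $acc;
```

Returning undef for unrecognized IDs keeps with the point made above: no assumption is forced onto IDs that are not in GenBank format.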



