1.

How Come I Am Not Getting Any Text From The PDF Document?

Answer»

Text extraction from a PDF document is a complicated task, and many factors affect whether text can be extracted at all and how accurate the result is. It would help the PDFBox team if you could try a couple of things:

  • Open the PDF in Acrobat and try to extract text from there. If Acrobat can extract the text, then PDFBox should be able to as well, and it is a bug if it cannot. If Acrobat cannot extract the text, then PDFBox probably cannot either.
  • The document might really be an image instead of text. Some PDF documents are just images that have been scanned in. You can tell by using the selection tool in Acrobat: if you can’t select any text, then it is probably an image.
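
The two checks above can also be approximated in code before reaching for Acrobat. The following is only a minimal sketch, not part of PDFBox: `ExtractionTriage`, `classify`, and `advice` are hypothetical names introduced here for illustration. It assumes you already have the string your extraction call returned (for example from `PDFTextStripper.getText`) and simply reports whether that string contains any letters or digits.

```java
// Hypothetical helper, NOT part of PDFBox: it only inspects the string that
// an extraction attempt returned and suggests which check to try next.
final class ExtractionTriage {

    enum Result { HAS_TEXT, NO_TEXT }

    // 'extracted' is whatever your extraction call produced, e.g. the String
    // returned by PDFBox's PDFTextStripper.getText(document). Null is treated
    // the same as an empty result.
    static Result classify(String extracted) {
        if (extracted == null) {
            return Result.NO_TEXT;
        }
        // Count only letters and digits, so output made up solely of
        // whitespace, line breaks, or punctuation counts as "no text".
        long visible = extracted.codePoints()
                .filter(Character::isLetterOrDigit)
                .count();
        return visible == 0 ? Result.NO_TEXT : Result.HAS_TEXT;
    }

    // Maps the result to the manual follow-up step described in the list above.
    static String advice(Result result) {
        if (result == Result.NO_TEXT) {
            return "No text came back: try selecting text in Acrobat; "
                 + "the page may be a scanned image.";
        }
        return "Text came back: compare it with what Acrobat extracts.";
    }
}
```

An empty result does not prove the document is a scanned image; it only indicates that the Acrobat checks are worth running before reporting a bug.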


