| 1. |
Solve : Renaming and reading PDF files? |
|
Answer» Hi, the names of the PDF files are 20071119-0001.pdf, 20071119-0002.pdf, 20071119-0003.pdf, etc.

There seems to be a pattern here: the file name of each PDF file ends with the contract number minus two leading zeroes. Perhaps this pattern could be useful in writing your script.

Thanks for your response. I tried to do exactly as you told me, but it does not work yet. First I tried a normal read, but the variable text contains only garbage:

    Dim text, readfile, contents
    Set readfile = FSO.OpenTextFile(src, 1, False) '1=ForReading
    text = readfile.Read(38)

Then I tried two SkipLine calls and a ReadLine, but again, garbage:

    Dim readfile, contents
    Set readfile = FSO.OpenTextFile(src, 1, False) '1=ForReading
    readfile.SkipLine 'SkipLine returns no value, so it is called as a statement
    readfile.SkipLine
    contents = readfile.ReadLine
    MsgBox "contents:" & contents

The ReadAll method also gives garbage:

    Dim readfile, contents
    Set readfile = FSO.OpenTextFile(src, 1, False) '1=ForReading
    contents = readfile.ReadAll
    readfile.Close
    'determine contractnr in 3rd line of pdf-file
    Dim pos
    pos = InStr(1, contents, "contractnummer: ", 1) 'the start position is required when the compare argument is given

The value of contents is:

    "%PDF-1.4 %äüöß 2 0 obj <</Length 3 0 R/Filter/FlateDecode>> stream [binary data] endstream endobj 3 0 obj 238 endobj 5 0 obj <</Length 6 0 R/Filter/FlateDecode/Length1 26468>> stream [binary data]"

Like I said, garbage. It doesn't make a difference whether the code says:

    Set readfile = FSO.OpenTextFile(src, 1, False, 0)
    Set readfile = FSO.OpenTextFile(src, 1, False, -2)
    Set readfile = FSO.OpenTextFile(src, 1, False, -1)

(The last number, 0, -1 or -2, indicates the format of the file: 0 = ASCII, -1 = Unicode, -2 = system default.)
The code does read text files, so I think the problem is specific to the PDF files, but I don't know how to fix it (that is, how to read the content of a PDF file in VBScript).

To answer your question: yes, the contractnr: is only in the 3rd line. The PDF files I'm currently working with, the ones with "contractnr: 000001", "contractnr: 000002", etc., are just test files. In reality, different contracts are scanned as PDF and there is no pattern in their contract numbers. That's why I want to read the PDF files: to read the unique contract number and give the PDF file the name of that contract number.

I hope you have more suggestions on how to read the contract number from a PDF file. (By the way, it is also possible to scan the files as JPG or TIF files instead of PDF files, but I think it is impossible to read a contract number from a JPG or TIF file.)

Best, Leonard

I should have removed my post as soon as I realized that PDF files are not your normal text files (duh!). In fact they are in Adobe's PDF format, which is binary: the page content sits in compressed streams (the /Filter/FlateDecode entries in your dump), so OpenTextFile cannot return it as readable text. You will need to find (use Google) an ActiveX component (either a DLL or OCX module) which can handle PDF files, specifically reading them. I found many for creating PDFs, but they are quite pricey ($$$). Good luck.

Another way is to convert the PDF files to text first. You can download third-party tools such as xpdf for this. After converting, you can read the text files using your script. If you have experience in other scripting languages such as Perl or Python, there may be modules for reading PDF files; you can try them too. |
|
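The convert-then-read suggestion could be sketched roughly as follows in Python. This is only an illustration under several assumptions: the `pdftotext` command-line tool (shipped with xpdf) is installed and on the PATH, the PDFs contain a real text layer (an image-only scan would need OCR first), and the label looks like the "contractnummer: " or "contractnr: " strings quoted in the thread. The function names `extract_contract_number`, `pdf_to_text` and `rename_by_contract_number` are hypothetical helpers, not part of any library.

```python
import re
import subprocess
from pathlib import Path

# Label as quoted in the thread ("contractnummer: " or "contractnr: ");
# this pattern is an assumption and may need adjusting for real documents.
CONTRACT_RE = re.compile(r"contractn(?:umme)?r:\s*(\d+)", re.IGNORECASE)


def extract_contract_number(text):
    """Return the digits following the contract label, or None if absent."""
    match = CONTRACT_RE.search(text)
    return match.group(1) if match else None


def pdf_to_text(pdf_path):
    """Convert a PDF to text by calling the external pdftotext tool."""
    # Passing "-" as the output file asks pdftotext to write to stdout.
    result = subprocess.run(
        ["pdftotext", str(pdf_path), "-"],
        capture_output=True,
        check=True,
    )
    return result.stdout.decode("utf-8", errors="replace")


def rename_by_contract_number(pdf_path):
    """Rename pdf_path to <contract number>.pdf; return the new path or None."""
    pdf_path = Path(pdf_path)
    number = extract_contract_number(pdf_to_text(pdf_path))
    if number is None:
        return None  # no label found, e.g. an image-only scan
    target = pdf_path.with_name(number + ".pdf")
    if target.exists():
        raise FileExistsError(target)  # refuse to overwrite another contract
    pdf_path.rename(target)
    return target
```

Calling `rename_by_contract_number("20071119-0001.pdf")` would, under these assumptions, rename the file after the number found in its text and leave it untouched when no label is found. Searching the whole converted text rather than only the third line is a deliberate choice here, since line breaks in converted output may not match the original layout.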