
Solve : Renaming and reading PDF files?


Hi

I want to make a program that meets the following requirements:

- There are some PDF files in the directory C:\Test\temp
- The names of the PDF files are 20071119-0001.pdf, 20071119-0002.pdf, 20071119-0003.pdf, etc.
- The 3rd line of the first PDF file is "contractnr: 000001", of the second PDF file "contractnr: 000002", of the third "contractnr: 000003", etc.
- Each PDF file has to be moved to a folder and file named after its contract number, e.g. C:\Test\contract\000001\000001.pdf

This is the code I have so far. My problem is how to read the PDF file.

''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
Dim FSO
Dim srcFolder, dstFolder
Dim i
Dim contractnr
Dim pdfBase
Dim myFiles
Dim src, dest

Set FSO = CreateObject("Scripting.FileSystemObject")

srcFolder = "c:\test\temp"     'Server source folder location
dstFolder = "c:\test\contract" 'Destination folder as desired (must already exist)

i = 0 'file counter
For Each myFiles In FSO.GetFolder(srcFolder).Files
    'Compare case-insensitively so that ".PDF" is matched as well
    If LCase(FSO.GetExtensionName(myFiles)) = "pdf" Then
        pdfBase = FSO.GetBaseName(myFiles)

        'open pdf file and determine contractnr in 3rd line of pdf-file
        'I haven't got a clue whether this is possible?
        'Placeholder: zero-pad the counter to six digits (000001, 000002, ...)
        contractnr = Right("000000" & (i + 1), 6)

        'Move the file to the new directory: dstFolder\contractnr\contractnr.pdf
        src = srcFolder & "\" & pdfBase & ".pdf"
        dest = dstFolder & "\" & contractnr & "\" & contractnr & ".pdf"
        MsgBox src
        MsgBox dest

        'CreateFolder raises an error if the folder already exists
        If Not FSO.FolderExists(dstFolder & "\" & contractnr) Then
            FSO.CreateFolder dstFolder & "\" & contractnr
        End If
        FSO.MoveFile src, dest

        i = i + 1
    End If
Next
''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
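
For comparison, here is a minimal sketch of the same move-and-rename flow in Python (my choice of language, not the original poster's). The names `move_contracts` and `lookup_contract_number` are hypothetical. How the contract number is actually read from the PDF is left as a callback, because that is the open question in this thread.

```python
import shutil
from pathlib import Path
from typing import Callable, List


def move_contracts(src_dir: Path, dst_dir: Path,
                   lookup_contract_number: Callable[[Path], str]) -> List[Path]:
    """Move every PDF in src_dir to dst_dir/<nr>/<nr>.pdf and return the new paths."""
    moved = []
    # Take a snapshot of the listing first, so moving files does not
    # disturb the iteration.
    pdfs = sorted(p for p in src_dir.iterdir()
                  if p.is_file() and p.suffix.lower() == ".pdf")
    for pdf in pdfs:
        nr = lookup_contract_number(pdf)  # hypothetical callback, supplied by the caller
        target_dir = dst_dir / nr
        # Create dst_dir and the per-contract folder if they are missing.
        target_dir.mkdir(parents=True, exist_ok=True)
        target = target_dir / f"{nr}.pdf"
        shutil.move(str(pdf), str(target))
        moved.append(target)
    return moved
```

Passing the lookup in as a function keeps the folder handling separate from the PDF-reading problem, so either part can be changed without touching the other.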






Hope anybody can help!

Thanks!
Stronky
PDF files, like most text files, only allow sequential access. You could program two reads and then a third to pick up the "contractnr: 000001" data in the third line. You can also use the .SkipLine method (twice) to skip the first two lines and then use the .ReadLine method to get the data in the third line.
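
The skip-two-lines-then-read idea can be sketched in Python as follows. This is only a sketch and it assumes the input really is a plain-text file; `read_third_line` is a hypothetical helper name.

```python
from typing import Optional, TextIO


def read_third_line(handle: TextIO) -> Optional[str]:
    """Return the third line of an open text file, or None if it has fewer lines."""
    # Skip the first two lines; readline() returns "" only at end of file.
    for _ in range(2):
        if handle.readline() == "":
            return None
    line = handle.readline()
    if line == "":
        return None
    return line.rstrip("\r\n")
```

It would be called with a file opened in text mode, for example `with open(path, encoding="latin-1") as fh: read_third_line(fh)`.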

Another way would be to use the .ReadAll method to dump the entire contents of the file into a variable. Using the InStr function you could find the start of "contractnr: " in that variable. Knowing that "contractnr: " is 12 bytes (including the trailing space) and the contract number is 6 bytes, you could use the Mid function to pluck the contract number out of the data.
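
A minimal Python sketch of that find-the-label-then-slice approach, again assuming the text is already available as a plain string. `extract_contract_number` and its parameters are hypothetical names; the 12-character label and 6-character number come from the description above.

```python
from typing import Optional

LABEL = "contractnr: "  # 12 characters, including the trailing space


def extract_contract_number(text: str, label: str = LABEL,
                            width: int = 6) -> Optional[str]:
    """Return the `width` characters that follow `label`, or None if not found."""
    start = text.find(label)  # plays the role of InStr
    if start == -1:
        return None
    begin = start + len(label)
    number = text[begin:begin + width]  # plays the role of Mid
    # Reject a truncated value at the very end of the text.
    return number if len(number) == width else None
```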

We don't have enough information on this PDF file. Is contractnr: only found in the third line? Is it in every file?

Quote

the names of the pdf-files are 20071119-0001.pdf, 20071119-0002.pdf, 20071119-0003.pdf, etc
- the 3rd line of the first pdf-file is "contractnr: 000001", of the second pdf-file "contractnr: 000002", the 3rd "contractnr: 000003", etc

There seems to be a pattern here: the file name of each PDF file ends with the contract number minus its two leading zeroes. Perhaps this pattern could be useful in writing your script.
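
If that naming pattern holds, the contract number could be derived from the file name alone, without opening the PDF. A small Python sketch, assuming base names of the form `<date>-<sequence>` as in the quoted examples; `contract_number_from_name` is a hypothetical helper.

```python
def contract_number_from_name(base_name: str, width: int = 6) -> str:
    """Turn a base name such as '20071119-0001' into a zero-padded number."""
    # Take everything after the last hyphen as the sequence part.
    _, _, sequence = base_name.rpartition("-")
    if not sequence.isdigit():
        raise ValueError(f"no numeric sequence in {base_name!r}")
    # Restore the leading zeroes the file name leaves out.
    return sequence.zfill(width)
```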

Thanks for your response. I tried to do exactly as you told me, but it does not work yet.

First I tried a normal read, but the variable text gives only garbage:

Dim text, readfile, contents
set readfile = FSO.OpenTextFile(src, 1, False) '1=ForReading
text=readfile.Read (38)



Then I tried 2 SkipLine and a ReadLine, but again, garbage:

Dim readfile, contents
set readfile = FSO.OpenTextFile(src, 1, False) '1=ForReading
readfile.SkipLine 'SkipLine returns nothing, so there is no value to assign
readfile.SkipLine
contents = readfile.ReadLine
MsgBox "contents:" & contents


Also the .readall method gives garbage:


Dim readfile, contents
set readfile = FSO.OpenTextFile(src, 1, False) '1=ForReading
contents = readfile.ReadAll
readfile.close

'determine contractnr in 3rd line of pdf-file
dim pos
pos = InStr(1, contents, "contractnr: ", 1) 'start position must come first when a compare mode is given


The value of contents is: "%PDF-1.4
%äüöß
2 0 obj
<</Length 3 0 R/Filter/FlateDecode>>
stream
xœ}PËjÃ0¼ë+ö°º«ÇZ!pÒøÐ[@ÐCé-mo…ø’ß?´²›%Íìcf?šà¬N€€åå¯-Gz€ùÞ7ðÛjå›Ô6+Ï:@ï?î!áe""0ùû#""%ŠhRg""Ztè+åEè8¦ÎÆVܦ¢í„s!ŸùMí³:ükHN›fh€ÌjXG? ­»\…Æ}#,¾|“Aä?BÄAxñøw7ý¼{}Ú¯«ˆâ½_ºòžÊæ?IÒ?Ëð$”PÚÜÓ³9°æ‡³©?ý|{«?ü|²d—”#¹´&$ŸjÇ=™¯±pÌÁtÆ
endstream
endobj

3 0 obj
238
endobj

5 0 obj
<</Length 6 0 R/Filter/FlateDecode/Length1 26468>>
stream
xœí|y\TGºhU?Óûvš¥7ú@Cƒ4‚Š(‘FÀ%ÜБZieínD'Nâ³™ÌdŸ‰f™ì‰-Mrãd½Y?É2'‰&1Û̽ó2ÉÍ""ý¾ªs01sçÞ÷þx¿ßKªÎWU_U}õmõU5õ?
""ùÚºü½+Λµ!ô*B8¡mCD¼éíüc"

Like I said, garbage.
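
The dump itself hints at why the text looks like garbage: each object is marked `/Filter/FlateDecode`, which in PDF means the bytes between `stream` and `endstream` are zlib (deflate) compressed. The Python sketch below inflates such streams. It is a rough illustration, not a PDF parser: `inflate_streams` is a hypothetical helper, it locates streams with a simple regular expression instead of honouring the `/Length` entry, and it ignores every other filter. Even when it works, the result is page-drawing instructions or font data rather than plain lines of text, and a scanned page would contain only image data.

```python
import re
import zlib
from typing import List

# Everything between the "stream" keyword and the next "endstream".
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)


def inflate_streams(pdf_bytes: bytes) -> List[bytes]:
    """Return the zlib-decompressed contents of every stream that inflates cleanly."""
    results = []
    for match in STREAM_RE.finditer(pdf_bytes):
        try:
            # decompressobj tolerates the end-of-line bytes that sit
            # between the compressed data and "endstream".
            results.append(zlib.decompressobj().decompress(match.group(1)))
        except zlib.error:
            continue  # not a Flate stream, or damaged data
    return results
```

The file has to be read as bytes (`open(path, "rb").read()`); reading it as text is what produces the garbled characters in the first place.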

It doesn't make a difference whether the code says:
set readfile = FSO.OpenTextFile(src, 1, False,0)
set readfile = FSO.OpenTextFile(src, 1, False,-2)
set readfile = FSO.OpenTextFile(src, 1, False,-1)

(The last number, 0, -1 or -2, indicates the format of the file: 0 = ASCII, -1 = Unicode, -2 = system default.)


The code does read text files, so I think the problem has to do with the PDF files, but I don't know how to fix it (that is, how to read the content of a PDF file in VBScript).

To answer your question: yes, the "contractnr:" label is only in the 3rd line. The PDF files I'm currently working with, the ones with "contractnr: 000001", "contractnr: 000002", etc., are just test files. In reality there are different contracts scanned as PDF and there is no pattern in their contract numbers. That's why I want to read the PDF files: to read the unique contract number, and to give the PDF file the name of that contract number.

I hope you have some more suggestions on how to read the contract number from a PDF file.
(By the way, it is also possible to scan the files as JPG or TIF files instead of PDF files, but I think it is impossible to read a contract number from a JPG or TIF file.)


Best,
Leonard

I should have removed my post as soon as I realized that PDF files are not your normal text files (duh!). In fact they are proprietary Adobe files.

You will need to find (use Google) an ActiveX component (either a DLL or OCX module) which can handle PDF files, specifically reading them. I found many for creating PDF but they are quite pricey ($$$).

Good luck.
Another way is to convert the PDF files to text before reading them.
You can download third-party tools for this, such as xpdf and others. After converting, you can read the text files using your script. If you have experience in other scripting languages such as Perl or Python, there may be modules for reading PDF files; you can try those as well.
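
A hedged Python sketch of the convert-then-read route. It assumes the `pdftotext` command-line tool that ships with xpdf is installed and on the PATH, and that the PDF contains real text rather than only a scanned image. The helpers `pdf_to_text` and `contract_number_on_line` are hypothetical names, and whether the number really lands on the third line of the converted text is something to check against actual output.

```python
import subprocess
from pathlib import Path
from typing import Optional


def pdf_to_text(pdf_path: Path) -> str:
    """Run pdftotext and return the extracted text."""
    # Passing "-" as the output file makes pdftotext write to stdout.
    result = subprocess.run(["pdftotext", str(pdf_path), "-"],
                            capture_output=True, check=True)
    # The output encoding depends on the tool's configuration,
    # so undecodable bytes are replaced instead of raising.
    return result.stdout.decode("utf-8", errors="replace")


def contract_number_on_line(text: str, line_no: int = 3,
                            label: str = "contractnr:") -> Optional[str]:
    """Return the value after `label` on the given 1-based line, or None."""
    lines = text.splitlines()
    if len(lines) < line_no:
        return None
    line = lines[line_no - 1].strip()
    if not line.startswith(label):
        return None
    return line[len(label):].strip() or None
```

The two steps are kept separate so the parsing can be tried on any string, for example `contract_number_on_line(pdf_to_text(Path("20071119-0001.pdf")))`.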









