I have an existing batch process to which I need to add a routine which merges lines from 2 text files into a single file. I'm not looking to simply append one file to the other, but need to merge the files on a line-by-line basis.
I have a routine which works using a nested FOR loop, but it currently takes about 25 minutes to run due to inefficiencies with the algorithm and the large number of lines to be merged.
I've attached my current code as well as 2 sample input files and the output file that is generated from those input files.
I had been doing this previously with the Unix "join" command when the overall process was handled by a Unix script, but the whole process now has to run under Windows.
Basically, each input file contains a list of database tables along with each table's row count. Each line has the table name, a pipe (|) delimiter, and the row count. The first file contains "before" row counts, while the 2nd file contains the "after" row counts. The current code successfully combines the counts for each table on a single row and also calculates the difference if the number of rows changed from "before" to "after", but it runs slowly.
I'd appreciate any suggestions on a better way to handle this within a batch process running under Windows Server 2003.
I received a message saying the upload folder was full, so here's what the sample input and output files look like:
Input #1 -- pre_merge_table_counts.txt:
Table_1 | 5
Table_2 | 10
Table_3 | 50
Table_4 | 15
Table_5 | 1030
Table_6 | 520
Table_7 | 8040
Table_8 | 620
Table_9 | 75
Table_10 | 220
Table_11 | 2330
Table_12 | 710

Input #2 -- post_merge_table_counts.txt:
Table_1 | 100
Table_2 | 200
Table_3 | 50
Table_4 | 15
Table_5 | 1030
Table_6 | 520
Table_7 | 8040
Table_8 | 620
Table_9 | 300
Table_10 | 400
Table_11 | 2330
Table_12 | 710

Output -- output.txt:
TABLE COUNTS
Table_1 Before: 5 After: 100 Difference = 95
Table_2 Before: 10 After: 200 Difference = 190
Table_3 Before: 50 After: 50
Table_4 Before: 15 After: 15
Table_5 Before: 1030 After: 1030
Table_6 Before: 520 After: 520
Table_7 Before: 8040 After: 8040
Table_8 Before: 620 After: 620
Table_9 Before: 75 After: 300 Difference = 225
Table_10 Before: 220 After: 400 Difference = 180
Table_11 Before: 2330 After: 2330
Table_12 Before: 710 After: 710

Here's the code I'm currently using:
@echo off
setlocal enabledelayedexpansion

rem ---------------------------------------------------------------
rem Procedure Generate_Report
:GENERATE_REPORT
echo. > output.txt
echo TABLE COUNTS >> output.txt
echo. >> output.txt
echo. >> output.txt

Set Count=0
For /F "tokens=1,2* delims=|" %%A in (pre_merge_table_counts.txt) Do (
  Set PRE_TABLE_NAME=%%A
  Set PRE_TABLE_SIZE=%%B
  Set /A COUNT+=1
  Set COUNT2=0
  For /F "tokens=1,2* delims=|" %%C in (post_merge_table_counts.txt) Do (
    Set POST_TABLE_NAME=%%C
    Set POST_TABLE_SIZE=%%D
    Set /A COUNT2+=1
    If !COUNT2!==!COUNT! (
      If !PRE_TABLE_NAME!==!POST_TABLE_NAME! (
        If !PRE_TABLE_SIZE!==!POST_TABLE_SIZE! (
          echo !PRE_TABLE_NAME! Before: !PRE_TABLE_SIZE! After: !POST_TABLE_SIZE! >> output.txt
        ) else (
          Set /A TABLE_DIFFERENCE=POST_TABLE_SIZE-PRE_TABLE_SIZE
          echo !PRE_TABLE_NAME! Before: !PRE_TABLE_SIZE! After: !POST_TABLE_SIZE! Difference = !TABLE_DIFFERENCE! >> output.txt
        )
      ) else (
        echo !PRE_TABLE_NAME! Before: !PRE_TABLE_SIZE! After: !POST_TABLE_NAME! !POST_TABLE_SIZE! >> output.txt
      )
    )
  )
  rem if !Count!==10 goto TEMP_DONE_WITH_SMALL_NUMBER_OF_LINES
)

:TEMP_DONE_WITH_SMALL_NUMBER_OF_LINES
echo Done.
pause

I don't know how big your files are, but you can try this VBScript:
Code:
Set objFSO = CreateObject("Scripting.FileSystemObject")
myInputFile1 = "C:\temp\a.txt"
myInputFile2 = "C:\temp\b.txt"
Dim a1()
Dim a2()
Dim i,j,t1,t2
i=0
j=0

' Read every line of the first ("before") file into array a1
Set objFile1 = objFSO.OpenTextFile(myInputFile1,1)
Do Until objFile1.AtEndOfStream
    line = objFile1.ReadLine
    ReDim Preserve a1(i)
    a1(i) = line
    i=i+1
Loop
objFile1.Close

' Read every line of the second ("after") file into array a2
Set objFile2 = objFSO.OpenTextFile(myInputFile2,1)
Do Until objFile2.AtEndOfStream
    line = objFile2.ReadLine
    ReDim Preserve a2(j)
    a2(j) = line
    j=j+1
Loop
objFile2.Close

' Pair the lines up by index and print both counts plus the difference
For k = LBound(a1) To UBound(a1)
    t1 = Split(a1(k),"|")
    t2 = Split(a2(k),"|")
    WScript.Echo t1(0) & " Before: " & t1(1) & " After: " & t2(1) & " difference = " & t2(1) - t1(1)
Next

output: save as myscript.vbs and type: cscript /nologo myscript.vbs > outfile
Code:
Table_1 Before: 5 After: 100 difference = 95
Table_2 Before: 10 After: 200 difference = 190
Table_3 Before: 50 After: 50 difference = 0
Table_4 Before: 15 After: 15 difference = 0
Table_5 Before: 1030 After: 1030 difference = 0
Table_6 Before: 520 After: 520 difference = 0
Table_7 Before: 8040 After: 8040 difference = 0
Table_8 Before: 620 After: 620 difference = 0
Table_9 Before: 75 After: 300 difference = 225
Table_10 Before: 220 After: 400 difference = 180
Table_11 Before: 2330 After: 2330 difference = 0
Table_12 Before: 710 After: 710 difference = 0

Thanks - this works well. I tweaked it just slightly to add an "if" for the echo in the final loop so that the output will be more consistent with the original format where the "Difference" is only displayed if the counts are, indeed, different. (This makes it easier to locate the tables that actually changed when scanning through the lines of output.)
I replaced the original Echo line in the final For loop with the following:
If t1(1) = t2(1) Then
    WScript.Echo t1(0) & " Before: " & t1(1) & " After: " & t2(1)
Else
    WScript.Echo t1(0) & " Before: " & t1(1) & " After: " & t2(1) & " Difference = " & t2(1) - t1(1)
End If

The vb script takes about 1 second to do what the FOR loop in the original batch took 25 minutes to do -- quite an improvement! (FYI -- there are over 2500 tables listed in each file.)
The original FOR loop was so slow because, for each line of the first file, it had to reread the second file from its first line up to the line number currently being processed, so the same lines of the 2nd file were read over and over again.
I'd still be interested to hear if anyone has an idea of how to accomplish this without needing to call an external file. Even if it took a minute or two to process within the batch process, that would be OK. 25 minutes is just too slow, though. Otherwise, the vb script certainly accomplishes the task, but seems to require the use of a second file.
Thanks for the help.

Quote from: ranman65 on October 05, 2007, 09:17:20 AM
Otherwise, the vb script certainly accomplishes the task, but seems to require the use of a second file.

Which second file are you talking about?

I've got a batch process (___.bat) which would then call the ___.vbs file, so the ___.vbs file is the second file.
If possible, I'd prefer to have all of the code contained within the original ___.bat file.

Quote from: ranman65 on October 05, 2007, 09:54:58 AM
I've got a batch process (___.bat) which would then call the ___.vbs file, so the ___.vbs file is the second file.
If possible, I'd prefer to have all of the code contained within the original ___.bat file.

Create the VBScript dynamically in your batch file. Hint: write each line of the script with echo and redirection (> and >>), then call the script as normal using cscript.
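A minimal sketch of that hint, not a tested drop-in: the batch file below writes a small VBScript to a temporary file, runs it with cscript, and deletes it afterwards. The temporary file name merge_counts.vbs and the variables fso, f1, f2, d and msg are placeholders chosen for illustration; the input and output file names are the ones from the original post. It assumes both files have the same number of lines in the same order, and that every line contains a pipe.

```batch
@echo off
setlocal
rem Sketch only: merge_counts.vbs is a hypothetical temp file name.
set "VBS=%TEMP%\merge_counts.vbs"

rem Each echo writes one line of VBScript. Characters that cmd.exe treats
rem specially outside quotes (parentheses, ampersand, less-than and
rem greater-than) are escaped with ^ so they reach the file literally.
> "%VBS%" echo Set fso = CreateObject^("Scripting.FileSystemObject"^)
>>"%VBS%" echo Set f1 = fso.OpenTextFile^("pre_merge_table_counts.txt", 1^)
>>"%VBS%" echo Set f2 = fso.OpenTextFile^("post_merge_table_counts.txt", 1^)
>>"%VBS%" echo Do Until f1.AtEndOfStream Or f2.AtEndOfStream
>>"%VBS%" echo   t1 = Split^(f1.ReadLine, "|"^)
>>"%VBS%" echo   t2 = Split^(f2.ReadLine, "|"^)
>>"%VBS%" echo   d = CLng^(t2^(1^)^) - CLng^(t1^(1^)^)
>>"%VBS%" echo   msg = Trim^(t1^(0^)^) ^& " Before: " ^& Trim^(t1^(1^)^) ^& " After: " ^& Trim^(t2^(1^)^)
>>"%VBS%" echo   If d ^<^> 0 Then msg = msg ^& " Difference = " ^& d
>>"%VBS%" echo   WScript.Echo msg
>>"%VBS%" echo Loop
>>"%VBS%" echo f1.Close
>>"%VBS%" echo f2.Close

rem Write the report header, append the script output, then clean up.
> output.txt echo TABLE COUNTS
cscript //nologo "%VBS%" >> output.txt
del "%VBS%"
endlocal
```

The relative input paths are resolved against the directory the batch file is run from. Reading the two files in step, rather than loading them into arrays first, is a simplification made for this sketch and not what the VBScript posted earlier in the thread does.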
However, if that's not desired, please wait for a pure batch solution.

OK -- that works like a charm (using "^"s to escape the right parentheses and ampersands so they don't get misinterpreted when the batch job is processed)!
Thanks, again!
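For the pure batch route mentioned above, one possible sketch (a swapped-in technique, not something posted in this thread) is to read both files in lockstep so that each is read only once: FOR /F walks the "before" file while SET /P pulls the matching line of the "after" file from redirected standard input. The variable names POST_LINE and DIFF are hypothetical. It assumes both files list the same tables in the same order, contain no blank lines, and that table names contain no spaces.

```batch
@echo off
setlocal enabledelayedexpansion
rem Sketch only. Assumes one "name | count" pair per line, the same line
rem order in both files, no blank lines, and no spaces in table names.

rem The input redirection feeds the "after" file to SET /P, one line per
rem call, while FOR /F reads the "before" file through its own handle.
< post_merge_table_counts.txt (
  echo TABLE COUNTS
  for /f "usebackq tokens=1,2 delims=| " %%A in ("pre_merge_table_counts.txt") do (
    rem Clear the variable first so a failed read leaves it empty.
    set "POST_LINE="
    set /p "POST_LINE="
    rem Split the "after" line on pipe and space: %%C = name, %%D = count.
    for /f "tokens=1,2 delims=| " %%C in ("!POST_LINE!") do (
      if not "%%A"=="%%C" (
        rem Table names differ: print both so the mismatch is visible.
        echo %%A Before: %%B After: %%C %%D
      ) else if "%%B"=="%%D" (
        echo %%A Before: %%B After: %%D
      ) else (
        set /a DIFF=%%D-%%B
        echo %%A Before: %%B After: %%D Difference = !DIFF!
      )
    )
  )
) > output.txt
endlocal
```

Because FOR /F skips empty lines but SET /P does not, a blank line in either file would throw the pairing out of step, which is why the no-blank-lines assumption matters here.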