Solve: MSDOS Bat File Program to Copy the Next Line if Findstr Line is True?
Quote from: briandams on December 28, 2013, 07:25:26 PM
you are using type + for loop? or other methods? how about timing? what are you using to time the script?

The Windows Resource Kit comes with a program called TIMEIT. It works quite well.

Quote from: briandams on December 28, 2013, 07:33:52 PM
its almost a fact that lower level (in the likes of C/C++ ) and how its coded (algorithm) to read big files play a part in performance.

The latter far more so than the former, even in this case. C# is compiled to IL (Intermediate Language), which is then run on the .NET CLR.

I created an 8,000,000-line file in which each line was "this is an example line of text, line #N", where N was of course the current iteration, starting from 0. The resulting file was 382,888,890 bytes.

A C# program that simply reads it in and counts lines:

```csharp
using System;
using System.Diagnostics;
using System.IO;

class Program
{
    static void Main(string[] args)
    {
        Stopwatch watch = new Stopwatch();
        watch.Start();
        int linecount = 0;
        using (StreamReader reader = new StreamReader("D:\\testoutput.txt"))
        {
            // Read and discard each line, counting as we go
            while (!reader.EndOfStream)
            {
                string currentline = reader.ReadLine();
                linecount++;
            }
        }
        watch.Stop();
        Console.WriteLine("Finished. Total Time:" + watch.Elapsed.ToString() + ", read " + linecount + " Lines.");
        Console.ReadKey();
    }
}
```

Output:

```
Finished. Total Time:00:00:07.8800219, read 8000000 Lines.
```

That is about 8 seconds to process the entire file. My VBScript is a bit rusty, but I came up with this, which should be functionally similar:

```vbscript
Dim FSO
Dim TStream
Dim StringRead, CurrentLine
Dim StartTime, EndTime
Set FSO = CreateObject("Scripting.FileSystemObject")
Set TStream = FSO.OpenTextFile("D:\testoutput.txt")
StartTime = Timer
Do While Not TStream.AtEndOfStream
    StringRead = TStream.ReadLine()
    CurrentLine = CurrentLine + 1   ' line counter
Loop
TStream.Close()
EndTime = Timer
WScript.Echo EndTime - StartTime
```

It gave me this back:

```
47.14844
```

So the first thought would be that this extra time must be because VBScript is interpreted.
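For anyone wanting to reproduce these timings, here is a minimal C++ sketch of how a test file like the one described above could be generated. The helper names testLine and writeTestFile are hypothetical, and so is the output path; only the line format comes from the description above.

```cpp
#include <cassert>
#include <fstream>
#include <sstream>
#include <string>

// Builds line N of the test file: "this is an example line of text, line #N".
// testLine and writeTestFile are hypothetical helper names.
std::string testLine(long n)
{
    std::ostringstream line;
    line << "this is an example line of text, line #" << n;
    return line.str();
}

// Writes `count` lines (N = 0 .. count-1) to `path`. The stream is opened in
// text mode, so on Windows every '\n' is written as CRLF.
bool writeTestFile(const std::string& path, long count)
{
    assert(count >= 0);
    std::ofstream file(path);
    for (long n = 0; n < count; ++n) {
        file << testLine(n) << '\n';
    }
    return static_cast<bool>(file);
}
```

On Windows, writeTestFile("D:\\testoutput.txt", 8000000) produces CRLF line endings. Each line is 39 characters of fixed text, plus the digits of N, plus 2 bytes of CRLF; for N = 0 to 7,999,999 that comes to 8,000,000 × 41 + 54,888,890 = 382,888,890 bytes, the size quoted above.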
However, I'm not entirely certain that being interpreted is the whole explanation, and inserting the same code into a Visual Basic 6 project bears that suspicion out to some extent. Visual Basic 6 supports compiling to native code. Doing so yields a time of 55 seconds, nearly 8 seconds slower than VBScript. Interestingly, compiling to P-code (an intermediate language of sorts) made the program finish about two seconds faster (53.2 seconds).

In VBScript, all variables use the Variant data type. This effectively means that any access or assignment to a variable has to package and unpackage an OLE VARIANT structure (internally, of course). Additionally, VBScript is late-bound, which means that its access to COM objects (such as the FileSystemObject) is all performed through IDispatch. Suffice it to say that this is much slower than an early-bound call, and it pretty much means the method name has to be looked up each time it is used. In this case that is a problem, because the loop body contains both variable access (incrementing the line count) and late-bound member calls (both the AtEndOfStream check in the loop condition and the actual ReadLine() call).

Within Visual Basic 6, I made two changes: I referenced the Scripting Runtime (allowing early-bound calls), and I made all variables strongly typed. This reduced the processing time to 28.6 seconds. Still not as fast as C#, but the thing is that C# always goes through IL, which the CLR JIT-compiles at run time, while in this case Visual Basic 6 is compiling to native code ahead of time, so clearly "lower-level" doesn't translate directly into faster performance. In this case the C# version is faster simply because the JIT compiler is able to use newer processor features and run in long mode (64-bit) rather than under 32-bit WoW64, and that changes what the native code emitted by the jitter contains. Visual Basic 6 has a native code compiler, but it always optimizes for a Pentium.
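The overhead of Variant variables and late binding described above can be illustrated with a loose C++ analogy. This is a sketch only, not how VBScript or IDispatch are actually implemented (real IDispatch maps names to DISPIDs with GetIDsOfNames and calls through Invoke with VARIANT arguments); LineSource, DispatchWrapper and the two counting functions are hypothetical names. The "late-bound" loop finds each member by name and keeps its counter in a std::variant, so every iteration pays for a hash lookup, an indirect call and a type check, while the "early-bound" loop makes direct, typed calls.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>
#include <variant>
#include <vector>

// Stand-in for a VARIANT: every value carries a runtime type tag
// (std::monostate plays the role of Empty).
using Variant = std::variant<std::monostate, long, bool, std::string>;

// Hypothetical line reader with ordinary, statically typed members.
class LineSource {
public:
    explicit LineSource(std::vector<std::string> lines) : lines_(std::move(lines)) {}
    bool atEnd() const { return pos_ >= lines_.size(); }
    std::string readLine()
    {
        assert(!atEnd());
        return lines_[pos_++];
    }

private:
    std::vector<std::string> lines_;
    std::size_t pos_ = 0;
};

// "Late-bound" access: members are found by name on every call and results
// come back as Variants. A loose analogy only, not the real IDispatch API.
class DispatchWrapper {
public:
    explicit DispatchWrapper(LineSource& src)
    {
        LineSource* p = &src;
        members_["AtEndOfStream"] = [p] { return Variant{p->atEnd()}; };
        members_["ReadLine"] = [p] { return Variant{p->readLine()}; };
    }
    Variant call(const std::string& name) { return members_.at(name)(); }

private:
    std::unordered_map<std::string, std::function<Variant()>> members_;
};

// Early-bound loop: direct calls and a typed counter.
long countEarlyBound(LineSource& src)
{
    long count = 0;
    while (!src.atEnd()) {
        src.readLine();
        ++count;
    }
    return count;
}

// Late-bound loop: a name lookup per member access, plus a Variant counter
// that is unpacked and repacked on every increment.
long countLateBound(DispatchWrapper& obj)
{
    Variant count = 0L;
    while (!std::get<bool>(obj.call("AtEndOfStream"))) {
        obj.call("ReadLine");
        count = std::get<long>(count) + 1;
    }
    return std::get<long>(count);
}
```

Both loops return the same count; only the per-iteration work differs, which mirrors the two changes (early binding and strong typing) made in the Visual Basic 6 project.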
Even with all advanced optimizations and the "Favor Pentium Pro" option enabled, the VB6 build didn't run faster than about 26 seconds.

You might think this is related to Visual Basic itself. That appears to be partly true. Using Visual Studio 2013 with C++ and the following code, built for Release:

```cpp
#include <ctime>
#include <fstream>
#include <iostream>
#include <string>
using namespace std;

int main()
{
    string line;
    int linecount = 0;
    ifstream myfile;
    myfile.open("D:\\testoutput.txt");
    cout << "processing..." << endl;
    clock_t startTime = clock();
    // Testing good() before getline() counts one extra (empty) read when the
    // file ends with a newline, which is why this reports one more line than
    // the C# version.
    while (myfile.good()) {
        getline(myfile, line);
        linecount++;
    }
    cout << double(clock() - startTime) / CLOCKS_PER_SEC << " seconds." << endl;
    cout << "Finished." << endl;
    cout << "processed " << linecount << " lines.";
    int test;
    cin >> test; // keep the console window open
}
```

resulted in this output:

```
processing...
9.535 seconds.
Finished.
processed 8000001 lines.
```

(This was with all optimizations set to full and favoring speed, /Ox and /Ot.) The only thing I can think of that accounts for the small difference is that the C# program ran in native 64-bit mode, whereas the C++ program compiles to 32-bit by default; however, switching the C++ program to x64 made it take about twice as long to complete. My guess as to why it's slower than C# in this case would have to be the ifstream library.

Quote from: briandams on December 28, 2013, 07:25:26 PM
you are using type + for loop?
First batch method:

```bat
@echo off
rem Strip any quotes from the argument, then quote the path wherever it is used
set "Tfile=%~1"
rem Delayed expansion is not needed here, and would strip ! characters from echoed lines
setlocal
set line=1
echo %Tfile%
echo %date% %time%
for /f "delims=" %%L in ('type "%Tfile%"') do (
    set /a line+=1
    echo %%L | find "Trigger" >nul && goto found
)
:found
echo Found previous
echo %date% %time%
echo Line %line%
rem Skip everything up to and including the trigger line, then print the next line
set /a sk=%line%-1
for /f "skip=%sk% delims=" %%L in ('type "%Tfile%"') do (
    echo %%L
    goto done
)
:done
echo %date% %time%
```

Quote:
or other methods?

Second batch method:

```bat
@echo off
set "Tfile=%~1"
echo %Tfile%
echo FIND start %date% %time%
rem find /N numbers each matching line as [n]; the second find drops the filename header line
rem If several lines match, the last one wins
for /f "delims=[] tokens=1*" %%A in ('find /N "Trigger" "%Tfile%" ^| find "Trigger"') do set triggerlinenumber=%%A
echo FIND end %date% %time%
echo FOR /F start %date% %time%
rem Skip up to and including the trigger line, then print the next line
for /f "skip=%triggerlinenumber% delims=" %%L in ('type "%Tfile%"') do (
    echo %%L
    goto done
)
:done
echo FOR /F end %date% %time%
```

VBScript:

```vbscript
Const ForReading = 1
strTextFile = WScript.Arguments(0)
WScript.Echo strTextFile
Set objFSO = CreateObject("Scripting.FileSystemObject")

start = Timer
strData = objFSO.OpenTextFile(strTextFile, ForReading).ReadAll
WScript.Echo "Read file " & FormatNumber(Timer - start, 4, True) & " secs"

start = Timer
arrLines = Split(strData, vbCrLf)
WScript.Echo "Split array " & FormatNumber(Timer - start, 4, True) & " secs"

start = Timer
iArrayIndex = 0
Do While iArrayIndex <= UBound(arrLines)
    If InStr(arrLines(iArrayIndex), "Trigger") > 0 Then
        Exit Do
    End If
    iArrayIndex = iArrayIndex + 1
Loop
WScript.Echo "Find line " & FormatNumber(Timer - start, 4, True) & " secs"

' Guard against no match, or a match on the last line
If iArrayIndex < UBound(arrLines) Then
    WScript.Echo "Wanted line " & arrLines(iArrayIndex + 1)
Else
    WScript.Echo "Trigger not found, or no line follows it"
End If
```

Quote:
how about timing? what are you using to time the script?

Batch:

```bat
echo Start %date% %time%
rem (command to be timed goes here)
echo End %date% %time%
```

VBScript:

```vbscript
start = Timer
' (code lines to be timed go here)
WScript.Echo "Elapsed " & FormatNumber(Timer - start, 4, True) & " secs"
```
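For comparison with the batch and VBScript methods, here is a minimal C++ sketch of the original task: read the file one line at a time, stop at the first line containing "Trigger", and return the line after it. Like the first batch method and the VBScript version (and unlike the second batch method, which keeps the last match), it stops at the first match. The function names, and the use of std::optional to signal "not found", are assumptions for illustration rather than anything from the thread.

```cpp
#include <cassert>
#include <chrono>
#include <fstream>
#include <istream>
#include <optional>
#include <string>

// Removes a trailing '\r', which is left behind when a CRLF file is read on a
// platform that does not translate line endings.
static void stripCR(std::string& line)
{
    if (!line.empty() && line.back() == '\r') {
        line.pop_back();
    }
}

// Returns the line that follows the first line containing `trigger`, or
// std::nullopt if no line matches or the match is the last line.
std::optional<std::string> lineAfterTrigger(std::istream& in, const std::string& trigger)
{
    assert(!trigger.empty());
    std::string line;
    while (std::getline(in, line)) {
        if (line.find(trigger) != std::string::npos) {
            if (std::getline(in, line)) {
                stripCR(line);
                return line;
            }
            return std::nullopt; // match was on the last line
        }
    }
    return std::nullopt; // no line contained the trigger
}

// Opens `path`, runs the search, and stores the elapsed wall-clock time in
// `seconds` (steady_clock), in the spirit of the %time% / Timer measurements.
std::optional<std::string> lineAfterTriggerInFile(const std::string& path,
                                                  const std::string& trigger,
                                                  double& seconds)
{
    const auto start = std::chrono::steady_clock::now();
    std::ifstream file(path);
    std::optional<std::string> result = lineAfterTrigger(file, trigger);
    seconds = std::chrono::duration<double>(std::chrono::steady_clock::now() - start).count();
    return result;
}
```

For example, `double secs; auto next = lineAfterTriggerInFile("D:\\testoutput.txt", "Trigger", secs);` gives the wanted line (if any) and the elapsed time in seconds. Reading line by line keeps memory use flat, unlike the ReadAll/Split approach, which holds the whole file and the split array in memory at once.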