Solve : findstr gets different offsets for the same string?


Below is my batch script.

Code:
@echo off
rem character offset in command line
(echo TestStr&echo.)|findstr /o ".*"

rem character offset in file
echo TestStr>a.txt
echo.>>a.txt
findstr /o ".*" a.txt

I used the same string, TestStr, in both cases. Why are the returned offsets different? One is 10 while the other is 9. See below.

Quote

C:\Test>test.bat
0:TestStr
10:
0:TestStr
9:

Can anybody help me out?

------------------------------------------------------------------------------

[Update 2009-04-20 07:30:20 AM] I found a new clue. Save the code below as a batch file.

Code:
(echo TestStr&echo.)|findstr /o ".*"
pause

Run it and you get the result below.

Quote
C:\Test>(echo TestStr & echo.) | findstr /o ".*"
0:TestStr
10:

C:\Test>pause
Press any key to continue . . .

One strange thing should be pointed out: the command line interpreter adds 2 blanks before &, whereas the command separator is normally only 1 blank.

To see the extra blank another way, let's run the command below.

Code:
(echo TestStr&echo.)|findstr /o ".*">a.txt

We'll find there is a blank at the end of TestStr in the file a.txt.
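
That trailing blank would account for the extra byte. Below is a minimal Python sketch of the arithmetic (Python only because the bytes are easier to show there than in batch). It assumes the piped data is exactly TestStr plus one blank, and that every line ends in CRLF. `line_offsets` is a hypothetical helper for illustration, not part of findstr.

```python
def line_offsets(data: bytes) -> list[int]:
    """Return the byte offset at which each CRLF-terminated line starts."""
    offsets = []
    pos = 0
    while pos < len(data):
        offsets.append(pos)
        end = data.find(b"\r\n", pos)
        if end == -1:
            break  # last line has no terminator
        pos = end + 2  # skip over CR and LF
    return offsets


# Assumption: the pipe delivers "TestStr" with one trailing blank.
piped = b"TestStr \r\n\r\n"
# The file written with echo TestStr>a.txt has no trailing blank.
from_file = b"TestStr\r\n\r\n"

print(line_offsets(piped))      # [0, 10]
print(line_offsets(from_file))  # [0, 9]
```

The second line starts at 8 + 2 = 10 in the piped case and at 7 + 2 = 9 in the file case, matching the output of test.bat.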

Why does the interpreter add 2 blanks before &? I need help with further investigation.

what is the actual problem you are trying to solve?

Quote from: gh0std0g74 on April 19, 2009, 08:52:26 PM
what is the actual problem you are trying to solve?

Thanks for your quick reply.

I ran into this question while calculating the length of a string. Actually, I already know how to get the length of a string. Here we go.

Code:
@echo off
rem character offset in command line
for /f "tokens=1 delims=:" %%a in ('^(echo TestStr^&echo.^)^|findstr /o ".*"') do (
set /a StrLen1=%%a-3
)
echo %StrLen1%

rem character offset in file
echo TestStr>a.txt
echo.>>a.txt
for /f "tokens=1 delims=:" %%a in ('findstr /o ".*" a.txt') do (
set /a StrLen2=%%a-2
)
echo %StrLen2%

I am just interested in why the offsets are different, as mentioned in the topic.

there is this invisible character for each new line, costing 2 bytes. - 0x0d 0x0a

Code:
C:\>(echo teststr&echo.)>a.txt

C:\>dir a.txt|find "a.txt"
20/04/2009 11:59 11 a.txt

11 bytes - (2 bytes * 2 lines) = 7 characters

Code:
C:\>findstr/on ".*" a.txt
1:0:teststr
2:9:

offset 9 - (2 bytes * (2-1) lines) = 7 characters (disregard the CRLF of the last line)
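
The two subtractions above can be written out as a small Python sketch. It assumes every line ends in a 2-byte CRLF and that only the first line has any text. Both function names are made up for illustration.

```python
CRLF_LEN = 2  # 0x0d 0x0a


def length_from_file_size(size_bytes: int, line_count: int) -> int:
    """Characters left after removing one CRLF per line from the file size."""
    return size_bytes - CRLF_LEN * line_count


def length_from_offset(offset: int, line_number: int) -> int:
    """Characters before line `line_number` (1-based), given its findstr /o offset.

    Line N is preceded by N-1 line terminators.
    """
    return offset - CRLF_LEN * (line_number - 1)


print(length_from_file_size(11, 2))  # 7
print(length_from_offset(9, 2))      # 7
print(length_from_offset(10, 2))     # 8
```

The last call uses the piped offset of 10 and gives 8, one more than the 7 characters of the string, which is why the batch script earlier in the thread subtracts 3 instead of 2 in that case.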

a.txt contents:
Code:
C:\>(echo.d100 10a & echo.q)|debug a.txt
-d100 10a
0B20:0100 74 65 73 74 73 74 72 0D-0A 0D 0A teststr....
-q

echo + findstr:
Code:
C:\>(echo.teststr&echo.&echo hello)|findstr/on ".*"
1:0:teststr
2:10:
3:13:hello

for the above code, i've no idea why each new line cost 3 bytes

Quote
there is this invisible character for each new line, costing 2 bytes. - 0x0d 0x0a
Good grief! We have traveled back in time to 1979, when a programmer discovers that Intel® systems differ from UNIX systems. Intel® uses two chars to end a line because of the prevalence of Model 23 TeleType® machines then being used for development on 8080 CPUs using Isis II. Nowadays UNIX systems can run on Intel machines. Now it's an OS-specific NEWLINE character, which has been discussed before.

Windows/DOS: CRLF (ASCII 13, ASCII 10)
UNIX/LINUX: LF (ASCII 10)
MAC (classic Mac OS): CR (ASCII 13)

In 1979 Intel and others provided 8080 development systems that were not UNIX and used the two chars to end a line. You could send a text file to the TeleType machine. Can't do that with just a line feed. And on Intel-based systems it has been that way because Intel said so, not because of anything about the CPU. Years later, when MANY started to use UNIX on Intel CPUs, it confused the programmers. Not the fault of the CPU; it was just not the Intel way. It was a question of what system you learned on. And old habits die hard. We still use two chars to end the line, but I doubt very much anybody is using any teletype machines nowadays. Mine worked better if I set up three nulls after the line feed. The output thing would send nulls, if told to do so.
I don't miss it one bit. Excuse the pun.

Quote from: Reno on April 19, 2009, 11:02:58 PM
there is this invisible character for each new line, costing 2 bytes. - 0x0d 0x0a
...
for the above code, i've no idea why each new line cost 3 bytes

Yes, I know the EOL (end of line) characters 0D (CR) 0A (LF).

Why does it cost 3 bytes in my first scenario? Let's wait for the answer together : )

Three bytes? In the command line there is a space in front of the first item. Could that be it?

Quote from: Geek-9pm on April 20, 2009, 01:22:35 AM
Three bytes? in the command line there is a space in front of the first item. Could that be it?

I'm afraid not, because I found a new clue. The interpreter adds blanks before & when it runs the piped command, which leaves a blank at the end of TestStr. I have added the full details as an update to the first post.
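
For what it's worth, a trailing blank would also explain the 3 bytes per line in Reno's three-line example. Here is a minimal Python sketch of the arithmetic only, not of what cmd does internally. It assumes (this is not confirmed anywhere in the thread) that every command followed by & in the piped block gets one blank appended to its output, and that each line ends in CRLF. `findstr_offsets` is a made-up helper name.

```python
def findstr_offsets(lines: list[bytes]) -> list[int]:
    """Model findstr /o: starting byte offset of each line, CRLF assumed."""
    offsets = []
    pos = 0
    for line in lines:
        offsets.append(pos)
        pos += len(line) + 2  # line content plus CR LF
    return offsets


# Assumption: "echo.teststr" and "echo." each pick up one trailing blank
# because they are followed by &; "echo hello" is last and does not.
piped_lines = [b"teststr ", b" ", b"hello"]
print(findstr_offsets(piped_lines))  # [0, 10, 13]

# Without the blanks the same three lines would start at 0, 9 and 11.
plain_lines = [b"teststr", b"", b"hello"]
print(findstr_offsets(plain_lines))  # [0, 9, 11]
```

Under that assumption the "empty" second line is really one blank plus CRLF, which gives the 3 bytes between offsets 10 and 13.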

