1.

Solve : Batch: convert text file to CSV?

Answer»

I have 11 plain text files which I want to open up in Excel.  Each file has a different record type and different record length with the following attributes:

fixed column length
no blank lines in the files
no delimiters
trailing spaces in some or all of the files

When I open up the files in Excel, I have to go through the tedious task of using the wizard to define the columns.  Would be nice to be able to convert the files into comma quoted ahead of time, that way Excel opens up with each column already defined.  Column headers are not necessary.

I am writing a batch to put double quotes, comma, and more double quotes in-between each of the columns, with one double quotes at the beginning and one double quotes at the end.  I have tested successfully with 7 columns so far.  I wanted to get your thoughts before continuing with the syntax for the rest of the columns.

Is there a limit to the number of bytes that can be included in one command/statement?  The file I have with the most number of columns to define is 50.  This is going to make the syntax extremely long.

Example input records from one file:
Code: [Select]AB012345AC    201399Long name                     55555198900006589673101N008500000008200979999999999900015294260999999999999999         
AB012345ABC   201399Another long name             11111194500000012023203N000313200001840909999999999900000260524999999999999999         
AB012345ABCD  201399three times                   33333199600003609819104N004265000008000979999999999900005823119999999999999999         

I want the output to be like this, for example:
Code: [Select]"AB","012345","AC    ","2013","99","Long name                     ","55555","1989","0000658967","31","01","N","00850000","00082","0097","999999","99999","0001529426","0","999999999999999","          "
"AB","012345","ABC   ","2013","99","Another long name             ","11111","1945","0000001202","32","03","N","00031320","00018","4090","999999","99999","0000026052","4","999999999999999","          "
"AB","012345","ABCD  ","2013","99","three times                   ","33333","1996","0000360981","91","04","N","00426500","00080","0097","999999","99999","0000582311","9","999999999999999","          "

Here is the batch code:
Code: [Select]echo off
setlocal EnableDelayedExpansion
cd/d c:\download
if ErrorLevel 1 (
cls
echo ---------------------------------------------
rem
echo ERROR  ERROR  ERROR
echo Folder "c:\download" not found
echo Press a key to close this window and stop the program.
rem
echo ---------------------------------------------
pause
goto FINISH
)

if exist test1.csv del test1.csv
set single_quote="
set quotes=","
for /f "delims=" %%A in (myfile1.TXT) do (
set partial=%%A
set newstring=%single_quote%!partial:~0,2!%quotes%!partial:~2,6!%quotes%!partial:~8,6!%quotes%!partial:~14,4!%quotes%!partial:~18,2!%quotes%!partial:~20,30!%quotes%!partial:~50,5!%single_quote%
echo !newstring!>> test1.csv
)
start "" test1.csv

:FINISH
exit

Note:  in the code I have defined only 7 columns so far, as stated above, whereas in the example input records I have listed the entire line of one of the input files.
Thanks in advance.Yes there is a character limit for the SET command.  I can't recall what it is.  I bet if you programmed up a SIMPLE FOR /L loop you could find out what it was. Thanks for your reply.  I'm racking my brain trying to figure out how the /L parameter will result in a long line of syntax for a command.  Can you please provide an example?This should hit the number.

Code: [Select]echo off
setlocal EnableDelayedExpansion
for /l %%A in (1,1,1000000000000000000) do (
set last=!a!
set a=!a!0
if "!last!"=="!a!" echo %%A
if "!last!"=="!a!" pause
)

EDIT: Warning it will take a LONG time.It's been running for 3.5 hours with nothing on the screen but the blinking cursor.  I am leaving work now, so I am pressing Ctrl+C.  I am not concerned about the output being too long; just the amount of characters it takes to make up/create a set command or any other command, like so:

Code: [Select]set newstring=%single_quote%!partial:~0,2!%quotes%!partial:~2,6!%quotes%!partial:~8,6!%quotes%!partial:~14,4!%quotes%!partial:~18,2!%quotes%!partial:~20,30!%quotes%!partial:~50,5!%quotes%!partial:~20,30!%quotes%!partial:~20,30!%quotes%!partial:~20,30!%quotes%!partial:~20,30!%quotes%!partial:~20,30!%quotes%!partial:~20,30!%quotes%!partial:~20,30!%quotes%!partial:~20,30!%quotes%!partial:~20,30!%quotes%!partial:~20,30!%quotes%!partial:~20,30!%quotes%!partial:~20,30!%quotes%!partial:~20,30!%quotes%!partial:~20,30!%quotes%!partial:~20,30!%quotes%!partial:~20,30!%quotes%!partial:~20,30!%quotes%!partial:~20,30!%quotes%!partial:~20,30!%quotes%!partial:~20,30!%quotes%!partial:~20,30!%quotes%!partial:~20,30!%quotes%!partial:~20,30!%quotes%!partial:~20,30!%quotes%!partial:~20,30!%quotes%!partial:~20,30!%quotes%!partial:~20,30!%quotes%!partial:~20,30!%quotes%!partial:~20,30!%quotes%!partial:~20,30!%quotes%!partial:~20,30!%quotes%!partial:~20,30!%quotes%!partial:~20,30!%quotes%!partial:~20,30!%quotes%!partial:~20,30!%quotes%!partial:~20,30!%quotes%!partial:~20,30!%quotes%!partial:~20,30!%quotes%!partial:~20,30!%quotes%!partial:~20,30!%quotes%!partial:~20,30!%quotes%!partial:~20,30!%quotes%!partial:~20,30!%single_quote%

I will try the full code tomorrow.Well the FOR /L didn't need to be set to that high.  8,193 would have sufficed. Pardon me.  Did I miss something?
You said:
Quote

fixed column length
Do you mean fixed width?
If that is true, parsing is not even needed.
I am confused  because your sample implicates fixed width FIELDS.
Please clarify.
Thanks.Geek, what is confusing you about a fixed length field vs a fixed width field? They are one in the same in my world of data processing.Well, whee records and fields are fixed size , parsing is just using what is in the filed, n o need to find the end. But  CSV is often used to compress the size of files and to transport to other programs that do not have the field definitions.
Just wanting  to follow the logic. Pardon me.
Here is one method using a batch script and helper batch file:

This uses a helper batch file called `repl.bat` (by dbenham) - download from:  https://www.dropbox.com/s/qidqwztmetbvklt/repl.bat

Place `repl.bat` in the same folder as the batch file or in a folder that is on the path.

There are only three lines of code here:

Code: [Select]echo off
type "file.txt"|repl "^(.{2})(.{6})(.{6})(.{4})(.{2})(.{30})(.{5})(.{4})(.{10})" "\q$1\q,\q$2\q,\q$3\q,\q$4\q,\q$5\q,\q$6\q,\q$7\q,\q$8\q,\q$9\q," x |repl "^(.{96})(.{2})(.{2})(.{1})(.{8})(.{5})(.{4})(.{6})(.{5})" "$1\q$2\q,\q$3\q,\q$4\q,\q$5\q,\q$6\q,\q$7\q,\q$8\q,\q$9\q," x |repl "^(.{153})(.{10})(.{1})(.{15})(.*)" "$1\q$2\q,\q$3\q,\q$4\q,\q$5\q" x >file2.txt
pause

In the second and third invocations of repl in the same line, the first term in the search term - such as (.{96}) - is equal to the length of the modified output of the command before it, and then it continues with setting the field widths of the following terms.

If you remove the code, in the middle line from the second | and onwards - and run it, then you will see what the first repl command does and which may make it a little clearer.

It's a little clumsy in the construction with multiple calls of repl, but it is very robust and quick on large files, compared to plain batch code.
Dave Benham has just written Jrepl.bat at dostips.com which should do it in a much more straight forward way if you ask him there, and show him this code and your question.

Your input file
Code: [Select]AB012345AC    201399Long name                     55555198900006589673101N008500000008200979999999999900015294260999999999999999         
AB012345ABC   201399Another long name             11111194500000012023203N000313200001840909999999999900000260524999999999999999         
AB012345ABCD  201399three times                   33333199600003609819104N004265000008000979999999999900005823119999999999999999         
The output
Code: [Select]"AB","012345","AC    ","2013","99","Long name                     ","55555","1989","0000658967","31","01","N","00850000","00082","0097","999999","99999","0001529426","0","999999999999999","          "
"AB","012345","ABC   ","2013","99","Another long name             ","11111","1945","0000001202","32","03","N","00031320","00018","4090","999999","99999","0000026052","4","999999999999999","          "
"AB","012345","ABCD  ","2013","99","three times                   ","33333","1996","0000360981","91","04","N","00426500","00080","0097","999999","99999","0000582311","9","999999999999999","          "


If your lines do not contain any commas then this simplification can be used:

Code: [Select]echo off
type "file.txt"|repl "^(.{2})(.{6})(.{6})(.{4})(.{2})(.{30})(.{5})(.{4})(.{10})" "$1,$2,$3,$4,$5,$6,$7,$8,$9," |repl "^(.{77}).(.{2})(.{2})(.{1})(.{8})(.{5})(.{4})(.{6})(.{5})" "$1,$2,$3,$4,$5,$6,$7,$8,$9," |repl "^(.{118}).(.{10})(.{1})(.{15})(.*)" "\q$1,$2,$3,$4,$5\q" x |repl "," "\q,\q" x >file2.txt
pause

Where the terms (.{77}). and (.{118}). refer to the leading parts that were changed by the preceding repl commands, respectively.Hey foxidrive - new site for me

I don't understand why you made your code so complicated. The following long one liner works for me (I used line continuation):
Code: [Select]type myfile.txt|repl "^(..)(.{6})(.{6})(....)(..)(.{30})(.{5})(....)(.{10})(..)(..)(.)(.{8})(.{5})(....)(.{6})(.{5})(.{10})(.)(.{15})(.{10})" ^
"\q$1\q,\q$2\q,\q$3\q,\q$4\q,\q$5\q,\q$6\q,\q$7\q,\q$8\q,\q$9\q,\q$10\q,\q$11\q,\q$12\q,\q$13\q,\q$14\q,\q$15\q,\q$16\q,\q$17\q,\q$18\q,\q$19\q,\q$20\q,\q$21\q" x >test1.csv

You can use up to 99 captured sub-matches. This problem only requires 21.

Using the new JREPL.BAT is nearly identical. You can specify the input and output files as options:
Code: [Select]call jrepl "^(..)(.{6})(.{6})(....)(..)(.{30})(.{5})(....)(.{10})(..)(..)(.)(.{8})(.{5})(....)(.{6})(.{5})(.{10})(.)(.{15})(.{10})" ^
"\q$1\q,\q$2\q,\q$3\q,\q$4\q,\q$5\q,\q$6\q,\q$7\q,\q$8\q,\q$9\q,\q$10\q,\q$11\q,\q$12\q,\q$13\q,\q$14\q,\q$15\q,\q$16\q,\q$17\q,\q$18\q,\q$19\q,\q$20\q,\q$21\q" /x /f "myfile.txt" /o "test1.csv"

It is probably easier / more reliable to let code build the search and replace strings. Because this solution is passing the strings as variables, I am able to embed the quotes directly instead of using \q with the /X option.
Code: [Select]echo off
setlocal enableDelayedExpansion
set "search=^"
set "replace="
set /a n=0
for %%N in (2 6 6 4 2 30 5 4 10 2 2 1 8 5 4 6 5 10 1 15 10) do (
  set /a n+=1
  set "search=!search!(.{%%N})"
  set "replace=!replace!"$!n!","
)
set "replace=!replace:~0,-1!"
call jrepl search replace /v /f test.txt /o test1.csv


Dave Benham Quote from: dbenham on November 14, 2014, 09:48:50 PM
Hey foxidrive - new site for me

Nice to see you here Dave.

Quote
I don't understand why you made your code so complicated.
You can use up to 99 captured sub-matches. This problem only requires 21.

D'oh!  I thought it only worked up to 9 back-references, like GNUsed does!  Mea Culpa!  Thanks for the info.

Quote
It is probably easier / more reliable to let code build the search and replace strings. Because this solution is passing the strings as variables, I am able to embed the quotes directly instead of using \q with the /X option.
Code: [Select]echo off
setlocal enableDelayedExpansion
set "search=^"
set "replace="
set /a n=0
for %%N in (2 6 6 4 2 30 5 4 10 2 2 1 8 5 4 6 5 10 1 15 10) do (
  set /a n+=1
  set "search=!search!(.{%%N})"
  set "replace=!replace!"$!n!","
)
set "replace=!replace:~0,-1!"
call jrepl search replace /v /f test.txt /o test1.csv

And that's a clever way to clarify the field widths in the command.

Welcome Aboard dbenham ! ! Quote from: patio on November 15, 2014, 07:26:58 AM
Welcome Aboard dbenham ! !
DittoSorry for the long delay in replying.  I am doing this at work, and FRIDAY the forum server was down, Monday I was GIVEN a priority assignment for QA, and yesterday I was out of the office.

Thanks for all of your input.  I had seen previously references to repl.bat by dbenham, but I did not see how I could incorporate that logic into my situation.  I will give this code a try.


Discussion

No Comment Found