| 1. |
Solve : Batch: convert text file to CSV? |
|
Answer» I have 11 plain text files which I want to open up in Excel. Each file has a different record type and different record length with the following attributes: fixed column lengthDo you mean fixed width? If that is true, parsing is not even needed. I am confused because your sample implicates fixed width FIELDS. Please clarify. Thanks.Geek, what is confusing you about a fixed length field vs a fixed width field? They are one in the same in my world of data processing.Well, whee records and fields are fixed size , parsing is just using what is in the filed, n o need to find the end. But CSV is often used to compress the size of files and to transport to other programs that do not have the field definitions. Just wanting to follow the logic. Pardon me. Here is one method using a batch script and helper batch file: This uses a helper batch file called `repl.bat` (by dbenham) - download from: https://www.dropbox.com/s/qidqwztmetbvklt/repl.bat Place `repl.bat` in the same folder as the batch file or in a folder that is on the path. There are only three lines of code here: Code: [Select]echo off type "file.txt"|repl "^(.{2})(.{6})(.{6})(.{4})(.{2})(.{30})(.{5})(.{4})(.{10})" "\q$1\q,\q$2\q,\q$3\q,\q$4\q,\q$5\q,\q$6\q,\q$7\q,\q$8\q,\q$9\q," x |repl "^(.{96})(.{2})(.{2})(.{1})(.{8})(.{5})(.{4})(.{6})(.{5})" "$1\q$2\q,\q$3\q,\q$4\q,\q$5\q,\q$6\q,\q$7\q,\q$8\q,\q$9\q," x |repl "^(.{153})(.{10})(.{1})(.{15})(.*)" "$1\q$2\q,\q$3\q,\q$4\q,\q$5\q" x >file2.txt pause In the second and third invocations of repl in the same line, the first term in the search term - such as (.{96}) - is equal to the length of the modified output of the command before it, and then it continues with setting the field widths of the following terms. If you remove the code, in the middle line from the second | and onwards - and run it, then you will see what the first repl command does and which may make it a little clearer. It's a little clumsy in the construction with multiple calls of repl, but it is very robust and quick on large files, compared to plain batch code. Dave Benham has just written Jrepl.bat at dostips.com which should do it in a much more straight forward way if you ask him there, and show him this code and your question. Your input file Code: [Select]AB012345AC 201399Long name 55555198900006589673101N008500000008200979999999999900015294260999999999999999 AB012345ABC 201399Another long name 11111194500000012023203N000313200001840909999999999900000260524999999999999999 AB012345ABCD 201399three times 33333199600003609819104N004265000008000979999999999900005823119999999999999999 The output Code: [Select]"AB","012345","AC ","2013","99","Long name ","55555","1989","0000658967","31","01","N","00850000","00082","0097","999999","99999","0001529426","0","999999999999999"," " "AB","012345","ABC ","2013","99","Another long name ","11111","1945","0000001202","32","03","N","00031320","00018","4090","999999","99999","0000026052","4","999999999999999"," " "AB","012345","ABCD ","2013","99","three times ","33333","1996","0000360981","91","04","N","00426500","00080","0097","999999","99999","0000582311","9","999999999999999"," " If your lines do not contain any commas then this simplification can be used: Code: [Select]echo off type "file.txt"|repl "^(.{2})(.{6})(.{6})(.{4})(.{2})(.{30})(.{5})(.{4})(.{10})" "$1,$2,$3,$4,$5,$6,$7,$8,$9," |repl "^(.{77}).(.{2})(.{2})(.{1})(.{8})(.{5})(.{4})(.{6})(.{5})" "$1,$2,$3,$4,$5,$6,$7,$8,$9," |repl "^(.{118}).(.{10})(.{1})(.{15})(.*)" "\q$1,$2,$3,$4,$5\q" x |repl "," "\q,\q" x >file2.txt pause Where the terms (.{77}). and (.{118}). refer to the leading parts that were changed by the preceding repl commands, respectively.Hey foxidrive - new site for me I don't understand why you made your code so complicated. The following long one liner works for me (I used line continuation): Code: [Select]type myfile.txt|repl "^(..)(.{6})(.{6})(....)(..)(.{30})(.{5})(....)(.{10})(..)(..)(.)(.{8})(.{5})(....)(.{6})(.{5})(.{10})(.)(.{15})(.{10})" ^ "\q$1\q,\q$2\q,\q$3\q,\q$4\q,\q$5\q,\q$6\q,\q$7\q,\q$8\q,\q$9\q,\q$10\q,\q$11\q,\q$12\q,\q$13\q,\q$14\q,\q$15\q,\q$16\q,\q$17\q,\q$18\q,\q$19\q,\q$20\q,\q$21\q" x >test1.csv You can use up to 99 captured sub-matches. This problem only requires 21. Using the new JREPL.BAT is nearly identical. You can specify the input and output files as options: Code: [Select]call jrepl "^(..)(.{6})(.{6})(....)(..)(.{30})(.{5})(....)(.{10})(..)(..)(.)(.{8})(.{5})(....)(.{6})(.{5})(.{10})(.)(.{15})(.{10})" ^ "\q$1\q,\q$2\q,\q$3\q,\q$4\q,\q$5\q,\q$6\q,\q$7\q,\q$8\q,\q$9\q,\q$10\q,\q$11\q,\q$12\q,\q$13\q,\q$14\q,\q$15\q,\q$16\q,\q$17\q,\q$18\q,\q$19\q,\q$20\q,\q$21\q" /x /f "myfile.txt" /o "test1.csv" It is probably easier / more reliable to let code build the search and replace strings. Because this solution is passing the strings as variables, I am able to embed the quotes directly instead of using \q with the /X option. Code: [Select]echo off setlocal enableDelayedExpansion set "search=^" set "replace=" set /a n=0 for %%N in (2 6 6 4 2 30 5 4 10 2 2 1 8 5 4 6 5 10 1 15 10) do ( set /a n+=1 set "search=!search!(.{%%N})" set "replace=!replace!"$!n!"," ) set "replace=!replace:~0,-1!" call jrepl search replace /v /f test.txt /o test1.csv Dave Benham Quote from: dbenham on November 14, 2014, 09:48:50 PM Hey foxidrive - new site for me Nice to see you here Dave. Quote I don't understand why you made your code so complicated. D'oh! I thought it only worked up to 9 back-references, like GNUsed does! Mea Culpa! Thanks for the info. Quote It is probably easier / more reliable to let code build the search and replace strings. Because this solution is passing the strings as variables, I am able to embed the quotes directly instead of using \q with the /X option. And that's a clever way to clarify the field widths in the command. Welcome Aboard dbenham ! ! Quote from: patio on November 15, 2014, 07:26:58 AM Welcome Aboard dbenham ! !DittoSorry for the long delay in replying. I am doing this at work, and FRIDAY the forum server was down, Monday I was GIVEN a priority assignment for QA, and yesterday I was out of the office. Thanks for all of your input. I had seen previously references to repl.bat by dbenham, but I did not see how I could incorporate that logic into my situation. I will give this code a try. |
|