
Batch File To Extract String Of Characters From Txt-File?


Hi,

I hope someone here can help me--it would be greatly appreciated. 

I have a folder with a bunch of text files. I'm looking for a batch file that extracts a string of exactly 5000 characters from each of these and puts it in a separate file (one for each). The place within each txt-file that the 5000 characters are taken from should be random (i.e. if I run it a couple of times on the same original file, the extracted bit would be different). Afterwards, the original file should no longer contain this string.

So I have:
filename_1.txt
filename_2.txt
filename_3.txt
...

And I want:
filename_1.txt and filename_1_5000.txt
filename_2.txt and filename_2_5000.txt
filename_3.txt and filename_3_5000.txt
...

etc.
Does that make sense? Can anyone help me with this?

Thanks!

It does help if you can give a reason why.
Why 5000 characters?
There already are utilities that will truncate a file after a definite number of characters. People write these things either in C or in a scripting language.
Of course, you can use DOS utilities and batch, if you want.

Quote from: Geek-9pm on May 05, 2015, 10:39:02 PM

There already are utilities that will truncate a file after a definite number of characters.

The question seems to say: take 5000 characters from a random place in a file, remove them and put them in another file.

He's not trying to truncate files as I understand it.

Quote from: Stefan on May 05, 2015, 09:42:22 PM
Does that make sense? Can anyone help me with this?

  • Will the 5000 characters go over different lines?
  • Does the character count include carriage return and linefeed characters?

Thanks, you two,

yes that's what I'm trying to do, foxidrive. So the general idea is that I have a bunch of texts and want to run tests on even-sized chunks of those to see if I can identify the author. (I would do strings of 4000, 3000, 2000, 1000 and 500 next).

So the texts I have do not have line-breaks in them... but I'm not entirely sure how that would be different from "carriage return and linefeed characters" (I'm really new to this--sorry).

Thanks!

OK, now it is clear.
Some texts have both the CR and the LF controls. Others have just one of the two.
But now that we see what you want, the CR LF is not important. However, a CR might be used to end a paragraph.

Foxidrive might help do this in batch.
Myself, I would have to use QBasic, an old MS program.


Thanks!

Anything would be fine with me. This will help me write a paper--and should it get published, I'll thank you two in it!

What about white space? Is this some kind of word frequency or vocabulary profile analysis? I have read posts by David Graves, who has analysed works by Jane Austen, but I don't know what his software is like (he wrote some himself).

You can get lots of books in text format at Project Gutenberg, and you could load one into a good editor which has word and character counts (e.g. MS Word) and select blocks of the right length. I would be inclined to use a more capable language than batch (Visual Basic Script for example).

Here are 5000 characters (not counting spaces) from "The Sleeper Wakes" by HG Wells.

CHAPTER II. THE TRANCE

The state of cataleptic rigour into which this man had fallen, lasted
for an unprecedented length of time, and then he passed slowly to the
flaccid state, to a lax attitude suggestive of profound repose. Then it
was his eyes could be closed.

He was removed from the hotel to the Boscastle surgery, and from the
surgery, after some weeks, to London. But he still resisted every
attempt at reanimation. After a time, for reasons that will appear
later, these attempts were discontinued. For a great space he lay in
that strange condition, inert and still neither dead nor living but, as
it were, suspended, hanging midway between nothingness and existence.
His was a darkness unbroken by a ray of thought or sensation, a
dreamless inanition, a vast space of peace. The tumult of his mind had
swelled and risen to an abrupt climax of silence. Where was the man?
Where is any man when insensibility takes hold of him?

"It seems only yesterday," said Isbister. "I remember it all as
though it happened yesterday--clearer perhaps, than if it had happened
yesterday."

It was the Isbister of the last chapter, but he was no longer a
young man. The hair that had been brown and a trifle in excess of the
fashionable length, was iron grey and clipped close, and the face that
had been pink and white was buff and ruddy. He had a pointed beard shot
with grey. He talked to an elderly man who wore a summer suit of drill
(the summer of that year was unusually hot). This was Warming, a London
solicitor and next of kin to Graham, the man who had fallen into the
trance. And the two men stood side by side in a room in a house in
London regarding his recumbent figure.

It was a yellow figure lying lax upon a water-bed and clad in a flowing
shirt, a figure with a shrunken face and a stubby beard, lean limbs and
lank nails, and about it was a case of thin glass. This glass seemed
to mark off the sleeper from the reality of life about him, he was a
thing apart, a strange, isolated abnormality. The two men stood close to
the glass, peering in.

"The thing gave me a shock," said Isbister "I feel a queer sort of
surprise even now when I think of his white eyes. They were white, you
know, rolled up. Coming here again brings it all back to me.

"Have you never seen him since that time?" asked Warming.

"Often wanted to come," said Isbister; "but business nowadays is too
serious a thing for much holiday keeping. I've been in America most of
the time."

"If I remember rightly," said Warming, "you were an artist?"

"Was. And then I became a married man. I saw it was all up with black
and white, very soon--at least for a mediocre man, and I jumped on to
process. Those posters on the Cliffs at Dover are by my people."

"Good posters," admitted the solicitor, "though I was sorry to see them
there."

"Last as long as the cliffs, if necessary," exclaimed Isbister with
satisfaction. "The world changes. When he fell asleep, twenty years
ago, I was down at Boscastle with a box of water-colours and a noble,
old-fashioned ambition. I didn't expect that some day my pigments would
glorify the whole blessed coast of England, from Land's End round again
to the Lizard. Luck comes to a man very often when he's not looking."

Warming seemed to doubt the quality of the luck. "I just missed seeing
you, if I recollect aright."

"You came back by the trap that took me to Camelford railway station.
It was close on the Jubilee, Victoria's Jubilee, because I remember the
seats and flags in Westminster, and the row with the cabman at Chelsea."

"The Diamond Jubilee, it was," said Warming; "the second one."

"Ah, yes! At the proper Jubilee--the Fifty Year affair--I was down at
Wookey--a boy. I missed all that.... What a fuss we had with him! My
landlady wouldn't take him in, wouldn't let him stay--he looked so queer
when he was rigid. We had to carry him in a chair up to the hotel. And
the Boscastle doctor--it wasn't the present chap, but the G.P. before
him--was at him until nearly two, with, me and the landlord holding
lights and so forth."

"It was a cataleptic rigour at first, wasn't it?"

"Stiff!--wherever you bent him he stuck. You might have stood him on
his head and he'd have stopped. I never saw such stiffness. Of course
this"--he indicated the prostrate figure by a movement of his head--"is
quite different. And, of course, the little doctor--what was his name?"

"Smithers?"

"Smithers it was--was quite wrong in trying to fetch him round too soon,
according to all accounts. The things he did. Even now it makes me feel
all--ugh! Mustard, snuff, pricking. And one of those beastly little
things, not dynamos--"

"Induction coils."

"Yes. You could see his muscles throb and jump, and he twisted about.
There was just two flaring yellow candles, and all the shadows were
shivering, and the little doctor nervous and putting on side, and
him--stark and squirming in the most unnatural ways. Well, it made me
dream."

Pause.

"It's a strange state," said Warming.

"It's a sort of complete absence," said Isbister.

"Here's the body, empty. Not dead a bit, and yet not alive. It's like a
seat vacant and marked 'engaged.' No feeling, no digestion, no beating
of the heart--not a flutter. _That_ doesn't make me feel as if there was
a man present. In a sense it's more dead than death, for these doctors
tell me that even the hair has stopped growing. Now with the proper
dead, the hair will go on growing--"

"I know," said Warming, with a flash of pain in his expression.

They peered through the glass again. Graham was indeed in a strange
state, in the flaccid phase of a trance, but a trance unprecedented in
medical history. Trances had lasted for as much as a year before--but at
the end of that time it had ever been waking or a death; sometimes first
one and then the other. Isbister noted the marks the physicians had
made in injecting nourishment, for that device had been resorted to to
postpone collapse; he pointed them out to Warming, who had been trying
not to see them.

"And while he has been lying here," said Isbister, with the zest of a
life freely spent, "I have changed


Hi,

thanks, Salmon Trout--yes, that was my plan (and I'm using R for that). I've been doing the text-selecting manually so far but it's just an impossible workload. I have fifty texts, would like to get strings of 5000, 4000, 3000, 2000, 1000 and 500 for each and run it 100 times with changing lists of most frequent words--so that would be impossible to do this way.

And I've been taking out paragraphing so far--but it looks like it doesn't make a difference for the R plugin I'm using (same results with or without paragraphs and additional spaces).
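
For the repeated sampling described here, a loop over the chunk sizes is enough. The following is a minimal sketch in Python, which is an assumption of this note (the thread only mentions batch, VBScript, QBasic and R). It reads "run it 100 times" as 100 independent random draws per size, which is also an assumption, and the names `random_chunk`, `sample_all` and the stand-in text are hypothetical. Unlike the original request, it only samples; nothing is removed from the source text.

```python
import random

CHUNK_SIZES = [5000, 4000, 3000, 2000, 1000, 500]  # sizes named in the post
RUNS = 100                                          # repetitions named in the post


def random_chunk(text: str, size: int, rng: random.Random) -> str:
    """Return `size` consecutive characters from a random position in `text`."""
    if len(text) < size:
        raise ValueError(f"text has only {len(text)} characters, need {size}")
    start = rng.randint(0, len(text) - size)  # randint includes both endpoints
    return text[start:start + size]


def sample_all(text: str, rng: random.Random) -> dict:
    """Map each chunk size to a list of RUNS independently drawn chunks."""
    return {
        size: [random_chunk(text, size, rng) for _ in range(RUNS)]
        for size in CHUNK_SIZES
    }


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so a run can be repeated (assumption)
    demo_text = "the quick brown fox jumps over the lazy dog " * 200  # stand-in text
    chunks = sample_all(demo_text, rng)
    print({size: len(runs) for size, runs in chunks.items()})
```

Passing the random generator in explicitly keeps the draws reproducible, which may matter if the results end up in a paper.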

But thanks for the tip!

Quote from: Stefan on May 06, 2015, 09:12:23 AM
So the texts I have do not have line-breaks in them... but I'm not entirely sure how that would be different from "carriage return and linefeed characters"

In a Windows text file there are two invisible characters at the end of each line - the CR/LF pair.
Viewing the file in a hex editor/viewer will show you the characters.

Linux and Mac native text files have only one character ending each line.
Linux has LF, and Mac had CR in the classic (pre-OS X) days if I recall correctly; current Macs use LF like Linux.
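
The effect on a character count can be checked without a hex viewer. This is a small sketch in Python (an assumption of this note, not something used in the thread) that holds the same two lines of text with CR/LF, LF and CR endings and counts each kind. CR as a line ending applies to classic Mac OS, before OS X. The sample strings and the helper name `count_line_endings` are made up for illustration.

```python
# The same two lines of text with the three line-ending conventions.
samples = {
    "Windows (CR/LF)": b"first line\r\nsecond line\r\n",
    "Linux (LF)": b"first line\nsecond line\n",
    "old Mac (CR)": b"first line\rsecond line\r",
}


def count_line_endings(data: bytes) -> dict:
    """Count CR/LF pairs, lone LFs and lone CRs in raw bytes."""
    crlf = data.count(b"\r\n")
    return {
        "crlf": crlf,
        "lf_only": data.count(b"\n") - crlf,  # LFs that are not part of a pair
        "cr_only": data.count(b"\r") - crlf,  # CRs that are not part of a pair
    }


for name, data in samples.items():
    print(name, len(data), count_line_endings(data))
```

The Windows version comes out one byte per line longer than the other two, which is why it matters whether the 5000 characters include line endings.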

Quote from: Salmon Trout on May 06, 2015, 01:19:04 PM
I would be inclined to use a more capable language than batch (Visual Basic Script for example)

That is a sensible choice: VBScript ships with Windows, and a VBS script can still be called from a batch file.
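
No script was actually posted in the thread. As a rough illustration of the task as described (cut 5000 characters from a random place, save them as filename_N_5000.txt, and leave the original without them), here is a minimal sketch in Python rather than batch or VBScript. The language, the UTF-8 encoding, and the names `split_random_chunk` and `process_folder` are all assumptions of this note, not anything the posters proposed.

```python
from __future__ import annotations

import random
from pathlib import Path


def split_random_chunk(text: str, size: int, rng: random.Random) -> tuple[str, str]:
    """Cut `size` consecutive characters out of `text` at a random offset.

    Returns (chunk, remainder), where remainder is `text` without the chunk.
    """
    if len(text) < size:
        raise ValueError(f"need {size} characters, text has {len(text)}")
    start = rng.randint(0, len(text) - size)  # randint includes both endpoints
    return text[start:start + size], text[:start] + text[start + size:]


def process_folder(folder: Path, size: int = 5000,
                   rng: random.Random | None = None) -> list[Path]:
    """For each name.txt, write name_<size>.txt and shrink the original."""
    rng = rng or random.Random()
    suffix = f"_{size}"
    written = []
    for path in sorted(folder.glob("*.txt")):  # sorted() fixes the list up front
        if path.stem.endswith(suffix):
            continue  # looks like a chunk file from an earlier run
        # newline="" disables newline translation, so CR and LF each count
        # as one character and are written back unchanged.
        with path.open(encoding="utf-8", newline="") as f:  # UTF-8 is assumed
            text = f.read()
        if len(text) < size:
            continue  # too short to sample; file is left untouched
        chunk, remainder = split_random_chunk(text, size, rng)
        out_path = path.with_name(f"{path.stem}{suffix}{path.suffix}")
        with out_path.open("w", encoding="utf-8", newline="") as f:
            f.write(chunk)
        with path.open("w", encoding="utf-8", newline="") as f:
            f.write(remainder)
        written.append(out_path)
    return written
```

Calling `process_folder(Path("texts"))`, where `texts` is a hypothetical folder name, would process every .txt file in that folder. The originals are overwritten, so it is safer to run it on a copy of the folder. Because the start position is random, a cut can fall between a CR and its LF in files that contain line breaks.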

