Solve: Sort a Phone List by Last Name

Answer»

Quote from: Squashman on January 07, 2018, 03:48:34 PM

Can you do a quick test with the Powershell code I posted? I think it will take about the same time as Dave's Jsort.

88,800 names
input: sorted Z-A
output: sorted A-Z
GNU sort 0.40 sec
Benham jsort 6.13 sec
Powershell 8.49 sec
Batch method 664.99 sec (11 min 4.99 sec)



Technically this is a Powershell one-liner. I broke it into 4 physical lines for readability. If you do type this in a Powershell window, type all 4 lines as a single line and just keep typing when the line wraps. The interpreter will understand.

Code:
(Get-Content .\phone.txt) -replace '(.*?\d{3})\s(.*?)', '$1-$2' |
ConvertFrom-Csv -Delimiter ' ' -Header First,Last,Phone |
Sort-Object Last |
Format-Table * -AutoSize

The path and file name can be changed as needed.
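To see what the -replace stage does before ConvertFrom-Csv splits on spaces, here is a rough Python sketch of the same regex. It assumes (the thread never shows the file) that each line looks like `First Last 555 1234`, so the phone number has to be joined with a hyphen to leave exactly three space-separated fields. The sample line and the names `join_phone` and `split_fields` are made up for illustration.

```python
import re

# Same pattern as the PowerShell -replace above: lazily match up to the first
# run of three digits that is followed by whitespace, then swap that
# whitespace for a hyphen.
PATTERN = re.compile(r"(.*?\d{3})\s(.*?)")


def join_phone(line):
    """Return the line with the space inside the phone number replaced by '-'."""
    return PATTERN.sub(r"\1-\2", line)


def split_fields(line):
    """Split a cleaned line into (first, last, phone) on single spaces."""
    first, last, phone = join_phone(line).split(" ")
    return first, last, phone


# Hypothetical sample line; the real phone.txt layout is an assumption.
print(join_phone("John Smith 555 1234"))
print(split_fields("John Smith 555 1234"))
```

A line with no three-digit group followed by whitespace passes through unchanged, which is why the pipeline does not break on lines that already contain a hyphenated number.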

It worked, but it echoed the sorted output to the console.

Powershell has cmdlets for writing output to a file; in this case, however, redirection might be the simpler way to go.

Code:
(Get-Content .\Phone.txt) -replace '(.*?\d{3})\s(.*?)', '$1-$2' |
ConvertFrom-Csv -Delimiter ' ' -Header First,Last,Phone |
Sort-Object Last |
Format-Table * -AutoSize -HideTableHeaders > .\Phone.new

If the preference is to have the headers in the output file, remove the -HideTableHeaders parameter from the Format-Table cmdlet.

Added a timer:

Code:
$t = Measure-Command {
(Get-Content .\Notabs_names-rev-sorted.names.txt) -replace '(.*?\d{3})\s(.*?)', '$1-$2' |
ConvertFrom-Csv -Delimiter ' ' -Header First,Last,Phone |
Sort-Object Last |
Format-Table * -AutoSize > out.txt
}
echo "Time: $t"

Result:

Code:
Time: 00:00:13.2834694

The script is clearly doing more work than just sorting. For example, it is justifying the columns (input file: 2.6 MB, output file: 7.8 MB).
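As a rough illustration of where the extra megabytes might come from, here is a small Python sketch that pads every field to its column's widest entry, which is roughly what an auto-sized table does. The rows are made up. The UTF-16 figure rests on a further assumption: that the run used Windows PowerShell 5.x, where `>` writes UTF-16LE by default and so doubles the byte count of ASCII text. The thread does not say which PowerShell version was used.

```python
def plain_text(rows):
    """One space between fields, no padding."""
    return "\n".join(" ".join(row) for row in rows) + "\n"


def justified_text(rows):
    """Pad each field to its column's widest entry, like an auto-sized table."""
    widths = [max(len(field) for field in column) for column in zip(*rows)]
    return "\n".join(
        " ".join(field.ljust(width) for field, width in zip(row, widths))
        for row in rows
    ) + "\n"


# Hypothetical rows; one long name is enough to widen the whole column.
rows = [
    ("Al", "Ng", "555-1234"),
    ("Bo", "Li", "555-2345"),
    ("Christopher", "Featherstonehaugh", "555-9876"),
]

plain_size = len(plain_text(rows).encode("utf-8"))
padded_size = len(justified_text(rows).encode("utf-8"))
padded_utf16_size = len(justified_text(rows).encode("utf-16-le"))

print("plain:", plain_size, "bytes")
print("padded:", padded_size, "bytes")
print("padded, UTF-16LE:", padded_utf16_size, "bytes")
```

If both effects apply, padding plus a two-byte encoding would be enough to explain an output about three times the size of the input, but that is a guess, not something measured here.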

Python

88,800 names
0.138 seconds!

Code:
python sortfile.py > sorted.txt
2018-01-10 21:24:18.494000
2018-01-10 21:24:18.632000

Code:
from __future__ import print_function
from datetime import datetime
import csv
import operator

tstart = datetime.now()
# Each row becomes a list of space-separated fields: [first, last, phone]
reader = csv.reader(open("Notabs_names-rev-sorted.names.txt"), delimiter=" ")

# Sort on field 1 (the last name) and print each row re-joined with spaces
for line in sorted(reader, key=operator.itemgetter(1)):
    print(" ".join(line))

tend = datetime.now()
print(tstart)
print(tend)

Better Python (27)

Code:
from __future__ import print_function
from datetime import datetime
import csv
import operator

tstart = datetime.now()
f = open('output.txt', 'w')
reader = csv.reader(open("input.txt"), delimiter=" ")
# Sort on field 1 (the last name) and write each row to the output file
for line in sorted(reader, key=operator.itemgetter(1)):
    print(" ".join(line), file=f)
f.close()
tend = datetime.now()
print("Elapsed", tend - tstart, "seconds")

88,800 names, 5 runs:

Code:
Elapsed 0:00:00.172000 seconds
Elapsed 0:00:00.156000 seconds
Elapsed 0:00:00.172000 seconds
Elapsed 0:00:00.157000 seconds
Elapsed 0:00:00.172000 seconds
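The five results only ever land on 0.156 or 0.172, which looks like a coarse clock tick rather than real variation. Here is a minimal Python 3 sketch of timing repeated runs with `time.perf_counter()`, which is meant for short intervals. It times only the in-memory sort, not the file read and write that the script above includes, and it uses made-up rows (the helpers `make_rows` and `time_sort` are hypothetical), so the numbers are not comparable to the ones posted.

```python
import random
import string
import time
from operator import itemgetter


def make_rows(count, seed=0):
    """Build made-up [first, last, phone] rows; not the thread's data."""
    rng = random.Random(seed)

    def word(length):
        letters = (rng.choice(string.ascii_lowercase) for _ in range(length))
        return "".join(letters).capitalize()

    return [
        [word(6), word(8), "555-%04d" % rng.randrange(10000)]
        for _ in range(count)
    ]


def time_sort(rows, runs=5):
    """Return the elapsed seconds for each of `runs` sorts by last name."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        sorted(rows, key=itemgetter(1))
        timings.append(time.perf_counter() - start)
    return timings


rows = make_rows(10000)
for elapsed in time_sort(rows):
    print("Elapsed %.6f seconds" % elapsed)
```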

Here are 88,800 names in random order:

[attachment deleted by admin to conserve space]

Here's the other: 88,800 names sorted alphabetically by column 2 in reverse order. I notice that the sorted file compresses better.
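The compression difference is easy to reproduce on synthetic data. The sketch below is Python 3 with `zlib`, using made-up names where each last name occurs a few times, which is an assumption about what the real list looks like. Sorting puts identical last names on adjacent lines, so the compressor finds its repeats close by. The helper names `make_lines` and `compressed_size` are hypothetical.

```python
import random
import string
import zlib


def make_lines(n_last_names=5000, per_name=4, seed=1):
    """Build shuffled 'First Last 555-1234' lines with repeated last names."""
    rng = random.Random(seed)

    def word(length):
        letters = (rng.choice(string.ascii_lowercase) for _ in range(length))
        return "".join(letters).capitalize()

    last_names = [word(10) for _ in range(n_last_names)]
    lines = [
        "%s %s 555-%04d" % (word(6), last, rng.randrange(10000))
        for last in last_names
        for _ in range(per_name)
    ]
    rng.shuffle(lines)
    return lines


def compressed_size(lines):
    """Size in bytes of the zlib-compressed text."""
    return len(zlib.compress("\n".join(lines).encode("ascii")))


lines = make_lines()
# Reverse order by column 2, like the second attachment described above.
by_last_desc = sorted(lines, key=lambda line: line.split(" ")[1], reverse=True)

print("random order:  ", compressed_size(lines), "bytes")
print("reverse-sorted:", compressed_size(by_last_desc), "bytes")
```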




[attachment deleted by admin to conserve space]

It's quicker to sort the reverse-sorted file than the randomly ordered file.

Code:
reverse 0:00:00.156000 seconds
random  0:00:00.265000 seconds
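One likely reason is that CPython's built-in sort (Timsort) looks for stretches of the input that are already in order, including strictly descending ones, and so needs far fewer comparisons on structured input. The Python 3 sketch below counts comparisons on made-up integer keys. The `Counted` wrapper and `count_comparisons` are hypothetical helpers, and the keys are all distinct, which the real last names probably are not, so treat this as an illustration of the effect rather than an explanation of the exact timings above.

```python
import random


class Counted:
    """Wrap a sort key and count how often two keys are compared."""

    comparisons = 0

    def __init__(self, value):
        self.value = value

    def __lt__(self, other):
        Counted.comparisons += 1
        return self.value < other.value


def count_comparisons(values):
    """Sort a copy of `values` and return the number of '<' comparisons."""
    Counted.comparisons = 0
    sorted(values, key=Counted)
    return Counted.comparisons


n = 10000
reverse_sorted = list(range(n, 0, -1))   # strictly descending input
shuffled = reverse_sorted[:]
random.Random(2).shuffle(shuffled)       # same values, random order

print("reverse:", count_comparisons(reverse_sorted), "comparisons")
print("random: ", count_comparisons(shuffled), "comparisons")
```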

