Solve: Sort a Phone List by Last Name

Answer»

Quote from: Squashman on January 07, 2018, 03:48:34 PM

Can you do a quick test with the PowerShell code I posted? I think it will be around the same time as Dave's Jsort.

88,800 names
Input: sorted Z-A
Output: sorted A-Z

Method         Time
GNU sort       0.40 sec
Benham jsort   6.13 sec
PowerShell     8.49 sec
Batch method   664.99 sec (11 min 4.99 sec)

Technically this is a PowerShell one-liner, but I broke it down to 4 physical lines for readability. If you type this at the PowerShell command prompt, just keep typing when the line wraps.

Code:
(Get-Content .\PHONE.txt) -replace '(.*?\d{3})\s(.*?)', '$1-$2' |
ConvertFrom-Csv -Delimiter ' ' -Header First,Last,Phone |
Sort-Object Last |
Format-Table * -AutoSize

You can change the path and file name in the first line.

Technically this is a PowerShell one-liner. I broke it into 4 physical lines for readability. If you type this in a PowerShell window, enter all 4 lines as a single line and just keep typing when the line wraps. The interpreter will understand.

Code:
(Get-Content .\phone.txt) -replace '(.*?\d{3})\s(.*?)', '$1-$2' |
ConvertFrom-Csv -Delimiter ' ' -Header First,Last,Phone |
Sort-Object Last |
Format-Table * -AutoSize

The path and file name can be changed as needed.
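
To make the -replace step concrete, here is a minimal Python sketch of the same substitution followed by a sort on the second field. The thread never shows the input file, so this assumes each line looks like "First Last 555 1234", with a space inside the phone number. The sample lines and the name hyphenate_phone are made up for illustration.

```python
import re

# Same pattern as in the PowerShell pipeline: lazily match up to the first
# run of three digits that is followed by whitespace, then swap that
# whitespace for a hyphen.
PATTERN = re.compile(r"(.*?\d{3})\s(.*?)")


def hyphenate_phone(line):
    """Join the two halves of the phone number so the line has 3 fields."""
    return PATTERN.sub(r"\1-\2", line)


# Hypothetical sample lines; the real PHONE.txt layout is an assumption.
sample = ["John Smith 555 1234", "Mary Adams 555 9876"]
fixed = [hyphenate_phone(s) for s in sample]

# Sort on the second space-delimited field (the last name).
by_last = sorted(fixed, key=lambda s: s.split(" ")[1])

for row in by_last:
    print(row)
```

Under that assumption, removing the space leaves each line with exactly three space-delimited fields, which is presumably why the substitution runs before ConvertFrom-Csv is given the First,Last,Phone header.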

It worked, but it echoed the sorted output to the console.

PowerShell has cmdlets for writing output to a file; in this case, however, redirection might be the simpler way to go.

Code:
(Get-Content .\Phone.txt) -replace '(.*?\d{3})\s(.*?)', '$1-$2' |
ConvertFrom-Csv -Delimiter ' ' -Header First,Last,Phone |
Sort-Object Last |
Format-Table * -AutoSize -HideTableHeaders > .\Phone.new

If you prefer to have the headers in the output file, remove the -HideTableHeaders parameter from the Format-Table cmdlet.

Added a timer:

Code:
$t = Measure-Command {
(Get-Content .\Notabs_names-rev-sorted.names.txt) -replace '(.*?\d{3})\s(.*?)', '$1-$2' |
ConvertFrom-Csv -Delimiter ' ' -Header First,Last,Phone |
Sort-Object Last |
Format-Table * -AutoSize > out.txt
}
echo "Time: $t"

Result:

Code:
Time: 00:00:13.2834694

The script is clearly doing more work than just sorting. For example, it justifies the columns (input file: 2.6 MB; output file: 7.8 MB).
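
As a rough illustration of why justified output is larger, the Python sketch below pads every column to the width of its longest entry and compares byte counts. The rows are invented, and the padding rule is only an approximation of what Format-Table -AutoSize produces. The UTF-16 figure is included on the assumption that the test ran in Windows PowerShell, where the > operator writes UTF-16 by default; the thread does not say which version was used.

```python
# Made-up rows; the real file has 88,800 of them.
rows = [
    ("Al", "Ng", "555-1234"),
    ("Christopher", "Featherstonehaugh", "555-9876"),
    ("Bo", "Li", "555-5555"),
]


def plain(rows):
    """One space between fields, as in the input file."""
    return "\n".join(" ".join(r) for r in rows) + "\n"


def padded(rows):
    """Pad each column to its widest entry (approximates a justified table)."""
    widths = [max(len(r[i]) for r in rows) for i in range(3)]
    lines = (
        " ".join(field.ljust(w) for field, w in zip(r, widths)) for r in rows
    )
    return "\n".join(lines) + "\n"


plain_bytes = len(plain(rows).encode("utf-8"))
padded_bytes = len(padded(rows).encode("utf-8"))
padded_utf16 = len(padded(rows).encode("utf-16-le"))  # two bytes per character

print("plain:", plain_bytes, "padded:", padded_bytes, "padded UTF-16:", padded_utf16)
```

A few long names are enough to widen every row, so the padded file grows even when most entries are short.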

Python

88,800 names
0.138 seconds!

Code:
python sortfile.py > sorted.txt
2018-01-10 21:24:18.494000
2018-01-10 21:24:18.632000

Code:
from __future__ import print_function
from datetime import datetime
import csv
import operator

tstart = datetime.now()
# Each row is split on single spaces: [first, last, phone]
reader = csv.reader(open("Notabs_names-rev-sorted.names.txt"), delimiter=" ")

# Sort on column index 1 (the last name) and print each row
for line in sorted(reader, key=operator.itemgetter(1)):
    print(" ".join(line))

tend = datetime.now()
print(tstart)
print(tend)

Better Python (2.7)

Code:
from __future__ import print_function
from datetime import datetime
import csv
import operator

tstart = datetime.now()
f = open('output.txt', 'w')
reader = csv.reader(open("input.txt"), delimiter=" ")
# Sort on column index 1 (the last name) and write each row to the file
for line in sorted(reader, key=operator.itemgetter(1)):
    print(" ".join(line), file=f)
f.close()
tend = datetime.now()
print("Elapsed", tend - tstart, "seconds")

88,800 names, 5 runs:

Code:
Elapsed 0:00:00.172000 seconds
Elapsed 0:00:00.156000 seconds
Elapsed 0:00:00.172000 seconds
Elapsed 0:00:00.157000 seconds
Elapsed 0:00:00.172000 seconds
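
The script above targets Python 2.7. For readers on Python 3, a minimal sketch of the same approach might look like the following. It uses with blocks so both files are closed, and time.perf_counter for the timing. The function name sort_by_last_name and the three-line sample file are made up for illustration and are not from the thread.

```python
import csv
import operator
import os
import tempfile
import time


def sort_by_last_name(in_path, out_path):
    """Sort space-delimited 'First Last Phone' rows by the second column."""
    with open(in_path, newline="") as src:
        rows = sorted(csv.reader(src, delimiter=" "), key=operator.itemgetter(1))
    with open(out_path, "w") as dst:
        for row in rows:
            dst.write(" ".join(row) + "\n")
    return len(rows)


# Tiny made-up input so the sketch runs anywhere.
workdir = tempfile.mkdtemp()
in_path = os.path.join(workdir, "input.txt")
out_path = os.path.join(workdir, "output.txt")
with open(in_path, "w") as f:
    f.write("John Smith 555-1234\nMary Adams 555-9876\nBob Jones 555-0000\n")

start = time.perf_counter()
count = sort_by_last_name(in_path, out_path)
elapsed = time.perf_counter() - start

with open(out_path) as f:
    result = f.read().splitlines()

print(count, "rows sorted in", round(elapsed, 6), "seconds")
```

Like the original, this sketch raises an IndexError on a blank or one-field line, so real input may need a guard.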
Here are 88,800 names in random order:

[attachment deleted by admin to conserve space]

Here's the other: 88,800 names sorted alphabetically by column 2 in reverse order. I notice that the sorted file compresses better.

[attachment deleted by admin to conserve space]

It's quicker to sort the reverse-sorted file than the randomly ordered file.

Code:
reverse 0:00:00.156000 seconds
random 0:00:00.265000 seconds
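
A likely explanation for the gap is that CPython's built-in sort (Timsort) looks for runs that are already ordered, including strictly descending runs, which it simply reverses. The sketch below counts comparisons instead of measuring time, so the result does not depend on the machine. It uses distinct integers as a stand-in for the names, which is an assumption: the real file probably contains repeated last names, so it would not form one single descending run. The Counted class is a hypothetical helper.

```python
import random


class Counted:
    """Wraps a value and counts how often it is compared with <."""

    calls = 0

    def __init__(self, value):
        self.value = value

    def __lt__(self, other):
        Counted.calls += 1
        return self.value < other.value


def comparisons_to_sort(values):
    """Return how many < comparisons sorted() needs for this input order."""
    Counted.calls = 0
    sorted(Counted(v) for v in values)
    return Counted.calls


n = 10000
reverse_sorted = list(range(n, 0, -1))  # strictly descending
shuffled = reverse_sorted[:]
random.Random(0).shuffle(shuffled)      # fixed seed for repeatability

reverse_cmps = comparisons_to_sort(reverse_sorted)
random_cmps = comparisons_to_sort(shuffled)

print("reverse-sorted input:", reverse_cmps, "comparisons")
print("shuffled input:      ", random_cmps, "comparisons")
```

Fewer comparisons on the reverse-sorted input is consistent with the timings above, although file reading and writing also contribute to the measured totals.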
