Solve : help please - identification of duplicates?


Answer»

I need to find duplicate lines in a document and then print the line numbers of the duplicates.
The files contain multiple lines with about 100 numbers on each line. I need something that will output the line numbers where duplicates were found, i.e. 1=5=7, 2=34=76.

Any suggestions would be greatly appreciated. Thanks for your time.

Presumably by now you've reached the stage in your course where you have been taught about awk. For the purpose of this homework, I would suggest that you first try info gawk (on a Linux system) or info awk (on Unix) and follow some of the examples there. That would be better than us spoon-feeding you the answers. Then, if you're still stuck, come back here for some pointers.

You could also do this job in PHP or Perl, but I think the awk solution will probably be simplest.

Hi, I should clarify: I'm a postdoc who is trying to teach myself Unix in order to get my work done.
So there is no class and this is not homework.
I use awk, but aside from splitting the file into a series of one-line files and then comparing them (which seems very low tech), I don't know how to do this.
I don't want to delete or count the lines (using uniq); I need to find out which lines match.

I am seriously looking for suggestions; otherwise I would not have posted this!

Quote

aside from splitting the file into a series of one-line files and then comparing (which seems very low tech) I don't know how to do this

Ah, but that's exactly what must be done!

Pseudo-code:

    Open file for reading -> file descriptor one
    Open file for reading -> file descriptor two
    Repeat until EOF -> file descriptor one:
        Read next line -> file descriptor one
        Rewind file descriptor two to the start of the file
        Repeat until EOF -> file descriptor two:
            Read next line -> file descriptor two
            Compare lines; if they match and the line numbers differ, say so
        End Repeat
    End Repeat

You can do this with any of a variety of scripting methods. Unix gurus would probably do it in pure awk/sed. I'm more comfortable in PHP, so that's what I would use. I can also do this in awk, but it would take me longer to write the script.
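
For illustration only, here is a minimal sketch of the nested-loop comparison from the pseudo-code above, written in Python rather than PHP or awk. It assumes the lines are already in memory as a list; the function name find_duplicate_pairs and the sample data are made up for this example. The inner loop starts after the current line, so a line is never compared with itself and each pair is reported once.

```python
def find_duplicate_pairs(lines):
    """Compare every line with every later line.

    Returns a list of (i, j) pairs of 1-based line numbers whose text matches.
    """
    pairs = []
    for i in range(len(lines)):
        # Start at i + 1 so a line is never compared with itself
        # and each matching pair is reported only once.
        for j in range(i + 1, len(lines)):
            if lines[i] == lines[j]:
                pairs.append((i + 1, j + 1))
    return pairs


# Hypothetical sample data standing in for the real file contents.
sample = ["1 2 3", "4 5 6", "1 2 3", "7 8 9", "1 2 3", "4 5 6"]
for first, second in find_duplicate_pairs(sample):
    print(f"{first}={second}")
```

To run it on a real file, the list could be built with something like open(path).read().splitlines(), where path is whatever file you are checking. Note that this does roughly n*n/2 comparisons for n lines, which is fine for small files but slow for very large ones.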

Do you have a PHP parser on the system in question? (The PHP solution would also be the easiest to understand, IMO.)
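
As an alternative to comparing every pair of lines, the output format asked for in the question (1=5=7, 2=34=76) can be produced in a single pass by remembering the line numbers seen for each distinct line. This is a different technique from the nested loop, sketched here in Python under the assumption that two lines count as duplicates only when their text is identical character for character; the function names are invented for this example.

```python
from collections import defaultdict


def group_duplicate_lines(lines):
    """Map each distinct line to the 1-based numbers of the lines holding it.

    Returns only the groups with more than one line number, in order of
    first appearance (dicts keep insertion order in Python 3.7+).
    """
    seen = defaultdict(list)
    for number, line in enumerate(lines, start=1):
        # Strip the trailing newline so the last line of a file still matches.
        seen[line.rstrip("\n")].append(number)
    return [numbers for numbers in seen.values() if len(numbers) > 1]


def format_groups(groups):
    """Render groups of line numbers as e.g. '1=5=7, 2=4'."""
    return ", ".join("=".join(str(n) for n in numbers) for numbers in groups)


# Hypothetical sample data: lines 1, 5 and 7 match, and lines 2 and 4 match.
sample = ["10 20", "30 40", "50 60", "30 40", "10 20", "70 80", "10 20"]
print(format_groups(group_duplicate_lines(sample)))
```

Lines that differ only in spacing would not be grouped by this sketch; if that matters for your data, the key could be normalised first, for example with " ".join(line.split()).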
