I have a list of 500 similar URLs. Approximately 490 of these link to the same page. How can I find the remaining 10 in the list that do not take me to this page (without, obviously, clicking on each one)?
TIA
Turbodiesel
From the command line you can use find.exe. The /v switch excludes lines that contain the search string, and the > filename syntax redirects the output to a file.
old.txt
http://www.cat.com/collars.htm
http://www.cat.com/fish.htm
http://www.dog.com/bones.htm
http://www.cat.com/milk.htm
http://www.cat.com/scratch.htm
http://www.dog.com/collars.htm
type old.txt | find /v "www.cat.com" > new.txt
Only lines from old.txt that do not contain www.cat.com will end up in new.txt.
If you remove the /v, then only lines that do contain www.cat.com will be copied.
Code:
C:\>type old.txt
http://www.cat.com/collars.htm
http://www.cat.com/fish.htm
http://www.dog.com/bones.htm
http://www.cat.com/milk.htm
http://www.cat.com/scratch.htm
http://www.dog.com/collars.htm

Code:
C:\>type old.txt | find /v "www.cat.com"
http://www.dog.com/bones.htm
http://www.dog.com/collars.htm

Code:
C:\>type old.txt | find "www.cat.com"
http://www.cat.com/collars.htm
http://www.cat.com/fish.htm
http://www.cat.com/milk.htm
http://www.cat.com/scratch.htm

Code:
C:\>type old.txt | find "collars"
http://www.cat.com/collars.htm
http://www.dog.com/collars.htm

Thanks for taking the time to reply, m8. I am referring to the destination page rather than the actual content of the original URL. For example:
http://www.cat.com/collars/page1
http://www.cat.com/collars/page2
http://www.cat.com/collars/page3
...
http://www.cat.com/collars/page499
http://www.cat.com/collars/page500
490 of these URLs will redirect me to a page displaying a picture of a cat.
10 of these URLs will redirect me to 10 different pages displaying 10 different images.
How do I find the 10 "different" links without clicking on each of the 500 links to see where it takes me?

You really can't do this. You have to preview the page to know what you want to keep.
And even if the URL works for you today, there is nothing stopping the website from changing the URL so that it shows up as one of the other 500 you already have.
Is this one of those sites that have a picture-of-the-day thing going on?
Could play around with wget
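Along the same lines, here is a minimal Python sketch (Python rather than wget, purely as an assumption about what is available). It asks each URL where it finally lands after HTTP redirects, treats the most common destination as the "same page", and reports the rest. The names final_url and find_outliers are made up for illustration. It only sees HTTP-level redirects, so it will not help if the site swaps content without redirecting or redirects via JavaScript.

```python
from collections import Counter
from urllib.request import urlopen


def final_url(url, timeout=10):
    """Follow HTTP redirects and return the URL that was finally served."""
    with urlopen(url, timeout=timeout) as response:
        return response.geturl()


def find_outliers(urls, resolve=final_url):
    """Return (common_destination, [(url, destination), ...]).

    The second item lists every URL whose destination differs from the
    destination shared by most of the URLs.
    """
    destinations = {}
    for url in urls:
        try:
            destinations[url] = resolve(url)
        except OSError as error:
            # URLError, HTTPError and socket timeouts are all OSError subclasses.
            destinations[url] = "ERROR: {}".format(error)

    if not destinations:
        return None, []

    # Assumption: the destination reached most often is the "same page".
    common = Counter(destinations.values()).most_common(1)[0][0]
    outliers = [(u, d) for u, d in destinations.items() if d != common]
    return common, outliers
```

To try it, read the 500 URLs from a file with one URL per line (say urls.txt, a hypothetical name), pass the list to find_outliers, and print the pairs it returns. As noted above, the result is only a snapshot of where the site sends you today.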