If you are just looking for a list of unique values, why not just do:

    cat file1 file2 | sort | uniq > file3

Obviously you could have reasons why this won't suffice for your needs,
but I've not seen that in your description yet.

On Sat, Aug 28, 2010 at 9:48 AM, Kevin Faulkner
<kondor6c@encryptedforest.net> wrote:
> Sorry about the time issue.
>
> On Friday 27 August 2010 23:50:00 you wrote:
> > I hope these are small files; the algorithm you wrote is not going to
> > run well as file size gets large (over 10,000 entries). Have you checked
> > the space/tab situation? Python uses indentation changes to indicate the
> > end of a block, so inconsistent use of tabs and spaces freaks it out.
> > Here are a couple of questions:
>
> This is not a school project, so you won't be doing my homework or
> anything :)
> The space/tab issue is okay, but the script does not even get to the
> print(i). I even tried "for line in secondaryfile:" and the for loop
> still wouldn't be executed.
>
> > Are these always numbers?
> Yes, they are IPs from an Apache error log.
>
> > Do the files have to remain in their original order, or can you reorder
> > them during processing? How often does this have to run?
> They are not in order, because one list is 852 entries and the other is
> 3300 entries. This script only needs to run once.
>
> > Do you have to "comment" the duplicate, or can you remove it?
> The plan is to remove it, but I wanted to see if my removal method would
> work, so I was trying to put a comment next to it.
>
> > Are there any other requirements not obvious from the description below?
> No real requirements; if anyone would like the original files I can give
> them to you. A lot of them are bots.
>
> Thank you :)
> -Kevin
>
> > Kevin Faulkner wrote:
> > > I was trying to pull duplicates out of 2 different files. Needless to
> > > say, there are duplicates, and I would place a # next to each duplicate.
> > > Example files:
> > >
> > >   file 1:   file 2:
> > >   433.3     947.3
> > >   543.1     749.0
> > >   741.1     859.2
> > >   238.5     433.3
> > >   839.2     229.1
> > >   583.6     990.1
> > >   863.4     741.1
> > >   859.2     101.8
> > >
> > > import string
> > > i = 1
> > > primaryfile = open('/tmp/extract', 'r')
> > > secondaryfile = open('/tmp/unload')
> > >
> > > for line in primaryfile:
> > >     pcompare = line
> > >     print(pcompare)
> > >
> > >     for row in secondaryfile:
> > >         i = i + 1
> > >         print(i)
> > >         scompare = row
> > >
> > >         if pcompare == scompare:
> > >             print(scompare)
> > >             secondaryfile.write('#')
> > >
> > > With this code it should go through the files, find the duplicates, and
> > > place a '#' next to each one. But for some reason it doesn't even get
> > > to the second for statement. I don't know what else to do. Please offer
> > > some assistance. :)
> > > ---------------------------------------------------
> > > PLUG-discuss mailing list - PLUG-discuss@lists.plug.phoenix.az.us
> > > To subscribe, unsubscribe, or to change your mail settings:
> > > http://lists.PLUG.phoenix.az.us/mailman/listinfo/plug-discuss

--
Dazed_75 a.k.a. Larry

The spirit of resistance to government is so valuable on certain
occasions, that I wish it always to be kept alive.
  - Thomas Jefferson
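[Editor's note: the quoted script fails for two reasons that a short sketch makes visible. A Python file object is a one-shot iterator, so the inner loop can consume `secondaryfile` only once, during the first pass of the outer loop; and `secondaryfile` is opened read-only, so `secondaryfile.write('#')` would raise an error (in Python 3, `io.UnsupportedOperation`) even if it were reached. A minimal demonstration of the exhausted-iterator behavior, using a throwaway temp file rather than the poster's data:]

```python
import tempfile

# Create a throwaway two-line file (a stand-in for the poster's data).
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as f:
    f.write('433.3\n741.1\n')
    path = f.name

fh = open(path)        # opened once, like secondaryfile in the script
first_pass = list(fh)  # consumes the whole file
second_pass = list(fh) # nothing left: the iterator is already exhausted
fh.close()

print(len(first_pass), len(second_pass))  # 2 0
```

This is why "the for loop still wouldn't be executed" on later passes: re-reading requires either `fh.seek(0)` or reading the file into a list up front.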
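[Editor's note: since the goal is just to flag (and later drop) entries that appear in both files, reading the first file into a set makes the comparison a single pass and sidesteps the nested-loop problem entirely. A minimal sketch, not the poster's script; the helper name and the ' #' marker convention are illustrative:]

```python
def mark_duplicates(primary_path, secondary_path):
    """Return the second file's lines, with ' #' appended to any line
    that also appears in the first file (a sketch, not a fixed version
    of the quoted script)."""
    with open(primary_path) as f:
        seen = {line.strip() for line in f}   # one pass over file 1

    marked = []
    with open(secondary_path) as f:
        for line in f:
            entry = line.strip()
            # flag entries present in both files
            marked.append(entry + ' #' if entry in seen else entry)
    return marked
```

Removing the duplicates instead is the same loop with the flagged entries filtered out. And at the shell, `sort file1 file2 | uniq -d` prints only the lines that occur more than once, which is the duplicate list directly.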