Thanks for your help... I don't have access to the computer now, but I'll let you know if it works out when I do.


On Mon, Sep 30, 2024, 1:16 PM Rusty Carruth via PLUG-discuss <plug-discuss@lists.phxlinux.org> wrote:
Oops, you are correct, the uniq command should have -w 34 list.of.files
not -w list.of.files.  Sorry!  (here's what I'd typed and what I should
have cut/pasted:

root@rusty-MS-7851:/backups1/backup_system_v2# uniq -c -d -w 34 sorted.new_filesA.md5|less ; wc -l sorted.new_filesA.md5
42279 sorted.new_filesA.md5
root@rusty-MS-7851:/backups1/backup_system_v2# uniq -c -d -w 34 sorted.new_filesA.md5

sorry again!)
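
For anyone following along without a big file list handy, here is a small sketch of what the corrected command does. The hashes and file names are made up for illustration, and -D is a GNU uniq option, so this assumes GNU coreutils. Note that uniq only compares adjacent lines within one file, which is why the list has to be sorted first; it is not a two-file comparison tool (diff and comm do that).

```shell
# Build a tiny, already-sorted "md5sum  filename" list by hand.
# The hashes below are invented placeholders, not real checksums.
tmp=$(mktemp -d)
cat > "$tmp/list.of.files" <<'EOF'
0123456789abcdef0123456789abcdef  ./a/one.txt
0123456789abcdef0123456789abcdef  ./b/one-copy.txt
fedcba9876543210fedcba9876543210  ./c/other.txt
EOF

# -w 34 compares only the first 34 characters of each line:
# the 32 hex digits of the MD5 sum plus the two separator characters.
# -d prints one line per duplicated hash, -c prefixes it with a count.
counts=$(uniq -c -d -w 34 "$tmp/list.of.files")
printf '%s\n' "$counts"

# -D (GNU uniq) prints every line in each duplicate group,
# which is handy when you want the names of all the copies.
dupes=$(uniq -D -w 34 "$tmp/list.of.files")
printf '%s\n' "$dupes"

rm -rf "$tmp"
```

The -c -d form only shows the first file name of each group, so the -D form is probably the one to use when deciding which copies to delete.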


Also, if you want to get a list of files and their MD5 sums from 'higher
up' in the directory tree, just change the starting directory in your
find command to that higher-up location. However, you might need to run
the entire find and md5sum sequence as root if the directories (and
files) you care about don't have read permission for you.  So, to find
ALL files everywhere on your computer, change the ~ to /. You'll
certainly get lots of permission denied errors if you do that as
yourself and not root. But starting at / will traverse ALL directories
on your computer, including /dev and others you probably don't care
about.  There are some useful options to find (like -xdev, which means
don't go to a different filesystem) you might want to use; see the man
page for find to find them ;-)
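
Here is a rough sketch of the 'higher up' variant, plus an answer to the sort-by-column question. It runs in a throwaway sandbox directory (the names alice, bob, and the files in them are hypothetical) so it is safe to paste as-is. It assumes GNU find, md5sum, sort, and uniq.

```shell
# Hypothetical sandbox standing in for /home; use the real starting
# directory in place of "$top" once the output looks right.
top=$(mktemp -d)
mkdir -p "$top/alice/pics" "$top/bob/copy"
printf 'same bytes\n' > "$top/alice/pics/a.ARW"
printf 'same bytes\n' > "$top/bob/copy/a.ARW"
printf 'different\n'  > "$top/bob/notes.txt"

# -xdev keeps find on the filesystem it started on.
# 2>/dev/null hides find's permission-denied messages.
# The final sort orders lines by column 1, the MD5 sum.
list=$(find "$top" -xdev -type f -print0 2>/dev/null | xargs -0 md5sum | sort)

# To sort by column 2 (the file name) instead, give sort a key.
by_name=$(printf '%s\n' "$list" | sort -k2)

# Duplicates are found on the MD5-sorted list, comparing the 32-character hash.
dupes=$(printf '%s\n' "$list" | uniq -D -w 32)
printf '%s\n' "$dupes"

rm -rf "$top"
```

The -k2 key runs from the second field to the end of the line, so file names containing spaces still sort as a whole.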

On 9/30/24 07:05, Michael via PLUG-discuss wrote:
> Thank you so much! After running it I find it only finds the duplicates in
> ~. I need to find the duplicates across all the directories under home.
> After looking at the man page and searching for "recu", it seems it recurses
> by default, unless I am reading it wrong.
> I tried the uniq command but:
>
>   uniq -c -d -w list.of.files
>   uniq: list.of.files: invalid number of bytes to compare
>
> Isn't uniq used to find the differences between two files? I have
> a very rudimentary understanding of Linux, so I'm sure I'm wrong.
>
> All the files in list.of.files are hidden files (prefixed with a
> period).
> And isn't there a way to sort things depending on their column (column 1
> md5sum, column 2 file name)?
>
> On Mon, Sep 30, 2024 at 2:56 AM Rusty Carruth via PLUG-discuss <
> plug-discuss@lists.phxlinux.org> wrote:
>
>> On 9/28/24 21:06, Michael via PLUG-discuss wrote:
>>> About a year ago I messed up by accidentally copying a folder with other
>>> folders into another folder. I'm running out of room and need to find that
>>> directory tree and get rid of it. All I know for certain is that it is
>>> somewhere in my home directory. I THINK it is my pictures directory with
>>> ARW files.
>>> ChatGPT told me to use fdupes, but it told me to use an exclude option
>>> (which I found out it doesn't have) to avoid config files (and I was
>>> planning on adding to that as I discovered other stuff I didn't want). Then
>>> it told me to use find, but I got an error, which leads me to believe it
>>> doesn't know what it's talking about!
>>> Could someone help me out?
>>>
>> First, someone said you need to run updatedb before running find.  No,
>> sorry, updatedb is for using locate, not find.  Find actively walks the
>> directory tree.  Locate searches the text (I think) database built by
>> updatedb.
>>
>>
>> Ok, now to answer the question.  I've got a similar situation, but in
>> spades.  Every time I did a backup, I did an entire copy of everything,
>> so I've got ... oh, 10, 20, 30 copies of many things. I'm working on
>> scripts to help reduce that, but for now doing it somewhat manually, I
>> suggest the following command:
>>
>>
>> cd (the directory of interest, possibly your home dir)
>> find . -type f -print0 | xargs -0 md5sum | sort > list.of.files
>>
>> This will create a list of files, sorted by their md5sum.  If you want
>> to be lazy and not search that file for duplicate md5sums, consider
>> uniq.  Like this:
>>
>> uniq -c -d -w list.of.files
>>
>>
>> This will print the list of files which are duplicates.  For example,
>> out of a list of 42,279 files in a certain directory on my computer,
>> here's the result:
>>
>>       2 73d249df037f6e63022e5cfa8d0c959b  _files/from_ebay_pc/pics_and_such_from_work/phone_backup/try2_nonptp_or_whatever/Pictures/Screenshots/Screenshot_20160321-223138.png
>>       5 9b162ac35214691461cc0f0104fb91ce  _files/melissa/Documents/EPHESUS/Office Stuff/SPD/SPD SUMMER 2016 (1).pdf
>>       3 b396af67f2cd75658397efd878a01fb8  _files/dads_zipdisks/2003-1/CLASS at VBC Sp-03/CLASS BKUP - Music Reading & Sight Singing Class/C  & D Major & Minor Scales & Chords.mct
>>       2 cd83094e0c4aeb9128806b5168444578  _files/from_ebay_pc/pics_and_such_from_work/phone_backup/try2_nonptp_or_whatever/Pictures/Screenshots/Screenshot_20160318-222051.png
>>       2 d1a5a1bec046cc85a3a3fd53a8d5be86  _files/from_ebay_pc/pics_and_such_from_work/phone_backup/try2_nonptp_or_whatever/Pictures/Screenshots/Screenshot_20160410-145331.png
>>       2 fa681c54a2bd7cfa590ddb8cf6ca1cea  _files/from_ebay_pc/pics_and_such_from_work/phone_backup/try2_nonptp_or_whatever/Pictures/Screenshots/Screenshot_20160312-113340.png
>>
>> Originally the _files directory had MANY duplicates, now I've managed to
>> get that down to the above list...
>>
>> Anyway, there you go.  Happy scripting.
>>
>> ---------------------------------------------------
>> PLUG-discuss mailing list: PLUG-discuss@lists.phxlinux.org
>> To subscribe, unsubscribe, or to change your mail settings:
>> https://lists.phxlinux.org/mailman/listinfo/plug-discuss
>>