Hi everyone,
I want to find (and optionally delete) duplicate images
in my directories. I have already written a Perl program
(here: http://ixedix.x10hosting.com/findsame.htm ) that lists
JPG files of the same size, as possible candidates for manual deletion.
To make the deletion automatic, I need to be pretty sure that the images
are the same. From what I read while googling the subject, it seems
that there is no checksum or similar signature (MD5, etc.) in the
JPG format. My idea now is to read a 256-byte block in the middle
of each potentially duplicate image, compare it to a block at the same
offset from the first image of that size, and delete the clone if
the blocks are identical.
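
To make the idea concrete, here is a rough sketch of the middle-block check
in Perl. The helper names (middle_block, same_middle_block) are made up for
illustration and are not part of findsame. Note that a match on 256 bytes
only suggests the files are the same; two files could still differ elsewhere.

```perl
use strict;
use warnings;

# Hypothetical helper: read $len bytes starting at the midpoint of $path.
sub middle_block {
    my ($path, $len) = @_;
    my $size   = (-s $path) || 0;
    my $offset = int($size / 2);
    open(my $fh, '<:raw', $path) or die "Cannot open $path: $!";
    seek($fh, $offset, 0) or die "Cannot seek in $path: $!";
    my $buf = '';
    defined(read($fh, $buf, $len)) or die "Cannot read $path: $!";
    close($fh);
    return $buf;
}

# Hypothetical helper: true if two same-size files share an identical
# middle block. This is only a hint, not proof that the files are equal.
sub same_middle_block {
    my ($first, $second, $len) = @_;
    $len = 256 unless defined $len;
    my $size_first  = (-s $first)  || 0;
    my $size_second = (-s $second) || 0;
    return 0 unless $size_first == $size_second;
    return middle_block($first, $len) eq middle_block($second, $len) ? 1 : 0;
}
```

Two files that differ only in their last byte would pass this check, which
is why I am not sure it is safe enough for automatic deletion.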
Of course, a byte-by-byte comparison of the entire files could do
the job, but it would be a lot slower.
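
For comparison, the full check could be sketched with the core File::Compare
module, whose compare() returns 0 for equal files, 1 for different files and
-1 on error. The wrapper name identical_files is hypothetical. Since it would
only run on files already known to have the same size, the cost may be lower
than I fear, but I have not measured it.

```perl
use strict;
use warnings;
use File::Compare qw(compare);

# Hypothetical wrapper: true only if both files contain exactly the same bytes.
sub identical_files {
    my ($first, $second) = @_;
    my $result = compare($first, $second);   # 0 = equal, 1 = different, -1 = error
    die "Cannot compare $first and $second: $!" if $result < 0;
    return $result == 0 ? 1 : 0;
}
```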
Any suggestions on how to do it the fastest, safest way?
Ancillary question: is there a "move file" function in Perl or PHP?
I'd like to move the duplicate files to a garbage bin before finally
deleting them for good.
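
From what I can tell, Perl's core File::Copy module exports a move()
function, and PHP's rename() can also move a file to another directory.
Something like the sketch below is what I have in mind; move_to_bin and the
bin directory are hypothetical names, and it refuses to overwrite a file
already sitting in the bin.

```perl
use strict;
use warnings;
use File::Copy qw(move);
use File::Basename qw(basename);
use File::Path qw(make_path);

# Hypothetical helper: move $path into $bin_dir (the "garbage bin"),
# creating the directory if needed and never overwriting an existing file.
sub move_to_bin {
    my ($path, $bin_dir) = @_;
    make_path($bin_dir) unless -d $bin_dir;
    my $target = "$bin_dir/" . basename($path);
    die "$target already exists\n" if -e $target;
    move($path, $target) or die "Cannot move $path to $target: $!";
    return $target;
}
```

Usage would be along the lines of move_to_bin("photos/IMG_0001.jpg", "trash"),
with both paths being placeholders.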