Get filesize of external/remote image

learning_brain

This is an interesting one.

I have an image crawler (for external sites) that gets image dimensions with no problem, but file size is proving tricky.

I'm getting errors with both filesize() and stat(), presumably because I'm not reading from a local directory.

I have two crawl systems: one for images linked directly (simple) and one for images embedded within HTML-based pages (cURL/DOM method).

My crawl bot is already a bit slow because of the calls to getimagesize(), and cURL isn't too fast either, so I don't really want to slow it down further.

Any ideas?
 

misson

Unless you're already GETting the images over HTTP, issue a HEAD request. The Content-Length header in the response gives the size of the image data in bytes. If you're GETting the images for other purposes, just use the Content-Length from the response you already have.
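For illustration, here is a minimal sketch of the HEAD approach using PHP's cURL extension. The helper names parseContentLength() and remoteFileSize(), the 10-second timeout and the "only trust a 200 response" rule are assumptions made for this example, not something from the crawler in question. Note that some servers omit Content-Length (for example with chunked responses), in which case the sketch returns null.

```php
<?php
// Sketch only: both function names below are hypothetical helpers.

/**
 * Extract Content-Length from a raw HTTP header block.
 * Returns null when no such header is present.
 */
function parseContentLength(string $rawHeaders): ?int
{
    // When redirects are followed, cURL returns one header block per
    // response, so the last Content-Length belongs to the final response.
    if (!preg_match_all('/^Content-Length:[ \t]*(\d+)[ \t]*\r?$/mi', $rawHeaders, $m)) {
        return null;
    }
    return (int) end($m[1]);
}

/**
 * Issue a HEAD request and return the reported size in bytes,
 * or null if the request fails or the size is not reported.
 */
function remoteFileSize(string $url): ?int
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_NOBODY         => true, // HEAD instead of GET: no image data is transferred
        CURLOPT_HEADER         => true, // include response headers in the returned string
        CURLOPT_RETURNTRANSFER => true, // return the output instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow redirects to the real image
        CURLOPT_TIMEOUT        => 10,   // assumed limit, tune for your crawler
    ]);
    $headers = curl_exec($ch);
    $status  = curl_getinfo($ch, CURLINFO_HTTP_CODE);

    if ($headers === false || $status !== 200) {
        return null;
    }
    return parseContentLength($headers);
}
```

If the image is already being fetched with a GET, the same parseContentLength() call could be applied to the headers of that response (with CURLOPT_HEADER enabled) rather than making a second request.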
 

learning_brain

Fantastic! Thanks, misson.

Because of the number of images already collected, I've written a separate page, run from cron, that does just this.

As my crawler also harvests embedded images, this was tricky to do inside the crawler itself: the cURL request there fetches the main page, not each image. The new page resolves all these issues.
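The cron page itself isn't shown here. As a rough sketch of what such a batch pass could look like, the following sends HEAD requests for a list of image URLs in parallel with curl_multi. The function name remoteFileSizes(), the $urls input and the timeout are assumptions for illustration; a real crawler would read the URLs from its own storage and would probably process them in smaller chunks.

```php
<?php
// Sketch only: remoteFileSizes() is a hypothetical helper.

/**
 * HEAD each URL in parallel and return [url => size in bytes or null].
 */
function remoteFileSizes(array $urls, int $timeout = 10): array
{
    $sizes = [];
    if ($urls === []) {
        return $sizes; // nothing to do
    }

    $multi   = curl_multi_init();
    $handles = [];
    foreach ($urls as $url) {
        $ch = curl_init($url);
        curl_setopt_array($ch, [
            CURLOPT_NOBODY         => true, // HEAD request, no image body
            CURLOPT_RETURNTRANSFER => true, // keep output out of the page
            CURLOPT_FOLLOWLOCATION => true, // follow redirects
            CURLOPT_TIMEOUT        => $timeout,
        ]);
        curl_multi_add_handle($multi, $ch);
        $handles[$url] = $ch;
    }

    // Drive all transfers until none are still running.
    do {
        $status = curl_multi_exec($multi, $running);
        if ($running) {
            curl_multi_select($multi, 1.0); // wait for activity instead of spinning
        }
    } while ($running && $status === CURLM_OK);

    foreach ($handles as $url => $ch) {
        $code   = curl_getinfo($ch, CURLINFO_HTTP_CODE);
        // cURL reports -1 here when the server sent no Content-Length.
        $length = curl_getinfo($ch, CURLINFO_CONTENT_LENGTH_DOWNLOAD);
        $sizes[$url] = ($code === 200 && $length >= 0) ? (int) $length : null;
        curl_multi_remove_handle($multi, $ch);
    }
    curl_multi_close($multi);

    return $sizes;
}
```

Running the requests in parallel keeps the total time close to that of the slowest server rather than the sum of all of them.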

WOW it's fast!!!!

Thanks again.
 