Hotlinking Question...

learning_brain

New Member
Messages
206
Reaction score
1
Points
0
Some of you may know I now run an image search engine that crawls for high quality pics and graphics.

Unfortunately, due to the somewhat unpredictable nature of my 'good' free host, my account has been deleted so I can't demonstrate the problem.

Essentially, there are two parts (well more but two critical ones) to the crawling process.

1) find and store all <img src="whatever">'s
2) create and store small jpg thumbnail for reference.

Simple so far?

Yep, but my index page search has an interesting function. Firstly, it loads all thumbnails related to the search (fast loading). In addition, however, it also hotlinks (i.e. loads the image directly from the originating site) so that a large preview can be shown when you hover over the thumbnail.

Sounds great, so what's the problem?

The problem is that "hotlinking" is frowned upon by many. It eats into the originating site's bandwidth and, as I'm using high resolution images, this can be a severe hit to someone with limited resources. The secondary hang-up for my site is that the page takes a while to load, as each page of results has about 16 large images on it.

This practice is not always frowned upon. My site provides the originating site's address, hence creating a backlink for them and driving traffic. Also, the ability to create direct links is kinda the point of the World Wide Web, is it not?

I have two options:

1) Drop the large image preview and stick to the 120x80px thumbs which are stored on my site... which will make it more like Google and lose part of the USP. This will also speed up my page loads.

2) Keep the hotlinked large image preview, maintain my site's uniqueness and risk the consequences... if there are any :S (other than the known "switcheroo" problem, where the originating site may alter the image without warning to something unsuitable)

Any opinions?
 

lemon-tree

x10 Minion
Community Support
Messages
1,420
Reaction score
46
Points
48
If the large images are what defines your site, then removing them wouldn't really get you anywhere. However, another alternative would be to only load the large images when the user clicks or mouses over the thumbnail; this strikes a balance between Google's approach of taking you out of the page just to see a large image (which is frustrating) and your technique of loading them all up front (which is slow).
There shouldn't be any consequences of loading images from people's sites, as anyone who doesn't want their images shared can use hotlink protection. I assume you have code built into your crawler to ensure the image does actually exist and that hotlinking can take place.
Finally, it's no surprise you were suspended from a free host if you were running a crawling bot on it; for this sort of computational power and accessibility you should really be looking at something more like a VPS.
 

descalzo

Grim Squeaker
Community Support
Messages
9,373
Reaction score
326
Points
83
Opinion?

Hotlinking is theft of bandwidth. And theft is theft.
 

learning_brain

New Member
Messages
206
Reaction score
1
Points
0
Interesting diversity of opinion... as I feared.

Yes, I do have functionality built into the crawler to ensure hotlinking can take place, but I'm mimicking the user agent in cURL to look like a browser. The check comes when I try to gain info about the image (like size). If I can't obtain the info, the likelihood is they don't permit hotlinking, and the root URL (with directories) gets ignored.
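As a rough sketch of that check (the real thing uses cURL in PHP; the function names and the HEAD-request approach here are my own illustration, not code from the crawler), a response's status and content-type are usually enough to decide whether to keep a URL:

```javascript
// Hypothetical helper: decide from a HEAD response whether an image is
// likely hotlinkable. A 200 with an image content-type suggests the host
// serves the file to arbitrary clients; a 403, a redirect to an
// anti-hotlink banner, or an HTML error page means the URL gets skipped.
function looksHotlinkable(status, headers) {
  if (status !== 200) return false;
  const type = (headers['content-type'] || '').toLowerCase();
  return type.startsWith('image/');
}

// Wiring it to a real request (Node 18+ built-in fetch, sketch only):
async function checkImage(url) {
  const res = await fetch(url, {
    method: 'HEAD',
    // Mimic a browser user agent, as the cURL version does.
    headers: { 'User-Agent': 'Mozilla/5.0' },
  });
  return looksHotlinkable(res.status, Object.fromEntries(res.headers));
}
```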

I did check my host's ToS carefully before hosting it with them, but I think they must have cottoned on to the large processing power needed. One URL link is OK, but when it spiders out to numerous URLs, it gets more interesting! And yes, I am looking at a paid service, either VPS or dedicated. ATM, I'm developing using XAMPP.

Descalzo - your point is also noted and it is for this reason I posted.

The really interesting point you raise, lemon-tree, is the "load on hover"... which I assume would be done in JavaScript (assuming the browser has JS enabled). Unfortunately, I have no experience with JavaScript and have no idea where to start with it... I'll have to do some googling.

Thank you both.

Rich
 

lemon-tree

x10 Minion
Community Support
Messages
1,420
Reaction score
46
Points
48
The JavaScript required would really be quite trivial: it's just a case of populating the img tags' src attributes from an array containing the URLs of the images in the current page. Adding a 'loading' overlay whilst the image is still transferring would also improve the UI for any particularly slow-loading images.
This technique would also reduce the amount of hotlinking required, as only the user's desired images are ever loaded fully; this should move it more towards the situation that descalzo favours.
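A minimal sketch of that array-driven approach (all names, URLs, and the `data-idx` attribute are assumptions for illustration, not code from either site):

```javascript
// The page carries only thumbnail URLs; large images are fetched on
// demand. One large-image URL per thumbnail on the current page.
const largeUrls = [
  'http://example.com/photos/1-large.jpg',
  'http://example.com/photos/2-large.jpg',
];

// Pure lookup with a bounds check, so a bad index can't break the page.
function largeUrlFor(index, urls) {
  return index >= 0 && index < urls.length ? urls[index] : null;
}

// Browser wiring (sketch): show a placeholder while the image transfers,
// then swap in the full image once it has loaded.
// thumb.onmouseover = function () {
//   const big = new Image();
//   big.onload = function () { preview.src = big.src; };
//   preview.src = 'images/imageLoading.gif';   // loading overlay
//   big.src = largeUrlFor(Number(this.dataset.idx), largeUrls);
// };
```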
 

learning_brain

New Member
Messages
206
Reaction score
1
Points
0
LOL - even trivial JS is hard for me! BTW - I like the Overlay Image idea.

Still reading up....

I've also found a way to avoid the nasty switcheroo problem.

When the crawler finds an image, I could use exif_read_data() and store a concatenated string as a fingerprint of that image. Then, on the image view page (not quite the same as the index thumbs and preview) I can duplicate the check and compare each value. If the fingerprints don't match, all I'd have to do is display a "Sorry" message and mark that record to be ignored in future.... It would slow down the crawler even more though :(
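The compare step could look something like this (a sketch only: the field names are illustrative, and in the real crawler the values would come from PHP's exif_read_data() rather than a JS object):

```javascript
// Build a fingerprint string from a few metadata fields when the image
// is first crawled; rebuild and compare on the view page to detect the
// "switcheroo". Fields are joined in a fixed order so the same image
// always produces the same string.
function fingerprint(meta) {
  return ['FileSize', 'ImageWidth', 'ImageHeight', 'DateTimeOriginal']
    .map(function (k) { return String(meta[k] || ''); })
    .join('|');
}

// True if the image's current metadata still matches the stored fingerprint.
function imageUnchanged(storedFp, currentMeta) {
  return storedFp === fingerprint(currentMeta);
}
```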
 

leafypiggy

Manager of Pens and Office Supplies
Staff member
Messages
3,819
Reaction score
163
Points
63
Content delivery network?

That's what Google uses. Might be helpful for you to read up on it and maybe make your own. That way, you can cache the high-res images on the CDN and access them when needed. It will probably save loading time as well, since it will be a local DNS lookup.
 

learning_brain

New Member
Messages
206
Reaction score
1
Points
0
Wow - CDNs are complex! (Content Delivery Network?)

Yes, this would be the perfect solution.... if I had enough traffic.... and enough money! CDNs are notoriously difficult to set up on your own by the look of it, and professional services are extortionate! Some CDN networks claim to offer better performance, but in reality it depends on PoP placement and server proximity to the visitor.

The JS pre-load is an idea I could use for the overlay (simple to you, I know), but this becomes tricky to incorporate because my images are already in a PHP array and I'm already using an "image-over" CSS trick using the span.

Head hurts.. tomorrow I'll look at it again.
 

cybrax

Community Advocate
Community Support
Messages
764
Reaction score
27
Points
0
It's a classic 'rat hole' project, not because the script is impossible but because nobody will host it for you on a shared server [free or paid] due to the high CPU usage. The other thing, of course, is that the server running the crawler is going to have a fixed IP address, and webmasters of sites on the crawl list are going to spot something is wrong fairly quickly.. as will Google AdSense, but that's another issue altogether.

Plan 'A':
Now it would be nice if all the scraping could be done client-side on the visitor's PC, using their processor power, IP address, bandwidth etc. Alas, there is no real way of doing this due to the browser security restrictions on cross-domain scripting. The closest you can get is by using YQL and jQuery/JSON, but it's far from ideal.

Plan 'B':
Run a server at home on your own Internet connection to perform the heavy work of 'searching & storing' data using whatever script, and to display the result of any query. Use a pro hosted site (free or paid) as a 'web anchor' for static content / indexing / SEO, and link the two together...


Plan 'C':
'Know Thy Visitor' - why crawl the web for every user request? People are predictable, or rather follow certain trends, and a little digging around the web's photo stock sites reveals what folk are asking them to find. So...

Querying a database to return stored image URL information for a given rubric gives the same perceived effect to the visitor. Grab an object/noun word list from the web and run it through a trial builder version of 'Djuggler' to generate the DB.
 

learning_brain

New Member
Messages
206
Reaction score
1
Points
0
Thanks Cybrax

I'm shortly going to be moving to paid hosting and have now allocated a .com domain name for it. (not very interesting but great for SEO)

I have already checked with the hosts about the CPU usage for the crawler and they don't have a problem with it, a) because they don't have enough clients or, more likely, b) they have no concept of the degree of processing it will require. Ho hum, that's their problem.

I quite like your Plan B idea, although this is a very new venture and I want to gain some experience of "knowing mine visitor"! Investing in a dedicated server ATM is not a preferred option.

As for your plan C, I had already organised the script to do just that. It will prefer domains including keywords such as "wallpaper", for instance (and other common searches), which will aim the content more precisely than if I let it do its own thing.

The hotlinking issue is still bugging me. I haven't worked out the JS code yet for selective image loads, but the issue still remains to a certain degree. Still working on that....
 

cybrax

Community Advocate
Community Support
Messages
764
Reaction score
27
Points
0
learning_brain said:

Thanks Cybrax

The hotlinking issue is still bugging me. I haven't worked out the JS code yet for selective image loads, but the issue still remains to a certain degree. Still working on that....

I saw your site before it disappeared, so I have some idea what you need. I suggest the 'Lazy Load' extension for jQuery might be useful for your needs, with a bit of tinkering.

Pre-loading the large images is not the way to do it; as you found, it slows the page loading something awful. Swapping the image's src attribute is a better way of doing things than show/hide DIV.
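The difference can be sketched like this (element names are assumptions; the point is that a hidden DIV still makes the browser fetch the big file on page load, while an empty src assigned only on hover does not):

```javascript
// Build a pair of hover handlers that swap the preview image's src
// between a lightweight placeholder and the full-size URL. The large
// file is only requested when `over` fires, never on page load.
function buildSwapHandlers(previewImg, largeUrl, placeholderSrc) {
  return {
    over: function () {
      previewImg.src = largeUrl;        // fetch starts only now
    },
    out: function () {
      previewImg.src = placeholderSrc;  // back to the lightweight gif
    },
  };
}

// Browser usage (sketch):
// const h = buildSwapHandlers(preview, bigUrl, 'images/imageLoading.gif');
// thumb.onmouseover = h.over;
// thumb.onmouseout = h.out;
```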

http://glevum.x10.mx/sv4.html
 

learning_brain

New Member
Messages
206
Reaction score
1
Points
0
Brilliant!!!

This means all the large "hotlinked" images can be loaded only on demand, by using an on-hover. They're still hotlinked, but not all the images on the page will load at once.... just so simple and effective.

I had a look at the lazy load, but in all honesty, there's a lot of code there I don't understand, and I prefer to do this myself so that any tinkering can be done with ease.

The pseudo search link you gave didn't work for me unfortunately, but I get the principle.

Trying it now on XAMPP... thx.

----------EDIT-----------

I was just wondering if I could use the following....or something like...

PHP:
<div class="thumbnail">
    <a target="_blank" href="view_image.php?img_url=<?php echo $part_processed_array[$row][IMAGEURL_FC];?>"
       onMouseOver="document.roll.src='<?php echo $part_processed_array[$row][IMAGEURL_FC];?>'">
        <img src="thumbs/<?php echo $part_processed_array[$row][IMAGETHUMB];?>" alt="<?php echo $part_processed_array[$row][IMAGETHUMB];?>"/>
    </a>

    <span>
        <img src="<?php echo $part_processed_array[$row][IMAGEURL_FC];?>"/>
    </span>
</div>

As you can see, I'm using a thumb as the image for the a link, but the span loads the larger original image. I've added an additional attribute (onMouseOver) which only loads the large image... er... on mouse over :) The CSS will then position the span absolutely and give me the effect I want.

The trouble is, the span will load this image regardless ATM, and I can't figure out how to get the span image's src to follow the same behaviour as the a link... As the span is positioned away from the cursor, I can't use a hover effect anyway.

New ground...always tricky!

I did try using JavaScript to change the inline src="whatever" (adding an id to the img tag), but couldn't get it to work.
 

learning_brain

New Member
Messages
206
Reaction score
1
Points
0
Sorry for the double post but I've cracked it...

Test page below - which works a treat. The large preview only loads onmouseover. In Firefox, the image stops loading onmouseout, but in IE and Chrome it continues loading, just not seen until you mouse over again...

Simples no?

PHP:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>Untitled Document</title>

<style type="text/css">
/*-----------------thumbnail image library----------------*/

.thumbnail /* container */
  {
	height: 88px;
	width: 120px;
	float: left;
	text-align: center;
	border: 1px solid #999;
	background-color: #333;
  }

.thumbnail img /*image itself*/
  {
  margin-top: 5px;
  border:none;
  background: none;
  max-height:78px;
  height: expression(this.height > 78 ? 78: true); /*IE6 fix*/
  max-width:111px;
  width: expression(this.width > 111 ? 111: true); /*IE6 fix*/
  }
/*-----------end thumbnail image library----------------*/

/*-----------start large image-------------------------*/

.thumbnail span{ /*CSS for enlarged image container*/
	width: 550px;
	height: 528px;
	margin: 10px;
	padding: 5px;
	position: absolute;
	background-color: #333;
	background-image:url(images/imageLoading.gif);
	border: 1px solid #999999;
	visibility: hidden;
	color: #CCCCCC;
	font-size: 10pt;
}

.thumbnail span img{ /*CSS for enlarged image*/
	border-width: 0;
	max-height: 440px;
	height: expression(this.height > 440 ? 440: true);
	max-width:540px;
	width: expression(this.width > 540 ? 540: true);
}

.thumbnail:hover span{ /*CSS for enlarged image*/
	visibility: visible;
	bottom: 0px;
	right: 0px; /*position where enlarged image should offset horizontally */
}


/*----------end large image-----------------------*/

</style>

</head>

<body>

<div class= "thumbnail">
    <a target="_blank" href="http://www.google.com" onMouseOver="document.thumb.src='http://bfme.us/dl/HugeNanduImage20p.jpg'">
        <img src="http://x10hosting.com/forums/images/Example_1_th.jpg" border="0" />
    </a>
    <span>
        <img src="http://x10hosting.com/forums/images/imageLoading.gif" border="0" id="thumb" name="thumb"/>
    </span>
    
</div>

</body>
</html>

___________EDIT____________

Had some problems integrating this into the main index page with results...

... until I learned that the loop was creating duplicate JavaScript handlers with duplicate span names, so I just added <?php echo $row;?> as an index to the JavaScript and span name/id, and it works just fine and dandy.
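The fix boils down to indexing every generated name. A sketch of the idea (function names are mine; the real page builds these strings in PHP using $row):

```javascript
// Give each row's preview image a unique name by suffixing the row
// index, so every generated onMouseOver targets its own element
// instead of the first "thumb" in the document.
function previewName(row) {
  return 'thumb' + row;
}

// Build the onMouseOver attribute value for row N, mirroring the
// working example's document.<name>.src assignment.
function buildOnMouseOver(row, largeUrl) {
  return "document." + previewName(row) + ".src='" + largeUrl + "'";
}
```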

Really, really pleased with this result!!

Thanks for all the help.
 