High Quality Image Search Engine

Discussion in 'Review My Site' started by learning_brain, Jun 15, 2010.

  1. learning_brain

    learning_brain New Member

    Messages:
    206
    Likes Received:
    1
    Trophy Points:
    0
    I welcome any feedback on my new site.

    http://www.smartimagesearch.net76.net

    I only use large images and each image is visually checked for aesthetic quality and safety.

    I have generated an image crawler, which is still very busy in the background. Indexed images are now circa 2,000, although this is tiny in comparison to many other engines.

    This site is different from many in that I don't use thumbnails. This has a disadvantage in that it takes time to load but the significant advantage is that you get a large image preview when you hover over the smaller images.

    My search index is currently limited to MySQL fulltext search, but this is better than a straightforward LIKE '%$whatever%' query.

    The HTML and CSS need tidying a bit, but I wanted to gauge the reaction at this point to see how I should proceed.

    Rich
     
  2. djalam

    djalam Member

    Messages:
    89
    Likes Received:
    2
    Trophy Points:
    8
    It looks pretty good so far. I think it would be nice if you could query the results as they are being typed, using jQuery.
     
  3. catz154

    catz154 New Member

    Messages:
    152
    Likes Received:
    0
    Trophy Points:
    0
    As far as design goes, I like it overall. It is nice and clean, and simple.
     
  4. cybrax

    cybrax Community Advocate Community Support

    Messages:
    764
    Likes Received:
    27
    Trophy Points:
    0
    I have a question: do you own, or are you in any way connected with, any of the web sites currently indexed?


    Hotlinked images are risky to use for a number of reasons. A good webmaster will soon spot you stealing bandwidth from their site, plus they could swap the image for something less tasteful.

    Then there is the small matter of upsetting the likes of Google and other advertising providers. You ask visitors to provide a "page or image for us to crawl". You are of course aware that 'crawling' a web page registers as a page view even though nobody sees the page, which is where advertisers take umbrage.



    Mystery Meat Navigation.../ search

    2000 plus images but not a clue about what kind of images. A random image of the day on the landing page would be a good way to fill a blank space.
     
  5. learning_brain

    learning_brain New Member

    Thanks for all the feedback - looks like I might be on the right track.

    I also like a clean layout. I think a site (like Google) that presents simply but has massive functionality has more appeal to me personally than a busy, fussy site that is, in essence, pretty static.

    @Cybrax - you raise some good points here. I have not, and cannot practically have, a connection with all source sites.

    Yes, they are hotlinked, although the alternative is simply reverting back to thumbnailed views, which defeats the purpose of the exercise and makes my site like any number of similar image search sites. Presently, it has a USP.

    Fortunately, I have an image checker which runs fairly constantly in the background checking image availability, size and content (although "content" is limited to text). You are right that if a webmaster was particularly annoyed about me stealing a small amount of bandwidth, they could switch the image itself but leave the meta tags the same. I might be able to get round this by using the more detailed exif_read_data function to pull out everything.

    As for advertisers taking umbrage, surely this would depend on whether their tracker (such as google media) recognises the user agent set through the cURL option as a crawler. I have a logger that filters out bots, so I'm not sure where I stand on this one.
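
    A user-agent filter of the kind mentioned here could look roughly like the sketch below. It is written in Python purely for illustration; the token list and the function name are hypothetical and are not taken from the site's logger. A third-party tracker may rely on other signals entirely, so this only shows the user-agent check itself.

```python
# Hypothetical substrings that commonly appear in crawler user-agent strings.
BOT_TOKENS = ("bot", "crawler", "spider", "curl", "wget")


def looks_like_bot(user_agent):
    """Return True when the user-agent string contains a known crawler token."""
    ua = user_agent.lower()
    # An empty user agent is treated as automated traffic in this sketch.
    return ua == "" or any(token in ua for token in BOT_TOKENS)
```

    A crawler that announces itself honestly in its user agent is the easy case; one that sends a browser-like string would pass straight through a check like this.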

    I have an idea though that will pay dividends for the source site..... :D

    I love your idea of a random image. I agree that the opening page is bland and uninviting, so this is a perfect way to induce the first search.

    Good stuff... thanks.
     
    Last edited: Jun 16, 2010
  6. lemon-tree

    lemon-tree x10 Minion Community Support

    Messages:
    1,420
    Likes Received:
    46
    Trophy Points:
    48
    As said before, a random image of the day would do wonders, particularly if you picked a large image that could be run as the homepage background, much like we see with Bing. Until then, it all seems a bit dark when navigating, and the search results are compressed into the tiny space on the left. To do any image justice it needs space, and the best way would be to make the results take up the full browser window; then if the user clicks the image it'll bring it up larger.
     
  7. learning_brain

    learning_brain New Member

    A random image as background? Hmmm.... how would I pass the image URL to the CSS style sheet? Interesting concept, although I'm not sure yet how I'm going to blend this in with the practical, common 1024x768 fixed space.
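
    One way to get a per-day image into the styling without touching the static style sheet is to have the page itself emit a small inline style block. The sketch below is in Python for illustration only; the function names are hypothetical, and it assumes the list of indexed image URLs is already available from the database.

```python
import datetime
import random
from urllib.parse import quote


def image_of_the_day(urls, day):
    """Pick one URL per calendar day; the same date always gives the same image."""
    # Seeding with the date's ordinal keeps the choice stable for the whole day.
    rng = random.Random(day.toordinal())
    return rng.choice(urls)


def background_style(url):
    """Build an inline <style> block so the static style sheet needs no changes."""
    # Percent-encode quotes and other unsafe characters found in crawled URLs.
    safe_url = quote(url, safe=":/?&=%#")
    return (
        "<style>body { background: url('" + safe_url + "') "
        "no-repeat center center fixed; }</style>"
    )
```

    Calling `background_style(image_of_the_day(urls, datetime.date.today()))` while rendering the page head would give every visitor the same background for the day.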

    Making the results consume the whole window is also interesting and I have had previous attempts. The only problem with this is that getting the larger image preview on thumbnail rollover is difficult. I'm using a span with CSS hover to provide an absolutely positioned, larger preview of limited size. I hesitated to use a click for the larger view at this stage because I feel that a medium preview on rollover is extremely useful.... I may be the only one!

    That said, I have now finished the click-through image view. I have had to use frames (yuck) to get over some hotlinking issues but the exif_read_data gained is a valuable resource to visitors, especially to those checking copyright in the meta data.

    The main issue bugging me ATM is the search itself. I am currently limited by the 4 character minimum word length of the fulltext "MATCH..AGAINST" system. I have also tried a loop for multiple word searches using the standard LIKE %whatever%, but this doesn't order by relevance, whereas the MySQL fulltext does..... I really need a bespoke indexing system with a search algorithm, but this isn't a simple 2 minute job.
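
    As a rough idea of what a bespoke relevance search could do, the sketch below scores each record by how often the query words occur in its description, so multi-word searches still come back ordered by relevance and short words are not dropped. It is Python rather than SQL, and the record layout (URL mapped to a text description) and the scoring rule are assumptions made for illustration.

```python
import re


def tokenize(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(query, description):
    """Count how often each distinct query word occurs in the description."""
    words = tokenize(description)
    return sum(words.count(term) for term in set(tokenize(query)))


def search(query, records):
    """Return (score, url) pairs for matching records, best match first."""
    scored = [(score(query, desc), url) for url, desc in records.items()]
    matches = [pair for pair in scored if pair[0] > 0]
    # Sort by descending score, then by URL so ties come out in a stable order.
    return sorted(matches, key=lambda pair: (-pair[0], pair[1]))
```

    Scanning every record like this would not scale to a large index; a real version would precompute a word-to-record table, which is the "bespoke indexing system" part of the job.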
     
  8. lemon-tree

    lemon-tree x10 Minion Community Support

    Any search engine is going to need some pretty advanced search features. I built a very simple search box into my new project that is used to search for records in a table: features such as UNION and a few other MySQL query tools are invaluable when it comes to designing searches. Perhaps it would be beneficial if you downloaded some search engine scripts from elsewhere to see how they do it.
     
  9. learning_brain

    learning_brain New Member

    Yeah - I have done limited searches but most are fairly simple queries and produce random results.

    I'll have to dig deeper!

    Meanwhile, I'm refining my crawler. Currently I have two pages: one for scraping a hrefs and adding to an image pending queue, and one to loop through that queue. Trouble is, a hrefs multiply much more quickly than image results, so simply blending them together in one page produces disproportionate queues. I'm now blending them but with a check on the image queue, i.e. if it's low, look for a hrefs; if the image queue has rows, scrape for images. That way I can leave it running completely autonomously and just add a link or two every so often if the href crawler runs out of options (unlikely).
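
    The queue check described above can be sketched as follows. This is an illustrative Python version, not the crawler's actual code; the threshold value, the task names and the function name are all hypothetical.

```python
from collections import deque

LOW_WATER_MARK = 10  # hypothetical threshold; tune to the crawler's throughput


def next_task(image_queue, link_queue, low_water_mark=LOW_WATER_MARK):
    """Decide what the crawler does next based on the pending image queue."""
    if len(image_queue) >= low_water_mark:
        # Plenty of candidate images are waiting, so test one of those.
        return ("check_image", image_queue.popleft())
    if link_queue:
        # The image queue is running low: scrape another page for more work.
        return ("scrape_page", link_queue.popleft())
    if image_queue:
        # No pages left to scrape, so drain whatever images remain.
        return ("check_image", image_queue.popleft())
    return ("idle", None)
```

    Because links are only followed when the image queue runs low, the link queue cannot race ahead of the image queue the way it does when both are processed on every pass.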
     
  10. v4xde

    v4xde New Member

    Messages:
    74
    Likes Received:
    0
    Trophy Points:
    0
    Good and fast. The only thing I would change is not opening a new window, so that visitors stay within the site.
    Worth taking that into consideration as you make changes.
    v
     
  11. learning_brain

    learning_brain New Member

    @v4xde - Thanks for that. I understand your reasons for not creating a new window... I'll consider. I might just use target blank on the external url link instead of the view image page.

    Crawler is now fully automated and I'll be adding a link to the main page so visitors can actually see it working!

    I had an issue before because I was only storing large images. This was slow because, when crawling another page, it was potentially re-testing smaller, unsuitable images it had already scanned. This has now been fixed and the crawler is MUCH faster.
     
  12. learning_brain

    learning_brain New Member

    Sorry for the double post, but this has had a lot of work done.

    www.smartimagesearch.net76.net

    The front page looks similar, but believe me, this is hugely improved... with neat page navigation and great image previews.

    The crawler now works well, and you can watch it working!
    The search is a bespoke relevance engine that I wrote myself (with some help from misson!) which works great.
    The view image page now has full image meta data.
    I now have over 3,000 high res images and going up!

    Further comments to fine tune it are much appreciated.

    Rich
     
  13. lemon-tree

    lemon-tree x10 Minion Community Support

    The problem I am seeing is that because you are using the full size images in the thumbnails, it takes ages for even the thumbnails to load which is more than a little frustrating. Perhaps the balance would be to load small versions of the images for the thumbnails whilst loading the full size images in the background.
     
  14. learning_brain

    learning_brain New Member

    That.......... is......*thinks*......... a very good idea!! Why didn't I think of that!!!

    I'll have to practice up on my GD skills again because I think this will take a while to sort out. The crawler is already hugely long and crammed with if{}else{} sections and it's starting to get a little confusing - even with the correct indentation...

    It will inevitably slow the crawler down (again), but I think worth it. (I could have another page that searches through existing image urls and checks if it has an associated thumb - if not - create one!)
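
    The bookkeeping for that separate thumbnail page could be sketched like this. The actual resizing would be done with GD in PHP; this Python sketch only covers finding images without a thumb and working out the target size. The 200 pixel limit and both function names are hypothetical.

```python
def thumb_size(width, height, max_side=200):
    """Scale (width, height) so the longer side is at most max_side pixels."""
    longest = max(width, height)
    if longest <= max_side:
        return width, height  # already small enough; never upscale
    # Integer arithmetic keeps the result exact; keep at least one pixel
    # on the short side of very elongated images.
    return (
        max(1, width * max_side // longest),
        max(1, height * max_side // longest),
    )


def missing_thumbs(image_urls, thumb_urls):
    """Return the indexed image URLs that have no thumbnail yet, in order."""
    done = set(thumb_urls)
    return [url for url in image_urls if url not in done]
```

    Running the backfill as its own job keeps the thumbnail work out of the crawler's main loop, so the crawler itself need not slow down.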

    This is definitely in the pipeline for improvement! Thank you.
     
    Last edited: Jun 25, 2010
  15. lemon-tree

    lemon-tree x10 Minion Community Support

    It would inevitably be slower at crawling, but I think that it will be outweighed by being hugely beneficial to keeping users on the page without them getting bored. Just make sure you manage the thumbnails efficiently, as the storage space required could easily grow exponentially. Good luck.
     
  16. learning_brain

    learning_brain New Member

    Thanks

    Although I store every img url that it comes into contact with, it will disregard all those that do not fit the "suitability" criteria. This means the crawler won't go over old ground and is much more efficient as a result.
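
    The "don't go over old ground" idea can be sketched as below. This is an illustrative Python version under stated assumptions: the `tested` dictionary stands in for the table of stored img urls, and `is_suitable` is a hypothetical callback for the size and quality checks.

```python
def crawl_images(candidate_urls, is_suitable, tested):
    """Test only URLs not seen before and record every verdict in `tested`."""
    accepted = []
    for url in candidate_urls:
        if url in tested:
            continue  # already checked on an earlier page; skip the costly test
        tested[url] = is_suitable(url)
        if tested[url]:
            accepted.append(url)
    return accepted
```

    Storing the rejected URLs as well as the accepted ones is what saves the time, since the rejected ones are by far the majority.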

    "Suitable" images occur about once in every 250 images tested, so it shouldn't be too difficult to do this only for appropriate ones.

    The more I think about this, the better it seems :D
     
    Last edited: Jun 25, 2010
  17. mattblog

    mattblog New Member

    Messages:
    463
    Likes Received:
    13
    Trophy Points:
    0
    Everything looks good except for one issue: by being on that domain (subdomain), net76.net, you get spammers making it look bad.

    http://www.siteadvisor.com/sites/ne...aff_id=0&locale=en_ca&os_ver=6.0.1.0&pip=true

    Some spammers have left comments leading to virus-spreading sites. Regardless, you can fix this by contacting McAfee SiteAdvisor and telling them that spammers did it, and that the rating is only there because some of the other sites on that domain did that. Or you can sign up with co.cc

    http://www.co.cc/?id=191029
    Therefore it will be your own domain name.

    Good website though =D
     
    Last edited: Jun 25, 2010
  18. learning_brain

    learning_brain New Member

    Thanks Matt for the comments.

    Didn't know that, although to be honest, this is only a development platform before I buy a "proper" host and domain name... which will resolve the issue.

    I can't develop a crawler on x10 due to the Terms of Service, and I've had a bad experience with X10 anyway.

    Being part of a free site never helps with Search Engines.

    Still fiddling with the css ATM to get it right.

    And..... crawling....crawling.....crawling
     
    Last edited: Jun 26, 2010
  19. learning_brain

    learning_brain New Member

    Sorry for the double post... :(

    I have now made a few big changes. My previous problem was image loading speed. As I was loading all the original images, a set of 18 took quite a while and would have put the visitor off.

    I am now creating thumbs for the majority of images in the db, which really speeds the process up. (Only another 20,000 odd to go....)

    In addition (and I really like this bit), you can change the way the search engine works. By clicking on "similar", it will also check for similar entries. Without "similar" turned on, it will search for the exact string, but with "similar" switched on, it will take into account spelling mistakes and return more results.... neat, eh? Beat that, Google!!
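
    For anyone curious how an exact/"similar" toggle can work in principle, here is a minimal Python sketch using the standard library's `difflib`. It is not the site's engine; the vocabulary list, the cutoff value and the function name are assumptions made for illustration.

```python
import difflib


def expand_query(term, vocabulary, similar=False, cutoff=0.8):
    """Return the index words to search for a single query term."""
    term = term.lower()
    if not similar:
        # Exact mode: only the literal term, and only if it is indexed.
        return [term] if term in vocabulary else []
    # "Similar" mode: also accept near misses, best match first.
    return difflib.get_close_matches(term, vocabulary, n=5, cutoff=cutoff)
```

    Lowering `cutoff` returns more results at the cost of looser matches, which is the same trade-off the "similar" switch exposes to the visitor.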
     
    Last edited: Jun 28, 2010
  20. v4xde

    v4xde New Member

    Hi, is there a new URL, because I get google code instead of your website. I would like to know how you are doing your web presence.
    v
     
