Webbots, Spiders, and Screen Scrapers

If you have been reading my reviews for any amount of time, you know that I love tech books and that I usually give them pretty glowing reviews, especially No Starch books. They are informative, they teach you things, and they make you think outside the box. I love No Starch books.

Alright, now that you know I love No Starch, I am sad to report that I have found the bad apple in the bunch: Webbots, Spiders, and Screen Scrapers. I didn't come to this point of view lightly. I really tried to find the good in this book, and there is some, but it is overshadowed by what I consider to be a pretty lame mistake on the author's part.

Webbots, Spiders, and Screen Scrapers is all about the what, how, and why of webbots, spiders, and screen scrapers: basically a guide to why you need them, how to make them, and what they should be doing. It is a great reference as to what webbots are, and you can learn a thing or two while reading it.

My gripe is pretty simple, and there is a workaround for it, but here it is. The author, Michael Schrenk, didn't teach us all about writing webbots, spiders, and screen scrapers. The book was meant to be a tool for teaching the PHP/cURL involved in writing these bots. Instead, the author wrote a library of functions, tells you to include it, and then uses the book as an almost 400-page reference to his own library.

Sure, you could open the library up and read through the code and get an idea of what is going on, but really that wasn't the point of the book. The point of the book was to show you how to use PHP and cURL to build your own bots and spiders. What you get is a book that tells you how to build HIS bots and spiders. Furthermore, the library comes with disclaimers about bugs in the code instead of fixes to the code. So now you have a book that won't teach you to code PHP/cURL webbots, and it gives you code that may or may not work for what you are doing.

The silver lining in all of this is that the book did come with the library, and if you are inclined to open it up and read through the code, you can get a sense of what you really wanted to know in the first place: how to handle pages as files, how to parse them for information, and how to store the information you pulled. I really would have liked the book to have been more about the building of the library than a reference to the library.