Solr looks interesting, thank you.

Why Java?
1) I know Java best, and I can get something working very easily. I just want something I can run continuously with a low footprint on a commodity VM.
2) The rest of the backend is in Java, and I like to have a single development language for any given segment of a project, if possible, for easy maintenance.
3) Mostly the first reason ;)

As for pseudocode of what I could do, I could do something like the following; I am just not happy with how it looks...

    // Assumes List<String> urls, List<String> keywords, int[] counts,
    // and an outer index u over urls.
    // Needs java.io.BufferedReader, java.io.InputStreamReader, java.net.URL.
    try (BufferedReader in = new BufferedReader(
            new InputStreamReader(new URL(urls.get(u)).openStream()))) {
        String line;
        // readLine() returns null at end of stream, so no end-of-page flag is needed.
        while ((line = in.readLine()) != null) {
            for (int k = 0; k < keywords.size(); k++) {
                // indexOf returns 0 for a match at the start of the line, so test >= 0.
                if (line.indexOf(keywords.get(k)) >= 0) {
                    counts[k]++;
                }
            }
        }
    }

On Sat, Aug 1, 2009 at 7:25 AM, Lisa Kachold wrote:
> Why Java?
>
> Why not a simple JavaScript search script?
>
> http://stackoverflow.com/questions/141280/whats-the-best-way-to-count-keywords-in-javascript
>
> On 8/1/09, Bryan O'Neal wrote:
> > Thought of that; the overhead is worse than scraping, parsing, and
> > searching.
> >
> > On Fri, Jul 31, 2009 at 7:51 AM, Lisa Kachold wrote:
> >
> >> Try using Google?
> >>
> >> On 7/31/09, Bryan O'Neal wrote:
> >> > OK, so I want to, with utmost efficacy, go through a web page and ask
> >> > how many of a set of keywords are in that web page. Does anyone know of
> >> > a good open source tool for this?
> >> > I have hundreds of web pages and a near-equal number of keyword sets, so
> >> > scraping each page, parsing to create a vector of strings, and doing a
> >> > set of nested for loops to run through each vector and compare to words
> >> > in the keyword vector is, well, FAR from efficient.
> >> > I heard of Apache Velocity, but that seems to be for creating pages on
> >> > the fly.
> >> > I also heard of Apache Lucene, but it appears to be for implementing
> >> > your own query engine on your application server (to index and query
> >> > your pages).
> >> >
> >> > Also, if you know of a local ACTIVE Java forum I would love to know
> >> > about it. I have subscribed to a half dozen lists and there is nothing
> >> > but silence.
> >> >
> >> > Thanks a bunch :)
> >>
> >> --
> >> (623)239-3392
> >> (503)754-4452 www.obnosis.com
> >> ---------------------------------------------------
> >> PLUG-discuss mailing list - PLUG-discuss@lists.plug.phoenix.az.us
> >> To subscribe, unsubscribe, or to change your mail settings:
> >> http://lists.PLUG.phoenix.az.us/mailman/listinfo/plug-discuss
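
The nested-loop scan described in the original question (every keyword checked against every line or word) can be swapped for a single pass over the page text with one hash lookup per word. What follows is only a minimal sketch under stated assumptions, not anything proposed in the thread: the class name KeywordCounter and its count method are hypothetical, the page text is assumed to be already fetched into a String, keywords are assumed to be single words, and matching is assumed to be case-insensitive on whole words.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper (not from the thread): counts single-word keywords in one pass.
class KeywordCounter {

    /** Returns lower-cased keyword -> number of whole-word, case-insensitive occurrences in text. */
    static Map<String, Integer> count(String text, Collection<String> keywords) {
        // Seed every keyword with zero so absent keywords still appear in the result.
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String keyword : keywords) {
            counts.put(keyword.toLowerCase(Locale.ROOT), 0);
        }
        // Split on runs of characters that are neither letters nor digits;
        // each resulting token costs one hash lookup instead of a scan over all keywords.
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            Integer current = counts.get(token);
            if (current != null) {
                counts.put(token, current + 1);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> result =
                count("Java and java; Solr", Arrays.asList("java", "solr", "lucene"));
        System.out.println(result); // prints {java=2, solr=1, lucene=0}
    }
}
```

With this shape the work per page grows with the number of words on the page rather than with words times keywords. Multi-word keywords or substring matches would need a different technique (for example an index such as Lucene or Solr), which this sketch does not attempt.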