Desktop search, code analysis

Mon, 2006-10-16 05:32 — john

I've been contracted to do some analysis of a client's code. Naturally, the first thing I did was load all their source into Callgraph and generate HTML from it. :) Now I have something that is preprocessed, formatted, easy to read, and hyperlinked between calls and procedures. The best part is that plain text searches of this HTML output are more useful and reliable than searches of the unprocessed code would be.

That leads me to my topic. I wrote a little find/grep script which generates a nice HTML list with links to the files that contain my search string. It works great, but it takes about 10-20 minutes per search. The need for a desktop index/search application is something that I've avoided until right now.
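
For anyone who wants to try the same idea, here is a minimal Python sketch of that kind of find/grep report. It is not the script described above, which isn't shown here. The function names `find_matches` and `render_report`, the `.html`/`.htm` suffix filter, and the page layout are all assumptions made for illustration.

```python
import html
import os
from pathlib import Path


def find_matches(root, needle, suffixes=(".html", ".htm")):
    """Return sorted paths under root whose text contains needle."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # suffixes must be a tuple, because str.endswith requires one
            if not name.lower().endswith(suffixes):
                continue
            path = Path(dirpath) / name
            try:
                text = path.read_text(encoding="utf-8", errors="replace")
            except OSError:
                continue  # unreadable file: skip it rather than abort the search
            if needle in text:
                matches.append(path)
    return sorted(matches)


def render_report(needle, paths):
    """Build an HTML page listing each matching file as a hyperlink."""
    items = "\n".join(
        '<li><a href="{0}">{1}</a></li>'.format(
            html.escape(p.resolve().as_uri(), quote=True),
            html.escape(str(p)),
        )
        for p in paths
    )
    title = html.escape("Files containing: " + needle)
    return (
        "<html><head><title>{0}</title></head>\n"
        "<body><h1>{0}</h1>\n<ul>\n{1}\n</ul></body></html>\n"
    ).format(title, items)
```

Writing `render_report(needle, find_matches(root, needle))` to a file gives a page that can be saved, edited, and annotated. Like any unindexed search, it rereads every file on every run, which is why this kind of script gets slow on a large tree.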

So, off I went, looking into what's available. Here are my notes:

Copernic: Fast to index, very nice clean UI.
Docco: Neat, but the UI isn't exactly what I want. Ideally, I'd like to generate HTML reports. Powered by Apache Lucene.
Filehand: Slow to index. Decent UI. Can generate a plain text report.
Google Desktop: Limited, with concerns about privacy. Didn't try it.
Regain: Is a Sourceforge project, browser UI. Powered by Apache Lucene.
Svizzer: Very slick UI. The indexer seems slower than Lucene, though I didn't actually measure it. Looks like it should be good, but it seems to strip '_' from searches, which makes it useless to me.
WhereIsIt: Shareware, $40. Seems like it should be good, but I must have misunderstood something about it, because my 5 minute install and play didn't give me a search of the file contents, only the file names.
X1: Had wanted to try it in the past, and now that it's free, had no excuse not to. Pretty cool. Seems difficult to tell it to go index just one particular directory right now, which is rather necessary for how I want to use it. Has an option for exporting the search results (with selected columns, delimiters, etc.) which would be helpful for the way I want to use it.

I've settled on Regain for searching code. The main reason is Regain's web-based UI. When I use Regain to do a search, the result is in HTML, which I can save off to a file, edit, and add my own notes to. That's exactly what I was doing with my own find/grep script, and I was finding it to be quite effective.

I'm very impressed with the nice UI in Copernic. X1 seems very feature rich. If I ever decide that I want to index my emails and such, I'll have to do some more comparisons between the two.

If anybody else has any favorite desktop code indexing applications that they use, I'd sure like to see your comments here! (And, yes, I know that an application could easily be built using OE word indexes. :)

Be sure to check the comments because I'll be posting more findings on this topic as I go.


Comments

Sat, 2007-11-24 23:18 — SpectateSwamp

Indexing for personal info is Over Kill

I also wrote my own text search. Instead of searching through directories every time,
I merge the *.txt files into merge.txt and search that file. The merge marks the start
and end of each file, and the search then displays the path to each matching file in
the form's title bar.
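
The merge-then-search idea described above can be sketched roughly as follows. This is an illustration in Python, not the actual program: the marker strings `START` and `END`, the function names, and the choice to return matching paths (rather than show them in a title bar) are all assumptions.

```python
from pathlib import Path

# Hypothetical marker format; the real program's markers are not shown.
START = "<<<FILE {}>>>"
END = "<<<END>>>"


def merge_text_files(root, merged_path):
    """Concatenate every *.txt under root into merged_path, with markers."""
    merged_path = Path(merged_path).resolve()
    with open(merged_path, "w", encoding="utf-8") as out:
        for path in sorted(Path(root).rglob("*.txt")):
            if path.resolve() == merged_path:
                continue  # never merge the merged file into itself
            out.write(START.format(path) + "\n")
            text = path.read_text(encoding="utf-8", errors="replace")
            out.write(text)
            if not text.endswith("\n"):
                out.write("\n")  # keep the END marker on its own line
            out.write(END + "\n")


def search_merged(merged_path, needle):
    """Return the paths of source files whose lines contain needle."""
    prefix, suffix = START.split("{}")
    hits = []
    current = None  # path of the file whose content is being scanned
    with open(merged_path, encoding="utf-8") as merged:
        for line in merged:
            line = line.rstrip("\n")
            if line.startswith(prefix) and line.endswith(suffix):
                current = line[len(prefix):-len(suffix)]
            elif line == END:
                current = None
            elif current is not None and needle in line and current not in hits:
                hits.append(current)
    return hits
```

One sequential read of a single file replaces a walk over many small ones. The trade-off is that a source file which itself contains a marker line would confuse the scan, and the merge has to be rebuilt whenever the source files change.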

20,000,000 characters per second is the maximum read speed. It slows with
context search.

All my textual data since 1996 is nearing 80,000,000 bytes.
A lifetime's reading will fit on a DVD. Why keep, catalogue or index anything you have
never read?

Tue, 2008-01-01 17:17 — SpectateSwamp

Spectate Swamp Desktop Search is NOW Open Source

The most powerful program on the Planet goes open source.

This simple 10,000 line Visual Basic 5 program does
Video, Music, Pictures and Text. With lots of room
for improvement.

Any takers?

"Trolling for Desktop Search enthusiasts" channel9 msdn dot com
http://channel9.msdn.com/ShowPost.aspx?PostID=368158

Visit the above Microsoft site and request the most current
source code.

Still lots of the early release copies left.

Desktop Search is fun.

Cheers
Spectate Swamp

Open source makes It Your desktop search

Fri, 2006-11-10 23:35 — Knut Handsome

New Version of Copernic buggy?

I'm very disappointed so far with the new version of Copernic (2.01). I've been using and recommending Copernic (as was) for some time, and it is one of the two or three most essential utilities I have.

It has always had the odd gremlin - some directories are not picked up in initial indexing - but the new version is missing various directories and individual files, to the extent that I've had to junk it and return to WinGrep with the concomitant wait time.

Although it's prettier, I find it more fussy and less intuitive, and too buggy to rely on.

Like I said, very disappointed.

Sat, 2006-11-11 00:24 — Knut Handsome

Oh yeah, and did I mention

the hanging and crashing?

Sat, 2006-11-11 01:19 — john

Huh, I haven't had any of those problems so far. I did check a few times on the search results to make sure I got what I expected. I'll have to keep an eye on it.

Sat, 2006-11-11 10:26 — Knut Handsome

Well, I'm going to uninstall it, then do a fresh install and try again - the current one was installed over a previous version. In the meantime I'm looking at your list (thanks), particularly Yahoo (never got on terribly well with Google), which seems to be clever enough for me not to have to register particular file types as text.

ps. off topic: when I login to post a reply I get a 'page not found' error.

Sequence is:
Hive New > Go to thread > Login to post comment > Login > 'page not found'.
May be unrelated, but I notice a time component in the URL.

Sat, 2006-11-11 18:15 — john

page not found error (was: Copernic)

"Login to post comment > Login > 'page not found'"

Thanks for reporting that. I've seen it before. So far it hasn't seemed like too serious an issue, and I'm just hoping that with a future update to Drupal, it will get fixed.

Mon, 2006-10-23 18:19 — tamhas

Why desktop search?

If you are doing analysis of ABL code, why aren't you loading the results of Proparse and Callgraph into a database to do your searching?

Wed, 2006-10-25 06:24 — john

I am!

I guess there are two answers to that question.

1. If you think about it, I am. Proparse feeds ProRefactor, ProRefactor feeds Callgraph, Callgraph generates the HTML, and the HTML is loaded into a database.

2. I haven't built an xref engine.

Given the common use of dynamic queries and other such techniques, an XREF engine would be incomplete without more data and control flow information. I know how to get there from here, but it's going to take time and money.

And believe me: As much as the contract work is nice to have, I'd much rather just press a button to report the data flow details I need, rather than go through all these hours of drudgery.

Wed, 2006-10-25 23:09 — tamhas

DB of what

I guess I am thinking that a DB built from a more direct output than the HTML could be more detailed, and that would make for more powerful searching. I recognize the problem of unresolvable aspects, but they are a problem regardless.

Wed, 2006-10-18 20:38 — john

Yahoo! Desktop! Search!

As much as I prefer the UI in Copernic, I've! Switched! To! Yahoo! Desktop! Search! (i.e. X1) The reason is simple: I absolutely have to be able to export the result list, so that I can use it as the basis for tracking and reporting on my analysis work.

From Y!DS I can select the columns I want to export (path and file name), and with a quick bit of fancy footwork in vi, change that list into an HTML document with hyperlinks to the appropriate output files from Callgraph. As far as I can tell, X1 has as many features as Copernic (or more). The UI just isn't nearly as nice. It would be great to learn that there's a way to export search results from Copernic, but I haven't found any such thing yet.

I'm afraid that Google DS is out. Here are my notes: No option to sort by directory. No option to specify search directory. Limited control over indexing. Simplistic web UI. Hard to tell if a directory is fully indexed - if status of "Up-to-date" means that it's fully indexed, then Google Desktop missed things that Copernic Desktop did not miss.

I've also stopped using Regain. I like the fact that it's open source, and based on Apache Lucene, but: Only 10 results per page. No sort options. Missed a result in an "index.html", where the match was the displayed text in a hyperlink tag; I don't know why.

Wed, 2006-10-18 01:24 — mollyfud

Google Desktop

Google Desktop is unbelievable and I wouldn't worry about the alleged "Privacy Problems" as a) you only pass information to Google if you turn it on, b) let's assume there is some sort of information they are getting: what do you think that they are going to do with it? and c) aren't you giving them all your information anyway via search?

JMTC
Molly

Wed, 2006-10-18 16:30 — john

Hi Molly! Thanks for your feedback. I'm trying out Google Desktop now, based on your recommendations. So far, so good. By default it indexes all fixed drives, so that is taking forever on my machine. :)

I'm still finding that Copernic is a better solution. For example, if I want to restrict my search to one particular source code directory tree, there is an easy and obvious way to do that with Copernic's UI. After reading Google Desktop's help files, I don't see any way to do it with Google. Also, the good old Google web UI is usable, but it's certainly not as slick as Copernic's UI for browsing results.

Restricting the search wasn't an issue for me with Regain, because I only told it to index the source code directory structure I was interested in. (Or, rather, I told it to just index the HTML representation of the source code that I generated with Callgraph.)

This report (PDF download) is a year old, but you might find it interesting. I just stumbled across it, while Googling for something else: Benchmark Study of Desktop Search Tools.

The one thing that is still bothering me about Copernic is the apparent lack of reporting or export of search results.

Thu, 2006-10-19 00:42 — mollyfud

Interesting

No worries. I must try out Copernic again soon. I tried it in the past but I think I stopped (from poor memory) as its searching of Outlook wasn't as good as other searches that I tried.

Some of the things you point to as being missing in GD definitely make me want to check out Copernic though!
TIA
Molly

The OpenEdge Hive
