   Page 19

data, e.g. for forensics search purposes (see below).
     After indexing, a full-text search typically takes less than a second, even across millions of documents. While indexing a very large collection of documents for the first time may be time-consuming, subsequent updates of the index are usually much faster. dtSearch, for example, simply checks the file modification dates of all indexed files, and updates the index only for those files that have been added, deleted or changed since the last index update. (While the text retrieval terminology here relies on the dtSearch product line, the concepts in this article are generally applicable.)
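     The modification-date check described above can be sketched in a few lines of Python. This is a minimal illustration of the general idea, not dtSearch's implementation; the function names snapshot and diff_snapshots are hypothetical, and the sketch assumes the index keeps a record of each file's modification time from the previous update.

```python
import os

def snapshot(root):
    """Map every file path under root to its modification time in nanoseconds."""
    mtimes = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            mtimes[path] = os.stat(path).st_mtime_ns
    return mtimes

def diff_snapshots(old, new):
    """Compare the times recorded at the last index update (old) with the
    files on disk now (new). Returns (added, changed, deleted) path lists."""
    added = sorted(p for p in new if p not in old)
    changed = sorted(p for p in new if p in old and new[p] != old[p])
    deleted = sorted(p for p in old if p not in new)
    return added, changed, deleted
```

     Only the added and changed files need to be read and indexed again, and entries for deleted files can be dropped, which is why an update is usually far cheaper than the first full indexing pass.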
     In addition to enabling precision Boolean searching, an index can also store such information as word positions, enabling word or phrase proximity searching. An index can also hold information about word frequency and distribution, enabling computation of natural language relevancy rankings across a document collection. If the company name appears in two million documents, it would get a low relevancy ranking. If the latest marketing terminology appears in only four documents, it would get a much higher relevancy ranking. In that way, PR could, for example, enter a whole paragraph of proposed text for a press release as a natural language search, and zoom right in on the most relevant documents.
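     To make these ideas concrete, here is a small Python sketch of an index that records word positions, uses them for a proximity search, and weights query words by how rare they are across the collection. The weighting shown is standard term frequency times inverse document frequency (TF-IDF), chosen here for illustration; it is not necessarily the ranking formula used by dtSearch or any other product, and all function names are hypothetical.

```python
import math
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs):
    """docs maps doc_id -> text. Returns word -> {doc_id: [word positions]}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, word in enumerate(tokenize(text)):
            index[word].setdefault(doc_id, []).append(pos)
    return index

def within(index, w1, w2, max_distance):
    """Doc ids in which w1 and w2 occur within max_distance words of each other."""
    p1, p2 = index.get(w1, {}), index.get(w2, {})
    hits = set()
    for doc_id in p1.keys() & p2.keys():
        if any(abs(a - b) <= max_distance
               for a in p1[doc_id] for b in p2[doc_id]):
            hits.add(doc_id)
    return hits

def rank(index, n_docs, query):
    """Score each document by summing, over the query words it contains,
    (occurrences in the document) * log(n_docs / documents containing the word)."""
    scores = defaultdict(float)
    for word in set(tokenize(query)):
        postings = index.get(word, {})
        if not postings:
            continue
        idf = math.log(n_docs / len(postings))  # 0 for a word found in every document
        for doc_id, positions in postings.items():
            scores[doc_id] += len(positions) * idf
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
```

     A word that occurs in every document, like the company name in the example above, gets a weight of zero and so contributes nothing to the ranking, while a word found in only a handful of documents pushes those documents toward the top.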

     But full-text searching, whether Boolean, natural language, or otherwise, is only part of the text retrieval answer. Suppose HR wants to limit its search to documents with an HR executive designation. This type of fielded data classification can result from fields or metadata inside a document, or from an overlaying document management-type application. With the latter, fielded data classification can rely on associated database entries, such as SQL or XML, or on the addition of fields "on the fly" during the indexing process.
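     As a rough sketch of how fielded data can narrow a full-text result, the Python below keeps a separate table of field values per document and filters a list of search hits against it. The names add_fields and filter_by_fields, and the field names used in the example, are hypothetical; in practice the values might come from document metadata, from a database, or be assigned while indexing.

```python
def add_fields(doc_fields, doc_id, **fields):
    """Attach field values to a document, for example during indexing."""
    doc_fields.setdefault(doc_id, {}).update(fields)

def filter_by_fields(hits, doc_fields, **required):
    """Keep only the hits whose stored fields match every required value."""
    return [
        doc_id for doc_id in hits
        if all(doc_fields.get(doc_id, {}).get(name) == value
               for name, value in required.items())
    ]

# Example: restrict full-text hits to documents designated for HR executives.
doc_fields = {}
add_fields(doc_fields, "memo1.doc", department="HR", level="executive")
add_fields(doc_fields, "memo2.doc", department="HR", level="staff")
add_fields(doc_fields, "plan.doc", department="PR", level="executive")
hr_exec_hits = filter_by_fields(
    ["memo1.doc", "memo2.doc", "plan.doc"],
    doc_fields, department="HR", level="executive",
)
```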

Adding in Security Classifications
     Now suppose the goal is to enable searching organization-wide, but to keep the wrong documents out of the wrong hands. For example, suppose documents that bear certain


(Text Continued on pg. 21)


18.3
19

PC AI Magazine - -
e-mail: PCAI.TH @ gmail.com - Comments: webmaster @ pcai.com