Text Analysis Workshop

Department of Homeland Security
Advanced Scientific Computing Program
May 25-26, 2005
Hilton Alexandria Old Town
Alexandria, VA

Computer software provides various means of searching, sorting, and navigating through sets of text documents. The challenge posed by the terrorist threat is for a computer to sift through a large amount of text data and provide a human with accurate, relevant potential threat scenarios, each supported by the relevant subset of documents.

A vast body of information has been captured in human language over the past several centuries. Humans process language relatively slowly, reading perhaps a few hundred pages of information per day. In the current environment, a human may be called upon to quickly assess a potential terrorist threat based on thousands of pages of information. Such requests are beyond what humans can reasonably do, and this limits the quality and effectiveness of the decisions made.

Computers and software can rapidly process enormous volumes of data, with the capacity to search and retrieve millions of documents in seconds. Using computers is therefore the natural approach to helping humans read and understand large volumes of threat data. However, computers do not understand human language. This is a long-term problem on which steady, incremental progress is being made, and each new development in computer science contributes to better approaches for addressing it. The focus of this workshop is to explore how computers can be used effectively to help humans process very large amounts of textual information.

