Primary Research Interests
Information Retrieval, Semantic Web technologies, Artificial Intelligence (Planning, Natural Language Processing, Knowledge Representation, Agent Collaboration, etc.)
Research Projects
Hybrid Retrieval from the Unified Web
Developed a method of unifying the HTML web (web of documents) and the Semantic Web (web of data), and a hybrid query language (HQL) to retrieve data and documents from this Unified Web. The aim was to harness Semantic Web data to improve document retrieval from the HTML web, and to exploit information found in HTML web pages to make the Semantic Web easier to navigate. Implemented SITAR, a persistent memory system built on Apache Lucene (Java APIs for indexing and searching). This independent project is still in progress. ESWC '07 draft, SAC '07 draft
Metadata for Timelining Events
This ongoing daytaOhio project, funded by LexisNexis, aims to generate entity-event timelines from news documents in XML (see a brief (toy) demo or screenshot of the full-scale version). The project also aims to generate labels for document clusters. Responsibilities include developing the high-level concepts and the system design, and closely guiding and supervising a Master’s student through the implementation: assigning specific programming tasks, keeping the student motivated, and occasionally programming when the student hits a difficult roadblock. Played a pivotal role in preparing the proposal. NLDB '07 draft
XML Search
Text search engines are inadequate for indexing and searching XML documents because they ignore the metadata and aggregation structure implicit in those documents. On the other hand, the query languages supported by specialized XML search engines are very complex. We developed a simple yet flexible keyword-based XML query language that enables extraction of relevant fragments without sacrificing the ability to fall back on plain-text search. The approach combines and generalizes existing techniques to obtain precise and coherent results. The system was implemented using Lucene. This is an independent project. ISMIS '06 draft, JIIS draft
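The idea of a keyword query that can optionally be tied to XML structure can be illustrated with a minimal sketch. This is not the project's query language or its Lucene-based implementation: the `tag:word` syntax, the function names, and the sample document are assumptions made for illustration, using only Python's standard library. A bare keyword behaves like plain-text search, and the result is the smallest element that satisfies every term.

```python
import xml.etree.ElementTree as ET

def text_of(elem):
    """All text inside an element, lowercased."""
    return "".join(elem.itertext()).lower()

def term_matches(elem, term):
    """Hypothetical syntax: 'word' matches anywhere in elem;
    'tag:word' requires the word inside a <tag> element within elem."""
    tag, _, word = term.rpartition(":")
    word = word.lower()
    if not tag:
        return word in text_of(elem)
    return any(word in text_of(d) for d in elem.iter(tag))

def smallest_matches(root, query):
    """Return the smallest elements that satisfy every term of the query."""
    terms = query.split()
    hits = []

    def visit(elem):
        if not all(term_matches(elem, t) for t in terms):
            return False
        # Visit every child so that all sibling matches are collected.
        matched_below = [visit(child) for child in elem]
        if not any(matched_below):
            hits.append(elem)  # no smaller fragment satisfies the query
        return True

    visit(root)
    return hits

# Illustrative document, not taken from the project.
DOC = """
<library>
  <book><title>Lucene in Action</title><abstract>Indexing and search with Java.</abstract></book>
  <book><title>XML Basics</title><abstract>Mentions Lucene briefly.</abstract></book>
</library>
"""

root = ET.fromstring(DOC)
structured = [e.tag for e in smallest_matches(root, "title:lucene java")]
plain = [e.tag for e in smallest_matches(root, "lucene")]
print(structured)  # ['book']
print(plain)       # ['title', 'abstract']
```

The structured query returns the one book whose title mentions Lucene and whose text mentions Java, while the bare keyword degrades to a plain-text match over whichever fragments contain it.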
Semantic Search Tool
This project developed a modular approach to improving the effectiveness of document search by reusing and integrating mature software components such as Lucene APIs, WordNet, LSA techniques, and domain-specific controlled vocabulary. To evaluate the practical benefits, the prototype was used to query the MEDLINE database and to locate domain-specific controlled vocabulary terms in Materials and Process Specifications. Its extensibility was demonstrated by incorporating a spell-checker for the input query and by structuring the retrieved output into hierarchical collections for quicker assimilation. It was also used to experimentally explore the relationship between LSA and document clustering using the 20-mini-newsgroups and Reuters data sets. The work was funded by AFRL. WTAS '05 draft
Tables Processing
Explored the application and extension of XML technology for handling tables in heterogeneous documents. Specifically, analyzed the issue of embedding a formalization of a text document into the original document via annotations to obtain an XML Master document that improves traceability and enables interpreting and manipulating tables using XSLT stylesheets. The work was funded by AFRL. IICAI '05 draft
GTrans (Mixed-Initiative Planning) (GTrans Homepage)
GTrans is a distributed, mixed-initiative planning system. The underlying planner is PRODIGY, a non-linear state-space planner developed at CMU. Responsibilities included making major extensions to GTrans, developing PRODIGY domains, and collaborating with Ball Aerospace and SAIC to integrate GTrans with Ball Aerospace's Knowledge Kinetics (K2) and SAIC's Consequences Assessment Tool Set (CATS). The tool was used as a test bed to study multi-agent collaboration, especially in the emergency response domain. CTS '04 draft, Tech report
ElizaBrarian
ElizaBrarian is a natural language help system (research project) developed by researchers at LexisNexis for the users of their website. Responsibilities included designing and developing a natural language FAQ matcher using Glisp. The program takes a natural language query from the user and retrieves the matching question from the FAQ repository. For example, the query “How do I print only the text of a document?” would retrieve the FAQ “How do I print the text of a document to my attached printer without printing the graphics?” (and the corresponding answer) from the FAQ repository. Here is another example and a screenshot. For the curious: the name ElizaBrarian was created by combining ELIZA and Librarian.
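The query-to-FAQ matching described above can be illustrated with a minimal sketch. This is not the ElizaBrarian matcher, which was written in Glisp and whose method is not described here. It substitutes a simple word-overlap (Jaccard) score, and the function names and the two extra FAQ entries are hypothetical.

```python
import re

def tokens(text):
    """Lowercase a string and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def best_faq(query, faqs):
    """Return the FAQ question with the highest Jaccard overlap with the query."""
    q = tokens(query)

    def score(question):
        t = tokens(question)
        union = q | t
        return len(q & t) / len(union) if union else 0.0

    return max(faqs, key=score)

# The first entry is the FAQ quoted above; the other two are made up.
FAQS = [
    "How do I print the text of a document to my attached printer "
    "without printing the graphics?",
    "How do I change my password?",
    "How do I save a search for later use?",
]

match = best_faq("How do I print only the text of a document?", FAQS)
print(match)
```

With these three entries the printing FAQ scores highest (9 shared words out of 17 distinct words), so it is returned for the example query. A practical matcher would also need stop-word handling, stemming, and a threshold for the case where no FAQ is a good match.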
See the publications page for slides and additional details on the papers.
Other Interests
Distributed Computing, Operating Systems, Data Mining, Evolutionary Computation, etc.