Collexis, a leading developer of semantic search and knowledge discovery software, builds applications that range from search tools for your website to highly sophisticated discovery applications used by many organizations worldwide. These applications allow users to identify and search for documents, experts, trends, or new discoveries.
Internet and intranet technologies have made vast and growing amounts of archived information available. Much of this information is unstructured, making it difficult for users to find what they need to make decisions. This problem is particularly evident to research professionals.
As professional researchers and others understand, the answers are rarely found in a single document. More often than not, researchers rely on multiple information sources for their research projects. Most search engines and retrieval methods are not capable of handling complex search requests. Their methods range from the simple retrieval of single items of information, such as a document or an article, to slightly more advanced retrieval of, for example, related documents or the experts who created them.
The unique Collexis approach makes significant improvements to standard data and information retrieval capabilities by discovering the relationships between the elements of different content sources and uncovering unique information. Additionally, Collexis can look at aggregate information from multiple content sources to create potentially new hypotheses based on large volumes of unstructured content.
Collexis is based on the principle of Fingerprinting. For humans, a fingerprint, although small, is a unique representation of a person. Collexis creates a Fingerprint of each piece of information (such as an MS-Word document, an e-mail, a PowerPoint presentation, or a web site) using the knowledge residing in a thesaurus or multiple thesauri. A thesaurus is a specialized vocabulary (“repository of knowledge”) of a particular discipline such as medicine, law, or financial services. Collexis creates Fingerprints not only from all content but also from the search input, which can be a few words, a sentence, or a complete document.
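The idea of a thesaurus-based Fingerprint can be illustrated with a minimal sketch. It is not the Collexis implementation: the toy thesaurus, the concept identifiers, the function names `fingerprint` and `similarity`, and the choice of normalized counts with cosine similarity are all assumptions made for illustration only.

```python
import math
import re
from collections import Counter

# Hypothetical toy thesaurus: surface term -> concept identifier.
# Synonyms map to the same concept, which is what lets two texts
# with different wording end up with similar fingerprints.
THESAURUS = {
    "heart attack": "C_MYOCARDIAL_INFARCTION",
    "myocardial infarction": "C_MYOCARDIAL_INFARCTION",
    "aspirin": "C_ASPIRIN",
    "acetylsalicylic acid": "C_ASPIRIN",
    "stroke": "C_STROKE",
}

def fingerprint(text, thesaurus=THESAURUS):
    """Return a concept -> weight mapping whose weights sum to 1."""
    lowered = text.lower()
    counts = Counter()
    for term, concept in thesaurus.items():
        # Count whole-word occurrences of each thesaurus term.
        hits = len(re.findall(r"\b" + re.escape(term) + r"\b", lowered))
        if hits:
            counts[concept] += hits
    total = sum(counts.values())
    return {c: n / total for c, n in counts.items()} if total else {}

def similarity(fp_a, fp_b):
    """Cosine similarity between two fingerprints (0.0 if either is empty)."""
    dot = sum(w * fp_b.get(c, 0.0) for c, w in fp_a.items())
    norm_a = math.sqrt(sum(w * w for w in fp_a.values()))
    norm_b = math.sqrt(sum(w * w for w in fp_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# The document and the query are fingerprinted in exactly the same way.
doc = fingerprint("Aspirin after a heart attack lowers the risk of a second heart attack.")
query = fingerprint("acetylsalicylic acid and myocardial infarction")
print(doc)
print(similarity(doc, query))
```

In this sketch the query shares no words with the document, yet the two fingerprints overlap because both are expressed in thesaurus concepts rather than raw terms.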
The first level focuses on superior retrieval results. It avoids common problems encountered in conventional search engines by helping the researcher obtain more relevant results. This is done by using the existing results to suggest additional concepts for the search query, which in turn reduces the number of irrelevant “hits.”
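One way to picture this kind of concept suggestion is the sketch below. It assumes that each result is already represented as a concept-to-weight mapping and simply proposes the concepts that carry the most weight across the current results but are missing from the query. The function name `suggest_concepts` and the sample data are hypothetical, and the scoring rule is an illustrative choice, not the documented Collexis method.

```python
from collections import Counter

def suggest_concepts(query_concepts, result_fingerprints, limit=3):
    """Suggest concepts that are prominent in the results but absent from the query."""
    totals = Counter()
    for fp in result_fingerprints:
        for concept, weight in fp.items():
            if concept not in query_concepts:
                totals[concept] += weight
    # Highest accumulated weight first.
    return [concept for concept, _ in totals.most_common(limit)]

# Hypothetical fingerprints of the documents returned for the query "aspirin".
results = [
    {"C_ASPIRIN": 0.5, "C_STROKE": 0.3, "C_BLEEDING": 0.2},
    {"C_ASPIRIN": 0.6, "C_STROKE": 0.4},
    {"C_ASPIRIN": 0.7, "C_DOSAGE": 0.3},
]
suggestions = suggest_concepts({"C_ASPIRIN"}, results, limit=2)
print(suggestions)
```

A researcher who accepts one of the suggested concepts narrows the query, which is how the number of irrelevant hits would drop in this simplified picture.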
At the second level, the Fingerprints can be manipulated in many ways (aggregated, associated, and clustered), enabling Collexis to make information available beyond the level of a single document. For example, aggregating the Fingerprints of a group of documents written by one author reveals discernible patterns that can be used for expert retrieval.
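The aggregation example can be sketched as follows, assuming each document already has a Fingerprint in the form of a concept-to-weight mapping. Summing and renormalizing the weights per author, and ranking authors by cosine similarity to a query, are illustrative assumptions. The names `aggregate`, `cosine`, and `rank_experts` and the sample authors are hypothetical.

```python
import math
from collections import defaultdict

def aggregate(fingerprints):
    """Sum concept weights across fingerprints and renormalize so they sum to 1."""
    summed = defaultdict(float)
    for fp in fingerprints:
        for concept, weight in fp.items():
            summed[concept] += weight
    total = sum(summed.values())
    return {c: w / total for c, w in summed.items()} if total else {}

def cosine(fp_a, fp_b):
    """Cosine similarity between two fingerprints (0.0 if either is empty)."""
    dot = sum(w * fp_b.get(c, 0.0) for c, w in fp_a.items())
    norm_a = math.sqrt(sum(w * w for w in fp_a.values()))
    norm_b = math.sqrt(sum(w * w for w in fp_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def rank_experts(query_fp, docs_by_author):
    """Rank authors by how closely their aggregated profile matches the query."""
    profiles = {author: aggregate(fps) for author, fps in docs_by_author.items()}
    return sorted(profiles, key=lambda a: cosine(query_fp, profiles[a]), reverse=True)

# Hypothetical document fingerprints grouped by author.
docs_by_author = {
    "author_a": [{"C_ASPIRIN": 0.8, "C_STROKE": 0.2}, {"C_ASPIRIN": 1.0}],
    "author_b": [{"C_CONTRACT_LAW": 1.0}],
    "author_c": [{"C_STROKE": 0.9, "C_ASPIRIN": 0.1}],
}
ranking = rank_experts({"C_ASPIRIN": 1.0}, docs_by_author)
print(ranking)
```

The aggregated profile describes an author rather than any single document, which is the sense in which information becomes available beyond the document level.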
While the first two levels reveal information that already exists, the third level uncovers new information. It deals with the most advanced information issues, such as hypothesis generation, intelligent text-blast applications, and other advanced scientific approaches, which can result in actual new knowledge. Typical application fields in knowledge discovery are drug discovery, the detection of unrevealed relationships, policymaking, trend analysis, and competitor analysis (gap mining, comparison).