Entity Extraction Class

Semantic Technology Class, August 2011

This tutorial is an overview of Entity Extraction for the non-programmer. Entity extraction is rapidly becoming one of the most cost-effective ways to extract knowledge from documents. The first session is a general introduction that covers terminology and gives short demonstrations of entity extraction using real-world software on real documents. We cover the key processes of Entity Extraction and then link them to concepts in the Semantic Web stack, including XML encoding, URIs, RDF and RDFa.

In part two we begin to uncover some of the real challenges involved in high-quality entity extraction and its tools. We cover part-of-speech analysis, noun phrases, entity extraction pipelines, validation of entities, and a brief look at some of the complex mathematics used in machine learning over large corpora. In part three we cover ROI, quality metrics, data formats, mashups and strategies. We also show how Business Intelligence systems can integrate text analytics.

After this presentation attendees will:

  1. Understand the key issues in Entity Extraction from text documents
  2. Understand the concepts of linguistic pipelines, annotators and text enrichment
  3. Appreciate the complexities of natural language understanding and how both rule-based systems and machine learning can be used to tackle these challenges
  4. Understand how Entity Extraction quality can be measured and how quality metrics can be used to tune the Entity Extraction process
  5. Find resources to start an Entity Extraction pilot project

PPTX Slides