News: We are presenting QuoteMine, our search engine for quotes, at the LNdW (Lange Nacht der Wissenschaften): Demonstrator
News: Paper accepted at the Knowledge Extraction Workshop at NAACL-HLT 2012: Workshop Homepage
I am a research associate at DIMA, Technical University of Berlin
in the SCAPE project.
My focus is on scalable Information Extraction technologies, with the ultimate goal of mining large amounts of knowledge from large bodies of text.
I received my master's degree at the Freie Universität Berlin in 2009, where I designed and implemented an Open Information Extraction system using deep linguistic patterns.
Prior to joining the DIMA team, I was a research engineer and team leader at Neofonie GmbH, working in the fields of Information Retrieval and Extraction.
My research interests include - but are not limited to - Natural Language Processing, Computational Linguistics and Machine Learning technologies.
Publications
Alan Akbik, Alexander Löser
KrakeN: N-Ary Facts in Open Information Extraction
The Knowledge Extraction Workshop at NAACL-HLT, 2012.
[pdf] [abstract]
Current techniques for Open Information Extraction (OIE) focus on the extraction of binary facts and suffer significant quality loss for the task of extracting higher order N-ary facts. This quality loss may not only affect the correctness, but also the completeness of an extracted fact. We present KrakeN, an OIE system specifically designed to capture N-ary facts, as well as the results of an experimental study on extracting facts from Web text in which we examine the issue of fact completeness. Our preliminary experiments indicate that KrakeN is a high precision OIE approach that captures more facts per sentence at greater completeness than existing OIE approaches, but is vulnerable to noisy and ungrammatical text.
Alan Akbik, Jürgen Broß
Wanderlust: Extracting Semantic Relations from Natural Language Text Using Dependency Grammar Patterns
Workshop on Semantic Search, WWW 2009.
[pdf] [video]
[abstract]
A great share of applications in modern information technology can benefit from large coverage, machine accessible knowledge bases. However, the bigger part of today's knowledge is provided in the form of unstructured data, mostly plain text. As an initial step to exploit such data, we present Wanderlust, an algorithm that automatically extracts semantic relations from natural language text. The procedure uses deep linguistic patterns that are defined over the dependency grammar of sentences. Due to its linguistic nature, the method performs in an unsupervised fashion and is not restricted to any specific type of semantic relation. The applicability of the proposed approach is examined in a case study, in which it is put to the task of generating a semantic wiki from the English Wikipedia corpus. We present an exhaustive discussion about the insights obtained from this particular case study including considerations about the generality of the approach.
Teaching & Projects
My teaching centers on applying different Text Mining technologies to real-world problems. Check out some of the interesting projects below that my students are working on.
Teaching - current semester
DBPRO: Theorie und Praxis von Websuchmaschinen
(engl. "Applied Information Retrieval and Extraction")
In this semester's course we take a look at information retrieval and extraction technologies in the context of the MIA, GoOLAP and other projects.
IMPRO: Fortgeschrittene Theorie und Praxis von Websuchmaschinen
(engl. "Advanced Applied Information Retrieval and Extraction")
In this semester's course we take a look at information retrieval and extraction technologies in the context of the MIA, GoOLAP and other projects.
IMSEM: Hot Topics in Information Management
In this semester's seminar we take a look at knowledge representation and workflows.
Current projects
QuoteMine: Die Zitatsuchmaschine
(engl. "QuoteMine: A search engine for quotes") [link]
In this project we search newswire text on the Web and extract quotes and speakers with shallow IE patterns. Our search engine can be used to answer questions like "What does Angela Merkel say about France?" and "Give me all quotes that contain the phrase 'FC Bayern München'". Currently, we're working on scaling the system up and introducing some exciting new features. A first prototype can be accessed here. But obviously, much work still needs to be done :)
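As a rough illustration only (this is not the QuoteMine implementation), a shallow pattern for quotes and speakers might look like the Python sketch below. The regular expression, the list of reporting verbs, the function names and the sample sentences with invented speakers are all assumptions made for this example.

```python
import re

# Hypothetical shallow pattern: a quoted span, a reporting verb, then a
# capitalized speaker name, e.g.  "...", said Anna Muster.
# The verb list and the name heuristic are illustrative assumptions.
QUOTE_PATTERN = re.compile(
    r'"(?P<quote>[^"]+)"'                       # text between double quotes
    r',?\s+(?:said|says|stated|announced)\s+'   # reporting verb
    r'(?P<speaker>[A-Z]\w+(?:\s+[A-Z]\w+)*)'    # one or more capitalized tokens
)


def extract_quotes(text):
    """Return (speaker, quote) pairs found by the shallow pattern."""
    return [
        (m.group("speaker"), m.group("quote").rstrip(",."))
        for m in QUOTE_PATTERN.finditer(text)
    ]


def quotes_by(speaker, keyword, pairs):
    """Keep quotes by `speaker` that mention `keyword` (case-insensitive)."""
    return [q for s, q in pairs if s == speaker and keyword.lower() in q.lower()]


# Invented sample text with made-up speakers, for demonstration only.
sample = (
    '"France remains our closest partner," said Anna Muster. '
    '"We played well today," said Max Beispiel.'
)
pairs = extract_quotes(sample)
france_quotes = quotes_by("Anna Muster", "France", pairs)
```

A pattern like this needs no syntactic parsing, which keeps it cheap to run on large amounts of text, but it will miss quotes whose speaker is introduced in any other way.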
Timeline Visualization of Web Document Content [link]
In this project, we are working on extracting temporal events from document collections on the Web and visualising them in the form of timelines. Our goal is to find interesting information for specific topics. The project began last semester and is now being continued by three Master's students. A first prototype is available here.
Der Umformulierer Demo [link]
(engl. "Online Paraphrasing of German sentences")
This demo shows a technique for paraphrasing German sentences by changing the sentence's word order. German is a free word order language with only a limited set of rules that determine which word orderings are considered valid. We implemented these rules using a deep dependency parser and made this demo available. Check it out here.
Supervisor for Bachelor's/Master's Theses
Umar Maqsut - Extraktion von Relationen und Konzepten von komplexen Nominalphrasen
(engl. "Extraction of Relations and Concepts from Complex Noun Phrases")
Umar extracts information from complex noun phrases and investigates when such phrases can be used as concepts in a knowledge base.
Stefan Schramm - Implementierung und Evaluierung eines Verfahrens zur Erhöhung der Qualität der flachen Extraktion komplexer Nominalphrasen
(engl. "Using World Knowledge to find Complex Noun Phrases in Shallow Parsing")
Stefan investigates a number of 'world knowledge' features in a CRF classifier for finding complex noun phrases, something that can normally only be achieved using a deep syntactic parser. He trains a classifier using different feature sets and evaluates the results.
Ahmet Karakas - Information Extraction von Zitaten in türkischsprachigen Quellen
(engl. "Quote-Extraction from Turkish newswire text")
Ahmet will build a pipeline for extracting quotes and speakers from Turkish-language text. He will also conduct a survey of existing NLP resources for the Turkish language.
The results of his work will be integrated into the QuoteMine project at www.textmining.tu-berlin.de.