12th SIKS/Twente Seminar on Searching and Ranking

Probabilistic approaches to smart discovery


The goal of this seminar is to bring together researchers from academia and industry working on the development and evaluation of information systems, in particular retrieval, filtering, and recommending systems. Invited speakers are:

The symposium will take place at the campus of the University of Twente in building Ravelijn, room RA1314.
See Travel information. The event is part of the SIKS educational program. PhD students working in the field of (interactive) information filtering, recommending, and retrieval are encouraged to participate.


Coffee and Welcome
Opening and Introduction
Djoerd Hiemstra (University of Twente, the Netherlands)
Polyrepresentation in Complex (Book) Search Tasks - How can we use what the others said?
Ingo Frommholz (University of Bedfordshire, UK)

The task definition of the Social Book Search Lab describes complex goal-oriented and non-goal tasks. To satisfy the resulting information needs, users can utilise and combine different sources of evidence, for instance metadata (e.g. abstract, title, author) as well as reviews and ratings provided by other users. The challenge is to support users in this endeavour and create an effective search experience. To this end, in this talk I will discuss how this challenge relates to the well-known principle of polyrepresentation. I will then introduce POLAR, a probabilistic logic-based framework capable of handling complex queries based on the graph induced by user-generated content. Subsequently I will give a brief outlook on further formal models that try to support the user beyond the typical query-and-result paradigm. The first is based on quantum probabilities, neatly combining geometry and probability theory to support different forms of user interaction and polyrepresentation. The second combines polyrepresentation with probabilistic clustering and the idea of a simulated user.
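The intuition behind polyrepresentation can be sketched in a few lines of code. This is only an illustration of the idea, not the POLAR framework: each representation of a book (title, abstract, reviews) is assumed to yield its own relevance score, and books in the "cognitive overlap" — supported by all representations at once — come out on top. All names and numbers below are invented.

```python
# Hedged sketch of the polyrepresentation intuition (not the POLAR model):
# each representation of a book gives its own relevance estimate, and books
# scored highly by *all* representations at once are ranked first.
def overlap_score(scores_per_repr):
    """Combine per-representation relevance scores by their product,
    so only items supported by every representation score highly."""
    result = 1.0
    for s in scores_per_repr:
        result *= s
    return result

book_a = {"title": 0.9, "abstract": 0.8, "reviews": 0.7}   # made-up scores
book_b = {"title": 0.95, "abstract": 0.9, "reviews": 0.1}  # reviews disagree
print(overlap_score(book_a.values()))  # ~0.504: all representations agree
print(overlap_score(book_b.values()))  # much lower: one representation dissents
```

The product is one of the simplest possible combination rules; it makes a single dissenting representation drag the overall score down sharply.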

Causal discovery from big data
Tom Heskes (Radboud University Nijmegen, the Netherlands)

Discovering causal relations from data lies at the heart of most scientific research today. In apparent contradiction with the adage "correlation does not imply causation", recent theoretical insights indicate that such causal knowledge can also be derived from purely observational data, instead of only from controlled experimentation. In the "big data" era, such observational data is abundant, and being able to actually derive causal relationships from very large data sets would open up a wealth of opportunities for improving business, science, government, and healthcare. In this talk, I will sketch how insights from statistics and machine learning may lead to novel approaches for the robust discovery of relevant causal relationships.

Co-occurrence Rate Networks: Towards separate training for undirected graphical models
PhD Defense (location: Waaier 4) by Zhemin Zhu (University of Twente)

Dependence is a universal phenomenon that can be observed everywhere. In machine learning, probabilistic graphical models (PGMs) represent dependence relations with graphs. PGMs find wide application in natural language processing (NLP), speech processing, computer vision, biomedicine, information retrieval, and other fields. Many traditional models, such as hidden Markov models (HMMs) and Kalman filters, can be put under the umbrella of PGMs. The central idea of PGMs is to decompose (factorize) a joint probability into a product of local factors. Learning, inference and storage can then be conducted efficiently over this factorized representation.
Two major types of PGMs can be distinguished: (i) Bayesian networks (directed graphs), and (ii) Markov networks (undirected graphs). Bayesian networks represent directed dependence with directed edges; their local factors are conditional probabilities. Directed dependence, directed edges and conditional probabilities are all asymmetric notions. In contrast, Markov networks represent mutual dependence with undirected edges; both mutual dependence and undirected edges are symmetric notions. For general Markov networks, by the Hammersley–Clifford theorem, the local factors are positive functions over maximal cliques. These local factors are explained using intuitive notions like ‘compatibility’ or ‘affinity’. In particular, if a graph forms a clique tree, the joint probability can be reparameterized into a junction tree factorization.
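The factorization idea behind directed PGMs can be made concrete on a toy three-variable chain; the probability tables below are invented for illustration and are not from the thesis.

```python
from itertools import product

# Toy Bayesian network X1 -> X2 -> X3 over binary variables: the joint
# factorizes as p(x1, x2, x3) = p(x1) * p(x2 | x1) * p(x3 | x2).
p1 = {0: 0.6, 1: 0.4}                       # p(x1)
p2 = {(0, 0): 0.7, (0, 1): 0.3,             # p(x2 | x1)
      (1, 0): 0.2, (1, 1): 0.8}
p3 = {(0, 0): 0.9, (0, 1): 0.1,             # p(x3 | x2)
      (1, 0): 0.5, (1, 1): 0.5}

def joint(x1, x2, x3):
    return p1[x1] * p2[(x1, x2)] * p3[(x2, x3)]

# The local factors are locally normalised, so the joint sums to 1 ...
total = sum(joint(*xs) for xs in product([0, 1], repeat=3))
print(total)

# ... and marginals can be computed by summing out variables factor by
# factor, never materialising the full joint table at once.
p_x3 = {v: sum(joint(x1, x2, v) for x1, x2 in product([0, 1], repeat=2))
        for v in (0, 1)}
print(p_x3)
```

With three binary variables the saving is trivial, but the same factor-by-factor computation is what keeps learning, inference and storage tractable for large graphs.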
In this thesis, we propose a novel framework motivated by the Minimum Shared Information Principle (MSIP): we try to find a factorization in which the information shared between factors is minimal. In other words, we try to make the factors as independent as possible.
The benefit of doing this is that we can train the factors separately, without spending much effort on guaranteeing consistency between them. To achieve this goal, we develop a theoretical framework called co-occurrence rate networks (CRNs) to obtain such a factorization. Briefly, given a joint probability, the CRN factorization is obtained as follows. We first strip off the singleton probabilities from the joint probability. The quantity left is called the co-occurrence rate (CR); CR is a symmetric quantity which measures the mutual dependence among the variables involved. We then further decompose the joint CR into smaller and independent CRs. Finally, we obtain a CRN factorization whose factors consist of all singleton probabilities and CR factors. Two kinds of independence hold between these factors (here, independent means that two factors do not share information): (i) a singleton probability is independent of the other singleton probabilities; (ii) a CR factor is independent of the other CR factors conditioned on the singleton probabilities. Based on a CRN factorization, we propose an efficient two-step separate training method: (i) in the first step, we train a separate model for each singleton probability; (ii) given the singleton probabilities, we train a separate model for each CR factor. Experimental results on three important natural language processing tasks show that our separate training method is two orders of magnitude faster than conditional random fields, while achieving competitive quality (often better on the overall F1 metric).
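The stripping-off of singleton probabilities can be illustrated numerically on a toy binary Markov chain. The probability tables below are invented, and the decomposition into pairwise CRs shown here is the special case for a chain, not the general construction of the thesis.

```python
from itertools import product

# Sketch of the co-occurrence rate (CR) idea on a toy binary chain
# X1 - X2 - X3. CR strips the singleton probabilities out of the joint:
#   CR(x1, ..., xn) = p(x1, ..., xn) / (p(x1) * ... * p(xn))
start   = {0: 0.6, 1: 0.4}                                      # p(x1)
trans12 = {(0, 0): 0.7, (0, 1): 0.3, (1, 0): 0.2, (1, 1): 0.8}  # p(x2|x1)
trans23 = {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.5, (1, 1): 0.5}  # p(x3|x2)

def chain_joint(x1, x2, x3):
    return start[x1] * trans12[(x1, x2)] * trans23[(x2, x3)]

vals = [0, 1]
# Singleton marginals p(x1), p(x2), p(x3)
marg = [{v: sum(chain_joint(*xs) for xs in product(vals, repeat=3)
                if xs[i] == v) for v in vals} for i in range(3)]
# Pairwise marginals and the CR factors obtained from them
p12 = {(a, b): sum(chain_joint(a, b, c) for c in vals)
       for a in vals for b in vals}
p23 = {(b, c): sum(chain_joint(a, b, c) for a in vals)
       for b in vals for c in vals}
cr12 = {k: p12[k] / (marg[0][k[0]] * marg[1][k[1]]) for k in p12}
cr23 = {k: p23[k] / (marg[1][k[0]] * marg[2][k[1]]) for k in p23}

# For a chain, the joint CR decomposes into pairwise CRs, so the joint is
# recovered as: p(x1,x2,x3) = p(x1) p(x2) p(x3) * CR(x1,x2) * CR(x2,x3).
for xs in product(vals, repeat=3):
    rebuilt = (marg[0][xs[0]] * marg[1][xs[1]] * marg[2][xs[2]]
               * cr12[(xs[0], xs[1])] * cr23[(xs[1], xs[2])])
    assert abs(rebuilt - chain_joint(*xs)) < 1e-12
print("pairwise CR factors reproduce the joint")
```

Note how every factor in the rebuilt product is either a singleton probability or a pairwise CR — exactly the two factor families that the two-step separate training method fits independently.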
The second contribution of this thesis is the application of PGMs to a real-world NLP task: open relation extraction (ORE). In open relation extraction, two entities in a sentence are given, and the goal is to automatically extract their relation expression. ORE is a core technique, especially in the age of big data, for transforming unstructured information into structured data. We propose the model SimpleIE for this task. The basic idea is to decompose an extraction pattern into a sequence of simplification operations (components). The benefit of doing this is that these components can be recombined in new ways to generate new extraction patterns; hence SimpleIE can represent and capture diverse extraction patterns. The model is essentially a sequence labeling model. Experimental results on three benchmark data sets show that SimpleIE boosts recall and F1 by at least 15% compared with seven ORE systems.
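The sequence-labeling view of ORE can be illustrated with a trivial example. This is not the SimpleIE model itself — the tags below are hard-coded where a trained tagger would predict them, and the tag names are invented.

```python
# Toy illustration of open relation extraction as sequence labeling:
# with both entities marked, each remaining token is tagged as part of the
# relation expression (REL) or not (O); the relation is read off the tags.
def extract_relation(tokens, tags):
    """Collect the tokens labelled REL, in order."""
    return " ".join(t for t, tag in zip(tokens, tags) if tag == "REL")

tokens = ["Einstein", "was", "born", "in", "Ulm"]
# Entity tags are given; a trained model would predict the REL/O tags.
tags   = ["ENT1",     "REL", "REL",  "REL", "ENT2"]
print(extract_relation(tokens, tags))  # -> was born in
```

The point of the decomposition into simplification operations is precisely that the labels come from reusable components rather than from one monolithic pattern per relation.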
As tangible outputs of this thesis, we contribute open source implementations of our research results as well as an annotated data set.


CTIT Centre for Telematics and Information Technology
Netherlands research school for Information and Knowledge Systems


Please send your name and affiliation to the organisers if you plan to attend the symposium, to help us estimate the required catering.