7th SIKS/Twente Seminar on Searching and Ranking

Distributing Search


The goal of this seminar is to bring together researchers from companies and academia working on the effectiveness of search engines. The invited speakers and their talks are listed in the program below.

The symposium will take place at the campus of the University of Twente at the Ravelijn (building 10), lecture hall 2504.
The event is part of the advanced components stage of the SIKS educational program. PhD-students working in the field of Web based Systems and Data Management, Storage and Retrieval are encouraged to participate.


10:30 Coffee and Welcome
Selective Search of Distributed Indexes

When an environment contains many search engines, a resource selection algorithm identifies a small number of engines that are most likely to return good results for the query. Typically resource selection algorithms are used to organize a group of independent search engines into a federated or aggregated search service. Our research adapts this approach to search large indexes more efficiently. Topic-based partitioning divides a large index into topically-coherent shards. A resource selection algorithm directs each query to (just) the shards most likely to return good documents. Experiments demonstrate that the method improves efficiency without hurting accuracy, and suggest new research directions.
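As a rough illustration of the idea in this abstract, the sketch below routes a query to the shard whose term summary matches it best. It is a toy example and not the method presented in the talk: the shard contents, the function names (`build_shard_summary`, `score_shard`, `select_shards`) and the log-smoothed term-frequency score are all assumptions made for illustration.

```python
import math
from collections import Counter


def build_shard_summary(docs):
    """Collapse a shard's documents into a single bag-of-words term count."""
    summary = Counter()
    for doc in docs:
        summary.update(doc.lower().split())
    return summary


def score_shard(query, summary):
    """Hypothetical score: sum of log-smoothed relative term frequencies."""
    total = sum(summary.values()) or 1
    return sum(math.log1p(summary[term] / total) for term in query.lower().split())


def select_shards(query, summaries, k=1):
    """Return the names of the k shards that score highest for the query."""
    ranked = sorted(
        summaries,
        key=lambda name: score_shard(query, summaries[name]),
        reverse=True,
    )
    return ranked[:k]


# Toy topical shards (invented data, for illustration only).
shards = {
    "sports": ["football match results", "tennis match highlights"],
    "cooking": ["pasta recipe with tomato", "bread baking recipe"],
    "travel": ["cheap flights to rome", "hotel booking tips"],
}
summaries = {name: build_shard_summary(docs) for name, docs in shards.items()}
selected = select_shards("pasta recipe", summaries, k=1)
print(selected)
```

Only the selected shards would then be searched, which is where the efficiency gain would come from; the quality of the result depends on how well the summaries reflect each shard's topic.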

Jamie Callan (Carnegie Mellon University)
Tribler: 4th generation peer-to-peer technology

During this talk the first prototype of our attack-resilient QMedia app for microblogging will be unveiled. The QMedia goal for future versions is news dissemination from a single smartphone to an audience of millions in the form of microblogging, enriched with pictures and streaming video, and guarded against all known forms of government censorship such as cyberspace sabotage, digital eavesdropping, infiltration, fraud, Internet kill switches and especially lawyer-based attacks. We hope new Open Source developers will join our Internet-deployed project and help realize our QMedia goal for the end of 2012: building next-generation anonymity technology, founded on social networking, traffic hiding and a global reputation system.

For over a decade Delft University of Technology has been measuring and building P2P systems, aided by millions of Euros in research funding from the European Union and the Dutch government. We are continuously improving our own attack-resilient sharing software called Tribler. With one million downloads, Tribler provides us with vital behavioral feedback on novel algorithms. Tribler does not depend on, and is completely decoupled from, unreliable servers such as DNS servers, web servers, swarm trackers and access portals. Using fully self-organising P2P technology we aim to create an overlay which is unbreakable: the only way to take it down is to take the Internet down. We dream of transforming media and money with five innovations we have developed within Tribler: (1) the Libswift P2P engine, (2) the Dispersy elastic database, (3) the Bartercast reputation system, (4) a bandwidth-as-a-currency, resource-based cybercurrency, and (5) the Skynet V0.1 self-organizing and self-learning Artificial Intelligence engine (joint work with the University of Szeged) with a very limited form of self-awareness.

Johan Pouwelse (Delft University of Technology)
12:30 Break and on-site lunch
Unsupervised Linear Score Normalization: Intuition, Assumptions and Performance

Many score normalization techniques have been proposed in the literature, varying from linear to non-linear functions, which may or may not require training data for their estimation. In this talk we take a fresh look at the simplest normalization methods: linear functions that do not require any training or search engine cooperation. We give theoretical justifications for the implicit assumptions of the methods, present a number of modifications and compare their performance. We also discuss our current research directions for improving unsupervised linear score normalization techniques.
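To make "unsupervised linear score normalization" concrete, the sketch below shows two commonly used linear normalizations, min-max scaling and z-score standardization, applied to the raw scores of two engines. Whether the talk covers exactly these variants is an assumption; the function names and the score lists are invented for illustration.

```python
from statistics import mean, pstdev


def min_max(scores):
    """Linearly map scores so the lowest becomes 0 and the highest becomes 1."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # All scores are equal, so there is no range to scale by.
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def z_score(scores):
    """Linearly shift and scale scores to zero mean and unit variance."""
    mu, sigma = mean(scores), pstdev(scores)
    if sigma == 0:
        return [0.0 for _ in scores]
    return [(s - mu) / sigma for s in scores]


# Raw scores from two hypothetical engines that use incomparable scales.
engine_a = [12.0, 9.0, 6.0]
engine_b = [0.9, 0.5, 0.1]

print(min_max(engine_a), min_max(engine_b))
print(z_score(engine_a), z_score(engine_b))
```

Both functions use only the scores in a single result list, so they need neither training data nor cooperation from the engine. The implicit assumption is that the score distributions of different engines become comparable after such a linear transform.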

Fabio Crestani (University of Lugano)
13:45 Closing
Peer-to-Peer Information Retrieval

The Internet has become an integral part of our daily lives. However, the essential task of finding information is dominated by a handful of large centralised search engines. In this thesis we study an alternative to this approach. Instead of using large data centres, we propose using the machines that we all use every day: our desktop, laptop and tablet computers, to build a peer-to-peer web search engine. We provide a definition of the associated research field: peer-to-peer information retrieval. We examine what separates it from related fields, give an overview of the work done so far and provide an economic perspective on peer-to-peer search. Furthermore, we introduce our own architecture for peer-to-peer search systems, inspired by BitTorrent.

Distributing the task of providing search results for queries introduces the problem of query routing: a query needs to be sent to a peer that can provide relevant search results. We investigate how the content of peers can be represented so that queries can be directed to the best ones in terms of relevance. While cooperative peers can provide their own representation, the content of uncooperative peers can be accessed only through a search interface, and thus they cannot actively provide a description of themselves. We look into representing these uncooperative peers by probing their search interface to construct a representation. Finally, the capacity of the machines in peer-to-peer networks differs considerably, making it challenging to provide search results quickly. To address this, we present an approach where copies of search results for previous queries are retained at peers and used to serve future requests, and show that participation can be incentivised using reputations.

There are still problems to be solved before a real-world peer-to-peer web search engine can be built. This thesis provides a starting point for this ambitious goal and also provides a solid basis for reasoning about peer-to-peer information retrieval systems in general.
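The idea of retaining copies of search results at peers can be sketched as a small cache that answers repeated queries locally. This is a minimal illustration and not the architecture from the thesis: the `CachingPeer` class, the least-recently-used eviction policy and the `origin` callback that stands in for the peer holding the actual index are all assumptions.

```python
from collections import OrderedDict


class CachingPeer:
    """Hypothetical peer that keeps results for the most recent queries."""

    def __init__(self, capacity=2):
        self.capacity = capacity
        self.cache = OrderedDict()

    def store(self, query, results):
        self.cache[query] = results
        self.cache.move_to_end(query)
        if len(self.cache) > self.capacity:
            # Evict the least recently used entry.
            self.cache.popitem(last=False)

    def lookup(self, query):
        if query not in self.cache:
            return None
        self.cache.move_to_end(query)
        return self.cache[query]


def search(query, peer, origin):
    """Serve from the peer's cache when possible, else ask the origin peer."""
    cached = peer.lookup(query)
    if cached is not None:
        return cached, "cache"
    results = origin(query)
    peer.store(query, results)
    return results, "origin"


def origin(query):
    # Stand-in for a slower peer that holds the actual index.
    return ["doc about " + query]


peer = CachingPeer(capacity=2)
first = search("p2p search", peer, origin)
second = search("p2p search", peer, origin)
print(first, second)
```

The second request is answered from the cache, so the origin peer is not contacted again. A real system would also need to decide which peers hold which cached results and how long the copies stay valid.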

PhD Defense of Almer Tigelaar


CTIT (Centre for Telematics and Information Technology)
SIKS (Netherlands research school for Information and Knowledge Systems)
NWO (Netherlands Organisation for Scientific Research)


