Archive for 2013

Q-Able nominated for the Young Technology Award

Friday, December 20th, 2013, posted by Djoerd Hiemstra

Vote now for Q-Able

Merry Christmas

Friday, December 20th, 2013, posted by Djoerd Hiemstra


Merry Christmas from the Database Group!

What is information?

Friday, December 20th, 2013, posted by Djoerd Hiemstra

Computers are used to store and send information, but what is information anyway? How much information does a 100-page book contain? Which book series contains more information: “The Saga of Darren Shan” or “The Horror Bus” by Paul van Loon? How do you measure that?

This lecture for the Museum Jeugduniversiteit for children aged 8 to 12 is based on the wonderful Computer Science Unplugged activities by Tim Bell, Ian Witten and Mike Fellows. In the lecture I explain the theories of Claude Shannon, talk about statistical language models, and we play the Twenty Guesses quiz.
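Shannon's measure of information can be made concrete in a few lines. The sketch below is our own illustration, not part of the Unplugged material. It estimates bits per character under a simple unigram (single-letter) model, and it computes how many yes/no questions the Twenty Guesses game needs to single out one of N equally likely answers. A unigram model ignores context, so it overestimates the information per character compared to language models that also look at the preceding letters.

```python
import math
from collections import Counter


def entropy_bits_per_char(text: str) -> float:
    """Shannon entropy H = -sum p(c) * log2 p(c) under a unigram character model,
    where p(c) is the relative frequency of character c in the text."""
    if not text:
        return 0.0
    total = len(text)
    probs = [n / total for n in Counter(text).values()]
    return sum(-p * math.log2(p) for p in probs)


def estimated_information_bits(text: str) -> float:
    """Rough size of the text in bits: entropy per character times its length."""
    return entropy_bits_per_char(text) * len(text)


def twenty_guesses_needed(num_options: int) -> int:
    """Yes/no questions needed to single out one of num_options equally likely
    answers: each answer carries at most one bit, so ceil(log2(N)) questions."""
    return math.ceil(math.log2(num_options))
```

For example, `twenty_guesses_needed(1_000_000)` returns 20, because 2^20 = 1,048,576 is just over a million: twenty well-chosen questions are enough to find one word among a million.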

Celebrating Stephen Robertson’s Retirement

Friday, December 20th, 2013, posted by Djoerd Hiemstra

by Djoerd Hiemstra, John Tait, Andrew MacFarlane, and Nick Belkin

Stephen Robertson at SIGIR 2013

Stephen Robertson was named a Fellow of the Association for Computing Machinery (ACM) last week. Robertson retired from the Microsoft Research Lab in Cambridge this year after a long career as one of the most influential, well-liked and eminent researchers in Information Retrieval worldwide. His successful career is celebrated in the latest issue of the BCS IRSG Informer. In his retirement, Stephen Robertson remains active in Information Retrieval at University College London.

[download pdf]

Kien Tjin-Kam-Jet defends PhD thesis on Distributed Deep Web Search

Thursday, December 19th, 2013, posted by Djoerd Hiemstra

Distributed Deep Web Search

by Kien Tjin-Kam-Jet

The World Wide Web contains billions of documents (and counting), so it is likely that some document contains the answer or content you are searching for. While major search engines like Bing and Google often manage to return relevant results for your query, there are plenty of situations in which they are less capable of doing so. In particular, they fall short when the data must be retrieved from the deep web. Deep web data is difficult for today’s web search engines to crawl and index, largely because it must be accessed via complex web forms. Yet deep web data can be highly relevant to the information need of the end user. This thesis gives an overview of the problems, solutions, and paradigms for deep web search. Moreover, it proposes a new paradigm to overcome the apparent limitations of the current state of deep web search, and makes the following scientific contributions:

  1. A more specific classification scheme for deep web search systems, to better illustrate the differences and variation between these systems.
  2. Virtual surfacing, a new, and in our opinion better, deep web search paradigm that tries to combine the benefits of the two existing paradigms, surfacing and virtual integration, and that also raises new research opportunities.
  3. A stack decoding approach that combines rules and statistical usage information to interpret the end user’s free-text query and subsequently derive filled-out web forms from that interpretation.
  4. A practical comparison of the developed approach against a well-established text-processing toolkit.
  5. Empirical evidence that, for a single site, end users prefer the proposed free-text search interface over a complex web form.

Analysis of data obtained from user studies shows that the stack decoding approach works as well as, or better than, today’s top-performing alternatives.
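To illustrate the general idea of stack decoding for form filling, here is a minimal sketch built on assumptions of our own: a hypothetical travel form with the fields origin, destination and date; made-up usage counts standing in for the statistical usage information; and two toy rules (a dd-mm-yyyy pattern fills the date, and the cue words "from" and "to" boost the field that follows). Hypotheses are kept in stacks indexed by the number of query tokens they cover, each stack is pruned to a fixed beam, and the best hypothesis covering the whole query yields the filled-out form. This is not the thesis’s implementation; its actual rules, statistics and scoring are not described here.

```python
import math
import re
from typing import Dict, List, Tuple

# Hypothetical usage statistics: how often each value was typed into each field
# of a made-up travel-booking form (the "statistical usage information").
USAGE: Dict[str, Dict[str, int]] = {
    "origin": {"enschede": 40, "amsterdam": 25, "new york": 5},
    "destination": {"amsterdam": 45, "new york": 20, "enschede": 10},
}
FIELDS = ("origin", "destination", "date")
DATE = re.compile(r"\d{1,2}-\d{1,2}-\d{4}$")    # toy rule: dd-mm-yyyy fills "date"
CUES = {"from": "origin", "to": "destination"}  # toy rule: cue word boosts next field
CUE_BOOST = math.log(4.0)  # arbitrary weight for the cue rule
SKIP_LOGP = math.log(0.1)  # arbitrary penalty for leaving a token unassigned

# A hypothesis: (log-score, ((field, value), ...)) for the tokens covered so far.
Hyp = Tuple[float, Tuple[Tuple[str, str], ...]]


def field_logp(field: str, value: str, prev: str) -> float:
    """Log-score for putting `value` into `field`; -inf if the rules forbid it."""
    if field == "date":
        return 0.0 if DATE.match(value) else -math.inf
    counts = USAGE[field]
    if value not in counts:
        return -math.inf
    logp = math.log(counts[value] / sum(counts.values()))
    if CUES.get(prev) == field:
        logp += CUE_BOOST
    return logp


def decode(query: str, beam: int = 5, max_span: int = 2) -> Dict[str, str]:
    """Stack decoding: stacks[i] holds hypotheses covering the first i tokens."""
    tokens = query.lower().split()
    n = len(tokens)
    stacks: List[List[Hyp]] = [[] for _ in range(n + 1)]
    stacks[0].append((0.0, ()))
    for i in range(n):
        # Prune: keep only the `beam` best hypotheses covering tokens[:i].
        stacks[i] = sorted(stacks[i], key=lambda h: h[0], reverse=True)[:beam]
        prev = tokens[i - 1] if i > 0 else ""
        for score, filled in stacks[i]:
            # Option 1: token i fills no field (e.g. a cue word like "from").
            stacks[i + 1].append((score + SKIP_LOGP, filled))
            used = {field for field, _ in filled}
            # Option 2: tokens[i:i+k] fill a field that is still empty.
            for k in range(1, min(max_span, n - i) + 1):
                value = " ".join(tokens[i:i + k])
                for field in FIELDS:
                    if field in used:
                        continue
                    lp = field_logp(field, value, prev)
                    if lp > -math.inf:
                        stacks[i + k].append((score + lp, filled + ((field, value),)))
    _, best_filled = max(stacks[n], key=lambda h: h[0])
    return dict(best_filled)
```

For example, `decode("from enschede to new york 24-12-2013")` fills origin with "enschede", destination with the two-token span "new york", and date with "24-12-2013". Without a cue word, `decode("amsterdam")` puts "amsterdam" in the destination field, simply because the made-up counts make it a more frequent destination than origin; `decode("from amsterdam")` flips it to origin through the cue rule.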

[download pdf]

Marije de Heus graduates on Recommender Systems for High School Courses

Wednesday, December 18th, 2013, posted by Djoerd Hiemstra

Design and Evaluation of a Recommender System for High School Courses in The Netherlands

by Marije de Heus

This thesis presents a newly developed recommender system for high school courses in The Netherlands. The system recommends a complete set of courses to a student, based on the choices of similar students who have already completed high school. It was built on a large historical database containing information on more than 20% of all new Dutch high school students. The methods used are a structured literature review, interviews to elicit requirements, the design of the system, and both offline experiments (with a historical dataset containing the grades of tens of thousands of students) and online experiments (on-site at 4 high schools). The main findings of this report are the following:

  • Students and school counselors have a definite need for objective recommendations of high school courses;
  • The recommendations are not accurate;
  • The recommendations received good reviews in the online experiment;
  • The recommendations did not outperform random recommendations in the online experiment;
  • A serendipitous result: the offline tests have shown that recommenders can predict future exam grades with high accuracy.

Based on these findings, our recommendation to Topicus is not to implement the recommender system. Instead, a broader search could be started to find other ways of meeting the need for objective recommendations. One technique that could be explored further is the prediction of grades for single courses. We expect that school counselors will find such a tool helpful in advising students which courses to take.
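The thesis bases its recommendations on the choices of similar students. As an illustration only, here is a minimal k-nearest-neighbour sketch of that idea and of the single-course grade prediction suggested above. The data layout (lower-year grades, chosen course package, final exam grades), the root-mean-square grade distance, the majority vote over whole packages, and the example records are all our assumptions; the thesis’s actual similarity measure and data are not described here.

```python
from collections import Counter
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical record of a graduated student: lower-year grades (Dutch 1-10 scale),
# the chosen course package, and final exam grades per course.
Student = Tuple[Dict[str, float], FrozenSet[str], Dict[str, float]]


def distance(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Root-mean-square grade difference over the subjects both students took."""
    shared = a.keys() & b.keys()
    if not shared:
        return float("inf")
    return (sum((a[s] - b[s]) ** 2 for s in shared) / len(shared)) ** 0.5


def neighbours(grades: Dict[str, float], history: List[Student], k: int) -> List[Student]:
    """The k historical students closest to `grades` (stable sort keeps ties in order)."""
    return sorted(history, key=lambda st: distance(grades, st[0]))[:k]


def recommend_package(grades: Dict[str, float], history: List[Student], k: int = 3) -> FrozenSet[str]:
    """Majority vote over the complete packages chosen by the k nearest students;
    ties go to the package encountered first, i.e. that of the nearer neighbour."""
    votes = Counter(package for _, package, _ in neighbours(grades, history, k))
    return votes.most_common(1)[0][0]


def predict_grade(grades: Dict[str, float], history: List[Student], subject: str, k: int = 3) -> float:
    """Average final exam grade in `subject` of the k nearest students who took it."""
    took = [st for st in history if subject in st[2]]
    if not took:
        raise ValueError(f"no historical student took {subject!r}")
    near = neighbours(grades, took, k)
    return sum(st[2][subject] for st in near) / len(near)


# Made-up example data, for illustration only.
SCIENCE = frozenset({"physics", "chemistry", "math B"})
SOCIAL = frozenset({"history", "economics", "math A"})
HISTORY: List[Student] = [
    ({"math": 8.0, "dutch": 6.0}, SCIENCE, {"math B": 7.5}),
    ({"math": 7.5, "dutch": 6.5}, SCIENCE, {"math B": 7.0}),
    ({"math": 5.0, "dutch": 8.0}, SOCIAL, {"math A": 6.0}),
    ({"math": 5.5, "dutch": 7.5}, SOCIAL, {"math A": 6.5}),
]
```

With this data, a new student with grades `{"math": 8.0, "dutch": 6.5}` is closest to the two science students, so `recommend_package` returns the science package and `predict_grade(..., "math B")` returns 7.25.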

[download pdf]

Adele Lu Jia defends her PhD thesis on incentives in p2p networks

Wednesday, October 30th, 2013, posted by Djoerd Hiemstra

Adele Lu Jia successfully defended her PhD thesis at Delft University of Technology:

Online Networks as Societies: User Behaviors and Contribution Incentives

by Adele Lu Jia

Online networks like Facebook and BitTorrent have become popular and powerful infrastructures for users to communicate, to interact, and to share social lives with each other. These networks often rely on the cooperation and contributions of their users. Nevertheless, users in online networks are often found to be selfish, lazy, or even malicious rather than cooperative, and therefore need incentives to contribute. To date, great effort has been put into designing effective contribution incentive policies, ranging from barter schemes to monetary schemes. In this thesis, we analyze user behaviors and contribution incentives in online networks. We approach online networks as both computer systems and societies, hoping that this approach will, on the one hand, motivate computer scientists to think about the similarities between their artificial computer systems and the natural world, and, on the other hand, help people outside the field understand online networks more easily.

To summarize, in this thesis we provide theoretical and practical insights into the correlation between user behaviors and contribution incentives in online networks. We demonstrate user behaviors and their consequences at both the system and the individual level, we analyze barter schemes and their limitations in incentivizing users to contribute, we evaluate monetary schemes and their risk of causing the collapse of the entire system, and we examine user interactions and their implications for inferring user relationships. Above all, unlike offline human society, which has evolved over thousands of years, online networks emerged only two decades ago and are still in a primitive state. Yet with their ever-improving technologies we have already obtained many exciting results. This points the way to a promising future for the study of online networks, not only in analyzing online behaviors, but also in cross-referencing them with offline societies.
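To make the notion of a barter scheme concrete, here is a minimal sketch of BitTorrent-style tit-for-tat unchoking, a well-known barter scheme. It is not taken from the thesis; the number of regular upload slots (k=3) and the single random optimistic unchoke are simplifying assumptions. Each round, a peer uploads to the k peers that uploaded the most to it, plus one randomly chosen other peer, so that newcomers with nothing to trade yet can get started.

```python
import random
from typing import Dict, Optional, Set


def choose_unchoked(received: Dict[str, float], k: int = 3,
                    rng: Optional[random.Random] = None) -> Set[str]:
    """One round of tit-for-tat: upload to the k peers that uploaded the most to us
    in the last round, plus one randomly chosen other peer (the optimistic unchoke)."""
    rng = rng or random.Random()
    ranked = sorted(received, key=lambda peer: received[peer], reverse=True)
    unchoked = set(ranked[:k])  # reciprocate: reward the best recent uploaders
    others = ranked[k:]
    if others:
        unchoked.add(rng.choice(others))  # give one non-reciprocating peer a chance
    return unchoked
```

The optimistic slot also hints at one limitation of barter: a free-rider that never uploads can still receive data whenever it happens to be picked for that slot.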

[more info]

Sabbatical at Q-Able

Monday, September 2nd, 2013, posted by Djoerd Hiemstra

Starting today, I am on sabbatical at Q-Able, an exciting new internet startup and spinoff of the University of Twente. Q-Able will bring new search capabilities to web shops, hotel and travel booking sites, online banking, and more, by replacing multi-field web forms with free-text querying. Instead of meticulously filling in a web form one field at a time, users of your web site get a single, simple search field. Q-Able’s solutions provide a better user experience for the visitors of web sites, and they give the company running the web site the opportunity to find out what its customers really want (you’d be surprised by the things people enter in single search fields).

More information shortly at: q-able.com.

Maarten Fokkinga retires

Friday, August 30th, 2013, posted by Djoerd Hiemstra

Today, Maarten Fokkinga retires after a scientific career of more than 40 years. Maarten is well known for his work on functional programming and category theory. Among his widely cited works are Functional Programming with Bananas, Lenses, Envelopes and Barbed Wire with Erik Meijer and Ross Paterson, Law and Order in Algorithmics (his Ph.D. thesis), and Monadic Maps and Folds for Arbitrary Datatypes (yes, those are maps and reduces!).
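In the bananas paper, “bananas” are catamorphisms: generalized folds that replace each constructor of a datatype by a function. A minimal Python sketch for binary trees (our own example, not code from the papers, which use their own notation): `fold_tree` is the catamorphism, and both `map_tree` and `reduce_tree` are defined as folds, which is the sense in which maps and reduces are special cases.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Leaf:
    value: Any


@dataclass(frozen=True)
class Node:
    left: "Leaf | Node"
    right: "Leaf | Node"


def fold_tree(leaf: Callable[[Any], Any], node: Callable[[Any, Any], Any], t) -> Any:
    """Catamorphism for binary trees: replace every Leaf by `leaf` and every
    Node by `node`, combining results bottom-up."""
    if isinstance(t, Leaf):
        return leaf(t.value)
    return node(fold_tree(leaf, node, t.left), fold_tree(leaf, node, t.right))


def map_tree(f: Callable[[Any], Any], t):
    """map as a fold: rebuild the same shape with f applied to every leaf."""
    return fold_tree(lambda v: Leaf(f(v)), Node, t)


def reduce_tree(combine: Callable[[Any, Any], Any], t) -> Any:
    """reduce as a fold: keep leaf values and merge subtrees with `combine`."""
    return fold_tree(lambda v: v, combine, t)
```

For example, with `t = Node(Leaf(1), Node(Leaf(2), Leaf(3)))`, `reduce_tree(lambda a, b: a + b, t)` sums the leaves to 6, and `map_tree(lambda x: x * 10, t)` gives the same tree with leaves 10, 20 and 30.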

To celebrate Maarten’s long and successful career, Jan Kuper and I wrote recipes for curried bananas and pasta, appropriately formalized in Haskell, so Maarten can both cook and enjoy programming after his retirement. Download the recipes from GitHub.

Frans van der Sluis defends his PhD thesis on Information Experience

Friday, August 30th, 2013, posted by Djoerd Hiemstra

When Complexity becomes Interesting: An Inquiry into the Information eXperience

by Frans van der Sluis

To date, most research in information retrieval and related fields has been concerned primarily with the efficiency and effectiveness of either the information system or the user’s interaction with the information system. At the same time, understanding the user’s experience during information interaction is recognized as a grand challenge for the development of information systems. There is a widely shared intuition that the value of retrieved information depends on more than system characteristics such as the topical overlap between a query and a document. As it is not obvious how to embrace this intuition, this challenge has mostly been ignored. This dissertation takes up the challenge of describing and developing an operational model of the Information eXperience (IX), the experience during the interaction with information. This task was decomposed into three sub-challenges:

  1. Transform the fuzzy concept of the IX into a formalized one.
  2. Develop a model of textual complexity that enables an information system to influence a user’s IX.
  3. Identify and influence the causes of the experience of interest in text.