SIKS/Twente Data Science Colloquium, 20 April 2015
Data Science

Meet Data Science

Scientific and economic progress is increasingly powered by our capabilities to explore big datasets. Data is the driving force behind the successful innovation of Internet companies like Google, Twitter, and Yahoo. The need for data scientists is apparent in almost every sector of our society, including business, health care, and education.

SIKS/Twente Data Science Colloquium

20 April 2015, University of Twente

Twente Data Science is a collaboration between research groups of the University of Twente to research, promote and facilitate big data analysis for all scientific disciplines. We operate by sharing expertise, ideas, our research infrastructure for big data analysis and - of course - by sharing our data. The Data Science colloquia are kindly sponsored by the Netherlands Research School for Information and Knowledge Systems (SIKS) and are part of the SIKS educational program. SIKS PhD students are strongly encouraged to participate.


Program

13:00   Welcome / Coffee

13:30   Opening

13:45   Jan Willem Tulp (Tulp Interactive): Designing Data Experiences

14:45   Piet Daas (CBS): Big data @ CBS

15:45   Coffee

16:00   Rolf de By and Raul Zurita Milla (ITC): Rich Remote Sensing...

17:00   Closing / Drinks

Registration

Please sign up for this event here.

Venue

University of Twente: DesignLab,
Building The Gallery, Hengelosestraat 500, Enschede, The Netherlands.

Abstracts


Big data @ CBS

by Piet Daas (CBS)

More and more data are being produced by an increasing number of electronic devices, both physically surrounding us and on the internet. The large amount of data and the high frequency at which they are produced have led to the introduction of the term 'Big Data'. Because these data reflect many different aspects of our daily lives, and because of their abundance and availability, Big Data sources are very interesting from an official statistics point of view. However, first experiences with analyses of large amounts of Dutch traffic loop detection records, call detail records of mobile phones, and Dutch social media messages reveal that a number of challenges need to be addressed before these data sources can be used for official statistics. These challenges and the lessons learned during these initial studies will be addressed and illustrated with examples.

Piet J. H. Daas is a methodologist and big data research coordinator at the Dutch Central Bureau of Statistics (Centraal Bureau voor de Statistiek, CBS).



Designing data experiences

by Jan Willem Tulp (Tulp Interactive)

What makes a good data visualization? There are many types of visualizations and many ways to design one. Using his own work as examples, Jan Willem Tulp will illustrate a simple framework that helps you to analyze and to design better visualizations.

Jan Willem Tulp creates interactive data visualisations for magazines such as Scientific American and Popular Science, as well as for companies; one example is the Tax Free Retail Analysis Tool for Schiphol Amsterdam Airport.


STARS: Rich Remote Sensing Data for Tropical Smallholder Farming

by Rolf A. de By & Raul Zurita-Milla (ITC)

We describe the STARS project, which can be seen as an attempt to collect a rich, higher-dimensional image dataset over smallholder farm plots in sub-Saharan Africa and South Asia, with the aim of learning how such data can serve applications that monitor crop growth. The eventual beneficiaries are manifold: individual farmers, farming communities and cooperatives, the farm input private sector, farm extension services, and also national agricultural organizations responsible for the food security agenda.
The image dataset is higher-dimensional because, in addition to the native two-dimensional nature of an image, our images are multi-spectral (they record information in 3 to 8 spectral bands), have different spatial resolutions, and are obtained at different moments in the growing season. This is because we use multiple cameras and platforms (handheld, UAV and satellite) so that we can study multiple crop types in detail, under multiple farm management practices. We also intensively collect traditional agronomic ground data through field measurements.
In the second half of the presentation, we discuss our plans to process the data: from initial preparatory steps, to statistical aggregation, to materialization in a crop spectro-temporal library. We discuss the different modes of performing deeper analyses over this data, which we expect will allow classification of crops, crop status (as in crop health), and possibly also yield.
The computational techniques that we will use include running and inverting models (e.g. atmospheric models to create surface reflectance products from our images), as well as applying diverse image processing methods (e.g. segmentation to identify plots) and machine learning methods (e.g. random forests to classify crops) to our rich image dataset to extract farmer-relevant information.

Rolf de By and Raul Zurita-Milla are professors at the ITC Faculty of Geo-Information Science and Earth Observation. Their work is supported by the Bill and Melinda Gates Foundation.



The Data Science Colloquium is organized by
Robin Aly, Yuri Engelhardt, and Djoerd Hiemstra.

In cooperation with IGS DataLab