Project leader:
Prof.dr. Martin Kersten (CWI)
Consortium:
CWI
Industrial partners (non-exhaustive):
RUG (Groningen), Kapteyn Instituut, JHU (USA), SDSS team
Total FTE: 3.4 fte (heads: 2 faculty, 1 PhD, 1 PD)
Project IS4/5: Cracking a Scientific Database
Scientific data management challenges the capabilities offered by
database systems, both in dealing with petabyte data volumes and in
facilitating scalable, complex query interaction in a distributed
setting. In this project we plan to study the architectural
consequences of continuous meta-data reorganization decisions and
multi-step, adaptive query processing. Reorganization will be an
integral part of the query evaluation process. Every query is first
analyzed for its potential to "crack" the database into multiple
pieces, such that both the required subset is easily retrieved and
subsequent queries may benefit from the new partitioning structure.
A similar argument applies to query processing over a scientific
database. The project creates an
experimental setting to develop and evaluate novel database
techniques to aid scientific discoveries in astronomy.
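The cracking idea described above can be illustrated with a minimal sketch. It is not the MonetDB implementation: the class `CrackedColumn` and its methods are hypothetical names chosen for illustration, and the column is held in a plain Python list. Each range query partitions only the piece of the column that contains its bounds and remembers the split points, so later queries touch less data.

```python
import bisect


class CrackedColumn:
    """Illustrative column that is physically reorganized by range queries."""

    def __init__(self, values):
        self.values = list(values)  # private copy, reordered in place by queries
        self.pivots = []            # sorted crack values seen so far
        self.positions = []         # positions[i]: first index holding a value >= pivots[i]

    def _crack(self, pivot):
        """Split the piece containing `pivot`; return the split position."""
        i = bisect.bisect_left(self.pivots, pivot)
        if i < len(self.pivots) and self.pivots[i] == pivot:
            return self.positions[i]  # already cracked here, no data movement
        # Only the piece between the neighbouring cracks can contain the pivot.
        lo = self.positions[i - 1] if i > 0 else 0
        hi = self.positions[i] if i < len(self.positions) else len(self.values)
        piece = self.values[lo:hi]
        smaller = [v for v in piece if v < pivot]
        larger = [v for v in piece if v >= pivot]
        self.values[lo:hi] = smaller + larger
        split = lo + len(smaller)
        self.pivots.insert(i, pivot)
        self.positions.insert(i, split)
        return split

    def range_query(self, low, high):
        """Return all values v with low <= v < high (assumes low <= high)."""
        start = self._crack(low)
        end = self._crack(high)
        return self.values[start:end]


column = CrackedColumn([13, 4, 55, 9, 27, 1, 42, 18])
first = column.range_query(9, 30)   # cracks the whole column twice
second = column.range_query(9, 20)  # only re-partitions the piece between 9 and 30
```

After the first query the answer is a contiguous slice of the column, and the second query only reorganizes that slice. This is the sense in which subsequent queries benefit from the partitioning left behind by earlier ones.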
Sloan Digital SkyServer
This subproject aims to create a mirror site for
the SDSS. Only a single closed-source solution is known, and it
illustrates the sizeable challenges of supporting the requirements
of this scientific database. The database definition alone is 200
pages of SQL, the database contains >200M objects, and over a
million queries are handled every month. The modern open-source database
management system MonetDB is considered the prime candidate to act
as an experimentation platform. It enables experimentation at all
levels of a DBMS architecture, including data structures, query
optimization, and distributed storage management.
Streaming scientific workflow
This subproject concentrates on distributed
(and hence scalable) techniques to manage both the update load and
the scientific discovery algorithms. Large (radio) telescopes
generate a myriad of data, which requires cleaning, calibration,
and analysis in an e-Science grid setting. The outcome of this
process is a multi-gigabyte daily stream of event records, to be
archived for a long period. In passing, high-priority real-time
analysis is required to track phenomena of interest. A strong focus
is the development of streaming database technology.
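The combination of archiving and in-passing analysis can be sketched as follows. This is a simplified illustration under stated assumptions, not the project's design: the function `process_stream`, the `(source_id, flux)` event layout, and the rule "flag an event whose flux exceeds a multiple of the recent average" are all hypothetical.

```python
from collections import deque


def process_stream(events, window=5, factor=3.0):
    """Archive every event and flag those far above the recent average.

    `events` is an iterable of (source_id, flux) pairs (assumed layout).
    """
    recent = deque(maxlen=window)  # sliding window over the latest flux values
    archive, alerts = [], []
    for source_id, flux in events:
        # Real-time check, done in passing before the event is archived.
        if len(recent) == window and flux > factor * (sum(recent) / window):
            alerts.append((source_id, flux))
        archive.append((source_id, flux))  # stand-in for long-term storage
        recent.append(flux)
    return archive, alerts


events = [("a", 1.0), ("b", 1.0), ("c", 1.0), ("d", 10.0), ("e", 1.0)]
archive, alerts = process_stream(events, window=3, factor=3.0)
```

Every event is seen exactly once, so the analysis cost per event is bounded by the window size rather than by the size of the archive.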
Industrial cooperation & LOFAR
We cooperate with astronomers of the LOFAR project on
database support for detecting transients (Kapteyn Instituut RUG,
Anton Pannekoek Institute, UvA), and with database research labs
(Microsoft Bay-area Research Centre) to understand the
domain-specific issues and the experience gained with the sole
working version of SDSS.
Highlights
As this project is part of the third phase of the BRICKS program
(financed through the second open round, July 2006), challenges
rather than results are presented. Likewise, no BRICKS key
publications are currently available.
Research challenges
Scientific databases have been recognized as one of the most
challenging areas of database research. They require a fundamental
assessment and renovation at all levels of a database system
architecture, including the following:
- Lightweight database compression techniques to reduce the massive storage requirements.
- Optimization techniques geared toward mathematical analysis of event streams.
- Approximate query-processing algorithms against inherently incomplete and noisy data.
- Distributed database techniques to scale to the size and the number of sites involved.
- A sound architecture of a data stream engine to cope with the large volumes and real-time analysis required.
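As an illustration of what "lightweight" compression means in the first item, the sketch below uses run-length encoding, one well-known scheme that is cheap to encode and decode. It is chosen here only as an example; the source does not state which techniques the project intends to study, and the function names are hypothetical.

```python
def rle_encode(column):
    """Collapse consecutive equal values into (value, run_length) pairs."""
    runs = []
    for value in column:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1          # extend the current run
        else:
            runs.append([value, 1])   # start a new run
    return [(value, count) for value, count in runs]


def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original column."""
    return [value for value, count in runs for _ in range(count)]


column = ["star", "star", "star", "galaxy", "galaxy", "star"]
encoded = rle_encode(column)
restored = rle_decode(encoded)
```

Schemes of this kind pay off on columns with long runs of repeated values, such as a classification attribute in a sorted table.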
Economic & societal impact
Many of the research results stem from intensive and
fruitful collaborations with industrial parties, which use the
results to enhance their competitive edge. In this way, the results
have a significant impact on society.
The MonetDB platform is
concurrently developed in the Bsik program MultimediaN, where the
emphasis is on multimedia search; with the Philips research lab, to
empower their ambient home environment; in Regie voor Geo
Informatiesystem, aimed at improved GIS applications; and with the
Dutch Forensics Institute, to simplify digital forensics.
The results of this project are made available through a real-life
mirror of the SDSS. This way we provide a bridge between astronomy
research worldwide and database research. It also provides the
basis for quality assurance and empirical research in scientific
database application scenarios.
Future work 2007-2009
This project started in late 2006. A functional prototype of the
MonetDB/Sky server has been created and is expected to go live in
the second quarter of 2007.
IS4/5 Researchers funded by BRICKS
- M.Sc. Erietta Liarou (CWI)
- Dr. Milena Ivanova (CWI)
- Drs. Romulo Pereira Goncalves (CWI)
Other researchers involved
- Prof.dr. M. Kersten (CWI)
- Prof.dr. R. van Liere (CWI)
- Dr. N. Nes (CWI)
- Drs. E. Liarou (CWI)
- Dr. R. Nijboer (ASTRON/LOFAR)
- Dr. O. Smirnov (ASTRON/LOFAR)
- G. van Diepen (ASTRON/LOFAR)
For more information, please refer to the publications and posters of this project.