A big data management platform for rapidly changing environments

P. Zahedi, Technische Universiteit Eindhoven (TUE). Stan Ackermans Instituut. Software Technology (ST)

    Research output: Thesis (EngD Thesis)

    Abstract

    Big data is now a reality. Storing, managing, and analyzing very large amounts of data is a common challenge in a technology landscape where digital content is growing rapidly. In recent years, FEI's advanced electron microscopes, with their unsurpassed magnification and resolving power, have brought an evolution to the microscopy industry. However, the large data sets they generate urged FEI to search for big data management solutions. As a result, FEI developed PALLAS (Platform for ALL Analysis and Storage), an integrated framework to manage, process, and visualize microscopy data. This assignment was initiated to address the shortcomings of PALLAS with respect to big data management. The project focuses on investigating, designing, and prototyping a solution that covers metadata variety in the following aspects:

    · Heterogeneous data types
    · Changing data models
    · Various data sources

    This report presents an extensible framework that lets PALLAS absorb new data types regardless of their data models. Moreover, the framework provides an extension point for plug-ins that access various databases and other data sources.

    To absorb various data types in PALLAS, the framework dealt with two main challenges: accessing the metadata of various data types, and storing it in a way that is optimal for searching. The former was met by introducing the concept of a data type reader/writer, defining its template, and delegating the task of providing it to those who intend to use it. This approach makes it possible to add user-defined data types to PALLAS. The latter was addressed by selecting a database with a flexible schema to accommodate the diversity of data type structures. MongoDB was selected from 10 candidates, which were evaluated against the quality attributes and non-technical requirements of the framework.
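    The reader/writer template can be sketched in Python as an abstract base class that each plug-in implements. A registry then dispatches to the right plug-in by file extension. This is a minimal illustration under stated assumptions, not the PALLAS API. The names `DataTypeReaderWriter`, `register`, `read_metadata`, `KeyValueText` and the `.kv` "key=value" file format are all hypothetical.

```python
from abc import ABC, abstractmethod
from pathlib import PurePath
from typing import Any, Dict, Tuple


class DataTypeReaderWriter(ABC):
    """Hypothetical template that a data type plug-in implements."""

    # File extensions this plug-in handles (with leading dot).
    extensions: Tuple[str, ...] = ()

    @abstractmethod
    def read_metadata(self, content: bytes) -> Dict[str, Any]:
        """Extract metadata from raw file content as a dictionary."""

    @abstractmethod
    def write_metadata(self, metadata: Dict[str, Any]) -> bytes:
        """Serialize metadata back into the file type's own format."""


# Hypothetical registry mapping a file extension to the plug-in that understands it.
_REGISTRY: Dict[str, DataTypeReaderWriter] = {}


def register(plugin: DataTypeReaderWriter) -> DataTypeReaderWriter:
    """Make a plug-in available for every extension it declares."""
    for ext in plugin.extensions:
        _REGISTRY[ext.lower()] = plugin
    return plugin


def read_metadata(filename: str, content: bytes) -> Dict[str, Any]:
    """Dispatch to the registered plug-in based on the file extension."""
    suffix = PurePath(filename).suffix.lower()
    plugin = _REGISTRY.get(suffix)
    if plugin is None:
        raise ValueError(f"no reader/writer registered for {suffix!r}")
    return plugin.read_metadata(content)


class KeyValueText(DataTypeReaderWriter):
    """Hypothetical user-defined data type: one 'key=value' pair per line."""

    extensions = (".kv",)

    def read_metadata(self, content: bytes) -> Dict[str, Any]:
        metadata: Dict[str, Any] = {}
        for line in content.decode("utf-8").splitlines():
            line = line.strip()
            if not line or line.startswith("#"):
                continue  # skip blank lines and comments
            key, _, value = line.partition("=")
            metadata[key.strip()] = value.strip()
        return metadata

    def write_metadata(self, metadata: Dict[str, Any]) -> bytes:
        lines = [f"{key}={value}" for key, value in metadata.items()]
        return ("\n".join(lines) + "\n").encode("utf-8")


register(KeyValueText())

metadata = read_metadata("scan_001.kv", b"magnification=50000\nvoltage_kV=300\n")
# metadata == {"magnification": "50000", "voltage_kV": "300"}
```

    Each plug-in returns a plain dictionary whose shape is up to the plug-in. The result can therefore be stored as-is in a flexible-schema store. With PyMongo, for example, `collection.insert_one(metadata)` accepts documents of differing structure in the same collection.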
    To handle the changing data models of the microscopes and image processing applications, and to avoid maintenance costs, the framework is made independent of the parts that change. This was achieved by categorizing the microscopy metadata into generic and specific groups. The metadata common to all microscopy image files (i.e., generic metadata) is defined in the system and is easily accessible to users. The changing points, i.e., the metadata specific to a group of microscopes (specific metadata), are unknown to the framework. The system provides a structure to store and retrieve this group of metadata. At search time, however, the user is responsible for knowing the key names.

    Another contribution of this framework is a well-defined extension point for PALLAS to connect to external data sources (databases). The design is general enough to provide a consistent solution for querying various data sources in a uniform way. In the current version, we provided two plug-ins, which connect to the PDB (Protein Data Bank) and OMERO (Open Microscopy Environment Remote Objects) databases, respectively.
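    The generic/specific split and the uniform data-source extension point can be illustrated together in a small sketch. It rests on several assumptions. The set of generic keys (`GENERIC_KEYS`) is hypothetical, and so is the `DataSource` interface with its `query(section, key, value)` shape. In-memory sources stand in for the real stores. Actual PDB or OMERO plug-ins would call those services' own APIs, which are not shown here.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Iterable, List

# Hypothetical generic metadata keys, assumed common to every microscopy image.
GENERIC_KEYS = ("acquisition_date", "microscope_id", "pixel_size_nm")


def to_document(metadata: Dict[str, Any]) -> Dict[str, Any]:
    """Split raw metadata into a known 'generic' part and an opaque 'specific' part."""
    generic = {k: metadata[k] for k in GENERIC_KEYS if k in metadata}
    specific = {k: v for k, v in metadata.items() if k not in GENERIC_KEYS}
    return {"generic": generic, "specific": specific}


class DataSource(ABC):
    """Hypothetical extension point: every source answers the same query shape."""

    name: str = "unnamed"

    @abstractmethod
    def query(self, section: str, key: str, value: Any) -> List[Dict[str, Any]]:
        """Return records whose <section>.<key> equals value."""


class InMemorySource(DataSource):
    """Stand-in store; a real PDB or OMERO plug-in would call its remote API here."""

    def __init__(self, name: str, documents: Iterable[Dict[str, Any]]):
        self.name = name
        self._docs = list(documents)

    def query(self, section: str, key: str, value: Any) -> List[Dict[str, Any]]:
        # The framework does not know specific key names; the caller supplies them.
        return [d for d in self._docs if d.get(section, {}).get(key) == value]


def federated_query(sources: Iterable[DataSource], section: str, key: str,
                    value: Any) -> List[Dict[str, Any]]:
    """Send one uniform query to every registered source and tag results by origin."""
    results: List[Dict[str, Any]] = []
    for source in sources:
        for doc in source.query(section, key, value):
            results.append({"source": source.name, **doc})
    return results


local = InMemorySource("pallas", [
    to_document({"microscope_id": "M1", "pixel_size_nm": 0.5, "stage_tilt_deg": 30}),
    to_document({"microscope_id": "M2", "pixel_size_nm": 1.0}),
])
hits = federated_query([local], "specific", "stage_tilt_deg", 30)
# hits == [{"source": "pallas",
#           "generic": {"microscope_id": "M1", "pixel_size_nm": 0.5},
#           "specific": {"stage_tilt_deg": 30}}]
```

    The search above uses `stage_tilt_deg`, a specific-metadata key the framework never declared. As the text describes, the caller must already know that key name. In MongoDB, the same lookup on the local store would be the dot-notation filter `{"specific.stage_tilt_deg": 30}`.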
    Original language: English
    Awarding Institution
    Supervisors/Advisors
    • Jarnikov, Dmitri S., Supervisor
    • Schoenmakers, Remco, External supervisor, External person
    Award date: 1 Oct 2014
    Place of Publication: Eindhoven
    Publisher
    Print ISBNs: 978-90-444-1320-5
    Publication status: Published - 1 Oct 2014

    Bibliographical note

    Final report. Confidential until 1 Oct 2016.
