A big data management platform for rapidly changing environments

P. Zahedi, Technische Universiteit Eindhoven (TUE). Stan Ackermans Instituut. Software Technology (ST)

    Research output: Thesis; Pd Eng Thesis; Academic

    Abstract

    Big data is now a reality. Storing, managing, and analyzing very large amounts of data is a common challenge in a technology landscape where digital content is growing rapidly. In recent years, FEI's advanced electron microscopes, with their unsurpassed magnification and resolving power, brought about an evolution in the microscopy industry. However, the large data sets they generate pushed FEI to search for big data management solutions. As a result, FEI developed PALLAS (Platform for ALL Analysis and Storage), an integrated framework to manage, process, and visualize microscopy data. This assignment was initiated to address the shortcomings of PALLAS with respect to big data management. The project focuses on investigating, designing, and prototyping a solution that covers metadata variety with respect to the following aspects:
    • Heterogeneous data types
    • Changing data models
    • Various data sources
    This report presents an extensible framework for absorbing new data types into PALLAS, regardless of their data models. Moreover, the framework provides an extension point through which plug-ins can access various databases and other data sources. To absorb various data types into PALLAS, the framework had to address two main challenges: accessing the metadata of various data types, and storing that metadata in a way that is optimal for search. The first challenge was met by introducing the concept of a data type reader/writer, defining its template, and delegating the task of providing it to those who intend to use it. This approach makes it possible to add user-defined data types to PALLAS. The second challenge was addressed by selecting a database with a flexible schema to accommodate the diversity in data type structures. MongoDB was selected from 10 candidates, which were evaluated against the quality attributes and non-technical requirements of the framework.
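The reader/writer idea described above can be illustrated with a minimal sketch. It is not the PALLAS implementation: the class names `DataTypeReaderWriter`, `JsonSidecarReaderWriter`, and `ReaderWriterRegistry`, the method names, and the choice of a JSON file as the example data type are all assumptions made for illustration. The sketch only shows how a template plus a registry lets a third party supply access to a new data type without changing the framework.

```python
import json
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path


class DataTypeReaderWriter(ABC):
    """Hypothetical template that a plug-in author implements for one data type."""

    # File extensions this handler claims (assumed dispatch rule).
    extensions: tuple = ()

    @abstractmethod
    def read_metadata(self, path: Path) -> dict:
        """Return the metadata stored in the file at `path`."""

    @abstractmethod
    def write_metadata(self, path: Path, metadata: dict) -> None:
        """Store `metadata` in the file at `path`."""


class JsonSidecarReaderWriter(DataTypeReaderWriter):
    """Example user-defined data type: metadata kept in a plain JSON file."""

    extensions = (".json",)

    def read_metadata(self, path: Path) -> dict:
        return json.loads(path.read_text(encoding="utf-8"))

    def write_metadata(self, path: Path, metadata: dict) -> None:
        path.write_text(json.dumps(metadata), encoding="utf-8")


class ReaderWriterRegistry:
    """Framework side: knows only the template, never the concrete formats."""

    def __init__(self) -> None:
        self._by_extension = {}

    def register(self, handler: DataTypeReaderWriter) -> None:
        for ext in handler.extensions:
            self._by_extension[ext.lower()] = handler

    def handler_for(self, path: Path) -> DataTypeReaderWriter:
        try:
            return self._by_extension[path.suffix.lower()]
        except KeyError:
            raise LookupError(f"no reader/writer registered for {path.suffix!r}") from None


def roundtrip_demo() -> dict:
    """Register the example handler, write metadata, and read it back."""
    registry = ReaderWriterRegistry()
    registry.register(JsonSidecarReaderWriter())
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "image_0001.json"
        handler = registry.handler_for(path)
        handler.write_metadata(path, {"magnification": 50000})
        return handler.read_metadata(path)
```

In this sketch the registry dispatches on file extension, which is only one possible rule. The point is that the framework depends on the abstract template alone, so a new data type is added by registering one more handler.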
    To handle the changing nature of the data models of the microscope and image processing applications, and to avoid maintenance costs, the framework is made independent of the parts that change. This was achieved by categorizing the microscopy metadata into generic and specific groups. The metadata common to all microscopy image files (i.e. generic metadata) is defined in the system and is easily accessible to users. The changing points, namely the metadata specific to a group of microscopes (i.e. specific metadata), are unknown to the framework. The system provides a structure to store and retrieve this group of metadata; however, at search time, the burden of knowing their key names falls on the user. Another contribution of this framework is a well-defined extension point through which PALLAS can connect to external data sources (databases). Its design is general enough to provide a consistent way to query various data sources uniformly. In the current version, we provide two plug-ins, which connect to the PDB (Protein Data Bank) and OMERO (Open Microscopy Environment Remote Objects) databases, respectively.
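A small sketch may help make the generic/specific split and the uniform query idea concrete. Everything here is assumed for illustration: the document layout with `generic` and `specific` sub-documents, the key names (`pixel_size_nm`, `detector`, `stage_tilt_deg`), the dotted-key lookup, and the `DataSource` interface. The sketch runs in memory and does not use MongoDB or the real PDB or OMERO interfaces.

```python
from typing import Any, Dict, List, Protocol

_MISSING = object()  # sentinel so that a stored None differs from an absent key


def lookup(document: dict, dotted_key: str) -> Any:
    """Follow a dotted key such as 'specific.detector' through nested dicts."""
    current: Any = document
    for part in dotted_key.split("."):
        if not isinstance(current, dict) or part not in current:
            return _MISSING
        current = current[part]
    return current


def matches(document: dict, criteria: Dict[str, Any]) -> bool:
    """True when every dotted key in `criteria` equals the stored value."""
    return all(lookup(document, key) == value for key, value in criteria.items())


class DataSource(Protocol):
    """Hypothetical extension point: every source answers the same query call."""

    name: str

    def query(self, criteria: Dict[str, Any]) -> List[dict]: ...


class InMemorySource:
    """Stand-in for a real source; a plug-in would translate the criteria instead."""

    def __init__(self, name: str, documents: List[dict]) -> None:
        self.name = name
        self._documents = list(documents)

    def query(self, criteria: Dict[str, Any]) -> List[dict]:
        return [doc for doc in self._documents if matches(doc, criteria)]


def query_all(sources: List[DataSource], criteria: Dict[str, Any]) -> Dict[str, List[dict]]:
    """Send one set of criteria to every registered source."""
    return {source.name: source.query(criteria) for source in sources}


# Generic keys are known to the system; specific keys vary per microscope group.
local = InMemorySource("local", [
    {"generic": {"pixel_size_nm": 0.5}, "specific": {"detector": "A"}},
    {"generic": {"pixel_size_nm": 0.5}, "specific": {"stage_tilt_deg": 30}},
])
external_stub = InMemorySource("external_stub", [
    {"generic": {"pixel_size_nm": 1.2}, "specific": {}},
])

results = query_all(
    [local, external_stub],
    {"generic.pixel_size_nm": 0.5, "specific.detector": "A"},
)
```

Note that the query names the specific key `specific.detector` explicitly. The code never enumerates or validates specific keys, which mirrors the trade-off stated above: the framework stays independent of changing metadata, and the user must know the key names at search time.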
    Language: English
    Awarding Institution
    Supervisors/Advisors
    • Jarnikov, Dmitri, Supervisor
    • Schoenmakers, R., External supervisor, External person
    Award date: 1 Oct 2014
    Place of Publication: Eindhoven
    Publisher: Technische Universiteit Eindhoven
    Print ISBNs: 978-90-444-1320-5
    State: Published - 1 Oct 2014

    Fingerprint

    Metadata
    Information management
    Microscopic examination
    Data structures
    Microscopes
    Big data
    Optical resolving power
    Image processing
    Electron microscopes
    Proteins
    Costs
    Industry

    Bibliographical note

    Final report (Eindverslag). Confidential until 01-10-2016

    Cite this

    Zahedi, P., & Technische Universiteit Eindhoven (TUE). Stan Ackermans Instituut. Software Technology (ST) (2014). A big data management platform for rapidly changing environments. Eindhoven: Technische Universiteit Eindhoven.
    Zahedi P, Technische Universiteit Eindhoven (TUE). Stan Ackermans Instituut. Software Technology (ST). A big data management platform for rapidly changing environments. Eindhoven: Technische Universiteit Eindhoven, 2014. (PDEng rapport).