Very large data sets, often referred to as Big Data, that are hard to collect, store, process, and visualize using traditional computational methods pose challenges across the technology world. Recent advances in electron microscopy that enable the collection of large-scale data sets make it necessary for FEI Company to seek answers to Big Data management. This project focuses on investigating and prototyping possible solutions in the domains of data modeling and data analysis for data of this size. Failing to address the problems arising from Big Data will result in many missed opportunities: the hardware may be able to acquire more information, but if the software cannot process it, the added knowledge will be lost. Moreover, as more information becomes available, combining insights from different experiments becomes harder, and a more efficient data organization technique needs to be identified in order to facilitate the association of knowledge. Finally, Big Data cannot be tackled with traditional products; new models, such as cloud computing or grid computing, need to be considered. In the data modeling domain, this report presents a conceptual idea for a novel organization of data based on their spatial and temporal coordinates. Such a data model, referred to as the hypercube, can enable the combination of data coming from different sources (sensors) and with different characteristics (e.g. different resolutions). The high-level design for the visualization use case is given, and the extensibility of the design to cover the use case of raw data querying is also portrayed. Realizing such a data model will require further research effort, and a clear strategy also needs to be identified. Nevertheless, this project confirmed the need for a solution in this domain, and the presented idea can serve as a starting point.
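The hypercube idea of indexing records by their spatial and temporal coordinates can be illustrated with a minimal sketch. The class and method names below are hypothetical and do not come from the report's actual design; the sketch only shows how data from different sensors, at different resolutions, could be stored under shared coordinates and retrieved with a region-and-time query.

```python
from collections import defaultdict


class Hypercube:
    """Toy coordinate index: each (x, y, z, t) cell holds records
    from any sensor at any resolution. Illustrative sketch only."""

    def __init__(self):
        # Each coordinate maps to a list of (sensor, resolution, value) records.
        self._cells = defaultdict(list)

    def insert(self, coord, sensor, resolution, value):
        # coord is an (x, y, z, t) tuple; heterogeneous sources share the key space.
        self._cells[coord].append((sensor, resolution, value))

    def query(self, region, t_range):
        """Return all records inside the spatial region
        ((x0, x1), (y0, y1), (z0, z1)) and the time range (t0, t1)."""
        (x0, x1), (y0, y1), (z0, z1) = region
        t0, t1 = t_range
        hits = []
        for (x, y, z, t), records in self._cells.items():
            if x0 <= x <= x1 and y0 <= y <= y1 and z0 <= z <= z1 and t0 <= t <= t1:
                for rec in records:
                    hits.append(((x, y, z, t), rec))
        return hits
```

A query over a small region then returns co-located readings from different sensors together, which is the combination step the abstract describes.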
In the data analysis domain, the need for a high-performance computing solution led to the evaluation of Hadoop as the platform on which FEI can build image processing applications that deal with massive data sets. Hadoop is a software framework based on the MapReduce programming model, which is used by internet giants such as Google, Yahoo, and Facebook to process their web-scale data sets. Hadoop can also work together with Target, a data management framework that FEI is investigating. Additionally, Hadoop is a mature, widely accepted framework that can bring scalability, deployability, and robustness to FEI's applications. Three-dimensional image processing on MapReduce has not been explored before, but this project demonstrates ways to map different algorithm families onto the new model and evaluates their performance. The results are very promising, and the investigation can now move to the next step of demonstrating complete applications. This project stands at the beginning of the study of solutions to Big Data challenges. The results presented should be further discussed and expanded. The Hadoop evaluation can be picked up directly, and the next phase can start immediately. On the data modeling side, a clear strategy needs to be identified and the hypercube idea should be further evaluated within different projects in the company. Overall, the high-performance storage, computing, and visualization projects should receive enough focus to be able to align with the breakthroughs in other disciplines.
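The mapping of image processing onto MapReduce can be sketched in plain Python, without the actual Hadoop API. This hypothetical example decomposes a simple per-slice statistic over a 3D volume into a map phase (independent work per input split) and a reduce phase (aggregation per key), which is the structure such algorithms must follow on the framework; the function names and the choice of mean intensity as the operation are assumptions for illustration.

```python
from collections import defaultdict


def map_phase(volume_slices):
    # Map: each (z, plane) pair is processed independently, mimicking
    # how Hadoop mappers work on separate input splits. Emit partial
    # (sum, count) results keyed by slice index z.
    for z, plane in volume_slices:
        total = sum(sum(row) for row in plane)
        count = sum(len(row) for row in plane)
        yield z, (total, count)


def reduce_phase(mapped):
    # Reduce: group intermediate values by key and aggregate them,
    # here into the mean intensity per slice.
    grouped = defaultdict(list)
    for key, value in mapped:
        grouped[key].append(value)
    return {z: sum(t for t, _ in vals) / sum(c for _, c in vals)
            for z, vals in grouped.items()}
```

For example, `reduce_phase(map_phase([(0, [[1, 2], [3, 4]]), (1, [[5, 5], [5, 5]])]))` yields `{0: 2.5, 1: 5.0}`. Because each mapper touches only its own slices, the work parallelizes across a cluster; the reduce phase is where cross-slice coupling, and therefore the difficulty of mapping richer 3D algorithms, concentrates.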
Award date: 4 Oct 2012
Place of Publication: Eindhoven
Publication status: Published - 2012