Lecture Notes on Data Engineering and Communications Technologies
- Anthology ID: G21-27
- Year: 2021
- Venue: GWF
- Publisher: Springer Singapore
- URL: https://gwf-uwaterloo.github.io/gwf-publications/G21-27
Large Scale Image Registration Utilizing Data-Tunneling in the MapReduce Cluster
Amit Kumar Mondal | Banani Roy | Chanchal K. Roy | Kevin A. Schneider
Image registration applications are computation-intensive, memory-intensive, and communication-intensive. Beyond performance optimization, they require robust support for error recovery and for re-usability of both data and operations. With these requirements in mind, we explore programming models that aim to minimize folding operations (such as join and reduce), which are the primary sources of data shuffling, concurrency bugs, and expensive communication in a distributed cluster. In particular, we analyze modular MapReduce execution of an image registration pipeline (IRP) with external and internal (data-tunneling) data flow mechanisms and compare both with the compact model. Experimental analyses on the ComputeCanada cluster with a crop field data set containing 1000 images show that these design options are valuable for large-scale IRPs executed on a MapReduce cluster. Additionally, we present an effectiveness metric for analyzing the impact of a design model on the Big IRP; it combines error-recovery and re-usability metrics with data size and execution time. The explored design models and their performance analysis can serve as a benchmark for researchers and application developers who deploy large-scale image registration and other image processing tasks.