
Geospatial Data Analysis: A Review of Theory and Methods


A. Analytics platform

Anuradha [5] proposes a geospatial big data analytics platform that uses Hadoop, Hive, and other NoSQL technologies alongside a relational database to process geospatial data. The architecture is a proposal only; no implementation is available. The proposal also describes the required functionality of each component of the architecture. The architecture, as shown in [5], is below:

Fig. 1. Logical Architecture of Big Data Platform

B. Data processing frameworks

Klein et al. [7] proposed Physical Analytics Integrated Repository and Services (PAIRS), a scalable geospatial data analytics platform. It enables rapid data discovery by automatically updating, joining, and homogenizing data layers in space and time. It automates data download and curation, and it provides scalable storage and a computational platform for running physical and statistical models on the curated datasets.

The authors claim that its key differentiator is multilayer querying: the ability to search multiple data layers and filter on multiple search criteria. PAIRS stores data in HBase, with an index built on latitude, longitude, and timestamp. It uses open source tools to convert data layer projections into the WGS84 coordinate system, and it manages data from multiple sources in a scalable fashion on distributed compute resources. The main component of PAIRS is its Data Integration Engine, which can download, re-project, and index data. If a dataset is not in a raster format, it is rasterized and handled as a large matrix. The data integration layer of PAIRS, as shown in [7], is below:

Fig. 2. Data Integration Layer of PAIRS
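To make the idea of an index built on latitude, longitude, and timestamp more concrete, the following is a minimal sketch of one way such a key could be composed for a sorted key-value store such as HBase. It is not the key scheme used by PAIRS, which is not described here. The Z-order (Morton) interleaving, the 16-bit resolution, the key layout, and the names `quantize`, `interleave`, and `row_key` are all assumptions made for illustration.

```python
def quantize(value, lo, hi, bits):
    """Map a value in [lo, hi] to an integer cell index in [0, 2**bits - 1]."""
    cells = 1 << bits
    idx = int((value - lo) / (hi - lo) * cells)
    return min(max(idx, 0), cells - 1)  # clamp so that value == hi stays in range


def interleave(x, y, bits):
    """Interleave the bits of x and y into a single Z-order (Morton) code."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)       # x occupies the even bit positions
        code |= ((y >> i) & 1) << (2 * i + 1)   # y occupies the odd bit positions
    return code


def row_key(lat, lon, epoch_seconds, bits=16):
    """Build a hypothetical sortable key: <spatial cell in hex>_<timestamp>.

    Assumes WGS84 degrees for lat/lon and a non-negative Unix timestamp.
    """
    x = quantize(lon, -180.0, 180.0, bits)
    y = quantize(lat, -90.0, 90.0, bits)
    cell = interleave(x, y, bits)
    width = (bits + 1) // 2  # hex digits needed to hold 2 * bits bits
    return f"{cell:0{width}x}_{epoch_seconds:012d}"


if __name__ == "__main__":
    print(row_key(28.6139, 77.2090, 1_700_000_000))
    print(row_key(28.6139, 77.2090, 1_700_003_600))
```

With this layout, records for the same grid cell share a key prefix and sort by time within it, so a range scan over one prefix returns a time series for that cell. The trade-off is that a rectangular query region generally maps to several disjoint key ranges.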

Xin Chen et al. [8] propose a high-performance integrated spatial big data analytics framework based on the MapReduce paradigm and present a few use cases. Their data integration focuses mainly on spatial datasets and relies on the following query types:

1) Point-in-polygon queries;
2) Cross-matching queries;
3) Nearest neighbor queries.

These kinds of queries are both data-intensive and compute-intensive. Hence, their spatial data integration is based on extending Hadoop-GIS with scalable spatial clustering and spatial regression capabilities.
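As an illustration of why these queries are compute-intensive, here is a minimal single-machine sketch of two of them: a point-in-polygon test using the standard ray casting method, and a brute-force nearest neighbor search. It is not taken from [8] or from Hadoop-GIS. It assumes planar coordinates and simple polygons given as vertex lists, and the function names are hypothetical.

```python
import math


def point_in_polygon(point, polygon):
    """Ray casting: cast a horizontal ray to the right of the point and
    count how many polygon edges it crosses. An odd count means inside."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]  # wrap around to close the ring
        # The edge straddles the ray's y-coordinate (this also skips horizontal edges).
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def nearest_neighbor(query, points):
    """Brute-force nearest neighbor by Euclidean distance, O(n) per query."""
    return min(points, key=lambda p: math.dist(query, p))


if __name__ == "__main__":
    square = [(0, 0), (4, 0), (4, 4), (0, 4)]
    print(point_in_polygon((2, 2), square))
    print(point_in_polygon((5, 2), square))
    print(nearest_neighbor((1, 1), [(0, 0), (3, 3), (1, 2)]))
```

Each point-in-polygon test costs time proportional to the number of polygon vertices, and the brute-force neighbor search scans every candidate. Applied to millions of objects, this is the cost that partitioning and indexing in distributed frameworks aim to reduce.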

Stefan Hagedorn et al. [9] compare existing solutions for spatial data processing on Apache Hadoop and Apache Spark. The comparison covers their features and their performance in a micro-benchmark for spatial filter and join queries. They compare the Hadoop extensions Hadoop-GIS [10] and SpatialHadoop [11], the Spark-based systems SpatialSpark [12] and GeoSpark [13], and their own implementation, STARK (https://github.com/dbis-ilm/stark). Table I, as given in their paper, provides a feature-level comparison.

Jae-Gil Lee et al. [14] present an overview of existing geospatial big data challenges and opportunities. They propose a geospatial data processing architecture that includes a spatial online analytical processing module and that identifies existing technologies and proposed new technologies separately.

C. Systems

David Haynes et al. [15] propose Terra Populus, a system that provides three web applications for accessing, analyzing, and tabulating different datasets under a common platform.

1) Paragon is a prototype parallel spatial database that extends the functionality of PostgreSQL and PostGIS onto multinode systems.
2) The Terra Populus Tabulator application builds dynamic queries for analyzing large population survey data.
3) Terra Explorer is an exploratory analysis tool for visualizing the spatial datasets within the repository.

Barik et al. [16] propose a fog-computing-based framework called FogGIS for mining geospatial data. It is built as a prototype using Intel Edison, an embedded microprocessor. This work claims the following contributions:

  • The FogGIS framework offers improved throughput and reduced latency for the analysis and transmission of geospatial data.
  • Different compression techniques reduce data size, thereby reducing transmission power.
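The second contribution can be illustrated with a small sketch of compressing a geospatial payload on an edge device before sending it upstream. The specific compression techniques evaluated in [16] are not listed here, so DEFLATE via Python's standard `zlib` module stands in as an assumed example. The synthetic GeoJSON data and the helper names are hypothetical.

```python
import json
import zlib


def make_feature_collection(n):
    """Build a synthetic GeoJSON FeatureCollection of n points (illustrative data)."""
    features = [
        {
            "type": "Feature",
            "geometry": {
                "type": "Point",
                "coordinates": [round(77.0 + i * 0.001, 6), round(28.0 + i * 0.001, 6)],
            },
            "properties": {"id": i},
        }
        for i in range(n)
    ]
    return {"type": "FeatureCollection", "features": features}


def compress_payload(obj):
    """Serialize to compact JSON and compress; returns (raw_bytes, compressed_bytes)."""
    raw = json.dumps(obj, separators=(",", ":")).encode("utf-8")
    return raw, zlib.compress(raw, level=9)


def decompress_payload(blob):
    """Inverse of compress_payload: recover the original object."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))


if __name__ == "__main__":
    collection = make_feature_collection(1000)
    raw, packed = compress_payload(collection)
    print(f"raw: {len(raw)} bytes, compressed: {len(packed)} bytes")
    assert decompress_payload(packed) == collection
```

GeoJSON is highly repetitive, so lossless compression typically shrinks it substantially. Whether the CPU cost of compressing on a constrained device is outweighed by the transmission savings depends on the hardware and network, which is the kind of trade-off a fog framework has to evaluate.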

Bosch et al. [17] propose a geospatial document analysis system for the VAST 2011 Mini Challenge 1. Their system lets the user interact with the data in a visual, direct, and scalable fashion, and it offers diverse views and data management components. Wang et al. [30] propose the TerraFly Geo-Spatial Cloud platform, which provides comprehensive spatial analysis methods and visualization.
