Geospatial Data Analysis: A Review of Theory and Methods




A vast amount of data is generated and collected every moment, and much of it has a spatial and/or temporal aspect. As a result, both the volume of collected data and the variety of its formats keep growing, and geospatial data is no exception. This poses challenges in storing, processing, analyzing and visualizing geospatial data. This paper discusses geospatial data within the big data paradigm and presents a taxonomy for the analysis of geospatial data. The existing literature is then reviewed and discussed according to the proposed taxonomy.


Spatial data, also known as geospatial data, is information about any physical object on earth that can be represented by numerical values in a geographic coordinate system. Generally, it represents the location, size and shape of the object. With the rise of the web, geotagged data has become more popular as well. Geotagging is the process of adding geographical identification metadata to various media such as photographs, videos, websites and SMS messages. Such data carries much other information in addition to the latitude and longitude.
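As an illustration of geotagging, the sketch below attaches a latitude/longitude pair and some extra metadata to a media item, using the general shape of a GeoJSON point feature. The function name `geotag` and the property names `media` and `taken_at` are hypothetical choices made for this example, not part of any standard or of the original paper.

```python
import json


def geotag(media_name, latitude, longitude, **extra):
    """Return a GeoJSON-style point feature describing a geotagged media item."""
    if not -90 <= latitude <= 90:
        raise ValueError("latitude must be between -90 and 90 degrees")
    if not -180 <= longitude <= 180:
        raise ValueError("longitude must be between -180 and 180 degrees")
    return {
        "type": "Feature",
        # GeoJSON stores positions in longitude, latitude order.
        "geometry": {"type": "Point", "coordinates": [longitude, latitude]},
        # Everything besides the position travels as ordinary properties.
        "properties": {"media": media_name, **extra},
    }


photo = geotag("beach.jpg", 12.97, 77.59, taken_at="2020-01-01T10:00:00Z")
print(json.dumps(photo, indent=2))
```

The coordinates identify where the item belongs on earth, while the properties hold the additional, non-spatial information that accompanies it.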

A. Big Data Paradigm

In a 2001 research report, Laney [1] characterized the challenges and opportunities of data growth along three dimensions: Volume (amount of data), Variety (range of data types and sources) and Velocity (speed of data in and out). This is known as the 3Vs model for describing big data. Spatial data falls under the big data paradigm because it possesses all three of these characteristics: Volume, Variety and Velocity.

1) Volume: The images collected from various earth-observing satellites contain rich information. As the resolution of an image increases, so does its size.
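To make the relationship between resolution and size concrete, the following sketch estimates the uncompressed size of a square raster scene. The scene extent (60 km), band count (4) and sample depth (2 bytes) are illustrative assumptions, not figures for any particular satellite.

```python
def raster_size_bytes(extent_m, pixel_size_m, bands, bytes_per_sample):
    """Uncompressed size in bytes of a square scene covering extent_m x extent_m metres."""
    pixels_per_side = round(extent_m / pixel_size_m)
    return pixels_per_side * pixels_per_side * bands * bytes_per_sample


# Same ground area, progressively finer pixels (assumed values).
for pixel_size in (30.0, 15.0, 10.0):
    size = raster_size_bytes(60_000, pixel_size, bands=4, bytes_per_sample=2)
    print(f"{pixel_size:>5} m/pixel -> {size / 1e6:,.0f} MB")
```

Because the pixel count grows with the square of the linear resolution, halving the pixel size quadruples the uncompressed size of the scene.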

2) Variety: Geospatial data follows three basic models: raster (e.g. satellite images), vector (encompassing points, lines and polygons) and graph (spatial networks). Multiple sources and approaches are used to collect spatial data in these three forms. Many formats and sources are available for storing spatial representations of different objects. They include:

  • GeoTIFF [A Tagged Image File Format (TIFF) file carrying georeferencing tags, used for exchanging raster graphics (bitmap) images between application programs (.tif)]
  • IMG
  • HDF [Hierarchical Data Format is a set of file formats designed to store and organize large amounts of data (.hdf5)]
  • NetCDF [Network Common Data Form is a set of software libraries and self-describing, machine-independent data formats that support the creation, access and sharing of array-oriented scientific data (.nc)]
  • BIL [A BIL (band interleaved by line) image file is an uncompressed file containing the actual pixel values of an image. For each image row, it stores the pixel values of every band in turn before moving on to the next row]
  • AAIGRID [The ASCII interchange format for Arc/Info Grid, which takes the form of an ASCII file, sometimes with an associated .prj file]
  • Vector [There are numerous vector formats: EPS, SVG, PDF, AI, DXF etc.]
  • Twitter [Geo located tweets]
  • Text [Geo located text]
  • GIS Images

This heterogeneity across data sets accounts for the variety component of big data.
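The band-interleaved-by-line layout mentioned in the list can be illustrated with a byte-offset calculation. This is a minimal sketch assuming a headerless file with one byte per sample; the helper names `encode_bil` and `bil_offset` are hypothetical and do not belong to any library.

```python
def encode_bil(image):
    """Serialize image[band][row][col] (ints in 0..255) into BIL byte order."""
    n_bands = len(image)
    n_rows = len(image[0])
    out = bytearray()
    for row in range(n_rows):
        # For each image row, write that row from every band in turn.
        for band in range(n_bands):
            out.extend(image[band][row])
    return bytes(out)


def bil_offset(row, col, band, n_cols, n_bands, bytes_per_sample=1):
    """Byte offset of one sample in a headerless BIL file."""
    line_index = row * n_bands + band
    return (line_index * n_cols + col) * bytes_per_sample


# Tiny 2-band image with 2 rows and 3 columns (made-up values).
image = [
    [[1, 2, 3], [4, 5, 6]],        # band 0
    [[10, 20, 30], [40, 50, 60]],  # band 1
]
data = encode_bil(image)
sample = data[bil_offset(row=1, col=2, band=1, n_cols=3, n_bands=2)]
print(sample)
```

Reading a single band from such a file means skipping over the lines of the other bands, which is one reason the choice of format affects processing cost.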

3) Velocity: Real-time monitoring of the earth or of any other object produces a continuous flow of data, which requires high computing and storage capabilities. In addition to these 3Vs, spatial data also exhibits high dimensionality, high complexity and high uncertainty, as pointed out by Liu et al. [19].
