**5) Map-Reduce for Polygon retrieval:** Polygon retrieval is a fundamental operation in many spatial data analysis and computational problems, and it often has to be computed under real-time constraints. Given the unprecedented growth in the volume and arrival rate of terrain data, many sequential algorithms cannot meet this demand effectively [28]. Guo et al. [28] propose a MapReduce-based parallel polygon retrieval algorithm that aims to minimize the I/O and CPU loads of the map and reduce tasks during spatial processing.

Terrain data is usually represented by one of two common data structures that approximate a surface: a digital elevation model (DEM) or a triangulated irregular network (TIN). The proposed algorithm hierarchically indexes the spatial terrain data using a quad-tree index. A prefix tree based on the quad-tree index is also built to query the relationship between the terrain data and the query area in real time. The technique first divides the entire data set into several file chunks based on a quad-tree prefix. Then, for each range query, a prefix tree is used to organize the set of quad-indices whose corresponding grids intersect the query area.

Before a query is processed, these indices are used to filter out unnecessary TIN data. In the map function, the relationship between the TIN data and the query shape is pre-tested against the prefix tree in order to minimize computation.
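The indexing and filtering idea can be illustrated with a minimal sketch. It is not the implementation of Guo et al. [28]: the unit-square extent, the digit convention for quadrants, and the helper names `quad_key`, `query_keys`, `build_prefix_tree` and `has_prefix` are all assumptions made for illustration, and the MapReduce machinery itself is omitted.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)
UNIT: Box = (0.0, 0.0, 1.0, 1.0)         # assumed extent of the terrain data


def quad_key(x: float, y: float, depth: int, extent: Box = UNIT) -> str:
    """Quad-tree index of the cell containing (x, y): one digit (0-3) per level."""
    xmin, ymin, xmax, ymax = extent
    key = ""
    for _ in range(depth):
        xmid, ymid = (xmin + xmax) / 2, (ymin + ymax) / 2
        digit = 0
        if x >= xmid:          # east half
            digit += 1
            xmin = xmid
        else:
            xmax = xmid
        if y >= ymid:          # north half
            digit += 2
            ymin = ymid
        else:
            ymax = ymid
        key += str(digit)
    return key


def intersects(a: Box, b: Box) -> bool:
    """True if two axis-aligned boxes overlap (touching edges count)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def query_keys(query: Box, depth: int, extent: Box = UNIT, prefix: str = "") -> List[str]:
    """Quad-indices of all depth-level cells that intersect the query box."""
    if not intersects(extent, query):
        return []
    if len(prefix) == depth:
        return [prefix]
    xmin, ymin, xmax, ymax = extent
    xmid, ymid = (xmin + xmax) / 2, (ymin + ymax) / 2
    children = [
        (xmin, ymin, xmid, ymid),  # digit 0: south-west
        (xmid, ymin, xmax, ymid),  # digit 1: south-east
        (xmin, ymid, xmid, ymax),  # digit 2: north-west
        (xmid, ymid, xmax, ymax),  # digit 3: north-east
    ]
    keys: List[str] = []
    for digit, child in enumerate(children):
        keys += query_keys(query, depth, child, prefix + str(digit))
    return keys


def build_prefix_tree(keys: List[str]) -> Dict:
    """Organize quad-indices in a prefix tree (nested dictionaries)."""
    root: Dict = {}
    for key in keys:
        node = root
        for ch in key:
            node = node.setdefault(ch, {})
    return root


def has_prefix(tree: Dict, prefix: str) -> bool:
    """True if some indexed quad-index starts with `prefix`."""
    node = tree
    for ch in prefix:
        if ch not in node:
            return False
        node = node[ch]
    return True


# Pre-test: only chunks or records whose index appears in the tree are processed.
tree = build_prefix_tree(query_keys((0.1, 0.1, 0.3, 0.3), depth=2))
keep_chunk_0 = has_prefix(tree, "0")   # chunk overlapping the query area
keep_chunk_3 = has_prefix(tree, "3")   # chunk far from the query area
```

In this sketch a file chunk named by a short quad-tree prefix can be skipped entirely when `has_prefix` returns `False`, which is the kind of early filtering the map function relies on to avoid unnecessary geometric tests.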

**6) The General Spatial Interaction Model:** Spatial interaction models are statistical models used to predict origin-destination flows. They are widely applied in geography, planning, transportation and the social sciences to predict interactions or flows related to commuting, migration, access to services, etc. Mathematically, suppose a series of observations $y(i,j)$, $i, j = 1, \dots, n$, is given on random variables $Y(i,j)$, each of which corresponds to movements of people (or cars, commodities or telephone calls) between origin location $i$ and destination location $j$. The $Y(i,j)$ are assumed to be independent random variables sampled from a specified probability distribution that depends on some mean, say $\mu(i,j)$. The statistical model then takes the general form:

$$Y(i,j) = \mu(i,j) + \varepsilon(i,j), \quad i, j = 1, \dots, n$$

where $\mu(i,j) = E[Y(i,j)]$ is the expected mean interaction frequency from $i$ to $j$, and $\varepsilon(i,j)$ is an error about the mean. The mean interaction frequencies between origin $i$ and destination $j$ are also modelled as:

$$\mu(i,j) = A(i)\, B(j)\, S(i,j), \quad i, j = 1, \dots, n$$

where $A(i)$ are the origin-specific factors, $B(j)$ are the destination-specific factors, and $S(i,j)$ is a function of some measure of separation between locations $i$ and $j$.

**This interaction model relies on three types of factors:**

**1.** Origin-specific factors that characterize the ability of the origin locations to produce or generate flows,

**2.** Destination-specific factors that represent the attractiveness of the destinations,

**3.** Origin-destination factors that characterize the way spatial separation of origins from destinations constrains or impedes the interaction.
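As an illustration of how the three factor types combine, the sketch below computes expected flows under two assumptions that are not stated in the text: the factors are multiplied together, as in gravity-type models, and the separation function is a power-law distance decay $S(i,j) = d(i,j)^{-\beta}$. The function name `expected_flows` and the numbers are hypothetical.

```python
from typing import List


def expected_flows(A: List[float], B: List[float],
                   dist: List[List[float]], beta: float) -> List[List[float]]:
    """Expected flow mu[i][j] = A[i] * B[j] * S(i, j).

    A[i]  : origin-specific factor (ability of origin i to generate flows)
    B[j]  : destination-specific factor (attractiveness of destination j)
    S(i,j): assumed power-law distance decay, dist[i][j] ** -beta
    """
    return [
        [A[i] * B[j] * dist[i][j] ** (-beta) for j in range(len(B))]
        for i in range(len(A))
    ]


# Two origins and two destinations with hypothetical factors and distances.
A = [100.0, 50.0]
B = [2.0, 1.0]
dist = [[1.0, 2.0],
        [2.0, 1.0]]
mu = expected_flows(A, B, dist, beta=2.0)
# mu == [[200.0, 25.0], [25.0, 50.0]]
```

Doubling the distance with $\beta = 2$ cuts the expected flow to a quarter, which shows how the origin-destination factor impedes interaction independently of the origin and destination factors.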

### VI. APPLICATIONS OF SPATIAL DATA

A number of applications have been built on geospatial data. Broadly, they can be classified as follows:

**1)** Epidemiological data analysis

**2)** Geospatial data based recommender system for the location of health services

**3)** Seismic hazard assessment

**A. Epidemiological data analysis**

Wang et al. [29] present a geospatial epidemiology analysis system on the TerraFly Geo-spatial cloud platform [30]. Based on the TerraFly GeoCloud system, they provide four kinds of API algorithms for data analysis and results visualization: disease mapping (mortality/morbidity map, SMR map), disease cluster determination (spatial cluster, HotSpot analysis tool, cluster and outlier analysis), geographic distribution measurement (mean center, median center, standard distance, distributional trends), and regression (linear regression, spatial auto-regression).
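Some of these measures have simple textbook definitions. The sketch below computes the mean center, the standard distance and a standardized mortality ratio (SMR) using those standard definitions; it does not use or reproduce the TerraFly GeoCloud API, and the function names and sample values are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def mean_center(points: List[Point]) -> Point:
    """Average x and y coordinate of a set of case locations."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def standard_distance(points: List[Point]) -> float:
    """Root-mean-square distance of the points from their mean center."""
    cx, cy = mean_center(points)
    return math.sqrt(
        sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in points) / len(points)
    )


def smr(observed_deaths: float, expected_deaths: float) -> float:
    """Standardized mortality ratio: observed over expected deaths."""
    return observed_deaths / expected_deaths


cases = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
center = mean_center(cases)        # (1.0, 1.0)
spread = standard_distance(cases)  # sqrt(2), about 1.414
ratio = smr(30, 20)                # 1.5: more deaths than expected
```

An SMR above 1 flags a region with more deaths than its reference rates would predict, which is the quantity an SMR map colors by region.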

Lopez et al. [31] present a spatial data model to predict the epidemiological impact of influenza in Vellore, India.

They use geographically weighted regression to predict the H1N1 influenza epidemic for 2013-2014. The geographically weighted regression model fits a local regression model for each region i, and uses the local regression coefficients to estimate the influenza prevalence for 2013-2014.

They use a diagnostics block to validate the model based on the Akaike information criterion and the $R^2$ value. The results of the geographically weighted regression model are evaluated in terms of residuals and regression coefficients.

They infer that H1N1 influenza prevalence is positively correlated with rainfall and wind speed, and negatively correlated with temperature and humidity.
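The idea of fitting a separate regression for each region can be sketched as a weighted least-squares fit in which nearby observations receive larger weights. The example below is not the model of Lopez et al. [31]: it assumes a single predictor, a Gaussian distance kernel and a fixed bandwidth, and the function name `gwr_fit_at` and the data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def gwr_fit_at(target: Sequence[float], coords: List[Sequence[float]],
               x: List[float], y: List[float],
               bandwidth: float) -> Tuple[float, float]:
    """Local (intercept, slope) at `target` from distance-weighted least squares."""
    # Gaussian kernel (assumed): weights decay with distance from the target.
    w = [math.exp(-0.5 * (math.dist(target, c) / bandwidth) ** 2) for c in coords]
    sw = sum(w)
    # Weighted means of predictor and response.
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    # Weighted variance and covariance give the local slope.
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    return intercept, slope


# Hypothetical regions: location, rainfall (predictor), prevalence (response).
coords = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, 2.0)]
rainfall = [1.0, 2.0, 3.0, 4.0, 5.0]
prevalence = [2 * r + 1 for r in rainfall]  # exactly linear, for checking

intercept, slope = gwr_fit_at((0.5, 0.5), coords, rainfall, prevalence, bandwidth=1.0)
# With exactly linear data every local fit recovers intercept 1 and slope 2.
```

With real data the local coefficients differ from region to region, and the sign of each local slope indicates whether the predictor is positively or negatively associated with prevalence there.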