SpatiaL ETL Tutorial: Using buffer zones to define areas

Software_tips_tricks_EN

Using the Talend Open Studio open source software and the SpatiaL ETL plug-in GeoSpatial Integration for Talend, we will show you how you can manage geodata and spatial operations in your ETL processes. But first of all, please download and install a current version of Talend Open Studio for Data Integration and Disys GeoSpatial Integration for Talend plug-in:

The download package from Disy contains all the data you need to join this tutorial.

For our example, we have chosen an "Airport" scenario. As a starting point, we have chosen Frankfurt/Main Airport. The task here is to calculate zones surrounding the airport within a specific radius. For this, we use global airport data as basic data. The airports will be displayed as points on a map.

This tutorial will show you how to use Disy’s geo-spatial component “dBufferCalculator“, which is one of THE core functionalities of common spatial information systems. Thanks to Disy’s GeoSpatial Integration software, users can now also calculate buffer zones in combination with Talend. This component helps you create one or more buffer zones based on distance parameters. These areas are often used as the basis for analyses and planning. In addition, the results of buffer operations frequently represent intermediate steps within more complex calculations. Buffer zones are also often used as spatial filters.

The scenario selected for our tutorial can be applied to many other scenarios where an area surrounding a point, object or city needs to be defined. For example:

  1. Biology: Distance between the population of species and development areas, new roads planned for a specific area, etc.
  2. Urban planning: Provision of green spaces in urban areas
  3. Interdisciplinary: Number of registered objects (trees, gas stations, lakes) within a specific search radius

Step 1: Create Metadata in Talend

First, you have to create a new connection to this database. In the Repository tree view, select Metadata > Db Connections > Create connection.

A dialog opens in which you specify the name and purpose of the new connection. Select DB Type > SQLite. Browse and select the natural_earth.sqlite file in your installation. After you unpacked the data provided with the plugin, you should find the file under <YourInstallationPath>\Disy_GeoSpatial_Sample_Data\in\DataPreparation End the dialog by clicking Finish.

The new connection NaturalEarth is listed under Db Connections (Fig. 2). Right-click NaturalEarth and select Retrieve Schema from the context menu (Fig.3).

Fig. 1: Create new connection
Fig. 1: Create new connection
Fig. 2: The Connection will be shown in the Metadata
Fig. 3: Retrieve Scheme of NaturalEarth
Fig. 3: Retrieve Scheme of NaturalEarth
Fig. 4: The table "NE_10M_AIRPORTS" is selected and loaded

Fig. 4: The table "NE_10M_AIRPORTS" is selected and loaded

Click next and load the table with point airport information. In the Name Filter field enter NE. The tables that start with “NE” are filtered, select NE_10M_AIRPORTS. There are 12 columns loaded and the creation status shows Success.

Fig. 5: Unnecessary columns are marked and deleted

Fig. 5: Unnecessary columns are marked and deleted

Click Next. Remove the unnecessary columns by selecting the column and the option ‘X’ (Remove selected items). Remove all columns except PK_UID, type, name, iata_code and Geometry.

Now set the following properties:

  • Column PK_UID: set the type as Integer (INT) and select the checkbox key to set it as primary key.
  • Column Geometry: make sure that the DB Type is set as POINT and select the Type as DGeometry. This type is used by the Geospatial Integration Plugin to handle geometries.

Click Finish. The schema for the table NE_10M_AIRPORTS is now available under Metadata.

Step 2: Create Geo-ETL-Job

In the Repository, click Job Designs and create a new folder Airports. Right-click the folder and select Create job (Fig.7). Enter Name (e. g. buffer_airport), purpose and description and click Finish. The new job is displayed under Airports. (Fig. 8).

Fig. 7: A new job is created in the repository
Fig. 7: A new job is created in the repository
Fig. 8: The job has been created and still needs to be edit
Fig. 8: The job has been created and still needs to be edit

Step 3: Edit Job

Fig. 9: The data from NE_10M_AIRPORTS are draged into the job area and the component dSpatialLiteInput is created.

Fig. 9: The data from NE_10M_AIRPORTS are draged into the job area.

First, extract the data. In the Metadata area, under Db Connections > NaturalEarth > Table Schemas, select “NE_10M_AIRPORTS” and drag & drop it to the job’s workspace. A dialogue will open automatically. You can now choose which component you want to work with. Select dSpatiaLiteInput.

Fig. 10: The properties of NE_10M_Airports are displayed

Fig. 10: The properties of NE_10M_Airports are displayed

The NE_10M_AIRPORTS table is now available in the job. Double-click it to display its Basic settings view. The properties show the values defined in the metadata in the first step.

Fig. 11: The component tReplicate is dragged into the workspace

Fig. 11: The component tReplicate is dragged into the workspace

To generate two different data flows for the input data, you will use the tReplicate component. In the component palette, search for tReplicate, select the component, drag & drop it to the job’s workspace.

Fig. 12: Create a main-row-connection

Fig. 12: Create a main-row-connection

Connect the dSpatiaLiteInput component to the tReplicate component using Row > Main connection and label the connection as airports.

Fig. 13: The structure of the Geo-ETL job is visible

Fig. 13: The structure of the Geo-ETL job is visible

Add a component to create the buffer zones.
In the component palette, under Disy GeoSpatial > Geom Operations, select the dBufferCalculator component, drag & drop it to the job’s workspace. Drop another dBufferCalculator component to the job’s workspace.

Connect the tReplicate component to both dBufferCalculator components using Row > Main connection.  Rename both components so they describe the buffer zones to be created (i.e. Buffer_Calculator_50km and Buffer_Calculator_20km).

Fig. 14: The 50 km buffer is created

Fig. 14: The 50 km buffer is created

Double-click the 50-km buffer calculator to display its Basic settings view. Apply the following settings:

  • Input geometry column: Geometry
  • Buffer width: 50000 (50 km)
  • Number of line segments: 8
  • Line end cap style: Round
  • Output buffer geometry column: Geometry

Similarly, in the 20-km instance of dBufferCalculator component, apply the similar settings. But: We apply „buffer width“ 20000 (20 km).

Now you will write the resulting buffer areas into two shapefiles. Drag two dShapeFileOutput components to your job. The component is available in the palette under Disy GeoSpatial > File > Shapefile Rename the components (i.e. ShapeOutput_50km and ShapeOutput_20km).

Connect the dBufferCalculator components to the dShapeFileOutput components using Row > Main connection.

Display the basic settings of the first dShapefileOutput component.

Set the following properties:

  • Shapefile: select the shapefile’s storage location and give it a name (Buffer_50km.shp)
  • Schema: the schema will be automatically taken from the tMap component
  • GeometryType: by default the proper type Polygon/Multipolygon is already selected.
  • Coordinate reference system: enter “EPSG:3857”. The output coordinate system is the same as the input data.
  • Encoding: select an encoding, for example ISO88-59-15

Repeat the same steps for the second dShapefileOutput component.

Execute the job by clicking on the green triangle in the menu bar. The output files (two Shapefiles) will be saved to the path you specified for further use. They can now be edited or visualized.

The following figure shows what our final ETL job looks like.

Fig. Here's what your Geo-ETL job should look like

Fig. Here's what your Geo-ETL job should look like

Congratulations, you have successfully finished the Tutorial!

In the map created with Cadenza, we see our result visualized (Fig. 16).

Fig. 16: Visualization of the two buffers in Cadenza

Fig. 16: Visualization of the two buffers in Cadenza