SpatiaL ETL Tutorial: Using buffer zones to define areas
Using the Talend Open Studio open source software and the SpatiaL ETL plug-in GeoSpatial Integration for Talend, we will show you how you can manage geodata and spatial operations in your ETL processes. But first of all, please download and install a current version of Talend Open Studio for Data Integration and Disys GeoSpatial Integration for Talend plug-in:
The download package from Disy contains all the data you need to join this tutorial.
For our example, we have chosen an "Airport" scenario. As a starting point, we have chosen Frankfurt/Main Airport. The task here is to calculate zones surrounding the airport within a specific radius. For this, we use global airport data as basic data. The airports will be displayed as points on a map.
This tutorial will show you how to use Disy’s geo-spatial component “dBufferCalculator“, which is one of THE core functionalities of common spatial information systems. Thanks to Disy’s GeoSpatial Integration software, users can now also calculate buffer zones in combination with Talend. This component helps you create one or more buffer zones based on distance parameters. These areas are often used as the basis for analyses and planning. In addition, the results of buffer operations frequently represent intermediate steps within more complex calculations. Buffer zones are also often used as spatial filters.
The scenario selected for our tutorial can be applied to many other scenarios where an area surrounding a point, object or city needs to be defined. For example:
- Biology: Distance between the population of species and development areas, new roads planned for a specific area, etc.
- Urban planning: Provision of green spaces in urban areas
- Interdisciplinary: Number of registered objects (trees, gas stations, lakes) within a specific search radius
Step 1: Create Metadata in Talend
First, you have to create a new connection to this database. In the Repository tree view, select Metadata > Db Connections > Create connection.
A dialog opens in which you specify the name and purpose of the new connection. Select DB Type > SQLite. Browse and select the natural_earth.sqlite file in your installation. After you unpacked the data provided with the plugin, you should find the file under <YourInstallationPath>\Disy_GeoSpatial_Sample_Data\in\DataPreparation End the dialog by clicking Finish.
The new connection NaturalEarth is listed under Db Connections (Fig. 2). Right-click NaturalEarth and select Retrieve Schema from the context menu (Fig.3).
Now set the following properties:
- Column PK_UID: set the type as Integer (INT) and select the checkbox key to set it as primary key.
- Column Geometry: make sure that the DB Type is set as POINT and select the Type as DGeometry. This type is used by the Geospatial Integration Plugin to handle geometries.
Click Finish. The schema for the table NE_10M_AIRPORTS is now available under Metadata.
Step 2: Create Geo-ETL-Job
In the Repository, click Job Designs and create a new folder Airports. Right-click the folder and select Create job (Fig.7). Enter Name (e. g. buffer_airport), purpose and description and click Finish. The new job is displayed under Airports. (Fig. 8).
Step 3: Edit Job
First, extract the data. In the Metadata area, under Db Connections > NaturalEarth > Table Schemas, select “NE_10M_AIRPORTS” and drag & drop it to the job’s workspace. A dialogue will open automatically. You can now choose which component you want to work with. Select dSpatiaLiteInput.
Add a component to create the buffer zones.
In the component palette, under Disy GeoSpatial > Geom Operations, select the dBufferCalculator component, drag & drop it to the job’s workspace. Drop another dBufferCalculator component to the job’s workspace.
Connect the tReplicate component to both dBufferCalculator components using Row > Main connection. Rename both components so they describe the buffer zones to be created (i.e. Buffer_Calculator_50km and Buffer_Calculator_20km).
Similarly, in the 20-km instance of dBufferCalculator component, apply the similar settings. But: We apply „buffer width“ 20000 (20 km).
Now you will write the resulting buffer areas into two shapefiles. Drag two dShapeFileOutput components to your job. The component is available in the palette under Disy GeoSpatial > File > Shapefile Rename the components (i.e. ShapeOutput_50km and ShapeOutput_20km).
Connect the dBufferCalculator components to the dShapeFileOutput components using Row > Main connection.
Display the basic settings of the first dShapefileOutput component.
Set the following properties:
- Shapefile: select the shapefile’s storage location and give it a name (Buffer_50km.shp)
- Schema: the schema will be automatically taken from the tMap component
- GeometryType: by default the proper type Polygon/Multipolygon is already selected.
- Coordinate reference system: enter “EPSG:3857”. The output coordinate system is the same as the input data.
- Encoding: select an encoding, for example ISO88-59-15
Repeat the same steps for the second dShapefileOutput component.
Execute the job by clicking on the green triangle in the menu bar. The output files (two Shapefiles) will be saved to the path you specified for further use. They can now be edited or visualized.
The following figure shows what our final ETL job looks like.
Congratulations, you have successfully finished the Tutorial!
In the map created with Cadenza, we see our result visualized (Fig. 16).