Spatial ETL with Talend

Spatial ETL is the process of merging different data records including spatial data. With GeoSpatial Integration for Talend, Disy closes a major gap in the field of Spatial ETL tools. Talend users can now also incorporate spatial data from different spatial databases and data formats.

Spatial ETL with Talend software

Why Spatial ETL?

It is becoming increasingly important for authorities and companies to merge, check and supply the growing volume of attribute and spatial data from custom applications or sensors for comprehensive analysis, data portals and reporting obligations. This process of merging different data records, including spatial data, is known as Spatial ETL.

Disy has relied on Talend’s software platform for professional data integration solutions for many years. Talend is one of the global market leaders in the field of data integration. Disy has now developed a Spatial ETL plug-in for Talend with which spatial data can be read and processed seamlessly in Talend. The software can be downloaded free of charge for Talend Open Studio.

In many projects, spatial data in particular, and not only attribute data, plays a decisive role. Spatial data places special requirements on processing which, until now, ETL tools such as Talend have met only to a limited extent. Additional tools often had to be used to process spatial data.

This gave rise to the desire for a tool that integrates into the existing Talend process as seamlessly as possible, so that users can work with all data in a consistent way. Such a tool is called a Spatial ETL tool or Geo ETL tool.


A plug-in automating Spatial ETL processes

With “GeoSpatial Integration for Talend”, Disy has developed an extension for Spatial ETL that makes additional connectors available for spatial data sources as well as spatial calculators and operators. It enables alphanumeric data to be geographically enriched and spatial data to be easily incorporated into data integration processes.

Two fundamental considerations were at the forefront of development for Disy. On the one hand, we know from numerous projects the precise requirements for working with spatial data in a data integration process, and we used this experience to create a lean tool tailored to handling spatial data in data warehouse projects. On the other hand, we saw high demand for efficient solutions in our customer projects: data volumes are constantly rising and, without appropriate tools, can only be handled with high personnel costs.

ETL becomes Spatial ETL: Seamless extension to the Talend toolbar with additional spatial operations

The Talend software user interface: Spatial ETL processes can be built visually by dragging and dropping the components and routines.

The new plug-in for Spatial ETL is incorporated directly into the Talend environment and therefore seamlessly extends the existing toolbar. Users get additional data sources and operators that they can move to their workspace via drag and drop. Depending on the component currently in use, they can select further settings or perform additional calculations.

Popular relational databases such as Oracle and PostgreSQL have supported spatial data types and operators for processing spatial data for years, through Oracle Locator/Spatial and PostGIS respectively.

With GeoSpatial Integration for Talend, the Spatial ETL plug-in developed by Disy, this spatial data can now be integrated directly. Specifically, the plug-in currently supports the following databases and formats: Oracle Locator and Spatial, PostgreSQL with PostGIS, SQLite with SpatiaLite, as well as Shapefiles and WKT (Well-Known Text). Connectors for SAP HANA and ArcGIS Server will follow soon.
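For readers unfamiliar with WKT, the following minimal Python sketch shows what this text representation looks like for simple points. It is only an illustration of the format, not the plug-in's API: the helper names `point_to_wkt` and `parse_wkt_point` are hypothetical, and the parser deliberately handles only the `POINT` and `POINT Z` cases.

```python
import re


def point_to_wkt(x, y, z=None):
    """Format X, Y (and optional Z) coordinates as a WKT point string."""
    if z is None:
        return f"POINT ({x} {y})"
    return f"POINT Z ({x} {y} {z})"


# Matches "POINT (x y)" and "POINT Z (x y z)"; other geometry types are out of scope.
_POINT_RE = re.compile(r"^\s*POINT\s*(Z)?\s*\(\s*([^()]+?)\s*\)\s*$", re.IGNORECASE)


def parse_wkt_point(wkt):
    """Parse a WKT POINT or POINT Z string into a tuple of floats."""
    match = _POINT_RE.match(wkt)
    if match is None:
        raise ValueError(f"not a WKT point: {wkt!r}")
    coords = tuple(float(value) for value in match.group(2).split())
    expected = 3 if match.group(1) else 2
    if len(coords) != expected:
        raise ValueError(f"expected {expected} coordinates, got {len(coords)}")
    return coords
```

A real integration job would rely on the database or a geometry library for this; the sketch only shows that WKT is a plain, human-readable encoding of coordinates.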

In addition, the plug-in provides numerous components and spatial operators for carrying out spatial operations. These include:

  • length and polygon calculations,
  • the conversion of X, Y and Z coordinates into 2D/3D point geometries,
  • the calculation of centroids,
  • the buffering of points, lines and polygons,
  • the intersecting of geometries,
  • the calculation of a bounding box (envelope) or a convex hull for one or more geometries,
  • the connection of points to lines or lines to polygons,
  • the transformation of coordinates between different coordinate systems,
  • the algorithmic simplification of complex geometries, and
  • the validation of input data (e.g. Shapefiles).
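To make a few of these operations concrete, here is a minimal Python sketch of what a length calculation, a bounding box (envelope), and a polygon area and centroid compute. It assumes planar (projected) coordinates and a simple, non-self-intersecting polygon. The function names are hypothetical and are not the plug-in's components; in Talend these steps would be configured graphically.

```python
import math


def line_length(points):
    """Sum of straight-line segment lengths along a sequence of (x, y) points."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))


def envelope(points):
    """Axis-aligned bounding box as (min_x, min_y, max_x, max_y)."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))


def polygon_area_and_centroid(ring):
    """Area and centroid of a simple polygon using the shoelace formula.

    `ring` is a list of (x, y) vertices; it is closed automatically if needed.
    """
    ring = list(ring)
    if ring[0] != ring[-1]:
        ring.append(ring[0])
    twice_area = 0.0
    cx = cy = 0.0
    for (x0, y0), (x1, y1) in zip(ring, ring[1:]):
        cross = x0 * y1 - x1 * y0
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if twice_area == 0:
        raise ValueError("degenerate polygon with zero area")
    signed_area = twice_area / 2.0
    # The sign of the area depends on vertex order; the centroid does not.
    centroid = (cx / (6.0 * signed_area), cy / (6.0 * signed_area))
    return abs(signed_area), centroid
```

For geographic coordinates (longitude/latitude), lengths and areas computed this way are not in metres; transforming to a suitable projected coordinate system first is the usual approach.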

Increasing efficiency and reducing costs using a uniform tool for ETL and Spatial ETL

When setting up data warehouses or evaluation databases using spatial data, this solution offers two major advantages:

(1) It is no longer necessary to use several tools and technologies. All types of data can now be processed seamlessly with a single tool. This saves the organizational cost of combining tools, reduces training effort and ensures a consistent approach to alphanumeric and spatial data.

(2) Proven and field-tested ETL technologies, like those already offered by Talend for attribute data, can now also be used for spatial data processing, so that Talend can be extended to become a tool for Spatial ETL.

In addition to the comprehensive set of data sources, components and routines supplied with GeoSpatial Integration, users retain all the functions that Talend already provides. These notably include version management, metadata management, support for distributed teams, release management, refactoring, central administration, load balancing and big data processing.

An example of Spatial ETL: Automated processing of shipping routes’ GPS tracks using Talend and GeoSpatial Integration, followed by visualization of the tracks using Cadenza.

Talend – a “leader” in the field of data integration

Talend in the “Magic Quadrant for Data Integration Tools” by Gartner (source: Gartner, August 2016). ETL or Spatial ETL are core processes in data integration.

Talend specializes in the integration of large amounts of data, up to big data and streaming data, and is now also one of the world's leading companies in the field of big data and cloud integration solutions. In 2016, Gartner named Talend a “leader” in its “Magic Quadrant for Data Integration Tools”.

Free download of the Spatial ETL tool, webinars and training sessions

The Spatial ETL plug-in “GeoSpatial Integration for Talend” is now available for Talend Open Studio. The plug-in is free for Talend Open Studio, so you can test it and use it directly in smaller scenarios. For companies and authorities who wish to use the solution in larger production systems and with Talend Data Integration, Talend Data Management Platform or Talend Real-time Big Data, an annual subscription is offered that includes professional support and additional functions for data quality, visualization etc.

If you wish to test Talend and GeoSpatial Integration for Talend, you can download the plug-in for Talend Open Studio using the following link: Download GeoSpatial Integration for Talend Open Studio.

If you have any questions about the plug-in, the version or Talend itself, please get in touch with us directly or attend one of our webinars or training sessions.