SFMTA speeding up mobility data using Talend Software and spatial ETL plug-in from Disy

From the bay to the hills: the San Francisco Municipal Transportation Agency (SFMTA) relies on Talend Real-Time Big Data and GeoSpatial Integration for Talend to process its mobility data on a single, unified platform. Together, the two products speed up the data processes behind SFMTA's various services.

San Francisco's transit agency manages mobility data with software from Disy

Around 800,000 people live in San Francisco, and almost 9.5 million live in the surrounding Bay Area on the West Coast of the USA. Every day, hundreds of thousands of residents, along with commuters coming into the city from the surrounding districts, use the public transportation system and services of the SFMTA. Construction sites and disruptions, however, are commonplace. To work around these obstacles on the ground, the SFMTA relies on a sophisticated data system in which all available data flows together in a so-called data lake. This includes data on individual events (accidents, access to the various means of transport, or temporary construction sites), but also location-specific data such as individual stops, temporarily closed bus stops, and even the location of curbs that limit access to stops. Data about the behavior of passengers and residents is also included. All of this data is managed with the software Talend Real-Time Big Data.

Integrated geodata and operations

Thanks to the spatial ETL plug-in GeoSpatial Integration for Talend, developed by Disy, geospatial data is integrated directly into Talend, so that all data, including geospatial data, can be processed uniformly on a single platform. The volumes are considerable: around 21,000 trips per day are made with e-scooters alone, and in other parts of the system 700,000 boardings are recorded per day. The plug-in provides ready-to-use geo operations that are added to a spatial ETL job via drag and drop and used within the workflow. Directly in the integration and transformation process, areas and lengths can be calculated, geometries intersected, and buffers, convex hulls, or bounding boxes created. Geometries can be quickly checked for validity, and the distance between them measured. Data processing and modeling thus take place within one management platform, and the resulting data quality is exceptionally high.
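To make the listed geo operations more concrete, here is a minimal pure-Python sketch of a few of them (area, length, bounding box, point-to-segment distance, and convex hull). It is purely illustrative: it does not show the plug-in's API, since Talend jobs are assembled graphically from components, and all function names below are hypothetical. It also assumes planar, projected coordinates (for example, meters), not raw longitude/latitude.

```python
import math
from typing import List, Tuple

# Hypothetical helpers for illustration only; not part of Talend or the Disy plug-in.
# Coordinates are assumed to be planar (projected), e.g. in meters.
Point = Tuple[float, float]


def polygon_area(ring: List[Point]) -> float:
    """Area of a simple polygon using the shoelace formula."""
    n = len(ring)
    total = 0.0
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]  # wrap around to close the ring
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def line_length(coords: List[Point]) -> float:
    """Length of a polyline (e.g. a trip route) as the sum of its segments."""
    return sum(math.dist(a, b) for a, b in zip(coords, coords[1:]))


def bounding_box(coords: List[Point]) -> Tuple[float, float, float, float]:
    """Axis-aligned bounding box as (min_x, min_y, max_x, max_y)."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return (min(xs), min(ys), max(xs), max(ys))


def point_segment_distance(p: Point, a: Point, b: Point) -> float:
    """Shortest distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:  # degenerate segment: a and b coincide
        return math.dist(p, a)
    # Project p onto the segment and clamp the parameter to [0, 1].
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.dist(p, (ax + t * dx, ay + t * dy))


def convex_hull(points: List[Point]) -> List[Point]:
    """Convex hull via Andrew's monotone chain, counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o: Point, a: Point, b: Point) -> float:
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


if __name__ == "__main__":
    route = [(0.0, 0.0), (3.0, 4.0), (3.0, 10.0)]
    print(line_length(route))   # 11.0
    print(bounding_box(route))  # (0.0, 0.0, 3.0, 10.0)
```

For real workloads, a mature geometry library would normally be used instead of hand-written helpers. The sketch only shows what each named operation computes.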

Consistently high quality of mobility data

Information about travel routes, availability, and usage patterns is generated and processed efficiently. Because spatial processing takes place directly within the process, additional tools or geodatabases are no longer needed for calculation and processing. Wassilios Kazakos, Head of Marketing and Business Development, welcomes the use of the Disy software at the SFMTA: "With the plug-in, SFMTA is able to provide its customers with consistently high, controllable quality of their mobility data with much faster and more efficient processes. GeoSpatial Integration for Talend, together with Talend software, is therefore perfect for all companies that process and use geospatial and location-based data, such as locations, areas, traffic routes, etc."

Time saved through direct integration of geodata sources

The Disy plug-in for the Talend platform is ideal because it offers direct integration with common geodata sources. Many data sources and databases are supported directly, namely PostGIS, Oracle Locator and Spatial, SpatiaLite, ArcGIS Server, Shapefile, and GeoJSON. This is particularly important for developing ETL/ELT processes and data warehouse solutions, but also for integrating streaming big data and for data processing in the cloud.

Expansion to real-time data processing planned

The SFMTA currently uses Talend and GeoSpatial Integration for Talend to process data according to the Mobility Data Specification (MDS) for e-scooters, bicycles, and mopeds, and to generate and process information about travel routes, availability, and usage patterns. The final data is made available to other applications via two interfaces. The volume of data is enormous: around 100,000 transactions are carried out per day, and the SFMTA expects this figure to rise to 200,000 per day. The agency also plans to use the data in real time, with required response times ranging from sub-second to 30 seconds depending on the service. With GeoSpatial Integration for Talend, the SFMTA is well prepared for this.

The GeoSpatial Integration for Talend plug-in for Talend Open Studio can be downloaded for free from Disy's website www.disy.net/spatial-etl. For Talend customers who need professional support, a commercial version is available as a subscription or in a bundle with one of the Talend platforms.