Run SQL queries with DuckDB in Jupyter over Parquet files stored in Azure ADLS storage

You can use DuckDB to query Parquet files on S3 directly thanks to the HTTPFS extension. Unfortunately, the same is not so easy for Azure’s ADLS Gen. 2 Storage (S3-like service...

Read XML using Spark - example for parsing OSM changeset data

Edit: fixed the XML schema, which had an error. Databricks created a library that allows Spark to parse XML files. Here I’ll convert the OpenStreetMap dump of changesets that can be downloaded...

Web Maps - part 3

Continuing from part two, let’s go through vector tile generators. Vector Tiles Generators: There are many software projects that can generate vector tiles. Here are some of them. OpenMapTiles: One of...

Web Maps - part 2

Continuing from part one, let’s go through vector tile formats and schemas based on OpenStreetMap data. Vector tiles: Part one and Mapbox docs 1 2 briefly explain what vector tiles are....

Analyse OpenStreetMap data published by AWS open data programme locally with Spark

In OSM Weekly I recently found an AWS blog post showing how you can use Athena to query the OpenStreetMap data that AWS hosts in their Public Datasets. If you don’t want...

Web Maps - part 1

Web maps have evolved greatly in recent years. Vector tiles are a particularly hot technology that has seen a lot of development recently. I wanted to provide a general intro into...

Import OpenStreetMap data to PostgreSQL with Imposm3

Introduction: If you want to import OpenStreetMap data into a PostgreSQL (+ PostGIS) database, two popular tools are osm2pgsql and imposm3. Both were designed to prepare data for rendering, although some...

Airflow relationships builder functions

Recently I encountered some Airflow util functions that seem very useful, but no tutorial that I have seen mentions them. These functions help with building task relationships in a DAG...