Download Parquet Data
Download ski resort datasets in Parquet format and learn how we create them.
How We Create the Data
Built from OpenStreetMap using a multi-step pipeline
Our datasets are built from OpenStreetMap (OSM) using a multi-step pipeline. We use regional PBF extracts from Geofabrik—continental or country-level OSM data—then process each region through an 11-step pipeline.
The 11-Step Pipeline
From OSM extract to GeoParquet output
1. winter_sports – Extract ski areas and winter-sport facilities from OSM
2. osm_nearby – Extract OSM features within ~2 km of each ski area
3. lifts and pistes – Extract lift lines and piste (trail) geometries
4. enrich – Add boundaries and administrative data, and enrich attributes
5. analyze – Compute statistics (trail counts, elevation, area, etc.)
6. parquet – Export to GeoParquet for compact storage and fast reads
7. buffer – Build a 1,000 ft buffer polygon around each ski area for mapping
8. translate – Add or fill English names for resort display
9. elevation / contours – Attach elevation and contour data per ski area
10. re-export CSV – Regenerate the analyzed CSV with elevation and final fields
11. combine_regions – Merge all regional outputs into one global dataset
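The steps above can be sketched as an ordered list of stages applied to each region. The stage names follow the list, but the function bodies and the `run_pipeline` helper here are illustrative, not code from the globalskiatlas_data repo.

```python
def make_stage(name):
    # A real stage would transform the region's working data on disk;
    # this toy version just records that the stage ran.
    def stage(state):
        state["log"].append(name)
        return state
    return stage

# The 11 stages, in the order the docs describe.
PIPELINE = [make_stage(n) for n in [
    "winter_sports", "osm_nearby", "lifts_and_pistes", "enrich",
    "analyze", "parquet", "buffer_1000ft", "translate",
    "elevation_contours", "reexport_csv", "combine_regions",
]]

def run_pipeline(region):
    state = {"region": region, "log": []}
    for stage in PIPELINE:
        state = stage(state)
    return state

result = run_pipeline("austria")
print(result["log"])  # all 11 stage names, in order
```

Keeping the stages as a flat ordered list makes it easy to resume a failed run from a given step, which matters when a single region takes hours.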
Regions & Deployment
Scale by region, merge globally
Regions are defined in config/regions.yaml. Large areas (Europe, North America, Asia) are split into countries, states, or sub-regions so each run stays manageable. After processing, we combine regional outputs into a single global dataset using our combine_regions script.
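The split-then-merge layout can be illustrated with a small sketch. The region names and the shape of the parsed config below are hypothetical stand-ins for whatever config/regions.yaml actually contains.

```python
# Hypothetical shape of config/regions.yaml after parsing; the real
# file's keys and region names may differ.
REGIONS = {
    "europe": ["france", "austria", "switzerland"],
    "north-america": ["us-west", "us-northeast", "canada"],
    "japan": [],  # small enough to run as a single extract
}

def processing_units(regions):
    """Yield one (region, sub_region) job per PBF extract to process.

    Large regions are split into sub-regions so each run stays within
    memory limits; small regions run as a single job.
    """
    for region, subs in regions.items():
        if subs:
            for sub in subs:
                yield (region, sub)
        else:
            yield (region, None)

jobs = list(processing_units(REGIONS))
print(len(jobs))  # 7 jobs: 3 + 3 sub-regions, plus japan as one unit
```

After all jobs finish, combine_regions reads each job's output and merges them into the single global dataset.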
The pipeline runs either locally with Docker or on AWS ECS Fargate for continent-wide batch jobs. Full Europe or North America runs take roughly 5–8 hours each.
View globalskiatlas_data on GitHub
Datasets
GeoParquet format — use with Pandas, DuckDB, GeoPandas
Each file has embedded geometry. Download below:
Why Iceberg & AWS Glue?
So lots of people and apps can use the same data without stepping on each other’s toes.
We have millions of rows about ski areas, lifts, and trails. If we only kept them in one big file, only one person could update them at a time, and it’d be easy to overwrite someone else’s work.
Apache Iceberg is like a tidy filing system in the cloud: it keeps the data in chunks, tracks changes over time, and lets many tools read or write without breaking anything. AWS Glue is the “card catalog” that tells everyone where to find those files—so data scientists, apps, and this website can all use the same tables without getting lost.
Together they give us one shared source of truth for ski data that stays consistent and is easy to query. The numbers below are live from that system.
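The "tidy filing system" idea can be shown with a toy model: data files are immutable, and each commit writes a new snapshot listing the current files, so a reader pinned to one snapshot never sees a concurrent writer's changes. This is an illustration of the concept only, not the real Iceberg metadata layout.

```python
import json

class ToyTable:
    """Toy Iceberg-style table: immutable data files + snapshot list."""

    def __init__(self):
        self.snapshots = []  # each snapshot: JSON listing the data files

    def commit(self, data_files):
        # A commit never rewrites old snapshots; it appends a new one.
        self.snapshots.append(json.dumps({"files": list(data_files)}))

    def read(self, snapshot_id=-1):
        # Readers resolve a snapshot once and get a stable file list.
        return json.loads(self.snapshots[snapshot_id])["files"]

table = ToyTable()
table.commit(["lifts-0001.parquet"])
reader_view = table.read()  # reader pins the first snapshot
table.commit(["lifts-0001.parquet", "lifts-0002.parquet"])  # writer adds a file

print(reader_view)    # still ['lifts-0001.parquet']
print(table.read())   # ['lifts-0001.parquet', 'lifts-0002.parquet']
```

AWS Glue plays the role of the `snapshots` list here at catalog scale: it tells every tool which table metadata is current.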
How we query it: query_iceberg.py and /api/iceberg-stats (Lambda)
Further Reading
Pipeline docs in the globalskiatlas_data repo
- LOCAL_WORKFLOW.md – Run the pipeline locally with Docker
- RUN_BY_REGION.md – Region layout, PBF sizes, OOM avoidance
- WORLD_SCALE.md – Roadmap for world-scale data and serving
- AWS_ECS_DEPLOYMENT.md – Deploy to AWS ECS Fargate and S3