Global Ski Atlas

Download Parquet Data

Download ski resort datasets in Parquet format and learn how we create them.

How We Create the Data

Built from OpenStreetMap using a multi-step pipeline

Our datasets are built from OpenStreetMap (OSM). We start from regional PBF extracts published by Geofabrik (continental or country-level OSM data) and process each region through an 11-step pipeline.


The 11-Step Pipeline

From OSM extract to GeoParquet output

  1. extract winter_sports – Pull ski areas and winter-sport facilities from OSM
  2. osm_nearby – Extract OSM features within ~2 km of each ski area
  3. lifts and pistes – Extract lift lines and piste (trail) geometries
  4. enrich – Add boundaries, administrative data, and enrich attributes
  5. analyze – Compute statistics (trail counts, elevation, area, etc.)
  6. parquet – Export to GeoParquet format for compact storage and fast reads
  7. 1000 ft buffer – Build a buffer polygon around each ski area for mapping
  8. translate – Add or fill English names for resort display
  9. elevation / contours – Attach elevation and contour data per ski area
  10. re-export CSV – Regenerate analyzed CSV with elevation and final fields
  11. combine_regions – Merge all regional outputs into one global dataset
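The flow of the steps above can be sketched in a few lines. The step names mirror the list, but the function names and return shapes here are assumptions for illustration, not the pipeline's actual code:

```python
# Minimal sketch of the pipeline's control flow. Steps 1-10 run once per
# region; combine_regions (step 11) runs once at the end over all regions.
PER_REGION_STEPS = [
    "extract_winter_sports", "osm_nearby", "lifts_and_pistes", "enrich",
    "analyze", "parquet", "buffer_1000ft", "translate",
    "elevation_contours", "reexport_csv",
]

def run_region(region: str) -> list[str]:
    """Run the ten per-region steps in order; each real step would read the
    previous step's output files and write its own."""
    return [f"{region}:{step}" for step in PER_REGION_STEPS]

def combine_regions(outputs: dict[str, list[str]]) -> list[str]:
    """Step 11: merge every regional output into one global dataset."""
    merged: list[str] = []
    for region_output in outputs.values():
        merged.extend(region_output)
    return merged

outputs = {r: run_region(r) for r in ["austria", "switzerland"]}
print(len(combine_regions(outputs)))  # 20: ten steps for each of two regions
```

The point of the ordering is that each step only depends on the outputs of earlier steps, so a failed region can be re-run from any step without touching the others.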

Regions & Deployment

Scale by region, merge globally

Regions are defined in config/regions.yaml. Large areas (Europe, North America, Asia) are split into countries, states, or sub-regions so each run stays manageable. After processing, we combine regional outputs into a single global dataset using our combine_regions script.
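The contents of config/regions.yaml are not shown here, so the fragment below is only a guess at its shape; the keys and region names are hypothetical:

```yaml
# Hypothetical sketch of config/regions.yaml; actual keys and names
# live in the globalskiatlas_data repo and may differ.
regions:
  europe:
    split: country        # large areas are split so each run stays manageable
    members: [austria, switzerland, france]
  north-america:
    split: state
    members: [colorado, utah, british-columbia]
```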

The pipeline runs either locally with Docker or on AWS ECS Fargate for continent-wide batch jobs. Full Europe or North America runs take roughly 5–8 hours each.

View globalskiatlas_data on GitHub

Datasets

GeoParquet format — use with Pandas, DuckDB, GeoPandas

Each file has an embedded geometry column. Download below:

  • ski_areas_analyzed.parquet – Ski areas with computed stats; the file the Online Atlas uses
  • ski_areas.parquet – Ski area polygons and core attributes
  • lifts.parquet – Lift lines as LineStrings with OSM attributes
  • pistes.parquet – Piste (trail) lines as LineStrings with difficulty attributes

Why Iceberg & AWS Glue?

So lots of people and apps can use the same data without stepping on each other’s toes.

We have millions of rows about ski areas, lifts, and trails. If we only kept them in one big file, only one person could update them at a time, and it’d be easy to overwrite someone else’s work.

Apache Iceberg is like a tidy filing system in the cloud: it keeps the data in chunks, tracks changes over time, and lets many tools read or write without breaking anything. AWS Glue is the “card catalog” that tells everyone where to find those files—so data scientists, apps, and this website can all use the same tables without getting lost.

Together they give us one shared source of truth for ski data that stays consistent and is easy to query. The numbers below are live from that system.
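Iceberg's core idea (immutable data files plus a log of table snapshots) can be illustrated with a toy model. This is an analogy in plain Python, not the Iceberg API:

```python
class ToyIcebergTable:
    """Toy model of Iceberg's snapshot mechanism: writers append immutable
    files and commit a new snapshot; readers pin a snapshot, so a concurrent
    read never sees a half-finished write. Not the real Iceberg API."""

    def __init__(self):
        self.snapshots = [[]]  # snapshot 0: empty table

    def commit(self, new_files):
        # A commit never rewrites old snapshots; it appends a new one.
        latest = self.snapshots[-1]
        self.snapshots.append(latest + list(new_files))

    def scan(self, snapshot_id=-1):
        # Reading an older snapshot_id gives "time travel" to past states.
        return self.snapshots[snapshot_id]

t = ToyIcebergTable()
t.commit(["lifts_part0.parquet"])
t.commit(["lifts_part1.parquet"])
print(t.scan())   # latest snapshot sees both files
print(t.scan(1))  # snapshot 1 still sees only the first file
```

In the real system, AWS Glue plays the card-catalog role: it maps a table name to the current Iceberg metadata, so every reader and writer agrees on which snapshot is "latest".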

How we query it (query_iceberg.py)
Live Iceberg stats on this page are served from the /api/iceberg-stats endpoint (an AWS Lambda function).

Further Reading

Pipeline docs in the globalskiatlas_data repo

  • LOCAL_WORKFLOW.md – Run the pipeline locally with Docker
  • RUN_BY_REGION.md – Region layout, PBF sizes, OOM avoidance
  • WORLD_SCALE.md – Roadmap for world-scale data and serving
  • AWS_ECS_DEPLOYMENT.md – Deploy to AWS ECS Fargate and S3