OK the way you're publishing the data with Parquet and making it accessible through DuckDB is spectacular.
Your README shows R and Python examples: https://github.com/dfsnow/opentimes?tab=readme-ov-file#using...
I got it working with the `duckdb` terminal tool like this:

```sql
INSTALL httpfs;
LOAD httpfs;
ATTACH 'https://data.opentimes.org/databases/0.0.1.duckdb' AS opentimes;

SELECT origin_id, destination_id, duration_sec
FROM opentimes.public.times
WHERE version = '0.0.1'
  AND mode = 'car'
  AND year = '2024'
  AND geography = 'tract'
  AND state = '17'
  AND origin_id LIKE '17031%'
LIMIT 10;
```
Lately DuckDB is becoming a serious competitor to Datasette in my own use, because it eliminates a step from most of my workflows: converting CSV to SQLite.
I've been thinking about how to swap it in as a backend for Datasette (maybe as a plugin?), but it seems inherently riskier, since at the very least it needs to be able to read a folder to list all the CSVs available for my use case. If I could hook that up with its native S3 support, I'd be unstoppable (at work).
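DuckDB can query a CSV file directly with its `read_csv_auto` table function. One hedged way to "read a folder" would be to glob the directory and register one view per file. This is a minimal sketch: the helper name `csv_view_statements` and the view-naming scheme are hypothetical, and the generated statements would still need to be run on a DuckDB connection.

```python
import re
from pathlib import Path


def csv_view_statements(folder):
    """Return one CREATE VIEW statement per CSV file found directly in `folder`.

    Hypothetical helper: the naming scheme (file stem -> identifier) is an assumption.
    """
    statements = []
    for path in sorted(Path(folder).glob("*.csv")):
        # Turn the file stem into a safe identifier, e.g. "a data" -> "a_data".
        name = re.sub(r"\W+", "_", path.stem).strip("_") or "csv"
        # Escape single quotes so the path is a valid SQL string literal.
        literal = str(path).replace("'", "''")
        statements.append(
            f"CREATE VIEW \"{name}\" AS SELECT * FROM read_csv_auto('{literal}');"
        )
    return statements
```

Pointing this at an S3 prefix instead of a local folder would need a different listing step, so that part is left out.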
I have a medium term ambition to make Datasette backends a plugin mechanism, and the two I am most excited about are DuckDB and PostgreSQL.
Thanks! I hadn't seen anyone do it this way before with a very large, partitioned dataset, but it works shockingly well as long as you're not trying to `SELECT *` the entire table. Props to the DuckDB folks.
Eventually I plan to add some thin R and Python wrapper packages around the DuckDB calls just to make it easier for researchers.
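A thin wrapper like the one described could amount to little more than building the filtered query with bound parameters. This is a sketch under that assumption. The function name `build_times_query` is hypothetical. The table and column names come from the DuckDB query earlier in the thread.

```python
def build_times_query(version, mode, year, geography, state,
                      origin_prefix=None, limit=None):
    """Build a parameterized query against opentimes.public.times.

    Hypothetical helper: returns (sql, params) for a DB-API style execute call.
    """
    sql = (
        "SELECT origin_id, destination_id, duration_sec\n"
        "FROM opentimes.public.times\n"
        "WHERE version = ? AND mode = ? AND year = ?\n"
        "  AND geography = ? AND state = ?"
    )
    params = [version, mode, year, geography, state]
    if origin_prefix is not None:
        # Prefix match on the origin GEOID, e.g. '17031' for Cook County, IL.
        sql += "\n  AND origin_id LIKE ?"
        params.append(origin_prefix + "%")
    if limit is not None:
        # int() guards against injecting anything but a number here.
        sql += f"\nLIMIT {int(limit)}"
    return sql, params
```

With the `duckdb` Python package, you would first run the same `INSTALL`/`LOAD`/`ATTACH` statements. The resulting pair could then be passed as `con.execute(sql, params)`. Binding values as parameters avoids string-formatting user input into SQL.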
I blogged a few more notes here: https://simonwillison.net/2025/Mar/17/opentimes/
Nice! I know a couple of projects that have been using this pattern.
- https://bsky.app/profile/jakthom.bsky.social/post/3lbarcvzrc...
- https://bsky.app/profile/jakthom.bsky.social/post/3lb4y65z24...
- https://skyfirehose.com
Love this distribution pattern. Users can go to the Parquet files or attach to your "curated views" on a small DuckDB database file.
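One plausible reason a filtered query is fast while `SELECT *` is not is file skipping. Suppose the Parquet files are laid out by partition columns. The `WHERE` clause above filters on `version`, `mode`, `year`, `geography`, and `state`, which suggests such a layout but does not confirm it. In that case, files whose path cannot match are never fetched. The sketch below assumes a Hive-style `key=value` directory layout. The paths are hypothetical, and this is not DuckDB's actual implementation. DuckDB's `read_parquet` does have a `hive_partitioning` option for reading this kind of layout.

```python
def parse_partitions(path):
    """Extract key=value directory segments from a Hive-style path."""
    parts = {}
    for segment in path.split("/"):
        if "=" in segment:
            key, value = segment.split("=", 1)
            parts[key] = value
    return parts


def prune(paths, **filters):
    """Keep only files whose partition values match every filter.

    Files dropped here never need to be downloaded or scanned.
    """
    return [
        p for p in paths
        if all(parse_partitions(p).get(k) == v for k, v in filters.items())
    ]
```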
Very cool! I love the interactive map, but have a couple UX suggestions:
I usually expect most features of a map to be zoom invariant, with the exception of level of detail. Having the colormap change is surprising, particularly that longer time buckets simply disappear as I zoom in. The two problems with this are that any time I zoom in or out I now have to double check the color key in case it's changed, and if I want to find a travel time for something far away I need to zoom in to find the destination and then back out to see the travel time. Perhaps you can let the user manually choose the colormap granularity/range, or find some way to have a colormap that works at all scales?
Second suggestion, related, is to display the travel time next to the geography ID in the bottom left corner. This would mitigate the issues with getting a good colormap, since I can then just hover over a geography to get its time anywhere that the colormap isn't sufficient.
These are great suggestions, thank you!
I played around with a single static colormap for all scales but couldn't find one that worked well/looked good. Perhaps I'll add a slider that lets you manually select the values of the legend.
The second suggestion is a no-brainer. I'll definitely get that added.
Maybe it sticks to whatever colors are initially shown, and there's a button to "reset colors"?
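Both ideas in this subthread keep the legend fixed across zoom levels. The breaks are either chosen by the user (the proposed slider) or locked to whatever is first shown until "reset colors" is pressed. Each geography is then binned against those breaks. This is a minimal sketch. The break values and class name are hypothetical, and it does not describe how the site colors tracts today.

```python
from bisect import bisect_left


def bucket_index(duration_sec, breaks_min):
    """Legend bucket for a travel time.

    `breaks_min` are ascending upper bounds in minutes, e.g. [15, 30, 45, 60].
    Bucket i covers (breaks[i-1], breaks[i]]; times above the last break
    fall into an overflow bucket numbered len(breaks_min).
    """
    return bisect_left(breaks_min, duration_sec / 60)


class StickyLegend:
    """Keeps the first breaks it is given until reset() is called."""

    def __init__(self):
        self.breaks_min = None

    def breaks(self, proposed_breaks_min):
        # Ignore new proposals (e.g. from zooming) once breaks are locked.
        if self.breaks_min is None:
            self.breaks_min = sorted(proposed_breaks_min)
        return self.breaks_min

    def reset(self):
        self.breaks_min = None
```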
Amazing! GitHub actions to compute a giant skim matrix is an incredible hack.
I pretty regularly work with social science researchers who have a need for something like this... will keep it in mind. For a bit we thought of setting something like this up within the Census Bureau, in fact. I have some stories about routing engines from my time there...
Totally! I've been using this pattern a lot and recently wrote about it.
Thank you for this excellent post! I've been developing [my own platform](https://github.com/MattTriano/analytics_data_where_house) that curates a data warehouse mostly of census and socrata datasets but I haven't really had a good way to share the products with anyone as it's a bit too heavyweight. I've been trying to find alternate solutions to that issue (I'm currently building out a much smaller [platform](https://github.com/MattTriano/fbi_cde_data) to process the FBI's NIBRS datasets), and your post has given me a few great implementations to study and experiment with.
This is a fantastic project, and well done! How are you thinking about implementing traffic (it feels like that's obviously essential for most practical use cases)?
Also, just so I'm clear: does the shading represent the mean travel time for the area, or the max travel time? For example, if you pick Walking and a neighborhood next to Central Park in New York City, the entirety of Central Park is shaded with a longer travel time, even though parts of the park are directly adjacent to that neighborhood and should be a short trip. Does this mean I could theoretically get to the farthest point of Central Park from the selected area in that travel time, or is it more like a mean travel time for the area, with some trips shorter or longer?
Looks cool. Please allow higher max zoom levels; it's hard to see individual street details on mobile.
Good point, will do!
Travel time context in general could be useful for retrieval before ranking in Yelp- or Google Maps-like searches for nearby events and places. My MVP use case is to configure a rerank based on a commute "score".
For example there is just no way I'm going to commute from Brooklyn to Manhattan on a Monday night to eat out.
I live across the river from Manhattan, and what I never understood was why Yelp and Google Maps uprank restaurants across the river on the other island, even though it's highly improbable I would go there.
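A commute-aware rerank could combine a search engine's relevance score with a travel-time penalty, after an optional hard cutoff at the retrieval stage. This is a hypothetical sketch. The linear penalty, the `weight`, and the `max_minutes` cutoff are illustrative choices. The travel times would come from a matrix like OpenTimes'.

```python
def rerank(results, travel_min, weight=0.05, max_minutes=None):
    """Re-order results by relevance minus a travel-time penalty.

    results:    list of (place_id, relevance) pairs; higher relevance is better.
    travel_min: dict mapping place_id -> travel time in minutes from the user.
    """
    scored = []
    for place_id, relevance in results:
        minutes = travel_min.get(place_id)
        if minutes is None:
            continue  # no travel-time estimate; dropped in this sketch
        if max_minutes is not None and minutes > max_minutes:
            continue  # retrieval-stage cutoff: never surfaced at all
        scored.append((relevance - weight * minutes, place_id))
    scored.sort(reverse=True)
    return [place_id for _, place_id in scored]
```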
I love how much this affects people on dating apps too, lol.
This is a wonderful project. Seemingly simple on the surface. I'd love to see some notes on the frontend implementation. I see OpenFreeMap as presumably the base map, which uses MBTiles, then custom geometry on top from PMTiles that I assume is generated for the project. I haven't found yet how the colormapping is done. Actually, lots to unpack here.
This is great! I've been thinking about building something like this for ages since I started using Smappen [0] for mapping travel times on road trips. Super useful way to travel if you're on an open-ended trip with flexibility.
[0] https://www.smappen.com/
Would be great to have multiple layers of times so you can plan out possible routes over days.
Cool project! For me, seeing commute times at 1 and 2 standard deviations would be useful too. People often like to quote the lowest travel times, when I am more interested in how much time I need to budget to travel between two places to be on time 95% of the time.
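Given repeated observations of one trip, a 95% on-time budget is the 95th percentile of those times. Under a normal assumption it is approximately the mean plus 1.645 standard deviations. For comparison, mean + 1σ is about the 84th percentile and mean + 2σ about the 97.7th. This sketch computes both estimates. The sample data is made up. OpenTimes itself publishes a single duration per pair, so the samples would have to come from elsewhere.

```python
import statistics


def travel_time_budget(samples_min, on_time=0.95):
    """Return (empirical, normal) time budgets for a given on-time rate.

    empirical: the requested percentile of the observed samples
               (whole percents only, e.g. 0.95 -> 95th percentile).
    normal:    mean + z * stdev, assuming trip times are normally distributed.
    """
    # 99 cut points at the 1st..99th percentiles; "inclusive" stays within the data.
    cuts = statistics.quantiles(samples_min, n=100, method="inclusive")
    empirical = cuts[round(on_time * 100) - 1]
    z = statistics.NormalDist().inv_cdf(on_time)
    normal = statistics.mean(samples_min) + z * statistics.stdev(samples_min)
    return empirical, normal
```

The two estimates diverge when the distribution has a long tail (one 40-minute outlier, say). In that case the empirical percentile is usually the more honest budget.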
I’ve been meaning to make a much more searchable realty system (think Redfin with significantly more filters) that can look across the whole country, not just a specific ZIP code. This would be awesome, as you could roughly compute travel times between points and then filter results more accurately.
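With tract-level travel times, a "within N minutes of work" filter reduces to a lookup keyed by (origin tract, destination tract). This assumes each listing has already been geocoded to a tract. The helper name and the GEOIDs in the example are hypothetical. Tract resolution is also coarse, which is why the filter is only "rough".

```python
def filter_listings(listings, times_sec, work_tract, max_minutes):
    """Keep listings whose tract is within max_minutes of work_tract.

    listings:  list of dicts with at least a 'tract' key (11-digit GEOID).
    times_sec: dict mapping (origin_id, destination_id) -> duration_sec,
               e.g. built from rows of the OpenTimes times table.
    """
    keep = []
    for listing in listings:
        seconds = times_sec.get((listing["tract"], work_tract))
        # Listings with no known travel time are excluded.
        if seconds is not None and seconds <= max_minutes * 60:
            keep.append(listing)
    return keep
```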
Well done, dfsnow!
* some islands seem hamstrung by the approach - see Vashon Island for example.
* curious what other datasets you might incorporate to handle trips an order of magnitude smaller, e.g. going a quarter mile to the store for a frozen pizza during the seventh-inning stretch.
This is an impressive project! Providing free access to such a massive dataset of pre-computed travel times could be a game-changer for researchers and policymakers who rely on spatial accessibility data but often face high costs with commercial providers.
The technical approach is also fascinating—using static Parquet files on R2 instead of a traditional database keeps costs low while maintaining accessibility via SQL and DuckDB. Offloading computation to GitHub Actions is a clever way to handle large-scale processing efficiently.
It'll be interesting to see how this evolves, especially with the potential addition of traffic-aware travel times. Great work!
Only $10/mo? Where can we go to cover a month?
This is super cool - I definitely have a lot to learn from this. I can probably use this for my project theretowhere.com
How did you decide on routing engine? I’ve used Graphhopper in the past — is OSRM an improvement?
I actually tried a couple different engines before landing on OSRM. I started with R5 (since it can also do public transit) then switched to Valhalla.
The main limiting factor was speed. Basically all routing engines except for OSRM are too slow to compute continent-scale travel time matrices. For reference, it took Valhalla around a week to finish 3 billion point-to-point pairs. OSRM did the same calculation in about 2 hours.
I can't speak to Graphhopper since I haven't tried it. Maybe something to test in the future!
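The figures above can be turned into throughput numbers. This is a quick check that takes "around a week" as exactly 7 days and "about 2 hours" as exactly 2 hours.

```python
PAIRS = 3_000_000_000
valhalla_sec = 7 * 24 * 3600   # "around a week"
osrm_sec = 2 * 3600            # "about 2 hours"

valhalla_rate = PAIRS / valhalla_sec   # point-to-point pairs per second
osrm_rate = PAIRS / osrm_sec
speedup = valhalla_sec / osrm_sec

print(f"Valhalla: {valhalla_rate:,.0f} pairs/s")  # Valhalla: 4,960 pairs/s
print(f"OSRM:     {osrm_rate:,.0f} pairs/s")      # OSRM:     416,667 pairs/s
print(f"Speedup:  ~{speedup:.0f}x")               # Speedup:  ~84x
```

So, on these rough numbers, OSRM was roughly two orders of magnitude faster for this workload.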
Yeah, OSRM precomputes routes, so if you just need the same mode of transportation and not dynamic params (like avoiding tolls on this route, etc.), it's gonna be a lot faster. Valhalla was designed for flexible/dynamic routing.
It precomputes partial routes that are combined at run time. :)
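To see why precomputation matters, compare it with the naive way to build a travel time matrix: run a full Dijkstra search from every origin. OSRM instead preprocesses the road graph, using either Contraction Hierarchies (CH) or Multi-Level Dijkstra (MLD), so that each query stitches together precomputed pieces. The sketch below is only the naive baseline on a toy graph, not OSRM's algorithm. The graph format (node -> list of (neighbor, seconds)) is an assumption for illustration.

```python
import heapq


def dijkstra(graph, source):
    """Shortest travel times (seconds) from source to every reachable node."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry; a shorter path was already found
        for neighbor, cost in graph.get(node, []):
            nd = d + cost
            if nd < dist.get(neighbor, float("inf")):
                dist[neighbor] = nd
                heapq.heappush(heap, (nd, neighbor))
    return dist


def travel_time_matrix(graph, origins, destinations):
    """Naive many-to-many matrix: one full graph search per origin."""
    matrix = {}
    for o in origins:
        dist = dijkstra(graph, o)
        for t in destinations:
            matrix[(o, t)] = dist.get(t)  # None if unreachable
    return matrix
```

On a continental road graph, each of those per-origin searches touches a huge number of nodes. That is the cost the preprocessing is designed to avoid.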
Makes sense! Heads up that Graphhopper’s matrix algorithm isn’t open sourced so probably won’t work for this use case. I’ve had good experiences with it otherwise.
Any plans on adding public transit?
In the next year or so maybe. The biggest obstacles to adding public transit are:
- Collecting all the necessary scheduling data (e.g. GTFS feeds) for every transit system in the country. Not insurmountable, since there are services that do this currently.
- Finding a routing engine that can compute nation-scale travel time matrices quickly. Currently, the two fastest open-source engines I've tried (OSRM and Valhalla) don't support public transit for matrix calculations and the engines that do support public transit (R5, OpenTripPlanner, etc.) are too slow.
Can you just compute the travel times from each bus/train/tram stop and publish the individual or merged travel times? I guess the merging would be the hard part.
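In its simplest form, the "merging" step would take the minimum over boarding and alighting stops of walk-to-stop + ride + walk-from-stop. This is a hypothetical sketch. It ignores waiting time, schedules, transfers, and the walk-only alternative, which is exactly where the hard part lives.

```python
def merged_time(walk_to_stop, ride, walk_from_stop):
    """Door-to-door estimate by combining per-stop travel times.

    walk_to_stop:   {stop: seconds from the origin to that stop}
    ride:           {(stop_a, stop_b): seconds riding from stop_a to stop_b}
    walk_from_stop: {stop: seconds from that stop to the destination}
    Returns the minimum over all boarding/alighting pairs, or None.
    """
    best = None
    for (a, b), ride_sec in ride.items():
        if a in walk_to_stop and b in walk_from_stop:
            total = walk_to_stop[a] + ride_sec + walk_from_stop[b]
            if best is None or total < best:
                best = total
    return best
```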
I would imagine actual nation-scale public transit would be relatively difficult, since even in the Northeast, transit networks are often not contiguous.
I feel like this would have to come from Apple/Google, due to them having numerous real time datapoints on the time it takes from the origin public transit stop to the destination transit stop.
And then they could determine the variance at certain times of days for certain public transit routes, and show your likelihood of reaching the destination by a certain time.
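Given historical trip times for one route at a given time of day, the "likelihood of reaching the destination by a certain time" could be estimated as the share of observed trips that finished within the budget. This is a minimal sketch. The observations are made up, and in practice they would be bucketed by route and hour.

```python
from bisect import bisect_right


def on_time_probability(observed_min, budget_min):
    """Share of observed trips that took no longer than budget_min."""
    if not observed_min:
        raise ValueError("need at least one observation")
    ordered = sorted(observed_min)
    # bisect_right counts observations <= budget_min.
    return bisect_right(ordered, budget_min) / len(ordered)
```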
The engineering behind it is very clever. Well done!
Very cool! What did you use to make the ER diagram?
I'm not OP, but he used mermaid.js's Entity Relationship Diagram model [0]. GitHub renders it automatically [1].
[0] https://mermaid.js.org/syntax/entityRelationshipDiagram.html
[1] https://github.blog/developer-skills/github/include-diagrams...
This map makes Des Moines, IA seem appealing.
Does something like this exist for Europe?
I don't know... It tells me that by car I can get someplace (that's geographically close!) in < 15 min, but it's always taken me 20...
It mentions pretty clearly that it doesn't include traffic data, since that all comes from proprietary vendors.
well done
It seems that it ignores bridges over rivers making the travel time wildly inaccurate.
It's just using OpenStreetMap tags for routing, so if a bridge is impassible by foot/bike according to OSM, then it won't be able to route there. See the Verrazano-Narrows Bridge in New York as an example.
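Whether a way counts as walkable is decided from its OSM tags, such as `highway`, `foot`, and `access`. The sketch below shows that kind of decision. The specific rules are a simplified assumption, not OSRM's foot profile. OSRM's real profiles are Lua scripts that handle many more tags and defaults.

```python
def allows_foot(tags):
    """Rough check of whether an OSM way is usable on foot.

    Simplified assumption: explicit foot=* wins, then access=*, and
    motorways (or ways with no highway tag) are treated as not walkable.
    """
    foot = tags.get("foot")
    if foot in ("yes", "designated", "permissive"):
        return True
    if foot == "no":
        return False
    if tags.get("access") in ("no", "private"):
        return False
    return tags.get("highway") not in (None, "motorway", "motorway_link")
```

If every way crossing a bridge evaluates to `False`, the router cannot use that bridge for walking, however short the crossing looks on the map.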
Ah okay, so it doesn't take automobile travel into account? While the bridge is certainly walkable with a heavy asterisk, the route between towns is highly inadvisable on foot (though doable by bicycle, but that's another discussion), with the rural highway sometimes lacking a shoulder.
Not anywhere I checked.