Prerequisites

1 Introduction

Welcome to the Mobile Tartu 2026 Transport Data Science workshop! This document outlines the prerequisites for the workshop, and is intended as homework to be completed before the session. If you have any issues completing the prerequisites, please get in touch on github.com/tdscience/tartu26 or by sending the organiser (Robin Lovelace) an email.

There are two main ways to run the workshop materials:

  1. Local setup on your computer (recommended, see Section 2).
  2. Cloud-based setup using GitHub Codespaces, Posit Cloud, or similar services (as a backup if you cannot get the software installed locally, see Section 3).

Within each approach, there are various options for software and data setup.

Which to use?

  • We recommend the local setup on your own machine so you can run the code directly, learn how to manage the required software, and have everything configured for future use.
  • If you run into issues installing the software locally, do not have admin rights to install tools like Docker, or have a machine with limited hardware resources, the cloud-based GitHub Codespaces option serves as an excellent, pre-configured backup that runs entirely in your browser.

Whichever option you choose, ensure you have a good handle on the software and data setup before the workshop to make the most of the session.

At a minimum, you should be able to reproduce the figure shown at the end of this document, which is an interactive flow map of Seville using the spanishoddata package.

If you’re just curious and want to dive in with minimal setup, click the link below to fire up a cloud-based environment and run the code:

Open in GitHub Codespaces

For more information, see the detailed cloud setup instructions in Section 3.

2 Local Setup

If you prefer to run the workshop materials on your own computer, we recommend a machine with a minimum of 16 GB RAM due to the size of the datasets we will be processing.
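To see whether your machine is close to the 16 GB recommendation, a check along the following lines may help. This is a minimal sketch using only the Python standard library; the function names and the 1 GiB tolerance are assumptions made for illustration, and the detection only works on platforms that provide `os.sysconf` (typically Linux and macOS, not Windows).

```python
import os

GIB = 1024 ** 3  # bytes in one gibibyte


def total_ram_gib():
    """Return installed RAM in GiB, or None if it cannot be determined."""
    try:
        page_size = os.sysconf("SC_PAGE_SIZE")
        n_pages = os.sysconf("SC_PHYS_PAGES")
    except (AttributeError, ValueError, OSError):
        # os.sysconf does not exist on Windows, and some platforms
        # do not support these configuration names.
        return None
    if page_size <= 0 or n_pages <= 0:
        return None
    return page_size * n_pages / GIB


def meets_recommendation(ram_gib, minimum_gib=16.0, tolerance_gib=1.0):
    """Check RAM against the recommended minimum.

    The tolerance (an assumption) allows for machines sold as "16 GB"
    that report slightly less because some memory is reserved.
    """
    return ram_gib >= minimum_gib - tolerance_gib


ram = total_ram_gib()
if ram is None:
    print("Could not detect RAM on this platform; check your system settings.")
else:
    print(f"Detected {ram:.1f} GiB RAM; meets recommendation: {meets_recommendation(ram)}")
```

If the check reports less memory than recommended, the cloud-based option in Section 3 is a reasonable fallback.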

The workshop materials support both R and Python development paths. Choose whichever of the setup options below (A, B, or C) best fits your preference.

For local setup, we recommend using the GitHub CLI (gh) to clone the repository and interact with GitHub from your terminal.

gh repo clone tdscience/tartu26

2.1 Option A: Local devcontainer (recommended if you have admin rights to install Docker)


This is the most reliable way to ensure a consistent environment. The repository includes a pre-configured .devcontainer that sets up all dependencies (R, Python, Quarto, VS Code extensions) inside an isolated container (the same environment used in the cloud-based backup option).

To use this option:

  1. Ensure you have Docker Desktop and VS Code installed on your machine.
  2. Install the Dev Containers extension in VS Code.
  3. Clone the repository and open the folder in VS Code.
  4. Press Ctrl+Shift+P (or Cmd+Shift+P on macOS) to open the Command Palette, and select Dev Containers: Reopen in Container.

2.2 Option B: Local R/RStudio installation


If you prefer a native R setup on your machine, please ensure you have the required runtimes, IDE, and tools installed.

2.2.1 Software Requirements

  • R (>= 4.3): The language runtime.
  • RStudio or Positron: The IDE.
  • Quarto CLI: Used to compile and render the workshop documents and workbook.
  • Git: Version control software to manage code and download files.

2.2.2 R Packages

Install the necessary packages for the workshop using the pak package (recommended):

install.packages("pak")
pak::pak("tdscience/tartu26")

Alternatively, you can install them manually using:

Manual installation instructions

If you don’t have {pak} installed, you can use install.packages() to install the required packages. Running the following code is equivalent to using pak to install the dependencies:

install.packages(c(
  "tidyverse",
  "sf",
  "tmap",
  "osmdata",
  "spanishoddata",
  "flowmapblue",
  "fs",
  "htmlwidgets"
))

2.3 Option C: Local Python/VS Code installation


If you prefer a native Python setup on your machine, we use Pixi, a fast package manager and alternative to conda/mamba, to install and manage all dependencies.

2.3.1 Software Requirements

  • Python (>= 3.10): The language runtime.
  • VS Code or Positron: The IDE.
  • Quarto CLI: Used to compile and render the workshop documents and workbook.
  • Git: Version control software to manage code and download files.
  • Pixi: A fast package manager. Follow the Pixi installation instructions for your operating system.
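Before the workshop, you may want to confirm that the tools listed above are on your PATH. The following is a minimal sketch, not part of the workshop materials: the executable names (`git`, `quarto`, `pixi`) are assumed to be the usual command names and may differ on your system, and the helper functions are hypothetical.

```python
import shutil
import sys

# Assumed command names for the tools listed above.
REQUIRED_TOOLS = ["git", "quarto", "pixi"]


def missing_tools(tools, which=shutil.which):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


def python_ok(version_info=sys.version_info, minimum=(3, 10)):
    """Return True if the Python version is at least `minimum`."""
    return tuple(version_info[:2]) >= minimum


missing = missing_tools(REQUIRED_TOOLS)
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("All required command-line tools were found.")

if not python_ok():
    print("Python 3.10 or newer is required; found", sys.version.split()[0])
```

The same approach works for the R path in Section 2.2.1 by changing the list of tool names.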

2.3.2 Installing Dependencies and Running Python

Once Pixi is installed, open your terminal in the cloned repository directory. You can initialize the project environment and run Jupyter Lab with:

pixi run jupyter lab

Or you can use VS Code’s Jupyter extension and select the environment created by Pixi.

We use DuckDB’s Python interface directly to query large transport datasets efficiently, as demonstrated in the separate demo-py.qmd file.

2.3.3 Rendering Quarto Documents with Pixi

To render or preview the Quarto files locally using the Python environment managed by Pixi (which contains nbclient and all data science dependencies), prefix your Quarto commands with pixi run:

pixi run quarto render demo-py.qmd
pixi run quarto preview

Alternatively, you can activate the Pixi environment shell first and then run Quarto normally:

pixi shell
quarto render demo-py.qmd

2.4 Data

We will use open datasets throughout the workshop. All required data will be downloadable during the sessions, but you may want to download the larger datasets in advance to save time.

See the releases page for links to the datasets used in the workshop.

You are welcome to bring your own origin-destination and network data if you wish to follow along and implement the concepts with your own datasets. There will be less support for custom datasets. You are encouraged to experiment with your own data after the workshop, and have fun!

3 Cloud Setup (Backup Option)

If you cannot install the required software locally on your computer, the cloud-based option allows you to run the complete workshop environment entirely in your web browser using GitHub Codespaces.

3.1 GitHub

To use GitHub Codespaces or access repository resources, you must have a GitHub account. Go to GitHub and sign up if you haven’t already done so.

Tip: 🎓 GitHub Academic Benefits (Free GitHub Pro & Copilot)

If you are a student, researcher, or teacher at an accredited academic institution, you can sign up for the GitHub Global Campus program to get free premium benefits:

  • GitHub Pro: Upgraded account limits, including more GitHub Codespaces hours and storage (ideal for running data-intensive environments like this workshop).
  • GitHub Copilot: Free access to GitHub’s AI developer tool to write code, debug, and learn faster.
  • Developer Packs: Free access to top-tier developer tools, cloud hosting credits, and subscriptions.

You can apply with your academic email address or school ID at education.github.com.

After you have a GitHub account, you can launch the workspace in GitHub Codespaces by clicking the badge below. Note: it takes around 5 minutes to set up the Codespace environment, so be patient, read up on some of the links on this website, or get a coffee!

Open in GitHub Codespaces

When you click on that button, assuming you’re signed in and do not have any existing codespaces associated with the repo already running, you should see a page with default settings for the codespace. You can leave them all as is except for machine type.

We will be handling a lot of data, so select the 4-core option, which gives you 16 GB RAM. Free usage is counted in core-hours (120 per month on a free personal account), so a 4-core machine gives about 30 hours of free use per month, compared with 60 hours on a 2-core machine. That should be plenty for processing the datasets we’ll be using in this tutorial. Go for a larger option if you want to import larger datasets, but be warned that your compute credits will burn down more quickly the larger the machine you’re on, as described in the GitHub Codespaces pricing docs.

About GitHub

While you wait for that to load, it’s worth learning a bit about what GitHub is. It’s the world’s number one platform for hosting and collaborating on code, especially for international and open-source projects. You can also host ‘private repos’, which let you choose who can see and collaborate on your work. Key concepts in GitHub include:

  • Repositories, also called repos. These are like folders for your projects, where you can store code, data, and documentation. The workshop materials are hosted in a GitHub repository. See the GitHub repository for the workshop at github.com/tdscience/tartu26 for more details.
  • Issues. These are like to-do lists or bug trackers for your code. You can use them to keep track of tasks, bugs, or feature requests for your projects. See the Issues tab in the workshop repository and open a test issue if you want to practice. It’s a great way to communicate, learn, get feedback, and collaborate with others.
  • Pull requests. These are like proposals for changes to your code. You can use them to suggest changes, review code, and merge changes into your projects. See the Pull Requests tab in the workshop repository for more details. Bonus: open a PR making a change to the README file. It’s a great way to practice and learn how PRs work (see the GitHub documentation on pull requests for more details).
  • Commits. These are like snapshots of your code at a given point in time.
  • Branches. These are like different versions of your code that you can work on separately and then merge together.

You can run the code in any GitHub repo in Codespaces by clicking on the green “Code” button and selecting “Open with Codespaces”, as described in the GitHub documentation and shown below. As a bonus, explore the options under the green “Code” button for the workshop repository, and try cloning locally or downloading the zip file. The “Codespaces” option launches the cloud-based environment for the workshop, while the “Download ZIP” option lets you download the repository files to your local machine without using Git.

The “Local” clone options (HTTPS, SSH, or GitHub CLI) are for cloning the repository to your local machine. This is not required if you are using Codespaces, but may be useful if you want to work on the materials locally, as described in Section 2.
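The three local clone methods differ only in the command and address used. The sketch below builds the command for each one, following GitHub’s standard URL patterns; the `clone_commands` helper is hypothetical and exists only to make the comparison concrete.

```python
def clone_commands(owner, repo):
    """Return the clone command for each method as a list of arguments."""
    slug = f"{owner}/{repo}"
    return {
        # HTTPS: works without SSH keys; may prompt for credentials on push.
        "https": ["git", "clone", f"https://github.com/{slug}.git"],
        # SSH: requires an SSH key registered with your GitHub account.
        "ssh": ["git", "clone", f"git@github.com:{slug}.git"],
        # GitHub CLI: requires the gh tool to be installed.
        "gh": ["gh", "repo", "clone", slug],
    }


for method, command in clone_commands("tdscience", "tartu26").items():
    print(f"{method:>5}: {' '.join(command)}")
```

Any of the three produces the same local copy of the repository, so pick whichever matches the authentication you already have set up.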

After the Codespace has finished ‘spinning-up’, you will see something like this:

A successfully loaded GitHub Codespaces session for this repository, with the file explorer on the left, a README preview in the editor, and a ready-to-use terminal at the bottom.

There are a few things worth noting about the Codespace environment shown above:

  1. The repository files are visible in the Explorer pane on the left (including .devcontainer, Quarto files, and workshop documents), confirming that the workspace has loaded correctly.
  2. The editor opens files directly in the browser-based VS Code interface. In the screenshot, README.md is shown in preview mode.
  3. The integrated terminal at the bottom is ready to run commands in the project directory (/workspaces/tartu26). This is where you can run workshop code and render Quarto documents.
  4. If your environment took around 5 minutes to start, this is expected for first launch: once loaded, you can proceed with the exercises normally.
  5. You have a unique Codespace URL (e.g. https://tdscience-tartu26-abc123-xyz456.codespaces.github.com) that you can bookmark and return to later, and share with others if you want to collaborate.

From that point, you’re almost ready to start writing and running code for the workshop! It’s worth being aware of a few different ways to run code interactively in the Codespace environment.

Ways to run code in GitHub Codespaces:

  1. Terminal: You can run any command-line code directly in the integrated terminal. This is useful for running scripts, installing packages, or executing commands that don’t require an interactive environment. To start an R session, type R in the terminal and press Enter. For Python, type python and press Enter.
  2. Quarto Documents (recommended): You can open existing or new .qmd files in the editor and run the code in the chunks interactively by pressing Ctrl+Enter (or Cmd+Enter on Mac), which runs the code line-by-line or runs only the selected part of the code. You can also use the “Run” buttons that appear when you hover over code chunks. This allows you to execute code and see results inline, which is great for learning and experimentation.
  3. Jupyter Notebooks: If you are familiar with Jupyter notebooks, these can be used in the same way as Quarto documents, allowing you to run code interactively and see results inline.
  4. quarto render and quarto preview: To render the entire Quarto site, run quarto render in the terminal to generate the outputs (e.g. HTML), or run quarto preview <filename.qmd> to render a single file and open a live preview in the browser.

To check that the code runs for this tutorial, open the demo.qmd file in the editor and run the code chunks interactively using Ctrl+Enter (or Cmd+Enter on Mac) to execute the code line-by-line.

If it worked, it should look something like this:

If so, congratulations: you have completed the prerequisites for the workshop and are ready to start learning about transport data science with R and Python!

Other cloud-based environments

Other cloud services are available. One that is well-suited for data science is Posit Cloud, but unfortunately it has a maximum session RAM use of only 1 GB, which is not enough for the datasets we’ll be using in this workshop. For these cloud services, you may still need to install the required packages the first time you launch a new session (see the R Packages section).

4 The outcome: an interactive flow map of Seville

After running the code shown below, from demo.qmd, you should see an interactive flow map of Seville in the Viewer pane of your IDE.

See an interactive version of the map by downloading and then opening the seville_flowmap_embed.html file from the github.com/tdscience/tartu26/releases page.
