Tools and Skills for Reproducible Transport Research

Day 1

Robin Lovelace

University of Leeds

Juan Fonseca

University of Leeds

September 8, 2025

Introduction

Course overview (see schedule)

Day 1

  • 09:30-10:00 Introduction
  • 10:00-11:00 Development environments, system commands, and version control
  • 11:15-12:30 Sharing code and data
  • 13:30-15:00 Reproducible papers and documentation with Quarto
  • 15:15-16:30 Cross-references and citations with Quarto

Day 2

  • 09:30-10:30 Drafting a reproducible paper
  • 10:45-12:30 Generating reproducible publication-quality visualisations
  • 13:30-14:30 Editing other people’s work
  • 14:45-16:00 Working on papers -> Presentations and wrap-up

Housekeeping

WiFi (if eduroam is not working):

  • SSID: Eventos IGOT
  • Password: sAfrcutm

Coffee will be downstairs

Toilets:

  • Women’s in front of Room 2.1
  • Men’s by the stairs and

About us

Robin Lovelace

  • Professor of Transport Data Science
  • Focus: influencing decision-making, to make it more evidence-based
  • R package developer and data scientist
  • New methods for more reproducible, data-driven and participatory transport planning

Juan Fonseca

  • PhD student at the University of Leeds
  • Focus: fast and flexible models for estimating traffic
  • Developer of {Telraamr} and {azuremapsr} R packages, the latter of which was recently published on CRAN

About you

  • Name
  • What tools you currently use for research
  • Where you’re from
  • What’s your favourite animal
  • A random fact about you

Me

  • Currently using VS Code, Quarto, R, Devcontainers, Google Gemini, etc.
  • From Herefordshire, UK
  • Favourite animal: Red kite
  • Random fact: I run 5 km every Saturday with a double buggy

Over to you

The origins of the course

“If only I was told this earlier in my career”

Imagine a workflow that enabled:

  • Fewer context switches
  • More focus on the content of the work rather than its style
  • Integration of code into your research manuscript
  • Automatic generation of results, including figures and tables…
    • That change seamlessly when input datasets or code change
  • Control over how you export and publish your work
    • Including publication-quality PDFs, websites, blogs and slides
  • Full reproducibility
  • You to share your work for maximum benefit to others

Reproducible research

“Research is considered to be reproducible when the exact results can be reproduced if given access to the original data, software, or code.” Source: displayr.com

Stages of open and reproducible science

  1. Open access to the publications

  2. Open access to sample (synthetic if sensitive) data

  3. Open access to the code

  4. Fully reproducible paper published with documentation

  5. Project deployed in a tool for non-specialist use

Example: rs5c conference slides

See slides website: robinlovelace.github.io/rs5c/ (source code github.com/robinlovelace/rs5c)

Example of reproducible research: networkmerge

See paper website: https://nptscot.github.io/networkmerge/ Source: github.com/nptscot

Example: biclaR

See biclar.tmobilidad.pt (source code: github.com/u-shift) (Félix, Moura, and Lovelace 2025).

Course principles

  • “Learn by doing”
  • “Learn by teaching”
  • “Learn from each other”
  • “We’re all learning”
  • “Growth mindset”
  • “Can-do” and “Go for It” attitude
  • “Every error is a learning opportunity”
  • “No such thing as a bad question”
  • “Fail fast”
  • Balance between focused work and comms
  • Bring your own principles (BYOP)

The practical sessions

  • Time for in-depth work
  • Use the course website as a reference point but spend most of the time in your own environment
  • Juan and I will support people 1-2-1 and do ‘live demos’ now and then

Tip

Press Alt+Tab (Cmd+Tab on macOS) to switch between your IDE and the browser showing the course content for an efficient workflow

Any questions before we move to the first practical session?

Session 1 introduction (see Session 1 workbook)

Session 1 in context

  • 09:30-10:00 Introduction
  • 10:00-11:00 Development environments, system commands, and version control
  • 11:15-12:30 Sharing code and data
  • 13:30-15:00 Reproducible papers and documentation with Quarto
  • 15:15-16:30 Cross-references and citations with Quarto

Which IDE?

Table 1: Comparison of IDEs for Data Science

Attribute      | RStudio                   | Positron                     | VS Code
---------------|---------------------------|------------------------------|------------------------------
Languages      | R                         | R + Python out-of-the-box    | Any (with extensions)
Status         | Mature ✅                 | Under development 🏗️         | Mature ✅
Setup time     | Minimal ✅                | Minimal ✅                   | Extensions needed ⚠️
Quarto         | Excellent ✅              | Excellent ✅                 | Excellent (extension) ✅
Devcontainers  |                           |                              |
Live Share     |                           |                              |
Extensions     | Limited (via Addins) ⚠️   | High (OpenVSX) ✅            | High (Marketplace) ✅
Codespaces     | Limited ⚠️                | Limited ⚠️                   |
License        | Restrictive open (AGPL) ✅ | Source-available (Elastic) ⚠️ | Open core (binaries closed) ⚠️
AI integration | Limited ⚠️                | High (various extensions) ✅ | High (various extensions) ✅

Which to use? Open to debate!

Source: Fosstodon

Source: Reddit

Which language to use?

Saw a post from an influencer telling followers to “stop using R for anything – use Python like a normal person”

As a 10+ year R & Python user, the irony is that 2025 is the best time EVER to be using R.

Here’s why:

Source: Kyle Walker (@kylewalker.bsky.social) on Bluesky, September 1, 2025 at 3:13 PM

Bonus: Live demo of VS Code (time permitting)

Git and the GitHub CLI

Principle: the command line is better than the graphical user interface (CLI > GUI)

  • Using a GUI may allow you to do something quicker the first time but will slow you down in the long run

  • CLI: harder the first time but will save time in the long term

  • The relationship between git and gh tools

    • git is a long-established version-control system with many commands
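    • gh is GitHub's command-line tool: a high-level interface to the GitHub platform (repos, issues, pull requests) that also wraps some git operations such as cloning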
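
As a quick check before the practical, the sketch below reports whether both tools are available on your system. The helper name check_tools is hypothetical (not part of either tool), and the comments list only a few typical commands for each.

```shell
# Report which of the two tools are installed.
# check_tools is a hypothetical helper, not part of git or gh.
check_tools() {
  for tool in git gh; do
    if command -v "$tool" >/dev/null 2>&1; then
      # Print only the first line of the version output
      "$tool" --version | head -n 1
    else
      echo "$tool: not installed"
    fi
  done
}

check_tools

# Typical git commands (local version control):
#   git status, git add, git commit, git branch, git merge, git push
# Typical gh commands (GitHub platform, after running `gh auth login`):
#   gh repo create, gh repo clone, gh issue list, gh pr create
```

If gh is reported as not installed, the git commands in the practical will still work, but the GitHub-specific steps will need the website instead.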

Solo working through the practical (until ~11:00)

Any questions before the coffee break (11:00-11:15)?

Put your hands up, ask another participant, or post on the discussion forum at github.com/tdscience/course/discussions

Session 2: Sharing code and data (see Session 2 workbook)

Key GitHub concepts and workflows

Finding repositories on GitHub

Use the search bar to discover projects and developers:

Or with the gh CLI tool 😎

Repository structure

Every GitHub repo follows a similar layout with tabs for Code, Issues, Pull requests, etc.:

Figure 2

Creating repositories

You can create repos from scratch or existing folders:

The repository creation window lets you set name, description, and visibility:
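
The same two routes can be sketched with the gh tool, assuming it is installed and authenticated. The helper names and repository names below are placeholders chosen for illustration.

```shell
# Both helpers are hypothetical wrappers; they need `gh auth login` to have been run.

# Option 1: create a new public repo on GitHub from scratch and clone it locally.
create_repo_from_scratch() {
  gh repo create "$1" --public --description "$2" --clone
}

# Option 2: publish the current folder, which must already be a git repo
# with at least one commit, as a private repo and push its contents.
publish_current_folder() {
  gh repo create "$1" --private --source=. --remote=origin --push
}

# Examples (repo names are placeholders):
# create_repo_from_scratch my-paper "Reproducible paper draft"
# publish_current_folder my-existing-project
```

The visibility flag (`--public` or `--private`) corresponds to the visibility setting in the repository creation window.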

Cloning and working locally

Clone repos to work on them locally:
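
A minimal sketch of cloning from the command line. The helper clone_and_inspect is hypothetical; the example URL points at the course repository referenced in the discussions link.

```shell
# Clone a repository into a new folder, then show where it came from
# and its three most recent commits.
# clone_and_inspect is a hypothetical helper.
clone_and_inspect() {
  repo_url="$1"
  target_dir="$2"
  git clone --quiet "$repo_url" "$target_dir"
  git -C "$target_dir" remote -v
  git -C "$target_dir" log --oneline -n 3
}

# Example (needs network access):
# clone_and_inspect https://github.com/tdscience/course.git course
# Equivalent with the GitHub CLI:
# gh repo clone tdscience/course
```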

Git workflow basics

Version control with commits, branches, and merges:

Git Workflow diagram
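
The commit, branch and merge cycle can be tried safely in a throwaway folder. This is a sketch under stated assumptions: the file names, branch name and identity are made up for the demo.

```shell
# Work in a temporary folder so nothing in your real projects is touched
demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init -q
# Placeholder identity, stored only in this demo repo
git config user.name "Demo User"
git config user.email "demo@example.com"

# First commit on the main branch
echo "# Demo project" > README.md
git add README.md
git commit -q -m "Add README"
git branch -M main            # make sure the branch is called main

# Do new work on a separate branch
git checkout -q -b add-analysis
echo 'summary(cars)' > analysis.R
git add analysis.R
git commit -q -m "Add analysis script"

# Switch back and merge the branch into main
git checkout -q main
git merge -q --no-edit add-analysis

# Show the resulting history, one line per commit
git log --oneline
```

Because main had no new commits in the meantime, the merge here is a simple fast-forward; merges of diverged branches create an extra merge commit.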

Making and committing changes

Stage and commit your work:
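
A minimal sketch of staging and committing a single file in a throwaway repo. The file name, commit message and identity are placeholders.

```shell
# Work in a temporary folder so nothing in your real projects is touched
demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init -q
# Placeholder identity, stored only in this demo repo
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "Ideas for the paper" > notes.md
git status --short            # prints: ?? notes.md (untracked)

git add notes.md              # stage the file
git status --short            # prints: A  notes.md (staged)

git commit -q -m "Add project notes"
git log --oneline             # one commit, with a short hash and the message
```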

Pushing to GitHub

Share your changes with the world:
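
The sketch below shows the push step without needing a GitHub account. It assumes a local bare repository standing in for GitHub; with a real repo the remote URL would be the GitHub address instead. All paths and names are placeholders.

```shell
# A temporary folder holding both the stand-in remote and the working repo
work="$(mktemp -d)"
git init -q --bare "$work/remote.git"    # stand-in for a GitHub repo (assumption)
git init -q "$work/local"
cd "$work/local"
# Placeholder identity, stored only in this demo repo
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "# Demo project" > README.md
git add README.md
git commit -q -m "Add README"
git branch -M main

# Register the remote under the conventional name origin, then push.
# -u records origin/main as the upstream, so later a plain `git push` is enough.
git remote add origin "$work/remote.git"
git push -q -u origin main

# List the branches that now exist on the remote
git ls-remote --heads origin
```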

Exercises: Solo working through the practical (until ~12:30)

Félix, Rosa, Filipe Moura, and Robin Lovelace. 2025. “Reproducible Methods for Modeling Combined Public Transport and Cycling Trips and Associated Benefits: Evidence from the biclaR Tool.” Computers, Environment and Urban Systems 117 (April): 102230. https://doi.org/10.1016/j.compenvurbsys.2024.102230.