In this application, we compare the speed of five different data-processing tools in R. Pressing the "Run" button on the "Speed Test" page initiates data-processing by the different tools. Processing speed is presented using a custom-built JavaScript-based "Speedometer" widget.
Data-processing is performed ten times for each tool. The duration for each iteration of the processing pipeline is stored. The speedometer presents "Iterations per second" in three different ways:
The tools and processing approaches in use are:
The dataset used is a subset of the taxi-journey dataset from nyc.gov. We downloaded the Yellow-Taxi data for 2024 and filtered it to keep only those journeys that began and ended at one of the three airports in the original dataset (Newark, LaGuardia or JFK).
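The filtering step can be sketched in base R as follows. This is a minimal illustration on toy data, not the code used to build the dataset: the column names PULocationID and DOLocationID and the zone IDs 1, 132 and 138 are assumptions based on the public TLC data dictionary and taxi-zone lookup, and should be checked against the downloaded files. The sketch also assumes that a journey starting and ending at the same airport is kept.

```r
# Toy stand-in for the Yellow Taxi trip records; the real data has 22 columns.
# Column names are assumed to follow the TLC data dictionary.
trips <- data.frame(
  PULocationID = c(1L, 132L, 138L, 48L, 132L),
  DOLocationID = c(132L, 138L, 230L, 1L, 132L),
  fare_amount  = c(90.5, 35.0, 42.0, 80.0, 12.5)
)

# Assumed taxi-zone IDs for the three airports (verify against the zone lookup).
airport_ids <- c(Newark = 1L, JFK = 132L, LaGuardia = 138L)

# Keep journeys whose pick-up AND drop-off zones are both airports.
airport_trips <- trips[
  trips$PULocationID %in% airport_ids & trips$DOLocationID %in% airport_ids,
]
```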
That filtered dataset was stored in three separate formats: a tibble, a parquet dataset and a SQLite database. These files were uploaded as 'pins' to our Posit Connect server for use within the application. The SQLite database was created locally and then uploaded; the other files were uploaded using pins::pin_write(). For the parquet dataset, this resulted in a single file, so no benefit could be gained from file-level partitioning of the parquet dataset.
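As a rough sketch of how objects and files can be pinned, the example below uses a temporary board in place of the Posit Connect board. The pin names and the toy data are hypothetical. The text does not say which function uploaded the SQLite file; pins::pin_upload() appears here only as one documented way to pin an existing file, not as the method the application used.

```r
library(pins)

# A temporary board stands in for the Posit Connect board (assumption).
board <- board_temp()

# Hypothetical small data frame in place of the filtered taxi data.
airport_trips <- data.frame(id = 1:3, fare_amount = c(90.5, 35.0, 12.5))

# pin_write() serialises an R object. type = "rds" avoids extra dependencies
# here; type = "parquet" would instead write a single parquet file.
pin_write(board, airport_trips, name = "airport-trips", type = "rds")

# An existing file on disk can be pinned as-is with pin_upload().
# A placeholder text file stands in for the SQLite database.
db_path <- file.path(tempdir(), "airport-trips.sqlite")
writeLines("placeholder for a SQLite file", db_path)
pin_upload(board, paths = db_path, name = "airport-trips-db")

# Reading back: pin_read() returns the object, pin_download() a local file path.
restored <- pin_read(board, "airport-trips")
local_db <- pin_download(board, "airport-trips-db")
```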
Each tool performs the following analysis on the 178k-row, 22-column dataset:
The resulting data (24 rows, 6 columns) was converted to a data frame for consistency.
So that data import and conversion are not included in the data-processing speeds, the datasets are ingested or created by the application before any processing is timed. That is, when the application starts, the tibble and parquet datasets are read in from the pin board, a connection to the SQLite database is created and a data.table is created from the tibble. The time these steps take does not contribute to the speed comparisons.
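The separation between setup and timed work can be illustrated with a small base-R sketch. The toy data, the process() function and the time_iterations() helper are all hypothetical; the application's pipelines and its exact way of summarising durations may differ. The point is only that the data is prepared once, outside the timed region, and that each of the ten iterations is timed individually.

```r
# Setup happens once, outside the timed region (toy data, not the real datasets).
set.seed(1)
trips <- data.frame(
  airport = sample(c("EWR", "JFK", "LGA"), 1000, replace = TRUE),
  fare    = runif(1000, 10, 100)
)

# Hypothetical processing step; the real pipelines differ per tool.
process <- function(df) aggregate(fare ~ airport, data = df, FUN = mean)

# Time only the processing call, once per iteration, and keep every duration.
time_iterations <- function(f, data, n = 10) {
  vapply(seq_len(n), function(i) {
    start <- proc.time()[["elapsed"]]
    f(data)
    proc.time()[["elapsed"]] - start
  }, numeric(1))
}

durations <- time_iterations(process, trips, n = 10)

# One possible summary: overall iterations per second.
# Guard against a zero total on timers with coarse resolution.
total <- sum(durations)
iterations_per_second <- if (total > 0) length(durations) / total else NA_real_
```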
Data is passed from Shiny to the browser using the R function sendCustomMessage, provided by Shiny, and then read on the front end using the JavaScript function Shiny.addCustomMessageHandler, also provided by Shiny.
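On the R side, sendCustomMessage is called as a method of the Shiny session object. The sketch below shows one way this might look. The helper send_speed(), the message type "speed-update" and the payload fields are hypothetical, not taken from the application. A stand-in session object is used so the function can be tried outside a running app.

```r
# Send one tool's timing results to the browser.
# `session` is the Shiny session object passed to the server function.
# The message type "speed-update" and the payload fields are hypothetical.
send_speed <- function(session, tool, durations) {
  session$sendCustomMessage(
    type = "speed-update",
    message = list(
      tool = tool,
      durations = durations,
      iterations_per_second = length(durations) / sum(durations)
    )
  )
}

# Stand-in session for trying the function outside a running app:
# it records the message instead of sending it to a browser.
sent <- NULL
fake_session <- list(
  sendCustomMessage = function(type, message) {
    sent <<- list(type = type, message = message)
  }
)
send_speed(fake_session, "data.table", c(0.02, 0.03, 0.05))
```

In the browser, a handler registered with Shiny.addCustomMessageHandler under the same type name receives the message as a JavaScript object, which the widget can then read.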
Results for a given tool are then aggregated and visualised as a custom-built moving Speedometer, created using the d3 JavaScript library.