Exploring correlations in large datasets with SQL in Node.js

How quickly we can uncover interesting insights by executing analytics queries with the DuckDB embedded SQL engine, just using Node.js?

Project Setup

$ mkdir nyc-taxi-correlations && cd nyc-taxi-correlations
$ yarn init -y
$ yarn add -D typescript ts-node nodemon
$ yarn add node-duckdb
✨ Done in 11.66s.
$ mkdir data src
$ code .
$ yarn nodemon -e ts --exec ts-node src/run

Getting to Work

The Correlation Function

const r = (x: string, y: string) => 
`covar_pop(${x}, ${y}) / (stddev_pop(${x}) * stddev_pop(${y}))`;
const result = await connection.executeIterator(
`SELECT count(),
${r("tip_amount", "trip_distance")} as tip_distance
FROM read_csv_auto('data/yellow_tripdata_2019-01.csv')`);
{
'count()': 7667792n,
tip_distance: 0.539269014536253,
}
...
tip_fare: 0.026611783378787134,
tip_passengers: 0.001088787150935159
...

Dataset Performance

...
FROM (
SELECT * FROM read_csv_auto('data/yellow_tripdata_2019-01.csv')
UNION
SELECT * FROM read_csv_auto('data/yellow_tripdata_2019-02.csv')
) as trips
{
'count()': 14687122n,
tip_distance: 0.5327676857033936,
tip_fare: 0.027202256434925805,
tip_passengers: 0.0008883969400021263
}

Not the Whole Story

{
'count()': 44459089n,
tip_distance: 0.032890025637241575,
tip_fare: 0.6334371687025606,

tip_passengers: -0.0001533928725559738
}
...
WHERE extract('month' FROM tpep_pickup_datetime) < 7
GROUP BY extract('month' FROM tpep_pickup_datetime)
...
{
month: 3n,
'count()': 7832325n,
tip_distance: 0.028248351449362007,
tip_fare: 0.9713848900417407,
tip_passengers: -0.0001662104560847138
},
...
{
month: 5n,
'count()': 7565359n,
tip_distance: 0.5258277351357648,
tip_fare: 0.5614612919769764,
tip_passengers: -0.002357654989447674
},
{
month: 6n,
'count()': 6940628n,
tip_distance: 0.11414505917858382,
tip_fare: 0.06219687943806697,
tip_passengers: -0.001546597640316774
}
...

--

--

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store