Datasets - Added Chicago Taxi Trips dataset (#3775)
* Datasets - Added Chicago Taxi Trips dataset
* Added details to the description
* Improved querying by URL-encoding all query parameters
* Renamed the directory to datasets
* Fixed the container image (I thought alpine had curl)
* Renamed the components file
* Fixed the quoting
* Renamed the directory
parent 76f7476c0f
commit 5e3d9aa791

@@ -0,0 +1,41 @@
name: Chicago Taxi Trips dataset
description: |
  City of Chicago Taxi Trips dataset: https://data.cityofchicago.org/Transportation/Taxi-Trips/wrvz-psew

  The input parameters configure the SQL query to the database.
  The dataset is pretty big, so limit the number of results using the `Limit` or `Where` parameters.
  Read [Socrata dev](https://dev.socrata.com/docs/queries/) for the advanced query syntax.
metadata:
  annotations:
    author: Alexey Volkov <alexey.volkov@ark-kun.com>
inputs:
- {name: Where, type: String, default: 'trip_start_timestamp>="1900-01-01" AND trip_start_timestamp<"2100-01-01"'}
- {name: Limit, type: Integer, default: '1000', description: 'Number of rows to return. The rows are randomly sampled.'}
- {name: Select, type: String, default: 'trip_id,taxi_id,trip_start_timestamp,trip_end_timestamp,trip_seconds,trip_miles,pickup_census_tract,dropoff_census_tract,pickup_community_area,dropoff_community_area,fare,tips,tolls,extras,trip_total,payment_type,company,pickup_centroid_latitude,pickup_centroid_longitude,pickup_centroid_location,dropoff_centroid_latitude,dropoff_centroid_longitude,dropoff_centroid_location'}
- {name: Format, type: String, default: 'csv', description: 'Output data format. Supports csv, tsv, xml, rdf, json'}
outputs:
- {name: Table, description: 'Result type depends on format. CSV and TSV have header.'}
implementation:
  container:
    image: curlimages/curl
    command:
    - sh
    - -c
    - |
      set -e -x -o pipefail
      output_path="$0"
      select="$1"
      where="$2"
      limit="$3"
      format="$4"
      mkdir -p "$(dirname "$output_path")"
      curl --get 'https://data.cityofchicago.org/resource/wrvz-psew.'"${format}" \
          --data-urlencode '$limit='"${limit}" \
          --data-urlencode '$where='"${where}" \
          --data-urlencode '$select='"${select}" \
          | tr -d '"' > "$output_path"  # Removes all double quotes, e.g. the unneeded ones around numbers
    - {outputPath: Table}
    - {inputValue: Select}
    - {inputValue: Where}
    - {inputValue: Limit}
    - {inputValue: Format}
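
Note on the argument order: the script reads the output path from `$0` because `sh -c SCRIPT ARG0 ARG1 ...` assigns the first argument after the script to `$0`, not `$1`. The sketch below reproduces only that argument handling so the mapping can be checked locally without network access. The path and query values are hypothetical stand-ins for what the pipeline would substitute for `outputPath` and `inputValue`.

```shell
#!/bin/sh
# Stand-in for the component's script: same positional assignments,
# but it prints the values instead of calling curl.
script='
output_path="$0"
select="$1"
where="$2"
limit="$3"
format="$4"
printf "%s|%s|%s|%s|%s" "$output_path" "$select" "$where" "$limit" "$format"
'

# Hypothetical values, passed in the same order as the command list:
# Table, Select, Where, Limit, Format.
result="$(sh -c "$script" /tmp/outputs/Table/data 'trip_id,fare' 'fare>10' 100 csv)"
echo "$result"
```

The first value after the script lands in `output_path`, which is why `{outputPath: Table}` has to come first in the command list, ahead of the four `inputValue` entries.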