{arcpbf} is an R package that processes Esri FeatureCollection Protocol
Buffers. It is written in Rust and powered by the extendr library.
arcpbf has functions for reading protocol buffer (pbf) results from the
ArcGIS REST API. pbf results are returned when f=pbf is set in a query.
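The package is lightweight and fast: in the benchmark further below, fetching and processing pbf responses took roughly a third of the time of the JSON equivalent.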
Limitation: this package does not support Z and M dimensions at this point.
- open_pbf() will read a FeatureCollection pbf file into a raw vector
- read_pbf() will read a FeatureCollection pbf file and process it
- resp_body_pbf() and resps_data_pbf() process httr2_response objects with FeatureCollection pbf bodies
- process_pbf() will process a raw vector or a list of raw vectors
- post_process_pbf() will apply post-processing steps to the results of process_pbf()
  - set use_sf = TRUE to return an sf object if possible. Applied by default in read_pbf(), resp_body_pbf() and resps_data_pbf().
Developer Note: Rust must be installed to compile the package. Run the one-line installation instructions at https://rustup.rs/. To verify that your Rust installation is compatible, run rextendr::rust_sitrep(). That’s it.
Note that only the FeatureCollection pbf specification is supported by
arcpbf. If you want to process OSM pbf files, use osmextract::oe_read().
Or, if you want to create and read arbitrary protocol buffers directly
in R, use RProtoBuf.
In most cases, we will be processing a protocol buffer directly from an
HTTP request created with {httr2}.
library(arcpbf)
# specify the url to send our request to
url <- "https://services.arcgis.com/P3ePLMYs2RVChkJx/arcgis/rest/services/ACS_Population_by_Race_and_Hispanic_Origin_Boundaries/FeatureServer/2/query?where=1=1&outFields=objectid&resultRecordCount=10&f=pbf&token="
req <- httr2::request(url)
resp <- httr2::req_perform(req)
resp
#> <httr2_response>
#> GET
#> https://services.arcgis.com/P3ePLMYs2RVChkJx/arcgis/rest/services/ACS_Population_by_Race_and_Hispanic_Origin_Boundaries/FeatureServer/2/query?where=1=1&outFields=objectid&resultRecordCount=10&f=pbf&token=
#> Status: 200 OK
#> Content-Type: application/x-protobuf
#> Body: In memory (60102 bytes)
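The query string in the URL above can also be assembled from named parameters rather than written by hand. The following is a minimal sketch, assuming httr2::req_url_query() is used to attach the same parameters; the names base_url and req_built are illustrative and not part of arcpbf.

```r
library(httr2)

# base query endpoint of the feature service used above
base_url <- "https://services.arcgis.com/P3ePLMYs2RVChkJx/arcgis/rest/services/ACS_Population_by_Race_and_Hispanic_Origin_Boundaries/FeatureServer/2/query"

# attach the query parameters; f = "pbf" requests a protocol buffer response
req_built <- request(base_url) |>
  req_url_query(
    where = "1=1",
    outFields = "objectid",
    resultRecordCount = 10,
    f = "pbf"
  )

# inspect the resulting url (no request is performed here)
req_built$url
```

The resulting request can be passed to httr2::req_perform() in the same way as the hand-written one.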
We can process request responses with resp_body_pbf(). Post-processing
steps are applied by default. The arguments post_process and use_sf are
TRUE by default.
resp_body_pbf(resp)
#> Simple feature collection with 10 features and 1 field
#> Geometry type: POLYGON
#> Dimension: XY
#> Bounding box: xmin: -17298700 ymin: 2216212 xmax: -17253470 ymax: 2261306
#> Projected CRS: WGS 84 / Pseudo-Mercator
#> OBJECTID geometry
#> 1 1 POLYGON ((-17264972 2244291...
#> 2 2 POLYGON ((-17264972 2244291...
#> 3 3 POLYGON ((-17264587 2241560...
#> 4 4 POLYGON ((-17263053 2239296...
#> 5 5 POLYGON ((-17261894 2236947...
#> 6 6 POLYGON ((-17262143 2241010...
#> 7 7 POLYGON ((-17261575 2233733...
#> 8 8 POLYGON ((-17263397 2234419...
#> 9 9 POLYGON ((-17269918 2239302...
#> 10 10 POLYGON ((-17265323 2239121...
When running multiple requests in parallel using
httr2::req_perform_parallel(), the responses are returned as a list of
responses. resps_data_pbf() processes the responses in a vectorized
manner.
# create a list of requests
reqs <- replicate(5, req, simplify = FALSE)
# perform them in parallel
resps <- httr2::req_perform_parallel(reqs)
# process the responses
resps_data_pbf(resps)
#> Simple feature collection with 50 features and 1 field
#> Geometry type: POLYGON
#> Dimension: XY
#> Bounding box: xmin: -17298700 ymin: 2216212 xmax: -17253470 ymax: 2261306
#> Projected CRS: WGS 84 / Pseudo-Mercator
#> First 10 features:
#> OBJECTID geometry
#> 1 1 POLYGON ((-17264972 2244291...
#> 2 2 POLYGON ((-17264972 2244291...
#> 3 3 POLYGON ((-17264587 2241560...
#> 4 4 POLYGON ((-17263053 2239296...
#> 5 5 POLYGON ((-17261894 2236947...
#> 6 6 POLYGON ((-17262143 2241010...
#> 7 7 POLYGON ((-17261575 2233733...
#> 8 8 POLYGON ((-17263397 2234419...
#> 9 9 POLYGON ((-17269918 2239302...
#> 10 10 POLYGON ((-17265323 2239121...
In some cases you may have a pbf file on disk that you want to process.
Use read_pbf() to do so. Again, post-processing steps are applied by
default.
fp <- system.file("small-points.pbf", package = "arcpbf")
read_pbf(fp)
#> Simple feature collection with 2 features and 1 field
#> Geometry type: POLYGON
#> Dimension: XY
#> Bounding box: xmin: -17298700 ymin: 2216212 xmax: -17260020 ymax: 2261306
#> Projected CRS: WGS 84 / Pseudo-Mercator
#> County geometry
#> 1 Hawaii County POLYGON ((-17264972 2244291...
#> 2 Hawaii County POLYGON ((-17264972 2244291...
There are three types of PBF FeatureCollection responses that may be returned as a result of a Feature Service Query request.
- Feature Results:
  - the default query response type. Contains individual features with their attributes and geometries if available.
- Count Result:
  - returned when returnCountOnly=true in an API request. Returned as a scalar integer vector.
- Object ID Result:
  - returned when returnIdsOnly=true. A data.frame containing object IDs where the column name is set to the object ID field name of the feature service.
Feature results can either omit geometry entirely, for example in the
case of a Table or when the query parameter returnGeometry=false is set,
or include it. When geometry is omitted entirely, the response is
processed as a simple data.frame. However, if the response does contain
geometry, the response is a bit more complex.
Unprocessed feature results with geometries return a named list with 3 elements:
- attributes:
  - a data.frame of the fields and their values
- sr:
  - a named list with elements wkt, wkid, latest_wkid, vcs_wkid, and latest_vcs_wkid. These determine the coordinate reference system of the response as well as the vertical coordinate reference system.
- geometry:
  - an sfc object without a computed bounding box or a coordinate reference system (CRS) set.
# read an example pbf without post-processing
fc_fp <- system.file("small-points.pbf", package = "arcpbf")
res <- read_pbf(fc_fp, post_process = FALSE)
res
#> $attributes
#> County
#> 1 Hawaii County
#> 2 Hawaii County
#>
#> $geometry
#> Geometry set for 2 features
#> Geometry type: POLYGON
#> Dimension: XY
#> Bounding box: xmin: NA ymin: NA xmax: NA ymax: NA
#> CRS: NA
#> POLYGON ((-17264972 2244291, -17264988 2244297,...
#> POLYGON ((-17264972 2244291, -17264967 2244286,...
#>
#> $sr
#> $sr$wkt
#> [1] NA
#>
#> $sr$wkid
#> [1] 102100
#>
#> $sr$latest_wkid
#> [1] 3857
#>
#> $sr$vcs_wkid
#> [1] NA
#>
#> $sr$latest_vcs_wkid
#> [1] NA
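Because the shape of the result depends on the response type, it can help to check what was returned before working with it. The helper below is a hypothetical sketch (describe_pbf_result() is not part of arcpbf) that distinguishes the shapes described above using base R only. The mock inputs are made up for illustration.

```r
# Hypothetical helper, not part of arcpbf: label a result by its shape
describe_pbf_result <- function(x) {
  if (inherits(x, "sf")) {
    # sf objects are also data.frames, so check for them first
    "post-processed features (sf)"
  } else if (is.data.frame(x)) {
    # feature results without geometry and object ID results are both data.frames
    "table (attributes only or object IDs)"
  } else if (is.numeric(x) && length(x) == 1L) {
    "count"
  } else if (is.list(x) && all(c("attributes", "geometry", "sr") %in% names(x))) {
    "unprocessed features with geometry"
  } else {
    "unknown"
  }
}

# mock inputs standing in for real results
describe_pbf_result(3143L)
describe_pbf_result(data.frame(OBJECTID = 1:3))
describe_pbf_result(list(attributes = data.frame(), geometry = list(), sr = list()))
```

Note that an object ID result and a geometry-free feature result cannot be told apart by shape alone, since both are plain data.frames.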
When post-processing is applied to a geometry Feature Result, the CRS is
set and the bounding box is computed. This requires the sf package to be
available.
post_process_pbf(res)
#> Simple feature collection with 2 features and 1 field
#> Geometry type: POLYGON
#> Dimension: XY
#> Bounding box: xmin: -17298700 ymin: 2216212 xmax: -17260020 ymax: 2261306
#> Projected CRS: WGS 84 / Pseudo-Mercator
#> County geometry
#> 1 Hawaii County POLYGON ((-17264972 2244291...
#> 2 Hawaii County POLYGON ((-17264972 2244291...
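To make the post-processing step more concrete, the sketch below performs a comparable operation by hand with {sf} on a mock unprocessed result. It is an illustration under assumptions, not the implementation used by post_process_pbf(): the mock values are invented, and preferring latest_wkid over wkid when choosing the CRS is an assumption. Note also that sf::st_sfc() already computes a bounding box for the mock geometry, unlike the geometry returned without post-processing.

```r
library(sf)

# mock of an unprocessed feature result; all values are illustrative
mock_res <- list(
  attributes = data.frame(County = c("A", "B")),
  geometry = st_sfc(
    st_point(c(-17264972, 2244291)),
    st_point(c(-17260020, 2261306))
  ),
  sr = list(
    wkt = NA, wkid = 102100L, latest_wkid = 3857L,
    vcs_wkid = NA, latest_vcs_wkid = NA
  )
)

# assumption: use latest_wkid when present, otherwise fall back to wkid
crs_code <- if (!is.na(mock_res$sr$latest_wkid)) {
  mock_res$sr$latest_wkid
} else {
  mock_res$sr$wkid
}

# attach the CRS to the geometry and combine it with the attributes
mock_sf <- st_sf(
  mock_res$attributes,
  geometry = st_set_crs(mock_res$geometry, crs_code)
)

st_crs(mock_sf)$epsg
st_bbox(mock_sf)
```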
The function open_pbf() will read a pbf file into a raw vector which can
be passed to process_pbf(). In general you will not need this function,
but it is handy for the sake of example.
pbf_raw <- open_pbf(fc_fp)
head(pbf_raw, 20)
#> [1] 12 ac fd 01 0a a8 fd 01 0a 08 4f 42 4a 45 43 54 49 44 12 0c
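For intuition about what a raw vector of file contents looks like, a base R analog is sketched below. This assumes nothing about how open_pbf() is implemented; it only shows that reading a file's bytes with readBin() also yields a raw vector. The temporary file and its four bytes are placeholders, not a valid FeatureCollection pbf.

```r
# write a few placeholder bytes to a temporary file standing in for a file on disk
tmp <- tempfile(fileext = ".pbf")
placeholder <- as.raw(c(0x12, 0xac, 0xfd, 0x01))
writeBin(placeholder, tmp)

# read every byte of the file back as a raw vector
raw_bytes <- readBin(tmp, what = "raw", n = file.size(tmp))

class(raw_bytes)
length(raw_bytes)
```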
A raw vector read with open_pbf() can be turned into an R object using
process_pbf(). The output will not be post-processed.
res <- process_pbf(pbf_raw)
res
#> $attributes
#> County
#> 1 Hawaii County
#> 2 Hawaii County
#>
#> $geometry
#> Geometry set for 2 features
#> Geometry type: POLYGON
#> Dimension: XY
#> Bounding box: xmin: NA ymin: NA xmax: NA ymax: NA
#> CRS: NA
#> POLYGON ((-17264972 2244291, -17264988 2244297,...
#> POLYGON ((-17264972 2244291, -17264967 2244286,...
#>
#> $sr
#> $sr$wkt
#> [1] NA
#>
#> $sr$wkid
#> [1] 102100
#>
#> $sr$latest_wkid
#> [1] 3857
#>
#> $sr$vcs_wkid
#> [1] NA
#>
#> $sr$latest_vcs_wkid
#> [1] NA
Post-processing can be applied to the result of process_pbf() using
post_process_pbf().
post_process_pbf(res)
#> Simple feature collection with 2 features and 1 field
#> Geometry type: POLYGON
#> Dimension: XY
#> Bounding box: xmin: -17298700 ymin: 2216212 xmax: -17260020 ymax: 2261306
#> Projected CRS: WGS 84 / Pseudo-Mercator
#> County geometry
#> 1 Hawaii County POLYGON ((-17264972 2244291...
#> 2 Hawaii County POLYGON ((-17264972 2244291...
post_process_pbf() can also be applied to a list of processed pbf
responses.
multi_res <- list(res, res, res)
post_process_pbf(multi_res)
#> Simple feature collection with 6 features and 1 field
#> Geometry type: POLYGON
#> Dimension: XY
#> Bounding box: xmin: -17298700 ymin: 2216212 xmax: -17260020 ymax: 2261306
#> Projected CRS: WGS 84 / Pseudo-Mercator
#> County geometry
#> 1 Hawaii County POLYGON ((-17264972 2244291...
#> 2 Hawaii County POLYGON ((-17264972 2244291...
#> 3 Hawaii County POLYGON ((-17264972 2244291...
#> 4 Hawaii County POLYGON ((-17264972 2244291...
#> 5 Hawaii County POLYGON ((-17264972 2244291...
#> 6 Hawaii County POLYGON ((-17264972 2244291...
Below is a benchmark that compares processing pbfs to the current approach of processing raw JSON in arcgislayers and arcgisutils. The code below recreates the example from the README of arcgislayers.
# json processing
jsn <- function() {
  json_reqs <- c(
    "https://services.arcgis.com/P3ePLMYs2RVChkJx/ArcGIS/rest/services/USA_Counties_Generalized_Boundaries/FeatureServer/0/query?outFields=%2A&where=1%3D1&outSR=%7B%22wkid%22%3A4326%7D&returnGeometry=TRUE&token=&f=json&resultOffset=0",
    "https://services.arcgis.com/P3ePLMYs2RVChkJx/ArcGIS/rest/services/USA_Counties_Generalized_Boundaries/FeatureServer/0/query?outFields=%2A&where=1%3D1&outSR=%7B%22wkid%22%3A4326%7D&returnGeometry=TRUE&token=&f=json&resultOffset=2001"
  )
  reqs <- lapply(json_reqs, httr2::request)
  # parse each response body as Esri JSON
  resps <- httr2::req_perform_parallel(reqs) |>
    lapply(function(x) arcgisutils::parse_esri_json(httr2::resp_body_string(x)))
  # combine the pages into a single sf object
  do.call(rbind.data.frame, resps) |>
    sf::st_as_sf()
}
# protobuf processing
pbf <- function() {
  pbf_reqs <- c(
    "https://services.arcgis.com/P3ePLMYs2RVChkJx/ArcGIS/rest/services/USA_Counties_Generalized_Boundaries/FeatureServer/0/query?outFields=%2A&where=1%3D1&outSR=%7B%22wkid%22%3A4326%7D&returnGeometry=TRUE&token=&f=pbf&resultOffset=0",
    "https://services.arcgis.com/P3ePLMYs2RVChkJx/ArcGIS/rest/services/USA_Counties_Generalized_Boundaries/FeatureServer/0/query?outFields=%2A&where=1%3D1&outSR=%7B%22wkid%22%3A4326%7D&returnGeometry=TRUE&token=&f=pbf&resultOffset=2001"
  )
  reqs <- lapply(pbf_reqs, httr2::request)
  httr2::req_perform_parallel(reqs) |>
    resps_data_pbf()
}
bench::mark(
  jsn(),
  pbf(),
  check = FALSE,
  relative = TRUE,
  iterations = 5
)
#> Warning: Some expressions had a GC in every iteration; so filtering is
#> disabled.
#> # A tibble: 2 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 jsn() 3.06 3.27 1 3.94 1.39
#> 2 pbf() 1 1 3.94 1 1
Internally, there is a Rust crate, esripbf, which is a Rust library
built with prost to handle the FeatureCollection Protocol Buffer
Specification.
Alternatively, it may make sense to write to a geoarrow array and convert to sfc using {wk}. These are just thoughts.