This page documents the different plugin function libraries available for use with Terrazzo. Plugins extend the capabilities of Terrazzo with functions that can be called as part of the transformation expressions in a build definition, such as in a compute or filter step.
There is also a rich set of “built-in” functions that can be used without loading any plugins. These are provided by the DataFusion library that Terrazzo is based on. You can see the list of built-in functions and how to invoke them here.
Geometry
The Terrazzo geometry plugin provides routines for geospatial calculations, such as intersections and areas. For the most part, the functions in this plugin are implemented in terms of the corresponding algorithms in the GEOS geometry library.
Make the geometry plugin available to your build by adding this to the top of your build definition:
"plugins": ["terrazzo_geometry"],
As of this writing, all of the functions in this plugin expect to receive geospatial arguments encoded in WKB (Well-Known Binary) format. When applying these functions over column values, the columns must be of the Binary data type.
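To make the WKB requirement concrete, here is a minimal sketch of how a simple polygon is laid out in WKB, using only Python's standard library. The helper name polygon_to_wkb is hypothetical and is not part of Terrazzo. In practice a geometry library would normally produce these bytes for you.

```python
import struct

def polygon_to_wkb(ring):
    """Encode one exterior ring [(x, y), ...] as a little-endian WKB Polygon.

    Hypothetical helper for illustration; not part of the Terrazzo plugin.
    """
    ring = list(ring)
    if ring[0] != ring[-1]:
        ring.append(ring[0])             # WKB rings repeat the first point at the end
    out = struct.pack("<BI", 1, 3)       # byte order 1 = little-endian, type 3 = Polygon
    out += struct.pack("<I", 1)          # number of rings (exterior only, no holes)
    out += struct.pack("<I", len(ring))  # number of points in the ring
    for x, y in ring:
        out += struct.pack("<dd", x, y)  # each coordinate is a 64-bit float
    return out

unit_square = polygon_to_wkb([(0, 0), (1, 0), (1, 1), (0, 1)])
print(unit_square[:5].hex())  # header: byte-order flag followed by geometry type
```

Bytes like these are what a Binary column would need to hold for each row before it is passed to one of the functions below.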
geom_area
geom_area - Returns the area of a polygonal geometry.
geom_area(GEOMETRY: Binary) -> Float64
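As a rough illustration of what an area computation returns, the sketch below applies the shoelace formula to a single ring of coordinates. It assumes the plugin follows GEOS in computing a planar area, so the result is in squared units of the input coordinates (for longitude/latitude data that would be square degrees, not square meters). The function ring_area is a hypothetical stand-in, not the plugin's implementation, and it ignores holes.

```python
def ring_area(ring):
    """Planar area of a simple polygon ring [(x, y), ...] via the shoelace formula.

    Hypothetical stand-in for illustration; ignores interior rings (holes).
    """
    total = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]   # wrap around to close the ring
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0          # abs() makes the result independent of winding order

print(ring_area([(0, 0), (1, 0), (1, 1), (0, 1)]))  # unit square
```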
geom_contains
geom_contains - Tests if every point of B lies in A, and their interiors have a point in common.
geom_contains(GEOMETRY_A: Binary, GEOMETRY_B: Binary) -> Boolean
geom_difference
geom_difference - Computes a geometry representing the part of geometry A that does not intersect geometry B.
geom_difference(GEOMETRY_A: Binary, GEOMETRY_B: Binary) -> Binary
geom_equals
geom_equals - Tests if two geometries include the same set of points.
geom_equals(GEOMETRY_A: Binary, GEOMETRY_B: Binary) -> Boolean
geom_intersection
geom_intersection - Computes a geometry representing the shared portion of geometries A and B.
geom_intersection(GEOMETRY_A: Binary, GEOMETRY_B: Binary) -> Binary
geom_intersects
geom_intersects - Tests if two geometries intersect (they have at least one point in common).
geom_intersects(GEOMETRY_A: Binary, GEOMETRY_B: Binary) -> Boolean
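The difference between testing for intersection and computing the intersection can be sketched with axis-aligned rectangles, a deliberately simplified stand-in for general geometries. All names below are hypothetical and nothing here calls the plugin or GEOS. The point of the sketch is that two shapes that only touch along an edge still have a point in common, while their shared portion has zero area.

```python
from typing import Optional, Tuple

# Hypothetical rectangle type: (min_x, min_y, max_x, max_y)
Box = Tuple[float, float, float, float]

def box_intersection(a: Box, b: Box) -> Optional[Box]:
    """Shared portion of two rectangles, or None if they have no point in common."""
    min_x, min_y = max(a[0], b[0]), max(a[1], b[1])
    max_x, max_y = min(a[2], b[2]), min(a[3], b[3])
    if min_x > max_x or min_y > max_y:
        return None
    return (min_x, min_y, max_x, max_y)  # may be degenerate (a segment or a point)

def box_intersects(a: Box, b: Box) -> bool:
    """True if the rectangles share at least one point, including boundary contact."""
    return box_intersection(a, b) is not None

def box_area(box: Box) -> float:
    return (box[2] - box[0]) * (box[3] - box[1])

a = (0.0, 0.0, 2.0, 2.0)
touching = (2.0, 0.0, 3.0, 1.0)     # shares only part of an edge with a
overlapping = (1.0, 1.0, 3.0, 3.0)  # shares a 1 x 1 square with a

print(box_intersects(a, touching), box_area(box_intersection(a, touching)))
print(box_intersects(a, overlapping), box_area(box_intersection(a, overlapping)))
```

If a build needs geometries that genuinely overlap rather than merely touch, one plausible approach under these semantics is to check that the area of the intersection is greater than zero instead of relying on the intersection test alone.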
NYC address
The Terrazzo NYC address plugin provides routines for translating addresses to and from various formats used in New York City-related datasets, such as voter files.
Make the NYC address plugin available to your build by adding this to the top of your build definition:
"plugins": ["terrazzo_nyc_address"],
normalize_address_nyc_boe_to_pluto
normalize_address_nyc_boe_to_pluto - Attempts to convert an address that follows the conventions of the NYC county boards of elections voter files into the form that appears in NYC Planning’s PLUTO database.
normalize_address_nyc_boe_to_pluto(ADDRESS: Utf8, CITY: Utf8) -> Utf8
Here ADDRESS is the house number and street name of the target address, and CITY is the target NYC borough or, in the case of Queens, the neighborhood name. The function returns an address string (house number and street name) representing a best-effort attempt to normalize the input address.
Normalization transformations include whitespace trimming, conversion of ordinal street names (“FIRST” vs. “1”), and corrections of common, borough-specific street name misspellings.
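The kinds of transformations listed above can be sketched in a few lines of Python. This is only an illustration of the general idea. The lookup tables, the upper-casing step, the direction of the ordinal conversion (word to digit), and the function name normalize_sketch are all assumptions made for the example, and the plugin's actual rules are not documented here.

```python
# Hypothetical lookup tables for illustration; the plugin's real tables may differ.
ORDINAL_WORDS = {
    "FIRST": "1", "SECOND": "2", "THIRD": "3", "FOURTH": "4", "FIFTH": "5",
    "SIXTH": "6", "SEVENTH": "7", "EIGHTH": "8", "NINTH": "9", "TENTH": "10",
}
MISSPELLINGS_BY_CITY = {
    "BRONX": {"GRAND CONCORSE": "GRAND CONCOURSE"},  # made-up example entry
}

def normalize_sketch(address: str, city: str) -> str:
    """Best-effort normalization sketch; not the plugin's implementation."""
    # Trim and collapse whitespace; upper-casing is an assumption of this sketch.
    tokens = address.upper().split()
    # Assumed direction: ordinal words become digits ("FIRST" -> "1").
    tokens = [ORDINAL_WORDS.get(token, token) for token in tokens]
    result = " ".join(tokens)
    # Apply corrections specific to the given borough or neighborhood, if any.
    for wrong, right in MISSPELLINGS_BY_CITY.get(city.strip().upper(), {}).items():
        result = result.replace(wrong, right)
    return result

print(normalize_sketch("  12  east first street ", "Manhattan"))
print(normalize_sketch("100 Grand Concorse", "Bronx"))
```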