Introduction
Specifications for storing and transmitting neuronal morphology and connectivity data using the Apache Arrow data model, and models compatible with it (e.g. Apache Parquet).
About Apache Arrow
From arrow.apache.org:
Apache Arrow defines a language-independent columnar memory format for flat and nested data, organized for efficient analytic operations on modern hardware like CPUs and GPUs. The Arrow memory format also supports zero-copy reads for lightning-fast data access without serialization overhead.
Using Apache Arrow gives neurarrow implementors access to a large ecosystem of existing software libraries across languages, as well as the ability to exchange that data between language runtimes and processes with minimal serialisation cost.
The use of standard binary formats such as parquet also allows the data to be read now and in the future without neurarrow-specific implementations.
Prior art
These software packages manage tabular neuroscience data:
- navis
- The neurarrow specification was originally based on navis’ parquet IO
- CATMAID
- natverse
These file formats describe tabular neuroscience data:
These specifications build on Apache Arrow with domain-specific schemas:
- geoarrow and geoparquet
Links
Development happens on github, and the rendered specification is at https://clbarnes.github.io/neurarrow/.