Introduction

neurarrow provides specifications for storing and transmitting neuronal morphology and connectivity data using the Apache Arrow data model and file formats compatible with it (e.g. Apache Parquet).

About Apache Arrow

From arrow.apache.org:

Apache Arrow defines a language-independent columnar memory format for flat and nested data, organized for efficient analytic operations on modern hardware like CPUs and GPUs. The Arrow memory format also supports zero-copy reads for lightning-fast data access without serialization overhead.
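To make "columnar" concrete, the following minimal sketch uses only the Python standard library (not Arrow itself) to pivot SWC-style node rows into one contiguous, typed array per field. The field names (`node_id`, `structure`, `x`, `y`, `z`, `radius`, `parent`) mirror SWC's seven columns and are illustrative assumptions, not the neurarrow schema.

```python
from array import array

# Row-oriented records, as an SWC file stores them:
# (node_id, structure, x, y, z, radius, parent); parent -1 marks the root.
rows = [
    (1, 1, 0.0, 0.0, 0.0, 5.0, -1),
    (2, 3, 10.0, 0.0, 0.0, 1.0, 1),
    (3, 3, 20.0, 5.0, 0.0, 0.8, 2),
]

# Hypothetical field names with array typecodes ("q" = int64, "d" = float64).
FIELDS = [
    ("node_id", "q"),
    ("structure", "q"),
    ("x", "d"),
    ("y", "d"),
    ("z", "d"),
    ("radius", "d"),
    ("parent", "q"),
]


def to_columns(records):
    """Pivot a list of row tuples into one contiguous typed array per field."""
    columns = {name: array(code) for name, code in FIELDS}
    for record in records:
        for (name, _), value in zip(FIELDS, record):
            columns[name].append(value)
    return columns


columns = to_columns(rows)

# A per-column query now scans one contiguous buffer instead of every row.
total_radius = sum(columns["radius"])
print(columns["parent"].tolist())  # [-1, 1, 2]
print(len(columns["x"]))  # 3
```

Arrow applies the same idea with a standardised, language-independent buffer layout (plus validity bitmaps for nulls), which is what lets different libraries share the buffers directly.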

Using Apache Arrow gives neurarrow implementors access to a large ecosystem of existing software libraries across languages, as well as the ability to exchange data between language runtimes and processes with minimal serialisation cost.

The use of standard binary formats such as Parquet also allows the data to be read now and in the future without neurarrow-specific implementations.

Prior art

These software packages manage tabular neuroscience data:

- navis
  - The neurarrow specification was originally based on navis’ parquet IO
- CATMAID
- natverse

These file formats describe tabular neuroscience data:

- SWC (and SWCplus)

These specifications build on Apache Arrow with domain-specific schemas:

- geoarrow and geoparquet

Links

Development happens on GitHub, and the rendered specification is at https://clbarnes.github.io/neurarrow/.
