Zarr Concepts

Zarr is the underlying storage format for OME-NGFF. Understanding Zarr concepts helps when working with OME-Zarr data.

Key Concepts

Stores

Zarr arrays can be stored in various backends:

  • Directory store: Files on local disk (most common)

  • S3 store: Cloud object storage (Amazon S3, MinIO, etc.)

  • HTTP store: Read-only access via HTTP/HTTPS

  • Memory store: In-memory storage for testing

Groups and Arrays

Zarr organizes data hierarchically:

  • Groups: Containers that can hold arrays and other groups (like folders)

  • Arrays: N-dimensional data chunks with metadata

Chunks

Large arrays are divided into chunks for efficient access:

  • Each chunk is stored as a separate file/object

  • Only needed chunks are loaded into memory

  • Chunk shape affects performance for different access patterns

import zarr

# Create array with specific chunk shape
arr = zarr.zeros((10000, 10000), chunks=(1000, 1000))

Sharding

Zarr now provides the option to bundle multiple chunks into a so-called shard file (see sharding proposal). On local file systems, this reduces the number of files written to disk, which improves several aspects of data handling (i.e., data transfer speeds). Moreover, depending on the used file system, there are limits on the total number of binary files per volume, which sharding helps to mitigate. On remote file systems, this can limit the overhead during streaming operations (i.e., when reading data from an S3 store), which can improve performance.

The API to actually write shards depends on the respective implementation and usage of the Zarr standard. In ome-zarr-py, the sharding options are documented here.

Zarr v2 vs v3

OME-NGFF v0.4 uses Zarr v2, while OME-NGFF v0.5 uses Zarr v3:

Feature

Zarr v2

Zarr v3

Metadata file

.zarray, .zgroup

zarr.json

Sharding

No

Yes

Codecs

Limited

Extensible

Resources