Apache Arrow .NET

An implementation of Arrow targeting .NET.

See our feature matrix for currently available features.

Implementation

  • Arrow specification 1.0.0. (Support for reading 0.11+.)
  • C# 11
  • .NET Standard 2.0, .NET 6.0, .NET 8.0 and .NET Framework 4.6.2
  • Asynchronous I/O
  • Uses modern .NET runtime features such as Span<T>, Memory<T>, MemoryManager<T>, and System.Buffers primitives for memory allocation, memory storage, and fast serialization.
  • Uses the Acyclic Visitor pattern for array types and arrays to facilitate serialization, record batch traversal, and format growth.
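
The visitor mechanism named in the last bullet can be sketched roughly as follows. This is a minimal illustration, assuming the IArrowArrayVisitor and IArrowArrayVisitor<T> interfaces and the Accept method exposed by the Apache.Arrow package; the ColumnSummaryVisitor and VisitorDemo names are hypothetical and exist only for this example.

```csharp
using Apache.Arrow;

// Hypothetical visitor: handles Int32Array and StringArray explicitly and
// falls back to a generic handler for every other array type.
public sealed class ColumnSummaryVisitor :
    IArrowArrayVisitor<Int32Array>,
    IArrowArrayVisitor<StringArray>
{
    public string Summary { get; private set; } = string.Empty;

    public void Visit(Int32Array array)
    {
        long sum = 0;
        for (int i = 0; i < array.Length; i++)
        {
            int? value = array.GetValue(i); // null for a null slot
            if (value.HasValue)
            {
                sum += value.Value;
            }
        }
        Summary = $"int32 sum={sum}";
    }

    public void Visit(StringArray array)
    {
        Summary = $"string count={array.Length}";
    }

    // Fallback for array types this visitor does not know about.
    public void Visit(IArrowArray array)
    {
        Summary = $"unsupported: {array.Data.DataType.Name}";
    }
}

public static class VisitorDemo
{
    public static string Summarize(IArrowArray array)
    {
        var visitor = new ColumnSummaryVisitor();
        // Accept hands the array to the typed Visit overload when the visitor
        // implements one for that array type, and to the fallback otherwise.
        array.Accept(visitor);
        return visitor.Summary;
    }
}
```

Because the visitor only declares the array types it cares about, new array types can be added to the format without changing existing visitors; they simply land in the fallback overload.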

Known Issues

  • Cannot read Arrow files containing tensors.
  • Cannot easily modify allocation strategy without implementing a custom memory pool. All allocations are currently 64-byte aligned and padded to 8 bytes.
  • Default memory allocation strategy uses an over-allocation strategy with pointer fixing, which results in significant memory overhead for small buffers. A buffer that requires a single byte for storage may be backed by an allocation of up to 64 bytes to satisfy alignment requirements.
  • There are currently few builder APIs available for specific array types. Arrays must be built manually with an arrow buffer builder abstraction.
  • FlatBuffer code generation is not included in the build process.
  • Serialization implementation does not perform exhaustive validation checks during deserialization in every scenario.
  • Throws exceptions with vague, inconsistent, or non-localized messages in many situations.
  • Throws exceptions that are not specific to the Arrow implementation in some circumstances where Arrow-specific exceptions would be more appropriate (e.g., does not throw ArrowException).
  • Lack of code documentation
  • Lack of usage examples
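
The manual array construction mentioned in the list above can be sketched as follows. This is an illustration only, assuming the ArrowBuffer.Builder<T> type, ArrowBuffer.Empty, and the five-argument Int32Array constructor from the Apache.Arrow package; the ManualArrayExample class and its method are hypothetical names.

```csharp
using Apache.Arrow;

public static class ManualArrayExample
{
    // Builds an Int32Array by hand from a value buffer,
    // without going through a typed array builder.
    public static Int32Array BuildInt32Array(params int[] values)
    {
        var valueBuilder = new ArrowBuffer.Builder<int>();
        foreach (int value in values)
        {
            valueBuilder.Append(value);
        }
        ArrowBuffer valueBuffer = valueBuilder.Build();

        // There are no nulls here, so an empty validity bitmap is passed
        // together with a null count of zero and an offset of zero.
        return new Int32Array(
            valueBuffer,        // values
            ArrowBuffer.Empty,  // validity bitmap
            values.Length,      // length
            0,                  // null count
            0);                 // offset
    }
}
```

A call such as ManualArrayExample.BuildInt32Array(10, 20, 30) would produce a three-element array whose slots can be read back with GetValue.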

Usage

The following example reads the first RecordBatch from an Arrow IPC file using an ArrowFileReader:

using System.Diagnostics;
using System.IO;
using System.Threading.Tasks;
using Apache.Arrow;
using Apache.Arrow.Ipc;

// Reads the first record batch from an Arrow IPC file.
// Returns null if the file contains no record batches.
public static async Task<RecordBatch> ReadArrowAsync(string filename)
{
    using (var stream = File.OpenRead(filename))
    using (var reader = new ArrowFileReader(stream))
    {
        var recordBatch = await reader.ReadNextRecordBatchAsync();
        if (recordBatch != null)
        {
            Debug.WriteLine("Read record batch with {0} column(s)", recordBatch.ColumnCount);
        }
        return recordBatch;
    }
}
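
A file may hold more than one record batch, so a loop over the reader is a natural variation on the example above. The sketch below assumes that ReadNextRecordBatchAsync returns null once the file is exhausted and that RecordBatch exposes Length and is disposable; the BatchLoopExample class and CountRowsAsync method are hypothetical names.

```csharp
using System.IO;
using System.Threading.Tasks;
using Apache.Arrow;
using Apache.Arrow.Ipc;

public static class BatchLoopExample
{
    // Counts the rows across every record batch in an Arrow IPC file.
    public static async Task<long> CountRowsAsync(string filename)
    {
        long rows = 0;
        using (var stream = File.OpenRead(filename))
        using (var reader = new ArrowFileReader(stream))
        {
            RecordBatch batch;
            // Assumption: a null result signals that no batches remain.
            while ((batch = await reader.ReadNextRecordBatchAsync()) != null)
            {
                using (batch)
                {
                    rows += batch.Length;
                }
            }
        }
        return rows;
    }
}
```

Disposing each batch inside the loop keeps at most one batch's buffers alive at a time, which matters for large files.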

Status

Memory Management

  • Allocations are 64-byte aligned and padded to 8 bytes.
  • Allocations are automatically garbage collected.
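
The alignment and padding arithmetic described above can be illustrated with a small sketch. It is plain C# with no Arrow dependency and does not reproduce the library's allocator; the AlignmentSketch class and its members are hypothetical names, and only the 64-byte and 8-byte figures come from the text.

```csharp
public static class AlignmentSketch
{
    public const int Alignment = 64; // start addresses are 64-byte aligned
    public const int Padding = 8;    // lengths are padded to a multiple of 8 bytes

    // Rounds a requested length up to the next multiple of the padding size.
    public static long PaddedLength(long length)
    {
        return (length + Padding - 1) / Padding * Padding;
    }

    // Number of bytes to skip from a raw address to reach the next
    // 64-byte boundary (zero if the address is already aligned).
    public static int OffsetToAligned(long address)
    {
        return (int)((Alignment - (address % Alignment)) % Alignment);
    }
}
```

For example, PaddedLength(1) is 8 and PaddedLength(9) is 16, while OffsetToAligned(65) is 63. The last case shows why small buffers are costly under over-allocation: in the worst case almost a full alignment unit is skipped before the usable region starts.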

Arrays

Primitive Types

Parametric Types

Type Metadata

Serialization

IPC Format

Compression

Not Implemented

  • Serialization
    • Exhaustive validation
    • Run End Encoding
  • Types
    • Tensor
  • Arrays
  • Array Operations
    • Equality / Comparison
    • Casting
  • Compute
    • There is currently no API available for a compute / kernel abstraction.