DataTUI - CSV, JSON, Parquet handler with Rust efficiency

Python is easy to use, but it's memory-hungry. That's often a problem, especially when you handle large data files (CSV, JSON, Parquet). DataTUI isn't developed in Python, but in Rust. Let's see why this is an advantage.

In data science, workloads often involve massive datasets with hundreds of megabytes or gigabytes in memory. This is where Python’s dynamic typing and object wrappers can become a bottleneck. Each Python object (for example, floats in a list or DataFrame columns) carries significant per-object memory overhead and must be tracked by the garbage collector (GC).
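To get a feeling for the per-object overhead, here is a minimal sketch using only the standard library. It compares a plain list of Python floats with a packed `array('d')` buffer, which stores raw 8-byte doubles contiguously, much like a NumPy or Arrow column would. The exact numbers depend on the interpreter build, so treat them as indicative rather than as a benchmark.

```python
import sys
from array import array

n = 100_000
values = [float(i) for i in range(n)]

# A list stores pointers; every float here is a separate heap object.
list_bytes = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)

# array('d') stores raw 8-byte doubles in one contiguous buffer.
packed = array("d", values)
packed_bytes = sys.getsizeof(packed)

print(f"list of floats: {list_bytes / n:.1f} bytes per value")
print(f"array('d'):     {packed_bytes / n:.1f} bytes per value")
```

On a typical 64-bit CPython build the list variant needs several times the memory of the packed buffer, before any DataFrame machinery is even involved.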

Rust, in contrast, compiles directly to machine code and works with contiguous memory layouts similar to C or C++. This allows libraries written in Rust (such as Polars, Arrow, or DataFusion) to handle tabular and numerical data far more efficiently, especially when scanning or aggregating large files.

Because Rust resolves memory management at compile time through ownership, there are no GC pauses, and Rust-based data frames can stream and process data in a cache-friendly, vectorized manner with minimal allocations.

  • pandas builds on NumPy, which has a C-based backend.
  • Polars has a Rust-based backend.

DataTUI - the data Terminal User Interface (TUI) for exploration

DataTUI is great for exploration tasks, such as quickly loading an unfamiliar dataset, say a set of annual reports for various equities.

DataTUI is a fast, keyboard-first terminal data viewer for CSV, Excel, SQLite, Parquet, and JSON. It offers tabs, SQL, filters, JMESPath queries, exports, and more.
  • Polars-based
  • keyboard-driven TUI
  • wide range of supported file formats
  • allows you to skip complex OLAP / DWH solutions for medium-scale data
    • closes a gap
    • with a little upskilling you can use it for Power Query / DAX-style workflows (known from Excel / Power BI) and accelerate your workflow

Problem: no memory spilling to disk. The data has to fit in RAM, so you may need a beefy laptop or host.

DataTUI has some advanced features (note: synthetic data):
datatui --load 'csv:file.csv'

VisiData is the Python equivalent and uses pandas 1.x.

  • pandas 2 comes with performance improvements, but these do not seem to include memory spilling. Personally, I find this highly annoying, because it would be relatively easy to implement. Instead, the pandas project recommends chunking.
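Chunking simply means reading a fixed number of rows at a time and aggregating as you go, so only one chunk lives in memory. The sketch below shows the idea with the standard library's `csv` module instead of pandas (in pandas, the counterpart is the `chunksize` argument of `read_csv`). The helper names, the column names, and the sample data are made up for illustration.

```python
import csv
import io
from itertools import islice


def iter_chunks(rows, chunk_size):
    """Yield lists of at most chunk_size items from an iterator."""
    while True:
        chunk = list(islice(rows, chunk_size))
        if not chunk:
            return
        yield chunk


def chunked_sum(handle, column, chunk_size=2):
    """Sum one numeric column while holding only one chunk in memory."""
    reader = csv.DictReader(handle)
    total = 0.0
    for chunk in iter_chunks(reader, chunk_size):
        total += sum(float(row[column]) for row in chunk)
    return total


# Hypothetical sample data standing in for a large file on disk.
sample = io.StringIO("ticker,revenue\nAAA,1.5\nBBB,2.5\nCCC,4.0\n")
total = chunked_sum(sample, "revenue")
print(total)
```

This works well for simple aggregations, but anything that needs the whole table at once (sorts, joins, group-bys with many keys) pushes the bookkeeping onto you, which is exactly what a spilling engine would take care of.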

For this use case, VisiData therefore no longer adds value.

DuckDB

Given the shortcomings of many pandas- and Polars-based projects, I prefer to use DuckDB or ArcticDB.

  • DuckDB with a memory limit spills to disk once the limit is reached, which solves the spilling problem
  • ArcticDB can use an LMDB backend on disk and is effective for pandas workflows

For me, DuckDB is the winner once the file size exceeds roughly 60% of RAM.
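As a rough sketch, the rule of thumb and the memory-limit setup could look like this. The two helper functions are hypothetical, and the 4 GB limit and the spill directory are placeholder values. `memory_limit` and `temp_directory` are DuckDB settings, and the generated statements are meant to be run on a DuckDB connection (for example via `con.execute(...)` in the Python client) before querying the file.

```python
GIB = 1024 ** 3


def should_use_duckdb(file_bytes, ram_bytes, threshold=0.6):
    """Rule of thumb from above: prefer DuckDB once the file exceeds threshold * RAM."""
    return file_bytes > threshold * ram_bytes


def duckdb_spill_settings(limit_gb, temp_dir="/tmp/duckdb_spill"):
    """Build SET statements that cap DuckDB's memory and name a spill directory.

    limit_gb and temp_dir are placeholder values chosen for illustration.
    """
    return [
        f"SET memory_limit = '{limit_gb}GB';",
        f"SET temp_directory = '{temp_dir}';",
    ]


ram = 16 * GIB
print(should_use_duckdb(12 * GIB, ram))  # 12 GiB file on a 16 GiB machine
for statement in duckdb_spill_settings(4):
    print(statement)
```

With the limit in place, operations that exceed it can write intermediate data to the temp directory, so a file larger than RAM becomes a matter of patience rather than an out-of-memory crash.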
