#cli

DataTUI - CSV, JSON, Parquet handler with Rust efficiency

Oct 20, 2025

3 min read

Python is easy to use, but it's memory-hungry. That's often a problem, especially when you handle large data files (CSV, JSON, Parquet). DataTUI isn't developed in Python, but in Rust. Let's see why this is an advantage.

In data science, workloads often involve massive datasets with hundreds of megabytes or gigabytes in memory. This is where Python’s dynamic typing and object wrappers can become a bottleneck. Each Python object (for example, floats in a list or DataFrame columns) carries significant per-object memory overhead and must be tracked by the garbage collector (GC).

Rust, in contrast, compiles directly to machine code and works with contiguous memory layouts similar to C or C++. This allows libraries written in Rust (such as Polars, Arrow, or DataFusion) to handle tabular and numerical data far more efficiently, especially when scanning or aggregating large files.

Because memory is managed at compile time without GC pauses, Rust-based data frames can stream and process data in a cache-friendly, vectorized manner with minimal allocations.

Libraries like pandas use numpy, have a C-based backend.
Or polars which has a Rust-based backend.

DataTUI - the data Terminal User Interface (TUI) for exploration

DataTUI is great for exploration tasks. Quickly load an unfamiliar dataset. Let's say a set of annual reports for various equities.

polars based
keyboard TUI
wide-range file support
allows you to skip complex OLAP / DWH solutions for medium-scale data
- closes a gap
- with a little upkilling you can use it for PowerQuery / DEX style workflows (known from Excel / PowerBI) and accelerate your workflow

Problem: no memory spilling. You may need a beefy laptop / host.

`datatui` has some advanced features (note: synthetic data).

datatui --load 'csv:file.csv'

Visidata is the Python equivalent and uses Pandas 1.x

pandas 2 comes with performance improvements, but these do not seem to include memory spilling. Personally, I find it highly annoying because it would be relatively easy to do. Instead, the Pandas project recommends chunking.

visidata therefore has no value-add anymore.

DuckDB

Given the shortcomings of many pandas and polars based projects, I prefer to use DuckDB or ArcticDB

DuckDB with a memory limit solves the data spilling topic
ArcticDB can use an LMDB backend on disk and is effective for pandas flows

For me, the winner is duckdb if the filesize > 60% RAM.

#cli #data-science #tools

DataTUI - CSV, JSON, Parquet handler with Rust efficiency

DataTUI - the data Terminal User Interface (TUI) for exploration

Visidata is the Python equivalent and uses Pandas 1.x

DuckDB

Share This Article

Related Articles

Atlassian Confluence Migration / Exit Plan

Your open-source interactive agent to get things done

Nexterm for simple remote SSH and RDP sessions via the browser

Turbo for faster CLI edits

About Us

Quick Links

Recent Posts

AI = job killa?

Atlassian Confluence Migration / Exit Plan