# Concepts

`fsspec-db` maps database structure onto the fsspec filesystem model. Schemas, tables, views, columns, indexes, and constraints become paths. Table data becomes files, and each file's extension selects the transfer format.

## Layers

The implementation has three layers:

1. A Rust `Database` trait describes database primitives: list schemas, list relations, inspect metadata, run queries, and insert Arrow batches.
1. Rust `DatabaseFs` turns a `Database` implementation into an `fsspec_rs::FileSystem`. It owns path parsing, metadata shaping, SQL generation for table reads, and format encoding/decoding.
1. Python filesystem classes subclass `fsspec.AbstractFileSystem` and delegate primitives to the PyO3 bridge.

Python therefore gets standard fsspec behavior, while Rust remains the source of truth for database path semantics.

The native SQLite, PostgreSQL, and MySQL backends use sqlx. Python-defined databases implement `AbstractDatabase`. A reverse bridge lets Rust call that Python object through the same `DatabaseFs` code path.

## Data Model

`info()` and `ls(detail=True)` return ordinary fsspec dictionaries with these common keys:

| Key    | Meaning                                                                                              |
| ------ | ---------------------------------------------------------------------------------------------------- |
| `name` | Absolute fsspec-db path, without the protocol prefix.                                                |
| `type` | `"directory"` for schemas, relations, and facets; `"file"` for metadata items and materialized data. |
| `size` | Byte size when known. Materialized table reads usually learn their size only after encoding.        |
| `kind` | fsspec-db object kind, such as `schema`, `table`, `view`, `column`, `index`, or `constraint`.        |

Object-specific metadata is stored as additional keys in the same dictionary:

| Path               | Extra fields                                                            |
| ------------------ | ----------------------------------------------------------------------- |
| Relation directory | `kind`, optional `row_count`, optional `size_bytes`.                    |
| Column item        | `data_type`, `nullable`, `ordinal`, `primary_key`, optional `default`. |
| Index item         | `columns`, `unique`, optional `method`.                                 |
| Constraint item    | `kind`, `columns`, optional `references`, optional `expr`.              |
| Data file          | `format`, `dialect`, `size_known`.                                      |

## Reads

Reading a data path runs a generated `SELECT` against the relation, converts the rows to Arrow, and then encodes the result according to the path extension. The `.sql` extension is the exception: it returns DDL or view definition text instead of row data.

| Extension  | Bytes returned                    |
| ---------- | --------------------------------- |
| `.arrow`   | Arrow IPC stream                  |
| `.parquet` | Parquet                           |
| `.csv`     | CSV with a header row             |
| `.jsonl`   | Line-delimited JSON records, encoded by Arrow |
| `.sql`     | DDL or view definition text       |

`fs.query(sql, params=None)` is intentionally separate from path reads. It accepts raw SQL, binds parameters, and returns a `pyarrow.Table`.

## Writes

Writes decode incoming Arrow-compatible bytes and call `Database.insert()`:

| Operation                               | Insert mode                         |
| --------------------------------------- | ----------------------------------- |
| `open(path, "wb")`                      | truncate relation, then insert rows |
| `open(path, "ab")`                      | append rows                         |
| `pipe_file(path, bytes)`                | truncate by default                 |
| `pipe_file(path, bytes, mode="append")` | append                              |
| `put_file(local, path)`                 | truncate by default                 |
| `put_file(local, path, mode="append")`  | append                              |

DDL writes are deliberately excluded from the initial surface. Creating or dropping tables is planned as a later, guarded feature.

## Boundaries

Native support currently covers SQLite, PostgreSQL, and MySQL. These backends handle common Arrow scalar types: booleans, integers, floats, UTF-8 strings, binary values, and all-null columns. SQLite additionally binds temporal arrays as integer epoch values. Decimal and temporal columns, as well as specialized PostgreSQL/MySQL types, should be cast to a supported type in SQL until richer Arrow mappings land.
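The column-item fields listed under Data Model map naturally onto what a database already reports about its columns. The sketch below builds fsspec-style info dictionaries from SQLite's `PRAGMA table_info` using only the standard library. It is an illustration, not fsspec-db's Rust code: the `columns` facet name in the path, the absence of a leading slash, `size: None`, and the zero-based `ordinal` (SQLite's `cid`) are all assumptions, and the function name `column_items` is hypothetical.

```python
import sqlite3


def column_items(conn, schema, table):
    """Build fsspec-style info dicts for each column of a SQLite table.

    Hypothetical helper: the path layout "<schema>/<table>/columns/<name>"
    and the zero-based ordinal are assumptions, not fsspec-db's real layout.
    """
    escaped = table.replace('"', '""')  # quote the identifier safely
    items = []
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
    for cid, name, decl_type, notnull, default, pk in conn.execute(
        f'PRAGMA table_info("{escaped}")'
    ):
        info = {
            "name": f"{schema}/{table}/columns/{name}",  # assumed layout
            "type": "file",        # metadata items are files
            "size": None,          # assumed: no byte size for metadata items
            "kind": "column",
            "data_type": decl_type,
            "nullable": not notnull,
            "ordinal": cid,        # SQLite's zero-based column index
            "primary_key": pk > 0,  # pk is the 1-based position in the key, 0 if absent
        }
        if default is not None:  # `default` is optional, so omit it when unset
            info["default"] = default
        items.append(info)
    return items
```

Because the optional fields are simply absent when unset, callers can test for them with `"default" in info`, which matches how the tables above describe optional keys.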
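The Reads section describes extension-driven encoding: one `SELECT`, then a format chosen by the path's suffix. The sketch below shows that dispatch pattern against SQLite with the standard library's `csv` and `json` modules standing in for Arrow, so null handling and value formatting will differ from the real encoders. The helper name `read_relation` and the rule that the relation name is the file stem (for example `main/users.csv` reads `users`) are hypothetical.

```python
import csv
import io
import json
import posixpath
import sqlite3


def _quote_ident(name):
    # Double-quote an SQL identifier, escaping embedded quotes.
    return '"' + name.replace('"', '""') + '"'


def read_relation(conn, path):
    """Encode a relation's rows according to the path extension.

    Hypothetical sketch: the relation name is taken from the file stem, and
    stdlib csv/json replace fsspec-db's Arrow-based encoders.
    """
    stem, ext = posixpath.splitext(posixpath.basename(path))
    cur = conn.execute(f"SELECT * FROM {_quote_ident(stem)}")  # generated SELECT
    columns = [d[0] for d in cur.description]
    rows = cur.fetchall()

    if ext == ".csv":
        buf = io.StringIO()
        writer = csv.writer(buf, lineterminator="\n")
        writer.writerow(columns)  # CSV output starts with a header row
        writer.writerows(rows)
        return buf.getvalue().encode("utf-8")
    if ext == ".jsonl":
        # One JSON object per row, newline-terminated.
        lines = [json.dumps(dict(zip(columns, row))) for row in rows]
        return "".join(line + "\n" for line in lines).encode("utf-8")
    raise ValueError(f"unsupported extension in this sketch: {ext!r}")
```

Keeping the query step separate from the encoding step is what lets a single read path serve every extension. Adding a format only means adding another branch.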
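The Writes table distinguishes two insert modes: truncate-then-insert (the default) and append. A minimal SQLite sketch of those semantics follows. The helper name `insert_rows` and the `"overwrite"` spelling for the default mode are hypothetical; only `mode="append"` appears in the document. Running the truncate and the insert in one transaction is a design choice of this sketch, not documented fsspec-db behavior. With that choice, a failed insert leaves the original rows untouched.

```python
import sqlite3


def _quote_ident(name):
    # Double-quote an SQL identifier, escaping embedded quotes.
    return '"' + name.replace('"', '""') + '"'


def insert_rows(conn, table, columns, rows, mode="overwrite"):
    """Insert rows with truncate ("overwrite", hypothetical name) or append semantics."""
    if mode not in ("overwrite", "append"):
        raise ValueError(f"unknown mode: {mode!r}")
    qt = _quote_ident(table)
    col_list = ", ".join(_quote_ident(c) for c in columns)
    placeholders = ", ".join("?" for _ in columns)
    # `with conn` commits on success and rolls back on error, so a failing
    # insert also undoes the DELETE (assumed behavior, chosen for this sketch).
    with conn:
        if mode == "overwrite":
            conn.execute(f"DELETE FROM {qt}")  # truncate relation first
        conn.executemany(
            f"INSERT INTO {qt} ({col_list}) VALUES ({placeholders})", rows
        )
```

For example, `insert_rows(conn, "t", ["id"], [(3,)], mode="append")` adds a row, while the same call without `mode` replaces the table's contents.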