SQLite¶
SQLite is the first native backend. It is implemented in Rust with sqlx and exposed to
Python as the db+sqlite fsspec protocol.
Construction¶
import fsspec
fs = fsspec.filesystem("db+sqlite", database="app.db")
database can be a filesystem path, :memory:, or a SQLite URL such as
sqlite:///tmp/app.db.
You can also construct the class directly:
from fsspec_db import SQLiteDatabaseFileSystem
fs = SQLiteDatabaseFileSystem(database="app.db")
Reading Metadata¶
fs.ls("/", detail=False)
fs.ls("/main", detail=False)
fs.info("/main/users")
fs.info("/main/users/columns/id")
The backend introspects:
schemas from
PRAGMA database_list;tables and views from
sqlite_master;columns from
PRAGMA table_info;indexes from
PRAGMA index_listandPRAGMA index_info;primary keys and foreign keys from
PRAGMA table_infoandPRAGMA foreign_key_list.
Relation lookups check only the requested relation and count rows only for that relation. Schema listings do not count every table.
Reading Data¶
import pyarrow.ipc as ipc
with ipc.open_stream(fs.cat_file("/main/users.arrow")) as reader:
table = reader.read_all()
table = fs.query("SELECT id, name FROM users WHERE id > ?", [0])
The native path releases the Python GIL while SQLite I/O is running.
Writing Arrow IPC¶
Warning
Overwrite writes replace the table contents. open(path, "wb"), default pipe_file, and
default put_file delete existing rows inside the write transaction before inserting the
incoming rows. Use "ab" or mode="append" to preserve existing rows.
import pyarrow as pa
import pyarrow.ipc as ipc
table = pa.table({"name": ["ada"], "score": [1.0]})
sink = pa.BufferOutputStream()
with ipc.new_stream(sink, table.schema) as writer:
writer.write_table(table)
with fs.open("/main/users.arrow", "ab") as file:
file.write(sink.getvalue().to_pybytes())
"ab" appends rows. "wb" truncates the table with DELETE FROM inside the same
transaction, then inserts rows.
Writing Parquet¶
import pyarrow as pa
import pyarrow.parquet as pq
pq.write_table(pa.table({"name": ["grace"], "score": [2.0]}), "rows.parquet")
fs.put_file("rows.parquet", "/main/users.parquet")
put_file truncates by default. Pass mode="append" to append:
fs.put_file("rows.parquet", "/main/users.parquet", mode="append")
Type Handling¶
SQLite values are converted into Arrow arrays with a small affinity mapper. Query results use declared column metadata plus observed non-null runtime value types, so NULL-first and mixed numeric expression columns can still produce numeric Arrow arrays.
SQLite value/type |
Arrow type |
|---|---|
boolean-like |
|
integer-like |
|
real/float/double |
|
blob/binary |
|
everything else |
|
For writes, Arrow Null columns bind SQLite NULL. Temporal arrays are bound as integer
epoch values. When a query produces mixed SQLite runtime types in one expression column,
cast explicitly in SQL for predictable Arrow output.