DuckDB
2024-01-03 19:50:36.541854+01 by Dan Lyke
masto.sh is a tool to download all of your mastodon posts so that you can query them via DuckDB.
DuckDB is a tool for running SQL online analytical processing (OLAP) queries: long-running queries over data sets that are likely to see large bulk updates.
To support this workload efficiently, it is critical to reduce the number of CPU cycles expended per individual value. The state of the art in data management for achieving this is either vectorized or just-in-time query execution engines. DuckDB contains a columnar-vectorized query execution engine, where queries are still interpreted, but a large batch of values (a “vector”) is processed in one operation. This greatly reduces the overhead present in traditional systems such as PostgreSQL, MySQL or SQLite, which process each row sequentially. Vectorized query execution leads to far better performance in OLAP queries.
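To make the "batch of values per operation" idea concrete, here is a toy sketch in plain Python. It is not DuckDB's engine or API. The function names, the dispatch counter, and the batch size of 1024 are all made up for illustration. Each "dispatch" stands in for one trip through an interpreter's operator machinery: the row-at-a-time version pays that cost for every value, and the vectorized version pays it once per batch.

```python
from typing import List, Tuple


def sum_revenue_row_at_a_time(prices: List[int], quantities: List[int]) -> Tuple[int, int]:
    """Compute SUM(price * quantity), interpreting each operator once per row."""
    total = 0
    dispatches = 0  # hypothetical stand-in for per-operator interpretation overhead
    for price, quantity in zip(prices, quantities):
        product = price * quantity  # "multiply" operator invoked for one row
        dispatches += 1
        total += product            # "sum" operator invoked for one row
        dispatches += 1
    return total, dispatches


def sum_revenue_vectorized(
    prices: List[int], quantities: List[int], vector_size: int = 1024
) -> Tuple[int, int]:
    """Compute the same sum, invoking each operator once per batch ("vector")."""
    total = 0
    dispatches = 0
    for start in range(0, len(prices), vector_size):
        # Slice one batch out of each column (columnar layout: one list per column).
        price_vec = prices[start:start + vector_size]
        quantity_vec = quantities[start:start + vector_size]
        # "multiply" operator invoked once for the whole batch.
        products = [p * q for p, q in zip(price_vec, quantity_vec)]
        dispatches += 1
        # "sum" operator invoked once for the whole batch.
        total += sum(products)
        dispatches += 1
    return total, dispatches


if __name__ == "__main__":
    # Made-up sample data, not real Mastodon or sales data.
    n = 10_000
    prices = [i % 7 + 1 for i in range(n)]
    quantities = [i % 3 + 1 for i in range(n)]
    print(sum_revenue_row_at_a_time(prices, quantities))
    print(sum_revenue_vectorized(prices, quantities))
```

Both functions return the same total. Only the number of operator invocations differs, and that is the overhead the vectorized approach is meant to amortize. In a real engine the per-batch inner loops are tight compiled code rather than Python list comprehensions, so the saving shows up as actual CPU cycles and not just a smaller counter.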