Vitastor 3.0.14 released

2026-06-21

How many (bugs!) I’ve knifed, how many I’ve slit! (more than 50)

General note: most bug fixes now include regression tests to verify that they don’t repeat in the future. Most bugs fixed in this release were detected by using LLM analysis (Claude Opus/Fable, GPT 5.5).

OSD

  • Fix OSD hanging with an infinite loop when setting autosync_interval to 0 at runtime
  • (IMPORTANT) Fix EC PGs hanging in REPEERING when the last final commit/rollback in a batch completes with an error. An interesting note: the issue was related to a problem classified in 3.0.12 as a dangerous memory corruption, but after reproducing it in a regression test, it turned out that, apparently, no memory corruption could have actually occurred there.
  • Limit pg_size to 64 because peering doesn’t handle larger values with EC; they just lead to ‘incomplete’ objects
  • Fix OSD crash with an “assertion failed” error on EIO retry in snapshot chain read (i.e. when some chunks belong to a corrupted replica with checksum mismatch)
  • (IMPORTANT) Disable chunked PG count resharding due to possible interference with compaction changes (will be re-enabled after fixes)
  • (IMPORTANT) Fix incorrect snapshot allocation bitmap recovery during EC chained read
  • Add on-wire request size validation to prevent possible OOM/DoS/heap corruption on receiving invalid data from the network
  • (IMPORTANT) Fix parity-less EC writes destroying snapshot allocation bitmaps (i.e. when all parity OSDs in a PG are missing)
  • (IMPORTANT) Fix EC N+K, K>=2 recovery destroying snapshot allocation bitmaps of live parity chunks
  • (IMPORTANT) Fix a possible OSD crash during EC misplaced object scrubbing
  • (IMPORTANT) Verify object bitmap consistency during scrub (only data was checked previously)
  • (IMPORTANT) Fix corrupted object chunks incorrectly marked as non-corrupted on the second scrub
  • (IMPORTANT) Fix cached EC decoding of multiple stripes with ISA-L (ISA-L is the default)
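
The on-wire request size validation item above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the header layout, the magic value, the 128 MB limit and the function name are hypothetical and are not taken from Vitastor's actual protocol or code. The idea is only that a length claimed by the peer is checked against a hard limit before any buffer is sized from it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>

// Hypothetical wire header, NOT Vitastor's real protocol layout
struct wire_header_t
{
    uint32_t magic;
    uint32_t payload_len; // length claimed by the peer, untrusted
};

constexpr uint32_t EXAMPLE_MAGIC = 0x56495441;           // arbitrary illustrative value
constexpr uint32_t MAX_PAYLOAD_LEN = 128u * 1024 * 1024; // assumed upper limit

// Returns the payload size that is safe to allocate,
// or std::nullopt if the header must be rejected
std::optional<size_t> validate_header(const uint8_t *buf, size_t received)
{
    assert(buf != nullptr);
    if (received < sizeof(wire_header_t))
        return std::nullopt; // header is incomplete
    wire_header_t hdr;
    std::memcpy(&hdr, buf, sizeof(hdr)); // copy to avoid unaligned access
    if (hdr.magic != EXAMPLE_MAGIC)
        return std::nullopt; // not a valid request
    if (hdr.payload_len > MAX_PAYLOAD_LEN)
        return std::nullopt; // refuse before allocating anything
    return static_cast<size_t>(hdr.payload_len);
}
```

A receiver built this way would drop the connection on std::nullopt instead of allocating a buffer of the claimed size.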

New store

  • Fix a theoretically possible OSD crash on startup when using the previously added workaround for the “double-claim” problem
  • Remove theoretically possible incorrect metadata block writes during batch EC COMMITs restarted due to a full metadata area
  • Fix incorrect compaction counter tracking after OSD restart (could probably lead to compaction not resuming correctly after a restart)
  • (IMPORTANT) Fix some parallel big_writes possibly not waiting for data fsync and thus not being durable
  • Fix possible OSD crash on sync retry when io_uring is full
  • Fix a possible crash during startup on corrupted on-disk data with too small entry sizes

Old store

  • Prevent loading extra garbage metadata entries from the last 4 MB of the metadata area
  • Fix read operations possibly crashing if a metadata read (with inmemory_metadata=false) was restarted due to a full io_uring
  • Fix a possible memory leak of temporary buffers and bitmaps/checksums when a read was restarted due to a full io_uring (reproducible with either inmemory_metadata=false or block_size>256k)
  • Fix a possible OSD crash during padded checksum reads if buffer count exceeded 1024 (IOV_MAX) (reproducible only with csum_block_size > 4k and block_size >= 4M)
  • (IMPORTANT) Fix partial padded read journal checksum verification with csum_block_size > 4k
  • Fix incorrect marking of corrupted objects as non-corrupted after flushing data from journal (with inmemory_journal=false)
  • (IMPORTANT) Fix deferred freeing of a different block when a block was used by a parallel read
  • Fix per-inode statistics not being disabled for FS and S3 pools correctly, leading to etcd overload with unneeded per-inode statistics, slower etcd operation, increased memory usage, and too many Prometheus statistics exported by the monitor

Both stores

  • Fix garbage possibly left in the metadata area if the first OSD startup was interrupted: the metadata header is now written only after initializing metadata
  • Check for short reads during initialization (just in case, doesn’t happen in real life)

Clients

  • Fix write-back queue item splitting when write-back is enabled at runtime
  • Implement bdrv_detach_aio_context & bdrv_attach_aio_context in the QEMU driver (should fix migration with iothread)
  • Do not crash on full io_uring in ublk server
  • Fix missing --readonly option handling in NBD server
  • Stop gracefully on NBD_CMD_DISC instead of just calling exit(0) in NBD server
  • Fix writeback detection in ublk server for --image mode
  • Limit the amount of incoming data for NFS clients to prevent choking on memory in async mount mode

Tools (vitastor-disk/vitastor-cli)

  • Prevent vitastor-cli merge possibly exiting before completing the last sync/delete operations
  • Fix vitastor-disk incorrectly validating overly large small_write entry lengths
  • Fix vitastor-cli merge ignoring input option validation errors
  • Fix vitastor-cli rm-data always skipping the final fsync
  • Fix vitastor-disk resize not moving the last used data block
  • Fix vitastor-disk write-meta incorrectly importing new store small_write entries
  • Fix vitastor-disk write-journal and write-meta importing old store data incorrectly when csum_block_size is > 4k
  • Support --io option for vitastor-disk dump-journal/write-journal
  • Fix vitastor-disk resize crash when converting from very old (0.5.x) metadata
  • Fix vitastor-disk trim incorrectly rounding block ranges with --discard_granularity option explicitly set to a value > 4k, possibly leading to discarding live data
  • Fix vitastor-disk write-meta importing new store metadata incorrectly with > 4 GB metadata area size
  • (IMPORTANT) Fix vitastor-cli modify --resize to a smaller size clearing all image data O_o
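
The vitastor-disk trim item above concerns rounding block ranges to the discard granularity. As a hedged illustration of why the rounding direction matters, the sketch below shrinks a free byte range inward (start rounded up, end rounded down), so the discarded area never extends into neighbouring live data. The function name and signature are hypothetical, and this is not the actual vitastor-disk code.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Shrinks the free range [start, end) to the largest sub-range whose
// both ends are multiples of `granularity`.
// Returns {0, 0} if no aligned sub-range fits inside.
std::pair<uint64_t, uint64_t> align_discard_range(uint64_t start, uint64_t end, uint64_t granularity)
{
    assert(granularity > 0);
    if (start >= end)
        return {0, 0};
    uint64_t aligned_start = start;
    uint64_t rem = start % granularity;
    if (rem != 0)
    {
        uint64_t pad = granularity - rem;
        if (pad > end - start)
            return {0, 0}; // rounding up would leave the range (and can't overflow)
        aligned_start = start + pad;
    }
    uint64_t aligned_end = end - end % granularity; // round the end down
    if (aligned_start >= aligned_end)
        return {0, 0};
    return {aligned_start, aligned_end};
}
```

For example, with a 64 KiB granularity the free range [4096, 200000) shrinks to [65536, 196608), while rounding outward would instead cover [0, 262144) and could discard live blocks on both sides.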

Other

  • Do not crash with an uncaught exception when an invalid /osd/state/ key with a non-numeric suffix is present in etcd (in OSD and all client services)
  • Fix possible crash in vitastor-kv when handling a corrupted DB due to a uint32 overflow
  • Fix NFS-RDMA memory allocator crashing in some situations
  • Fix small shared file extend-write potentially reading unallocated memory (NFS)
  • Add bounds checks to prevent uint32 overflows in NFS/XDR
  • Re-enable accidentally disabled safety checks (asserts) in files with included cpp-btree
  • Fix too small memory allocation in NFS portmap
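
The NFS/XDR bounds check item above refers to a class of bug that a short sketch may make clearer. In XDR, variable-length opaque data is encoded as a 4-byte big-endian length followed by the data padded to a multiple of 4 bytes. If the padded length is computed in 32 bits, a length such as 0xFFFFFFFE wraps around to 0 and passes a naive bounds check. The sketch below does the arithmetic in 64 bits and compares it with the bytes actually remaining; function names are hypothetical and this is not Vitastor's actual NFS code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Reads a big-endian 32-bit integer
static uint32_t read_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) | ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

// Locates an XDR variable-length opaque at offset `pos` of a buffer of `size` bytes.
// On success stores the data offset and length, advances `pos` past the padding
// and returns true. On failure returns false and leaves `pos` unchanged.
bool xdr_read_opaque(const uint8_t *buf, size_t size, size_t &pos, size_t &data_off, uint32_t &len)
{
    assert(buf != nullptr);
    if (pos > size || size - pos < 4)
        return false; // no room for the length word
    uint32_t n = read_be32(buf + pos);
    size_t remaining = size - pos - 4;
    // Pad in 64 bits: (n + 3) & ~3 in uint32_t wraps to 0 for n > 0xFFFFFFFC
    uint64_t padded = ((uint64_t)n + 3) & ~(uint64_t)3;
    if (padded > remaining)
        return false; // claimed length exceeds the received data
    data_off = pos + 4;
    len = n;
    pos += 4 + (size_t)padded;
    return true;
}
```

A message claiming a length of 0xFFFFFFFE with only 4 bytes of data behind it is rejected here, whereas the 32-bit version of the same check would accept it with a padded length of 0.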