A FUSE file system with an internal dedicated page cache that only flushes data if explicitly requested by the application. This is useful for simulating power failures and losing unsynced data.
Please cite the following paper if you use LazyFS:
@article{
author = {Ramos, Maria and Azevedo, Jo\~{a}o and Kingsbury, Kyle and Pereira, Jos\'{e} and Esteves, T\^{a}nia and Macedo, Ricardo and Paulo, Jo\~{a}o},
title = {When Amnesia Strikes: Understanding and Reproducing Data Loss Bugs with Fault Injection},
year = {2024},
issue_date = {July 2024},
publisher = {VLDB Endowment},
volume = {17},
number = {11},
issn = {2150-8097},
url = {https://doi.org/10.14778/3681954.3681980},
doi = {10.14778/3681954.3681980},
journal = {Proc. VLDB Endow.},
month = jul,
pages = {3017–3030},
numpages = {14}
}
LazyFS was tested with ext4 (with the default mount options) as the underlying file system (FUSE backend), in both Debian 11 (bullseye) and Ubuntu 20.04 (focal) environments. It is C++17 compliant and requires the following packages to be installed:
CMake and g++:
sudo apt install g++ cmake
# The following versions are used during development:
# cmake: 3.16.3
# g++: 9.4.0
FUSE 3:
sudo apt install libfuse3-dev libfuse3-3 fuse3
FUSE requires the option allow_other as a startup argument so that users other than the owner of the file system can read and write files. To enable it, uncomment or add the following line in the configuration file /etc/fuse.conf:
user_allow_other
Compile and install the caching library libpcache, which will be attached to LazyFS:
cd libs/libpcache && ./build.sh && cd -
Finally, build lazyfs:
cd lazyfs && ./build.sh && cd -
LazyFS uses a TOML configuration file to set up the cache and a named pipe through which fault commands are sent:
[faults]
fifo_path="/tmp/faults.fifo"
# fifo_path_completed="/tmp/faults_completed.fifo"
[cache]
apply_eviction=false
[cache.simple]
custom_size="0.5GB"
blocks_per_page=1
# [cache.manual]
# io_block_size=4096
# page_size=4096
# no_pages=10
[filesystem]
log_all_operations=false
logfile="/tmp/lazyfs.log"
[[injection]]
type="torn-seq"
op="write"
file="output.txt"
persist=[1,4]
occurrence=2
[[injection]]
type="torn-op"
file="output1.txt"
occurrence=5
parts=3 #or parts_bytes=[4096,3600,1260]
persist=[1,3]
[[injection]]
type="clear"
from="f1.txt"
timing="before"
op="fsync"
occurrence=6
crash=true
We recommend following the simple cache configuration (indicating the cache size and using a configuration file similar to default.toml), since it is currently the most tested schema in our experiments. Additionally, in the [cache] section, you can specify the following:
- apply_eviction: Whether the cache should behave like the real page cache, evicting pages when it fills to its maximum size.
- [cache.simple] or [cache.manual]: Sets up the cache size and internal organization. For now, you can follow the example, using custom_size (in GB/MB/KB) and the number of blocks in each page (you can leave the default of 1). For manual configurations, comment out the simple configuration and uncomment/change the [cache.manual] example above to suit your needs.
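As a rough illustration of the manual layout, the shell sketch below prints a [cache] section that enables [cache.manual] instead of [cache.simple], reusing the values from the commented example. The function name manual_cache_section is hypothetical, and the capacity line assumes the cache holds no_pages pages of page_size bytes each, which is an assumption rather than something stated here.

```shell
# Hypothetical helper (not part of LazyFS): print a [cache] section that uses
# the manual layout. Arguments: io_block_size page_size no_pages.
manual_cache_section() {
  io_block_size=$1
  page_size=$2
  no_pages=$3
  # Assumption: total capacity is no_pages * page_size bytes.
  echo "# assumed capacity: $((no_pages * page_size)) bytes"
  echo "[cache]"
  echo "apply_eviction=false"
  echo "[cache.manual]"
  echo "io_block_size=$io_block_size"
  echo "page_size=$page_size"
  echo "no_pages=$no_pages"
}

# Values taken from the commented [cache.manual] example.
manual_cache_section 4096 4096 10
```

The printed section would replace the [cache] and [cache.simple] entries of the configuration file; the other sections stay as in the example.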
Optionally, users can specify a set of predefined injection faults before LazyFS starts running. These faults are introduced as additional features, namely:
- torn-seq: This fault type applies when a sequence of system calls targeting a single file is executed consecutively without an intervening fsync. In the example, during the second group of consecutive writes to the file "output.txt" (the group number is defined by the occurrence parameter), the first and fourth writes will be persisted to disk (the writes to be persisted are defined by the persist parameter). After the fourth write (the last in the persist vector), LazyFS will crash itself.
- torn-op: This fault type divides a write system call into smaller parts, with some of these parts being persisted while others are not. In the example, the fifth write issued to the file "output1.txt" (the number of the write is defined by the occurrence parameter) will be divided into three equal parts if the parts parameter is used, or into custom-sized parts if the parts_bytes parameter is defined. The commented code shows an example of parts_bytes, where the write will be torn into three parts: the first with 4096 bytes, the second with 3600 bytes, and the last with 1260 bytes. The persist vector determines which parts will be persisted. After these parts are persisted, LazyFS will crash.
- clear-cache: Clears unsynced data at a certain point of the execution. In the example above, this fault will be injected before (timing) the sixth (occurrence) fsync (op) to the file "f1.txt" (from). The op parameter must be a system call, and if it involves two paths (such as rename), the to parameter should also be specified. The crash parameter determines whether LazyFS should crash after the fault injection.
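To make the torn-op parameters more concrete, the following shell sketch lists which byte ranges of a single write would reach disk for a given parts_bytes list and persist vector. The helper torn_op_ranges is hypothetical and only models the description above: it assumes the parts are laid out contiguously in the order given and that persist uses 1-based part numbers.

```shell
# Hypothetical helper (not part of LazyFS): show which byte ranges of a torn
# write are persisted. $1: comma-separated part sizes (as in parts_bytes),
# $2: comma-separated 1-based part numbers (as in persist).
# The body runs in a subshell so the IFS change does not leak to the caller.
torn_op_ranges() (
  sizes=$1
  keep=$2
  offset=0
  idx=1
  IFS=','
  for size in $sizes; do
    state="lost"
    for k in $keep; do
      if [ "$k" -eq "$idx" ]; then state="persisted"; fi
    done
    echo "part $idx: bytes $offset-$((offset + size - 1)) $state"
    offset=$((offset + size))
    idx=$((idx + 1))
  done
)

# Values from the example configuration: parts_bytes=[4096,3600,1260], persist=[1,3]
torn_op_ranges "4096,3600,1260" "1,3"
# part 1: bytes 0-4095 persisted
# part 2: bytes 4096-7695 lost
# part 3: bytes 7696-8955 persisted
```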
Other parameters:
- fifo_path: The absolute path where the faults FIFO should be created.
- fifo_path_completed: To inject the clear-cache fault synchronously, it is necessary to know when the lazyfs::clear-cache command has finished executing. When this parameter is specified, a message (finished::clear-cache) is written to another FIFO, so that users can set up a reader process that waits before making any post-fault consistency checks.
- log_all_operations: Whether to log all file system operations that LazyFS receives.
- logfile: The log file for LazyFS's outputs. Fault acknowledgment is sent to stdout or to the logfile.
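The reader process mentioned for fifo_path_completed could look roughly like the shell sketch below. The function clear_cache_sync is a hypothetical helper, and it assumes that the acknowledgment arrives as a single line containing finished::clear-cache; the exact message format is not specified here beyond that string.

```shell
# Hypothetical helper (not part of LazyFS): request a cache clear and block
# until the completion message arrives on the second FIFO.
# $1: path configured as fifo_path, $2: path configured as fifo_path_completed
clear_cache_sync() {
  faults_fifo=$1
  completed_fifo=$2
  # echo appends the trailing newline that LazyFS expects.
  echo "lazyfs::clear-cache" > "$faults_fifo" || return 1
  # Opening a FIFO for reading blocks until a writer opens it.
  IFS= read -r ack < "$completed_fifo" || [ -n "$ack" ] || return 1
  case "$ack" in
    *finished::clear-cache*) return 0 ;;
    *) echo "unexpected acknowledgment: $ack" >&2; return 1 ;;
  esac
}

# Usage (blocks until LazyFS answers, so run it only against a mounted LazyFS
# whose configuration sets both FIFO paths):
# clear_cache_sync /tmp/faults.fifo /tmp/faults_completed.fifo && echo "safe to check"
```

Waiting on the second FIFO keeps consistency checks from racing with the cache clear, which is the purpose the parameter description gives.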
To run the file system, one could use the mount-lazyfs.sh script, which calls FUSE with the correct parameters:
cd lazyfs/
# Running LazyFS in the foreground (add '-f/--foreground')
./scripts/mount-lazyfs.sh -c config/default.toml -m /tmp/lazyfs.mnt -r /tmp/lazyfs.root -f
# Running LazyFS in the background
./scripts/mount-lazyfs.sh -c config/default.toml -m /tmp/lazyfs.mnt -r /tmp/lazyfs.root
# Umount with
./scripts/umount-lazyfs.sh -m /tmp/lazyfs.mnt/
# Display help
./scripts/mount-lazyfs.sh --help
./scripts/umount-lazyfs.sh --help
Finally, one can control LazyFS by echoing the following commands to the configured FIFO:
- Clear cache - clears all unsynced data:
  echo "lazyfs::clear-cache" > /tmp/faults.fifo
- Checkpoint - checkpoints all unsynced data by calling write to the underlying file system (without fsync):
  echo "lazyfs::cache-checkpoint" > /tmp/faults.fifo
  Note: Any subsequent failure is outside of the test control.
- Show usage - displays the current cache usage (percentage of pages allocated):
  echo "lazyfs::display-cache-usage" > /tmp/faults.fifo
- Report unsynced data - displays the inodes that have data in cache:
  echo "lazyfs::unsynced-data-report" > /my/path/faults.fifo
- Kill the filesystem - triggered by an operation, a timing and a path regex. Here, timing should be one of before or after, and op should be a valid system call name (e.g. write or read).
  - For operations that have a source path only (e.g. create, open, read, write, ...):
    echo "lazyfs::crash::timing=...::op=...::from_rgx=..." > /my/path/faults.fifo
    Here, from_rgx is required (do not specify to_rgx).
  - For rename, link and symlink, one is able to specify the destination path:
    echo "lazyfs::crash::timing=...::op=...::from_rgx=...::to_rgx=..." > /my/path/faults.fifo
    Here, at least one of from_rgx or to_rgx is required.
  Example 1:
  echo "lazyfs::crash::timing=before::op=write::from_rgx=file1" > /my/path/faults.fifo
  Kills LazyFS before executing a write operation to the file pattern 'file1'.
  Example 2:
  echo "lazyfs::crash::timing=before::op=link::from_rgx=file1::to_rgx=file2" > /my/path/faults.fifo
  Kills LazyFS before executing a link operation from the file pattern 'file1' to the file pattern 'file2'.
  Example 3:
  echo "lazyfs::crash::timing=before::op=rename::to_rgx=fileabd" > /my/path/faults.fifo
  Kills LazyFS before executing a rename operation to the file pattern 'fileabd'.
- Kill the filesystem after injecting torn-op or torn-seq faults. The parameters are the same as the ones presented in the configuration file above. Parameters that have multiple values must be specified without the brackets (e.g., persist=1,2).
  - echo "lazyfs::torn-op::file=...::persist=...::parts=...::occurrence=..." > /my/path/faults.fifo
  - echo "lazyfs::torn-seq::op=...::file=...::persist=...::occurrence=..." > /my/path/faults.fifo
LazyFS expects that every buffer written to the FIFO file terminates with a new line character (echo does this by default). Thus, if using pwrite, for example, make sure you end the buffer with \n.
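For scripted tests, the crash command format and the newline requirement can be wrapped in two small shell functions. This is only a sketch: lazyfs_crash_cmd and lazyfs_send are hypothetical names, not part of LazyFS, and the builder checks just the rules stated above (timing is before or after, and at least one of from_rgx or to_rgx is given). It does not check whether the chosen operation accepts a destination path.

```shell
# Hypothetical helpers (not part of LazyFS).

# Build a crash command string.
# $1: timing (before|after), $2: op, $3: from_rgx (may be empty), $4: to_rgx (optional)
lazyfs_crash_cmd() {
  timing=$1
  op=$2
  from_rgx=$3
  to_rgx=${4:-}
  case "$timing" in
    before|after) ;;
    *) echo "timing must be 'before' or 'after'" >&2; return 1 ;;
  esac
  if [ -z "$from_rgx" ] && [ -z "$to_rgx" ]; then
    echo "from_rgx or to_rgx is required" >&2
    return 1
  fi
  cmd="lazyfs::crash::timing=${timing}::op=${op}"
  if [ -n "$from_rgx" ]; then cmd="${cmd}::from_rgx=${from_rgx}"; fi
  if [ -n "$to_rgx" ]; then cmd="${cmd}::to_rgx=${to_rgx}"; fi
  echo "$cmd"
}

# Write one command to the faults FIFO ($1: FIFO path, $2: command).
# printf makes the trailing newline explicit instead of relying on echo.
lazyfs_send() {
  printf '%s\n' "$2" > "$1"
}

# Prints the same command string as Example 1.
lazyfs_crash_cmd before write file1

# Sending it blocks until LazyFS reads the FIFO, so it is left commented out:
# lazyfs_send /tmp/faults.fifo "$(lazyfs_crash_cmd before write file1)"
```

Keeping the builder separate from the sender lets a test harness log or inspect the exact command before it is written to the FIFO.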
For additional information regarding possible improvements and collaborations please open an issue or contact: @devzizu, @mj-ramos and @dsrhaslab.
