A virtual file system adapter for Azure Blob storage
Blobfuse2 is an open source project developed to provide a virtual filesystem backed by Azure Storage. It uses the libfuse open source library (fuse3) to communicate with the Linux FUSE kernel module, and implements the filesystem operations using the Azure Storage REST APIs. This is the next generation of blobfuse.
Blobfuse2 is stable and supported by Microsoft when used within its documented limits. Blobfuse2 supports high-performance reads and writes with strong consistency; however, it is recommended that multiple clients do not modify the same blob/file simultaneously to ensure data integrity. Blobfuse2 does not guarantee continuous synchronization of data written to the same blob/file using multiple clients or across multiple mounts of Blobfuse2 concurrently. If you modify an existing blob/file with another client while also reading that object, Blobfuse2 will not return the most up-to-date data. To ensure your reads see the newest blob/file data, disable all forms of caching at the kernel level (using direct-io) as well as at the Blobfuse2 level, and then re-open the blob/file.
Please submit an issue here for any issues/feature requests/questions.
This section will help you choose the correct config for Blobfuse2.
If you use block-cache mode, it is strongly recommended that all Blobfuse2 installations be upgraded to version 2.3.2. For more information, see this.
The streaming mode is being deprecated.
When using the cp utility to copy to a Blobfuse2 mounted path, use the --sparse=never parameter to avoid data being trimmed. For example, cp --sparse=never src dest.
This page lists various benchmarking results for HNS and FNS Storage accounts.
Visit this page to see the list of supported Linux distros.
One of the biggest Blobfuse2 features is our brand new health monitor. It allows customers to gain more insight into how their Blobfuse2 instance is behaving with the rest of their machine. Visit here to set it up.
You can install Blobfuse2 by cloning this repository. In the workspace root execute below commands to build the binary.
The general format of the Blobfuse2 commands is blobfuse2 [command] [arguments] --[flag-name]=[flag-value]
help - Help about any command.
mount - Mounts an Azure container as a filesystem. The supported containers include Azure Blob Storage containers and Azure Data Lake Storage Gen2 containers.
mount all - Mounts all the containers in an Azure account as a filesystem. The supported storage services include Azure Blob Storage and Azure Data Lake Storage Gen2.
mount list - Lists all Blobfuse2 filesystems.
secure decrypt - Decrypts a config file.
secure encrypt - Encrypts a config file.
secure get - Gets value of a config parameter from an encrypted config file.
secure set - Updates value of a config parameter.
unmount - Unmounts the Blobfuse2 filesystem.
unmount all - Unmounts all Blobfuse2 filesystems.

To see a list of commands, type blobfuse2 -h and then press the ENTER key.
To learn about a specific command, include the name of the command (for example: blobfuse2 mount -h).
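As a minimal sketch of the command format described above, the following shell function chains a mount, a listing of active mounts, and an unmount. The function name and both arguments are hypothetical placeholders, and passing the mount path as a positional argument is an assumption to confirm with blobfuse2 mount -h.

```shell
# Sketch: mount a container, list active mounts, then unmount.
# Arguments (hypothetical placeholders chosen by the caller):
#   $1 - an empty local directory to use as the mount point
#   $2 - path to a Blobfuse2 config file
blobfuse2_mount_cycle() {
    mount_path=$1
    config_file=$2

    # Mount the container described by the config file; stop on failure.
    blobfuse2 mount "$mount_path" --config-file="$config_file" || return 1

    # Show all Blobfuse2 filesystems currently mounted.
    blobfuse2 mount list

    # Unmount when done.
    blobfuse2 unmount "$mount_path"
}

# Example call (hypothetical paths):
#   blobfuse2_mount_cycle /mnt/blobfuse2 ./config.yaml
```

Wrapping the commands in a function keeps the snippet safe to source; nothing is mounted until the function is called.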
--config-file=<PATH>: The path to the config file.
--log-level=<LOG_*>: The level of logs to capture.
--log-file-path=<PATH>: The path for the log file.
--foreground=true: Mounts the system in foreground mode.
--read-only=true: Mount container in read-only mode.
--default-working-dir: The default working directory to store log files and other blobfuse2 related information.
--disable-version-check=true: Disable the blobfuse2 version check.
--secure-config=true: Config file is encrypted using the 'blobfuse2 secure' command.
--passphrase=<STRING>: Passphrase used to encrypt/decrypt config file.
--wait-for-mount=<TIMEOUT IN SECONDS>: Let parent process wait for given timeout before exit to ensure child has started.
--block-cache: To enable block-cache instead of file-cache. This works only when mounted without any config file.
--lazy-write: To enable async close file handle call and schedule the upload in background.
--attr-cache-timeout=<TIMEOUT IN SECONDS>: The timeout for the attribute cache entries.
--no-symlinks=true: To improve performance disable symlink support.
--container-name=<CONTAINER NAME>: The container to mount.
--cancel-list-on-mount-seconds=<TIMEOUT IN SECONDS>: Time for which list calls will be blocked after mount (prevents billing charges on mounting).
--virtual-directory=true: Support virtual directories without existence of a special marker blob for block blob account.
--subdirectory=<path>: Subdirectory to mount instead of entire container.
--disable-compression=false: Disable content encoding negotiation with server. If blobs have 'content-encoding' set to 'gzip' then turn on this flag.
--use-adls=false: Specify whether the configured storage account is HNS enabled. This must be turned on when an HNS enabled account is mounted.
--cpk-enabled=true: Allows mounting containers with cpk. Use config file or env variables to set cpk encryption key and cpk encryption key sha.
--file-cache-timeout=<TIMEOUT IN SECONDS>: Timeout for which file is cached on local system.
--tmp-path=<PATH>: The path to the file cache.
--cache-size-mb=<SIZE IN MB>: Amount of disk cache that can be used by blobfuse. Default - 80% of free disk space.
--high-disk-threshold=<PERCENTAGE>: If local cache usage exceeds this, start early eviction of files from cache.
--low-disk-threshold=<PERCENTAGE>: If local cache usage comes below this threshold then stop early eviction.
--sync-to-flush=false: Sync call will force upload a file to storage container if this is set to true, otherwise it just evicts file from local cache.
--block-size-mb=<SIZE IN MB>: Size of a block to be downloaded during streaming.
--block-cache-block-size=<SIZE IN MB>: Size of a block to be downloaded as a unit.
--block-cache-pool-size=<SIZE IN MB>: Size of pool to be used for caching. This limits total memory used by block-cache. Default - 80% of free memory available.
--block-cache-path=<PATH>: Path where downloaded blocks will be persisted. Not providing this parameter will disable the disk caching.
--block-cache-disk-size=<SIZE IN MB>: Disk space to be used for caching. Default - 80% of free disk space.
--block-cache-disk-timeout=<seconds>: Timeout for which disk cache is valid.
--block-cache-prefetch=<Number of blocks>: Number of blocks to prefetch at max when sequential reads are in progress. Default - 2 times number of CPU cores.
--block-cache-parallelism=<count>: Number of parallel threads doing upload/download operation. Default - 3 times number of CPU cores.
--block-cache-prefetch-on-open=true: Start prefetching on open system call instead of waiting for first read. Enhances perf if file is read sequentially from offset 0.
--attr-timeout=<TIMEOUT IN SECONDS>: Time the kernel can cache inode attributes.
--entry-timeout=<TIMEOUT IN SECONDS>: Time the kernel can cache directory listing.
--negative-timeout=<TIMEOUT IN SECONDS>: Time the kernel can cache non-existence of file or directory.
--allow-other: Allow other users to access this mount point.
--disable-writeback-cache=true: Disallow libfuse to buffer write requests if you must strictly open files in O_WRONLY or O_APPEND mode.
--ignore-open-flags=true: Ignore the append and write only flag since O_APPEND and O_WRONLY are not supported with writeback caching.

AZURE_STORAGE_ACCOUNT: Specifies the storage account to be connected.
AZURE_STORAGE_ACCOUNT_TYPE: Specifies the account type 'block' or 'adls'.
AZURE_STORAGE_ACCOUNT_CONTAINER: Specifies the name of the container to be mounted.
AZURE_STORAGE_BLOB_ENDPOINT: Specifies the blob endpoint to use. Defaults to *.blob.core.windows.net, but is useful for targeting storage emulators.
AZURE_STORAGE_AUTH_TYPE: Overrides the currently specified auth type. Case insensitive. Options: Key, SAS, MSI, SPN.
AZURE_STORAGE_ACCESS_KEY: Specifies the storage account key to use for authentication.
AZURE_STORAGE_SAS_TOKEN: Specifies the SAS token to use for authentication.
AZURE_STORAGE_IDENTITY_CLIENT_ID: Only one of these three parameters is needed if multiple identities are present on the system.
AZURE_STORAGE_IDENTITY_OBJECT_ID: Only one of these three parameters is needed if multiple identities are present on the system.
AZURE_STORAGE_IDENTITY_RESOURCE_ID: Only one of these three parameters is needed if multiple identities are present on the system.
MSI_ENDPOINT: Specifies a custom managed identity endpoint, as IMDS may not be available under some scenarios. Uses the MSI_SECRET parameter as the Secret header.
MSI_SECRET: Specifies a custom secret for an alternate managed identity endpoint.
AZURE_STORAGE_SPN_CLIENT_ID: Specifies the client ID for your application registration.
AZURE_STORAGE_SPN_TENANT_ID: Specifies the tenant ID for your application registration.
AZURE_STORAGE_AAD_ENDPOINT: Specifies a custom AAD endpoint to authenticate against.
AZURE_STORAGE_SPN_CLIENT_SECRET: Specifies the client secret for your application registration.
AZURE_STORAGE_AUTH_RESOURCE: Scope to be used while requesting a token.
http_proxy: The proxy server address. Example: 10.1.22.4:8080.
https_proxy: The proxy server address when https is turned off forcing http. Example: 10.1.22.4:8080.
AZURE_STORAGE_CPK_ENCRYPTION_KEY: Customer provided base64-encoded AES-256 encryption key value.
AZURE_STORAGE_CPK_ENCRYPTION_KEY_SHA256: Base64-encoded SHA256 of the cpk encryption key.

The diagrams below guide you to choose the right configuration for your workloads.
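The environment variables listed earlier in this section can supply the storage settings for a mount that uses no config file. The sketch below assumes key-based authentication; the account name, container name, key placeholder, function name, and paths are all hypothetical, and whether these variables alone are sufficient for your setup is an assumption to verify.

```shell
# Sketch: mount without a config file, using environment variables for the
# storage settings. Every value below is a hypothetical placeholder.
export AZURE_STORAGE_ACCOUNT="mystorageaccount"
export AZURE_STORAGE_ACCOUNT_TYPE="block"              # 'block' or 'adls'
export AZURE_STORAGE_ACCOUNT_CONTAINER="mycontainer"
export AZURE_STORAGE_AUTH_TYPE="Key"                   # Key, SAS, MSI, or SPN
export AZURE_STORAGE_ACCESS_KEY="<storage-account-key>"

# Arguments (hypothetical placeholders chosen by the caller):
#   $1 - mount point
#   $2 - local directory to use for the file cache
mount_from_env() {
    blobfuse2 mount "$1" --tmp-path="$2"
}

# Example call (hypothetical paths):
#   mount_from_env /mnt/blobfuse2 /tmp/blobfuse2-cache
```

Keeping the account key in the environment rather than on the command line avoids exposing it in the process list.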
If files are updated through other means and your mount does not reflect the changes, the -o direct_io CLI parameter is the option you need to use while mounting. Along with this, set file-cache-timeout=0, and set all other libfuse caching parameters to 0 as well. Be aware that disabling the kernel cache can result in more calls to Azure Storage, which has cost and performance implications.
Blobfuse2 does not support overlapping mount paths. When running multiple instances of Blobfuse2, make sure each instance has a unique and non-overlapping mount point.
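As a sketch of the direct_io setup described above, the function below mounts with the kernel cache bypassed and the cache timeouts set to 0. The function name and arguments are hypothetical, and treating --attr-timeout, --entry-timeout, and --negative-timeout as the complete set of libfuse caching parameters is an assumption based on the CLI options listed in this document.

```shell
# Sketch: mount so that reads always go to Azure Storage.
# Arguments (hypothetical placeholders chosen by the caller):
#   $1 - mount point
#   $2 - path to a Blobfuse2 config file
mount_without_caching() {
    mount_path=$1
    config_file=$2

    # -o direct_io bypasses the kernel page cache; the remaining flags set
    # the file cache timeout and the kernel metadata cache timeouts to 0.
    blobfuse2 mount "$mount_path" \
        --config-file="$config_file" \
        -o direct_io \
        --file-cache-timeout=0 \
        --attr-timeout=0 \
        --entry-timeout=0 \
        --negative-timeout=0
}

# Example call (hypothetical paths):
#   mount_without_caching /mnt/blobfuse2 ./config.yaml
```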
Blobfuse2 does not support coexistence with NFS on the same mount path. Behaviour in this case is undefined.
For block blob accounts where data is uploaded through other means, Blobfuse2 expects special directory marker files to exist in the container. In their absence, a few file operations might not work. For example, if you have a blob 'A/B/c.txt', then special marker files must exist for 'A' and 'A/B'; otherwise opening 'A/B/c.txt' will fail. Once an 'ls' operation is done on the directories 'A' and 'A/B', you will be able to open 'A/B/c.txt' as well. A possible workaround is to either create the directory marker files manually through the portal or run the 'mkdir' command for 'A' and 'A/B' from Blobfuse2. Refer to this for details.
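The 'mkdir' workaround above can be sketched as a small helper that creates every parent directory of a blob path under the mount point. The function name and arguments are hypothetical; that a mkdir issued through the mount creates the marker is taken from the text above rather than verified here.

```shell
# Sketch: create the parent directories of a blob path under a mount point.
# Arguments (hypothetical placeholders chosen by the caller):
#   $1 - Blobfuse2 mount point
#   $2 - blob path relative to the container root, e.g. A/B/c.txt
create_dir_markers() {
    mount_path=$1
    blob_path=$2

    # dirname strips the file name: "A/B/c.txt" becomes "A/B".
    parent_dir=$(dirname "$blob_path")

    # mkdir -p creates each missing component in turn ("A", then "A/B").
    mkdir -p "$mount_path/$parent_dir"
}

# Example call (hypothetical mount point):
#   create_dir_markers /mnt/blobfuse2 A/B/c.txt
```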
For BlockBlob accounts, ACLs are not supported by Azure Storage, so Blobfuse2 by default returns success for the 'chmod' operation. However, 'chmod' works as expected for Gen2 (DataLake) accounts.
When Blobfuse2 is mounted inside a Docker container, SYS_ADMIN privileges are required for it to interact with the fuse driver. If the container is created without this privilege, the mount will fail. A sample command to spawn a Docker container is:
docker run -it --rm --cap-add=SYS_ADMIN --device=/dev/fuse --security-opt apparmor:unconfined <environment variables> <docker image>
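One way to fill in the environment variables placeholder in the command above is to forward variables that are already exported on the host. This is a sketch only: the function name and image name are hypothetical, and the four variables chosen assume key-based authentication.

```shell
# Sketch: start a container that can run Blobfuse2, forwarding storage
# settings from the host environment.
# Argument (hypothetical placeholder chosen by the caller):
#   $1 - name of a Docker image that has Blobfuse2 installed
run_blobfuse2_container() {
    image=$1

    # "-e NAME" without a value passes the host's exported value of NAME
    # into the container, so the key never appears on the command line.
    docker run -it --rm \
        --cap-add=SYS_ADMIN \
        --device=/dev/fuse \
        --security-opt apparmor:unconfined \
        -e AZURE_STORAGE_ACCOUNT \
        -e AZURE_STORAGE_ACCOUNT_CONTAINER \
        -e AZURE_STORAGE_AUTH_TYPE \
        -e AZURE_STORAGE_ACCESS_KEY \
        "$image"
}

# Example call (hypothetical image name):
#   run_blobfuse2_container my-blobfuse2-image
```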
In the case of mount all, the system may limit the number of containers you can mount in parallel (when you go above 100 containers). To increase this system limit, use the command below:
echo 256 | sudo tee /proc/sys/fs/inotify/max_user_instances
Refer to this for block-cache limitations.
By default, Blobfuse2 will log to syslog. The default settings will, in some cases, log relevant file paths to syslog. If this is sensitive information, turn off logging or set log-level to LOG_ERR.
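As a sketch of the second option above, the function below mounts with the log level restricted to LOG_ERR. The function name and arguments are hypothetical placeholders.

```shell
# Sketch: mount with logging limited to errors.
# Arguments (hypothetical placeholders chosen by the caller):
#   $1 - mount point
#   $2 - path to a Blobfuse2 config file
mount_with_error_logging_only() {
    blobfuse2 mount "$1" --config-file="$2" --log-level=LOG_ERR
}

# Example call (hypothetical paths):
#   mount_with_error_logging_only /mnt/blobfuse2 ./config.yaml
```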
This project is licensed under MIT.
This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.microsoft.com.
When you submit a pull request, a CLA-bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.
This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.