Skip to the content.

m3fs add Grafana integration in the release v0.1.9.

How Does It Work

m3fs now will pull Grafana (version 12.0.0) images and setup a Grafana service with a 3FS dashboard automatically.

By default, Grafana service will listening on TCP port 3000, and the default username and password are all admin. The Grafana service started by m3fs contains a ClickHouse data source and a dashboard. After launching a cluster with m3fs, you can visit the Grafana service’s URL (e.g. http://ip:3000) to monitor the running of the 3FS cluster. Take a look at the dashboard:

Overview of the 3FS dashboard

The 3FS Metrics

Type of Metrics

3FS calculates metrics in each service and store metrics to ClickHouse through monitor service. There are some types of metrics in 3FS, and the metric may or may not be reset after each report.

There are many types of metrics:

Type Class Storage Table Description
value ValueRecorder 3fs.counters Set by code and stored to ClickHouse directly. E.g. capacity
count CountRecorder 3fs.counters Increase by code and usually reset to zero after each report. E.g., report read counter every second mean read IOPS.

It is also be used to track current on-flight operations, which is increased at the beginning and is decreased after the operation finished, rather than reset automatically after report.
distribution DistributionRecorder 3fs.distributions Used to calculate P90, P99 and etc for the period between two reports.
latency LatencyRecorder 3fs.distributions Special distribution implementation, record latency with unit ns (nanosecond).

List of Metrics

Following is a partial list of the metrics and their meanings:

Metric Name Type Reset After Report Description
fuse.dirty_inodes value Y Current number of dirty inodes
fuse.op count Y Number of client operations; instance field is the operation name
fuse.piov.bw count Y Number of bytes written by the fuse client to the storage service
fuse.write.latency latency Y Write operation latency
fuse.write.size distribution Y Write operation size
meta_server.batch_op_size count Y Batch operation count of metadata (sync + close + setattr)
meta_server.dist_set_map count Y Number of set disk server map operations
meta_server.op_code count Y Number of times a specific error code appears, tag is the specific error code
meta_server.op_failed count Y Number of failed operations
meta_server.op_idempotent count Y Number of idempotent operations executed
meta_server.op_duplicated count Y Number of duplicated operations
meta_server.op_running count N Number of running operations
meta_server.op_total count Y Total number of operations
meta_server.op_latency latency Y Metadata operation latency (nanoseconds)
meta_server.open_write count Y Number of open/create file operations
meta_server.stat_dir count Y Number of directory stat operations executed
meta_server.stat_file count Y Number of file stat operations executed
meta_server.stat_symlink count Y Number of symbolic link stat operations executed
meta_server.auth_failed count Y Number of authentication failures
storage.check_disk.current count N Number of currently running disk check operations
storage.check_disk.total count Y Total number of disk check operations
storage.chunk_engine.
allocate_times
count Y Number of chunks created
storage.chunk_engine.
checksum_combine
count Y Checksum and merge count (includes checksum and merge, and append operations)
storage.chunk_engine.
checksum_recalculate
count Y Checksum recalculation count
storage.chunk_engine.
copy_on_write_read_bytes
count Y Number of bytes read by COW (Copy-on-Write)
storage.chunk_engine.
copy_on_write_read_times
count Y Number of times COW read data
storage.chunk_engine.
copy_on_write_times
count Y Number of COW executions
storage.chunk_engine.new value N Current number of chunk_engines? (Need to check the code)
storage.chunk_engine.
pwrite_times
count Y Number of pwrite calls
storage.chunk_engine.
safe_write_direct_append
count Y Number of direct append writes during safe_write
storage.chunk_engine.
safe_write_indirect_append
count Y Number of indirect append writes during safe_write
storage.chunk_remove.times count Y Number of chunk delete operations
storage.chunk_write.times count Y Number of chunk write operations
storage.disk_info.available value N Disk available capacity (target unused space + reserved space + file system available space, bytes)
storage.disk_info.capacity value N Total file system capacity (bytes)
storage.disk_info.read_only value N Whether the file system is read-only (1: read-only, implemented by writing .hf3fs_check file in the file system root directory)
storage.disk_info.free value N File system available space (bytes)
storage.do_commit.current count N Number of IOs currently being committed
storage.do_commit.total count Y Total number of committed IOs
storage.do_commit.fails count Y Number of failed commit IOs
storage.do_commit.succ_latency latency Y Success commit latency
storage.do_commit.fail_latency latency Y Failure commit latency
storage.do_query.num_chunks count Y Number of chunks in query results
storage.do_query.total count Y Total number of chunk query operations
storage.do_query.current count N Number of current chunk query operations
storage.do_query.succ_latency latency Y Success query latency
storage.do_query.fail_latency latency Y Failure query latency
storage.do_remove.current count N Number of chunk remove operations
storage.do_remove.fails count Y Number of failed chunk remove operations
storage.do_remove.num_chunks count Y Number of chunks removed
storage.do_remove.total count Y Total number of chunk remove operations
storage.do_remove.succ_latency latency Y Success remove latency
storage.do_remove.fail_latency latency Y Failure remove latency
storage.do_update.current count N Number of current update chunk operations
storage.do_update.total count Y Total number of update chunk operations
storage.do_update.fails count Y Number of failed update chunk operations
storage.do_update.succ_latency latency Y Success update latency
storage.do_update.fail_latency latency Y Failure update latency
storage.remove_range.current count N Number of current range remove chunk operations
storage.remove_range.fails count Y Number of failed range remove chunk operations
storage.remove_range.total count Y Total number of range remove chunk operations
storage.remove_range.succ_latency latency Y Success remove range latency
storage.remove_range.fail_latency latency Y Failure remove range latency
storage.req_remove_chunks.current count N Number of current remove chunk requests
storage.req_remove_chunks.total count Y Total number of remove chunk requests
storage.req_update.bytes count Y Number of chunk update bytes
storage.req_update.current count N Number of current chunk update requests
storage.req_update.fails count Y Number of failed chunk update requests
storage.req_update.total count Y Total number of chunk update requests
storage.req_write.bytes count Y Number of bytes written by chunk write requests
storage.req_write.current count N Number of current chunk write requests
storage.req_write.total count Y Total number of chunk write requests
storage.engine_commit.current count N Number of current commit operations
storage.engine_commit.total count Y Total number of commit operations
storage.engine_commit.fails count Y Number of failed commit operations
storage.engine_update.current count N Number of current chunk update operations
storage.engine_update.total count Y Total number of chunk update operations
storage.engine_update.fails count Y Number of failed chunk update operations
storage.forward.write_bytes count Y Number of forwarded write bytes
storage.forward.syncing_bytes count Y Number of forwarded sync bytes
storage.reliable_forward.current count N Number of current forward operations
storage.reliable_forward.total count Y Total number of forward operations
storage.reliable_forward.fails count Y Number of failed forward operations
storage.target.used_size value N Target used space (bytes)
storage.target.reserved_size value N Target reserved space (bytes)
storage.target.unrecycled_size value N Target unrecycled space (bytes)
storage.target_state value N Target state (0:invalid 1:uptodate 2:online 4:offline)
storage.write.bytes count Y Total bytes written (includes write and update operations)