# Monitoring Gitaly and Gitaly Cluster
You can use the available logs and Prometheus metrics to monitor Gitaly and Gitaly Cluster (Praefect).
Metric definitions are available:
- Directly from the Prometheus `/metrics` endpoint configured for Gitaly.
- Using Grafana Explore on a Grafana instance configured against Prometheus.
## Monitor Gitaly rate limiting
Gitaly can be configured to limit requests based on:
- Concurrency of requests.
- A rate limit.
Monitor Gitaly request limiting with the `gitaly_requests_dropped_total` Prometheus metric. This metric provides a total count
of requests dropped due to request limiting. The `reason` label indicates why a request was dropped:

- `rate`, due to rate limiting.
- `max_size`, because the concurrency queue size was reached.
- `max_time`, because the request exceeded the maximum queue wait time as configured in Gitaly.
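For example, dropped requests can be broken down by reason with a query like the following. This is a sketch; the five-minute rate window is an arbitrary choice:

```prometheus
sum(rate(gitaly_requests_dropped_total[5m])) by (reason)
```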
## Monitor Gitaly concurrency limiting
You can observe specific behavior of concurrency-queued requests using the Gitaly logs and Prometheus:
- In the Gitaly logs, look for the string (or structured log field) `acquire_ms`. Messages that have this field are reporting about the concurrency limiter.
- In Prometheus, look for the following metrics:
  - `gitaly_concurrency_limiting_in_progress` indicates how many concurrent requests are being processed.
  - `gitaly_concurrency_limiting_queued` indicates how many requests for an RPC for a given repository are waiting due to the concurrency limit being reached.
  - `gitaly_concurrency_limiting_acquiring_seconds` indicates how long a request has to wait due to concurrency limits before being processed.
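For example, you can watch the current queue length and, assuming `gitaly_concurrency_limiting_acquiring_seconds` is exported as a Prometheus histogram (so a `_bucket` series exists), approximate how long requests wait before acquiring a slot:

```prometheus
# Requests currently waiting because the concurrency limit is reached
sum(gitaly_concurrency_limiting_queued)

# Approximate p95 wait time before a request acquires a concurrency slot
# (assumes the metric is a histogram with a _bucket series)
histogram_quantile(0.95, sum(rate(gitaly_concurrency_limiting_acquiring_seconds_bucket[5m])) by (le))
```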
## Monitor Gitaly cgroups
You can observe the status of control groups (cgroups) using Prometheus:
- `gitaly_cgroups_reclaim_attempts_total`, a gauge for the total number of times there has been a memory reclaim attempt. This number resets each time a server is restarted.
- `gitaly_cgroups_cpu_usage`, a gauge that measures CPU usage per cgroup.
- `gitaly_cgroup_procs_total`, a gauge that measures the total number of processes Gitaly has spawned under the control of cgroups.
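As a sketch, the following queries surface recent memory reclaim activity and the number of processes running under cgroups. The five-minute window is an arbitrary choice:

```prometheus
# Approximate number of memory reclaim attempts over the last 5 minutes
delta(gitaly_cgroups_reclaim_attempts_total[5m])

# Total number of Gitaly-spawned processes currently running under cgroups
sum(gitaly_cgroup_procs_total)
```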
## pack-objects cache
The following pack-objects cache metrics are available:
- `gitaly_pack_objects_cache_enabled`, a gauge set to `1` when the cache is enabled. Available labels: `dir` and `max_age`.
- `gitaly_pack_objects_cache_lookups_total`, a counter for cache lookups. Available label: `result`.
- `gitaly_pack_objects_generated_bytes_total`, a counter for the number of bytes written into the cache.
- `gitaly_pack_objects_served_bytes_total`, a counter for the number of bytes read from the cache.
- `gitaly_streamcache_filestore_disk_usage_bytes`, a gauge for the total size of cache files. Available label: `dir`.
- `gitaly_streamcache_index_entries`, a gauge for the number of entries in the cache. Available label: `dir`.
Some of these metrics start with `gitaly_streamcache` because they are generated by the `streamcache` internal library package in Gitaly.
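For example, a cache hit ratio can be derived from the lookup counter. This is a sketch; the `hit` and `miss` values of the `result` label are visible in the example output below:

```prometheus
sum(rate(gitaly_pack_objects_cache_lookups_total{result="hit"}[5m]))
/
sum(rate(gitaly_pack_objects_cache_lookups_total[5m]))
```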
Example:

```plaintext
gitaly_pack_objects_cache_enabled{dir="/var/opt/gitlab/git-data/repositories/+gitaly/PackObjectsCache",max_age="300"} 1
gitaly_pack_objects_cache_lookups_total{result="hit"} 2
gitaly_pack_objects_cache_lookups_total{result="miss"} 1
gitaly_pack_objects_generated_bytes_total 2.618649e+07
gitaly_pack_objects_served_bytes_total 7.855947e+07
gitaly_streamcache_filestore_disk_usage_bytes{dir="/var/opt/gitlab/git-data/repositories/+gitaly/PackObjectsCache"} 2.6200152e+07
gitaly_streamcache_filestore_removed_total{dir="/var/opt/gitlab/git-data/repositories/+gitaly/PackObjectsCache"} 1
gitaly_streamcache_index_entries{dir="/var/opt/gitlab/git-data/repositories/+gitaly/PackObjectsCache"} 1
```

## Useful queries
The following are useful queries for monitoring Gitaly:
- Use the following Prometheus query to observe the type of connections Gitaly is serving in a production environment: `sum(rate(gitaly_connections_total[5m])) by (type)`.
- Use the following Prometheus query to monitor the authentication behavior of your GitLab installation: `sum(rate(gitaly_authentications_total[5m])) by (enforced, status)`.
  In a system where authentication is configured correctly and where you have live traffic, you see something like this: `{enforced="true",status="ok"} 4424.985419441742`.
  There may also be other numbers with rate 0, but you only have to take note of the non-zero numbers. The only non-zero number should have `enforced="true",status="ok"`. If you have other non-zero numbers, something is wrong in your configuration.
  The `status="ok"` number reflects your current request rate. In the example above, Gitaly is handling about 4000 requests per second.
- Use the following Prometheus query to observe the Git protocol versions being used in a production environment: `sum(rate(gitaly_git_protocol_requests_total[1m])) by (grpc_method,git_protocol,grpc_service)`.
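Building on the previous query, the share of requests using Git protocol v2 can be estimated with a ratio like the following. This is a sketch and assumes the `git_protocol` label uses the value `v2`:

```prometheus
sum(rate(gitaly_git_protocol_requests_total{git_protocol="v2"}[5m]))
/
sum(rate(gitaly_git_protocol_requests_total[5m]))
```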
## Monitor Gitaly Cluster
To monitor Gitaly Cluster (Praefect), you can use these Prometheus metrics. There are two separate metrics endpoints from which metrics can be scraped:
- The default `/metrics` endpoint.
- `/db_metrics`, which contains metrics that require database queries.
### Default Prometheus `/metrics` endpoint
The following metrics are available from the `/metrics` endpoint:
- `gitaly_praefect_read_distribution`, a counter to track distribution of reads. It has two labels:
  - `virtual_storage`.
  - `storage`.

  They reflect configuration defined for this instance of Praefect.
- `gitaly_praefect_replication_latency_bucket`, a histogram measuring the amount of time it takes for replication to complete after the replication job starts. Available in GitLab 12.10 and later.
- `gitaly_praefect_replication_delay_bucket`, a histogram measuring how much time passes between when the replication job is created and when it starts. Available in GitLab 12.10 and later.
- `gitaly_praefect_connections_total`, the total number of connections to Praefect. Introduced in GitLab 14.7.
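For example, the distribution of reads across Gitaly nodes can be observed per virtual storage with a query like the following. The five-minute rate window is an arbitrary choice:

```prometheus
sum(rate(gitaly_praefect_read_distribution[5m])) by (virtual_storage, storage)
```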
To monitor strong consistency, you can use the following Prometheus metrics:
- `gitaly_praefect_transactions_total`, the number of transactions created and voted on.
- `gitaly_praefect_subtransactions_per_transaction_total`, the number of times nodes cast a vote for a single transaction. This can happen multiple times if multiple references are getting updated in a single transaction.
- `gitaly_praefect_voters_per_transaction_total`, the number of Gitaly nodes taking part in a transaction.
- `gitaly_praefect_transactions_delay_seconds`, the server-side delay introduced by waiting for the transaction to be committed.
- `gitaly_hook_transaction_voting_delay_seconds`, the client-side delay introduced by waiting for the transaction to be committed.
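As a sketch, the first query below graphs transaction throughput. The second approximates the 95th percentile of server-side transaction delay and assumes the delay metric is exported as a Prometheus histogram, so a `_bucket` series exists:

```prometheus
# Transactions created and voted on per second
sum(rate(gitaly_praefect_transactions_total[5m]))

# Approximate p95 server-side transaction delay (assumes a histogram metric)
histogram_quantile(0.95, sum(rate(gitaly_praefect_transactions_delay_seconds_bucket[5m])) by (le))
```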
To monitor the number of repositories that have no healthy, up-to-date replicas:
- `gitaly_praefect_unavailable_repositories`
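A minimal alerting sketch on this gauge is to check that it stays at zero:

```prometheus
# Returns series only when some repository has no healthy, up-to-date replica
gitaly_praefect_unavailable_repositories > 0
```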
To monitor repository verification, use the following Prometheus metrics:
- `gitaly_praefect_verification_queue_depth`, the total number of replicas pending verification. This metric is scraped from the database and is only available when Prometheus is scraping the database metrics.
- `gitaly_praefect_verification_jobs_dequeued_total`, the number of verification jobs picked up by the worker.
- `gitaly_praefect_verification_jobs_completed_total`, the number of verification jobs completed by the worker. The `result` label indicates the end result of the jobs:
  - `valid` indicates the expected replica existed on the storage.
  - `invalid` indicates the replica expected to exist did not exist on the storage.
  - `error` indicates the job failed and has to be retried.
- `gitaly_praefect_stale_verification_leases_released_total`, the number of stale verification leases released.
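For example, the rate of verification jobs that found a missing replica can be tracked with the following sketch, using the `invalid` value of the `result` label described above:

```prometheus
sum(rate(gitaly_praefect_verification_jobs_completed_total{result="invalid"}[5m]))
```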
You can also monitor the Praefect logs.
### Database metrics `/db_metrics` endpoint
Introduced in GitLab 14.5.
The following metrics are available from the `/db_metrics` endpoint:
- `gitaly_praefect_unavailable_repositories`, the number of repositories that have no healthy, up-to-date replicas.
- `gitaly_praefect_read_only_repositories`, the number of repositories in read-only mode in a virtual storage. This metric is available for backwards compatibility reasons. `gitaly_praefect_unavailable_repositories` is more accurate.
- `gitaly_praefect_replication_queue_depth`, the number of jobs in the replication queue.
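As a sketch, a replication queue that keeps growing can be surfaced with a query like the following. The ten-minute window and the comparison with zero are arbitrary example values:

```prometheus
# Positive values indicate the replication queue has been growing over the window
deriv(gitaly_praefect_replication_queue_depth[10m]) > 0
```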