I support a large Artifactory deployment spanning six distributed sites with a complex replication hierarchy. When replication isn't working correctly, I would like to be able to monitor its current status. My suggestions:
Add a JMX bean that tracks the current replication lag. I would implement this by stamping each task with a "created_timestamp" field when it is added to the replication queue. When the executor finishes a task, it records that created_timestamp together with a "completed_timestamp" field. The difference between the two could be exposed over JMX as "current_replication_lag". Depending on how the task queue and executor are implemented, the metric could also be tagged with the repository/replication job the task belongs to, so that multiple queues can be monitored separately.
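A minimal sketch of the lag bean, assuming the replication executor can call a hook when each task finishes. Everything here is hypothetical and not Artifactory's actual API: `ReplicationTask`, `ReplicationLag`, `recordCompletion`, and the `example.replication` ObjectName domain are illustrative names. It uses the standard `java.lang.management` / `javax.management` MXBean mechanism and tags each bean with a `repo` key so separate queues show up as separate JMX objects.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicLong;
import javax.management.ObjectName;

public class ReplicationLagMetrics {

    /** MXBean interface: each getter becomes a read-only JMX attribute. */
    public interface ReplicationLagMXBean {
        long getCurrentReplicationLagMillis();
        long getTasksCompleted();
    }

    /** Hypothetical queued task, stamped with created_timestamp when it is enqueued. */
    public static final class ReplicationTask {
        final String repoKey;
        final long createdTimestamp;

        public ReplicationTask(String repoKey, long createdTimestamp) {
            this.repoKey = repoKey;
            this.createdTimestamp = createdTimestamp;
        }
    }

    /** One instance per repo/replication job, so each job can be monitored separately. */
    public static final class ReplicationLag implements ReplicationLagMXBean {
        private final AtomicLong lastLagMillis = new AtomicLong();
        private final AtomicLong completed = new AtomicLong();

        /** Called by the executor when a task finishes: lag = completed - created. */
        public void recordCompletion(ReplicationTask task, long completedTimestamp) {
            lastLagMillis.set(completedTimestamp - task.createdTimestamp);
            completed.incrementAndGet();
        }

        @Override public long getCurrentReplicationLagMillis() { return lastLagMillis.get(); }
        @Override public long getTasksCompleted() { return completed.get(); }
    }

    /** Registers a bean tagged with the repo key; the ObjectName domain is illustrative. */
    public static ReplicationLag register(String repoKey) throws Exception {
        ReplicationLag bean = new ReplicationLag();
        ObjectName name = new ObjectName("example.replication:type=ReplicationLag,repo="
                + ObjectName.quote(repoKey));
        ManagementFactory.getPlatformMBeanServer().registerMBean(bean, name);
        return bean;
    }

    public static void main(String[] args) throws Exception {
        ReplicationLag lag = register("libs-release-local");
        ReplicationTask task = new ReplicationTask("libs-release-local", 1_000L);
        lag.recordCompletion(task, 4_500L);
        System.out.println(lag.getCurrentReplicationLagMillis()); // prints 3500
    }
}
```

The `AtomicLong` fields let executor threads update the values while a JMX client reads them concurrently; in a JMX console the bean appears with the attributes `CurrentReplicationLagMillis` and `TasksCompleted`.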
When we set this up in the past, we also added a log level at which the executor logs these timestamps when it starts and when it finishes each task. This let us troubleshoot stuck async tasks and set up alerts when they fell too far behind.
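One way to sketch that log level, assuming tasks are plain `Runnable`s handed to a Java executor: wrap each unit of work so it logs its created/started/completed timestamps at `java.util.logging` `FINE` level, which stays silent until that level is enabled on the logger. The class name `TimedReplicationTask`, the logger name `example.replication.tasks`, and the `isStuck` alert check are hypothetical illustrations, not existing Artifactory components.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.logging.ConsoleHandler;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Wraps a unit of replication work and logs its timestamps at FINE level. */
public class TimedReplicationTask implements Runnable {
    // Hypothetical logger name; enabling FINE on it turns the per-task timing log on.
    private static final Logger LOG = Logger.getLogger("example.replication.tasks");

    private final String taskId;
    private final long createdTimestamp;
    private final Runnable work;
    private volatile long startedTimestamp = -1L;
    private volatile long completedTimestamp = -1L;

    public TimedReplicationTask(String taskId, long createdTimestamp, Runnable work) {
        this.taskId = taskId;
        this.createdTimestamp = createdTimestamp;
        this.work = work;
    }

    @Override
    public void run() {
        startedTimestamp = System.currentTimeMillis();
        // Supplier overload: the message is only built if FINE is enabled.
        LOG.fine(() -> String.format("task=%s created=%d started=%d",
                taskId, createdTimestamp, startedTimestamp));
        try {
            work.run();
        } finally {
            completedTimestamp = System.currentTimeMillis();
            LOG.fine(() -> String.format("task=%s completed=%d lag_ms=%d",
                    taskId, completedTimestamp, completedTimestamp - createdTimestamp));
        }
    }

    /** True if the task started but has not finished within thresholdMillis (alert candidate). */
    public boolean isStuck(long nowMillis, long thresholdMillis) {
        long started = startedTimestamp;
        return started >= 0 && completedTimestamp < 0 && nowMillis - started > thresholdMillis;
    }

    public long getStartedTimestamp() { return startedTimestamp; }
    public long getCompletedTimestamp() { return completedTimestamp; }

    public static void main(String[] args) throws InterruptedException {
        // Route FINE records to the console (ConsoleHandler defaults to INFO).
        ConsoleHandler handler = new ConsoleHandler();
        handler.setLevel(Level.FINE);
        LOG.addHandler(handler);
        LOG.setLevel(Level.FINE);
        LOG.setUseParentHandlers(false);

        ExecutorService executor = Executors.newSingleThreadExecutor();
        executor.execute(new TimedReplicationTask("libs-release-local#1",
                System.currentTimeMillis(), () -> { /* replicate one artifact */ }));
        executor.shutdown();
        executor.awaitTermination(10, TimeUnit.SECONDS);
    }
}
```

Because the timing lines are emitted at `FINE`, they cost almost nothing in normal operation and can be switched on only while troubleshooting.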
The other really useful metric would be replication queue length. Counters of "tasks scheduled" and "tasks completed" can provide this indirectly, as a second-order value derived from their difference, but if the queue implementation allows it, reporting the current queue length directly as a metric would be even better.
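A sketch of exposing both forms, assuming (hypothetically) that the replication executor is a `java.util.concurrent.ThreadPoolExecutor`. `ReplicationQueue`, `newReplicationExecutor`, and the ObjectName are illustrative names; the getters rely only on the real `ThreadPoolExecutor` methods `getQueue()`, `getTaskCount()`, and `getCompletedTaskCount()`.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import javax.management.ObjectName;

public class ReplicationQueueMetrics {

    /** MXBean interface: attributes QueueLength, TasksScheduled, TasksCompleted. */
    public interface ReplicationQueueMXBean {
        int getQueueLength();
        long getTasksScheduled();
        long getTasksCompleted();
    }

    /** Adapts a ThreadPoolExecutor (assumed stand-in for the replication executor). */
    public static final class ReplicationQueue implements ReplicationQueueMXBean {
        private final ThreadPoolExecutor executor;

        public ReplicationQueue(ThreadPoolExecutor executor) { this.executor = executor; }

        // Tasks waiting in the queue right now (excludes tasks already running).
        @Override public int getQueueLength() { return executor.getQueue().size(); }
        // Approximate totals, as documented for ThreadPoolExecutor.
        @Override public long getTasksScheduled() { return executor.getTaskCount(); }
        @Override public long getTasksCompleted() { return executor.getCompletedTaskCount(); }
    }

    /** Fixed-size pool backed by an unbounded queue. */
    public static ThreadPoolExecutor newReplicationExecutor(int threads) {
        return new ThreadPoolExecutor(threads, threads, 0L, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<>());
    }

    public static void main(String[] args) throws Exception {
        ThreadPoolExecutor executor = newReplicationExecutor(1);
        ReplicationQueue metrics = new ReplicationQueue(executor);
        ManagementFactory.getPlatformMBeanServer().registerMBean(metrics,
                new ObjectName("example.replication:type=ReplicationQueue,repo=libs-release-local"));

        // Block the single worker so that later tasks pile up in the queue.
        CountDownLatch release = new CountDownLatch(1);
        executor.execute(() -> {
            try {
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        for (int i = 0; i < 3; i++) {
            executor.execute(() -> { /* replicate one artifact */ });
        }
        System.out.println("queue length = " + metrics.getQueueLength()); // prints 3

        release.countDown();
        executor.shutdown();
        executor.awaitTermination(10, TimeUnit.SECONDS);
        System.out.println("completed = " + metrics.getTasksCompleted()); // prints 4
    }
}
```

Note that `TasksScheduled` minus `TasksCompleted` counts queued plus in-flight tasks, so the second-order value only approximates the queue length; `QueueLength` reports the waiting tasks directly.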