[RTFACT-17688] Debian Incremental Indexer performance during filtering of removed entries can be improved Created: 21/Oct/18  Updated: 21/Oct/18

Status: Open
Project: Artifactory Binary Repository
Component/s: Debian
Affects Version/s: None
Fix Version/s: None

Type: Performance Priority: Normal
Reporter: Uriah Levy Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: None


Our code scans the currently stored Packages file block by block (each version-block is the multiline metadata of an individual version) and filters out the blocks of files that have been deleted since the last calculation.

This process can be improved in two key ways:

  1. Deleting or overriding a single file is enough to make our code scan the entire Packages file. Suggested: find a way to make this an O(1) operation, since the deleted file either has a block in the current Packages file or it doesn't. The problem is that we build the new Packages file as we iterate over all the blocks in the current one, and modelling the Packages file is not a trivial task if other solutions are to be considered.
  2. If fixing #1 is not feasible, consider passing just the filename to the regex matcher rather than the entire metadata block, making the regex computation cheaper.
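
To illustrate both suggestions, here is a minimal sketch in Python. It is not the indexer's actual code. It assumes that blocks are separated by a single empty line and that each block carries a standard "Filename:" field; all function names are hypothetical. For #2, only the Filename value is extracted from each block and compared against a set of removed paths. For #1, a dictionary keyed by Filename makes the "does this file have a block" check an O(1) lookup, although writing the file back out is still linear in the number of blocks.

```python
def split_blocks(packages_text):
    """Split Packages content into version-blocks.

    Assumption: blocks are separated by exactly one empty line.
    """
    return [b for b in packages_text.strip().split("\n\n") if b.strip()]


def block_filename(block):
    """Return the value of the block's Filename field, or None if absent."""
    prefix = "Filename:"
    for line in block.splitlines():
        if line.startswith(prefix):
            return line[len(prefix):].strip()
    return None


def filter_removed(packages_text, removed_paths):
    """Suggestion #2: match on the filename only, not the whole block."""
    removed = set(removed_paths)  # O(1) membership test per block
    kept = [b for b in split_blocks(packages_text)
            if block_filename(b) not in removed]
    return "\n\n".join(kept) + "\n" if kept else ""


def index_by_filename(packages_text):
    """Suggestion #1 (partial): map Filename -> block for O(1) lookups.

    Blocks without a Filename field are skipped in this sketch.
    """
    index = {}
    for block in split_blocks(packages_text):
        name = block_filename(block)
        if name is not None:
            index[name] = block
    return index
```

With such an index kept alongside the Packages file, an override of a path that has no block could skip the scan entirely. How to persist and invalidate that index is left open here.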

My local test shows that on a Packages file with 3K packages, this process alone takes 7 seconds when overriding a single deb file.

Generated at Mon Jun 01 20:44:48 UTC 2020 using Jira 8.5.3#805003-sha1:b4933e02eaff29a49114274fe59e1f99d9d963d7.