[RTFACT-18151] createOrhpanItems repair API uses a broken (and complex) SQL query Created: 25/Dec/18  Updated: 31/Dec/18

Status: Open
Project: Artifactory Binary Repository
Component/s: Database
Affects Version/s: 5.5.2, 6.6.0
Fix Version/s: None

Type: Bug Priority: Normal
Reporter: Andrei Komarov Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: None


 Description   

Overview:

Migrating from case insensitive database, could have path aggregations (and conflicts) if there were files deployed in similar manner as below:

fatherDir/childrepo/file.ext
fatherDir/ChiLdrEpo/file2.ext

You would see the two files above under fatherDir/childrepo only.

Artifactory has a built-in mechanism that scans and detects which folders has no father and create the folder automatically.

However the query used for it (below in the JIRA) is broken for at least 2 cases and considered very complex (O(n^2) complexity) which can cause it to run for days (and possibility not complete) with millions of artifacts, and other registered paths scenarios.

Workaround:

Set up replication between two instances, original one should be case insensitive - target should be sensitive


Steps to reproduce 3 scenario the query would be broken for:

1. If there are paths in other repositories which have the same parent paths as in the repository the query was issued for. E.g.:

a. Create paths like below on one repo like below (this already creates a conflict the query should repair):
curl -i -Lvv -X PUT -uadmin:password "localhost:8081/artifactory/generic-local/father/child/foo.bar״
curl -i -Lvv -X PUT -uadmin:password "localhost:8081/artifactory/generic-local/father/chilD/foo.bar"

b. Create similar parent path like below on a different repo:
curl -i -Lvv -X PUT -uadmin:password "localhost:8081/artifactory/generic-local-2/father/chilD/foo.bar"

The createOrphans repair REST API for generic-local repo will return an empty result.


2. If a father path under the same repository will contain the same directory component but a different ancestor to the father:

a. Create paths like below on one repo like below (this already creates a conflict the query should repair):
curl -i -Lvv -X PUT -uadmin:password "localhost:8081/artifactory/generic-local/father/child/foo.bar״
curl -i -Lvv -X PUT -uadmin:password "localhost:8081/artifactory/generic-local/father/chilD/foo.bar"

b. Create a different ancestor path while keeping the same orphan father directory like below:
curl -i -Lvv -X PUT -uadmin:password "localhost:8081/artifactory/generic-local/boo/chilD/foo.bar2"


3. Half a million of conflicts out of total million artifacts on a single repository:

a. Create large repository with conflicts (half million affected)
b. Issue the repair createOrphan API for that repo:

api/repairPaths/createOrphanItems/<repoKey>

c. It will "never" complete

"SELECT * FROM nodes n1" +
                            " WHERE n1.repo = ?" +
                            " AND n1.node_path like ?" +
                            " AND n1.node_name NOT IN" +
                            " (SELECT n2.node_name FROM nodes n2, nodes n3" +
                            " WHERE (n2.node_path like '%/%' AND n2.node_path like CONCAT('%/', n3.node_name))" +
                            " OR (n2.node_path not like '%/%' AND n2.node_path like CONCAT('%', n3.node_name)))",
                    path.getRepo(), emptyIfNullOrDot(path.getPath()) + "%"


 Comments   
Comment by Andrei Komarov [ 31/Dec/18 ]

Workaround:

Set up replication between two instances, original one should be case insensitive - target should be sensitive

Generated at Tue Feb 18 07:58:42 UTC 2020 using JIRA 7.6.16#76018-sha1:9ed376192612a49536ac834c64177a0fed6290f5.