Artifactory Binary Repository / RTFACT-21491

Cascaded python remote repo returns 404 for uncached packages

    Details

    • Type: Bug
    • Status: Open
    • Priority: 4 - Normal
    • Resolution: Unresolved
    • Affects Version/s: 6.18.0, 7.2.1
    • Fix Version/s: None
    • Component/s: PyPI, Remote Repository
    • Labels: None
    • Severity: Medium

      Description

      When two Artifactory instances are chained (pip -> art2 -> art1 -> internet) to serve PyPI packages through remote repositories, art2 only serves packages to pip that are already cached at art1. Uncached artefacts return 404. The expected behaviour is that uncached artefacts are fetched from the internet.

      In practical terms, starting with empty caches everywhere, the following call fails with 404:

      $ pip download --no-deps --no-cache foo -i http://art2/...

      But the following sequence succeeds.

      # triggers caching in art1
      $ pip download --no-deps --no-cache foo -i http://art1/...
      
      $ rm foo*.tar.gz
      $ pip download --no-deps --no-cache foo -i http://art2/... 
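
      The two scenarios above can be scripted for repeated testing. The following is a minimal sketch, assuming pip is available on the PATH. The function names are hypothetical and not part of the original report, and the index URLs must be supplied by the caller.

```python
import subprocess


def pip_download_cmd(package, index_url):
    """Build the pip invocation used in the report for a given index URL."""
    return ["pip", "download", "--no-deps", "--no-cache", package, "-i", index_url]


def download_succeeds(package, index_url, dest):
    """Run pip inside the directory `dest` and report whether it exited with 0."""
    result = subprocess.run(pip_download_cmd(package, index_url), cwd=dest)
    return result.returncode == 0
```

      Calling download_succeeds against the art2 index with empty caches should reproduce the failure. Calling it against art1 first and then against art2 should reproduce the succeeding sequence.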

      Fetching package metadata works (pip can always determine the list of available versions for a package), but pip then fails to download the actual package file.

      I investigated this by examining the HTTP requests.

      When pip directly talks to Artifactory (pip -> art1 -> internet), the package URLs announced by Artifactory through the package metadata are of the form

      http://<BASE-URL>/api/pypi/<REPO>/packages/packages/<HASH-BASED-PACKAGE-PATH>

      Pip then requests exactly that URL, which works as expected. If the package is missing, the request triggers an upstream lookup and initiates downloading from upstream.

      However, when one Artifactory instance talks to an upstream Artifactory instance (pip -> art2 -> art1 -> internet), the request art2 -> art1 is:

      http://<BASE-URL>/api/pypi/<REPO>/packages/<HASH-BASED-PACKAGE-PATH>

      Note that there is only one "packages" segment in this URL.

      This URL does not trigger any further upstream lookup and therefore causes a 404 downstream for uncached artefacts.
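
      To compare the two URL forms directly against art1, they can be built from the same inputs and requested one after the other. This is a minimal sketch using only the Python standard library. The function names and dictionary keys are hypothetical, and the base URL, repository key and package path are placeholders to be filled in by the caller.

```python
import urllib.error
import urllib.request


def package_urls(base_url, repo, package_path):
    """Return the two URL forms described in the report for one package file."""
    root = f"{base_url.rstrip('/')}/api/pypi/{repo}"
    path = package_path.lstrip("/")
    return {
        # Form announced to pip in the package metadata (two "packages" segments).
        "pip_to_artifactory": f"{root}/packages/packages/{path}",
        # Form observed on the art2 -> art1 request (one "packages" segment).
        "artifactory_to_artifactory": f"{root}/packages/{path}",
    }


def http_status(url):
    """Issue a GET request and return the HTTP status code."""
    try:
        with urllib.request.urlopen(url) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code
```

      When probing with an empty cache, request the single-segment form first. According to the observations above, a successful request for the double-segment form caches the artefact, after which both forms work.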

      Curiously, once an artefact is cached, any URL works, regardless of how many "packages/" levels it contains. This explains why only uncached artefacts are affected.
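
      When going through captured request URLs, it can help to count the "packages" levels mechanically. The helper below is a hypothetical sketch, not part of Artifactory. It assumes the URL contains the /api/pypi/<REPO>/ prefix shown above.

```python
from urllib.parse import urlsplit


def packages_levels(url, repo):
    """Count consecutive "packages" segments directly after /api/pypi/<repo>/."""
    path = urlsplit(url).path
    marker = f"/api/pypi/{repo}/"
    _, found, rest = path.partition(marker)
    if not found:
        raise ValueError(f"URL does not contain {marker!r}")
    count = 0
    for segment in rest.split("/"):
        if segment != "packages":
            break
        count += 1
    return count
```

      Per the report, a request for an uncached artefact with a count of 1 is the pattern that ends in a 404, while a count of 2 triggers the upstream lookup.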

            People

            Assignee: Unassigned
            Reporter: mvogt Markus Vogt