GitLab import from a file

GitLab provides an option to import a new project from a file. This option was created for migrating projects from one GitLab instance to another. For a Pagure to GitLab importer, we would need to adapt the Pagure data to the file format used by GitLab. This document investigates that option.

GitLab file format

To investigate the GitLab export format, I exported the test project I created during the investigation of the GitLab API. See GitLab API investigation.

The export generates one archive in tar.gz format. This archive contains the following directory structure:

2023-01-20_11-48-813_testgroup519_arc_export
├── GITLAB_REVISION
├── GITLAB_VERSION
├── lfs-objects
│   └── 45a5d77993d525cdda15d08e63c34339a1bf49a43756a05908082bb04b4c4087
├── lfs-objects.json
├── project.bundle
├── project.design.bundle
├── snippets
├── tree
│   ├── project
│   │   ├── auto_devops.ndjson
│   │   ├── boards.ndjson
│   │   ├── ci_cd_settings.ndjson
│   │   ├── ci_pipelines.ndjson
│   │   ├── container_expiration_policy.ndjson
│   │   ├── custom_attributes.ndjson
│   │   ├── error_tracking_setting.ndjson
│   │   ├── external_pull_requests.ndjson
│   │   ├── issues.ndjson
│   │   ├── labels.ndjson
│   │   ├── merge_requests.ndjson
│   │   ├── metrics_setting.ndjson
│   │   ├── milestones.ndjson
│   │   ├── pipeline_schedules.ndjson
│   │   ├── project_badges.ndjson
│   │   ├── project_feature.ndjson
│   │   ├── project_members.ndjson
│   │   ├── prometheus_metrics.ndjson
│   │   ├── protected_branches.ndjson
│   │   ├── protected_environments.ndjson
│   │   ├── protected_tags.ndjson
│   │   ├── push_rule.ndjson
│   │   ├── releases.ndjson
│   │   ├── security_setting.ndjson
│   │   ├── service_desk_setting.ndjson
│   │   └── snippets.ndjson
│   └── project.json
├── uploads
│   └── 8b4f7247f154d0b77c5d7d13e16cb904
│       └── Infra___Releng_2022.jpg
└── VERSION

7 directories, 35 files

The following is an explanation of some of the files found in the archive:

  • GitLab metadata files (version and revision)

  • .bundle file, which is created by the git bundle command. You can easily look at the content of a .bundle file by using the git clone command.

  • .design.bundle contains all the attachments from issues and merge requests. It is a repository bundled by the git bundle command.

  • lfs-objects.json contains the list of hashes of designs and their mapping to an issue id or merge request id. This is something we can skip, because Pagure doesn’t have this feature.

  • VERSION file contains a version, but I was not able to find out what this version refers to. My assumption is that it’s the version of the export tool.

  • lfs-objects/ folder contains all the designs named by hash. This is something we can skip, because Pagure doesn’t have this feature.

  • snippets/ folder contains GitLab snippets.

  • tree/project.json file contains all the project metadata in JSON format.

  • tree/project/ contains files in ndjson format describing the various objects defined in the GitLab project. For the purpose of this investigation, only issues.ndjson and merge_requests.ndjson are important for us.

  • uploads/ folder contains all the attachments from issues or merge requests.
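The ndjson files described above can be inspected without unpacking the archive. The following is a minimal sketch, assuming the archive layout shown in the directory tree; the function name read_ndjson_member is hypothetical and only Python's standard tarfile and json modules are used.

```python
import json
import tarfile


def read_ndjson_member(archive_path, member_suffix):
    """Return the records of one ndjson file from a GitLab export archive.

    member_suffix is matched against the end of each member name, so it
    works whether entries are stored as "tree/..." or "./tree/...".
    """
    records = []
    with tarfile.open(archive_path, "r:gz") as archive:
        for member in archive.getmembers():
            if member.isfile() and member.name.endswith(member_suffix):
                with archive.extractfile(member) as handle:
                    for raw_line in handle:
                        line = raw_line.decode("utf-8").strip()
                        if line:  # skip blank lines
                            records.append(json.loads(line))
                break  # only the first matching member is read
    return records
```

For example, read_ndjson_member("export.tar.gz", "tree/project/issues.ndjson") would return one dictionary per exported issue, and an empty list if the archive has no such file.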

Conversion of Pagure project to GitLab file formats

For the purpose of the investigation, I tried to convert the ARC project hosted on Pagure to the GitLab import format. I started with the export generated by GitLab and changed the files to correspond to what I wanted to import.

Here is the list of all files that I needed to prepare and their content with explanation:

  • project.bundle is a binary bundle file created by the git bundle command. It was created by running git bundle create project.bundle --all inside the ARC project repository.

  • tree/project/issues.ndjson contains the issue descriptions in ndjson format. In the converted file author_id is set to 0, and each record instead carries an author object with the FAS username and public FAS e-mail. Unfortunately, if GitLab doesn't recognize the author_id, it creates the issue or comment as the user who is running the import, completely ignoring the author object in the JSON.

    {"title":"Investigate the GitLab API for Pagure to Gitlab importer","author_id":0,"author":{"username": "zlopez","email": "michal.konecny@pacse.eu"},"project_id":42729361,"created_at":"2023-01-19T11:41:40.000Z","updated_at":"2023-01-19T14:06:47.659Z","description":"Investigate the GitLab API for Pagure to Gitlab importer ARC investigation. This ticket will also work as a test ticket in investigation.","iid":1,"updated_by_id":null,"weight":null,"confidential":false,"due_date":null,"lock_version":0,"time_estimate":0,"relative_position":513,"last_edited_at":null,"last_edited_by_id":null,"discussion_locked":null,"closed_at":"2023-01-19T14:06:47.641Z","closed_by_id":3072529,"health_status":null,"external_key":null,"issue_type":"issue","state":"closed","events":[{"project_id":42729361,"author_id":3072529,"created_at":"2023-01-19T13:07:11.164Z","updated_at":"2023-01-19T13:07:11.164Z","action":"created","target_type":"Issue","fingerprint":null},{"project_id":42729361,"author_id":3072529,"created_at":"2023-01-19T14:06:47.712Z","updated_at":"2023-01-19T14:06:47.712Z","action":"closed","target_type":"Issue","fingerprint":null}],"timelogs":[],"notes":[{"note":"Here's a sample comment as you requested @zlopez.","noteable_type":"Issue","author_id":3072529,"created_at":"2023-01-19T12:59:59.000Z","updated_at":"2023-01-19T12:59:59.000Z","project_id":42729361,"attachment":{"url":null},"line_code":null,"commit_id":null,"st_diff":null,"system":false,"updated_by_id":null,"type":null,"position":null,"original_position":null,"resolved_at":null,"resolved_by_id":null,"discussion_id":"f98cdeabaaec68ae453e1dbf5d9e535fbbcede0a","change_position":null,"resolved_by_push":null,"confidential":null,"last_edited_at":"2023-01-19T12:59:59.000Z","author":{"name":"Zlopez"},"award_emoji":[],"events":[{"project_id":42729361,"author_id":3072529,"created_at":"2023-01-19T13:13:21.071Z","updated_at":"2023-01-19T13:13:21.071Z","action":"commented","target_type":"Note","fingerprint":null}]}],"label_links":[],"resource_label_events":[],"resource_milestone_events":[],"resource_state_events":[{"user_id":3072529,"created_at":"2023-01-19T14:06:47.734Z","state":"closed","source_commit":null,"close_after_error_tracking_resolve":false,"close_auto_resolve_prometheus_alert":false}],"designs":[],"design_versions":[],"issue_assignees":[],"zoom_meetings":[],"award_emoji":[],"resource_iteration_events":[]}
    {"title":"Test open issue","author_id":0,"author":{"username": "akashdeep","email": "akashdeep.dhar@gmail.com"},"project_id":42729361,"created_at":"2023-01-19T14:07:05.823Z","updated_at":"2023-01-20T11:48:02.495Z","description":"Test open issue","iid":2,"updated_by_id":null,"weight":null,"confidential":false,"due_date":null,"lock_version":0,"time_estimate":0,"relative_position":1026,"last_edited_at":null,"last_edited_by_id":null,"discussion_locked":null,"closed_at":null,"closed_by_id":null,"health_status":null,"external_key":null,"issue_type":"issue","state":"opened","events":[{"project_id":42729361,"author_id":3072529,"created_at":"2023-01-19T14:07:05.930Z","updated_at":"2023-01-19T14:07:05.930Z","action":"created","target_type":"Issue","fingerprint":null}],"timelogs":[],"notes":[{"note":"![Infra___Releng_2022](/uploads/8b4f7247f154d0b77c5d7d13e16cb904/Infra___Releng_2022.jpg)","noteable_type":"Issue","author_id":3072529,"created_at":"2023-01-20T11:48:02.435Z","updated_at":"2023-01-20T11:48:02.435Z","project_id":42729361,"attachment":{"url":null},"line_code":null,"commit_id":null,"st_diff":null,"system":false,"updated_by_id":null,"type":null,"position":null,"original_position":null,"resolved_at":null,"resolved_by_id":null,"discussion_id":"30302c7dee98663fcfca845a2ec2715eb3e35e4f","change_position":null,"resolved_by_push":null,"confidential":null,"last_edited_at":"2023-01-20T11:48:02.435Z","author":{"name":"Zlopez"},"award_emoji":[],"events":[{"project_id":42729361,"author_id":3072529,"created_at":"2023-01-20T11:48:02.617Z","updated_at":"2023-01-20T11:48:02.617Z","action":"commented","target_type":"Note","fingerprint":null}]},{"note":"added [1 design](/testgroup519/arc/-/issues/2/designs?version=490993)","noteable_type":"Issue","author_id":3072529,"created_at":"2023-01-19T14:07:45.310Z","updated_at":"2023-01-19T14:07:45.315Z","project_id":42729361,"attachment":{"url":null},"line_code":null,"commit_id":null,"st_diff":null,"system":true,"updated_by_id":null,"type":null,"position":null,"original_position":null,"resolved_at":null,"resolved_by_id":null,"discussion_id":"e15e7c584cc7e6c7e298529f034f0b55eeacca90","change_position":null,"resolved_by_push":null,"confidential":null,"last_edited_at":"2023-01-19T14:07:45.315Z","author":{"name":"Zlopez"},"award_emoji":[],"system_note_metadata":{"commit_count":null,"action":"designs_added","created_at":"2023-01-19T14:07:45.343Z","updated_at":"2023-01-19T14:07:45.343Z"},"events":[]}],"label_links":[],"resource_label_events":[],"resource_milestone_events":[],"resource_state_events":[],"designs":[{"project_id":42729361,"filename":"Infra___Releng_2022.jpg","relative_position":null,"iid":1,"notes":[]}],"design_versions":[{"sha":"69419c215f53d401c1b3c451e6fc08e3351d2679","created_at":"2023-01-19T14:07:45.233Z","author_id":3072529,"actions":[{"event":"creation","design":{"project_id":42729361,"filename":"Infra___Releng_2022.jpg","relative_position":null,"iid":1}}]}],"issue_assignees":[],"zoom_meetings":[],"award_emoji":[],"resource_iteration_events":[]}
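
Records like the ones above could be generated from Pagure data rather than edited by hand. The following is a rough sketch, not the conversion actually used in this investigation. It assumes a Pagure issue dictionary with the keys id, title, content, status, date_created and user.name; the emails mapping and both function names are hypothetical. Only a subset of the fields from the sample is filled in, and whether GitLab accepts such a reduced record was not tested.

```python
import json
from datetime import datetime, timezone


def to_gitlab_time(unix_timestamp):
    """Convert a Unix timestamp (string or int) to the ISO form used in the export."""
    moment = datetime.fromtimestamp(int(unix_timestamp), tz=timezone.utc)
    return moment.strftime("%Y-%m-%dT%H:%M:%S.000Z")


def pagure_issue_to_ndjson(issue, project_id, emails):
    """Build one issues.ndjson line from a Pagure issue dictionary.

    Assumed input keys: id, title, content, status, date_created, user.name.
    emails is a hypothetical mapping of FAS username to e-mail address.
    """
    username = issue["user"]["name"]
    closed = issue["status"] != "Open"
    record = {
        "title": issue["title"],
        "author_id": 0,  # same placeholder as in the sample records
        "author": {"username": username, "email": emails.get(username)},
        "project_id": project_id,
        "created_at": to_gitlab_time(issue["date_created"]),
        "description": issue["content"],
        "iid": issue["id"],
        "issue_type": "issue",
        "state": "closed" if closed else "opened",
        "notes": [],
    }
    # json.dumps escapes newlines inside strings, so each record stays on one line.
    return json.dumps(record, separators=(",", ":"))
```

Writing one such line per issue, each followed by a newline, produces a file in the same ndjson shape as the sample.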
    

Importing the archive to GitLab

The archive for the migration is prepared by executing the tar -czvf test_arc_export.tar.gz . command. It needs to be executed in the root folder of the prepared file structure, otherwise the import will fail with No such file or directory.
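
The same packaging step could be scripted. This is a minimal sketch using Python's standard tarfile module; build_import_archive is a hypothetical helper name. Passing arcname="." mirrors running tar from inside the root folder, so the files are not wrapped in an extra top-level directory.

```python
import tarfile


def build_import_archive(root_dir, archive_path):
    """Pack the contents of root_dir so entries sit at the archive root.

    Comparable to running "tar -czf archive.tar.gz ." inside root_dir:
    members are stored relative to "." instead of under the folder name.
    """
    with tarfile.open(archive_path, "w:gz") as archive:
        archive.add(root_dir, arcname=".")
    return archive_path
```

Writing the archive to a location outside root_dir keeps the output file clearly separate from the tree being packed.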

To import the archive to GitLab, an API call can be used. Here is the full API call made with curl:

curl --request POST --header "PRIVATE-TOKEN: XXX" --form "namespace=testgroup519" --form "path=arc2" --form "file=@test_arc_export.tar.gz" "https://gitlab.com/api/v4/projects/import"

To check for any errors in the import, use the GitLab import status API call. It can be made with curl:

curl --header "PRIVATE-TOKEN: XXX" "https://gitlab.com/api/v4/projects/<id_returned_by_import_call>/import"
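
The status check can also be done from a script. The sketch below uses only the Python standard library and assumes the response is a JSON object with import_status and import_error fields; both function names are hypothetical, and the token and project id are placeholders as in the curl call.

```python
import json
import urllib.request


def summarize_import(payload):
    """Reduce an import status response to (state, error message)."""
    return payload.get("import_status"), payload.get("import_error")


def fetch_import_status(base_url, project_id, token):
    """Call GET /api/v4/projects/<id>/import and return the summarized result."""
    request = urllib.request.Request(
        f"{base_url}/api/v4/projects/{project_id}/import",
        headers={"PRIVATE-TOKEN": token},
    )
    with urllib.request.urlopen(request) as response:
        return summarize_import(json.load(response))
```

Calling fetch_import_status("https://gitlab.com", project_id, token) repeatedly until the state stops changing gives a simple way to wait for the import to finish and then read the error, if any.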

Conclusion

At this point I ended the investigation, because the outcome is the same as when using the API, which is much more convenient to use and provides better responses in case of errors (I spent two days trying to debug the No such file or directory [FILTERED] error message).