|
| 1 | +.. Copyright (c) 2025 - 2025, Oracle and/or its affiliates. All rights reserved. |
| 2 | +.. Licensed under the Universal Permissive License v 1.0 as shown at https://oss.oracle.com/licenses/upl/. |
| 3 | +
|
| 4 | +.. _tutorial-gen-build-spec: |
| 5 | + |
| 6 | +********************************************************* |
| 7 | +Rebuilding Third-Party Artifacts from Source with Macaron |
| 8 | +********************************************************* |
| 9 | + |
| 10 | +In this tutorial, you'll learn how to use Macaron's new ``gen-build-spec`` command to automatically generate build specification (buildspec) files from analyzed software packages. |
| 11 | +These buildspecs help document and automate the build process for packages, enabling reproducibility and ease of integration with infrastructures such as Reproducible Central. |
| 12 | + |
| 13 | +.. list-table:: |
| 14 | + :widths: 25 |
| 15 | + :header-rows: 1 |
| 16 | + |
| 17 | + * - Currently Supported packages |
| 18 | + * - Maven packages built with Gradle or Maven |
| 19 | + |
| 20 | +.. contents:: :local: |
| 21 | + |
| 22 | +********** |
| 23 | +Motivation |
| 24 | +********** |
| 25 | + |
| 26 | +Software ecosystems such as Maven Central are foundational to modern software supply chains, providing centralized repositories for libraries, plugins, and other components. However, one ongoing challenge is the separation between distributed binaries and their corresponding source code and build processes. For example, in Maven Central, there is often no direct, transparent link between a published binary and the environment in which it was built. In fact, recent studies show that around 84% of the top 1200 most commonly used Java artifacts are not built through transparent CI/CD pipelines. |
| 27 | + |
| 28 | +This lack of transparency introduces security risks: users must trust not just the upstream source code, but also the build environment itself—including tools, plugins, and configuration details—which may not be visible or reproducible. As supply chain security becomes increasingly critical, rebuilding artifacts from source has become an essential strategy. This process enables deeper code review, verification of binary-source equivalence, and greater control over dependencies. |
| 29 | + |
| 30 | +However, rebuilding artifacts reliably is difficult due to differences in build environments (such as JDK versions or specific build commands), and the challenge only increases with large, complex dependency graphs. Macaron addresses these issues by automating the extraction of build specifications from open CI/CD workflows (like GitHub Actions), improving source code detection, and providing the tools needed to make reproducible rebuilds easier and more robust. By supporting this workflow, Macaron helps increase both the security and transparency of the open-source software supply chain. |
| 31 | + |
| 32 | +********** |
| 33 | +Background |
| 34 | +********** |
| 35 | + |
| 36 | +A build specification is a file that describes all the information needed to rebuild a package from source. This includes metadata such as the build tool, the specific build command to run, the language version (for example, the JDK version for Java), and the artifact coordinates. Macaron can now generate this file automatically for supported ecosystems, greatly simplifying reproducible builds. |
| 37 | + |
| 38 | +The generated buildspec will be stored in an ecosystem- and PURL-specific path under the ``output/`` directory (see more under :ref:`Output Files Guide <output_files_guide>`). |
| 39 | + |
| 40 | +****************************** |
| 41 | +Installation and Prerequisites |
| 42 | +****************************** |
| 43 | + |
| 44 | +You need: |
| 45 | + |
| 46 | +* Docker |
| 47 | +* Macaron image (see :ref:`installation-guide`) |
| 48 | +* GitHub Token (see :ref:`prepare-github-token`) |
| 49 | + |
| 50 | +************************* |
| 51 | +Step 1: Analyze a Package |
| 52 | +************************* |
| 53 | + |
| 54 | +Before generating a buildspec, Macaron must first analyze the target package. For example, to analyze a Maven Java package: |
| 55 | + |
| 56 | +.. code-block:: shell |
| 57 | +
|
| 58 | + ./run_macaron.sh analyze -purl pkg:maven/org.apache.hugegraph/computer-k8s@1.0.0 |
| 59 | +
|
| 60 | +This command inspects the source repository and CI/CD configuration, then extracts build-related data into the local database at ``output/macaron.db``. |
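
To confirm that the analysis populated the database before moving on, it can help to list the tables it contains. The following is a minimal sketch using only Python's standard ``sqlite3`` module. The helper name ``list_tables`` is hypothetical, and the sketch makes no assumption about Macaron's actual table names or schema.

```python
import sqlite3
from pathlib import Path


def list_tables(db_path: str) -> list[str]:
    """Return the names of all tables in a SQLite database, sorted by name.

    Hypothetical helper for illustration; it is not part of Macaron.
    """
    path = Path(db_path)
    if not path.is_file():
        raise FileNotFoundError(f"No database found at {db_path}")
    # Open the file read-only so that inspection cannot alter analysis results.
    conn = sqlite3.connect(f"{path.resolve().as_uri()}?mode=ro", uri=True)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]
```

Calling ``list_tables("output/macaron.db")`` after Step 1 should return a non-empty list if the analysis wrote its results to the default location.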
| 61 | + |
| 62 | +******************************************* |
| 63 | +Step 2: Generate a Build Specification File |
| 64 | +******************************************* |
| 65 | + |
| 66 | +After analysis is complete, you can generate a buildspec for the package using the ``gen-build-spec`` command. |
| 67 | + |
| 68 | +.. code-block:: shell |
| 69 | +
|
| 70 | + ./run_macaron.sh gen-build-spec -purl pkg:maven/org.apache.hugegraph/computer-k8s@1.0.0 --database output/macaron.db |
| 71 | +
|
| 72 | +
|
| 73 | +After execution, the buildspec will be created at: |
| 74 | + |
| 75 | +.. code-block:: text |
| 76 | +
|
| 77 | + output/<purl_based_path>/macaron.buildspec |
| 78 | +
|
| 79 | +where ``<purl_based_path>`` is the directory structure according to the PackageURL (PURL). |
| 80 | + |
| 81 | +In the example above, the buildspec is located at: |
| 82 | + |
| 83 | +.. code-block:: text |
| 84 | +
|
| 85 | + output/maven/org_apache_hugegraph/computer-k8s/macaron.buildspec |
| 86 | +
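
For scripting, the mapping from a PURL to the buildspec location can be sketched as below. This is an assumption inferred only from the single example above (the PURL type, then the namespace with dots replaced by underscores, then the package name); Macaron's real path logic may differ, and ``guess_buildspec_path`` is a hypothetical name. The :ref:`Output Files Guide <output_files_guide>` remains the authoritative source.

```python
from pathlib import PurePosixPath


def guess_buildspec_path(purl: str, output_dir: str = "output") -> str:
    """Guess where the buildspec for a PURL is written.

    ASSUMPTION: the layout is inferred from one documented example and is
    not taken from Macaron's source code.
    """
    if not purl.startswith("pkg:"):
        raise ValueError(f"Not a PackageURL: {purl}")
    body = purl[len("pkg:"):]
    # Drop the optional subpath (#...) and qualifiers (?...), then the version.
    body = body.split("#", 1)[0].split("?", 1)[0]
    body = body.rsplit("@", 1)[0]
    parts = [part for part in body.split("/") if part]
    if len(parts) < 2:
        raise ValueError(f"PURL has no package name: {purl}")
    pkg_type, *namespace, name = parts
    # In the documented example, dots in the namespace become underscores.
    namespace_dirs = [segment.replace(".", "_") for segment in namespace]
    return str(
        PurePosixPath(output_dir, pkg_type, *namespace_dirs, name, "macaron.buildspec")
    )
```

Applied to the PURL used in this tutorial, the sketch reproduces the path shown above; for other ecosystems the result should be treated as a guess and checked against the actual output directory.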
|
| 87 | +***************************************** |
| 88 | +Step 3: Review and Use the Buildspec File |
| 89 | +***************************************** |
| 90 | + |
| 91 | +The generated buildspec uses the `Reproducible Central buildspec <https://reproducible-central.org/spec/>`_ format, for example: |
| 92 | + |
| 93 | +.. code-block:: ini |
| 94 | +
|
| 95 | + # Copyright (c) 2025, Oracle and/or its affiliates. |
| 96 | + # Licensed under the Universal Permissive License v 1.0 as shown at https://oss.oracle.com/licenses/upl/. |
| 97 | + # Generated by Macaron version 0.15.0 |
| 98 | + # Input PURL - pkg:maven/org.apache.hugegraph/computer-k8s@1.0.0 |
| 99 | + # Initial default JDK version 8 and default build command [['mvn', '-DskipTests=true', ... ]] |
| 100 | + groupId=org.apache.hugegraph |
| 101 | + artifactId=computer-k8s |
| 102 | + version=1.0.0 |
| 103 | + gitRepo=https://github.com/apache/hugegraph-computer |
| 104 | + gitTag=d2b95262091d6572cc12dcda57d89f9cd44ac88b |
| 105 | + tool=mvn |
| 106 | + jdk=8 |
| 107 | + newline=lf |
| 108 | + command="mvn -DskipTests=true -Dmaven.test.skip=true -Dmaven.site.skip=true -Drat.skip=true -Dmaven.javadoc.skip=true clean package" |
| 109 | + buildinfo=target/computer-k8s-1.0.0.buildinfo |
| 110 | +
|
| 111 | +You can now use this file to automate reproducible builds, for example as part of the Reproducible Central infrastructure. |
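
When reviewing a generated buildspec, it is often convenient to load its fields into a script. The sketch below reads the ``key=value`` lines of the example above into a dictionary. It is an illustration rather than an official parser: ``parse_buildspec`` is a hypothetical helper, it only strips one pair of surrounding quotes, and it does not expand shell variables should a buildspec contain them.

```python
def parse_buildspec(text: str) -> dict[str, str]:
    """Parse simple key=value buildspec lines into a dictionary.

    Hypothetical helper for illustration; comments and blank lines are skipped.
    """
    fields: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first "=" only, since values such as the build
        # command contain further "=" characters.
        key, separator, value = line.partition("=")
        if not separator:
            continue
        value = value.strip()
        # Remove one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        fields[key.strip()] = value
    return fields


# A shortened copy of the example buildspec shown above.
SAMPLE = """\
# Generated by Macaron version 0.15.0
groupId=org.apache.hugegraph
artifactId=computer-k8s
version=1.0.0
tool=mvn
jdk=8
command="mvn -DskipTests=true -Dmaven.test.skip=true clean package"
"""

spec = parse_buildspec(SAMPLE)
print(spec["tool"], spec["jdk"], spec["command"])
```

A review script could then, for instance, check that ``spec["jdk"]`` matches the JDK installed in the rebuild environment before running ``spec["command"]``.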
| 112 | + |
| 113 | +******************************* |
| 114 | +How It Works: Behind the Scenes |
| 115 | +******************************* |
| 116 | + |
| 117 | +The ``gen-build-spec`` command extracts build data from Macaron’s SQLite database, using several modules: |
| 118 | + |
| 119 | +- **macaron_db_extractor.py:** extracts metadata and build information using SQLAlchemy ORM mapped classes. |
| 120 | +- **Maven and Gradle CLI Parsers:** parse and patch build commands from CI/CD configurations to ensure compatibility with reproducible build systems. |
| 121 | +- **jdk_finder.py:** identifies the JDK version by parsing CI/CD config or, when unavailable, extracting it from ``META-INF/MANIFEST.MF`` in Maven Central artifacts. |
| 122 | +- **jdk_version_normalizer.py:** ensures only the major JDK version is included, as required by the buildspec format. |
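
The last step in the list above, reducing a JDK version string to its major version, can be illustrated with a short sketch. This is not the code in ``jdk_version_normalizer.py``; the function name ``major_jdk_version`` and the set of accepted formats are assumptions chosen for illustration. The one subtlety it shows is the legacy ``1.x`` scheme, where ``1.8.0_292`` denotes Java 8.

```python
import re


def major_jdk_version(version: str) -> str:
    """Reduce a JDK version string to its major version.

    Hypothetical sketch, not Macaron's implementation.
    """
    match = re.match(r"\s*(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"Unrecognized JDK version: {version!r}")
    first, second = match.group(1), match.group(2)
    # Legacy scheme: versions up to Java 8 were written as "1.<major>".
    if first == "1" and second is not None:
        return second
    return first
```

With this sketch, ``1.8.0_292`` and ``1.7`` map to ``8`` and ``7``, while modern strings such as ``11.0.2`` and ``17`` map to ``11`` and ``17``.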
| 123 | + |
| 124 | +This feature is described in more detail in our accepted ASE 2025 Industry ShowCase paper: `"Unlocking Reproducibility: Automating re-Build Process for Open-Source Software" <https://arxiv.org/pdf/2509.08204>`_. |
| 125 | + |
| 126 | +*********************************** |
| 127 | +Frequently Asked Questions (FAQs) |
| 128 | +*********************************** |
| 129 | + |
| 130 | +*Q: What formats are supported for buildspec output?* |
| 131 | +A: Currently, only ``rc-buildspec`` is supported. |
| 132 | + |
| 133 | +*Q: Do I need to analyze the package every time before generating a buildspec?* |
| 134 | +A: No, you only need to analyze the package once unless you want to update the database with newer information. |
| 135 | + |
| 136 | +*Q: Can Macaron generate buildspecs for other ecosystems besides Maven?* |
| 137 | +A: Ecosystem support is actively expanding. See :ref:`Supported Builds <supported_build_gen_tools>` for the latest details. |
| 138 | + |
| 139 | +*********************************** |
| 140 | +Future Work and Contributions |
| 141 | +*********************************** |
| 142 | + |
| 143 | +We plan to support more ecosystems, deeper integration with artifact repositories, and more user-configurable buildspec options. Contributions are welcome! |
| 144 | + |
| 145 | +*********************************** |
| 146 | +See Also |
| 147 | +*********************************** |
| 148 | + |
| 149 | +- :ref:`Output Files Guide <output_files_guide>` |
| 150 | +- :ref:`installation-guide` |
| 151 | +- :ref:`Supported Builds <supported_build_gen_tools>` |