Commit b33ec0a

docs: add a tutorial for gen-build-spec
Signed-off-by: behnazh-w <[email protected]>
1 parent b0c46fa commit b33ec0a

File tree

4 files changed

+164
-1
lines changed

README.md

Lines changed: 5 additions & 1 deletion
@@ -76,15 +76,19 @@ To learn how to define your own checks, see the steps in the [checks documentati
7676

7777
* Behnaz Hassanshahi, Trong Nhan Mai, Alistair Michael, Benjamin Selwyn-Smith, Sophie Bates, and Padmanabhan Krishnan: [Macaron: A Logic-based Framework for Software Supply Chain Security Assurance](https://dl.acm.org/doi/abs/10.1145/3605770.3625213), SCORED 2023. Best paper award :trophy:
7878

79+
* Behnaz Hassanshahi, Trong Nhan Mai, Benjamin Selwyn-Smith, and Nicholas Allen: [Unlocking Reproducibility: Automating re-Build Process for Open-Source Software](https://arxiv.org/pdf/2509.08204), ASE Industry Showcase 2025.
80+
7981
* Ridwan Shariffdeen, Behnaz Hassanshahi, Martin Mirchev, Ali El Husseini, Abhik Roychoudhury [Detecting Python Malware in the Software Supply Chain with Program Analysis](https://labs.oracle.com/pls/apex/f?p=94065:10:11591088449483:11569), ICSE-SEIP 2025.
8082

8183
* Jens Dietrich, Tim White, Behnaz Hassanshahi, Paddy Krishnan [Levels of Binary Equivalence for the Comparison of Binaries
82-
from Alternative Builds](https://arxiv.org/pdf/2410.08427), pre-print on arXiv.
84+
from Alternative Builds](https://arxiv.org/pdf/2410.08427), ICSME Industry Track 2025.
8385

8486
* Jens Dietrich, Tim White, Valerio Terragni, Behnaz Hassanshahi [Towards Cross-Build Differential Testing](https://labs.oracle.com/pls/apex/f?p=94065:10:11591088449483:11549), ICST 2025.
8587

8688
* Jens Dietrich, Tim White, Mohammad Mahdi Abdollahpour, Elliott Wen, Behnaz Hassanshahi [BinEq-A Benchmark of Compiled Java Programs to Assess Alternative Builds](https://dl.acm.org/doi/10.1145/3689944.3696162), SCORED 2024.
8789

90+
* Jens Dietrich and Behnaz Hassanshahi [DALEQ--Explainable Equivalence for Java Bytecode](https://arxiv.org/pdf/2508.01530), ASE Industry Showcase 2025.
91+
8892
## Security
8993

9094
Please consult the [security guide](./SECURITY.md) for our responsible security vulnerability disclosure process.

docs/source/pages/supported_technologies/index.rst

Lines changed: 7 additions & 0 deletions
@@ -23,6 +23,13 @@ such as GitHub Actions workflows.
2323
* Go
2424
* Docker
2525

26+
.. _supported_build_gen_tools:
27+
28+
------------------------------
29+
Build Specification Generation
30+
------------------------------
31+
32+
* Maven and Gradle builds for Java artifacts
2633

2734
.. _supported_git_services:
2835

docs/source/pages/tutorials/index.rst

Lines changed: 1 addition & 0 deletions
@@ -19,6 +19,7 @@ For the full list of supported technologies, such as CI services, registries, an
1919

2020
commit_finder
2121
detect_malicious_package
22+
rebuild_third_party_artifacts
2223
detect_vulnerable_github_actions
2324
provenance
2425
detect_malicious_java_dep
Lines changed: 151 additions & 0 deletions
@@ -0,0 +1,151 @@
1+
.. Copyright (c) 2025 - 2025, Oracle and/or its affiliates. All rights reserved.
2+
.. Licensed under the Universal Permissive License v 1.0 as shown at https://oss.oracle.com/licenses/upl/.
3+
4+
.. _tutorial-gen-build-spec:
5+
6+
*********************************************************
7+
Rebuilding Third-Party Artifacts from Source with Macaron
8+
*********************************************************
9+
10+
In this tutorial, you'll learn how to use Macaron's new ``gen-build-spec`` command to automatically generate build specification (buildspec) files from analyzed software packages.
11+
These buildspecs document and automate the build process for a package, making builds reproducible and easier to integrate with infrastructures such as Reproducible Central.
12+
13+
.. list-table::
14+
:widths: 25
15+
:header-rows: 1
16+
17+
* - Currently supported packages
18+
* - Maven packages built with Gradle or Maven
19+
20+
.. contents:: :local:
21+
22+
**********
23+
Motivation
24+
**********
25+
26+
Software ecosystems such as Maven Central are foundational to modern software supply chains, providing centralized repositories for libraries, plugins, and other components. However, one ongoing challenge is the separation between distributed binaries and their corresponding source code and build processes. For example, in Maven Central, there is often no direct, transparent link between a published binary and the environment in which it was built. In fact, recent studies show that around 84% of the top 1200 most commonly used Java artifacts are not built through transparent CI/CD pipelines.
27+
28+
This lack of transparency introduces security risks: users must trust not just the upstream source code, but also the build environment itself—including tools, plugins, and configuration details—which may not be visible or reproducible. As supply chain security becomes increasingly critical, rebuilding artifacts from source has become an essential strategy. This process enables deeper code review, verification of binary-source equivalence, and greater control over dependencies.
29+
30+
However, rebuilding artifacts reliably is difficult due to differences in build environments (such as JDK versions or specific build commands), and the challenge only increases with large, complex dependency graphs. Macaron addresses these issues by automating the extraction of build specifications from open CI/CD workflows (like GitHub Actions), improving source code detection, and providing the tools needed to make reproducible rebuilds easier and more robust. By supporting this workflow, Macaron helps increase both the security and transparency of the open-source software supply chain.
31+
32+
**********
33+
Background
34+
**********
35+
36+
A build specification is a file that records the information needed to rebuild a package from source: the build tool, the build command to run, the language version (for example, the JDK version for Java), and the artifact coordinates. Macaron can now generate this file automatically for supported ecosystems, which greatly simplifies reproducible builds.
37+
38+
The generated buildspec will be stored in an ecosystem- and PURL-specific path under the ``output/`` directory (see more under :ref:`Output Files Guide <output_files_guide>`).
39+
40+
******************************
41+
Installation and Prerequisites
42+
******************************
43+
44+
You need:
45+
46+
* Docker
47+
* Macaron image (see :ref:`installation-guide`)
48+
* GitHub Token (see :ref:`prepare-github-token`)
49+
50+
*************************
51+
Step 1: Analyze a Package
52+
*************************
53+
54+
Before generating a buildspec, Macaron must first analyze the target package. For example, to analyze a Maven Java package:
55+
56+
.. code-block:: shell
57+
58+
./run_macaron.sh analyze -purl pkg:maven/org.apache.hugegraph/computer-k8s@1.0.0
59+
60+
This command inspects the source repository and its CI/CD configuration, then stores the extracted build-related data in the local database at ``output/macaron.db``.
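
To confirm that the analysis populated the database, you can list its tables with Python's standard ``sqlite3`` module. This is a minimal sketch that makes no assumption about Macaron's schema: it only reads SQLite's built-in ``sqlite_master`` catalog. The helper name ``list_tables`` is hypothetical and is not part of Macaron.

```python
import sqlite3
from contextlib import closing


def list_tables(db_path: str) -> list[str]:
    """Return the names of all tables in the SQLite database at db_path.

    Hypothetical helper for illustration; it is not part of Macaron.
    """
    # sqlite_master is SQLite's own catalog table, so this query works
    # without any knowledge of the schema Macaron uses.
    with closing(sqlite3.connect(db_path)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]
```

Calling ``list_tables("output/macaron.db")`` after the analysis step should return a non-empty list if the analysis wrote its results.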
61+
62+
*******************************************
63+
Step 2: Generate a Build Specification File
64+
*******************************************
65+
66+
After analysis is complete, you can generate a buildspec for the package using the ``gen-build-spec`` command.
67+
68+
.. code-block:: shell
69+
70+
./run_macaron.sh gen-build-spec -purl pkg:maven/org.apache.hugegraph/computer-k8s@1.0.0 --database output/macaron.db
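
If you want to script Steps 1 and 2 together, the two commands can be driven from Python. The sketch below uses only the sub-commands and flags shown in this tutorial (``analyze -purl`` and ``gen-build-spec -purl ... --database``). The function names are hypothetical, and the script location ``./run_macaron.sh`` is assumed to be relative to the current working directory.

```python
import subprocess


def analyze_cmd(purl: str, script: str = "./run_macaron.sh") -> list[str]:
    """Build the argument list for the analysis step (Step 1)."""
    return [script, "analyze", "-purl", purl]


def gen_build_spec_cmd(
    purl: str,
    database: str = "output/macaron.db",
    script: str = "./run_macaron.sh",
) -> list[str]:
    """Build the argument list for the buildspec generation step (Step 2)."""
    return [script, "gen-build-spec", "-purl", purl, "--database", database]


def analyze_and_generate(purl: str, run=subprocess.run) -> None:
    """Run both steps in order, stopping at the first failure.

    The ``run`` parameter exists only so the function can be exercised
    without actually invoking Macaron.
    """
    for cmd in (analyze_cmd(purl), gen_build_spec_cmd(purl)):
        # check=True raises CalledProcessError on a non-zero exit code.
        run(cmd, check=True)
```

Passing argument lists rather than a single shell string avoids quoting problems with PURLs that contain characters such as ``@`` or ``?``.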
71+
72+
73+
After execution, the buildspec will be created at:
74+
75+
.. code-block:: text
76+
77+
output/<purl_based_path>/macaron.buildspec
78+
79+
where ``<purl_based_path>`` is a directory path derived from the Package URL (PURL) of the analyzed artifact.
80+
81+
In the example above, the buildspec is located at:
82+
83+
.. code-block:: text
84+
85+
output/maven/org_apache_hugegraph/computer-k8s/macaron.buildspec
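
The mapping from PURL to path can be sketched as follows. This is an illustration inferred from the single example above (package type, then the namespace with dots replaced by underscores, then the artifact name), not Macaron's actual implementation; the function name ``buildspec_path`` is hypothetical, and the sketch handles only PURLs of the form ``pkg:<type>/<namespace>/<name>@<version>``.

```python
from pathlib import PurePosixPath


def buildspec_path(purl: str, output_dir: str = "output") -> PurePosixPath:
    """Guess the buildspec location for a PURL (hypothetical helper).

    Assumes the layout seen in the tutorial example; Macaron's real rules
    may differ for other ecosystems or unusual PURLs.
    """
    if not purl.startswith("pkg:"):
        raise ValueError(f"not a PURL: {purl!r}")
    # Drop the scheme, then any subpath ("#...") and qualifiers ("?...").
    body = purl[len("pkg:"):].split("#", 1)[0].split("?", 1)[0]
    # The version after "@" does not appear in the example path.
    coordinates, _, _version = body.partition("@")
    parts = coordinates.split("/")
    if len(parts) != 3:
        raise ValueError(f"expected type/namespace/name, got: {coordinates!r}")
    pkg_type, namespace, name = parts
    return PurePosixPath(
        output_dir, pkg_type, namespace.replace(".", "_"), name, "macaron.buildspec"
    )
```

For the PURL used in this tutorial, the sketch reproduces the path shown above.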
86+
87+
*****************************************
88+
Step 3: Review and Use the Buildspec File
89+
*****************************************
90+
91+
The generated buildspec uses the `Reproducible Central buildspec <https://reproducible-central.org/spec/>`_ format, for example:
92+
93+
.. code-block:: ini
94+
95+
# Copyright (c) 2025, Oracle and/or its affiliates.
96+
# Licensed under the Universal Permissive License v 1.0 as shown at https://oss.oracle.com/licenses/upl/.
97+
# Generated by Macaron version 0.15.0
98+
# Input PURL - pkg:maven/org.apache.hugegraph/computer-k8s@1.0.0
99+
# Initial default JDK version 8 and default build command [['mvn', '-DskipTests=true', ... ]]
100+
groupId=org.apache.hugegraph
101+
artifactId=computer-k8s
102+
version=1.0.0
103+
gitRepo=https://github.com/apache/hugegraph-computer
104+
gitTag=d2b95262091d6572cc12dcda57d89f9cd44ac88b
105+
tool=mvn
106+
jdk=8
107+
newline=lf
108+
command="mvn -DskipTests=true -Dmaven.test.skip=true -Dmaven.site.skip=true -Drat.skip=true -Dmaven.javadoc.skip=true clean package"
109+
buildinfo=target/computer-k8s-1.0.0.buildinfo
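
To inspect a generated buildspec programmatically, the ``key=value`` lines can be read into a dictionary. This is a minimal sketch that handles only the simple layout shown in the example above (comment lines starting with ``#``, one assignment per line, optional double quotes around the value). It is not a full parser for the Reproducible Central format, and ``parse_buildspec`` is a hypothetical name.

```python
def parse_buildspec(text: str) -> dict[str, str]:
    """Parse simple key=value buildspec lines into a dictionary."""
    fields: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comments such as the license header.
        if not line or line.startswith("#"):
            continue
        # Split on the first "=" only: values such as the build command
        # contain further "=" characters (-DskipTests=true).
        key, sep, value = line.partition("=")
        if not sep:
            continue
        value = value.strip()
        # Remove one pair of surrounding double quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] == '"':
            value = value[1:-1]
        fields[key.strip()] = value
    return fields
```

With the example file, ``parse_buildspec(...)["jdk"]`` gives ``"8"`` and the ``command`` entry holds the full Maven command line without the surrounding quotes.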
110+
111+
You can now use this file to automate reproducible builds, for example as part of the Reproducible Central infrastructure.
112+
113+
*******************************
114+
How It Works: Behind the Scenes
115+
*******************************
116+
117+
The ``gen-build-spec`` command extracts build data from Macaron’s SQLite database, using several modules:
118+
119+
- **macaron_db_extractor.py:** extracts metadata and build information using SQLAlchemy ORM mapped classes.
120+
- **Maven and Gradle CLI Parsers:** parse and patch build commands from CI/CD configurations to ensure compatibility with reproducible build systems.
121+
- **jdk_finder.py:** identifies the JDK version by parsing CI/CD config or, when unavailable, extracting it from ``META-INF/MANIFEST.MF`` in Maven Central artifacts.
122+
- **jdk_version_normalizer.py:** ensures only the major JDK version is included, as required by the buildspec format.
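
The last two steps can be illustrated with a short sketch. It is not the code in ``jdk_finder.py`` or ``jdk_version_normalizer.py``: the function names are hypothetical, and it assumes the JDK is recorded in the manifest under the ``Build-Jdk-Spec`` or ``Build-Jdk`` attribute, which Maven-built JARs commonly carry. Legacy version strings such as ``1.8.0_292`` are reduced to the major version ``8``.

```python
import re
from typing import Optional


def major_jdk_version(version: str) -> Optional[str]:
    """Reduce a JDK version string to its major version, or None if unparseable."""
    # An optional leading "1." covers the legacy scheme (1.8 means Java 8);
    # otherwise the first run of digits is the major version (11.0.2 -> 11).
    match = re.match(r"(?:1\.)?(\d+)", version.strip())
    return match.group(1) if match else None


def jdk_from_manifest(manifest_text: str) -> Optional[str]:
    """Find the major JDK version in the text of a META-INF/MANIFEST.MF file."""
    attributes: dict[str, str] = {}
    for line in manifest_text.splitlines():
        # Lines starting with a space continue the previous value; the
        # attributes of interest are short, so continuations are skipped.
        if line.startswith(" "):
            continue
        key, sep, value = line.partition(":")
        if sep:
            attributes[key.strip()] = value.strip()
    # Assumption: prefer Build-Jdk-Spec and fall back to Build-Jdk.
    raw = attributes.get("Build-Jdk-Spec") or attributes.get("Build-Jdk")
    return major_jdk_version(raw) if raw else None
```

Returning ``None`` rather than guessing lets a caller fall back to a default, such as the "initial default JDK version" noted in the generated buildspec header.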
123+
124+
This feature is described in more detail in our accepted ASE 2025 Industry Showcase paper: `"Unlocking Reproducibility: Automating re-Build Process for Open-Source Software" <https://arxiv.org/pdf/2509.08204>`_.
125+
126+
***********************************
127+
Frequently Asked Questions (FAQs)
128+
***********************************
129+
130+
*Q: What formats are supported for buildspec output?*
131+
A: Currently, only ``rc-buildspec`` is supported.
132+
133+
*Q: Do I need to analyze the package every time before generating a buildspec?*
134+
A: No. You only need to analyze the package once, unless you want to refresh the database with newer information.
135+
136+
*Q: Can Macaron generate buildspecs for other ecosystems besides Maven?*
137+
A: Ecosystem support is actively expanding. See :ref:`Supported Builds <supported_build_gen_tools>` for the latest details.
138+
139+
***********************************
140+
Future Work and Contributions
141+
***********************************
142+
143+
We plan to support more ecosystems, deeper integration with artifact repositories, and more user-configurable buildspec options. Contributions are welcome!
144+
145+
***********************************
146+
See Also
147+
***********************************
148+
149+
- :ref:`Output Files Guide <output_files_guide>`
150+
- :ref:`installation-guide`
151+
- :ref:`Supported Builds <supported_build_gen_tools>`
