Skip to content

Commit 6b3aa95

Browse files
authored
Fix loading jars with encoded paths (#2759)
Fixes #2576 and #2103, which are both that the model assembler can't load jars if the jar's path has percent-encoded characters. This may happen for artifacts resolved by coursier (see linked issues). The model assembler can load Smithy files from a jar - model discovery. This works by looking for a 'META-INF/smithy/manifest' file within the jar, which contains the paths of smithy model files within the jar. You can give the model assembler a class loader to use for model discovery, but you can also add a jar to discover models from using a regular `addImport`. The problem is with the latter case. Because jars are like zip files, you can't read files within the jar using regular file system operations. Instead, you can either use the JarFile api to open the jar, and read specific entries (like a zip file), or you can directly read specific entries via a special URL with the format `jar:file:/path/to/jar.jar!/path/in/jar`. The latter is the way that our model discovery apis work. The problem was that URL doesn't perform any encoding or decoding itself, so callers/consumers have to take care of it, which we were not doing. For example, if the jar file has a path of `/foo/bar%45baz.jar`, we would create a URL like: `jar:file:/foo/bar%45baz.jar!/META-INF/smithy/manifest`, and when the JDK tried to use this URL to read from the file system, it would decode it and attempt to read `/foo/[email protected]`. Since we weren't encoding the URL to begin with, this was only a problem when the path contained `%`. The URL class has a section in its javadoc about this footgun, strongly recommending using `Path::toURI`, `URI::toURL`, when you want to make URLs from paths. And in more recent jdk versions, URL constructors are deprecated in favor of `URI::toURL`. I found that the `Path::toURI` part is particularly important, because it handles encoding for paths correctly, and makes sure windows paths are handled properly. I couldn't use `URI::toURL` though, because the manifest path needs to be appended, and the scheme changed to `jar:` (technically the `file:/...` is all the "scheme-specific part"). So instead, when creating a manifest URL from a path, `createSmithyJarManifestUrl` will now create a `Path` and convert that to `URI` to get an encoded `file:/` URI that is also correct for windows, then constructs the final `URL` using the string form of that `URI`. This should still meet `URL`'s requirement that it be created with an already-correct URL. Another thing to note is that `createSmithyJarManifestUrl` can accept stuff like `file:/foo.jar` or `jar:file:/foo.jar`. This change does not apply to those - they aren't paths obviously, and we don't know if they were properly encoded by the caller. If they were, trying to extract the path and do the same path -> URI -> URL conversion would result in a double-encoded URL that doesn't point to the correct location, and trying to instead just do URI -> URL is pointless (also `jar:file:/foo.jar` can't be directly turned into a `URI` - the `jar:` scheme requires the `!/` part). So for these, the previous behavior is kept, and I added a bit to the javadoc to say they need to be valid URLs. This technically changes the `SourceLocation::filename` for jars that are `addImport`ed, when the jar path has reserved characters (like spaces). Such jars were importable before because URL doesn't validate encoding, and the JDK decoding the URL would be a no-op since it wasn't encoded to begin with. So we could _just_ encode `%`, meaning only the newly-allowed paths would have any encoding. Disregarding the fact that this side-steps the assumed pre-conditions of URL, it would actually exacerbate an existing inconsistency between `SourceLocation::filename` of models discovered on the classpath, and models discovered from imports. The JDK encodes the URLs of resources on the classpath, so their models' source locations will be encoded. There was an additional inconsistency on windows, where imported jar models would have a URL like `jar:file:C:\foo\bar.jar!/META-INF/smithy/foo.smithy`, which the jdk could still gracefully read from, but is incorrect and inconsistent with discovered models which correctly look like `jar:file:/C:/foo/bar.jar!/META-INF/smithy/foo.smithy`. So this change means that all `jar:file:` `SourceLocation::filename`s will look the same and are well-formed URLs/URIs. As a consequence of changing `SourceLocation::filename`, I had to fix some code in PluginContext which was relying on being able to compare the source location filename with a stringified version of each configured source path, in order to figure out which shapes/metadata are part of sources. This was working because 1. Encoded paths didn't work at all, and 2. `jar:file:` filenames on windows had windows path separators. So I updated this comparison code to compare `jar:file:` filenames to a string version of each configured source's _URI_. The fact that the specific format of the filename was being relied on is mildly concerning, but considering the fact that models discovered on the classpath already had a different format, I think this is ok. I would also be surprised if there's a lot of code out there manually importing jars, instead of providing a classloader.
1 parent 5bd4acf commit 6b3aa95

File tree

8 files changed

+213
-12
lines changed

8 files changed

+213
-12
lines changed

smithy-build/src/main/java/software/amazon/smithy/build/PluginContext.java

Lines changed: 26 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -5,6 +5,7 @@
55
package software.amazon.smithy.build;
66

77
import java.nio.file.Path;
8+
import java.util.ArrayList;
89
import java.util.Collection;
910
import java.util.Collections;
1011
import java.util.List;
@@ -37,6 +38,7 @@ public final class PluginContext implements ToSmithyBuilder<PluginContext> {
3738
private final ClassLoader pluginClassLoader;
3839
private final Set<Path> sources;
3940
private final String artifactName;
41+
private final List<String> sourceUris;
4042
private Model nonTraitsModel;
4143

4244
private PluginContext(Builder builder) {
@@ -50,6 +52,15 @@ private PluginContext(Builder builder) {
5052
settings = builder.settings;
5153
pluginClassLoader = builder.pluginClassLoader;
5254
sources = builder.sources.copy();
55+
// Normalize the source paths into URI strings, which are URI encoded and use '/' separators,
56+
// which is the format of paths in 'jar:file:/' filenames, so they can be compared in the
57+
// 'isSource*' methods.
58+
// This is done preemptively so we don't have to do the same work repeatedly, and the number
59+
// of configured sources is typically very low.
60+
sourceUris = new ArrayList<>(sources.size());
61+
for (Path source : sources) {
62+
sourceUris.add(source.toUri().toString());
63+
}
5364
}
5465

5566
/**
@@ -223,6 +234,21 @@ private boolean isSource(FromSourceLocation sourceLocation) {
223234
String location = sourceLocation.getSourceLocation().getFilename();
224235
int offsetFromStart = findOffsetFromStart(location);
225236

237+
if (offsetFromStart > 0) {
238+
// Offset means the filename is a URI, so compare to the URI paths to account for encoding and windows.
239+
for (String sourceUri : sourceUris) {
240+
// Compare starting after the protocol (the source uri will always be "file", because we created it
241+
// from a path).
242+
int regionCompareLength = sourceUri.length() - 5;
243+
if (location.regionMatches(offsetFromStart, sourceUri, 5, regionCompareLength)) {
244+
return true;
245+
}
246+
}
247+
248+
return false;
249+
}
250+
251+
// Filename is a regular path, so we can compare to sources directly.
226252
for (Path path : sources) {
227253
String pathString = path.toString();
228254
int offsetFromStartInSource = findOffsetFromStart(pathString);

smithy-build/src/test/java/software/amazon/smithy/build/plugins/SourcesPluginTest.java

Lines changed: 85 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -10,9 +10,16 @@
1010
import static org.hamcrest.Matchers.is;
1111
import static org.hamcrest.Matchers.not;
1212

13+
import java.io.File;
14+
import java.io.IOException;
1315
import java.net.URISyntaxException;
16+
import java.nio.file.Files;
17+
import java.nio.file.Path;
1418
import java.nio.file.Paths;
19+
import java.util.Comparator;
20+
import org.junit.jupiter.api.AfterEach;
1521
import org.junit.jupiter.api.Assertions;
22+
import org.junit.jupiter.api.BeforeEach;
1623
import org.junit.jupiter.api.Test;
1724
import software.amazon.smithy.build.MockManifest;
1825
import software.amazon.smithy.build.PluginContext;
@@ -26,6 +33,19 @@
2633
import software.amazon.smithy.utils.ListUtils;
2734

2835
public class SourcesPluginTest {
36+
37+
private Path tempDirectory;
38+
39+
@BeforeEach
40+
public void before() throws IOException {
41+
tempDirectory = Files.createTempDirectory(getClass().getSimpleName());
42+
}
43+
44+
@AfterEach
45+
public void after() throws IOException {
46+
Files.walk(tempDirectory).sorted(Comparator.reverseOrder()).map(Path::toFile).forEach(File::delete);
47+
}
48+
2949
@Test
3050
public void copiesFilesForSourceProjection() throws URISyntaxException {
3151
Model model = Model.assembler()
@@ -236,4 +256,69 @@ public void omitsUnsupportedFilesFromManifest() throws URISyntaxException {
236256
assertThat(manifestString, containsString("a.smithy\n"));
237257
assertThat(manifestString, not(containsString("foo.md")));
238258
}
259+
260+
@Test
261+
public void copiesModelFromJarWithEncodedPathWithSourceProjection()
262+
throws IOException, URISyntaxException {
263+
Path source = Paths.get(getClass().getResource("sources/jar-import.jar").toURI());
264+
Path jarPath = tempDirectory.resolve(Paths.get("special%45path", "special.jar"));
265+
266+
Files.createDirectories(jarPath.getParent());
267+
Files.copy(source, jarPath);
268+
269+
Model model = Model.assembler()
270+
.addImport(jarPath)
271+
.addImport(getClass().getResource("notsources/d.smithy"))
272+
.assemble()
273+
.unwrap();
274+
MockManifest manifest = new MockManifest();
275+
PluginContext context = PluginContext.builder()
276+
.fileManifest(manifest)
277+
.model(model)
278+
.originalModel(model)
279+
.sources(ListUtils.of(jarPath))
280+
.build();
281+
new SourcesPlugin().execute(context);
282+
String manifestString = manifest.getFileString("manifest").get();
283+
284+
assertThat(manifestString, containsString("special/a.smithy\n"));
285+
assertThat(manifestString, containsString("special/b/b.smithy\n"));
286+
assertThat(manifestString, containsString("special/b/c/c.json\n"));
287+
assertThat(manifestString, not(containsString("jar-import/d.json")));
288+
assertThat(manifest.getFileString("special/a.smithy").get(), containsString("string A"));
289+
assertThat(manifest.getFileString("special/b/b.smithy").get(), containsString("string B"));
290+
assertThat(manifest.getFileString("special/b/c/c.json").get(), containsString("\"foo.baz#C\""));
291+
}
292+
293+
@Test
294+
public void copiesModelFromJarWithEncodedPathWithNonSourceProjection()
295+
throws IOException, URISyntaxException {
296+
Path source = Paths.get(getClass().getResource("sources/jar-import.jar").toURI());
297+
Path jarPath = tempDirectory.resolve(Paths.get("special%45path", "special.jar"));
298+
299+
Files.createDirectories(jarPath.getParent());
300+
Files.copy(source, jarPath);
301+
302+
Model model = Model.assembler()
303+
.addImport(jarPath)
304+
.assemble()
305+
.unwrap();
306+
MockManifest manifest = new MockManifest();
307+
ProjectionConfig projection = ProjectionConfig.builder().build();
308+
PluginContext context = PluginContext.builder()
309+
.fileManifest(manifest)
310+
.projection("foo", projection)
311+
.model(model)
312+
.originalModel(model)
313+
.sources(ListUtils.of(jarPath))
314+
.build();
315+
new SourcesPlugin().execute(context);
316+
String manifestString = manifest.getFileString("manifest").get();
317+
318+
assertThat(manifestString, containsString("model.json"));
319+
assertThat(manifestString, not(containsString("special")));
320+
assertThat(manifest.getFileString("model.json").get(), containsString("\"foo.baz#A\""));
321+
assertThat(manifest.getFileString("model.json").get(), containsString("\"foo.baz#B\""));
322+
assertThat(manifest.getFileString("model.json").get(), containsString("\"foo.baz#C\""));
323+
}
239324
}

smithy-model/src/main/java/software/amazon/smithy/model/loader/ModelDiscovery.java

Lines changed: 22 additions & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -10,9 +10,11 @@
1010
import java.io.IOException;
1111
import java.io.InputStream;
1212
import java.io.InputStreamReader;
13+
import java.net.URI;
1314
import java.net.URL;
1415
import java.net.URLConnection;
1516
import java.nio.charset.StandardCharsets;
17+
import java.nio.file.Paths;
1618
import java.util.ArrayList;
1719
import java.util.Enumeration;
1820
import java.util.LinkedHashSet;
@@ -180,27 +182,36 @@ public static String getSmithyModelPathFromJarUrl(URL modelUrl) {
180182
* to a file, (e.g., "/foo/baz.jar"), a file URL (e.g., "file:/baz.jar"),
181183
* or a JAR URL (e.g., "jar:file:/baz.jar").
182184
*
185+
* <p>If {@code fileOrUrl} is a URL, it must be absolute and percent-encoded
186+
* to ensure the manifest is read from the correct location.
187+
*
183188
* @param fileOrUrl Filename or URL that points to a JAR.
184189
* @return Returns the computed URL.
185190
*/
186191
public static URL createSmithyJarManifestUrl(String fileOrUrl) {
187192
try {
188-
return new URL(getFilenameWithScheme(fileOrUrl) + "!/" + MANIFEST_PATH);
193+
// URL expects callers to perform percent-encoding before construction,
194+
// and for consumers to perform decoding. When given a URL-like string,
195+
// (e.g. file:/foo.jar), we have to assume it is properly encoded because
196+
// otherwise we risk double-encoding. For file paths, we have to encode
197+
// to make sure that when the consumer of the URL performs decoding (i.e.
198+
// to read from the file system), they will get back the original path.
199+
String jarUrl;
200+
if (fileOrUrl.startsWith("jar:")) {
201+
jarUrl = fileOrUrl;
202+
} else if (fileOrUrl.startsWith("file:")) {
203+
jarUrl = "jar:" + fileOrUrl;
204+
} else {
205+
URI jarUri = Paths.get(fileOrUrl).toUri();
206+
jarUrl = "jar:" + jarUri;
207+
}
208+
209+
return new URL(jarUrl + "!/" + MANIFEST_PATH);
189210
} catch (IOException e) {
190211
throw new ModelImportException(e.getMessage(), e);
191212
}
192213
}
193214

194-
private static String getFilenameWithScheme(String filename) {
195-
if (filename.startsWith("jar:")) {
196-
return filename;
197-
} else if (filename.startsWith("file:")) {
198-
return "jar:" + filename;
199-
} else {
200-
return "jar:file:" + filename;
201-
}
202-
}
203-
204215
private static Set<String> parseManifest(URL location) throws IOException {
205216
Set<String> models = new LinkedHashSet<>();
206217
URLConnection connection = location.openConnection();

smithy-model/src/test/java/software/amazon/smithy/model/loader/ModelAssemblerTest.java

Lines changed: 53 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1459,4 +1459,57 @@ public void handlesSameModelWhenBuiltAndImported() throws Exception {
14591459
assertTrue(combinedModel.expectShape(ShapeId.from("smithy.example#MachineData$machineId"), MemberShape.class)
14601460
.hasTrait(RequiredTrait.ID));
14611461
}
1462+
1463+
@Test
1464+
public void importsJarFromPathWithEncodedChars() throws Exception {
1465+
Path source = Paths.get(getClass().getResource("assembler-valid-jar").toURI());
1466+
Path jarPath = outputDirectory.resolve(Paths.get("path%20with%45encoded%25chars", "test.jar"));
1467+
1468+
Files.createDirectories(jarPath.getParent());
1469+
Files.copy(JarUtils.createJarFromDir(source), jarPath);
1470+
1471+
Model model = Model.assembler()
1472+
.addImport(jarPath)
1473+
.assemble()
1474+
.unwrap();
1475+
1476+
assertTrue(model.getShape(ShapeId.from("smithy.example#ExampleString")).isPresent());
1477+
}
1478+
1479+
@Test
1480+
public void importsJarFromUrlWithEncodedChars() throws Exception {
1481+
Path source = Paths.get(getClass().getResource("assembler-valid-jar").toURI());
1482+
Path jarPath = outputDirectory.resolve(Paths.get("path%20with%45encoded%25chars", "test.jar"));
1483+
1484+
Files.createDirectories(jarPath.getParent());
1485+
Files.copy(JarUtils.createJarFromDir(source), jarPath);
1486+
1487+
Model model = Model.assembler()
1488+
.addImport(jarPath.toUri().toURL())
1489+
.assemble()
1490+
.unwrap();
1491+
1492+
assertTrue(model.getShape(ShapeId.from("smithy.example#ExampleString")).isPresent());
1493+
}
1494+
1495+
@Test
1496+
public void importedJarsWithEncodedCharsHaveCorrectFilename() throws Exception {
1497+
Path source = Paths.get(getClass().getResource("assembler-valid-jar").toURI());
1498+
Path jarPath = outputDirectory.resolve(Paths.get("path%20with%45encoded%25chars", "test.jar"));
1499+
1500+
Files.createDirectories(jarPath.getParent());
1501+
Files.copy(JarUtils.createJarFromDir(source), jarPath);
1502+
1503+
Model model = Model.assembler()
1504+
.addImport(jarPath)
1505+
.assemble()
1506+
.unwrap();
1507+
1508+
String filename = model.expectShape(ShapeId.from("smithy.example#ExampleString"))
1509+
.getSourceLocation()
1510+
.getFilename();
1511+
String fileContents = IoUtils.readUtf8Url(new URL(filename));
1512+
1513+
assertThat(fileContents, containsString("string ExampleString"));
1514+
}
14621515
}

smithy-model/src/test/java/software/amazon/smithy/model/loader/ModelDiscoveryTest.java

Lines changed: 19 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -13,6 +13,7 @@
1313
import java.net.MalformedURLException;
1414
import java.net.URL;
1515
import java.net.URLClassLoader;
16+
import java.nio.file.Paths;
1617
import java.util.List;
1718
import java.util.stream.Collectors;
1819
import org.junit.jupiter.api.Assertions;
@@ -132,8 +133,10 @@ public void requiresModelNameToBeValidWhenParsing() throws IOException {
132133

133134
@Test
134135
public void createSmithyManifestUrlFromPath() throws IOException {
136+
// Accounts for windows drive letters and path separators
137+
String normalizedJarPath = Paths.get("/foo.jar").toUri().getPath();
135138
assertThat(ModelDiscovery.createSmithyJarManifestUrl("/foo.jar"),
136-
equalTo(new URL("jar:file:/foo.jar!/META-INF/smithy/manifest")));
139+
equalTo(new URL("jar:file:" + normalizedJarPath + "!/META-INF/smithy/manifest")));
137140
}
138141

139142
@Test
@@ -147,4 +150,19 @@ public void createSmithyManifestUrlFromJarUrl() throws IOException {
147150
assertThat(ModelDiscovery.createSmithyJarManifestUrl("jar:file:/foo.jar"),
148151
equalTo(new URL("jar:file:/foo.jar!/META-INF/smithy/manifest")));
149152
}
153+
154+
@Test
155+
public void encodesSmithyManifestUrlFromPath() throws IOException {
156+
String normalizedAndEncodedJarPath = Paths.get("/foo bar.jar").toUri().getRawPath();
157+
assertThat(ModelDiscovery.createSmithyJarManifestUrl("/foo bar.jar"),
158+
equalTo(new URL("jar:file:" + normalizedAndEncodedJarPath + "!/META-INF/smithy/manifest")));
159+
}
160+
161+
@Test
162+
public void preservesEncodingOfSmithyManifestUrlFromUrl() throws IOException {
163+
assertThat(ModelDiscovery.createSmithyJarManifestUrl("file:/foo%20bar.jar"),
164+
equalTo(new URL("jar:file:/foo%20bar.jar!/META-INF/smithy/manifest")));
165+
assertThat(ModelDiscovery.createSmithyJarManifestUrl("jar:file:/foo%20bar.jar"),
166+
equalTo(new URL("jar:file:/foo%20bar.jar!/META-INF/smithy/manifest")));
167+
}
150168
}
Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,2 @@
1+
Manifest-Version: 1.0
2+
Created-By: 11.0.6 (Amazon.com Inc.)
Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1 @@
1+
valid.smithy
Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,5 @@
1+
$version: "2.0"
2+
3+
namespace smithy.example
4+
5+
string ExampleString

0 commit comments

Comments
 (0)