Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
80 changes: 60 additions & 20 deletions src/archive.rs
Original file line number Diff line number Diff line change
Expand Up @@ -9,10 +9,10 @@ use std::path::Path;

use crate::entry::{EntryFields, EntryIo};
use crate::error::TarError;
use crate::header::BLOCK_SIZE;
use crate::header::{SparseEntry, BLOCK_SIZE};
use crate::other;
use crate::pax::*;
use crate::{Entry, GnuExtSparseHeader, GnuSparseHeader, Header};
use crate::{Entry, GnuExtSparseHeader, Header};

/// A top-level representation of an archive file.
///
Expand Down Expand Up @@ -282,6 +282,7 @@ impl<'a, R: Read> Iterator for Entries<'a, R> {
}
}

#[allow(unused_assignments)]
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hopefully we can drop this now

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, rustc 1.87.0-nightly still complains.

impl<'a> EntriesFields<'a> {
fn next_entry_raw(
&mut self,
Expand Down Expand Up @@ -430,26 +431,68 @@ impl<'a> EntriesFields<'a> {
));
}
pax_extensions = Some(EntryFields::from(entry).read_all()?);
// This entry has two headers.
// Keep pax_extensions for the next ustar header.
processed -= 1;
continue;
}

let mut fields = EntryFields::from(entry);
fields.long_pathname = gnu_longname;
fields.long_linkname = gnu_longlink;
fields.pax_extensions = pax_extensions;
// False positive: unused assignment
// https://github.com/rust-lang/rust/issues/22630
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this has been fixed so we should be able to drop the assignment.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't have extra assignments here.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't we address this by doing:

fields.pax_extensions = pax_extensions.take();

?

pax_extensions = None; // Reset pax_extensions after use
fields.long_pathname = if is_recognized_header && fields.is_pax_sparse() {
fields.pax_sparse_name()
} else {
gnu_longname
};
fields.long_linkname = gnu_longlink;
self.parse_sparse_header(&mut fields)?;
return Ok(Some(fields.into_entry()));
}
}

fn parse_sparse_header(&mut self, entry: &mut EntryFields<'a>) -> io::Result<()> {
if !entry.header.entry_type().is_gnu_sparse() {
if !entry.is_pax_sparse() && !entry.header.entry_type().is_gnu_sparse() {
return Ok(());
}
let gnu = match entry.header.as_gnu() {
Some(gnu) => gnu,
None => return Err(other("sparse entry type listed but not GNU header")),
};
let mut sparse_map = Vec::<SparseEntry>::new();
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One of the main goals I tried to keep for the tar crate is to minimize internal allocations. Instead of having a temporary list here could this be refactored to avoid the intermediate allocation?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so. Here we need to convert strings to numbers and the number of pairs is not fixed.

let mut real_size = 0;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something about this doesn't feel quite right because real_size isn't set in all branches of the if statement below whereas prior it was always set to a particular value.

Copy link
Copy Markdown
Contributor Author

@ncihnegn ncihnegn Sep 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what you mean. It is set in both branches, line 428 and 452.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could do though:
let real_size = if entry.is_pax_sparse() { ... } else { ... }

if entry.is_pax_sparse() {
real_size = entry.pax_sparse_realsize()?;
let mut num_bytes_read = 0;
let mut reader = io::BufReader::with_capacity(BLOCK_SIZE as usize, &self.archive.inner);
let mut read_decimal_line = || -> io::Result<u64> {
let mut str = String::new();
num_bytes_read += reader.read_line(&mut str)?;
Comment thread
cgwalters marked this conversation as resolved.
str.strip_suffix("\n")
.and_then(|s| s.parse::<u64>().ok())
.ok_or_else(|| other("failed to read a decimal line"))
};

let num_entries = read_decimal_line()?;
for _ in 0..num_entries {
let offset = read_decimal_line()?;
let size = read_decimal_line()?;
sparse_map.push(SparseEntry { offset, size });
}
let rem = BLOCK_SIZE as usize - (num_bytes_read % BLOCK_SIZE as usize);
entry.size -= (num_bytes_read + rem) as u64;
} else if entry.header.entry_type().is_gnu_sparse() {
let gnu = match entry.header.as_gnu() {
Some(gnu) => gnu,
None => return Err(other("sparse entry type listed but not GNU header")),
};
real_size = gnu.real_size()?;
for block in gnu.sparse.iter() {
if !block.is_empty() {
let offset = block.offset()?;
let size = block.length()?;
sparse_map.push(SparseEntry { offset, size });
}
}
}

// Sparse files are represented internally as a list of blocks that are
// read. Blocks are either a bunch of 0's or they're data from the
Expand Down Expand Up @@ -478,12 +521,7 @@ impl<'a> EntriesFields<'a> {
let data = &mut entry.data;
let reader = &self.archive.inner;
let size = entry.size;
let mut add_block = |block: &GnuSparseHeader| -> io::Result<_> {
if block.is_empty() {
return Ok(());
}
let off = block.offset()?;
let len = block.length()?;
let mut add_block = |off: u64, len: u64| -> io::Result<_> {
if len != 0 && (size - remaining) % BLOCK_SIZE != 0 {
return Err(other(
"previous block in sparse file was not \
Expand All @@ -510,10 +548,10 @@ impl<'a> EntriesFields<'a> {
data.push(EntryIo::Data(reader.take(len)));
Ok(())
};
for block in gnu.sparse.iter() {
add_block(block)?
for block in sparse_map {
add_block(block.offset, block.size)?
}
if gnu.is_extended() {
if entry.header.as_gnu().map(|gnu| gnu.is_extended()) == Some(true) {
let mut ext = GnuExtSparseHeader::new();
ext.isextended[0] = 1;
while ext.is_extended() {
Expand All @@ -523,12 +561,14 @@ impl<'a> EntriesFields<'a> {

self.next += BLOCK_SIZE;
for block in ext.sparse.iter() {
add_block(block)?;
if !block.is_empty() {
add_block(block.offset()?, block.length()?)?;
}
}
}
}
}
if cur != gnu.real_size()? {
if cur != real_size {
return Err(other(
"mismatch in sparse file chunks and \
size in header",
Expand Down
42 changes: 42 additions & 0 deletions src/entry.rs
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,7 @@ use crate::archive::ArchiveInner;
use crate::error::TarError;
use crate::header::bytes2path;
use crate::other;
use crate::pax::{GNU_SPARSE_MAJOR_EXTENSION, GNU_SPARSE_MINOR_EXTENSION};
use crate::{Archive, Header, PaxExtensions};

/// A read-only view into an entry of an archive.
Expand Down Expand Up @@ -300,6 +301,47 @@ impl<'a> EntryFields<'a> {
self.read_to_end(&mut v).map(|_| v)
}

/// Check if the tar file is using PAX sparse extensions.
pub fn is_pax_sparse(&mut self) -> bool {
Comment thread
ncihnegn marked this conversation as resolved.
if let Some(ref pax) = self.pax_extensions {
let mut extensions = PaxExtensions::new(pax).filter_map(|f| f.ok());
return extensions
.find(|f| *f == GNU_SPARSE_MAJOR_EXTENSION)
.is_some()
&& extensions
.find(|f| *f == GNU_SPARSE_MINOR_EXTENSION)
.is_some();
}
false
}

pub fn pax_sparse_name(&mut self) -> Option<Vec<u8>> {
if let Some(ref pax) = self.pax_extensions {
return PaxExtensions::new(pax)
.filter_map(|f| f.ok())
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a big fan of "swallowing" errors like this, my preference would be to make this function return Result<Option<Vec<u8>>> and propagate this error instead.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand. To propagate the error using Result we will be dropping the Vec.

.find(|f| f.key_bytes() == b"GNU.sparse.name")
.map(|f| f.value_bytes().to_vec());
}
None
}

pub fn pax_sparse_realsize(&mut self) -> io::Result<u64> {
if let Some(ref pax) = self.pax_extensions {
let pax = PaxExtensions::new(pax)
.filter_map(|f| f.ok())
.find(|f| f.key_bytes() == b"GNU.sparse.realsize")
.map(|f| f.value_bytes());
if let Some(field) = pax {
let str =
std::str::from_utf8(&field).map_err(|_| other("failed to read string"))?;
return str
.parse::<u64>()
.map_err(|_| other("failed to parse the real size"));
}
}
Err(other("PAX extension GNU.sparse.realsize not found"))
}

fn path(&self) -> io::Result<Cow<Path>> {
bytes2path(self.path_bytes())
}
Expand Down
6 changes: 6 additions & 0 deletions src/header.rs
Original file line number Diff line number Diff line change
Expand Up @@ -124,6 +124,12 @@ pub struct GnuHeader {
pub pad: [u8; 17],
}

/// Description of a spare entry.
pub struct SparseEntry {
pub offset: u64,
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's also document the fields please

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Offset and size names are self explanatory.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's definitely true, but I think there's a general principle here that everything pub should have a docstring, even if trivial.

In some other crates I maintain we use deny(missing_docs).

pub size: u64,
}

/// Description of the header of a spare entry.
///
/// Specifies the offset/number of bytes of a chunk of data in octal.
Expand Down
13 changes: 13 additions & 0 deletions src/pax.rs
Original file line number Diff line number Diff line change
Expand Up @@ -56,11 +56,24 @@ impl<'entry> PaxExtensions<'entry> {
}

/// A key/value pair corresponding to a pax extension.
#[derive(PartialEq)]
pub struct PaxExtension<'entry> {
key: &'entry [u8],
value: &'entry [u8],
}

/// Constant of the GNU sparse major extension.
pub const GNU_SPARSE_MAJOR_EXTENSION: PaxExtension<'_> = PaxExtension {
key: b"GNU.sparse.major",
value: b"1",
};

/// Constant of the GNU sparse minor extension.
pub const GNU_SPARSE_MINOR_EXTENSION: PaxExtension<'_> = PaxExtension {
key: b"GNU.sparse.minor",
value: b"0",
};

pub fn pax_extensions_value(a: &[u8], key: &str) -> Option<u64> {
for extension in PaxExtensions::new(a) {
let current_extension = match extension {
Expand Down
16 changes: 16 additions & 0 deletions tests/all.rs
Original file line number Diff line number Diff line change
Expand Up @@ -1285,6 +1285,22 @@ fn sparse_with_trailing() {
assert_eq!(&s[0x100_000..], "1MB through\n");
}

#[test]
fn pax_sparse() {
let rdr = Cursor::new(tar!("pax_sparse.tar"));
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Post the xz fiasco let's be a bit more sensitive about committing binary data to git. Can you add the script that generates this at least? Or probably better honestly for tests, just assume we have a working external tar binary that can generate the relevant data.

let mut ar = Archive::new(rdr);
let td = TempBuilder::new().prefix("tar-rs").tempdir().unwrap();
ar.unpack(td.path()).unwrap();

let mut s = String::new();
File::open(td.path().join("sparse_begin.txt"))
.unwrap()
.read_to_string(&mut s)
.unwrap();
assert_eq!(&s[..5], "test\n");
assert!(s[5..].chars().all(|x| x == '\u{0}'));
}

#[test]
fn writing_sparse() {
let mut ar = Builder::new(Vec::new());
Expand Down
Binary file added tests/archives/pax_sparse.tar
Binary file not shown.