Question for the implementation of ParseFileParallel #29

@lch32111

Hello.

I also have a question about the implementation of ParseFileParallel.
In the multi-threaded configuration, you call ProcessBlocksImpl with a block_begin and block_end assigned to each thread.

My concern is how your code handles the case where the buffer ends with an incomplete line at the end of a thread's blocks.
For example, let's assume thread 2 gets block_begin 4 and block_end 8 in ProcessBlocksImpl. Here are some hypothetical OBJ lines for this example:

# BLOCK 4 Start
v 0.0 0.0 0.0
...
# BLOCK 4 End

# BLOCK 5 Start
v 0.0 0.0 0.0
...
# BLOCK 5 End

# BLOCK 6 Start
v 0.0 0.0 0.0
...
# BLOCK 6 End

# BLOCK 7 Start
v 0.0 0.0 0.0
v 0.0 0.0 0.0
...
v 0.0 0.0
# BLOCK 7 End

# BLOCK 8 Start
0.0
v 0.0 0.0 0.0
...
# BLOCK 8 End

In this case, when processing BLOCK 7, the thread encounters an incomplete line, v 0.0 0.0, which is missing one component of the vertex. I think your code does not handle this in the multi-threaded case. In the single-threaded case, your code handles it by copying the rest of the line into back_buffer using the remainder variable, with stop_parsing_after_eol set to false.
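To make the single-threaded behaviour concrete, here is a minimal sketch of the carry-over idea. It is not rapidobj's code: SplitLinesWithCarry and the carry string are hypothetical names, and the input is an in-memory buffer rather than blocks read from a file. It only illustrates how the unfinished tail of one block can be prepended to the next block, which is the role the remainder variable and back_buffer play in the description above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical sketch, not rapidobj's code: walk the input block by block and
// carry the unfinished tail of each block over to the next one.
std::vector<std::string> SplitLinesWithCarry(std::string_view data, std::size_t block_size)
{
    assert(block_size > 0);

    std::vector<std::string> lines;
    std::string carry;  // unfinished text from earlier blocks (the "remainder")

    for (std::size_t offset = 0; offset < data.size(); offset += block_size) {
        // substr clamps the length, so the final block may be shorter.
        const std::string_view block = data.substr(offset, block_size);

        std::size_t pos = 0;
        for (;;) {
            const std::size_t nl = block.find('\n', pos);
            if (nl == std::string_view::npos) break;
            // Complete the line: carried-over text plus this block's part.
            carry.append(block.substr(pos, nl - pos));
            lines.push_back(std::move(carry));
            carry.clear();
            pos = nl + 1;
        }
        // Whatever follows the last newline is incomplete; keep it for later.
        carry.append(block.substr(pos));
    }
    if (!carry.empty()) lines.push_back(std::move(carry));  // file without trailing newline
    return lines;
}
```

With a block size of 10, the line v 1.0 2.0 3.0 is split across two blocks, yet it comes out whole because its first ten characters are carried into the second block.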

I guess the problem is caused by stop_parsing_after_eol being set to true in the multi-threaded case.

for (size_t i = 0; i != tasks.size(); ++i) {
    bool is_last = i + 1 == tasks.size();
    auto begin   = tasks[i];
    auto end     = is_last ? num_blocks : (tasks[i + 1] + 1);
    bool stop_parsing_after_eol = !is_last;
    auto chunk   = &(*chunks)[i];
    threads.emplace_back(ProcessBlocks, source, i, begin, end, stop_parsing_after_eol, chunk, context);
    threads.back().detach();
}

In the code above, you set stop_parsing_after_eol to true for every thread except the last one. As a result, in the following loop:
for (size_t i = block_begin; i != block_end; ++i) {
    auto remainder  = size_t{};
    bool last_block = (i + 1 == block_end) || reached_eof;
    if (!last_block) {
        file_offset = (i + 1) * kBlockSize;
        if (auto ec = reader->ReadBlock(file_offset, kBlockSize, back_buffer + kMaxLineLength)) {
            chunk->error = Error{ ec };
            return;
        }
    } else if (stop_parsing_after_eol) {
        if (auto ptr = static_cast<const char*>(memchr(text.data(), '\n', kMaxLineLength))) {
            auto pos = static_cast<size_t>(ptr - text.data());
            line     = text.substr(0, pos);
            if (EndsWith(line, '\r')) {
                line.remove_suffix(1);
            }
            ++chunk->text.line_count;
            if (auto rc = ProcessLine(line, chunk, context); rc != rapidobj_errc::Success) {
                chunk->error = Error{ make_error_code(rc), std::string(line), chunk->text.line_count };
            }
        } else {
            ++chunk->text.line_count;
            auto ec      = make_error_code(rapidobj_errc::LineTooLongError);
            chunk->error = Error{ ec, std::string(text, 0, kMaxLineLength), chunk->text.line_count };
        }
        return;
    }

When i becomes block_end - 1 (its last value), the else if (stop_parsing_after_eol) branch processes at most one line and then returns from ProcessBlocksImpl without handling the rest of the text data. Even if we set stop_parsing_after_eol to false in the other threads, we would still need more code to handle the last line of BLOCK 7, which is missing an element. I think you have to read the next block (BLOCK 8 in my example) and then process one line to get the missing element.
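The idea of reading into the next block to finish one line can be sketched as an ownership rule: a line belongs to the thread whose byte range contains the line's first character, even if the line ends in a later block. The code below is only an illustration under that assumption and is not taken from rapidobj. ParseOwnedLines is a hypothetical name, and it works on an in-memory buffer instead of calling reader->ReadBlock.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper, not part of rapidobj: returns the lines owned by the
// worker that was assigned blocks [block_begin, block_end).
std::vector<std::string> ParseOwnedLines(std::string_view data, std::size_t block_size,
                                         std::size_t block_begin, std::size_t block_end)
{
    assert(block_size > 0 && block_begin <= block_end);

    const std::size_t start = std::min(block_begin * block_size, data.size());
    const std::size_t limit = std::min(block_end * block_size, data.size());

    // Skip the tail of a line that started in the previous worker's range.
    // Searching from start - 1 keeps a line that begins exactly at `start`.
    std::size_t pos = start;
    if (start > 0) {
        const std::size_t nl = data.find('\n', start - 1);
        pos = (nl == std::string_view::npos) ? data.size() : nl + 1;
    }

    std::vector<std::string> lines;
    while (pos < limit) {
        // The search may run past `limit`: this is the read into the next
        // block that completes a line such as "v 0.0 0.0" + "0.0".
        const std::size_t nl  = data.find('\n', pos);
        const std::size_t eol = (nl == std::string_view::npos) ? data.size() : nl;
        lines.emplace_back(data.substr(pos, eol - pos));
        pos = eol + 1;
    }
    return lines;
}
```

Under this rule, two workers covering blocks [0, 2) and [2, 5) of the same buffer each return a disjoint set of lines, and a line straddling the boundary is emitted once, by the first worker.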

I might be misunderstanding your code, although I have been reading through it for two days,
but this is how it still looks to me.
If you have any thoughts on this, please let me know.
