Hello.
I also have a question about the implementation of ParseFileParallel.
In the multithreaded configuration, each thread calls ProcessBlocksImpl with its own block_begin and block_end.
My concern is how the code handles an incomplete line at the end of a thread's last block.
For example, let's assume thread 2 is given block_begin 4 and block_end 8 in ProcessBlocksImpl. Here is a made-up OBJ file for this example:
# BLOCK 4 Start
v 0.0 0.0 0.0
...
# BLOCK 4 End
# BLOCK 5 Start
v 0.0 0.0 0.0
...
# BLOCK 5 End
# BLOCK 6 Start
v 0.0 0.0 0.0
...
# BLOCK 6 End
# BLOCK 7 Start
v 0.0 0.0 0.0
v 0.0 0.0 0.0
...
v 0.0 0.0
# BLOCK 7 End
# BLOCK 8 Start
0.0
v 0.0 0.0 0.0
...
# BLOCK 8 End
In this case, while processing BLOCK 7, the thread encounters the incomplete line v 0.0 0.0, which is missing one vertex component. I don't think the code handles this in the multithreaded case. In the single-threaded case it is handled by copying the rest of the line into back_buffer (tracked by the remainder variable) with stop_parsing_after_eol set to false.
I guess the problem is caused by stop_parsing_after_eol being set to true in the multithreaded case.
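To make the single-threaded carry-over concrete, here is a minimal sketch of the idea as I understand it. This is not rapidobj's code: ReadLinesInBlocks is a hypothetical helper, it works on an in-memory buffer instead of a file reader, and it ignores details such as '\r' handling and kMaxLineLength.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper (not part of rapidobj): reads `file` in fixed-size blocks
// and returns its complete lines. The unfinished tail of each block is carried
// over and prepended to the next block.
std::vector<std::string> ReadLinesInBlocks(std::string_view file, std::size_t block_size)
{
    assert(block_size > 0);
    std::vector<std::string> lines;
    std::string remainder;  // incomplete line left over from the previous block

    for (std::size_t offset = 0; offset < file.size(); offset += block_size) {
        // Rebuild the working text: carried-over tail followed by the new block.
        std::string_view block = file.substr(offset, block_size);
        std::string text = remainder;
        text.append(block.data(), block.size());
        remainder.clear();

        std::size_t start = 0;
        for (;;) {
            std::size_t eol = text.find('\n', start);
            if (eol == std::string::npos) {
                // No end of line yet: keep the tail for the next block.
                remainder = text.substr(start);
                break;
            }
            lines.push_back(text.substr(start, eol - start));
            start = eol + 1;
        }
    }
    // A final line without a trailing newline is still a line.
    if (!remainder.empty()) {
        lines.push_back(remainder);
    }
    return lines;
}
```

With a block size of 5, the input "v 1 2 3\nv 4 5 6\nv 7 8 9\n" is cut in the middle of every line, yet the three vertex lines come back whole.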
rapidobj/include/rapidobj/rapidobj.hpp
Lines 7124 to 7133 in 744374a
    for (size_t i = 0; i != tasks.size(); ++i) {
        bool is_last = i + 1 == tasks.size();
        auto begin = tasks[i];
        auto end = is_last ? num_blocks : (tasks[i + 1] + 1);
        bool stop_parsing_after_eol = !is_last;
        auto chunk = &(*chunks)[i];
        threads.emplace_back(ProcessBlocks, source, i, begin, end, stop_parsing_after_eol, chunk, context);
        threads.back().detach();
    }
In the code above, stop_parsing_after_eol is set to true for every thread except the last one. That flag is then used in the following loop:
rapidobj/include/rapidobj/rapidobj.hpp
Lines 6932 to 6962 in 744374a
    for (size_t i = block_begin; i != block_end; ++i) {
        auto remainder = size_t{};
        bool last_block = (i + 1 == block_end) || reached_eof;
        if (!last_block) {
            file_offset = (i + 1) * kBlockSize;
            if (auto ec = reader->ReadBlock(file_offset, kBlockSize, back_buffer + kMaxLineLength)) {
                chunk->error = Error{ ec };
                return;
            }
        } else if (stop_parsing_after_eol) {
            if (auto ptr = static_cast<const char*>(memchr(text.data(), '\n', kMaxLineLength))) {
                auto pos = static_cast<size_t>(ptr - text.data());
                line = text.substr(0, pos);
                if (EndsWith(line, '\r')) {
                    line.remove_suffix(1);
                }
                ++chunk->text.line_count;
                if (auto rc = ProcessLine(line, chunk, context); rc != rapidobj_errc::Success) {
                    chunk->error = Error{ make_error_code(rc), std::string(line), chunk->text.line_count };
                }
            } else {
                ++chunk->text.line_count;
                auto ec = make_error_code(rapidobj_errc::LineTooLongError);
                chunk->error = Error{ ec, std::string(text, 0, kMaxLineLength), chunk->text.line_count };
            }
            return;
        }
When i reaches block_end - 1 (its last value), the else if (stop_parsing_after_eol) branch processes at most one line and then returns from ProcessBlocksImpl without handling the rest of the text data. Even if stop_parsing_after_eol were set to false for the other threads, more code would be needed to handle the last line of BLOCK 7, which is missing a component. I think you would have to read the next block (BLOCK 8 in my example) and then process one line to get the missing component.
I may have misunderstood your code, even though I have been reading through it for two days, but this is how it appears to work to me.
If you have any ideas about this, please let me know.
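The behaviour I would expect can be sketched as follows. This is only an illustration of the general "a worker owns every line that starts in its range" rule, not a patch for rapidobj: LinesOwnedByRange is a hypothetical helper, and it uses byte ranges over an in-memory buffer instead of blocks read from a file.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper (not rapidobj's API): returns the lines owned by a worker
// assigned the byte range [begin, end) of `file`. A worker owns every line that
// starts inside its range, even when that line ends inside the next range.
std::vector<std::string> LinesOwnedByRange(std::string_view file, std::size_t begin, std::size_t end)
{
    assert(begin <= end);
    end = std::min(end, file.size());
    std::vector<std::string> lines;

    std::size_t pos = begin;
    // If the range starts in the middle of a line, that line belongs to the
    // previous worker, so skip to the start of the next line.
    if (pos > 0 && pos < file.size() && file[pos - 1] != '\n') {
        std::size_t eol = file.find('\n', pos);
        pos = (eol == std::string_view::npos) ? file.size() : eol + 1;
    }

    while (pos < end) {
        // Read to the end of the line, crossing `end` if necessary. This is the
        // "read into the next block for one more line" step.
        std::size_t eol = file.find('\n', pos);
        if (eol == std::string_view::npos) {
            eol = file.size();
        }
        lines.emplace_back(file.data() + pos, eol - pos);
        pos = eol + 1;
    }
    return lines;
}
```

For the 24-byte input "v 1 2 3\nv 4 5 6\nv 7 8 9\n" split into [0, 12) and [12, 24), the first worker finishes "v 4 5 6" even though it ends past byte 12, and the second worker skips that partial line and returns only "v 7 8 9", so no line is lost or parsed twice.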