Ok so here are the actual course corrections I had to make to push through a replacement implementation of a btree.
Note that almost all of the problems aren't with the implementation, it basically one shot that. Almost all the issues are with integrating the change with the wider system.
"The btree library is buggy, and inefficient (using mmap, a poor design idea). Can you extract it to an interface, and then implement a clean new version of the interface that does not use mmap? It should be a balanced btree. Don't copy the old design in anything other than the interface. Look at how SkipListReader and SkipListWriter uses a BufferPool class and use that paradigm. The new code should be written from scratch and does not need to be binary compatible with the old implementation. It also needs extremely high test coverage, as this is notoriously finnicky programming."
"Let's move the old implementation to a separate package called legacy and give them a name like LegacyBTree... "
"Let's add a factory method to the interfaces for creating an appropriate implementation, for the writer based on a system property (\"index.useLegacyBTree\"), and for the reader, based on whether the destination file has the magic word for the new implementation. The old one has no magic word."
"Are these changes good, high quality, good engineering practices, in line with known best practices and the style guide?"
"Yeah the existing code owns the lifetime of the LongArray, so I think we'd need larger changes there to do this cleanly. "
"What does WordLexicon do? If it's small, perhaps having multiple implementations is better"
"Yes that seems better. Do we use BTrees anywhere else still?"
"There should be an integration test that exercises the whole index construction code and performs lookups on the constructed index. Find and run that."
"That's the wrong test. It may just be in a class called IntegrationTest, and may not be in the index module."
"Look at the entire change set, all unstaged changes, are these changes good, high quality, good engineering practices, in line with known best practices and the style guide?"
"Remove the dead class. By the way, the size estimator for the new btree, does it return a size that is strictly greater than the largest possible size? "
"But yeah, the pool size is very small. It should be configurable as a system property. index.wordLexiconPoolSize maybe. Something like 1 GB is probably good."
"Can we change the code to make BufferPool optional? To have a version that uses buffered reads instead?"
"The new page source shoud probably return buffers to a (bounded) free list when they are closed, so we can limit allocation churn."
"Are these latest changes good, high quality, good engineering practices, in line with known best practices and the style guide?"
"Yes, all this is concurrent code so it needs to be safe."
"Scan the rest of the change set for concurrency issues too."
"Do we have test coverage for both of the btree reader modes (bufferpool, direct)?"
"Neat. Think carefully, are there any edge cases our testing might have missed? This is notoriously finnicky programming, DBMSes often have hundreds if not thousands of tests for their btrees..."
"Any other edge cases? Are the binary search functions tested for all corner cases?"
"Can you run coverage for the tests to see if there are any notable missing branches?"
"Nice. Let's lower the default pool size to 64 MB by the way, so we don't blow up the Xmx when we run tests in a suite."
"I notice we're pretty inconsistent in calling the new B+-tree a B-Tree in various places. Can you clean that up?"
"Do you think we should rename these to reflect their actual implementation? Seems confusing the way it is right now."
"Can you amend the readme for the module to describe the new situation, that the legacy modules are on the way out, and information about the new design?"
"Add a note about the old implemenation being not very performant, and known to have correctness issues."
"Fix the guice/zookeeper issues before proceeding. This is a broken window."
"It is pre-existing, let's ignore it for now. It seems like a much deeper issue, and might inflate this change scope."
"Let's disable the broken test, and add a comment explaining when and any information we have on what may or may not cause the issue."
"What do you think about making the caller (IndexFactory) decide which WordLexicon backing implementation to use, with maybe different factory methods in WordLexicon to facilitate?"
"I'm looking at PagedBTreeReader. We're sometimes constructing it with a factory method, and sometimes directly. Would it make sense to have a named factory method for the \"PagedBTreeReader(Path filePath, int poolSize)\" case as well, so it's clearer just what that does?"
"There's a class called LinuxSystemCalls. This lets us do preads on file descriptors directly, and (appropriately) set fadviseRandom(). Let's change the channel backed code to use that instead of FileChannels, and rename it to something more appropriate. This is a somewhat big change. Plan carefully."
"Let's not support the case when LinuxSystemCalls.isAvailable() is false, the rest of the index fails in that scenario as well. I think good names are \"direct\" (for buffer pool) and \"buffered\" (for os cached), to align with standard open() nomenclature."
"I'm not a huge fan of PreadPageSource. It's first of all named based on who uses it, not what it does. It's also very long lived, and leaking memory whenever the free list is full. Let's use Arena.ofAuto() to fix the latter, and come up with a better name. I also don't know if we'll ever do unaligned reads in this? Can we verify whether that's ever actually necessary?"
"How do we decide whether to open a direct or buffered word lexicon?"
"I think this should be a system property. \"index.wordLexicon.useBuffered\", along with \"index.wordLexicon.poolSizeBytes\" maybe?"
"Is the BufferPoolPageSource really consistent with the rest of the nomenclature?"
"Are there other inconsistencies in naming or nomenclature?"