[CAS] Add a new API in ObjectStore to import a CAS tree by cachemeifyoucan · Pull Request #10839 · swiftlang/llvm-project ·

Add a new API to ObjectStore that can import a cas tree from another CAS. The two ObjectStores don't have to share the same hashing algorithm since all the objects will be rehashed and inserted into the new database.

benlangmuir · 2025-06-13T19:58:25Z

llvm/lib/CAS/ObjectStore.cpp

+if (this == &Upstream)
+return Other;
+
+// FIXME: This replicates the logic in `OnDiskGraphDB::importFullTree`.


Is there any intention to unify these in the future? I'm not clear if we need importFullTree or it could just use this function instead.

I can see OnDiskGraphDB::importFullTree can do better when importing, knowing the upstream CAS has the exact same hash and schema. The answer is maybe but maybe not if we want full optimization.

That said. I need to update the comments anyway since I copy them from the other function and some do not apply here. I updated FIXME to be more of an explanation instead.

Add a new API to ObjectStore that can import a cas tree from another CAS. The two ObjectStores don't have to share the same hashing algorithm since all the objects will be rehashed and inserted into the new database.

We should probably add a unit test that imports between two CAS with different hash schema

benlangmuir · 2025-06-13T20:51:56Z

llvm/tools/llvm-cas/llvm-cas.cpp

@@ -138,6 +141,8 @@ int main(int Argc, char **Argv) {
 clEnumValN(MergeTrees, "merge", "merge paths/cas-ids"),
 clEnumValN(GetCASIDForFile, "get-cas-id", "get cas id for file"),
 clEnumValN(Import, "import", "import objects from another CAS"),
+clEnumValN(ImportFromUpstream, "import-from-upstream",


Maybe this should replace the implementation of --import instead? The existing --import relies on the hash being the same (for an early out check on existing objects) but doesn't actually validate they are the same hash. It's not clear we need both of these APIs to be available on llvm-cas

I thought --import is a test option for chained database to exercise that code path so I should probably not replace it.
I can probably take the name from it since I will expect simple name like --import is better used to exercise public CAS APIs. I can maybe move the current import option into llvm-cas-test utility.

Current --import does not use chaining, does it? Seems like it works the same as this new functionality except it is recursive. It's not clear we need to keep it.

benlangmuir · 2025-06-13T21:00:26Z

llvm/lib/CAS/ObjectStore.cpp

+// Check if the node exists already.
+auto CurrentID = Cur.Refs.front();
+Cur.Refs.pop_front();
+auto Ref = getReference(Upstream.getID(CurrentID));


Upstream.getID(CurrentID) could be a different hash schema than this->getReference. While a collision is probably unlikely in practice, it could cause issues if the hash size differs. We could possibly keep this fast path if we add a check that the CASContext matches?

Good call. I have tested importing a different casID schema but not with a different hashing algorithm. I think that is doable (I can switch SHA1 for the testing plugin dylib implementation in LLVM).
We can also go the other way that we only support hashing with the same schema. I was ambitiously thinking I can even change CAS schema when creating a reproducer when CAS is used but that is actually not possible in current state so I can dial back this implementation. The benefit of same schema import is that it can be more efficient.

cachemeifyoucan requested review from akyrtzi and benlangmuir June 13, 2025 17:46

benlangmuir reviewed Jun 13, 2025
View reviewed changes

[CAS] Add a new API in ObjectStore to import a CAS tree

9292e05
Add a new API to ObjectStore that can import a cas tree from another CAS. The two ObjectStores don't have to share the same hashing algorithm since all the objects will be rehashed and inserted into the new database.

cachemeifyoucan force-pushed the eng/PR-cas-import-tree branch from 487eced to 9292e05 Compare June 13, 2025 20:40

benlangmuir reviewed Jun 13, 2025
View reviewed changes

Uh oh!

Uh oh!

benlangmuir Jun 13, 2025

Uh oh!

cachemeifyoucan Jun 13, 2025

Uh oh!

cachemeifyoucan Jun 13, 2025

Uh oh!

Uh oh!

benlangmuir Jun 13, 2025

Uh oh!

cachemeifyoucan Jun 13, 2025

Uh oh!

benlangmuir Jun 13, 2025

Uh oh!

benlangmuir Jun 13, 2025

Uh oh!

cachemeifyoucan Jun 13, 2025

Uh oh!

Uh oh!

Uh oh!

Conversation

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!