
fix: evaluate missing splits #1268

Conversation

@thivyanth thivyanth commented Oct 1, 2024

Addresses #1260

Checklist

@Muennighoff @isaac-chung

  • Run tests locally to make sure nothing is broken using make test.
    • 944 passed, 236 skipped, 55 warnings in 232.13s (0:03:52)
  • Run the formatter to format the code using make lint.

@isaac-chung (Collaborator) left a comment

@thivyanth Thanks for taking a first stab at it! For any new functionality, we usually add a test / test cases to help confirm that the added code works. Could you please add a test for this? Specifically the following cases:

  1. Results exist, but there are no missing splits.
  2. Results exist, and there is one missing split.

I can review the PR afterwards.

@KennethEnevoldsen (Contributor) left a comment
Looking good, but still a few things to adjust.

mteb/evaluation/MTEB.py — 4 resolved review threads (outdated)
@isaac-chung isaac-chung changed the title Feature/missing-splits-evaluation fix: evaluate missing splits Oct 4, 2024
@thivyanth (Author)

@KennethEnevoldsen Thank you for the suggestions. I've implemented the requested changes, rewriting them from scratch. I manually checked the logs to confirm that only the missing splits are evaluated, rather than verifying this with pytest. Could you suggest a way to verify with pytest that only the missing splits are evaluated? @isaac-chung @Muennighoff

(The latest commit has passed make test and make lint.)

@thivyanth (Author)

Wrote a test as well.

(The latest commit has passed make test and make lint.)

@KennethEnevoldsen (Contributor)

@thivyanth rerunning the failed test to ensure that everything passes.

@KennethEnevoldsen (Contributor) left a comment

This is already looking much better. A few more changes are needed, though; let me know if there are any questions.

self.last_evaluated_splits[task.metadata_dict["name"]] = set()
self.last_evaluated_splits[task.metadata_dict["name"]].add(split)

new_results = MTEBResults.from_task_results(

We need to consider what to do with the MTEB version specification.

Ideally we should record multiple versions, e.g. "1.2.11, 1.2.16". We should also add a flag to only append to existing files if the version matches (it should probably be on by default).
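The version-tracking idea above could be sketched roughly like this. This is a minimal, hypothetical sketch; `can_append` and `record_version` are illustrative names, not mteb APIs:

```python
# Hypothetical sketch: keep a list of mteb versions that produced a results
# file, and only append new split results when the current version matches
# (controlled by a flag that defaults to on).

def can_append(existing_versions: list[str], current_version: str,
               require_version_match: bool = True) -> bool:
    """Return True if new split results may be appended to an existing file."""
    if not require_version_match:
        return True
    return current_version in existing_versions

def record_version(existing_versions: list[str], current_version: str) -> list[str]:
    """Track every version that contributed results, e.g. ["1.2.11", "1.2.16"]."""
    if current_version not in existing_versions:
        return [*existing_versions, current_version]
    return existing_versions
```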

f"{task.metadata.name} results already exist. Loading results from disk."
)
evaluation_results.append(existing_results)
self.last_evaluated_splits[task.metadata.name] = [] # Add this line

Seems like an error?

@@ -488,3 +563,11 @@ def _save_model_metadata(model_meta: ModelMeta, output_folder: Path) -> None:

with save_path.open("w") as f:
json.dump(model_meta.to_dict(), f)

def get_last_evaluated_splits(self):

Is this needed? (I would just add some logging messages instead)
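The logging alternative suggested above might look something like this. A hypothetical sketch only; `log_evaluated_splits` is an illustrative helper, not part of mteb:

```python
import logging

logger = logging.getLogger("mteb")

def log_evaluated_splits(task_name: str, splits: set[str]) -> str:
    """Log which splits were just evaluated; returning the message makes
    the sketch easy to test."""
    msg = f"{task_name}: evaluated missing splits {sorted(splits)}"
    logger.info(msg)
    return msg
```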

Comment on lines +489 to +490
if existing_results:
merged_results = self._merge_results(existing_results, new_results)

It's probably worth merging using the MTEBResults object instead of directly on the dict:

new_results = MTEBResults(...)
if existing_results:
  new_results.update(existing_results)
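The object-level merge could be sketched like this. `SplitResults` is a minimal stand-in for mteb's MTEBResults, assuming its scores are keyed by split name; this is illustrative, not mteb's actual implementation:

```python
from dataclasses import dataclass, field

@dataclass
class SplitResults:
    # scores maps split name -> list of score dicts, mirroring the assumed
    # shape of MTEBResults.scores
    scores: dict[str, list[dict]] = field(default_factory=dict)

    def update(self, other: "SplitResults") -> None:
        # Keep freshly computed splits; fill in splits only present in `other`.
        for split, entries in other.scores.items():
            self.scores.setdefault(split, entries)

# Fresh run evaluated only the missing "test" split...
new = SplitResults(scores={"test": [{"ndcg_at_10": 0.34}]})
# ...while the results on disk already hold "train" and an older "test".
old = SplitResults(scores={"train": [{"ndcg_at_10": 0.31}],
                           "test": [{"ndcg_at_10": 0.30}]})
new.update(old)
# `new` now holds the fresh "test" scores plus the previously saved "train" split.
```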

Comment on lines +22 to +33
evaluation.run(
model,
eval_splits=["train", "test"],
save_predictions=True,
output_folder=str(tmp_path / "testcase1"),
verbosity=2,
)
last_evaluated_splits = evaluation.get_last_evaluated_splits()
print(last_evaluated_splits)
assert "NFCorpus" in last_evaluated_splits
assert set(last_evaluated_splits["NFCorpus"]) == {"train", "test"}
assert len(last_evaluated_splits["NFCorpus"]) == 2
We can simplify the tests a bit (this applies across tests):

Suggested change
evaluation.run(
model,
eval_splits=["train", "test"],
save_predictions=True,
output_folder=str(tmp_path / "testcase1"),
verbosity=2,
)
last_evaluated_splits = evaluation.get_last_evaluated_splits()
print(last_evaluated_splits)
assert "NFCorpus" in last_evaluated_splits
assert set(last_evaluated_splits["NFCorpus"]) == {"train", "test"}
assert len(last_evaluated_splits["NFCorpus"]) == 2
result_obj = evaluation.run(
model,
eval_splits=["train", "test"],
save_predictions=True,
output_folder=str(tmp_path / "testcase1"),
verbosity=2,
)
# check splits here based on the returned object; no need for last_evaluated_splits
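Checking splits from the returned object might look roughly like this. A hedged sketch: it assumes the returned results expose a `task_name` attribute and split-keyed `scores`, which is an assumption about the return type rather than a confirmed API:

```python
from types import SimpleNamespace

def evaluated_splits(results) -> dict[str, set[str]]:
    """Map task name -> set of splits present in the returned results."""
    return {r.task_name: set(r.scores) for r in results}

# Stand-in for what `evaluation.run(...)` is assumed to return:
results = [SimpleNamespace(task_name="NFCorpus",
                           scores={"train": [], "test": []})]
```

A test could then assert directly, e.g. `assert evaluated_splits(results)["NFCorpus"] == {"train", "test"}`, without a separate getter on the evaluation object.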

Any reason why save_predictions is true?

@Muennighoff (Contributor)

@thivyanth great work thus far! It's a pretty useful PR so would be amazing to have it merged soon if you have time to address the comments?

@thivyanth (Author)

> @thivyanth great work thus far! It's a pretty useful PR so would be amazing to have it merged soon if you have time to address the comments?

I just saw this message; I emailed you a few hours ago.

@Muennighoff (Contributor) commented Oct 26, 2024

Any update here @thivyanth? Would be great to get this in! 🙌

@KennethEnevoldsen (Contributor)

@thivyanth would love to get this merged in. Do you still have time to work on it?

@isaac-chung isaac-chung changed the base branch from main to eval-missing-splits November 26, 2024 22:17
@isaac-chung isaac-chung merged commit 2f42162 into embeddings-benchmark:eval-missing-splits Nov 26, 2024
10 checks passed
@isaac-chung (Collaborator)

Moving this to a separate branch to be continued.
