fix: Enforce revision ID and model names for future contributions #56

KennethEnevoldsen · 2024-11-26T15:21:32Z

Added tests to:

Enforce revision ID for new models
Enforce consistent model names for new models

Added this change mainly because of duplicate model results and because a recent addition did not include a model revision (which I think we should probably discourage).

This fixes model results for:

voyage

(needed for the leaderboard)


isaac-chung

Looks good. Some non-blocking Qs and suggestions.

isaac-chung · 2024-11-26T15:37:53Z

README.md

@@ -4,6 +4,10 @@ type: evaluation
 submission_name: MTEB
 ---

+> [!NOTE]  
+> Previously it was possible to submit models results to MTEB by adding the results to the model metadata. This is no longer an option as we want to ensure high quality metadata.


Question: is that currently enforced? Or will it only take effect once the new leaderboard goes public?

The new leaderboard does not fetch results from model metadata automatically (it only fetches from this repository).

However, we had a PR that fetches everything from HF (once, but probably not something we will run again).

E.g. the models:

'vectoriseai/bge-base-en-v1.5',
'vectoriseai/bge-large-en-v1.5',
'vectoriseai/bge-small-en',
'vectoriseai/bge-small-en-v1.5',
'vectoriseai/e5-base',
'vectoriseai/e5-base-v2',
'vectoriseai/e5-large',
'vectoriseai/e5-large-v2',
'vectoriseai/e5-small-v2',
'vectoriseai/gte-base',
'vectoriseai/gte-large',
'vectoriseai/gte-small',
'vectoriseai/multilingual-e5-large',

are all copies and therefore create duplicates on the leaderboard.

PR #57 removes these.
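A minimal sketch of how such copies could be spotted, assuming a re-upload keeps the original model name and only changes the organization prefix. The helper `find_possible_copies` is hypothetical and not part of this repository, and the non-`vectoriseai` names in the example input are illustrative only.

```python
from collections import defaultdict


def find_possible_copies(model_names: list[str]) -> dict[str, list[str]]:
    """Group "org/name" strings by name and keep names seen under several orgs.

    Hypothetical helper: assumes a copy keeps the original model name.
    """
    by_name: dict[str, list[str]] = defaultdict(list)
    for full_name in model_names:
        # rpartition leaves names without an organization prefix intact
        _, _, name = full_name.rpartition("/")
        by_name[name].append(full_name)
    return {name: sorted(full) for name, full in by_name.items() if len(full) > 1}


# Illustrative input, not the repository's actual model list
names = [
    "BAAI/bge-base-en-v1.5",
    "vectoriseai/bge-base-en-v1.5",
    "thenlper/gte-large",
    "vectoriseai/gte-large",
    "intfloat/e5-base",
]
possible_copies = find_possible_copies(names)
```

A name match is only a hint that two entries may be the same model; deciding which entry to keep would still need a manual check.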

isaac-chung · 2024-11-26T15:40:17Z

tests/test_ensure_correct_metadata.py

+@pytest.mark.parametrize("model_rev_pair", model_rev_pairs)
+def test_organization_is_specified_for_new_additions(model_rev_pair):
+    """
+    Models added after 26th of November should include a organization ID within their name, e.g. "myorg/my_embedding_model".


Suggested change

Models added after 26th of November should include a organization ID within their name, e.g. "myorg/my_embedding_model".

Models added after November 26, 2024 should include a organization ID within their name, e.g. "myorg/my_embedding_model".
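The rule in this docstring could be sketched roughly as below. This is not the test body from the PR: `has_organization`, `violates_naming_rule`, `CUTOFF`, and the `added_on` argument are all hypothetical names, and how the real test decides when a model was added is not shown in this thread.

```python
from datetime import date

# Assumption: the cutoff mirrors the date named in the docstring
CUTOFF = date(2024, 11, 26)


def has_organization(model_name: str) -> bool:
    """Return True for names shaped like "myorg/my_embedding_model"."""
    org, sep, name = model_name.partition("/")
    # Require exactly one slash with non-empty text on both sides
    return sep == "/" and bool(org) and bool(name) and "/" not in name


def violates_naming_rule(model_name: str, added_on: date) -> bool:
    """Only models added after the cutoff are required to carry an organization."""
    return added_on > CUTOFF and not has_organization(model_name)
```

Under these assumptions, `violates_naming_rule("my_embedding_model", date(2024, 12, 1))` flags the model, while the same name added before the cutoff is left alone.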

isaac-chung · 2024-11-26T15:41:19Z

tests/test_ensure_correct_metadata.py

+@pytest.mark.parametrize("model_rev_pair", model_rev_pairs)
+def test_model_meta_in_folders(model_rev_pair):
+    """
+    Models added after the 26th of November should contain a model_meta.json file


Suggested change

Models added after the 26th of November should contain a model_meta.json file

Models added after November 26, 2024 should contain a model_meta.json file
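A rough sketch of the kind of check this docstring describes, assuming results are stored as `<model>/<revision>/` folders. That layout, the folder names, and the helper `missing_model_meta` are assumptions for illustration, not taken from the PR.

```python
import json
import tempfile
from pathlib import Path


def missing_model_meta(results_root: Path) -> list[Path]:
    """Return revision folders (relative to results_root) without model_meta.json.

    Assumed layout (hypothetical): results_root/<model>/<revision>/...
    """
    missing = []
    for model_dir in sorted(p for p in results_root.iterdir() if p.is_dir()):
        for rev_dir in sorted(p for p in model_dir.iterdir() if p.is_dir()):
            if not (rev_dir / "model_meta.json").is_file():
                missing.append(rev_dir.relative_to(results_root))
    return missing


# Build a throwaway tree: one folder with metadata, one without
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    ok = root / "myorg__my_embedding_model" / "abc123"
    bad = root / "myorg__other_model" / "def456"
    ok.mkdir(parents=True)
    bad.mkdir(parents=True)
    meta = {"name": "myorg/my_embedding_model", "revision": "abc123"}
    (ok / "model_meta.json").write_text(json.dumps(meta))
    missing = missing_model_meta(root)
```

Returning the offending folders, rather than a single boolean, makes it easier to report every missing file in one test failure.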


fix: Enforce revision ID and model names for future contributions

d5686d5

Added this change mainly because of duplicate model results and because a recent addition did not include a model revision (which I think we should probably discourage). This fixes model results for:
- voyage

KennethEnevoldsen requested review from orionw and isaac-chung November 26, 2024 15:22

KennethEnevoldsen enabled auto-merge (squash) November 26, 2024 15:24

isaac-chung approved these changes Nov 26, 2024


KennethEnevoldsen merged commit af02824 into main Nov 26, 2024
2 checks passed

KennethEnevoldsen added a commit that referenced this pull request Nov 26, 2024

fix: remove duplicate models

43c7443

- Added minor fixes from #56
- Removed all vectoriseai/* models (e.g. vectoriseai/gte-large). They seem to be duplicates of the original models.

KennethEnevoldsen mentioned this pull request Nov 26, 2024

fix: remove duplicate models #57

Merged

isaac-chung pushed a commit that referenced this pull request Nov 26, 2024

fix: remove duplicate models (#57)

2368324

- Added minor fixes from #56
- Removed all vectoriseai/* models (e.g. vectoriseai/gte-large). They seem to be duplicates of the original models.
