Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Enforce revision ID and model names for future contributions #56

Merged
merged 1 commit into from
Nov 26, 2024

Conversation

KennethEnevoldsen
Copy link
Contributor

@KennethEnevoldsen KennethEnevoldsen commented Nov 26, 2024

Added tests to:

  • Enforce revision ID for new models
  • Enforce consistent model names for new models

Added this change mainly due to duplicates models results and due to recent addition not including a model revision (which I we should probably discourage).

This fixes model results for:

  • voyage

(needed for the leaderboard)

Added this change mainly due to duplicates models results and due to recent addition not including a model revision (which I we should probably discourage).

This fixes models results for:
- voyage
Copy link
Contributor

@isaac-chung isaac-chung left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Some non-blocking Qs and suggestions.

@@ -4,6 +4,10 @@ type: evaluation
submission_name: MTEB
---

> [!NOTE]
> Previously it was possible to submit models results to MTEB by adding the results to the model metadata. This is no longer an option as we want to ensure high quality metadata.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: is that currently enforced? Or will only be in effect once the new leaderboard goes public?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new leaderboard does not fetch it automatically (only fetches from here)

However, we had a PR that fetches everything from HF (once, but probably not something we will run again).

E.g. the models:

'vectoriseai/bge-base-en-v1.5',
'vectoriseai/bge-large-en-v1.5',
'vectoriseai/bge-small-en',
'vectoriseai/bge-small-en-v1.5',
'vectoriseai/e5-base',
'vectoriseai/e5-base-v2',
'vectoriseai/e5-large',
'vectoriseai/e5-large-v2',
'vectoriseai/e5-small-v2',
'vectoriseai/gte-base',
'vectoriseai/gte-large',
'vectoriseai/gte-small',
'vectoriseai/multilingual-e5-large',

are all copies and then creates duplicates on the leaderboard.

Copy link
Contributor Author

@KennethEnevoldsen KennethEnevoldsen Nov 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pr. #57 removes these

@pytest.mark.parametrize("model_rev_pair", model_rev_pairs)
def test_organization_is_specified_for_new_additions(model_rev_pair):
"""
Models added after 26th of November should include a organization ID within their name, e.g. "myorg/my_embedding_model".
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Models added after 26th of November should include a organization ID within their name, e.g. "myorg/my_embedding_model".
Models added after November 26, 2024 should include a organization ID within their name, e.g. "myorg/my_embedding_model".

@pytest.mark.parametrize("model_rev_pair", model_rev_pairs)
def test_model_meta_in_folders(model_rev_pair):
"""
Models added after the 26th of November should contain a model_meta.json file
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Models added after the 26th of November should contain a model_meta.json file
Models added after November 26, 2024 should contain a model_meta.json file

@KennethEnevoldsen KennethEnevoldsen merged commit af02824 into main Nov 26, 2024
2 checks passed
KennethEnevoldsen added a commit that referenced this pull request Nov 26, 2024
- Added minor fixes from #56
- removed all vectoriseai/* models (e.g. vectoriseai/gte-large)

They seems to be duplicates of the original model.
isaac-chung pushed a commit that referenced this pull request Nov 26, 2024
- Added minor fixes from #56
- removed all vectoriseai/* models (e.g. vectoriseai/gte-large)

They seems to be duplicates of the original model.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants