Embedding Models Overview

A comprehensive guide to the diverse models we offer.

We offer a range of embedding models to cover a wide array of use cases, and we continue to add new ones. New models are listed in our API Reference as soon as they become available.

Models

Our selection of models is based on their rankings on the MTEB leaderboard and their overall popularity. For most use cases, we recommend e5-large-v2, a strong general-purpose performer on the MTEB leaderboard. If you need a smaller, faster model, we suggest all-MiniLM-L6-v2. For multilingual use cases, multilingual-e5-base is the most suitable choice.
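The recommendations above reduce to a simple lookup. The sketch below is a hypothetical helper (`recommend_model` is an illustrative name, not part of our API). When both flags are set, it assumes multilingual support takes priority, since multilingual-e5-base is already the smaller of the two multilingual-e5 models listed.

```python
def recommend_model(multilingual: bool = False, prefer_small: bool = False) -> str:
    """Hypothetical helper mirroring the recommendations in this section."""
    if multilingual:
        # Assumption: multilingual support outranks the size preference.
        return "multilingual-e5-base"
    if prefer_small:
        return "all-MiniLM-L6-v2"
    # Default recommendation for most use cases.
    return "e5-large-v2"
```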

Name                                   Recommended Sequence Length (tokens)  Dimensions
stella-large-zh-v2 (Chinese)           1024                                  1024
bge-large-en-v1.5                      512                                   1024
gte-large                              512                                   1024
e5-large-v2                            512                                   1024
instructor-large                       512                                   768
multilingual-e5-large                  512                                   1024
multilingual-e5-base                   512                                   768
all-MiniLM-L6-v2                       256                                   384
paraphrase-multilingual-mpnet-base-v2  128                                   768
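The two columns above are what you typically need when wiring a model into an application: the sequence length bounds how much text one input should carry, and the dimension fixes the size of every stored vector. The sketch below is a hypothetical helper, not part of our API. `ModelSpec`, `fits`, and `index_size_bytes` are illustrative names, token counts are assumed to come from the model's own tokenizer, and storage assumes uncompressed float32 vectors (4 bytes per value).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelSpec:
    max_tokens: int  # "Recommended Sequence Length (tokens)" column
    dimensions: int  # "Dimensions" column


# Values copied from the table above.
MODEL_SPECS = {
    "stella-large-zh-v2": ModelSpec(1024, 1024),
    "bge-large-en-v1.5": ModelSpec(512, 1024),
    "gte-large": ModelSpec(512, 1024),
    "e5-large-v2": ModelSpec(512, 1024),
    "instructor-large": ModelSpec(512, 768),
    "multilingual-e5-large": ModelSpec(512, 1024),
    "multilingual-e5-base": ModelSpec(512, 768),
    "all-MiniLM-L6-v2": ModelSpec(256, 384),
    "paraphrase-multilingual-mpnet-base-v2": ModelSpec(128, 768),
}


def fits(model: str, n_tokens: int) -> bool:
    """True if an input of n_tokens (counted with the model's tokenizer) is within the recommended length."""
    return n_tokens <= MODEL_SPECS[model].max_tokens


def index_size_bytes(model: str, n_vectors: int, bytes_per_value: int = 4) -> int:
    """Raw storage for n_vectors embeddings; the default assumes float32 (4 bytes per value)."""
    return n_vectors * MODEL_SPECS[model].dimensions * bytes_per_value
```

Under these assumptions, one million e5-large-v2 vectors need about 4.1 GB of raw storage (1,000,000 × 1024 × 4 bytes), versus about 1.5 GB for all-MiniLM-L6-v2. Inputs for which `fits` returns False will likely need to be split into shorter chunks before embedding.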

Performance Benchmarks

The benchmark scores below are taken from the MTEB leaderboard. The Classification Average is computed over 12 datasets and the Retrieval Average over 15 datasets. The Overall Average is the mean score across all 56 datasets, which span a range of task types.
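Because the Overall Average is taken over datasets rather than over task types, task types with more datasets carry more weight, so it is generally not the mean of the per-task averages. The toy calculation below illustrates the difference using made-up scores (these are not real MTEB results).

```python
from statistics import mean

# Hypothetical per-dataset scores grouped by task type (NOT real MTEB results).
scores = {
    "classification": [70.0, 80.0],
    "retrieval": [50.0],
    "clustering": [40.0, 44.0, 48.0],
}

# Per-task averages, analogous to the Classification and Retrieval Average columns.
task_averages = {task: mean(vals) for task, vals in scores.items()}

# Overall average: mean over every dataset, so tasks with more datasets weigh more.
overall = mean(v for vals in scores.values() for v in vals)

# For contrast: an unweighted mean of the per-task averages.
mean_of_task_averages = mean(task_averages.values())
```

Here the dataset-level mean is about 55.33, while averaging the three task averages gives about 56.33.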

Name                                   Classification Average  Retrieval Average  Overall Average
stella-large-zh-v2 (Chinese)           69.05                   70.14              65.13
bge-large-en-v1.5                      75.97                   54.29              64.23
gte-large                              73.33                   52.22              63.13
e5-large-v2                            75.24                   50.56              62.25
instructor-large                       73.86                   47.57              61.59
multilingual-e5-large                  74.81                   51.43              61.50
multilingual-e5-base                   73.02                   48.88              59.45
text-embedding-ada-002 (reference)     70.93                   49.25              60.99
all-MiniLM-L6-v2                       63.21                   42.69              56.53
paraphrase-multilingual-mpnet-base-v2  67.90                   35.34              54.71
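To compare models on the metric that matters most for your workload, you can rank the rows by a single column. The snippet below copies the table above into a dictionary, leaving out stella-large-zh-v2 because it is a Chinese-language model. The `rank` helper is a hypothetical illustration, not part of our API.

```python
# Averages copied from the table above:
# (classification average, retrieval average, overall average).
# stella-large-zh-v2 is omitted because it is a Chinese-language model.
BENCHMARKS = {
    "bge-large-en-v1.5": (75.97, 54.29, 64.23),
    "gte-large": (73.33, 52.22, 63.13),
    "e5-large-v2": (75.24, 50.56, 62.25),
    "instructor-large": (73.86, 47.57, 61.59),
    "multilingual-e5-large": (74.81, 51.43, 61.50),
    "multilingual-e5-base": (73.02, 48.88, 59.45),
    "text-embedding-ada-002": (70.93, 49.25, 60.99),
    "all-MiniLM-L6-v2": (63.21, 42.69, 56.53),
    "paraphrase-multilingual-mpnet-base-v2": (67.90, 35.34, 54.71),
}

# Position of each metric inside the tuples above.
METRIC_INDEX = {"classification": 0, "retrieval": 1, "overall": 2}


def rank(metric: str, top_k: int = 3) -> list[str]:
    """Hypothetical helper: return the top_k model names for one metric, best first."""
    i = METRIC_INDEX[metric]
    return sorted(BENCHMARKS, key=lambda name: BENCHMARKS[name][i], reverse=True)[:top_k]
```

For retrieval-heavy workloads, `rank("retrieval")` returns bge-large-en-v1.5, gte-large, and multilingual-e5-large. Keep in mind that leaderboard scores are only a starting point; results on your own data may differ.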