Jina AI Launches World’s First Open-Source 8K Text Embedding, Rivaling OpenAI

Berlin, Germany – October 25, 2023 – Jina AI, the Berlin-based artificial intelligence company, is thrilled to announce the launch of its second-generation text embedding model, jina-embeddings-v2. This cutting-edge model is now the only open-source offering that supports an impressive 8K (8192-token) context length, putting it on par with OpenAI’s proprietary model, text-embedding-ada-002, in terms of both capabilities and performance on the Massive Text Embedding Benchmark (MTEB) leaderboard.
Benchmarking Against the Best 8K Model from OpenAI
When compared directly with OpenAI’s 8K model, text-embedding-ada-002, jina-embeddings-v2 holds its own. The table below compares the two models on the MTEB leaderboard and highlights the areas where jina-embeddings-v2 particularly excels:
Rank (MTEB) | Model | Model Size (GB) | Embedding Dimensions | Sequence Length (tokens) | Average (56 datasets) | Classification Average (12 datasets) | Reranking Average (4 datasets) | Retrieval Average (15 datasets) | Summarization Average (1 dataset)
---|---|---|---|---|---|---|---|---|---
15 | text-embedding-ada-002 | Unknown | 1536 | 8191 | 60.99 | 70.93 | 84.89 | 56.32 | 30.8
17 | jina-embeddings-v2-base-en | 0.27 | 768 | 8192 | 60.38 | 73.45 | 85.38 | 56.98 | 31.6
Notably, jina-embeddings-v2 outperforms its OpenAI counterpart on the Classification, Reranking, Retrieval, and Summarization averages, while trailing slightly on the overall 56-dataset average (60.38 vs. 60.99).
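The per-category comparison can be checked directly against the table. The minimal Python sketch below simply re-enters the five MTEB averages from the table and prints the difference for each category; a positive value means jina-embeddings-v2-base-en scored higher. The `SCORES` dictionary and the `compare` helper are illustrative names chosen for this example, not part of any Jina AI or MTEB tooling.

```python
# MTEB averages re-entered from the comparison table:
# each value is (text-embedding-ada-002, jina-embeddings-v2-base-en).
SCORES = {
    "Average (56 datasets)": (60.99, 60.38),
    "Classification Average (12 datasets)": (70.93, 73.45),
    "Reranking Average (4 datasets)": (84.89, 85.38),
    "Retrieval Average (15 datasets)": (56.32, 56.98),
    "Summarization Average (1 dataset)": (30.8, 31.6),
}


def compare(scores: dict[str, tuple[float, float]]) -> dict[str, float]:
    """Return (jina score - ada score) per category, rounded to 2 decimals."""
    return {name: round(jina - ada, 2) for name, (ada, jina) in scores.items()}


if __name__ == "__main__":
    for name, delta in compare(SCORES).items():
        leader = "jina-embeddings-v2" if delta > 0 else "text-embedding-ada-002"
        print(f"{name:<40} {delta:+.2f}  (higher: {leader})")
```

Run as a script, it reports positive differences for the four task-specific averages and a difference of -0.61 on the overall 56-dataset average.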
Features and Benefits
Jina AI’s commitment to innovation is evident in this latest offering:
- From Scratch to Superiority: jina-embeddings-v2 was built from the ground up. Over the last three months, the team at Jina AI engaged in intensive R&D, data collection, and tuning. The result is a model that marks a significant leap from its predecessor.
- Unlocking Extended Context Potential with 8K: jina-embeddings-v2 isn’t just a technical feat; its 8K context length opens doors to new industry applications:
  - Legal Document Analysis: Ensure every detail in extensive legal texts is captured and analyzed.
  - Medical Research: Embed scientific papers holistically for advanced analytics and discovery.
  - Literary Analysis: Dive deep into long-form content, capturing nuanced thematic elements.
  - Financial Forecasting: Gain deeper insights from detailed financial reports.
  - Conversational AI: Improve chatbot responses to intricate user queries.
Benchmarking shows that, on several datasets, this extended context enabled jina-embeddings-v2 to outperform other leading base embedding models, underscoring the practical advantages of longer context capabilities.

- Size Options for Different Needs: Understanding the diverse needs of the AI community, Jina AI offers two versions of the model:
  - Base Model (0.27 GB): Designed for heavy-duty tasks requiring higher accuracy, such as academic research or business analytics.
  - Small Model (0.07 GB): Crafted for lightweight applications such as mobile apps or devices with limited computing resources.
- Availability: Both models are freely available for download on Hugging Face:
  - jinaai/jina-embeddings-v2-base-en
  - jinaai/jina-embeddings-v2-small-en
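To make the practical effect of an 8K context length concrete, the Python sketch below counts how many non-overlapping windows a long document must be split into before it can be embedded. The 8192-token limit comes from this announcement; the 512-token limit is an assumption standing in for a hypothetical short-context model, and the 6000-token document length is likewise hypothetical. The sketch works on token counts and token-ID lists only; it does not tokenize text, and real token counts depend on the tokenizer used.

```python
import math

# 8192 is the jina-embeddings-v2 context length stated in this announcement.
JINA_V2_MAX_TOKENS = 8192
# ASSUMPTION: a hypothetical short-context model, used only for comparison.
SHORT_CONTEXT_MAX_TOKENS = 512


def windows_needed(doc_tokens: int, max_tokens: int) -> int:
    """Number of non-overlapping windows needed to cover every token of a document."""
    if doc_tokens <= 0:
        return 0
    return math.ceil(doc_tokens / max_tokens)


def split_into_windows(token_ids: list[int], max_tokens: int) -> list[list[int]]:
    """Split a token-ID sequence into consecutive windows of at most max_tokens each."""
    return [token_ids[i:i + max_tokens] for i in range(0, len(token_ids), max_tokens)]


if __name__ == "__main__":
    doc_tokens = 6000  # hypothetical length of a long legal or scientific document
    for name, limit in [("8K model", JINA_V2_MAX_TOKENS),
                        ("512-token model (hypothetical)", SHORT_CONTEXT_MAX_TOKENS)]:
        print(f"{name}: {windows_needed(doc_tokens, limit)} window(s)")
```

Under these assumptions, the 6000-token document fits in a single 8K window but needs 12 windows at 512 tokens, and each split is a point where context shared across windows can be lost. Real pipelines often use overlapping windows instead, which would increase the count further.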

Reflecting on the journey and significance of this launch, Dr. Han Xiao, CEO of Jina AI, shared his thoughts:
“In the ever-evolving world of AI, staying ahead and ensuring open access to breakthroughs is paramount. With jina-embeddings-v2, we’ve achieved a significant milestone. Not only have we developed the world’s first open-source 8K context length model, but we’ve also brought it to a performance level on par with industry giants like OpenAI. Our mission at Jina AI is clear: we aim to democratize AI and empower the community with tools that were once confined to proprietary ecosystems. Today, I’m proud to say, we’ve taken a giant leap towards that vision.”
This pioneering spirit is evident in Jina AI’s forward-looking plans.
A Glimpse into the Future
Jina AI is committed to staying at the forefront of AI innovation. Here’s what’s next on its roadmap:
- Academic Insights: An academic paper detailing the technical intricacies and benchmarks of jina-embeddings-v2 will soon be published, allowing the AI community to gain deeper insights.
- API Development: The team is in the advanced stages of developing an OpenAI-like embeddings API platform, which will let users scale the embedding model effortlessly according to their needs.
- Language Expansion: Venturing into multilingual embeddings, Jina AI is setting its sights on launching German-English models, further expanding its repertoire.
About Jina AI GmbH:
Located at Ohlauer Str. 43 (1st floor), zone A, 10999 Berlin, Germany, Jina AI is at the vanguard of reshaping the landscape of multimodal artificial intelligence. For inquiries, please reach out at [email protected].