La plateforme | Mistral AI
Mistral AI brings the strongest open generative models to developers, along with efficient ways to deploy and customise them for production.
We are opening beta access to our first platform services today. We start simple: la plateforme serves three chat endpoints for generating text from textual instructions, and an embedding endpoint. Each endpoint has a different performance/price tradeoff.
Generative endpoints
The first two endpoints, mistral-tiny and mistral-small, currently use our two released open models; the third, mistral-medium, uses a prototype model with higher performance that we are testing in a deployed setting.
We serve instructed versions of our models. We have worked on consolidating the most effective alignment techniques (efficient fine-tuning, direct preference optimisation) to create models that are easy to control and pleasant to use. We pre-train models on data extracted from the open web and perform instruction fine-tuning from annotations.
Mistral-tiny. Our most cost-effective endpoint currently serves Mistral 7B Instruct v0.2, a new minor release of Mistral 7B Instruct.
Mistral-tiny only works in English. It obtains 7.6 on MT-Bench.
The instructed model can be downloaded here.
Mistral-small. This endpoint currently serves our newest model, Mixtral 8x7B, described in more detail in our blog post.
It masters English, French, Italian, German, Spanish, and code, and obtains 8.3 on MT-Bench.
Mistral-medium. Our highest-quality endpoint currently serves a prototype model that is among the top serviced models available, based on standard benchmarks. It masters English, French, Italian, German, Spanish, and code, and obtains a score of 8.6 on MT-Bench. The following table compares the performance of the base models of Mistral-medium and Mistral-small, and the endpoint of a competitor.

Embedding endpoint
Mistral-embed, our embedding endpoint, serves an embedding model with an embedding dimension of 1024. Our embedding model has been designed with retrieval capabilities in mind. It achieves a retrieval score of 55.26 on MTEB.
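To make the retrieval use case concrete, here is a minimal, self-contained sketch of how such embeddings are typically used: rank documents by cosine similarity to a query embedding. The vectors below are tiny stand-ins, not real mistral-embed outputs (which would be 1024-dimensional).

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 4-dimensional stand-ins for the 1024-dimensional vectors
# an embedding endpoint would return for a query and two documents.
query = [0.1, 0.3, 0.5, 0.1]
docs = {
    "doc_a": [0.1, 0.3, 0.5, 0.1],  # same direction as the query
    "doc_b": [0.5, 0.1, 0.1, 0.3],
}

# Rank documents by similarity to the query, most similar first.
ranked = sorted(docs, key=lambda d: cosine_similarity(query, docs[d]),
                reverse=True)
print(ranked[0])  # doc_a ranks first
```

In a real pipeline, the only change is that the vectors come from the embedding endpoint instead of being hard-coded.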
API specs
Our API follows the specifications of the popular chat interface initially proposed by our dearest competitor. We provide Python and JavaScript client libraries to query our endpoints. Our endpoints allow users to provide a system prompt that sets a higher level of moderation on model outputs, for applications where this is an important requirement.
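As a rough illustration of that chat-completions schema, the sketch below builds a request body with a system message followed by a user message. The model name comes from the endpoints above; the system prompt wording and the temperature value are illustrative assumptions, not official defaults.

```python
import json

# A chat-completions style request body: a model name plus an ordered
# list of role/content messages. The system message is where an
# application would place its moderation instructions.
payload = {
    "model": "mistral-tiny",
    "messages": [
        {
            "role": "system",
            "content": "Answer helpfully and refuse harmful requests.",
        },
        {"role": "user", "content": "Write a haiku about open models."},
    ],
    "temperature": 0.7,  # illustrative sampling temperature
}

# Serialised as JSON, this body would be POSTed to the chat endpoint
# with an Authorization: Bearer <API key> header.
body = json.dumps(payload)
print(json.loads(body)["model"])
```

The client libraries wrap exactly this request/response shape, so code written against the familiar chat interface ports over with little change.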
Ramping up from beta access to general availability
Anyone can register to use our API as of today, as we progressively ramp up our capacity. Our business team can help qualify your needs and accelerate access. Expect rough edges as we stabilise our platform towards fully self-served availability.
Acknowledgement
We are grateful to NVIDIA for supporting us in integrating TRT-LLM and Triton, and for working alongside us to make a sparse mixture of experts compatible with TRT-LLM.