Documentation
Guide to the Embaas documentation.
Welcome to Embaas, a powerful API service that allows you to extract text from various document file types, generate numerical vector representations (embeddings), and much more. We help you streamline your AI app development!
Overview
At Embaas, we prioritize the security and privacy of your data, and we do not store any of it. See our privacy policy for more information.
Our offerings include the Document Extraction API (Document to Embeddings) and the Embeddings API, each with its own functionality and use cases. Both APIs are in beta, so they may be subject to change. This should not keep you from using them, and we are continually working to improve our services.
We select models based on the MTEB leaderboard and on model popularity. For the full list, see our available embedding models.
Document Extraction API (Document to Embeddings)
The Document Extraction API enables you to extract text from various document file types. With this API, you can extract text, chunk it into smaller parts, and create embeddings for the extracted text. The API currently supports most text file formats, and we are working on audio-to-text and image-to-text support.
To extract text from a document, make a POST request to the /v1/document/extract-text/ endpoint. You can specify options such as chunking the text, defining the chunk size, overlap between chunks, and the text splitter. You can also choose to create embeddings for the extracted text by setting the should_embed parameter to true and specifying the model parameter.
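As a rough illustration of how these options fit together, the sketch below collects them into a dictionary before a request is sent. Only should_embed and model are named in this guide; the keys should_chunk, chunk_size, chunk_overlap, and chunk_splitter, the default values, and the client-side checks are assumptions made for the example and should be verified against the API reference.

```python
from typing import Optional


def build_extract_options(
    should_chunk: bool = True,
    chunk_size: int = 256,
    chunk_overlap: int = 32,
    chunk_splitter: str = "CharacterTextSplitter",
    should_embed: bool = False,
    model: Optional[str] = None,
) -> dict:
    """Collect extraction options into a dict ready to accompany a request.

    Only `should_embed` and `model` are named in the guide. Every other key
    and all default values here are hypothetical placeholders.
    """
    # Client-side sanity checks (an assumption, not documented API behavior).
    if should_chunk and not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    if should_embed and not model:
        raise ValueError("model must be set when should_embed is true")

    options = {"should_chunk": should_chunk, "should_embed": should_embed}
    if should_chunk:
        options["chunk_size"] = chunk_size
        options["chunk_overlap"] = chunk_overlap
        options["chunk_splitter"] = chunk_splitter
    if should_embed:
        options["model"] = model
    return options
```

Calling build_extract_options(should_embed=True, model="example-model") returns a dictionary that includes both the chunking settings and the model name, where "example-model" stands in for one of the available embedding models.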
The API also provides a /v1/document/extract-text/bytes/ endpoint, which lets you extract text directly from document bytes without first writing them to a physical file.
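A minimal sketch of preparing such a request with Python's standard library follows. It assumes the base URL https://api.embaas.io, Bearer-token authentication, and a JSON body whose bytes field carries the document as base64 with an optional mime_type field. None of these details are stated in this guide, so treat them as placeholders until checked against the API reference.

```python
import base64
import json
import urllib.request
from typing import Optional

API_BASE = "https://api.embaas.io"  # assumed base URL, not stated in the guide


def build_bytes_request(
    data: bytes, api_key: str, mime_type: Optional[str] = None
) -> urllib.request.Request:
    """Prepare (but do not send) a POST request carrying in-memory document bytes.

    The "bytes" and "mime_type" field names and the Bearer auth scheme are
    assumptions made for illustration.
    """
    payload = {"bytes": base64.b64encode(data).decode("ascii")}
    if mime_type:
        payload["mime_type"] = mime_type
    return urllib.request.Request(
        API_BASE + "/v1/document/extract-text/bytes/",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

The prepared request can then be sent with urllib.request.urlopen(request), and the JSON response decoded with json.load.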
The response metadata varies based on the MIME type of the document. For example, for PDF files, the metadata includes the start and end page numbers of each text chunk.
Embeddings API
The Embeddings API allows you to generate numerical vector representations, known as embeddings, for a list of input texts. By utilizing pre-trained machine learning models, this API can convert textual data into a format that can be efficiently processed by various data analysis tools.
To generate embeddings for a list of input texts, make a POST request to the /v1/embeddings/ endpoint. The request body should be in JSON format and include the texts field, an array of strings representing the sentences or pieces of text for which embeddings should be generated, and the model field, which specifies the embedding model to be used.
The response from the API will include the generated embeddings for each input text, along with an index that indicates the position of the text in the original list.
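The request and response handling described above might be sketched as follows. The texts and model fields come from this guide. The base URL, the Bearer authorization header, and the response shape (a data list whose items carry embedding and index keys) are assumptions made for the example.

```python
import json
import urllib.request

API_BASE = "https://api.embaas.io"  # assumed base URL, not stated in the guide


def build_embeddings_request(texts, model, api_key) -> urllib.request.Request:
    """Prepare (but do not send) a POST request to the embeddings endpoint."""
    payload = {"texts": list(texts), "model": model}
    return urllib.request.Request(
        API_BASE + "/v1/embeddings/",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


def embeddings_in_input_order(response_body: dict) -> list:
    """Return the embeddings sorted by their index in the original input list.

    Assumes a body shaped like
    {"data": [{"embedding": [...], "index": 0}, ...]}.
    """
    items = sorted(response_body["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]
```

Sorting by index means the result lines up with the input texts even if the items do not arrive in input order.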
Reduce API
The Reduce API compresses your existing numerical vector representations (embeddings). Using supervised learning algorithms, it can reduce the number of dimensions of your embeddings by up to 70% while preserving a similar level of accuracy.
To reduce embeddings, send a POST request to the /v1/reduce/ endpoint. The request body should be in JSON format and include the embeddings field, an array containing the embeddings you wish to reduce, and the model field, which specifies the reduction model to be used.
The API's response will deliver the compressed embeddings corresponding to each input, accompanied by an index reflecting the initial placement of the embedding in the submitted list.
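To make the request body and the 70% figure concrete, here is a small sketch. The embeddings and model fields come from this guide. The equal-length check is a client-side assumption, and the helper simply works out how many dimensions remain if a vector shrinks by the stated maximum of 70%. The actual output size depends on the reduction model chosen.

```python
def build_reduce_payload(embeddings, model) -> dict:
    """Build the JSON-serializable body for a reduce request.

    The equal-length check is a client-side assumption, not documented
    API behavior.
    """
    vectors = [[float(x) for x in vector] for vector in embeddings]
    if not vectors:
        raise ValueError("at least one embedding is required")
    if len({len(vector) for vector in vectors}) != 1:
        raise ValueError("all embeddings must have the same number of dimensions")
    return {"embeddings": vectors, "model": model}


def smallest_reduced_size(original_dims: int, max_reduction_pct: int = 70) -> int:
    """Dimensions left after shrinking by at most max_reduction_pct percent.

    Integer ceiling division avoids floating-point rounding surprises.
    """
    remaining_pct = 100 - max_reduction_pct
    return -((-original_dims * remaining_pct) // 100)
```

For example, under this reading a 1000-dimensional embedding would keep at least 300 dimensions after reduction.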