Evaluate and run large AI models easily and affordably with Bytez. Treat models as functions and get GPU performance at CPU pricing.
Below is basic usage for the Python client library. See Libraries for JavaScript and Julia examples.
```python
from bytez import Bytez

client = Bytez("YOUR_BYTEZ_KEY_HERE")

model = client.model("Qwen/Qwen2-7B-Instruct")
model.load()

input_text = "Once upon a time there was a beautiful home where"
model_params = {"max_new_tokens": 20, "min_new_tokens": 5, "temperature": 0.5}

result = model.run(input_text, model_params=model_params)
output = result["output"]
generated_text = output[0]["generated_text"]
print(generated_text)
```
Streaming usage (currently, only text-generation models support streaming):
```python
from bytez import Bytez

client = Bytez("YOUR_BYTEZ_KEY_HERE")

model = client.model("Qwen/Qwen2-7B-Instruct")
model.load()

input_text = "Once upon a time there was a beautiful home where"
model_params = {"max_new_tokens": 20, "min_new_tokens": 5, "temperature": 0.5}

stream = model.run(
    input_text,
    stream=True,
    model_params=model_params,
)

for chunk in stream:
    print(f"Output: {chunk}")
```
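The loop above prints each chunk as it arrives. If you want the full text at the end (for logging or post-processing), you can accumulate the chunks. This is an illustrative sketch: `fake_stream` is a stand-in for the iterator returned by `model.run(..., stream=True)`, and the assumption that each chunk is a plain text fragment is ours.

```python
def collect_stream(stream):
    """Accumulate text chunks from a streaming response into one string."""
    pieces = []
    for chunk in stream:
        pieces.append(chunk)  # assumed: each chunk is a text fragment
    return "".join(pieces)

# Stand-in for the iterator returned by model.run(..., stream=True)
def fake_stream():
    yield "Once upon a time "
    yield "there was a beautiful home"

print(collect_stream(fake_stream()))
# Once upon a time there was a beautiful home
```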
Each link below has a quickstart and detailed examples for all supported ML tasks for a given client library.
Two steps to run inference in seconds:

1. Get your API key by visiting the Bytez Settings Page.
2. Choose how you want to perform inference with Bytez:
   - Use the Bytez Model Playground on bytez.com (great for exploring and trying models)
   - Install a client library
   - Hit the REST API directly
   - Run inference locally via Docker
To use this API, you need an API key. Obtain your key from the Bytez Settings Page.
Then use it in code (Python example):

```python
from bytez import Bytez

client = Bytez("YOUR_BYTEZ_KEY_HERE")
```
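Rather than hardcoding the key in source, a common pattern is to read it from an environment variable. The variable name `BYTEZ_KEY` below is our illustrative choice, not an official convention.

```python
import os

def get_bytez_key():
    """Read the API key from the environment instead of hardcoding it."""
    key = os.environ.get("BYTEZ_KEY")  # BYTEZ_KEY is an illustrative name
    if not key:
        raise RuntimeError("Set the BYTEZ_KEY environment variable to your Bytez API key")
    return key

# Usage with the client library: client = Bytez(get_bytez_key())
```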
All users are provided with 100 credits worth of free compute per month!
You can play with models without writing any code by visiting Bytez.
We've set up a public sandbox in Postman to demo our API. Note: this is the v2 endpoint, which allows you to demo both closed and open source AI models.
Category | Description |
---|---|
Closed Source Examples | Examples for using closed-source models from leading providers (Anthropic, OpenAI, Cohere and more!) |
Open Source Examples | Examples demonstrating how to use HTTP requests to interact with 23k+ open-source models on the platform. |
Open Source Examples - Image as Input | Examples using images as input across various tasks, including classification and segmentation. |
Open Source Examples - Messages as Input | Examples using messages as input, ideal for chat-based applications and sentiment analysis. |
Open Source Examples - Text as Input | Examples for handling text input, such as summarization, translation, and general NLP tasks. |
Open Source Examples - Multi-Input | Examples that handle multiple types of input simultaneously, such as text and images. |
Useful Functions & Model Library | Explore utility functions to list models by task, clusters, and more for streamlined model selection. |
Load and run a model after installing our Python library (`pip install bytez`).
Full documentation can be found here.
```python
from bytez import Bytez

client = Bytez("YOUR_BYTEZ_KEY_HERE")

# Initialize a model
model = client.model("openai-community/gpt2")

# Start a model
model.load()

# Run a model
output = model.run("Once upon a time there was a", model_params={"max_new_tokens": 20, "min_new_tokens": 5})

print(output)
```
See the API Documentation for all examples.
Load and run a model after installing our TypeScript library (`npm i bytez.js`).
Full documentation can be found here.
```typescript
import Bytez from "bytez.js";

const client = new Bytez("YOUR_BYTEZ_KEY_HERE");

// Grab a model
const model = client.model("openai-community/gpt2");

// Start a model
await model.load();

// Run a model
const output = await model.run("Once upon a time there was a", {
  // huggingface params
  max_new_tokens: 20,
  min_new_tokens: 5
});

console.log(output);
```
See API Documentation for all examples.
Load and run a model after installing our Julia library (`add Bytez`).
Full documentation can be found here.
Interactive Notebook! (Coming Soon)
```julia
using Bytez

client = Bytez.init("YOUR_BYTEZ_KEY_HERE")

# Grab a model
model = client.model("openai-community/gpt2")

# Start a model
model.load()

# Run a model
input_text = "Once upon a time there was a"
options = Dict(
    "params" => Dict(
        "max_new_tokens" => 20,
        "min_new_tokens" => 5,
        "temperature" => 0.5,
    )
)

output = model.run(input_text, options)

println(output)
```
Bytez has a REST API for loading, running, and requesting new models.
Load a model:

```shell
curl --location 'https://api.bytez.com/model/load' \
--header 'Authorization: Key API_KEY' \
--header 'Content-Type: application/json' \
--data '{
    "model": "openai-community/gpt2",
    "concurrency": 1
}'
```
Run a model:

```shell
curl --location 'https://api.bytez.com/model/run' \
--header 'Authorization: Key API_KEY' \
--header 'Content-Type: application/json' \
--data '{
    "model": "openai-community/gpt2",
    "prompt": "Once upon a time there was a",
    "params": {
        "min_length": 30,
        "max_length": 256
    }
}'
```
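The same run request can be issued from any HTTP client. The sketch below builds the URL, headers, and JSON body for the `/model/run` endpoint as shown in the curl example above, without actually sending the request; the `build_run_request` helper is our own illustration, not part of the Bytez library.

```python
import json

API_KEY = "YOUR_BYTEZ_KEY_HERE"

def build_run_request(model, prompt, params):
    """Build the URL, headers, and JSON body for the /model/run endpoint."""
    url = "https://api.bytez.com/model/run"
    headers = {
        "Authorization": f"Key {API_KEY}",
        "Content-Type": "application/json",
    }
    body = json.dumps({"model": model, "prompt": prompt, "params": params})
    return url, headers, body

url, headers, body = build_run_request(
    "openai-community/gpt2",
    "Once upon a time there was a",
    {"min_length": 30, "max_length": 256},
)
print(body)
# To send it (with the requests package installed):
# requests.post(url, headers=headers, data=body)
```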
Request a new model:

```shell
curl --location 'https://api.bytez.com/model/job' \
--header 'Authorization: Key API_KEY' \
--header 'Content-Type: application/json' \
--data '{
    "model": "openai-community/gpt2"
}'
```
See the API Documentation for all endpoints.
All Bytez model images are available on Docker Hub, and models can be played with via our Models page 🤙
The source code that runs for a given model in the Docker image can be found here.
We currently support 20K+ open source AI models across 30+ ML tasks.
Task | Total Models |
---|---|
Total Available | 14559 |
Text-generation | 5765 |
Summarization | 380 |
Unconditional-image-generation | 416 |
Text2text-generation | 393 |
Audio-classification | 390 |
Image-classification | 533 |
Zero-shot-classification | 213 |
Token-classification | 546 |
Video-classification | 419 |
Text-classification | 474 |
Fill-mask | 358 |
Text-to-image | 467 |
Depth-estimation | 53 |
Object-detection | 405 |
Sentence-similarity | 457 |
Image-segmentation | 322 |
Image-to-text | 249 |
Zero-shot-image-classification | 174 |
Translation | 592 |
Automatic-speech-recognition | 455 |
Question-answering | 563 |
Image-feature-extraction | 114 |
Visual-question-answering | 105 |
Feature-extraction | 399 |
Mask-generation | 77 |
Zero-shot-object-detection | 27 |
Text-to-video | 11 |
Text-to-speech | 173 |
Document-question-answering | 18 |
Text-to-audio | 11 |
Here's a sample of models that can be run, along with their required RAM.
Model Name | Required RAM (GB) |
---|---|
EleutherAI/gpt-neo-2.7B | 2.23 |
bigscience/bloom-560m | 3.78 |
succinctly/text2image-prompt-generator | 1.04 |
ai-forever/mGPT | 9.59 |
microsoft/phi-1 | 9.16 |
facebook/opt-1.3b | 8.06 |
tiiuae/falcon-40b-instruct | 182.21 |
tiiuae/falcon-7b-instruct | 27.28 |
codellama/CodeLlama-7b-Instruct-hf | 26.64 |
deepseek-ai/deepseek-coder-6.7b-instruct | 26.50 |
upstage/SOLAR-10.7B-Instruct-v1.0 | 57.63 |
elyza/ELYZA-japanese-Llama-2-7b-instruct | 38.24 |
NousResearch/Meta-Llama-3-8B-Instruct | 30.93 |
codellama/CodeLlama-70b-Instruct-hf | 372.52 |
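The RAM column above determines which models your machine can host locally. As a quick sanity check, a small helper can filter models against an available-RAM budget; the dictionary below copies a few rows from the table, and the helper itself is our illustration, not a library function.

```python
# RAM requirements (GB) for a few models from the table above
MODEL_RAM_GB = {
    "EleutherAI/gpt-neo-2.7B": 2.23,
    "bigscience/bloom-560m": 3.78,
    "facebook/opt-1.3b": 8.06,
    "tiiuae/falcon-7b-instruct": 27.28,
    "codellama/CodeLlama-70b-Instruct-hf": 372.52,
}

def models_that_fit(available_gb):
    """Return the models whose required RAM fits within the given budget."""
    return sorted(
        name for name, ram in MODEL_RAM_GB.items() if ram <= available_gb
    )

print(models_that_fit(16))
# ['EleutherAI/gpt-neo-2.7B', 'bigscience/bloom-560m', 'facebook/opt-1.3b']
```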
To see the full list, run:

```python
models = client.list_models()
print(models)
```
To see a task-specific list, run:

```python
models = client.list_models(task="text-generation")
print(models)
```
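Once you have the list, you can narrow it further on the client side. The exact return shape of `client.list_models()` is not shown above, so this sketch assumes a list of model-name strings; `sample` stands in for the real result.

```python
def filter_models(models, keyword):
    """Case-insensitive substring filter over model names (illustrative;
    assumes list_models() returns model-name strings)."""
    return [m for m in models if keyword.lower() in m.lower()]

# Stand-in for the result of client.list_models()
sample = [
    "openai-community/gpt2",
    "Qwen/Qwen2-7B-Instruct",
    "bigscience/bloom-560m",
]
print(filter_models(sample, "qwen"))
# ['Qwen/Qwen2-7B-Instruct']
```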
We value your feedback on our documentation and services. If you have any suggestions, please join our Discord or email us at [email protected]