Simple cost, pay for what you use.
Start for free: Experiment with our API with $20 of free compute.
Pay as you go: You are billed per millisecond of compute.
Scalable: Access to scalable infrastructure, without the hassle of maintaining it.

Compute cost


per second

Choose a plan to start using Pipeline

Full access to our serverless API
Developer plan


+ compute cost

Custom Model deployment

Access to all our state-of-the-art pre-trained models

Unlimited API requests

Billed per millisecond of compute

Basic chat and email support

FREE for the first month and $20 of free compute

Enterprise plan

Everything in the Developer plan

Custom compute pricing

Private cluster

On-prem or other cloud deployments

Dedicated support

How is cost calculated?

You’re billed only when a GPU is computing your request


GPU active

  1. API request sent
  2. GPU starts processing request
  3. GPU compute ends
  4. API sends result
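
To make the billing window concrete, here is a minimal sketch of the four steps above. The `RequestTimeline` class, its field names, and the timestamps are hypothetical illustrations, not part of the Pipeline API; the only thing taken from this page is that time is billed while the GPU is active (between steps 2 and 3) and is measured in milliseconds.

```python
from dataclasses import dataclass


@dataclass
class RequestTimeline:
    """Hypothetical timestamps, in milliseconds since the request was sent."""
    request_sent_ms: int   # step 1: API request sent
    gpu_start_ms: int      # step 2: GPU starts processing request
    gpu_end_ms: int        # step 3: GPU compute ends
    result_sent_ms: int    # step 4: API sends result


def billable_ms(timeline: RequestTimeline) -> int:
    """Return only the GPU-active window; queueing and transfer are excluded."""
    return timeline.gpu_end_ms - timeline.gpu_start_ms


# Made-up numbers: 120 ms before the GPU starts, 1500 ms of compute,
# then 30 ms to return the result.
timeline = RequestTimeline(
    request_sent_ms=0, gpu_start_ms=120, gpu_end_ms=1620, result_sent_ms=1650
)
print(billable_ms(timeline))  # 1500
```

In this sketch the request takes 1650 ms end to end, but only the 1500 ms of GPU compute would count towards the bill.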

Example compute cost per model

Here are real-world examples of how much running a model will cost
Model / Estimated cost
$0.0222 / 1K tokens (~40s of compute)
GPT-2 Large
$0.0167 / 1K tokens (~30s of compute)
GPT-Neo 2.7B
$0.0267 / 1K tokens (~48s of compute)
$0.0083 / 9 images (~15s of compute)
$0.0133 / 9 images (~24s of compute)
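
As a rough check on these figures, the sketch below divides each listed cost by its listed compute time. The resulting per-second rate is an inference from rounded example numbers, not a published price, and the variable names are illustrative only.

```python
# (estimated cost in USD, approximate seconds of compute), as listed above.
examples = [
    (0.0222, 40),
    (0.0167, 30),
    (0.0267, 48),
    (0.0083, 15),
    (0.0133, 24),
]

# Implied rate for each example; these differ slightly because the
# listed costs and times are rounded.
rates_per_second = [cost / seconds for cost, seconds in examples]
rates_per_hour = [rate * 3600 for rate in rates_per_second]

for (cost, seconds), per_second, per_hour in zip(
    examples, rates_per_second, rates_per_hour
):
    print(f"${cost:.4f} / {seconds}s -> ${per_second:.6f}/s (~${per_hour:.2f}/hour)")
```

Every example works out to about $0.00055 to $0.00056 per second of compute, or roughly $2 per GPU-hour, which suggests the examples all use the same per-second compute rate.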

Frequently asked questions

A pipeline surrounds a model with inference logic so that your user's inputs and the model's raw outputs can be processed to produce the desired output. For example, the GPT-J pipeline adds input arguments and logit sampling techniques to the underlying GPT-J model within it.
The best pipeline depends on your task: for text generation our best pipeline is GPT-J, and for image generation it is DALL·E Mega. You can experiment with all our other pipelines too, or ask us for advice on which one to use.
We can deploy your custom models and create a private endpoint so you can run them on our serverless hardware. This functionality isn't yet available over the API, so send us a message and we'll discuss your needs and how we can serve them.
Inference times rarely change as long as your requests are identical. However, they can occasionally vary when our proprietary task distribution software routes your task to a different GPU. We are always improving these algorithms.
Until bandwidth across PCIe into the GPU is instantaneous, all ML compute providers will face the cold-start issue, where a model must be loaded before inference can be served. With our custom distribution software we have minimised the impact of cold starts, and you will never be charged for model loading time.
ML models only deal with numbers, not words, so text must first be broken down into tokens, each of which maps to a number. Different tokenizers use different strategies, but a token roughly corresponds to five characters of text content in our pipelines.
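
For a quick back-of-the-envelope token count, the five-characters-per-token figure above can be turned into a small helper. This is only an approximation under that stated ratio; the function name `estimate_tokens` is hypothetical, and an actual tokenizer will give different counts depending on the text.

```python
import math

# Rough ratio quoted above; real tokenizers vary with language and content.
CHARS_PER_TOKEN = 5


def estimate_tokens(text: str) -> int:
    """Approximate the number of tokens in `text` from its character count."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


prompt = "Write a short product description for a reusable water bottle."
print(estimate_tokens(prompt))
```

By the same ratio, 1K tokens corresponds to roughly 5,000 characters of text, which is a useful way to read the per-1K-token cost examples.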

Start your AI journey today

Join over 2,500 customers already using Pipeline.