Simple cost, pay for what you use.
Start for free
Experiment with our API with $20 of free compute.
Pay as you go
You are billed per millisecond of compute.
Scalable
Access to scalable infrastructure, without the hassle of maintaining it.

Compute cost


per second

Choose a plan to start using Pipeline

Full access to our serverless API
Developer plan


+ compute cost

Custom Model deployment

Access to all our state-of-the-art pre-trained models

Unlimited API requests

Billed per millisecond of compute

Basic chat and email support

FREE for the first month and $20 of free compute

Enterprise plan

Everything in the Developer plan

Custom compute pricing

Private cluster

On-prem or other cloud deployments

Dedicated support

How is cost calculated?

You’re billed only when a GPU is computing your request


GPU active

  1. API request sent
  2. GPU starts processing request
  3. GPU compute ends
  4. API sends result
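
The billing window described above can be sketched in a few lines of Python. This is a minimal illustration, not Pipeline's actual billing code: the `RequestTimeline` class, its field names, the example timestamps, and the `rate_per_second` value are all hypothetical. It only shows the idea that time before the GPU starts (step 1 to 2) and after it finishes (step 3 to 4) is not billed.

```python
from dataclasses import dataclass


@dataclass
class RequestTimeline:
    """Hypothetical timestamps, in milliseconds, for the four steps above."""
    request_sent_ms: int   # 1. API request sent
    gpu_start_ms: int      # 2. GPU starts processing request
    gpu_end_ms: int        # 3. GPU compute ends
    result_sent_ms: int    # 4. API sends result


def billable_ms(t: RequestTimeline) -> int:
    # Only the GPU-active window (step 2 to step 3) counts as compute.
    return t.gpu_end_ms - t.gpu_start_ms


def cost_usd(t: RequestTimeline, rate_per_second: float) -> float:
    # rate_per_second is an assumed placeholder, not a published price.
    return billable_ms(t) / 1000 * rate_per_second


timeline = RequestTimeline(
    request_sent_ms=0, gpu_start_ms=120, gpu_end_ms=1620, result_sent_ms=1700
)
print(billable_ms(timeline))  # 1500 of the 1700 ms end-to-end are billable
```

In this made-up timeline the request takes 1,700 ms end to end, but only the 1,500 ms of GPU activity would be charged.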

Example compute cost per model

Here are real-world examples of how much running a model will cost
Model    Estimated cost
$0.0222 / 1K tokens (~40s of compute)
GPT-2 Large
$0.0167 / 1K tokens (~30s of compute)
GPT-Neo 2.7B
$0.0267 / 1K tokens (~48s of compute)
$0.0083 / 9 images (~15s of compute)
$0.0133 / 9 images (~24s of compute)
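
As a rough sanity check, the estimates above can be divided by their compute times to see what per-second rate they imply. The sketch below uses only the cost and duration figures listed above; the function name is illustrative, and because the published figures are rounded, the result is an approximation rather than an official price.

```python
# (estimated cost in USD, approximate seconds of compute), from the examples above
examples = [
    (0.0222, 40),
    (0.0167, 30),
    (0.0267, 48),
    (0.0083, 15),
    (0.0133, 24),
]


def implied_rate_per_second(cost_usd: float, seconds: float) -> float:
    # Cost divided by compute time gives the implied per-second rate.
    return cost_usd / seconds


rates = [implied_rate_per_second(cost, secs) for cost, secs in examples]
for (cost, secs), rate in zip(examples, rates):
    print(f"${cost:.4f} over ~{secs}s -> ${rate:.6f} per second")
```

All five examples come out at between $0.0005 and $0.0006 per second of compute, which suggests the estimates were derived from a single flat rate.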

Frequently asked questions

What is serverless GPU?

Serverless GPU refers to the use of graphics processing units (GPUs) in a serverless computing environment. In a serverless environment, you can run your applications and services without having to worry about the underlying infrastructure. Instead of setting up and maintaining physical servers or virtual machines, you can simply deploy your model and let Pipeline AI handle the rest.

Is inference faster on GPU or CPU?

In general, a GPU (graphics processing unit) is faster than a CPU (central processing unit) for machine learning inference, because GPUs are designed to perform many calculations in parallel. Machine learning models often involve a large number of mathematical operations, and a GPU can perform these much faster than a CPU. However, the actual speed difference depends on the specific hardware and the type of model.

Some types of machine learning models may be better suited to CPU inference because they require a high level of sequential processing. Deep learning models, which involve a large number of matrix operations, are well suited to GPU inference, while models that rely more on sequential processing, such as decision trees, may be better suited to CPU inference.

Ultimately, the choice between GPU and CPU for machine learning inference will depend on the specific needs of your application and the hardware resources that are available to you.

Can I use your serverless API for my own ML models?

Yes! You can deploy your own custom models and get a private endpoint for them.

Do you have a limit to how many API calls you can handle?

No. Pipeline AI is built to handle anything from a few requests to hundreds of requests per minute. If your usage is higher than that of the average customer, we offer our Enterprise plan. Feel free to contact us for more information.

How do you handle cold-start?

Until bandwidth across PCIe into the GPU is instantaneous, all ML compute providers will face the cold-start issue, where a model must first be loaded before inference can be served. With our custom distribution software, we have minimised its impact.

Start your AI journey today

Join over 2,500 customers already using Pipeline.