Custom model deployment
Access to all our state-of-the-art pre-trained models
Unlimited API requests
Billed per millisecond of compute
Basic chat and email support
Free for the first month, plus $20 of free compute
Everything in the Developer plan
Custom compute pricing
Private cluster
On-prem or other cloud deployments
Dedicated support
[Diagram: compute time is billable only while the GPU is active]
Serverless GPU refers to the use of graphics processing units (GPUs) in a serverless computing environment. In a serverless environment, you can run your applications and services without having to worry about the underlying infrastructure. Instead of setting up and maintaining physical servers or virtual machines, you can simply deploy your model and let Pipeline AI handle the rest.
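As a rough illustration of what "deploy and let the platform handle the rest" looks like from the caller's side, the sketch below sends a request to a deployed model over plain HTTP. The endpoint URL, payload shape, and authentication header are hypothetical placeholders, not Pipeline AI's actual API; substitute the values from your own dashboard.

```python
import requests

# Hypothetical endpoint and token -- placeholders, not Pipeline AI's real API.
ENDPOINT = "https://api.example.com/v1/runs"
API_TOKEN = "YOUR_API_TOKEN"

response = requests.post(
    ENDPOINT,
    headers={"Authorization": f"Bearer {API_TOKEN}"},
    # Illustrative payload: a model identifier and a batch of inputs.
    json={"model": "my-custom-model", "inputs": ["Hello, world!"]},
    timeout=30,
)
response.raise_for_status()
print(response.json())  # model outputs returned as JSON
```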
In general, a GPU (graphics processing unit) is faster than a CPU (central processing unit) for machine learning inference, because GPUs are specifically designed to perform many calculations in parallel. Machine learning models often involve a large number of mathematical operations, and a GPU can perform these operations much faster than a CPU. However, the actual speed difference depends on the specific hardware and the type of model being used.
For example, some machine learning models are better suited to CPU inference because they require a high degree of sequential processing, which fits the CPU architecture. In general, deep learning models, which involve a large number of matrix operations, are well suited to GPU inference, while models that rely more on sequential processing, such as decision trees, may be better suited to CPU inference.
Ultimately, the choice between GPU and CPU for machine learning inference will depend on the specific needs of your application and the hardware resources that are available to you.
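As a minimal illustration of why parallel matrix workloads favour the GPU, the sketch below times the same square matrix multiplication on the CPU and, if one is available, on a CUDA GPU using PyTorch. The matrix size and repeat count are arbitrary, and the absolute numbers depend entirely on your hardware.

```python
import time
import torch

def time_matmul(device: str, size: int = 2048, repeats: int = 10) -> float:
    """Average time of a square matrix multiplication on the given device."""
    a = torch.randn(size, size, device=device)
    b = torch.randn(size, size, device=device)
    torch.matmul(a, b)  # warm-up so one-off allocation cost is not measured
    if device == "cuda":
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(repeats):
        torch.matmul(a, b)
    if device == "cuda":
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / repeats

print(f"CPU: {time_matmul('cpu'):.4f} s per matmul")
if torch.cuda.is_available():
    print(f"GPU: {time_matmul('cuda'):.4f} s per matmul")
```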
Yes! You can deploy custom models, and we can create a private endpoint for you.
No. Pipeline AI is built to handle anything from a few requests per minute to hundreds of requests per minute. If your usage is higher than that of the average customer, we offer an enterprise plan; feel free to contact us for more information.
Until bandwidth across PCIe into the GPU is instantaneous, all ML compute providers will face the cold-start issue: a model must first be loaded onto the GPU before inference can be served. With our custom distribution software we have minimised the impact of these cold starts.
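As a rough sketch of what a cold start looks like, the snippet below separates the one-off cost of moving a model's weights onto the device from the per-request inference cost that every subsequent call pays. The model architecture and sizes are arbitrary placeholders chosen only to make the two costs visible.

```python
import time
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

# Cold start: build the model and move its weights onto the device.
start = time.perf_counter()
model = torch.nn.Sequential(
    torch.nn.Linear(4096, 4096),
    torch.nn.ReLU(),
    torch.nn.Linear(4096, 1024),
).to(device).eval()
load_time = time.perf_counter() - start

# Warm request: weights are already resident, only inference is paid for.
x = torch.randn(1, 4096, device=device)
start = time.perf_counter()
with torch.no_grad():
    model(x)
if device == "cuda":
    torch.cuda.synchronize()
infer_time = time.perf_counter() - start

print(f"one-off load: {load_time:.3f} s, per-request inference: {infer_time:.4f} s")
```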