Documentation
Provision GPU pods, connect over SSH, and deploy large language models on Poidex.
Quick Start
Get a GPU pod running in under a minute. Install the CLI and authenticate with your API key from the dashboard.
# Install the Poidex CLI
pip install poidex-cli
# Authenticate with your API key
poidex auth login --api-key pdx_live_xxxxxxxxxxxxxxxxxxxx
# Verify the connection
poidex account whoami
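In scripts, typing the key inline leaves it in shell history. One possible alternative, sketched below in Python, reads the key from an environment variable and builds the same `poidex auth login` call. The variable name `POIDEX_API_KEY` and the helpers `build_login_command` and `main` are assumptions made for this sketch, not part of the Poidex CLI.

```python
import os
import subprocess


def build_login_command(env=None):
    """Return the argv for `poidex auth login`, reading the key from the environment."""
    env = os.environ if env is None else env
    # POIDEX_API_KEY is a hypothetical variable name chosen for this sketch.
    api_key = env.get("POIDEX_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("Set POIDEX_API_KEY before running this script")
    return ["poidex", "auth", "login", "--api-key", api_key]


def main():
    # Passing a list (no shell=True) keeps the key out of shell interpolation.
    subprocess.run(build_login_command(), check=True)
    # Same verification step as the quick start above.
    subprocess.run(["poidex", "account", "whoami"], check=True)
```

Call `main()` from a setup script once the variable is exported in the environment.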
Launch your first on-demand RTX 4090 pod:
poidex pod create \
  --gpu rtx4090 \
  --image pytorch/pytorch:2.3.0-cuda12.1-cudnn8-runtime \
  --disk 50 \
  --name my-first-pod
Command-Line Interface
The poidex tool manages the full lifecycle of your pods.
# List all your pods and their status
poidex pod list
# Show live details for a pod
poidex pod describe my-first-pod
# Stop a pod (billing halts immediately)
poidex pod stop my-first-pod
# Resume a stopped pod
poidex pod start my-first-pod
# Terminate and delete a pod permanently
poidex pod rm my-first-pod
# Stream real-time GPU utilization
poidex pod metrics my-first-pod --watch
Add --output json to any command to get machine-readable output for scripts and CI/CD pipelines.
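As an illustration of consuming that output, the Python sketch below filters a pod list by status. The JSON schema is not documented here, so the shape used (an array of objects with `name` and `status` keys) is an assumption, as are the helper names `list_pods_json` and `pods_with_status`; adjust the keys to whatever the CLI actually prints.

```python
import json
import subprocess


def list_pods_json():
    """Run `poidex pod list --output json` and return its raw stdout."""
    result = subprocess.run(
        ["poidex", "pod", "list", "--output", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


def pods_with_status(json_text, status="running"):
    """Return the names of pods whose status equals `status`."""
    pods = json.loads(json_text)
    # Assumed schema: a JSON array of objects with "name" and "status" keys.
    return [pod["name"] for pod in pods if pod.get("status") == status]
```

A CI job could, for example, fail the build when `pods_with_status(list_pods_json())` is non-empty after cleanup.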
SSH Connection
Every pod exposes a secure SSH endpoint. Register your public key, then connect using the host and port shown in your dashboard.
# Upload your public key to your Poidex account
poidex ssh-key add --file ~/.ssh/id_ed25519.pub
Connect to your running pod:
# Connect using the host and port from your dashboard (<user>@<pod-host> is a placeholder)
ssh <user>@<pod-host> -p 32221
# Forward a local port (e.g. Jupyter or an inference server)
ssh <user>@<pod-host> -p 32221 -L 8888:localhost:8888
# Copy files to your pod over SCP (note the uppercase -P for the port)
scp -P 32221 ./dataset.tar.gz <user>@<pod-host>:/workspace/
All SSH sessions use key-based authentication only. Password login is disabled by default.
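To avoid retyping the host, port, and key on every connection, the details can be stored in an OpenSSH client config entry. The Python sketch below renders such an entry; the helper name `ssh_config_entry` and the example values are illustrative assumptions, and the real host, port, and user should come from your dashboard.

```python
def ssh_config_entry(alias, host, port, user,
                     identity_file="~/.ssh/id_ed25519", forwards=()):
    """Render an OpenSSH client config block for one pod."""
    lines = [
        f"Host {alias}",
        f"    HostName {host}",
        f"    Port {port}",
        f"    User {user}",
        # Private half of the public key registered with `poidex ssh-key add`.
        f"    IdentityFile {identity_file}",
        # Offer only this key instead of every key the agent holds.
        "    IdentitiesOnly yes",
    ]
    for local_port, remote_port in forwards:
        # Equivalent to `-L local_port:localhost:remote_port` on the command line.
        lines.append(f"    LocalForward {local_port} localhost:{remote_port}")
    return "\n".join(lines) + "\n"
```

After appending the rendered text to `~/.ssh/config`, `ssh my-first-pod` and `scp ./dataset.tar.gz my-first-pod:/workspace/` work without the explicit port flags, assuming `my-first-pod` was used as the alias.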
Deploying LLMs
Serve an open-weight model with vLLM in a few commands. Run the following on the pod to launch an OpenAI-compatible endpoint.
# Install vLLM
pip install vllm
# Launch an OpenAI-compatible server on port 8000
python -m vllm.entrypoints.openai.api_server \
  --model meta-llama/Meta-Llama-3-8B-Instruct \
  --port 8000 \
  --gpu-memory-utilization 0.90
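To see what `--gpu-memory-utilization 0.90` leaves for the KV cache, a back-of-the-envelope estimate helps. The Python sketch below assumes the pod is the 24 GiB RTX 4090 from the quick start, bfloat16 weights and cache, and the published Llama 3 8B shape (about 8.03 billion parameters, 32 layers, 8 KV heads, head dimension 128). It ignores activation memory and other runtime overhead, so treat the result as an upper bound rather than what vLLM will report.

```python
GIB = 1024 ** 3


def kv_cache_budget(
    gpu_memory_gib=24.0,   # RTX 4090 VRAM (assumed pod type)
    utilization=0.90,      # value passed to --gpu-memory-utilization
    n_params=8.03e9,       # approximate Llama 3 8B parameter count
    bytes_per_param=2,     # bfloat16 weights
    n_layers=32,
    n_kv_heads=8,
    head_dim=128,
    kv_bytes=2,            # bfloat16 KV cache entries
):
    """Estimate how much of the memory budget is left for the KV cache."""
    budget = gpu_memory_gib * GIB * utilization
    weights = n_params * bytes_per_param
    # Each token stores one key and one value vector per layer per KV head.
    kv_per_token = 2 * n_layers * n_kv_heads * head_dim * kv_bytes
    free_for_kv = budget - weights
    return {
        "budget_gib": budget / GIB,
        "weights_gib": weights / GIB,
        "kv_cache_gib": free_for_kv / GIB,
        "kv_bytes_per_token": kv_per_token,
        "max_cached_tokens": int(free_for_kv // kv_per_token),
    }
```

With the defaults, the budget is 21.6 GiB, the weights take about 15 GiB, and roughly 6.6 GiB remains, which at 128 KiB per token is on the order of 54,000 cached tokens shared across all concurrent requests.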
From your local machine, forward the port and send a request:
# Forward the inference port locally (run in a separate terminal; -N opens the tunnel without a remote shell)
ssh -N <user>@<pod-host> -p 32221 -L 8000:localhost:8000
# Send a chat completion request
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/Meta-Llama-3-8B-Instruct",
    "messages": [{"role": "user", "content": "Hello from Poidex!"}]
  }'
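The same request can be sent from Python using only the standard library. This is a minimal sketch that assumes the port forward above is active; the helper names `build_chat_request`, `extract_reply`, and `chat` are illustrative, and the response shape follows the OpenAI chat completions format that the server emulates.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1"  # the locally forwarded inference port


def build_chat_request(prompt, model="meta-llama/Meta-Llama-3-8B-Instruct",
                       base_url=BASE_URL):
    """Build a POST request equivalent to the curl call above."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_reply(response_json):
    """Pull the assistant text out of an OpenAI-style chat completion response."""
    return response_json["choices"][0]["message"]["content"]


def chat(prompt, timeout=60):
    """Send one prompt and return the assistant's reply as a string."""
    with urllib.request.urlopen(build_chat_request(prompt), timeout=timeout) as resp:
        return extract_reply(json.load(resp))
```

Calling `chat("Hello from Poidex!")` returns the generated text once the server has finished loading the model.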
You're ready to scale
Provision additional pods on demand and tear them down when your job completes, with per-second granularity. Review live rates on the pricing page.
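One way to make sure a pod never outlives its job is to tie its lifetime to a Python context manager. The sketch below shells out to the `poidex pod create` and `poidex pod rm` commands shown earlier; the helper name `temporary_pod` and the injectable `run` parameter are assumptions made for illustration, and it assumes `poidex pod rm` does not wait for interactive confirmation.

```python
import subprocess
from contextlib import contextmanager


def _run(argv):
    """Run a CLI command and raise if it exits with a non-zero status."""
    subprocess.run(argv, check=True)


@contextmanager
def temporary_pod(name, gpu="rtx4090",
                  image="pytorch/pytorch:2.3.0-cuda12.1-cudnn8-runtime",
                  disk=50, run=_run):
    """Create a pod, yield its name, and always remove it afterwards."""
    run(["poidex", "pod", "create",
         "--gpu", gpu,
         "--image", image,
         "--disk", str(disk),
         "--name", name])
    try:
        yield name
    finally:
        # Runs even if the job raises, so a failed job does not leave a pod billing.
        run(["poidex", "pod", "rm", name])
```

Wrapping a job in `with temporary_pod("nightly-train") as pod:` creates the pod on entry and removes it on exit, whether the body succeeds or raises.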