Documentation

Provision GPU pods, connect over SSH, and deploy large language models on Poidex.

Quick Start

Get a GPU pod running in under a minute. Install the CLI and authenticate with your API key from the dashboard.

bash install & authenticate

# Install the Poidex CLI
pip install poidex-cli

# Authenticate with your API key
poidex auth login --api-key pdx_live_xxxxxxxxxxxxxxxxxxxx

# Verify the connection
poidex account whoami

Launch your first on-demand RTX 4090 pod:

bash launch a pod

poidex pod create \
  --gpu rtx4090 \
  --image pytorch/pytorch:2.3.0-cuda12.1-cudnn8-runtime \
  --disk 50 \
  --name my-first-pod

Command-Line Interface

The poidex tool manages the full lifecycle of your pods.

bash pod management

# List all your pods and their status
poidex pod list

# Show live details for a pod
poidex pod describe my-first-pod

# Stop a pod (billing halts immediately)
poidex pod stop my-first-pod

# Resume a stopped pod
poidex pod start my-first-pod

# Terminate and delete a pod permanently
poidex pod rm my-first-pod

# Stream real-time GPU utilization
poidex pod metrics my-first-pod --watch

Add --output json to any command to integrate Poidex into CI/CD pipelines.
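As a rough sketch of how JSON output might be consumed in a pipeline, the helper below reads a JSON object on stdin and prints one field. The top-level "status" field and the pod_status helper name are assumptions made for illustration; check the real output of your CLI version before relying on any field name.

```shell
# Hypothetical helper: assumes the JSON object has a top-level "status" key.
# The actual schema of `poidex ... --output json` is not documented here.
pod_status() {
  # Read JSON from stdin and print the "status" value, or "unknown" if absent.
  python3 -c 'import json, sys; print(json.load(sys.stdin).get("status", "unknown"))'
}

# Sample JSON standing in for real CLI output:
echo '{"name": "my-first-pod", "status": "running"}' | pod_status

# In a pipeline this might look like:
#   poidex pod describe my-first-pod --output json | pod_status
```

Parsing with a JSON library rather than grep or cut keeps the step robust if the field order or whitespace changes.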

SSH Connection

Every pod exposes a secure SSH endpoint. Register your public key, then connect using the host and port shown in your dashboard.

bash register SSH key

# Upload your public key to your Poidex account
poidex ssh-key add --file ~/.ssh/id_ed25519.pub

Connect to your running pod over encrypted SSH:

bash ssh into a pod

# Connect using the user, host, and port from your dashboard
ssh <user>@<host> -p 32221

# Forward a local port (e.g. Jupyter or an inference server)
ssh <user>@<host> -p 32221 -L 8888:localhost:8888

# Copy files to your pod over SCP (note the uppercase -P for the port)
scp -P 32221 ./dataset.tar.gz <user>@<host>:/workspace/

All SSH sessions use key-based authentication only. Password login is disabled by default.
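Because the host and port are repeated across ssh and scp commands, one option is to store them in an OpenSSH client config entry. The sketch below only prints such an entry. The ssh_config_entry helper, the alias, the example host name, and the root user are all placeholders chosen for illustration, so substitute the values from your dashboard.

```shell
# Print an OpenSSH client config block for a pod.
# Usage: ssh_config_entry ALIAS HOST PORT USER
ssh_config_entry() {
  printf 'Host %s\n' "$1"
  printf '    HostName %s\n' "$2"
  printf '    Port %s\n' "$3"
  printf '    User %s\n' "$4"
  printf '    IdentityFile ~/.ssh/id_ed25519\n'
}

# Placeholder values; replace them with the ones shown in your dashboard.
ssh_config_entry my-first-pod pod.example.invalid 32221 root

# To use it, append the output to your SSH config:
#   ssh_config_entry my-first-pod <host> 32221 <user> >> ~/.ssh/config
```

With the entry in place, `ssh my-first-pod` and `scp ./dataset.tar.gz my-first-pod:/workspace/` pick up the host, port, and key automatically.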

Deploying LLMs

Serve an open-weight model with vLLM in a few commands. The example launches an OpenAI-compatible endpoint.

bash inside the pod (over SSH)

# Install vLLM
pip install vllm

# Launch an OpenAI-compatible server on port 8000
python -m vllm.entrypoints.openai.api_server \
  --model meta-llama/Meta-Llama-3-8B-Instruct \
  --port 8000 \
  --gpu-memory-utilization 0.90

From your local machine, forward the port and send a request:

bash query the endpoint

# Forward the inference port locally (run this in a separate terminal and leave it open)
ssh <user>@<host> -p 32221 -L 8000:localhost:8000

# Send a chat completion request
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/Meta-Llama-3-8B-Instruct",
    "messages": [{"role": "user", "content": "Hello from Poidex!"}]
  }'
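The server can take some time to download and load the model, so a request sent right away may be refused. A minimal sketch of a readiness check is shown below: it polls a URL with curl until it responds successfully or the attempts run out. The wait_for_endpoint helper and its default retry values are illustrative choices, not part of Poidex or vLLM. The /v1/models route used in the usage comment is served by vLLM's OpenAI-compatible server.

```shell
# Poll a URL until it returns a successful HTTP status.
# Usage: wait_for_endpoint URL [ATTEMPTS] [DELAY_SECONDS]
wait_for_endpoint() {
  url="$1"
  attempts="${2:-30}"   # illustrative default: 30 tries
  delay="${3:-5}"       # illustrative default: 5 seconds between tries
  i=1
  while [ "$i" -le "$attempts" ]; do
    # --fail makes curl exit non-zero on HTTP errors (4xx/5xx)
    if curl --silent --fail --output /dev/null "$url"; then
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}

# Usage, with the port forward from above active:
#   wait_for_endpoint http://localhost:8000/v1/models && echo "server is ready"
```

The function returns 0 as soon as the endpoint answers and 1 if it never does, so it can gate later steps in a script with `&&`.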

You're ready to scale

Provision additional pods on demand and tear them down when your job completes; usage is billed per second. Review live rates on the pricing page.