Face Embedding API: The 2026 Developer's Guide
Pay-per-call face swap, match, blend, and enhance API — Python + TypeScript SDKs, models on Hugging Face, sub-1s p95 latency.
The short answer: Latentface exposes face-model inference as a pay-per-call REST API with typed Python and TypeScript SDKs. Four endpoints cover the common operations: compare two faces for similarity (scored on 512-dim face embeddings), swap a face between two images, blend N faces into a new synthesized face, and apply GPEN enhancement to a face. Free tier starts at 100 calls/day with watermarked output; paid tiers start at $49/mo for 10K calls.
Installation:
pip install latentface
# or
npm install @latentface/sdk
Your first face swap in 4 lines of Python:
from latentface import Latentface
client = Latentface(api_key="your-key")
result = client.swap(source="source.jpg", target="target.jpg")
result.save("swapped.png")
Get your API key (free tier) →
What Latentface is (and what it's not)
Latentface is a face-model inference API shaped like Hugging Face — research-community flavored, public-by-default, priced per API call. You sign up, get a key, start calling. No sales process, no demo request, no annual contract. Pay for what you use; free tier is generous enough to build and test an integration before you commit.
Latentface is not enterprise-SaaS-shaped. There are no SLA contracts, dedicated GPU pools, or 4-hour support guarantees. If you need those, the rest of the face-API category (Face++, AWS Rekognition Face, Azure Face API) is the correct starting point — they'll charge more, onboard slower, and deliver the enterprise-tier assurances Latentface doesn't offer.
What Latentface does offer that those enterprise APIs don't:
- Face swap as a first-class endpoint. Most face APIs focus on detection, verification, and landmarking. Latentface is face-swap-first because it runs on the same pipeline that powers the Onlyface real-time face-swap product.
- Multi-face blend — N faces in, one synthesized face out. Unique to Latentface among commercial APIs as of Q2 2026.
- Public Hugging Face presence. Three of our core models are published as public Spaces; you can try them before signing up.
The four endpoints
/v1/match — face similarity
Compute how similar two faces are. Returns a cosine similarity score in [0, 1].
result = client.match(
    face_a="person_1.jpg",
    face_b="person_2.jpg",
)
print(result.similarity) # → 0.87
print(result.are_same_person) # → True (threshold 0.6)
Use cases: identity verification, deduplication, "who is this photo closest to" workflows.
Pricing: $0.005 per call. Free tier: 100 calls/day.
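The deduplication use case can be sketched as a greedy grouping pass. The scorer is injected as a callable so the logic is testable; in production it would be a thin wrapper around client.match (how you wire that wrapper is an assumption, not SDK-prescribed):

```python
def dedup_groups(photos, similarity, threshold=0.6):
    """Greedily group photos whose faces score above `threshold`.

    `similarity(a, b)` returns a cosine score in [0, 1], e.g. a wrapper
    around client.match(face_a=a, face_b=b).similarity. The 0.6 default
    mirrors the documented are_same_person threshold.
    """
    groups = []  # each group is a list of photos judged to be the same person
    for photo in photos:
        for group in groups:
            # Compare against the group's representative (its first photo)
            if similarity(group[0], photo) >= threshold:
                group.append(photo)
                break
        else:
            groups.append([photo])
    return groups
```

Each photo joins the first group whose representative it matches; everything else starts a new group.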
/v1/swap — face swap between two images
Take face A from a source image and apply it to face B's position in a target image. Returns a PNG.
result = client.swap(
    source="face_a.jpg",
    target="scene_with_face_b.jpg",
    enhance=True,  # apply GPEN enhancement — default True
)
result.save("swapped.png")
Use cases: dating app "what would our baby look like," e-commerce "try these glasses on your face," game-studio "NPC with player's face," social app "Snapchat-style face stories."
Pricing: $0.01 per call (swap + enhance). $0.008 per call without enhancement.
/v1/blend — multi-face blend (N→1)
Take 2 or more faces and synthesize a new face that blends all of them. Unlike swap, the output face is not present in any of the inputs — it's a new face computed from the inputs' geometry.
result = client.blend(
    faces=["parent_a.jpg", "parent_b.jpg"],
    weights=[0.5, 0.5],  # optional; defaults to equal weights
)
result.save("future_baby.png")
Use cases: "future baby" predictors, family mashups, friend-group blends, character design tools. Powers the faceblend.app product.
Pricing: $0.015 per call (blend + enhance).
/v1/enhance — GPEN face enhancement
Apply GPEN (GAN Prior Embedded Network) beauty enhancement to a face. Skin smoothing that preserves texture; not a blur.
result = client.enhance(image="portrait.jpg", preset="natural")
result.save("enhanced.png")
Presets: natural, studio, soft. Use cases: video-call beauty filters, selfie enhancement, portrait retouch.
Pricing: $0.008 per call.
Authentication
All requests carry an API key via the Authorization header:
curl -X POST https://api.latentface.net/v1/swap \
  -H "Authorization: Bearer $LATENTFACE_API_KEY" \
  -F "source=@face_a.jpg" \
  -F "target=@scene_with_face_b.jpg"
Keys are scoped per environment — create separate keys for development, staging, and production, so if a dev key leaks you rotate it without touching production.
Keys are created and rotated from the dashboard at latentface.net/dashboard/keys. Deleted keys are hard-revoked within 60 seconds; there's no "soft" revocation window.
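One way to keep environment-scoped keys straight is a naming convention on environment variables. A minimal sketch; the LATENTFACE_API_KEY_DEV / _STAGING / _PROD variable names are an assumed convention, not something the API requires:

```python
import os

def api_key_for(environment: str) -> str:
    """Return the Latentface key for one environment.

    Assumes one env var per environment, e.g. LATENTFACE_API_KEY_DEV,
    LATENTFACE_API_KEY_STAGING, LATENTFACE_API_KEY_PROD. The convention
    is yours to choose; the API only ever sees the key itself.
    """
    var = f"LATENTFACE_API_KEY_{environment.upper()}"
    key = os.environ.get(var)
    if not key:
        raise RuntimeError(f"{var} is not set; create a key in the dashboard")
    return key
```

With separate variables, rotating a leaked dev key never touches the production value.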
Rate limits
Free tier: 100 calls per 24 hours, with a 1 req/sec burst cap.
Paid tiers:
| Tier | Monthly quota | Burst (req/sec) | Soft cap behavior |
|---|---|---|---|
| Free | 100/day | 1 | Hard rate-limit with 429 |
| Starter ($49/mo) | 10,000/mo | 10 | 429 at quota; can enable overage |
| Scale ($199/mo) | 100,000/mo | 50 | Priority queue; 429 only at hard cap |
| Enterprise | Custom | Custom | Dedicated GPU pool; no queue contention |
Rate limits are per API key. Exceeding the burst returns HTTP 429 with a Retry-After header in seconds.
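A client should honor that Retry-After hint rather than hammer the endpoint. A sketch of a retry wrapper; it duck-types on a retry_after attribute (matching the SDK's RateLimitError described below) and takes an injectable sleep so the logic is testable:

```python
import time

def call_with_backoff(fn, *, max_retries=3, sleep=time.sleep):
    """Retry `fn()` when it raises an error carrying a `retry_after` hint.

    The API returns HTTP 429 with a Retry-After header in seconds; the
    Python SDK surfaces that as RateLimitError.retry_after. We check for
    the attribute rather than the type so this helper stays SDK-agnostic.
    """
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except Exception as exc:
            retry_after = getattr(exc, "retry_after", None)
            if retry_after is None or attempt == max_retries:
                raise  # not a rate-limit error, or out of retries
            sleep(retry_after)
```

Usage would be `call_with_backoff(lambda: client.swap(source="a.jpg", target="b.jpg"))`.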
Python SDK reference
The Python SDK is a thin wrapper over the REST API. Both synchronous and async versions are available.
# Sync
from latentface import Latentface

client = Latentface(api_key="key")
result = client.swap(source="a.jpg", target="b.jpg")

# Async
import asyncio

from latentface import AsyncLatentface

async def main():
    client = AsyncLatentface(api_key="key")
    result = await client.swap(source="a.jpg", target="b.jpg")

asyncio.run(main())
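The async client pairs naturally with a bounded fan-out when you have many images. A sketch: the cap of 10 matches the Starter tier's burst limit, and `do_swap` is a stand-in for whatever coroutine wraps AsyncLatentface.swap (a semaphore caps concurrency, not requests per second exactly, so treat it as a rough guard):

```python
import asyncio

async def swap_many(items, do_swap, max_in_flight=10):
    """Run many swap calls concurrently, at most `max_in_flight` at a time.

    Set `max_in_flight` from your tier's burst cap (10 req/s on Starter,
    50 on Scale). `do_swap(item)` is any coroutine returning a result.
    """
    sem = asyncio.Semaphore(max_in_flight)

    async def bounded(item):
        async with sem:
            return await do_swap(item)

    # gather preserves input order in its results
    return await asyncio.gather(*(bounded(i) for i in items))
```

Results come back in the same order as the inputs, which keeps downstream bookkeeping simple.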
Input flexibility
All endpoints accept three input types for images:
- File path (string): source="a.jpg" — the SDK reads the file.
- Bytes: source=image_bytes — pass raw bytes directly.
- URL: source="https://example.com/a.jpg" — the server fetches the image. Subject to a 10 MB size limit and a 5-second download timeout.
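For the bytes form, a pre-flight size check avoids a wasted round trip. A sketch: the 10 MB cap is documented here only for server-side URL fetches, so applying the same bound to local uploads is a conservative assumption, not a stated guarantee:

```python
MAX_IMAGE_BYTES = 10 * 1024 * 1024  # the 10 MB cap documented for URL inputs

def load_image_bytes(path: str) -> bytes:
    """Read an image for the bytes input form, rejecting oversized files early.

    Reusing the URL-fetch limit for direct uploads is an assumption made
    to fail fast client-side, not a documented upload limit.
    """
    with open(path, "rb") as f:
        data = f.read()
    if len(data) > MAX_IMAGE_BYTES:
        raise ValueError(f"{path} is {len(data)} bytes; max is {MAX_IMAGE_BYTES}")
    return data
```

The returned bytes can be passed directly, e.g. `client.swap(source=load_image_bytes("a.jpg"), target="b.jpg")`.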
Response object
Every endpoint returns a Result object with:
result.image # bytes — the output image
result.save(path) # write to disk
result.base64 # base64-encoded image string
result.latency_ms # server-side processing latency
result.request_id # for support inquiries
result.cost_usd # this specific call's cost
Error handling
The SDK raises typed exceptions:
from latentface import Latentface, LatentfaceError, NoFaceDetectedError, RateLimitError
client = Latentface(api_key="key")
try:
    result = client.swap(source="a.jpg", target="b.jpg")
except NoFaceDetectedError as e:
    print(f"No face in image: {e.image_name}")
except RateLimitError as e:
    print(f"Rate limited. Retry after: {e.retry_after}s")
except LatentfaceError as e:
    print(f"API error: {e.code} — {e.message}")
Errors include a request_id you can cite in support tickets.
TypeScript SDK reference
The TypeScript SDK mirrors the Python API. It's an ESM module with full type definitions.
import { Latentface } from '@latentface/sdk';
const client = new Latentface({ apiKey: process.env.LATENTFACE_API_KEY });
const result = await client.swap({
  source: './a.jpg', // path (Node) or File (browser)
  target: './b.jpg',
  enhance: true,
});
await result.save('./swapped.png');
Browser usage is supported but carries caveats: API keys should never ship in browser-bundled code. Use a server-side proxy (e.g., a Next.js API route) that holds the key and forwards authenticated requests from the browser. Example proxy pattern:
// app/api/latentface-swap/route.ts (Next.js 15 App Router)
import { Latentface } from '@latentface/sdk';
export async function POST(req: Request) {
  const form = await req.formData();
  const client = new Latentface({ apiKey: process.env.LATENTFACE_API_KEY });
  const result = await client.swap({
    source: form.get('source') as File,
    target: form.get('target') as File,
  });
  return new Response(result.image, {
    headers: { 'Content-Type': 'image/png' },
  });
}
Latency: what to expect
Measured end-to-end (request leaves client → response body arrives). Network-dependent, so numbers reflect a median North American client.
| Endpoint | Starter tier p50 | Scale tier p50 | p99 (any tier) |
|---|---|---|---|
| /v1/match | 180 ms | 140 ms | 450 ms |
| /v1/swap | 620 ms | 380 ms | 1100 ms |
| /v1/blend | 780 ms | 520 ms | 1400 ms |
| /v1/enhance | 350 ms | 250 ms | 680 ms |
For interactive UX (a web app where the user is waiting), the Scale tier's p50 is appropriate — users perceive sub-500ms as responsive. For batch pipelines (a background job processing thousands of photos), latency matters less than throughput, and the Starter tier's queue is sized appropriately.
Running the models on Hugging Face instead
All three core models (face-swap, face-match, face-blend) are published as public Hugging Face Spaces:
- latentface/face-swap
- latentface/face-match
- latentface/face-blend
You can evaluate model quality before signing up for the API by running a few examples on HF. For production traffic, the Latentface API is faster (dedicated GPU pool, no HF queue), cheaper (per-call vs HF's pay-for-GPU-time), and includes the SDK wrappers, error typing, and rate-limit infrastructure that running models on HF Spaces doesn't.
We publish to Hugging Face because we think developers should be able to see the models before buying. The API is the packaged, production-ready version of the same models.
Five example integrations
1. Dating app: "what would our baby look like"
import os

from latentface import Latentface

def future_baby(couple_photos):
    client = Latentface(api_key=os.environ["LATENTFACE_API_KEY"])
    return client.blend(faces=couple_photos, weights=[0.5, 0.5])
2. E-commerce: "try these glasses on your face"
Combine swap with product overlays:
def try_glasses(selfie, model_with_glasses):
    client = Latentface(api_key="...")
    # Swap customer's face onto model-with-glasses
    return client.swap(source=selfie, target=model_with_glasses)
3. Game studio: "NPC with player's face"
Batch processing — process a folder of NPC templates with the player's face:
from pathlib import Path

for npc_template in Path("npc_templates").glob("*.png"):
    result = client.swap(source=player_selfie, target=str(npc_template))
    result.save(f"npc_output/{npc_template.stem}_as_player.png")
4. Security: identity verification
result = client.match(face_a=stored_selfie, face_b=live_camera_frame)
verified = result.similarity > 0.75 # threshold per your security policy
5. Social app: "which celebrity do you look like"
Run match against your own celebrity-embedding database (not bundled; see our /docs/guides/celebrity-db for how to build one with Latentface):
my_embedding = client.embed(image=user_selfie).vector
similarities = [
    (celeb_name, cosine_similarity(my_embedding, celeb_emb))
    for celeb_name, celeb_emb in my_celebrity_db.items()
]
top5 = sorted(similarities, key=lambda x: x[1], reverse=True)[:5]
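The cosine_similarity helper in the snippet above isn't part of the SDK; a minimal pure-Python version that works on the embedding vectors (e.g. the 512-dim output of client.embed) looks like this:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
```

For large celebrity databases you would typically vectorize this with NumPy or an ANN index, but the math is the same.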
Benchmark: Latentface vs the alternatives
We benchmarked Latentface against two commercial alternatives (Face++, AWS Rekognition) and the open-source baseline (running InSwapper locally on an RTX 3090) on 1,000 diverse face-swap inputs.
| Stack | Median latency | Cost per 1K calls | Setup time |
|---|---|---|---|
| Latentface Starter | 620 ms | $10 | 3 min |
| Latentface Scale | 380 ms | $10 | 3 min |
| Face++ | 1200 ms | $18 | 45 min |
| AWS Rekognition (no swap endpoint) | — | — | — |
| InSwapper local (RTX 3090) | 210 ms | $0* | 3+ hours |
*InSwapper local cost is "free" in direct API terms but assumes you own and maintain the GPU. Amortized hardware + power + maintenance is typically $0.50–$2 per 1K calls depending on your specifics.
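In per-1K terms the comparison above works out as follows; a rough sketch using the table's $10/1K API figure and the $0.50–$2 amortized local range (the exact local figure depends on GPU utilization, power cost, and maintenance time, none of which the benchmark pins down):

```python
API_PER_1K = 10.0                    # swap at $0.01/call, from the table
LOCAL_LOW, LOCAL_HIGH = 0.50, 2.00   # amortized local range stated above

# How many times more the API costs per call once local hardware is amortized
ratio_best_case_local = API_PER_1K / LOCAL_HIGH   # vs a heavily-utilized GPU
ratio_worst_case_local = API_PER_1K / LOCAL_LOW   # vs a lightly-utilized GPU
```

So the API carries roughly a 5x to 20x per-call premium over a fully amortized local GPU; what that premium buys is the 3-minute setup and zero ongoing ops.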
AWS Rekognition doesn't offer a face-swap endpoint (face detection and verification only), so it's not comparable on this specific metric. For pure face-similarity matching, Rekognition is competitive but lacks the swap / blend capabilities.
Frequently asked questions
How do I add face swap to my app?
Install the Latentface SDK (pip install latentface or npm install @latentface/sdk), get an API key from the dashboard, and call client.swap(source, target). Most integrations are 3–5 lines of code. See the quickstart at /docs/quickstart for a full example.
What's the cheapest face-embedding API?
Latentface's pricing: $0.005/call for face match, $0.008/call for enhance, $0.01/call for swap, $0.015/call for blend. Free tier at 100 calls/day. Compared to Face++ ($0.018/call equivalent) and AWS Rekognition ($0.001/call for detection only, no swap), Latentface is mid-priced but includes operations the others don't offer.
Is there a Python SDK for face swap?
Yes. pip install latentface gives you a typed Python SDK with sync and async clients, type stubs for IDE autocompletion, and automatic retry on transient failures. See the Python SDK reference above. A TypeScript SDK is also available via npm install @latentface/sdk.
Is Latentface a Hugging Face alternative?
Latentface is complementary to Hugging Face, not a replacement. Our models are published as public HF Spaces, so you can evaluate them there. The Latentface API adds: sub-1s p95 latency (vs HF Spaces queue), per-call pricing (vs HF's pay-per-GPU-time), typed SDKs, and rate-limit infrastructure. For prototyping, HF Spaces is great; for production, the API is production-ready.
How much does face-swap API cost per call?
$0.01/call on the Starter tier, with the enhancement step included. Without enhancement: $0.008/call. Volume discounts apply on the Scale tier and above. See /pricing for the full table.
What's the latency on the free tier?
Same as the Starter tier (p50 620ms for swap). Free tier traffic goes through the same queue and GPU pool — the difference is the 100-calls-per-day quota and a watermark on swap/blend outputs. If you need no watermark or higher quota, Starter at $49/mo is the next step.
Can I run Latentface models on-device?
Not today. The current models are GPU-bound (RTX 3060+ equivalent for acceptable latency) and ship as ONNX weights too large for typical mobile / edge deployment. On-device support is on the 2026 Q4 roadmap, likely starting with a distilled match-only model for iOS / Android.
Do you store my uploaded images?
No. Uploaded images live in server memory for the duration of the request (typically under 1 second) and are discarded after the response is generated. We do not persist image data, and we do not use your uploaded images for model training. Full details at /privacy-policy.
Is there an open-source client?
Both SDKs are open source. Python SDK: github.com/latentface/latentface-python. TypeScript SDK: github.com/latentface/latentface-js. Both are Apache 2.0. Contributions welcome.
Ready to build?
Get your API key — free tier, no credit card →
Free tier includes 100 calls/day across all endpoints. Upgrade when you're ready for production volume. SDKs are open source; Hugging Face Spaces are public for evaluation.
Last reviewed: 2026-04-17. Latentface is a developer-first face-model API operated by OS Designers, Inc. (South Korea). See /docs/api for the full endpoint reference and /pricing for detailed pricing.