Face Embedding API: The 2026 Developer's Guide
Pay-per-call face swap, match, blend, and enhance API — Python + TypeScript SDKs, models on Hugging Face, sub-1s p95 latency.
The short answer: Latentface exposes face-model inference as a pay-per-call REST API with typed Python and TypeScript SDKs. Four endpoints cover the common operations: compare two faces for similarity (scored on 512-dim face embeddings), swap a face between two images, blend N faces into a new synthesized face, and apply GPEN enhancement to a face. Free tier starts at 100 calls/day with watermarked output; paid tiers start at $49/mo for 10K calls.
Installation:
pip install latentface
# or
npm install @latentface/sdk
Your first face swap in 4 lines of Python:
from latentface import Latentface
client = Latentface(api_key="your-key")
result = client.swap(source="source.jpg", target="target.jpg")
result.save("swapped.png")
Get your API key (free tier) →
What Latentface is (and what it's not)
Latentface is a face-model inference API shaped like Hugging Face — research-community flavored, public-by-default, priced per API call. You sign up, get a key, start calling. No sales process, no demo request, no annual contract. Pay for what you use; free tier is generous enough to build and test an integration before you commit.
Latentface is not enterprise-SaaS-shaped. There are no SLA contracts, dedicated GPU pools, or 4-hour support guarantees. If you need those, the rest of the face-API category (Face++, AWS Rekognition Face, Azure Face API) is the correct starting point — they'll charge more, onboard slower, and deliver the enterprise-tier assurances Latentface doesn't offer.
What Latentface does offer that those enterprise APIs don't:
- Face swap as a first-class endpoint. Most face APIs focus on detection, verification, and landmarking. Latentface is face-swap-first because it runs on the same pipeline that powers the Onlyface real-time face-swap product.
- Multi-face blend — N faces in, one synthesized face out. Unique to Latentface among commercial APIs as of Q2 2026.
- Public Hugging Face presence. Three of our core models are published as public Spaces; you can try them before signing up.
The four endpoints
/v1/match — face similarity
Compute how similar two faces are. Returns a cosine similarity score in [0, 1].
result = client.match(
    face_a="person_1.jpg",
    face_b="person_2.jpg",
)
print(result.similarity) # → 0.87
print(result.are_same_person) # → True (threshold 0.6)
Use cases: identity verification, deduplication, "who is this photo closest to" workflows.
Pricing: $0.005 per call. Free tier: 100 calls/day.
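The deduplication use case can be sketched as a greedy grouping pass. The scorer is injected as a callable so the logic is testable; in production it would be a thin wrapper around client.match (how you wire that wrapper is an assumption, not SDK-prescribed):

```python
def dedup_groups(photos, similarity, threshold=0.6):
    """Greedily group photos whose faces score above `threshold`.

    `similarity(a, b)` returns a cosine score in [0, 1], e.g. a wrapper
    around client.match(face_a=a, face_b=b).similarity. The 0.6 default
    mirrors the documented are_same_person threshold.
    """
    groups = []  # each group is a list of photos judged to be the same person
    for photo in photos:
        for group in groups:
            # Compare against the group's representative (its first photo)
            if similarity(group[0], photo) >= threshold:
                group.append(photo)
                break
        else:
            groups.append([photo])
    return groups
```

Each photo joins the first group whose representative it matches; everything else starts a new group.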
/v1/swap — face swap between two images
Take face A from a source image and apply it to face B's position in a target image. Returns a PNG.
result = client.swap(
    source="face_a.jpg",
    target="scene_with_face_b.jpg",
    enhance=True,  # apply GPEN enhancement — default True
)
result.save("swapped.png")
Use cases: dating app "what would our baby look like," e-commerce "try these glasses on your face," game-studio "NPC with player's face," social app "Snapchat-style face stories."
Pricing: $0.01 per call (swap + enhance). $0.008 per call without enhancement.
/v1/blend — multi-face blend (N→1)
Take 2 or more faces and synthesize a new face that blends all of them. Unlike swap, the output face is not present in any of the inputs — it's a new face computed from the inputs' geometry.
result = client.blend(
    faces=["parent_a.jpg", "parent_b.jpg"],
    weights=[0.5, 0.5],  # optional; defaults to equal weights
)
result.save("future_baby.png")
Use cases: "future baby" predictors, family mashups, friend-group blends, character design tools. Powers the faceblend.app product.
Pricing: $0.015 per call (blend + enhance).
/v1/enhance — GPEN face enhancement
Apply GPEN (GAN Prior Embedded Network) beauty enhancement to a face. Skin smoothing that preserves texture; not a blur.
result = client.enhance(image="portrait.jpg", preset="natural")
result.save("enhanced.png")
Presets: natural, studio, soft. Use cases: video-call beauty filters, selfie enhancement, portrait retouch.
Pricing: $0.008 per call.
Authentication
All requests carry an API key via the Authorization header:
curl -X POST https://api.latentface.net/v1/swap \
  -H "Authorization: Bearer $LATENTFACE_API_KEY" \
  -F "source=@face_a.jpg" \
  -F "target=@scene_with_face_b.jpg"
Keys are scoped per environment — create separate keys for development, staging, and production, so if a dev key leaks you rotate it without touching production.
Keys are created and rotated from the dashboard at latentface.net/dashboard/keys. Deleted keys are hard-revoked within 60 seconds; there's no "soft" revocation window.
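One way to keep environment-scoped keys straight is a naming convention on environment variables. A minimal sketch; the LATENTFACE_API_KEY_DEV / _STAGING / _PROD variable names are an assumed convention, not something the API requires:

```python
import os

def api_key_for(environment: str) -> str:
    """Return the Latentface key for one environment.

    Assumes one env var per environment, e.g. LATENTFACE_API_KEY_DEV,
    LATENTFACE_API_KEY_STAGING, LATENTFACE_API_KEY_PROD. The convention
    is yours to choose; the API only ever sees the key itself.
    """
    var = f"LATENTFACE_API_KEY_{environment.upper()}"
    key = os.environ.get(var)
    if not key:
        raise RuntimeError(f"{var} is not set; create a key in the dashboard")
    return key
```

With separate variables, rotating a leaked dev key never touches the production value.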
Rate limits
Free tier: 100 calls per 24 hours, with a 1 req/sec burst cap.
Paid tiers:
| Tier | Monthly quota | Burst (req/sec) | Soft cap behavior |
|---|---|---|---|
| Free | 100/day | 1 | Hard rate-limit with 429 |
| Starter ($49/mo) | 10,000/mo | 10 | 429 at quota; can enable overage |
| Scale ($199/mo) | 100,000/mo | 50 | Priority queue; 429 only at hard cap |
| Enterprise | Custom | Custom | Dedicated GPU pool; no queue contention |
Rate limits are per API key. Exceeding the burst returns HTTP 429 with a Retry-After header in seconds.
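A client should honor that Retry-After hint rather than hammer the endpoint. A sketch of a retry wrapper; it duck-types on a retry_after attribute (matching the SDK's RateLimitError described below) and takes an injectable sleep so the logic is testable:

```python
import time

def call_with_backoff(fn, *, max_retries=3, sleep=time.sleep):
    """Retry `fn()` when it raises an error carrying a `retry_after` hint.

    The API returns HTTP 429 with a Retry-After header in seconds; the
    Python SDK surfaces that as RateLimitError.retry_after. We check for
    the attribute rather than the type so this helper stays SDK-agnostic.
    """
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except Exception as exc:
            retry_after = getattr(exc, "retry_after", None)
            if retry_after is None or attempt == max_retries:
                raise  # not a rate-limit error, or out of retries
            sleep(retry_after)
```

Usage would be `call_with_backoff(lambda: client.swap(source="a.jpg", target="b.jpg"))`.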
Python SDK reference
The Python SDK is a thin wrapper over the REST API. Both synchronous and async versions are available.
# Sync
from latentface import Latentface

client = Latentface(api_key="key")
result = client.swap(source="a.jpg", target="b.jpg")

# Async
import asyncio

from latentface import AsyncLatentface

async def main():
    client = AsyncLatentface(api_key="key")
    result = await client.swap(source="a.jpg", target="b.jpg")

asyncio.run(main())
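The async client pairs naturally with a bounded fan-out when you have many images. A sketch: the cap of 10 matches the Starter tier's burst limit, and `do_swap` is a stand-in for whatever coroutine wraps AsyncLatentface.swap (a semaphore caps concurrency, not requests per second exactly, so treat it as a rough guard):

```python
import asyncio

async def swap_many(items, do_swap, max_in_flight=10):
    """Run many swap calls concurrently, at most `max_in_flight` at a time.

    Set `max_in_flight` from your tier's burst cap (10 req/s on Starter,
    50 on Scale). `do_swap(item)` is any coroutine returning a result.
    """
    sem = asyncio.Semaphore(max_in_flight)

    async def bounded(item):
        async with sem:
            return await do_swap(item)

    # gather preserves input order in its results
    return await asyncio.gather(*(bounded(i) for i in items))
```

Results come back in the same order as the inputs, which keeps downstream bookkeeping simple.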
Input flexibility
All endpoints accept three input types for images:
- File path (string): source="a.jpg" — the SDK reads the file.
- Bytes: source=image_bytes — pass raw bytes directly.
- URL: source="https://example.com/a.jpg" — the server fetches the image. Subject to a 10 MB size limit and a 5-second download timeout.
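For the bytes form, a pre-flight size check avoids a wasted round trip. A sketch: the 10 MB cap is documented here only for server-side URL fetches, so applying the same bound to local uploads is a conservative assumption, not a stated guarantee:

```python
MAX_IMAGE_BYTES = 10 * 1024 * 1024  # the 10 MB cap documented for URL inputs

def load_image_bytes(path: str) -> bytes:
    """Read an image for the bytes input form, rejecting oversized files early.

    Reusing the URL-fetch limit for direct uploads is an assumption made
    to fail fast client-side, not a documented upload limit.
    """
    with open(path, "rb") as f:
        data = f.read()
    if len(data) > MAX_IMAGE_BYTES:
        raise ValueError(f"{path} is {len(data)} bytes; max is {MAX_IMAGE_BYTES}")
    return data
```

The returned bytes can be passed directly, e.g. `client.swap(source=load_image_bytes("a.jpg"), target="b.jpg")`.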
Response object
Every endpoint returns a Result object with:
result.image # bytes — the output image
result.save(path) # write to disk
result.base64 # base64-encoded image string
result.latency_ms # server-side processing latency
result.request_id # for support inquiries
result.cost_usd # this specific call's cost
Error handling
The SDK raises typed exceptions:
from latentface import Latentface, LatentfaceError, NoFaceDetectedError, RateLimitError
client = Latentface(api_key="key")
try:
    result = client.swap(source="a.jpg", target="b.jpg")
except NoFaceDetectedError as e:
    print(f"No face in image: {e.image_name}")
except RateLimitError as e:
    print(f"Rate limited. Retry after: {e.retry_after}s")
except LatentfaceError as e:
    print(f"API error: {e.code} — {e.message}")
Errors include a request_id you can cite in support tickets.
TypeScript SDK reference
The TypeScript SDK mirrors the Python API. It's an ESM module with full type definitions.
import { Latentface } from '@latentface/sdk';
const client = new Latentface({ apiKey: process.env.LATENTFACE_API_KEY });
const result = await client.swap({
  source: './a.jpg', // path (Node) or File (browser)
  target: './b.jpg',
  enhance: true,
});
await result.save('./swapped.png');
Browser usage is supported but carries caveats: API keys should never ship in browser-bundled code. Use a server-side proxy (e.g., a Next.js API route) that holds the key and forwards authenticated requests from the browser. Example proxy pattern:
// app/api/latentface-swap/route.ts (Next.js 15 App Router)
import { Latentface } from '@latentface/sdk';
export async function POST(req: Request) {
  const form = await req.formData();
  const client = new Latentface({ apiKey: process.env.LATENTFACE_API_KEY });
  const result = await client.swap({
    source: form.get('source') as File,
    target: form.get('target') as File,
  });
  return new Response(result.image, {
    headers: { 'Content-Type': 'image/png' },
  });
}
Latency: what to expect
Measured end-to-end (request leaves client → response body arrives). Network-dependent, so numbers reflect a median North American client.
| Endpoint | Starter tier p50 | Scale tier p50 | p99 (any tier) |
|---|---|---|---|
| /v1/match | 180 ms | 140 ms | 450 ms |
| /v1/swap | 620 ms | 380 ms | 1100 ms |
| /v1/blend | 780 ms | 520 ms | 1400 ms |
| /v1/enhance | 350 ms | 250 ms | 680 ms |
For interactive UX (a web app where the user is waiting), the Scale tier's p50 is appropriate — users perceive sub-500ms as responsive. For batch pipelines (a background job processing thousands of photos), latency matters less than throughput, and the Starter tier's queue is sized appropriately.
Running the models on Hugging Face instead
All three core models (face-swap, face-match, face-blend) are published as public Hugging Face Spaces:
- latentface/face-swap
- latentface/face-match
- latentface/face-blend
You can evaluate model quality before signing up for the API by running a few examples on HF. For production traffic, the Latentface API is faster (dedicated GPU pool, no HF queue), cheaper (per-call vs HF's pay-for-GPU-time), and includes the SDK wrappers, error typing, and rate-limit infrastructure that running models on HF Spaces doesn't.
We publish to Hugging Face because we think developers should be able to see the models before buying. The API is the packaged, production-ready version of the same models.
Five example integrations
1. Dating app: "what would our baby look like"
import os

from latentface import Latentface

def future_baby(couple_photos):
    client = Latentface(api_key=os.environ["LATENTFACE_API_KEY"])
    return client.blend(faces=couple_photos, weights=[0.5, 0.5])
2. E-commerce: "try these glasses on your face"
Combine swap with product overlays:
def try_glasses(selfie, model_with_glasses):
    client = Latentface(api_key="...")
    # Swap customer's face onto model-with-glasses
    return client.swap(source=selfie, target=model_with_glasses)
3. Game studio: "NPC with player's face"
Batch processing — process a folder of NPC templates with the player's face:
from pathlib import Path

for npc_template in Path("npc_templates").glob("*.png"):
    result = client.swap(source=player_selfie, target=str(npc_template))
    result.save(f"npc_output/{npc_template.stem}_as_player.png")
4. Security: identity verification
result = client.match(face_a=stored_selfie, face_b=live_camera_frame)
verified = result.similarity > 0.75 # threshold per your security policy
5. Social app: "which celebrity do you look like"
Run match against your own celebrity-embedding database (not bundled; see our /docs/guides/celebrity-db for how to build one with Latentface):
my_embedding = client.embed(image=user_selfie).vector
similarities = [
    (celeb_name, cosine_similarity(my_embedding, celeb_emb))
    for celeb_name, celeb_emb in my_celebrity_db.items()
]
top5 = sorted(similarities, key=lambda x: x[1], reverse=True)[:5]
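The cosine_similarity helper in the snippet above isn't part of the SDK; a minimal pure-Python version that works on the embedding vectors (e.g. the 512-dim output of client.embed) looks like this:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
```

For large celebrity databases you would typically vectorize this with NumPy or an ANN index, but the math is the same.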
Benchmark: Latentface vs the alternatives
We benchmarked Latentface against two commercial alternatives (Face++, AWS Rekognition) and the open-source baseline (running InSwapper locally on an RTX 3090) on 1,000 diverse face-swap inputs.
| Stack | Median latency | Cost per 1K calls | Setup time |
|---|---|---|---|
| Latentface Starter | 620 ms | $10 | 3 min |
| Latentface Scale | 380 ms | $10 | 3 min |
| Face++ | 1200 ms | $18 | 45 min |
| AWS Rekognition (no swap endpoint) | — | — | — |
| InSwapper local (RTX 3090) | 210 ms | $0* | 3+ hours |
*InSwapper local cost is "free" in direct API terms but assumes you own and maintain the GPU. Amortized hardware + power + maintenance is typically $0.50–$2 per 1K calls depending on your specifics.
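In per-1K terms the comparison above works out as follows; a rough sketch using the table's $10/1K API figure and the $0.50–$2 amortized local range (the exact local figure depends on GPU utilization, power cost, and maintenance time, none of which the benchmark pins down):

```python
API_PER_1K = 10.0                    # swap at $0.01/call, from the table
LOCAL_LOW, LOCAL_HIGH = 0.50, 2.00   # amortized local range stated above

# How many times more the API costs per call once local hardware is amortized
ratio_best_case_local = API_PER_1K / LOCAL_HIGH   # vs a heavily-utilized GPU
ratio_worst_case_local = API_PER_1K / LOCAL_LOW   # vs a lightly-utilized GPU
```

So the API carries roughly a 5x to 20x per-call premium over a fully amortized local GPU; what that premium buys is the 3-minute setup and zero ongoing ops.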
AWS Rekognition doesn't offer a face-swap endpoint (face detection and verification only), so it's not comparable on this specific metric. For pure face-similarity matching, Rekognition is competitive but lacks the swap / blend capabilities.
Frequently asked questions
How do I add face swap to my app?
Install the Latentface SDK (pip install latentface or npm install @latentface/sdk), get an API key from the dashboard, and call client.swap(source, target). Most integrations are 3–5 lines of code. See the quickstart at /docs/quickstart for a full example.
What's the cheapest face-embedding API?
Latentface's pricing: $0.005/call for face match, $0.008/call for enhance, $0.01/call for swap, $0.015/call for blend. Free tier at 100 calls/day. Compared to Face++ ($0.018/call equivalent) and AWS Rekognition ($0.001/call for detection only, no swap), Latentface is mid-priced but includes operations the others don't offer.
Is there a Python SDK for face swap?
Yes. pip install latentface gives you a typed Python SDK with sync and async clients, type stubs for IDE autocompletion, and automatic retry on transient failures. See the Python SDK reference above. A TypeScript SDK is also available via npm install @latentface/sdk.
Is Latentface a Hugging Face alternative?
Latentface is complementary to Hugging Face, not a replacement. Our models are published as public HF Spaces, so you can evaluate them there. The Latentface API adds: sub-1s p95 latency (vs HF Spaces queue), per-call pricing (vs HF's pay-per-GPU-time), typed SDKs, and rate-limit infrastructure. For prototyping, HF Spaces is great; for production, the API is production-ready.
How much does face-swap API cost per call?
$0.01/call on the Starter tier, with the enhancement step included. Without enhancement: $0.008/call. Volume discounts apply on the Scale tier and above. See /pricing for the full table.
What's the latency on the free tier?
Same as the Starter tier (p50 620ms for swap). Free tier traffic goes through the same queue and GPU pool — the difference is the 100-calls-per-day quota and a watermark on swap/blend outputs. If you need no watermark or higher quota, Starter at $49/mo is the next step.
Can I run Latentface models on-device?
Not today. The current models are GPU-bound (RTX 3060+ equivalent for acceptable latency) and ship as ONNX weights too large for typical mobile / edge deployment. On-device support is on the 2026 Q4 roadmap, likely starting with a distilled match-only model for iOS / Android.
Do you store my uploaded images?
No. Uploaded images live in server memory for the duration of the request (typically under 1 second) and are discarded after the response is generated. We do not persist image data, and we do not use your uploaded images for model training. Full details at /privacy-policy.
Is there an open-source client?
Both SDKs are open source. Python SDK: github.com/latentface/latentface-python. TypeScript SDK: github.com/latentface/latentface-js. Both are Apache 2.0. Contributions welcome.
Ready to build?
Get your API key — free tier, no credit card →
Free tier includes 100 calls/day across all endpoints. Upgrade when you're ready for production volume. SDKs are open source; Hugging Face Spaces are public for evaluation.
Last reviewed: 2026-04-17. Latentface is a developer-first face-model API operated by OS Designers, Inc. (South Korea). See /docs/api for the full endpoint reference and /pricing for detailed pricing.