Speech to Text With Speakers

Available Actions

Each successful request consumes credits as outlined below.

transcribe_quick: 100 credits
transcribe_standard: 150 credits
transcribe_extended: 200 credits

Details

Turn any audio recording into clean, searchable text. Transcribe voice memos, meetings, interviews, podcasts, and webinars with speech recognition that handles accents and background noise. Get plain text for quick reference, SRT or WebVTT subtitles for video captioning, or rich JSON output with word-level timestamps and speaker identification. Choose from three tiers based on recording length (up to 15, 30, or 60 minutes), and optionally enable speaker diarization to label who said what, profanity filtering, and alternative transcripts.

Use Cases

Transcribe meeting recordings
Generate subtitles and captions for videos
Convert voice memos to searchable text
Transcribe podcast episodes
Create interview transcripts with speaker labels
Produce SRT or WebVTT subtitle files
Build searchable audio archives
Transcribe webinars and lectures
Analyze customer call recordings
Repurpose audio content as text

Connect Your Agent In 5 Min

Install Locally

STDIO connector for Claude Code, Codex, Cursor, Zed, and other clients that require STDIO or custom connections. This lightweight connector routes requests to https://api.agentpmt.com/mcp. All tool execution happens in the cloud, and the server cannot edit any files on your computer.

npm install -g @agentpmt/mcp-router
agentpmt-setup

curl -X POST "https://api.agentpmt.com/products/purchase" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ********" \
  -d '{
    "product_id": "69ba14e4bbfb26a6333b14d3",
    "parameters": {
      "action": "transcribe_quick"
    }
  }'

import requests

url = "https://api.agentpmt.com/products/purchase"

headers = {
    "Content-Type": "application/json",
    "Authorization": "Bearer ********"
}

data = {
    "product_id": "69ba14e4bbfb26a6333b14d3",
    "parameters": {
        "action": "transcribe_quick"
    }
}

response = requests.post(url, headers=headers, json=data)
print(response.status_code)
print(response.json())

const url = "https://api.agentpmt.com/products/purchase";

const headers = {
  "Content-Type": "application/json",
  "Authorization": "Bearer ********"
};

const data = {
  product_id: "69ba14e4bbfb26a6333b14d3",
  parameters: {
    "action": "transcribe_quick"
  }
};

fetch(url, {
  method: "POST",
  headers,
  body: JSON.stringify(data)
})
  .then(response => response.json())
  .then(data => console.log(data))
  .catch(error => console.error("Error:", error));

const axios = require('axios');

const url = "https://api.agentpmt.com/products/purchase";

const headers = {
  "Content-Type": "application/json",
  "Authorization": "Bearer ********"
};

const data = {
  product_id: "69ba14e4bbfb26a6333b14d3",
  parameters: {
    "action": "transcribe_quick"
  }
};

axios.post(url, data, { headers })
  .then(response => {
    console.log(response.status);
    console.log(response.data);
  })
  .catch(error => {
    console.error("Error:", error.message);
  });

The examples above use placeholder values. Log in to view your API and budget keys and to see code personalized with your bearer token.

This tool supports credit-based access for external agents using AgentAddress identities or standard crypto wallets. External agents should use the External Agent API to buy credits with x402 and invoke this tool.

1. Buy Credits

Purchase credits via x402 payment (500 credit minimum, 100 credits = $1).

# Request payment requirements (returns 402 + PAYMENT-REQUIRED header)
curl -i -s -X POST "https://www.agentpmt.com/api/external/credits/purchase" \
  -H "Content-Type: application/json" \
  -d '{ "wallet_address":"0xYOUR_WALLET", "credits": 500, "payment_method":"x402" }'

# Sign the EIP-3009 authorization, then retry with signature header
curl -s -X POST "https://www.agentpmt.com/api/external/credits/purchase" \
  -H "Content-Type: application/json" \
  -H "PAYMENT-SIGNATURE: <base64-json>" \
  -d '{ "wallet_address":"0xYOUR_WALLET", "credits": 500, "payment_method":"x402" }'
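The pricing rule above (500 credit minimum, 100 credits = $1) can be sketched as a small helper. This is an illustrative calculation only; the function name `plan_purchase` is hypothetical and not part of the AgentPMT API.

```python
CREDITS_PER_USD = 100        # 100 credits = $1, as stated above
MIN_PURCHASE_CREDITS = 500   # minimum purchase size, as stated above


def plan_purchase(credits_needed: int) -> tuple[int, float]:
    """Return (credits_to_buy, cost_in_usd) for a planned workload.

    Hypothetical helper: rounds the purchase up to the stated minimum.
    """
    if credits_needed <= 0:
        raise ValueError("credits_needed must be positive")
    credits_to_buy = max(credits_needed, MIN_PURCHASE_CREDITS)
    return credits_to_buy, credits_to_buy / CREDITS_PER_USD


# Two transcribe_quick calls (100 credits each) still require the minimum purchase.
print(plan_purchase(200))
# Three transcribe_extended calls (200 credits each) exceed the minimum.
print(plan_purchase(600))
```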

2. Create a Session Nonce (used when signing balance and invoke requests)

curl -s -X POST "https://www.agentpmt.com/api/external/auth/session" \
  -H "Content-Type: application/json" \
  -d '{ "wallet_address":"0xYOUR_WALLET" }'

3. Invoke This Tool

Sign the message with your wallet (EIP-191 personal-sign), then POST to the invoke endpoint.

# Sign this message (wallet MUST be lowercased):
# agentpmt-external
# wallet:0xyourwallet...
# session:<session_nonce>
# request:<request_id>
# action:invoke
# product:69ba14e4bbfb26a6333b14d3
# payload:<sha256(canonical_json(parameters))>

curl -s -X POST "https://www.agentpmt.com/api/external/tools/69ba14e4bbfb26a6333b14d3/invoke" \
  -H "Content-Type: application/json" \
  -d '{
    "wallet_address": "0xYOUR_WALLET",
    "session_nonce": "<session_nonce>",
    "request_id": "invoke-uuid",
    "signature": "0x<signature>",
    "parameters": {
      "action": "your_action",
      "your_param": "value"
    }
  }'
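The message layout in the comments above can be assembled programmatically before it is handed to a wallet for EIP-191 signing. The following is a minimal sketch under several assumptions that the page does not confirm: the fields are joined with newline characters, `canonical_json` means sorted keys with compact separators, and the SHA-256 digest is hex-encoded. The function names are hypothetical, and the signing step itself is left to your wallet library.

```python
import hashlib
import json

PRODUCT_ID = "69ba14e4bbfb26a6333b14d3"


def canonical_json(parameters: dict) -> str:
    # ASSUMPTION: "canonical" means sorted keys and no extra whitespace.
    return json.dumps(parameters, sort_keys=True, separators=(",", ":"))


def build_invoke_message(wallet: str, session_nonce: str,
                         request_id: str, parameters: dict) -> str:
    """Build the text to sign for an invoke request (hypothetical helper)."""
    # ASSUMPTION: the payload hash is a lowercase hex SHA-256 digest.
    payload_hash = hashlib.sha256(
        canonical_json(parameters).encode("utf-8")
    ).hexdigest()
    # ASSUMPTION: fields are separated by single newline characters.
    return "\n".join([
        "agentpmt-external",
        f"wallet:{wallet.lower()}",   # wallet must be lowercased
        f"session:{session_nonce}",
        f"request:{request_id}",
        "action:invoke",
        f"product:{PRODUCT_ID}",
        f"payload:{payload_hash}",
    ])


message = build_invoke_message(
    "0xYOUR_WALLET", "<session_nonce>", "invoke-uuid",
    {"action": "transcribe_quick", "file_id": "FILE_ID"},
)
print(message)
```

Sorting keys makes the hash independent of the order in which parameters were added to the dict, which is the usual reason for canonicalizing before hashing.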

Actions (3)

transcribe_quick: 100 credits, 8 params

Transcribe audio up to 15 minutes.

file_id (string)
File ID from a prior upload.

public_url (string)
HTTPS URL to a downloadable audio file.

language_code (string)
Optional BCP-47 language code such as en-US; defaults to en-US if omitted.

output_format (string)
Output format for the transcription result.
Values: text, srt, vtt, json

enable_diarization (boolean)
Enable speaker diarization when supported by the audio and model.

enable_word_timestamps (boolean)
Include word-level timing data in the output.

enable_profanity_filter (boolean)
Mask profanity in the returned transcript.

max_alternatives (integer)
Maximum number of alternative transcripts to return.
Range: 1 - 5

transcribe_standard: 150 credits, 8 params

Transcribe audio up to 30 minutes.

file_id (string)
File ID from a prior upload.

public_url (string)
HTTPS URL to a downloadable audio file.

language_code (string)
Optional BCP-47 language code such as en-US; defaults to en-US if omitted.

output_format (string)
Output format for the transcription result.
Values: text, srt, vtt, json

enable_diarization (boolean)
Enable speaker diarization when supported by the audio and model.

enable_word_timestamps (boolean)
Include word-level timing data in the output.

enable_profanity_filter (boolean)
Mask profanity in the returned transcript.

max_alternatives (integer)
Maximum number of alternative transcripts to return.
Range: 1 - 5

transcribe_extended: 200 credits, 8 params

Transcribe audio up to 60 minutes.

file_id (string)
File ID from a prior upload.

public_url (string)
HTTPS URL to a downloadable audio file.

language_code (string)
Optional BCP-47 language code such as en-US; defaults to en-US if omitted.

output_format (string)
Output format for the transcription result.
Values: text, srt, vtt, json

enable_diarization (boolean)
Enable speaker diarization when supported by the audio and model.

enable_word_timestamps (boolean)
Include word-level timing data in the output.

enable_profanity_filter (boolean)
Mask profanity in the returned transcript.

max_alternatives (integer)
Maximum number of alternative transcripts to return.
Range: 1 - 5

Usage Instructions

Usage guidance provided directly by the developer for this product.

Speech to Text

Name: Speech to Text With Speakers
Brand: Apoth3osis
SKU: 69ba14e4bbfb26a6333b14d3
Price: 1.00 USD
Availability: InStock

Transcribe audio with one tool; choose the action that matches the length of the recording.

Tool Call Format

{
  "action": "get_instructions"
}

{
  "action": "transcribe_quick",
  "file_id": "FILE_ID",
  "language_code": "en-US",
  "output_format": "text"
}

{
  "action": "transcribe_standard",
  "public_url": "https://example.com/meeting.m4a",
  "output_format": "vtt",
  "enable_word_timestamps": true,
  "enable_diarization": true
}

{
  "action": "transcribe_extended",
  "public_url": "https://example.com/interview.webm",
  "output_format": "json",
  "max_alternatives": 2
}

Actions

transcribe_quick: audio up to 15 minutes. Price: 100 credits.
transcribe_standard: audio up to 30 minutes. Price: 150 credits.
transcribe_extended: audio up to 60 minutes. Price: 200 credits.
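Picking the cheapest action that fits a recording follows directly from the three tiers listed above. A minimal sketch, assuming the "up to" limits are inclusive and that you already know the recording's duration; the function name `choose_action` is hypothetical.

```python
# (action, maximum minutes, price in credits), taken from the list above.
TIERS = [
    ("transcribe_quick", 15, 100),
    ("transcribe_standard", 30, 150),
    ("transcribe_extended", 60, 200),
]


def choose_action(duration_minutes: float) -> tuple[str, int]:
    """Return (action, credits) for the cheapest tier that fits the recording."""
    if duration_minutes <= 0:
        raise ValueError("duration must be positive")
    # TIERS is ordered from cheapest to most expensive, so the first fit wins.
    for action, max_minutes, credits in TIERS:
        if duration_minutes <= max_minutes:  # ASSUMPTION: limits are inclusive
            return action, credits
    raise ValueError("no tier covers recordings longer than 60 minutes")


print(choose_action(12))    # a short voice memo
print(choose_action(42.5))  # a longer interview
```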

Notes

Provide either file_id or public_url.
public_url must be an HTTPS URL and cannot point to private or internal network addresses.
If language_code is omitted, the tool defaults to en-US.
Supported output formats: text, srt, vtt, json.
Optional controls: enable_diarization, enable_word_timestamps, enable_profanity_filter, max_alternatives.
Subtitle responses include inline subtitle content and may also include stored file links during normal platform invocations.
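The notes above can be checked client-side before spending credits on a request that would be rejected. The sketch below is illustrative: `build_parameters` is a hypothetical helper, it reads "either file_id or public_url" as exactly one of the two (an interpretation, not something the page states), and it does not attempt the private or internal address check, which is left to the service.

```python
from urllib.parse import urlparse

OUTPUT_FORMATS = {"text", "srt", "vtt", "json"}
BOOLEAN_FLAGS = {
    "enable_diarization",
    "enable_word_timestamps",
    "enable_profanity_filter",
}


def build_parameters(action, *, file_id=None, public_url=None,
                     language_code=None, output_format=None,
                     max_alternatives=None, **flags):
    """Assemble a parameters dict, applying the rules from the notes above."""
    # ASSUMPTION: exactly one audio source must be supplied.
    if (file_id is None) == (public_url is None):
        raise ValueError("provide exactly one of file_id or public_url")
    params = {"action": action}
    if file_id is not None:
        params["file_id"] = file_id
    else:
        if urlparse(public_url).scheme != "https":
            raise ValueError("public_url must be an HTTPS URL")
        params["public_url"] = public_url
    if language_code is not None:
        params["language_code"] = language_code  # service defaults to en-US
    if output_format is not None:
        if output_format not in OUTPUT_FORMATS:
            raise ValueError(f"output_format must be one of {sorted(OUTPUT_FORMATS)}")
        params["output_format"] = output_format
    if max_alternatives is not None:
        if not 1 <= max_alternatives <= 5:
            raise ValueError("max_alternatives must be between 1 and 5")
        params["max_alternatives"] = max_alternatives
    for name, value in flags.items():
        if name not in BOOLEAN_FLAGS or not isinstance(value, bool):
            raise ValueError(f"unsupported flag or non-boolean value: {name}")
        params[name] = value
    return params


print(build_parameters(
    "transcribe_standard",
    public_url="https://example.com/meeting.m4a",
    output_format="vtt",
    enable_diarization=True,
))
```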

About The Developer

Apoth3osis

★15 stars

Joined Agent Payment: August 14, 2025

We build tools that enable AI agents to excel in the mathematical realm.

Our small team develops experimental and unique solutions in the AI arena, with a strong focus on modular computing for agentic applications and custom model deployment. We have handled projects for a variety of applications across many sectors, from algorithmic trading and financial analysis, to molecular simulations and predictions, to habitat and biodiversity monitoring and wildlife conservation.
