Word Recognition API
Word Recognition Assessment
- See the playground examples for the differences.
Recognition API Request (POST)
curl https://api.kidsmart.ai/v1/audio/recognition \
-H "x-api-key:$KIDSMART_API_KEY" \
-F "file=@$AUDIO_FILE" \
-F "user_token=$USER_ID" \
-F "reference_text=@$REFERENCE_TEXT" \
-F "model_id=$MODEL_ID" \
-H "Content-Type: multipart/form-data"
Name | Type | Description |
---|---|---|
x-api-key | header | Your API authentication key |
file | field | Audio file (WAV format, max 30 seconds duration) |
reference_text | field | Expected word or phrase for recognition |
user_token | field | Unique identifier for the speaker |
model_id | field | Model ID (from Kid Smart AI) |
webhook_url | field | (Optional) URL to receive results via webhook |
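The same request can be issued from Python. The following is a minimal sketch using only the standard library, not an official client. It assumes that reference_text is sent as a plain text field and that the endpoint accepts a standard multipart/form-data body; the helper names build_multipart and recognize are illustrative and not part of the Kid Smart AI API.

```python
import json
import urllib.request
import uuid

API_URL = "https://api.kidsmart.ai/v1/audio/recognition"


def build_multipart(fields, file_name, file_bytes):
    """Encode text fields plus one WAV file as multipart/form-data.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\n'
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f'{value}\r\n'.encode("utf-8")
        )
    # The audio goes in the "file" field, as in the curl example.
    parts.append(
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="file"; filename="{file_name}"\r\n'
        f'Content-Type: audio/wav\r\n\r\n'.encode("utf-8")
        + file_bytes
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def recognize(api_key, audio_path, reference_text, user_token, model_id,
              webhook_url=None):
    """POST one audio file to the recognition endpoint and return the parsed JSON."""
    fields = {
        "user_token": user_token,
        "reference_text": reference_text,  # assumption: plain text, not a file
        "model_id": model_id,
    }
    if webhook_url:
        fields["webhook_url"] = webhook_url  # optional parameter
    with open(audio_path, "rb") as f:
        body, content_type = build_multipart(fields, "audio.wav", f.read())
    request = urllib.request.Request(
        API_URL,
        data=body,
        headers={"x-api-key": api_key, "Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

The body is built by hand so that the boundary in the Content-Type header always matches the one used in the body; an HTTP client library that supports multipart uploads would handle this automatically.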
Recognition API Response Structure
The API returns a JSON response containing the recognition analysis:
{
"assessment_id": "b2e4df18-fdee-4b07-a687-3ef56abad050",
"reference_text": "plume",
"feedback": "feedback feature coming soon",
"audio_duration": 1.7066666666666668,
"model_id": "continuous_medium",
"user_id": "USER_ID",
"prediction": "plum",
"phoneme_details": [
{"phoneme": "P", "original": null, "timestamp": 1.18},
{"phoneme": "L", "original": null, "timestamp": 1.37},
{"phoneme": "AH", "original": null, "timestamp": 1.43},
{"phoneme": "M", "original": null, "timestamp": 1.63}
],
"correct": false,
"confidence": "High"
}
Field | Description |
---|---|
assessment_id | Unique identifier for this assessment |
reference_text | The expected word/phrase that was tested against |
prediction | The word/phrase that was recognized in the audio |
correct | Boolean indicating if the pronunciation was correct |
confidence | Confidence level of the recognition (High/Medium/Low) |
phoneme_details | Detailed breakdown of the recognized phonemes and their timestamps; returned only when the child uttered the word or phrase incorrectly |
audio_duration | Length of the audio file in seconds |
feedback | Additional feedback about the recognition (if any, coming soon) |
If correct is true (the child uttered the word or phrase correctly), phoneme_details is not returned.
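Because phoneme_details is absent for correct utterances, client code should not index it unconditionally. A minimal sketch of reading the response, using an abbreviated copy of the sample above; the function name summarize_result is illustrative:

```python
import json


def summarize_result(result):
    """Return a one-line summary of a recognition response dict."""
    if result["correct"]:
        # phoneme_details is omitted for correct utterances, so do not index it.
        return f'"{result["reference_text"]}" was read correctly'
    phonemes = result.get("phoneme_details") or []
    heard = " ".join(p["phoneme"] for p in phonemes)
    return (
        f'expected "{result["reference_text"]}", heard "{result["prediction"]}" '
        f'({heard}; confidence {result["confidence"]})'
    )


# Abbreviated copy of the sample response shown above.
sample = json.loads("""
{
  "reference_text": "plume",
  "prediction": "plum",
  "phoneme_details": [
    {"phoneme": "P", "original": null, "timestamp": 1.18},
    {"phoneme": "L", "original": null, "timestamp": 1.37},
    {"phoneme": "AH", "original": null, "timestamp": 1.43},
    {"phoneme": "M", "original": null, "timestamp": 1.63}
  ],
  "correct": false,
  "confidence": "High"
}
""")

print(summarize_result(sample))
# prints: expected "plume", heard "plum" (P L AH M; confidence High)
```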
New Feature: Webhooks
All audio endpoints now support webhooks for asynchronous result delivery. Add the optional webhook_url parameter to receive results via a POST callback instead of polling:
-F "webhook_url=https://your-domain.com/webhook-endpoint"
When a webhook URL is provided, the API response will include a webhook notification:
{
"id": "e09ecf55-36b5-4936-83f4-ff3439223ed4",
"webhook_notification": "Results will be sent to the provided webhook URL upon completion",
"url": "https://api.kidsmart.ai/v1/audio/recognition/result/e09ecf55-36b5-4936-83f4-ff3439223ed4/"
}
Webhook responses are typically delivered within 30 seconds of the initial request. See the webhooks documentation for more details.
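On the receiving side, the webhook URL must point at an endpoint that accepts POST requests. The following is a minimal sketch of such a receiver using Python's standard http.server module. The callback payload format is not specified in this section, so the sketch only assumes that the body is a JSON object; the names handle_payload, WebhookHandler, and serve are illustrative.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

received = []  # in-memory store, for illustration only


def handle_payload(raw_body):
    """Decode a webhook body and record it. Returns an HTTP status code."""
    try:
        payload = json.loads(raw_body)
    except (ValueError, UnicodeDecodeError):
        return 400  # body is not valid JSON
    if not isinstance(payload, dict):
        return 400  # assumption: the callback body is a JSON object
    received.append(payload)
    return 200


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read exactly the number of bytes announced by the sender.
        length = int(self.headers.get("Content-Length", 0))
        status = handle_payload(self.rfile.read(length))
        self.send_response(status)
        self.end_headers()


def serve(port=8000):
    """Block and serve webhook callbacks on the given port."""
    HTTPServer(("", port), WebhookHandler).serve_forever()
```

Calling serve() starts the listener. A production receiver would normally sit behind HTTPS and persist results rather than keep them in memory.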