Word Recognition API

Word Recognition Assessment

Refer to the playground examples to see the differences.

Recognition API Request (POST)

curl https://api.kidsmart.ai/v1/audio/recognition \
    -H "x-api-key:$KIDSMART_API_KEY" \
    -F "file=@$AUDIO_FILE" \
    -F "user_token=$USER_ID" \
    -F "reference_text=$REFERENCE_TEXT" \
    -F "model_id=$MODEL_ID" \
    -H "Content-Type: multipart/form-data"

Name	Type	Description
`x-api-key`	header	Your API authentication key
`file`	field	Audio file (WAV format, max 30 seconds duration)
`reference_text`	field	Expected word or phrase for recognition
`user_token`	field	Unique identifier for the speaker
`model_id`	field	Model ID (from Kid Smart AI)
`webhook_url`	field	(Optional) URL to receive results via webhook
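
Because the `file` field is limited to WAV audio of at most 30 seconds, it may be worth checking a recording locally before uploading it. The sketch below does this with Python's standard `wave` module. The names `wav_duration_seconds`, `is_uploadable`, and `MAX_SECONDS` are illustrative and not part of the API, and treating the 30-second limit as inclusive is an assumption.

```python
import wave

MAX_SECONDS = 30.0  # limit from the parameter table; inclusive bound is an assumption


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_uploadable(path):
    """True if the file parses as WAV and is no longer than MAX_SECONDS."""
    try:
        return wav_duration_seconds(path) <= MAX_SECONDS
    except (wave.Error, EOFError):
        # Not a readable WAV file (wrong format or truncated header).
        return False
```

A client could call `is_uploadable("sample.wav")` before sending the request and skip or trim files that fail the check.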

Recognition API Response Structure

The API returns a JSON response containing the recognition analysis:

{
  "assessment_id": "b2e4df18-fdee-4b07-a687-3ef56abad050",
  "reference_text": "plume",
  "feedback": "feedback feature coming soon",
  "audio_duration": 1.7066666666666668,
  "model_id": "continuous_medium",
  "user_id": "USER_ID",
  "prediction": "plum",
  "phoneme_details": [
    {"phoneme": "P", "original": null, "timestamp": 1.18},
    {"phoneme": "L", "original": null, "timestamp": 1.37},
    {"phoneme": "AH", "original": null, "timestamp": 1.43},
    {"phoneme": "M", "original": null, "timestamp": 1.63}
  ],
  "correct": false,
  "confidence": "High"
}

Field	Description
`assessment_id`	Unique identifier for this assessment
`reference_text`	The expected word/phrase that was tested against
`prediction`	The word/phrase that was recognized in the audio
`correct`	Boolean indicating if the pronunciation was correct
`confidence`	Confidence level of the recognition (High/Medium/Low)
`phoneme_details`	Breakdown of the recognized phonemes with their timestamps; returned only when the child uttered the word or phrase incorrectly
`audio_duration`	Length of the audio file in seconds
`feedback`	Additional feedback about the recognition (if any, coming soon)

If `correct` is `true` (the child uttered the word or phrase correctly), `phoneme_details` is not returned.
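
As an illustration of how a client might consume this response, the sketch below parses the JSON body and reads `phoneme_details` defensively, since that field is absent for correct attempts. The function name `summarize_recognition` and the summary wording are hypothetical; only the field names come from the response shown above.

```python
import json


def summarize_recognition(body):
    """Turn a recognition response (JSON text) into a one-line summary."""
    result = json.loads(body)
    if result["correct"]:
        return f"'{result['reference_text']}' was read correctly"
    # phoneme_details is only present for incorrect attempts, so use .get().
    phonemes = [p["phoneme"] for p in result.get("phoneme_details", [])]
    return (
        f"expected '{result['reference_text']}', heard '{result['prediction']}' "
        f"({result['confidence']} confidence); phonemes: {' '.join(phonemes)}"
    )
```

Using `.get("phoneme_details", [])` keeps the same code path working whether or not the field is included.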

New Feature: Webhooks

All audio endpoints now support webhooks for asynchronous result delivery. Add the optional `webhook_url` parameter to receive results via POST callback instead of polling:

-F "webhook_url=https://your-domain.com/webhook-endpoint"

When a webhook URL is provided, the API response will include a webhook notification:

{
  "id": "e09ecf55-36b5-4936-83f4-ff3439223ed4",
  "webhook_notification": "Results will be sent to the provided webhook URL upon completion",
  "url": "https://api.kidsmart.ai/v1/audio/recognition/result/e09ecf55-36b5-4936-83f4-ff3439223ed4/"
}

Webhook responses are typically delivered within 30 seconds of the initial request. See the webhooks documentation for more details.
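
A minimal receiver for the callback could look like the sketch below, built on Python's standard `http.server`. It assumes the webhook delivers a JSON POST body with the same fields as the synchronous response (`assessment_id`, `correct`, and so on); the source does not confirm the payload format, so check the webhooks documentation. The names `parse_webhook_body`, `WebhookHandler`, and `serve`, and the port, are hypothetical.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def parse_webhook_body(raw):
    """Decode a webhook POST body; returns None if it is not a JSON object."""
    try:
        payload = json.loads(raw)
    except (ValueError, UnicodeDecodeError):
        return None
    return payload if isinstance(payload, dict) else None


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = parse_webhook_body(self.rfile.read(length))
        if payload is None:
            self.send_response(400)
        else:
            # Assumption: the payload carries the same fields as the
            # synchronous response (assessment_id, correct, ...).
            print(payload.get("assessment_id"), payload.get("correct"))
            self.send_response(200)
        self.end_headers()


def serve(port=8000):
    """Run the receiver until interrupted (port is an arbitrary choice)."""
    HTTPServer(("", port), WebhookHandler).serve_forever()
```

Calling `serve()` starts the listener; the public URL that routes to it is what would be passed as `webhook_url`.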
