Fluency Assessment API

The Fluency Assessment API provides fast, accurate analysis of a child's reading fluency. The API processes an audio recording of a child reading a passage and returns detailed analysis including accuracy, speed, and specific error patterns.

Quick Start

curl https://api.kidsmart.ai/v2/audio/fluency \
-H "x-api-key: $KIDSMART_API_KEY" \
-F "file=@$AUDIO_FILE" \
-F "user_token=$USER_ID" \
-F "reference_text=@$REFERENCE_TEXT" \
-F "model_id=$MODEL_ID" \
-H "Content-Type: multipart/form-data"

V2 Features

  • Enhanced phoneme-level analysis
  • Improved accuracy in error detection
  • Detailed phoneme substitution patterns
  • Built-in webhook support

Input Parameters

Name            Type    Description
x-api-key       header  Your API authentication key
file            field   Audio file (wav), max 2 minutes
reference_text  field   Expected text in UTF-8 format
user_token      field   Unique identifier for the speaker
model_id        field   Model ID (from Kid Smart AI)
webhook_url     field   (Optional) URL for receiving results via webhook
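
For callers not using curl, the same request can be sketched in Python with only the standard library. This is a minimal sketch rather than an official client: the helper names (encode_multipart, build_fluency_request) and the part filenames are hypothetical, and sending reference_text as a file part is an assumption that mirrors the "@" upload in the Quick Start command.

```python
import io
import urllib.request
import uuid

API_URL = "https://api.kidsmart.ai/v2/audio/fluency"


def encode_multipart(fields, files):
    """Encode text fields and file parts as multipart/form-data.

    fields: dict of name -> str
    files:  dict of name -> (filename, content_bytes, content_type)
    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    buf = io.BytesIO()
    for name, value in fields.items():
        buf.write(f"--{boundary}\r\n".encode())
        buf.write(f'Content-Disposition: form-data; name="{name}"\r\n\r\n'.encode())
        buf.write(value.encode("utf-8") + b"\r\n")
    for name, (filename, content, ctype) in files.items():
        buf.write(f"--{boundary}\r\n".encode())
        buf.write(
            f'Content-Disposition: form-data; name="{name}"; '
            f'filename="{filename}"\r\n'.encode()
        )
        buf.write(f"Content-Type: {ctype}\r\n\r\n".encode())
        buf.write(content + b"\r\n")
    buf.write(f"--{boundary}--\r\n".encode())
    return buf.getvalue(), f"multipart/form-data; boundary={boundary}"


def build_fluency_request(api_key, audio_bytes, reference_text, user_token,
                          model_id, webhook_url=None):
    """Build (but do not send) a POST request for the V2 fluency endpoint."""
    fields = {"user_token": user_token, "model_id": model_id}
    if webhook_url:
        fields["webhook_url"] = webhook_url  # optional parameter
    files = {
        # Filenames are placeholders chosen for this sketch.
        "file": ("recording.wav", audio_bytes, "audio/wav"),
        # Assumption: uploaded as a file part, like `@$REFERENCE_TEXT` in curl.
        "reference_text": ("reference.txt", reference_text.encode("utf-8"), "text/plain"),
    }
    body, content_type = encode_multipart(fields, files)
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={"x-api-key": api_key, "Content-Type": content_type},
    )
```

The returned object can be sent with urllib.request.urlopen(request). Building the request separately from sending it keeps the encoding step easy to inspect and test without network access.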

Response Structure

The V2 API returns a detailed analysis in JSON format:

{
  "audio_duration": 58.97,
  "user_id": "USER_123",
  "language_code": "EN",
  "assessment_id": "391fe358-dcff-45e5-bbe7-7b318b70a5c9",
  "input_timestamp": "2025-02-19T20:29:48.173424+00:00",
  "summary": {
    "equal": 66,
    "replace": 35,
    "insert": 1,
    "delete": 44
  },
  "wpm": 69.84,
  "accuracy_score": 46,
  "transcription": "at lunch N EY T T,S friends talked about",
  "phoneme_summary": {
    "phoneme_replacements": {
      "M->N": 1,
      "AE->EY": 1,
      "total_mispronunciations": 2
    },
    "complete_substitutions": {
      "total_count": 0,
      "reference_word_substitution": []
    },
    "unknown_words": {}
  },
  "updated_at": "2025-02-19T20:31:48.173424+00:00"
}

Base Response Fields

Field            Description
audio_duration   The duration of the audio file in seconds
user_id          The unique identifier for the user
language_code    The language contained in the audio file being analyzed
assessment_id    A unique identifier for the assessment
input_timestamp  The UTC timestamp when the input was received
summary          Object containing word match statistics (equal, replace, insert, delete counts)
wpm              Words per minute reading speed
accuracy_score   Overall accuracy score of the reading (0-100 range)
transcription    Predicted words and phonemes
updated_at       The UTC timestamp of when the analysis was completed
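
The base fields can be read into a small typed structure before further processing. The following is a minimal sketch: FluencyResult, parse_fluency_response, reference_words, and match_rate are hypothetical names, and treating equal + replace + delete as the number of reference words is an assumption based on the usual meaning of these alignment operations, not something the API documents.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class FluencyResult:
    assessment_id: str
    audio_duration: float
    wpm: float
    accuracy_score: float
    equal: int
    replace: int
    insert: int
    delete: int

    @property
    def reference_words(self) -> int:
        # Assumption: every reference word is matched, replaced, or deleted;
        # inserted words are extras that are not in the reference text.
        return self.equal + self.replace + self.delete

    @property
    def match_rate(self) -> float:
        """Percentage of reference words marked "equal" (0.0 if there are none)."""
        total = self.reference_words
        return 100.0 * self.equal / total if total else 0.0


def parse_fluency_response(raw: str) -> FluencyResult:
    """Parse the JSON body of a V2 response into a FluencyResult."""
    data = json.loads(raw)
    summary = data["summary"]
    return FluencyResult(
        assessment_id=data["assessment_id"],
        audio_duration=float(data["audio_duration"]),
        wpm=float(data["wpm"]),
        accuracy_score=float(data["accuracy_score"]),
        equal=int(summary["equal"]),
        replace=int(summary["replace"]),
        insert=int(summary["insert"]),
        delete=int(summary["delete"]),
    )


# Values taken from the sample response in this document.
sample = json.dumps({
    "audio_duration": 58.97,
    "assessment_id": "391fe358-dcff-45e5-bbe7-7b318b70a5c9",
    "summary": {"equal": 66, "replace": 35, "insert": 1, "delete": 44},
    "wpm": 69.84,
    "accuracy_score": 46,
})
result = parse_fluency_response(sample)
print(result.wpm, result.accuracy_score, round(result.match_rate, 1))
```

For the sample values, match_rate comes to about 45.5, close to the reported accuracy_score of 46. That may be a coincidence, since the formula behind accuracy_score is not stated here, so treat match_rate as a cross-check only.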

Detailed Analysis Fields

The API also provides detailed analysis through specific fields:

1. Details Object

Contains an ordered list of reading events with timing:

{
  "details": [
    {
      "reference": ["at", "lunch"],
      "prediction": ["at", "lunch"],
      "start": 1,
      "end": 2.36,
      "type": "equal"
    },
    {
      "reference": ["matts"],
      "prediction": ["nates"],
      "type": "replace",
      "reference_phonemes": ["M AE T S"],
      "aligned_result": [
        {"reference": "M", "prediction": "N", "orig_index": 0, "type": "replace", "prediction_type": "phoneme"},
        {"reference": "AE", "prediction": "EY", "orig_index": 1, "type": "replace", "prediction_type": "phoneme"},
        {"reference": "T", "prediction": "T", "orig_index": 2, "type": "equal", "prediction_type": "phoneme"},
        {"reference": "S", "prediction": "S", "orig_index": 3, "type": "equal"}
      ],
      "start": 2.6,
      "end": 3.12
    }
  ]
}

Field               Description
reference           Array of expected words from the reference text
prediction          Array of words actually spoken (word utterance predicted by AI)
start               Start time of the speech segment in seconds
end                 End time of the speech segment in seconds
type                Type of match (equal, replace, insert, delete)
reference_phonemes  Our ARPAbet translation of the reference word phonemes
aligned_result      Detailed breakdown of pronunciation errors (provided only for mismatches of type "replace")
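
To turn the details list into something reviewable, such as a list of mispronounced phonemes with their timestamps, the events can be walked as follows. This is a sketch under the field layout shown in the example above; phoneme_errors is a hypothetical helper name, and the sample data is an abbreviated, illustrative excerpt.

```python
def phoneme_errors(details):
    """Collect one record per replaced phoneme across all "replace" events."""
    errors = []
    for event in details:
        # aligned_result is only provided for events of type "replace".
        if event.get("type") != "replace":
            continue
        for item in event.get("aligned_result", []):
            if item.get("type") == "replace":
                errors.append({
                    "word": " ".join(event.get("reference", [])),
                    "start": event.get("start"),
                    "end": event.get("end"),
                    "expected": item["reference"],
                    "heard": item["prediction"],
                })
    return errors


# Abbreviated excerpt in the shape of the details example.
details = [
    {"reference": ["at", "lunch"], "prediction": ["at", "lunch"],
     "start": 1, "end": 2.36, "type": "equal"},
    {"reference": ["matts"], "prediction": ["nates"], "type": "replace",
     "aligned_result": [
         {"reference": "M", "prediction": "N", "orig_index": 0, "type": "replace"},
         {"reference": "T", "prediction": "T", "orig_index": 2, "type": "equal"},
     ],
     "start": 2.6, "end": 3.12},
]

for err in phoneme_errors(details):
    print(f'{err["word"]} @ {err["start"]}s: expected {err["expected"]}, heard {err["heard"]}')
```

Keeping the start and end times with each record makes it possible to jump to the matching position in the recording when reviewing an error.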

2. Phoneme Summary Object

The phoneme_summary object provides a high-level summary of pronunciation patterns and errors detected during the assessment. It includes:

  • phoneme_replacements: A dictionary where each key is a phoneme substitution pattern (e.g., "M->N" means the phoneme "M" was replaced with "N"), and the value is the count of occurrences. It also includes a total_mispronunciations key for the total number of phoneme-level mispronunciations.
  • complete_substitutions: Information about words that were completely substituted (i.e., the spoken word was entirely different from the reference word). Contains:
    • total_count: The total number of complete word substitutions.
    • reference_word_substitution: A list of objects, each mapping a reference word to the number of times it was completely substituted.
  • unknown_words: A dictionary of words in the prediction that were not recognized in the reference text, with their counts.

"phoneme_summary": {
  "phoneme_replacements": {
    "M->N": 1,
    "AE->EY": 1,
    "total_mispronunciations": 2
  },
  "complete_substitutions": {
    "total_count": 0,
    "reference_word_substitution": []
  },
  "unknown_words": {}
}

Field                   Description
phoneme_replacements    Dictionary of phoneme substitution patterns and their counts, plus total_mispronunciations.
complete_substitutions  Object with total_count and reference_word_substitution list for complete word swaps.
unknown_words           Dictionary of unrecognized words (words that are not in our phoneme dictionary) and their counts. Currently, if a word is not in our phoneme dictionary and the reader attempted it, we count it as correct. Please contact us if you want words added to the phoneme dictionary.
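
Because phoneme_replacements mixes substitution patterns with the total_mispronunciations counter, the counter has to be filtered out before ranking patterns. A minimal sketch, with top_replacements as a hypothetical helper name:

```python
def top_replacements(phoneme_summary, limit=5):
    """Return up to `limit` (expected, heard, count) tuples, most frequent first."""
    replacements = phoneme_summary.get("phoneme_replacements", {})
    patterns = {
        key: count
        for key, count in replacements.items()
        if key != "total_mispronunciations"  # a total, not a substitution pattern
    }
    # Sort by count (descending), then alphabetically so ties are stable.
    ranked = sorted(patterns.items(), key=lambda kv: (-kv[1], kv[0]))
    result = []
    for key, count in ranked[:limit]:
        expected, _, heard = key.partition("->")
        result.append((expected, heard, count))
    return result


summary = {
    "phoneme_replacements": {"M->N": 1, "AE->EY": 1, "total_mispronunciations": 2},
    "complete_substitutions": {"total_count": 0, "reference_word_substitution": []},
    "unknown_words": {},
}
for expected, heard, count in top_replacements(summary):
    print(f"{expected} heard as {heard}: {count}")
```

Aggregating these tuples across several assessments for the same user_token is one way to spot recurring substitution patterns.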

V1 (Legacy)

⚠️ Note: V1 is maintained for backward compatibility. New implementations should use V2.

Endpoint

POST https://api.kidsmart.ai/v1/audio/fluency

[Previous V1 documentation content remains the same...]

Best Practices

  1. Audio Quality

    • Use a headset in noisy environments (like typical classrooms)
    • If you cannot make out the words in the recording, neither can Kid Smart AI
    • Ensure clear audio recording
    • Keep recordings under 2 minutes
  2. Assessment Guidelines

    • Allow children opportunities to self-correct while reading
  3. Error Handling

    • Implement webhook error handling
    • Use exponential backoff for retries
    • Monitor assessment_id for status
  4. Performance Optimization

    • Process results asynchronously
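
The "exponential backoff for retries" advice under Error Handling can be sketched as follows. This is a generic pattern rather than behavior specified by the API: backoff_delays and call_with_retries are hypothetical helpers, the default timings are arbitrary, and which errors are actually worth retrying is not documented here, so the broad except clause should be narrowed in real code.

```python
import random
import time


def backoff_delays(retries, base=1.0, cap=30.0, jitter=True):
    """Yield one wait time in seconds per retry: base, 2*base, 4*base, ..., capped at `cap`."""
    for attempt in range(retries):
        delay = min(cap, base * (2 ** attempt))
        # "Full jitter": wait a random time in [0, delay] so that many clients
        # retrying at once do not hit the API in lockstep.
        yield random.uniform(0, delay) if jitter else delay


def call_with_retries(operation, max_attempts=5, sleep=time.sleep, **backoff):
    """Run `operation()` until it succeeds or `max_attempts` is exhausted."""
    delays = backoff_delays(max_attempts - 1, **backoff)
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception:  # assumption: every failure is transient; narrow this
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            sleep(next(delays))
```

A caller would wrap the upload or status check in a zero-argument function and pass it as operation. Passing sleep as a parameter keeps the helper testable without real waiting.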

Common Issues & Solutions

  1. Poor Recognition

    • Check audio quality
    • Verify reference text format
    • Ensure proper microphone placement
  2. Slow Processing

    • Use webhook callbacks
    • Optimize audio file size
    • Check network connectivity
  3. Inconsistent Results

    • Standardize recording environment
    • Maintain consistent audio levels
    • Use recommended audio formats
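
Some of these checks (file format and recording length) can be run locally before uploading. The sketch below uses Python's standard wave module to confirm that a file is a readable WAV and within the 2-minute limit from the input parameters; check_recording and wav_duration_seconds are hypothetical helper names, and the check says nothing about audio quality.

```python
import wave

MAX_SECONDS = 120  # "max 2 minutes" per the input parameters


def wav_duration_seconds(source):
    """Return the duration of a WAV file given a path or a binary file object."""
    with wave.open(source, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def check_recording(source):
    """Return (ok, message) describing whether the recording looks uploadable."""
    try:
        duration = wav_duration_seconds(source)
    except (wave.Error, EOFError) as exc:
        return False, f"not a readable WAV file: {exc}"
    if duration == 0:
        return False, "recording is empty"
    if duration > MAX_SECONDS:
        return False, f"recording is {duration:.1f}s, over the {MAX_SECONDS}s limit"
    return True, f"{duration:.1f}s"
```

Calling check_recording("reading.wav") (a placeholder path) before the upload avoids sending files the API is likely to reject or process poorly.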

Migration Guide (V1 to V2)

  1. Update endpoint URL to V2
  2. Add webhook support if needed
  3. Update response parsing for new format
  4. Test with sample recordings
  5. Monitor error patterns in new format

For support or questions, contact support@kidsmart.ai