Fluency Assessment API

The Fluency Assessment API provides fast, accurate analysis of a child's reading fluency. The API processes an audio recording of a child reading a passage and returns detailed analysis including accuracy, speed, and specific error patterns.

Quick Start

curl https://api.kidsmart.ai/v2/audio/fluency \
-H "x-api-key: $KIDSMART_API_KEY" \
-F "file=@$AUDIO_FILE" \
-F "user_token=$USER_ID" \
-F "reference_text=@$REFERENCE_TEXT" \
-F "model_id=$MODEL_ID" \
-H "Content-Type: multipart/form-data"

V2 Features

  • Enhanced phoneme-level analysis
  • Improved accuracy in error detection
  • Detailed phoneme substitution patterns
  • Built-in webhook support

Input Parameters

| Name | Type | Description |
|---|---|---|
| x-api-key | header | Your API authentication key |
| file | field | Audio file (WAV), max 2 minutes |
| reference_text | field | Expected text in UTF-8 format |
| user_token | field | Unique identifier for the speaker |
| model_id | field | Model ID (from Kid Smart AI) |
| webhook_url | field | (Optional) URL for receiving results via webhook |
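
For callers who are not using curl, the same request can be assembled with the Python standard library. This is a minimal sketch, not an official client: the helper name `build_fluency_request` and the default `filename` are hypothetical, and it assumes `reference_text` may be sent as a plain text form field rather than as a file upload.

```python
import io
import urllib.request
import uuid

API_URL = "https://api.kidsmart.ai/v2/audio/fluency"  # endpoint from the Quick Start


def build_fluency_request(api_key, audio_bytes, reference_text, user_token,
                          model_id, webhook_url=None, filename="reading.wav"):
    """Build (but do not send) a multipart/form-data POST for the V2 endpoint."""
    boundary = uuid.uuid4().hex
    fields = {
        "user_token": user_token,
        "reference_text": reference_text,  # assumption: sent as a plain text field
        "model_id": model_id,
    }
    if webhook_url:  # optional parameter
        fields["webhook_url"] = webhook_url

    body = io.BytesIO()
    for name, value in fields.items():
        body.write(f"--{boundary}\r\n".encode())
        body.write(f'Content-Disposition: form-data; name="{name}"\r\n\r\n'.encode())
        body.write(value.encode("utf-8") + b"\r\n")

    # The audio recording goes in the "file" part.
    body.write(f"--{boundary}\r\n".encode())
    body.write(
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode()
    )
    body.write(b"Content-Type: audio/wav\r\n\r\n")
    body.write(audio_bytes + b"\r\n")
    body.write(f"--{boundary}--\r\n".encode())

    return urllib.request.Request(
        API_URL,
        data=body.getvalue(),
        method="POST",
        headers={
            "x-api-key": api_key,
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

To send the request, pass the returned object to `urllib.request.urlopen` and decode the JSON body of the response.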

Response Structure

The V2 API returns a detailed analysis in JSON format:

{
  "audio_duration": 58.97,
  "user_id": "USER_123",
  "language_code": "EN",
  "assessment_id": "391fe358-dcff-45e5-bbe7-7b318b70a5c9",
  "input_timestamp": "2025-02-19T20:29:48.173424+00:00",
  "summary": {
    "equal": 66,
    "replace": 35,
    "insert": 1,
    "delete": 44
  },
  "wpm": 69.84,
  "accuracyScore": 0.4552,
  "predicted_text": "at lunch N EY T T,S friends talked about"
}

Base Response Fields

| Field | Description |
|---|---|
| audio_duration | The duration of the audio file in seconds |
| user_id | The unique identifier for the user |
| language_code | The language contained in the audio file being analyzed |
| assessment_id | A unique identifier for the assessment |
| input_timestamp | The UTC timestamp when the input was received |
| summary | Object containing word match statistics (equal, replace, insert, delete counts) |
| wpm | Words per minute reading speed |
| accuracyScore | Overall accuracy score of the reading (0-1 range) |
| predicted_text | Predicted words and phonemes |
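
The base fields can be read with any JSON parser. The sketch below uses a hypothetical helper, `summarize`, to pull out the headline numbers from the sample response. It also computes the share of reference words read correctly, assuming that `insert` counts extra spoken words that are not in the reference text. In the sample above this ratio happens to match `accuracyScore`, but that is an observation from one response, not a documented formula.

```python
import json

# Sample response from this page, trimmed to the fields used here.
sample = json.loads("""
{
  "audio_duration": 58.97,
  "summary": {"equal": 66, "replace": 35, "insert": 1, "delete": 44},
  "wpm": 69.84,
  "accuracyScore": 0.4552
}
""")


def summarize(response):
    """Return headline numbers from a V2 fluency response."""
    counts = response["summary"]
    # Assumption: every reference word is either matched, replaced, or deleted.
    reference_words = counts["equal"] + counts["replace"] + counts["delete"]
    correct_ratio = counts["equal"] / reference_words if reference_words else 0.0
    return {
        "reference_words": reference_words,
        "correct_ratio": round(correct_ratio, 4),
        "wpm": response["wpm"],
        "accuracy_score": response["accuracyScore"],
    }


result = summarize(sample)
print(result)
```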

Detailed Analysis Fields

The API also provides detailed analysis through specific fields:

1. Details Object

Contains an ordered list of reading events with timing:

{
  "details": [
    {
      "reference": ["at", "lunch"],
      "prediction": ["at", "lunch"],
      "start": 1,
      "end": 2.36,
      "type": "equal"
    },
    {
      "reference": ["matts"],
      "prediction": ["nates"],
      "type": "replace",
      "phoneme_analysis": "N EY T T,S",
      "mispronounced": [
        {
          "AE": {
            "replace": ["EY"],
            "confidence": "low"
          }
        }
      ],
      "start": 2.6,
      "end": 3.12
    }
  ]
}

| Field | Description |
|---|---|
| reference | Array of expected words from the reference text |
| prediction | Array of words the AI predicts were actually spoken |
| start | Start time of the speech segment in seconds |
| end | End time of the speech segment in seconds |
| type | Type of match (equal, replace, insert, delete) |
| phoneme_analysis | Phonetic analysis of the utterance predicted by the AI (present only when type is "replace") |
| mispronounced | Detailed breakdown of pronunciation errors (present only when type is "replace") |

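
One way to work with the details list is to flatten the "replace" events into one record per phoneme substitution. The sketch below does this for the sample data shown above. The generator name `replace_events` and the record layout are hypothetical, and the code assumes each `mispronounced` entry maps an expected phoneme to an object with `replace` and `confidence` keys, as in the sample.

```python
# Sample "details" list from this page.
details = [
    {
        "reference": ["at", "lunch"],
        "prediction": ["at", "lunch"],
        "start": 1,
        "end": 2.36,
        "type": "equal",
    },
    {
        "reference": ["matts"],
        "prediction": ["nates"],
        "type": "replace",
        "phoneme_analysis": "N EY T T,S",
        "mispronounced": [{"AE": {"replace": ["EY"], "confidence": "low"}}],
        "start": 2.6,
        "end": 3.12,
    },
]


def replace_events(events):
    """Yield one flat record per phoneme substitution in 'replace' events."""
    for event in events:
        if event.get("type") != "replace":
            continue
        # "mispronounced" is only present on replace events, so default to [].
        for entry in event.get("mispronounced", []):
            for expected_phoneme, info in entry.items():
                for spoken_phoneme in info.get("replace", []):
                    yield {
                        "reference": " ".join(event["reference"]),
                        "prediction": " ".join(event["prediction"]),
                        "start": event.get("start"),
                        "end": event.get("end"),
                        "pattern": f"{expected_phoneme}->{spoken_phoneme}",
                        "confidence": info.get("confidence"),
                    }


records = list(replace_events(details))
for record in records:
    print(record)
```
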
2. Phoneme Summary Object

Summarizes pronunciation patterns across the reading. When a word is replaced, the error usually falls into one of two categories. A complete replacement means the uttered word is entirely different from the written word. A phoneme-level error means some of the word's phonemes were uttered correctly and others were not. Phoneme replacements are summarized in the "common_patterns" field. The // comments in the example below are annotations and are not part of the JSON response.

{
  "phonemeSummary": {
    "complete_replacements": { // full details for these words appear in the "details" section of the response
      "words": {
        "go": 1, // "go" was completely replaced once
        "now": 1 // "now" was completely replaced once
      },
      "total_count": 2 // 2 words were completely replaced
    },
    "confidence_levels": { // phoneme-level replacements: 2 distinct phonemes were replaced in other words
      "AE": { // AE was replaced twice, once with high confidence and once with low confidence
        "low": 1,
        "high": 1
      },
      "AH": { // AH was replaced once, with high confidence
        "high": 1
      }
    },
    "common_patterns": {
      "AE->EY": 2, // AE was replaced with EY twice
      "AH->V": 1 // AH was replaced with V once
    }
  }
}

| Field | Description |
|---|---|
| complete_replacements.words | Dictionary of completely replaced words and their counts |
| complete_replacements.total_count | Total number of complete word replacements |
| confidence_levels | Confidence levels for each phoneme analysis |
| common_patterns | Common phoneme replacement patterns observed |
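
To report the most frequent substitutions, the `common_patterns` keys can be split on "->" and ranked by count. This is a small sketch using the sample values above. The helper names `top_patterns` and `high_confidence_counts` are hypothetical, and the key format "EXPECTED->SPOKEN" is inferred from the sample rather than from a formal specification.

```python
# Sample "phonemeSummary" object from this page, without the annotations.
phoneme_summary = {
    "complete_replacements": {"words": {"go": 1, "now": 1}, "total_count": 2},
    "confidence_levels": {"AE": {"low": 1, "high": 1}, "AH": {"high": 1}},
    "common_patterns": {"AE->EY": 2, "AH->V": 1},
}


def top_patterns(summary, limit=5):
    """Return (expected, spoken, count) tuples, most frequent first."""
    patterns = summary.get("common_patterns", {})
    # Sort by descending count, then alphabetically for a stable order.
    ranked = sorted(patterns.items(), key=lambda item: (-item[1], item[0]))
    result = []
    for key, count in ranked[:limit]:
        expected, _, spoken = key.partition("->")
        result.append((expected, spoken, count))
    return result


def high_confidence_counts(summary):
    """Return how many high-confidence replacements each phoneme had."""
    levels = summary.get("confidence_levels", {})
    return {phoneme: counts.get("high", 0) for phoneme, counts in levels.items()}


print(top_patterns(phoneme_summary))
print(high_confidence_counts(phoneme_summary))
```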

V1 (Legacy)

⚠️ Note: V1 is maintained for backward compatibility. New implementations should use V2.

Endpoint

POST https://api.kidsmart.ai/v1/audio/fluency

[Previous V1 documentation content remains the same...]

Best Practices

  1. Audio Quality

    • Use a headset in noisy environments (like typical classrooms)
    • If you cannot make out the words in the recording, neither can Kid Smart AI
    • Ensure clear audio recording
    • Keep recordings under 2 minutes
  2. Assessment Guidelines

    • Allow children opportunities to self-correct while reading
  3. Error Handling

    • Implement webhook error handling
    • Use exponential backoff for retries
    • Monitor assessment_id for status
  4. Performance Optimization

    • Process results asynchronously
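
The exponential backoff recommended above can be sketched as a small wrapper around whatever function submits the request. This is a generic illustration, not part of the Kid Smart AI API: the function name, the default delays, the jitter, and the choice to retry only on `OSError` (which covers network errors raised by `urllib`) are all assumptions to adapt to your client.

```python
import random
import time


def retry_with_backoff(operation, max_attempts=5, base_delay=1.0,
                       max_delay=30.0, retryable=(OSError,), sleep=time.sleep):
    """Call operation(), retrying on retryable errors with growing delays."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except retryable:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Double the delay on each attempt, capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Add up to 50% random jitter so clients do not retry in lockstep.
            sleep(delay + random.uniform(0, delay / 2))
```

A caller would pass its own submit function, for example `retry_with_backoff(submit_recording)`, where `submit_recording` is a hypothetical function that sends one request and returns the parsed response.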

Common Issues & Solutions

  1. Poor Recognition

    • Check audio quality
    • Verify reference text format
    • Ensure proper microphone placement
  2. Slow Processing

    • Use webhook callbacks
    • Optimize audio file size
    • Check network connectivity
  3. Inconsistent Results

    • Standardize recording environment
    • Maintain consistent audio levels
    • Use recommended audio formats
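
Several of the issues above can be caught before upload by checking that the recording is a readable WAV file within the 2-minute limit. The sketch below uses Python's standard `wave` module. The helper names are hypothetical, and the check assumes uncompressed PCM WAV input, which is what `wave` can read.

```python
import wave

MAX_SECONDS = 120  # the documented limit of 2 minutes


def wav_duration_seconds(path_or_file):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path_or_file, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def check_recording(path_or_file):
    """Return (is_within_limit, duration_in_seconds) for a WAV recording.

    Raises wave.Error if the input is not a WAV file that `wave` can read.
    """
    duration = wav_duration_seconds(path_or_file)
    return duration <= MAX_SECONDS, duration
```

Both helpers accept either a file path or an open binary file object, so the same check works for files on disk and for uploads held in memory.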

Migration Guide (V1 to V2)

  1. Update the endpoint URL from /v1/audio/fluency to /v2/audio/fluency
  2. Add webhook support if needed
  3. Update response parsing for new format
  4. Test with sample recordings
  5. Monitor error patterns in new format

For support or questions, contact support@kidsmart.ai