Fluency Assessment API
The Fluency Assessment API provides fast, accurate analysis of a child's reading fluency. The API processes an audio recording of a child reading a passage and returns detailed analysis including accuracy, speed, and specific error patterns.
Quick Start
Latest Version (V2 - Recommended)
curl https://api.kidsmart.ai/v2/audio/fluency \
  -H "x-api-key: $KIDSMART_API_KEY" \
  -F "file=@$AUDIO_FILE" \
  -F "user_token=$USER_ID" \
  -F "reference_text=@$REFERENCE_TEXT" \
  -F "model_id=$MODEL_ID" \
  -H "Content-Type: multipart/form-data"
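For callers working in Python rather than curl, the same multipart request could be assembled roughly as follows. This is a minimal standard-library sketch, not an official client: the helper names `encode_multipart` and `build_fluency_request` and the part file names `recording.wav` and `reference.txt` are illustrative assumptions. It mirrors the curl command by sending `reference_text` as a file part; whether the API also accepts it as a plain text field is not stated here.

```python
import uuid
import urllib.request

API_URL = "https://api.kidsmart.ai/v2/audio/fluency"


def encode_multipart(fields, files):
    """Encode text fields and file parts as a multipart/form-data body.

    fields: dict of name -> str
    files:  dict of name -> (filename, data_bytes, content_type)
    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(f"--{boundary}".encode())
        parts.append(f'Content-Disposition: form-data; name="{name}"'.encode())
        parts.append(b"")
        parts.append(value.encode("utf-8"))
    for name, (filename, data, content_type) in files.items():
        parts.append(f"--{boundary}".encode())
        parts.append(
            f'Content-Disposition: form-data; name="{name}"; '
            f'filename="{filename}"'.encode()
        )
        parts.append(f"Content-Type: {content_type}".encode())
        parts.append(b"")
        parts.append(data)
    parts.append(f"--{boundary}--".encode())
    parts.append(b"")
    return b"\r\n".join(parts), f"multipart/form-data; boundary={boundary}"


def build_fluency_request(api_key, audio_bytes, reference_text,
                          user_token, model_id, webhook_url=None):
    """Build (but do not send) a POST request for the V2 fluency endpoint."""
    fields = {"user_token": user_token, "model_id": model_id}
    if webhook_url:
        fields["webhook_url"] = webhook_url  # optional parameter
    files = {
        # File names below are placeholders chosen for this sketch.
        "file": ("recording.wav", audio_bytes, "audio/wav"),
        "reference_text": ("reference.txt", reference_text.encode("utf-8"),
                           "text/plain; charset=utf-8"),
    }
    body, content_type = encode_multipart(fields, files)
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={"x-api-key": api_key, "Content-Type": content_type},
    )
```

The returned request can be sent with `urllib.request.urlopen(request)` and the response body decoded with `json.load`. Building the request separately from sending it keeps the encoding step easy to test without network access.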
V2 Features
- Enhanced phoneme-level analysis
- Improved accuracy in error detection
- Detailed phoneme substitution patterns
- Built-in webhook support
Input Parameters
Name | Type | Description |
---|---|---|
x-api-key | header | Your API authentication key |
file | field | Audio file (WAV), at most 2 minutes long |
reference_text | field | Expected passage text, UTF-8 encoded |
user_token | field | Unique identifier for the speaker |
model_id | field | Model ID provided by Kid Smart AI |
webhook_url | field | (Optional) URL for receiving results via webhook |
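Because the `file` parameter is limited to WAV audio of at most 2 minutes, it can be useful to check a recording locally before uploading it. The sketch below uses Python's standard `wave` module; the function names and the choice to raise `ValueError` are assumptions made for illustration, and how the API itself responds to an over-long file is not described here.

```python
import io
import wave

MAX_SECONDS = 120.0  # the documented 2-minute limit


def wav_duration_seconds(wav_bytes):
    """Return the duration in seconds of a WAV recording held in memory."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def check_recording(wav_bytes):
    """Return the duration, or raise ValueError if it exceeds the limit.

    wave.Error is raised by the wave module if the data is not valid WAV.
    """
    duration = wav_duration_seconds(wav_bytes)
    if duration > MAX_SECONDS:
        raise ValueError(
            f"recording is {duration:.1f}s long; the limit is {MAX_SECONDS:.0f}s"
        )
    return duration
```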
Response Structure
The V2 API returns a detailed analysis in JSON format:
{
  "audio_duration": 58.97,
  "user_id": "USER_123",
  "language_code": "EN",
  "assessment_id": "391fe358-dcff-45e5-bbe7-7b318b70a5c9",
  "input_timestamp": "2025-02-19T20:29:48.173424+00:00",
  "summary": {
    "equal": 66,
    "replace": 35,
    "insert": 1,
    "delete": 44
  },
  "wpm": 69.84,
  "accuracy_score": 46,
  "transcription": "at lunch N EY T T,S friends talked about",
  "phoneme_summary": {
    "phoneme_replacements": {
      "M->N": 1,
      "AE->EY": 1,
      "total_mispronunciations": 2
    },
    "complete_substitutions": {
      "total_count": 0,
      "reference_word_substitution": []
    },
    "unknown_words": {}
  },
  "updated_at": "2025-02-19T20:31:48.173424+00:00"
}
Base Response Fields
Field | Description |
---|---|
audio_duration | The duration of the audio file in seconds |
user_id | The unique identifier for the user |
language_code | The language contained in the audio file being analyzed |
assessment_id | A unique identifier for the assessment |
input_timestamp | The UTC timestamp when the input was received |
summary | Object containing word match statistics (equal, replace, insert, delete counts) |
wpm | Words per minute reading speed |
accuracy_score | Overall accuracy score of the reading (0-100 range) |
transcription | Predicted words and phonemes |
updated_at | The UTC timestamp of when the analysis was completed |
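The counts in `summary` can be turned into a simple word-level match rate. The sketch below assumes the number of reference words equals `equal + replace + delete`, with inserted words treated as extras. The way the API derives `accuracy_score` is not stated in this document, so this is only one plausible reading, although it does agree with the sample response.

```python
def word_match_rate(summary):
    """Percentage of reference words that were read correctly.

    Assumption: reference words = equal + replace + delete.
    Inserted words are extra speech and are not counted.
    """
    reference_words = summary["equal"] + summary["replace"] + summary["delete"]
    if reference_words == 0:
        return 0.0
    return 100.0 * summary["equal"] / reference_words


sample_summary = {"equal": 66, "replace": 35, "insert": 1, "delete": 44}
print(round(word_match_rate(sample_summary)))  # 46, matching accuracy_score in the sample
```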
Detailed Analysis Fields
The API also provides detailed analysis through specific fields:
1. Details Object
Contains an ordered list of reading events with timing:
{
  "details": [
    {
      "reference": ["at", "lunch"],
      "prediction": ["at", "lunch"],
      "start": 1,
      "end": 2.36,
      "type": "equal"
    },
    {
      "reference": ["matts"],
      "prediction": ["nates"],
      "type": "replace",
      "reference_phonemes": ["M AE T S"],
      "aligned_result": [
        {"reference": "M", "prediction": "N", "orig_index": 0, "type": "replace", "prediction_type": "phoneme"},
        {"reference": "AE", "prediction": "EY", "orig_index": 1, "type": "replace", "prediction_type": "phoneme"},
        {"reference": "T", "prediction": "T", "orig_index": 2, "type": "equal", "prediction_type": "phoneme"},
        {"reference": "S", "prediction": "S", "orig_index": 3, "type": "equal"}
      ],
      "start": 2.6,
      "end": 3.12
    }
  ]
}
Field | Description |
---|---|
reference | Array of expected words from the reference text |
prediction | Array of words the model recognized in the recording |
start | Start time of the speech segment in seconds |
end | End time of the speech segment in seconds |
type | Type of match (equal, replace, insert, delete) |
reference_phonemes | ARPAbet transcription of each reference word |
aligned_result | Phoneme-by-phoneme breakdown of pronunciation errors (provided only for events of type "replace") |
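One way to use the `details` list is to walk the events and pull out every phoneme that was replaced. The function below is an illustrative sketch with a hypothetical name, `phoneme_errors`. It relies only on the fields shown in the table above and uses `.get` so that events without `aligned_result` are skipped rather than causing an error.

```python
def phoneme_errors(details):
    """Collect (reference_word, expected_phoneme, spoken_phoneme) tuples
    for every phoneme marked as replaced inside a "replace" event."""
    errors = []
    for event in details:
        if event.get("type") != "replace":
            continue
        word = " ".join(event.get("reference", []))
        for item in event.get("aligned_result", []):
            if item.get("type") == "replace":
                errors.append((word, item["reference"], item["prediction"]))
    return errors


# Abbreviated events in the shape documented above.
sample_details = [
    {"reference": ["at", "lunch"], "prediction": ["at", "lunch"], "type": "equal"},
    {
        "reference": ["matts"],
        "prediction": ["nates"],
        "type": "replace",
        "aligned_result": [
            {"reference": "M", "prediction": "N", "orig_index": 0, "type": "replace"},
            {"reference": "T", "prediction": "T", "orig_index": 2, "type": "equal"},
        ],
    },
]
print(phoneme_errors(sample_details))  # [('matts', 'M', 'N')]
```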
2. Phoneme Summary Object
The phoneme_summary object provides a high-level summary of pronunciation patterns and errors detected during the assessment. It includes:
- phoneme_replacements: A dictionary where each key is a phoneme substitution pattern (e.g., "M->N" means the phoneme "M" was replaced with "N"), and the value is the count of occurrences. It also includes a total_mispronunciations key for the total number of phoneme-level mispronunciations.
- complete_substitutions: Information about words that were completely substituted (i.e., the spoken word was entirely different from the reference word). Contains:
  - total_count: The total number of complete word substitutions.
  - reference_word_substitution: A list of objects, each mapping a reference word to the number of times it was completely substituted.
- unknown_words: A dictionary of words in the prediction that were not recognized in the reference text, with their counts.
"phoneme_summary": {
  "phoneme_replacements": {
    "M->N": 1,
    "AE->EY": 1,
    "total_mispronunciations": 2
  },
  "complete_substitutions": {
    "total_count": 0,
    "reference_word_substitution": []
  },
  "unknown_words": {}
}
Field | Description |
---|---|
phoneme_replacements | Dictionary of phoneme substitution patterns and their counts, plus total_mispronunciations. |
complete_substitutions | Object with total_count and reference_word_substitution list for complete word swaps. |
unknown_words | Dictionary of unrecognized words (words that are not in our phoneme dictionary) and their counts. If a word is not in our phoneme dictionary, it is currently counted as correct whenever the reader attempted it. Please contact us if you want to add words to our phoneme dictionary. |
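Since `phoneme_replacements` mixes substitution patterns with the `total_mispronunciations` counter, that key needs to be filtered out before ranking patterns. A small sketch, with the hypothetical helper name `top_substitutions`, might look like this:

```python
def top_substitutions(phoneme_summary, limit=5):
    """Return the most frequent phoneme substitution patterns as
    (pattern, count) pairs, highest count first, ties sorted by pattern."""
    replacements = phoneme_summary.get("phoneme_replacements", {})
    patterns = {
        pattern: count
        for pattern, count in replacements.items()
        if pattern != "total_mispronunciations"  # a total, not a pattern
    }
    ranked = sorted(patterns.items(), key=lambda pair: (-pair[1], pair[0]))
    return ranked[:limit]


sample_phoneme_summary = {
    "phoneme_replacements": {"M->N": 1, "AE->EY": 1, "total_mispronunciations": 2},
    "complete_substitutions": {"total_count": 0, "reference_word_substitution": []},
    "unknown_words": {},
}
print(top_substitutions(sample_phoneme_summary))  # [('AE->EY', 1), ('M->N', 1)]
```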
V1 (Legacy)
⚠️ Note: V1 is maintained for backward compatibility. New implementations should use V2.
Endpoint
POST https://api.kidsmart.ai/v1/audio/fluency
[Previous V1 documentation content remains the same...]
Best Practices
- Audio Quality
  - Use a headset in noisy environments (like typical classrooms)
  - If you cannot make out the words in the recording, neither can Kid Smart AI
  - Ensure clear audio recording
  - Keep recordings under 2 minutes
- Assessment Guidelines
  - Allow children opportunities to self-correct while reading
- Error Handling
  - Implement webhook error handling
  - Use exponential backoff for retries
  - Monitor assessment_id for status
- Performance Optimization
  - Process results asynchronously
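The exponential backoff recommended under Error Handling can be sketched as a small generic helper. The function name, the default delays, and the jitter policy below are assumptions for illustration; which errors from the API are actually safe to retry is not specified here, so the caller chooses them through `retry_on`.

```python
import random
import time


def retry_with_backoff(operation, max_attempts=5, base_delay=1.0,
                       max_delay=30.0, retry_on=(Exception,),
                       sleep=time.sleep):
    """Call operation() until it succeeds or max_attempts is reached.

    The wait doubles after each failure (base_delay, 2x, 4x, ...) up to
    max_delay, plus random jitter of up to half the delay so that many
    clients do not retry at the same moment.
    """
    for attempt in range(max_attempts):
        try:
            return operation()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay + random.uniform(0, delay / 2))
```

A call such as `retry_with_backoff(send_request, retry_on=(OSError,))`, where `send_request` is the caller's own function, would retry only on network-level errors. The `sleep` parameter exists so the helper can be tested without real waiting.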
Common Issues & Solutions
- Poor Recognition
  - Check audio quality
  - Verify reference text format
  - Ensure proper microphone placement
- Slow Processing
  - Use webhook callbacks
  - Optimize audio file size
  - Check network connectivity
- Inconsistent Results
  - Standardize recording environment
  - Maintain consistent audio levels
  - Use recommended audio formats
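When webhook callbacks are used, the receiving service should validate each incoming body before acting on it. The webhook payload format is not documented in this section, so the sketch below assumes it matches the JSON response described earlier; the `REQUIRED_FIELDS` tuple and the function name are illustrative choices, not part of the API.

```python
import json

# Assumption: the webhook body has the same shape as the V2 JSON response.
REQUIRED_FIELDS = ("assessment_id", "accuracy_score", "wpm", "summary")


def parse_webhook_body(raw_body):
    """Decode a webhook body (bytes or str) and check the expected fields.

    Raises ValueError with a short message if the body is not a JSON
    object or is missing one of REQUIRED_FIELDS.
    """
    try:
        payload = json.loads(raw_body)
    except ValueError as exc:  # covers invalid JSON and bad byte encodings
        raise ValueError("webhook body is not valid JSON") from exc
    if not isinstance(payload, dict):
        raise ValueError("webhook body is not a JSON object")
    missing = [name for name in REQUIRED_FIELDS if name not in payload]
    if missing:
        raise ValueError("webhook body is missing: " + ", ".join(missing))
    return payload
```

A handler built on this can respond with an error status when `ValueError` is raised and process the payload asynchronously otherwise.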
Migration Guide (V1 to V2)
- Update endpoint URL to V2
- Add webhook support if needed
- Update response parsing for new format
- Test with sample recordings
- Monitor error patterns in new format
For support or questions, contact support@kidsmart.ai