Fluency Assessment API
The Fluency Assessment API provides fast, accurate analysis of a child's reading fluency. The API processes an audio recording of a child reading a passage and returns detailed analysis including accuracy, speed, and specific error patterns.
Quick Start
Latest Version (V2 - Recommended)
curl https://api.kidsmart.ai/v2/audio/fluency \
-H "x-api-key: $KIDSMART_API_KEY" \
-F "file=@$AUDIO_FILE" \
-F "user_token=$USER_ID" \
-F "reference_text=@$REFERENCE_TEXT" \
-F "model_id=$MODEL_ID" \
-H "Content-Type: multipart/form-data"
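For callers working in Python rather than the shell, the same request can be sketched with the standard library alone. This is a minimal sketch that mirrors the curl command above. The helper names `build_multipart` and `build_fluency_request` and the part filenames are hypothetical, and the block only builds the request; it does not send it or assume anything about the server's behaviour.

```python
import uuid
import urllib.request

API_URL = "https://api.kidsmart.ai/v2/audio/fluency"  # endpoint from the curl example


def build_multipart(fields, files):
    """Encode text fields and file parts as multipart/form-data.

    fields: dict of name -> str
    files:  dict of name -> (filename, bytes, content_type)
    Returns (body_bytes, content_type_header).
    """
    boundary = uuid.uuid4().hex
    chunks = []
    for name, value in fields.items():
        chunks.append(f"--{boundary}\r\n".encode())
        chunks.append(f'Content-Disposition: form-data; name="{name}"\r\n\r\n'.encode())
        chunks.append(value.encode("utf-8") + b"\r\n")
    for name, (filename, data, ctype) in files.items():
        chunks.append(f"--{boundary}\r\n".encode())
        chunks.append(
            f'Content-Disposition: form-data; name="{name}"; filename="{filename}"\r\n'.encode()
        )
        chunks.append(f"Content-Type: {ctype}\r\n\r\n".encode())
        chunks.append(data + b"\r\n")
    chunks.append(f"--{boundary}--\r\n".encode())
    return b"".join(chunks), f"multipart/form-data; boundary={boundary}"


def build_fluency_request(api_key, audio_bytes, reference_text, user_token, model_id):
    """Build (but do not send) a POST request equivalent to the curl call."""
    body, content_type = build_multipart(
        fields={"user_token": user_token, "model_id": model_id},
        files={
            "file": ("recording.wav", audio_bytes, "audio/wav"),
            # The curl example uploads the reference text as a file (the @ prefix),
            # so it is sent as a file part here too.
            "reference_text": ("reference.txt", reference_text.encode("utf-8"), "text/plain"),
        },
    )
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={"x-api-key": api_key, "Content-Type": content_type},
    )
```

To send the request, pass the returned object to `urllib.request.urlopen` and decode the JSON body of the response.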
V2 Features
- Enhanced phoneme-level analysis
- Improved accuracy in error detection
- Detailed phoneme substitution patterns
- Built-in webhook support
Input Parameters
Name | Type | Description |
---|---|---|
x-api-key | header | Your API authentication key |
file | field | Audio file (wav), max 2 minutes |
reference_text | field | Expected text in UTF-8 format |
user_token | field | Unique identifier for the speaker |
model_id | field | Model ID (from Kid Smart AI) |
webhook_url | field | (Optional) URL for receiving results via webhook |
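Because the table limits recordings to WAV files of at most 2 minutes, it can be useful to check a file locally before uploading it. The following is a minimal sketch using Python's standard `wave` module; the function names and the choice to raise `ValueError` are illustrative assumptions, not part of the API.

```python
import wave

MAX_SECONDS = 120  # the parameter table limits recordings to 2 minutes


def wav_duration_seconds(source):
    """Return the duration of a WAV file (path or binary file object) in seconds."""
    with wave.open(source, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def check_recording(source):
    """Return the duration, or raise ValueError if it exceeds the limit."""
    duration = wav_duration_seconds(source)
    if duration > MAX_SECONDS:
        raise ValueError(f"recording is {duration:.1f}s; limit is {MAX_SECONDS}s")
    return duration
```

A file that is not a readable PCM WAV makes `wave.open` raise `wave.Error`, which doubles as a basic format check.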
Response Structure
The V2 API returns a detailed analysis in JSON format:
{
"audio_duration": 58.97,
"user_id": "USER_123",
"language_code": "EN",
"assessment_id": "391fe358-dcff-45e5-bbe7-7b318b70a5c9",
"input_timestamp": "2025-02-19T20:29:48.173424+00:00",
"summary": {
"equal": 66,
"replace": 35,
"insert": 1,
"delete": 44
},
"wpm": 69.84,
"accuracyScore": 0.4552,
"predicted_text": "at lunch N EY T T,S friends talked about"
}
Base Response Fields
Field | Description |
---|---|
audio_duration | The duration of the audio file in seconds |
user_id | The unique identifier for the user |
language_code | The language contained in the audio file being analyzed |
assessment_id | A unique identifier for the assessment |
input_timestamp | The UTC timestamp when the input was received |
summary | Object containing word match statistics (equal, replace, insert, delete counts) |
wpm | Words per minute reading speed |
accuracyScore | Overall accuracy score of the reading (0-1 range) |
predicted_text | Predicted words and phonemes |
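In the sample response, accuracyScore (0.4552) equals equal / (equal + replace + delete) = 66 / 145 to four decimal places. The sketch below recomputes the score that way from the summary object. Treat the formula as an inference from this single example, not as documented behaviour; the function names are hypothetical.

```python
import json

# Subset of the sample response shown above.
SAMPLE = json.loads("""
{
  "audio_duration": 58.97,
  "summary": {"equal": 66, "replace": 35, "insert": 1, "delete": 44},
  "wpm": 69.84,
  "accuracyScore": 0.4552
}
""")


def reference_word_count(summary):
    """Count reference words, assuming each one is matched, replaced, or deleted."""
    return summary["equal"] + summary["replace"] + summary["delete"]


def accuracy_from_summary(summary):
    """Share of reference words read correctly (inferred formula, see above)."""
    total = reference_word_count(summary)
    return summary["equal"] / total if total else 0.0


print(round(accuracy_from_summary(SAMPLE["summary"]), 4))  # 0.4552
```

Inserted words are left out of the denominator because they do not correspond to any word in the reference text.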
Detailed Analysis Fields
The API also provides detailed analysis through specific fields:
1. Details Object
Contains an ordered list of reading events with timing:
{
"details": [
{
"reference": ["at", "lunch"],
"prediction": ["at", "lunch"],
"start": 1,
"end": 2.36,
"type": "equal"
},
{
"reference": ["matts"],
"prediction": ["nates"],
"type": "replace",
"phoneme_analysis": "N EY T T,S",
"mispronounced": [
{
"AE": {
"replace": ["EY"],
"confidence": "low"
}
}
],
"start": 2.6,
"end": 3.12
}
]
}
Field | Description |
---|---|
reference | Array of expected words from the reference text |
prediction | Array of words actually spoken (word utterance predicted by AI) |
start | Start time of the speech segment in seconds |
end | End time of the speech segment in seconds |
type | Type of match (equal, replace, insert, delete) |
phoneme_analysis | Phonetic analysis of utterance predicted by AI (provided only for mismatch of type "replace") |
mispronounced | Detailed breakdown of pronunciation errors (provided only for mismatch of type "replace") |
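As an illustration of how the details list might be consumed, the sketch below extracts every "replace" event along with its timing and phoneme substitutions. It assumes the structure shown in the example above; the function name `mispronunciations` is hypothetical, and optional fields are read defensively because the table states they are present only for "replace" events.

```python
# The two events from the example above.
DETAILS = [
    {"reference": ["at", "lunch"], "prediction": ["at", "lunch"],
     "start": 1, "end": 2.36, "type": "equal"},
    {"reference": ["matts"], "prediction": ["nates"], "type": "replace",
     "phoneme_analysis": "N EY T T,S",
     "mispronounced": [{"AE": {"replace": ["EY"], "confidence": "low"}}],
     "start": 2.6, "end": 3.12},
]


def mispronunciations(details):
    """Yield (expected, spoken, start, end, phoneme_pairs) per "replace" event.

    phoneme_pairs is a list of (expected_phoneme, spoken_phoneme, confidence).
    """
    for event in details:
        if event.get("type") != "replace":
            continue
        pairs = []
        for entry in event.get("mispronounced", []):
            for expected_phoneme, info in entry.items():
                for spoken_phoneme in info.get("replace", []):
                    pairs.append((expected_phoneme, spoken_phoneme, info.get("confidence")))
        yield (
            " ".join(event["reference"]),
            " ".join(event["prediction"]),
            event.get("start"),
            event.get("end"),
            pairs,
        )


for expected, spoken, start, end, pairs in mispronunciations(DETAILS):
    print(f"{start}-{end}s: expected '{expected}', heard '{spoken}', phonemes {pairs}")
```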
2. Phoneme Summary Object
Summarizes pronunciation patterns across the assessment. When a word is replaced, the error falls into one of two categories. In a complete replacement, the spoken word is entirely different from the written word. In a phoneme-level error, some of the word's phonemes were spoken correctly and others were not. Phoneme-level replacements are summarized in "common_patterns".
{
"phonemeSummary": {
"complete_replacements": { // all of the details of this are in the "details" section of the response
"words": {
"go": 1, // go was completely replaced once
"now": 1 // now was completely replaced once
},
"total_count": 2 // there were 2 words completely replaced
},
"confidence_levels": { // for phoneme-level replacements: 2 phonemes (AE and AH) were replaced in words that were not completely replaced
"AE": { // AE was replaced twice, once with low confidence and once with high confidence
"low": 1,
"high": 1
},
"AH": { // AH was replaced once, with high confidence
"high": 1
}
},
"common_patterns": {
"AE->EY": 2, // AE was replaced with EY twice
"AH->V": 1 // AH was replaced with V once
}
}
}
Field | Description |
---|---|
complete_replacements.words | Dictionary of completely replaced words and their counts |
complete_replacements.total_count | Total number of complete word replacements |
confidence_levels | For each replaced phoneme, the number of replacements at each confidence level (low, high) |
common_patterns | Phoneme replacement patterns and their counts, keyed as expected->spoken (for example, AE->EY) |
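To turn common_patterns into something a report can show, the keys can be split on "->" and sorted by count. This is a minimal sketch that assumes the key format shown in the example above; `rank_patterns` is a hypothetical helper name.

```python
def rank_patterns(phoneme_summary):
    """Return [(expected, spoken, count)], most frequent pattern first."""
    ranked = []
    for pattern, count in phoneme_summary.get("common_patterns", {}).items():
        expected, _, spoken = pattern.partition("->")
        ranked.append((expected, spoken, count))
    # Sort by descending count, then alphabetically for a stable order.
    return sorted(ranked, key=lambda item: (-item[2], item[0]))


summary = {"common_patterns": {"AH->V": 1, "AE->EY": 2}}
for expected, spoken, count in rank_patterns(summary):
    print(f"{expected} read as {spoken}: {count}x")
```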
V1 (Legacy)
⚠️ Note: V1 is maintained for backward compatibility. New implementations should use V2.
Endpoint
POST https://api.kidsmart.ai/v1/audio/fluency
[Previous V1 documentation content remains the same...]
Best Practices
- Audio Quality
  - Use a headset in noisy environments (like typical classrooms)
  - If you cannot make out the words in the recording, neither can Kid Smart AI
  - Ensure clear audio recording
  - Keep recordings under 2 minutes
- Assessment Guidelines
  - Allow children opportunities to self-correct while reading
- Error Handling
  - Implement webhook error handling
  - Use exponential backoff for retries
  - Monitor assessment_id for status
- Performance Optimization
  - Process results asynchronously
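The exponential backoff recommended above can be sketched as a small wrapper around whatever function performs the upload. This is a generic illustration, not Kid Smart AI's client code: the name `retry_with_backoff`, the default delays, and the use of random jitter are all assumptions. It retries on any exception for brevity, whereas production code would normally retry only on transient failures such as timeouts.

```python
import random
import time


def retry_with_backoff(call, max_attempts=5, base_delay=1.0, max_delay=30.0,
                       sleep=time.sleep):
    """Run call() until it succeeds, doubling the wait cap after each failure.

    Re-raises the last exception once max_attempts calls have failed.
    The sleep parameter exists so tests can replace time.sleep.
    """
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            # Cap grows 1s, 2s, 4s, ... up to max_delay.
            cap = min(max_delay, base_delay * 2 ** attempt)
            # Random jitter keeps many clients from retrying in lockstep.
            sleep(random.uniform(0, cap))
```

A caller would pass a zero-argument function, for example `retry_with_backoff(lambda: upload(recording))`, where `upload` is the caller's own request function.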
Common Issues & Solutions
- Poor Recognition
  - Check audio quality
  - Verify reference text format
  - Ensure proper microphone placement
- Slow Processing
  - Use webhook callbacks
  - Optimize audio file size
  - Check network connectivity
- Inconsistent Results
  - Standardize recording environment
  - Maintain consistent audio levels
  - Use recommended audio formats
Migration Guide (V1 to V2)
- Update endpoint URL to V2
- Add webhook support if needed
- Update response parsing for new format
- Test with sample recordings
- Monitor error patterns in new format
For support or questions, contact support@kidsmart.ai