Fluency Assessment API

The Fluency Assessment API provides fast, accurate analysis of a child's reading fluency. The API processes an audio recording of a child reading a passage and returns detailed analysis including accuracy, speed, and specific error patterns.

Quick Start

Latest Version (V2 - Recommended)

curl https://api.kidsmart.ai/v2/audio/fluency \
    -H "x-api-key: $KIDSMART_API_KEY" \
    -F "file=@$AUDIO_FILE" \
    -F "user_token=$USER_ID" \
    -F "reference_text=@$REFERENCE_TEXT" \
    -F "model_id=$MODEL_ID" \
    -H "Content-Type: multipart/form-data"

V2 Features

- Enhanced phoneme-level analysis
- Improved accuracy in error detection
- Detailed phoneme substitution patterns
- Built-in webhook support

Input Parameters

Name	Type	Description
`x-api-key`	header	Your API authentication key
`file`	field	Audio file (wav), max 2 minutes
`reference_text`	field	Expected text in UTF-8 format
`user_token`	field	Unique identifier for the speaker
`model_id`	field	Model ID (from Kid Smart AI)
`webhook_url`	field	(Optional) URL for receiving results via webhook
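
Because the `file` parameter is limited to 2 minutes of WAV audio, it can help to check a recording locally before uploading it. The following is a minimal sketch using Python's standard `wave` module, which reads uncompressed PCM WAV files; the function names and the `make_silent_wav` helper are illustrative and not part of the Kid Smart AI API.

```python
import io
import wave

MAX_SECONDS = 120.0  # from the parameter table: "max 2 minutes"


def wav_duration_seconds(source):
    """Return the duration of a WAV file given a path or a binary file object."""
    with wave.open(source, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_within_limit(source, max_seconds=MAX_SECONDS):
    """True if the recording is no longer than the documented limit."""
    return wav_duration_seconds(source) <= max_seconds


def make_silent_wav(seconds, framerate=8000):
    """Build an in-memory mono 16-bit WAV of silence (demonstration only)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(framerate)
        wav.writeframes(b"\x00\x00" * int(seconds * framerate))
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    print(is_within_limit(make_silent_wav(2)))
```

In a real client, `is_within_limit` would be called with the path of the recording before the upload request is sent.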

Response Structure

The V2 API returns a detailed analysis in JSON format:

{
  "audio_duration": 58.97,
  "user_id": "USER_123",
  "language_code": "EN",
  "assessment_id": "391fe358-dcff-45e5-bbe7-7b318b70a5c9",
  "input_timestamp": "2025-02-19T20:29:48.173424+00:00",
  "summary": {
    "equal": 66,
    "replace": 35,
    "insert": 1,
    "delete": 44
  },
  "wpm": 69.84,
  "accuracy_score": 46,
  "transcription": "at lunch N EY T T,S friends talked about",
  "phoneme_summary": {
    "phoneme_replacements": {
      "M->N": 1,
      "AE->EY": 1,
      "total_mispronunciations": 2
    },
    "complete_substitutions": {
      "total_count": 0,
      "reference_word_substitution": []
    },
    "unknown_words": {}
  },
  "updated_at": "2025-02-19T20:31:48.173424+00:00"
}

Base Response Fields

Field	Description
`audio_duration`	The duration of the audio file in seconds
`user_id`	The unique identifier for the user
`language_code`	The language contained in the audio file being analyzed
`assessment_id`	A unique identifier for the assessment
`input_timestamp`	The UTC timestamp when the input was received
`summary`	Object containing word match statistics (equal, replace, insert, delete counts)
`wpm`	Words per minute reading speed
`accuracy_score`	Overall accuracy score of the reading (0-100 range)
`transcription`	Predicted words and phonemes
`updated_at`	The UTC timestamp of when the analysis was completed
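
The `summary` counts can be combined into a few derived figures on the client side. The sketch below assumes the usual reading of alignment counts, where `equal`, `replace`, and `delete` each consume one reference word and `insert` marks an extra spoken word; the documentation does not state this explicitly, and the `word_counts` helper is illustrative.

```python
import json

# Trimmed copy of the sample response shown above.
SAMPLE = json.loads("""{
  "audio_duration": 58.97,
  "summary": {"equal": 66, "replace": 35, "insert": 1, "delete": 44},
  "wpm": 69.84,
  "accuracy_score": 46
}""")


def word_counts(response):
    """Derive reference-word totals from the summary object.

    Assumption: reference words = equal + replace + delete.
    """
    summary = response["summary"]
    reference_words = summary["equal"] + summary["replace"] + summary["delete"]
    correct_share = summary["equal"] / reference_words if reference_words else 0.0
    return {
        "reference_words": reference_words,
        "correct_words": summary["equal"],
        "extra_words": summary["insert"],
        "correct_share": correct_share,
    }


if __name__ == "__main__":
    print(word_counts(SAMPLE))
```

For the sample response the correct share is about 0.455, which is close to the reported `accuracy_score` of 46. The documentation does not define how `accuracy_score` is calculated, so treat this as an observation rather than the formula.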

Detailed Analysis Fields

The API also provides detailed analysis through specific fields:

1. Details Object

Contains an ordered list of reading events with timing:

{
  "details": [
    {
      "reference": ["at", "lunch"],
      "prediction": ["at", "lunch"],
      "start": 1,
      "end": 2.36,
      "type": "equal"
    },
    {
      "reference": ["matts"],
      "prediction": ["nates"],
      "type": "replace",
      "reference_phonemes": ["M AE T S"],
      "aligned_result": [
          {"reference": "M", "prediction": "N", "orig_index": 0, "type": "replace", "prediction_type": "phoneme"},
          {"reference": "AE", "prediction": "EY", "orig_index": 1, "type": "replace", "prediction_type": "phoneme"},
          {"reference": "T", "prediction": "T", "orig_index": 2, "type": "equal", "prediction_type": "phoneme"},
          {"reference": "S", "prediction": "S", "orig_index": 3, "type": "equal"}
      ],
      "start": 2.6,
      "end": 3.12
    }
  ]
}

Field	Description
`reference`	Array of expected words from the reference text
`prediction`	Array of words actually spoken (word utterance predicted by AI)
`start`	Start time of the speech segment in seconds
`end`	End time of the speech segment in seconds
`type`	Type of match (equal, replace, insert, delete)
`reference_phonemes`	ARPAbet transcription of the reference word's phonemes
`aligned_result`	Phoneme-by-phoneme breakdown of pronunciation errors (provided only for events of type "replace")
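
One way to use the `details` list is to walk the events and tally phoneme-level substitutions from `aligned_result`. The following sketch uses only the fields described above; the sample data is an abbreviated version of the example, and the `phoneme_replacements` and `misread_words` helpers are illustrative names.

```python
from collections import Counter

# Abbreviated sample in the shape of the "details" example above.
SAMPLE_DETAILS = [
    {"reference": ["at", "lunch"], "prediction": ["at", "lunch"],
     "start": 1, "end": 2.36, "type": "equal"},
    {"reference": ["matts"], "prediction": ["nates"], "type": "replace",
     "aligned_result": [
         {"reference": "M", "prediction": "N", "orig_index": 0, "type": "replace"},
         {"reference": "T", "prediction": "T", "orig_index": 2, "type": "equal"},
     ],
     "start": 2.6, "end": 3.12},
]


def phoneme_replacements(details):
    """Count "REF->PRED" phoneme substitutions across all replace events."""
    counts = Counter()
    for event in details:
        if event.get("type") != "replace":
            continue
        # aligned_result is documented only for replace events.
        for step in event.get("aligned_result", []):
            if step.get("type") == "replace":
                counts[f'{step["reference"]}->{step["prediction"]}'] += 1
    return counts


def misread_words(details):
    """Return (reference words, predicted words, start, end) per replace event."""
    return [
        (event["reference"], event["prediction"], event.get("start"), event.get("end"))
        for event in details
        if event.get("type") == "replace"
    ]


if __name__ == "__main__":
    print(phoneme_replacements(SAMPLE_DETAILS))
    print(misread_words(SAMPLE_DETAILS))
```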

2. Phoneme Summary Object

The phoneme_summary object provides a high-level summary of pronunciation patterns and errors detected during the assessment. It includes:

- phoneme_replacements: A dictionary where each key is a phoneme substitution pattern (e.g., "M->N" means the phoneme "M" was replaced with "N"), and the value is the count of occurrences. It also includes a total_mispronunciations key for the total number of phoneme-level mispronunciations.
- complete_substitutions: Information about words that were completely substituted (i.e., the spoken word was entirely different from the reference word). Contains:
  - total_count: The total number of complete word substitutions.
  - reference_word_substitution: A list of objects, each mapping a reference word to the number of times it was completely substituted.
- unknown_words: A dictionary of words in the prediction that were not recognized in the reference text, with their counts.

"phoneme_summary": {
  "phoneme_replacements": {
    "M->N": 1,
    "AE->EY": 1,
    "total_mispronunciations": 2
  },
  "complete_substitutions": {
    "total_count": 0,
    "reference_word_substitution": []
  },
  "unknown_words": {}
}

Field	Description
`phoneme_replacements`	Dictionary of phoneme substitution patterns and their counts, plus total_mispronunciations.
`complete_substitutions`	Object with total_count and reference_word_substitution list for complete word swaps.
`unknown_words`	Dictionary of unrecognized words (words that are not in our phoneme dictionary) and their counts. If a word is missing from the phoneme dictionary, any attempt at reading it is currently counted as correct. Please contact us if you want to add words to our phoneme dictionary.
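
Since `phoneme_replacements` mixes substitution patterns with the `total_mispronunciations` key, client code that ranks patterns needs to skip that key. A minimal sketch follows; the `top_replacements` helper is an illustrative name, not part of the API.

```python
SAMPLE_SUMMARY = {
    "phoneme_replacements": {"M->N": 1, "AE->EY": 1, "total_mispronunciations": 2},
    "complete_substitutions": {"total_count": 0, "reference_word_substitution": []},
    "unknown_words": {},
}


def top_replacements(phoneme_summary, limit=5):
    """Return the most frequent substitution patterns as (pattern, count) pairs.

    The total_mispronunciations key is an aggregate, not a pattern, so it is skipped.
    Ties are broken alphabetically to keep the output stable.
    """
    patterns = {
        pattern: count
        for pattern, count in phoneme_summary.get("phoneme_replacements", {}).items()
        if pattern != "total_mispronunciations"
    }
    ranked = sorted(patterns.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:limit]


if __name__ == "__main__":
    print(top_replacements(SAMPLE_SUMMARY))
```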

V1 (Legacy)

⚠️ Note: V1 is maintained for backward compatibility. New implementations should use V2.

Endpoint

POST https://api.kidsmart.ai/v1/audio/fluency

[Previous V1 documentation content remains the same...]

Best Practices

Audio Quality
- Use a headset in noisy environments (like typical classrooms)
- If you cannot make out the words in the recording, neither can Kid Smart AI
- Ensure clear audio recording
- Keep recordings under 2 minutes
Assessment Guidelines
- Allow children opportunities to self-correct while reading
Error Handling
- Implement webhook error handling
- Use exponential backoff for retries
- Monitor assessment_id for status
Performance Optimization
- Process results asynchronously
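
The retry advice above can be sketched as a small helper that doubles the wait after each failed attempt. This is a generic illustration of exponential backoff, not a Kid Smart AI client: the `operation` callable stands in for whatever request your code makes, and the default delays and attempt count are assumptions to tune for your deployment.

```python
import random
import time


def retry_with_backoff(operation, max_attempts=5, base_delay=1.0, max_delay=30.0,
                       jitter=0.1, retry_on=(Exception,), sleep=time.sleep):
    """Call operation() until it succeeds or max_attempts is reached.

    The wait doubles after each failure (base_delay, 2x, 4x, ...), capped at
    max_delay, with a small random jitter added to spread out retries.
    """
    for attempt in range(max_attempts):
        try:
            return operation()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay + random.uniform(0, delay * jitter))


if __name__ == "__main__":
    attempts = {"count": 0}

    def flaky_request():
        # Hypothetical stand-in for an API call that fails twice, then succeeds.
        attempts["count"] += 1
        if attempts["count"] < 3:
            raise ConnectionError("temporary failure")
        return "ok"

    print(retry_with_backoff(flaky_request, base_delay=0.01))
```

In practice, `retry_on` should be narrowed to transient failures (timeouts, connection errors, rate limiting) so that permanent errors such as an invalid API key are not retried.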

Common Issues & Solutions

Poor Recognition
- Check audio quality
- Verify reference text format
- Ensure proper microphone placement
Slow Processing
- Use webhook callbacks
- Optimize audio file size
- Check network connectivity
Inconsistent Results
- Standardize recording environment
- Maintain consistent audio levels
- Use recommended audio formats

Migration Guide (V1 to V2)

1. Update endpoint URL to V2
2. Add webhook support if needed
3. Update response parsing for new format
4. Test with sample recordings
5. Monitor error patterns in new format

For support or questions, contact support@kidsmart.ai
