Pronunciation Assessment API
Pronunciation Assessment
The Pronunciation Assessment product provides fast, accurate analysis of a child's pronunciation. The student is recorded speaking, then that recording and the expected phonemes are passed to the Kid Smart AI API.
Inputs:
- Audio file (WAV only)
- Expected phoneme or phonemes (in Arpabet or in your own phonetic alphabet)
  - To use your own phonetic alphabet, discuss your phonemes with Kid Smart AI and we can implement them.
- Model ID
  - We can build custom model outputs for your use case, or you can select one of our options based on confidence (high vs medium) and phoneme segmentation requirements (none vs segmented).
  - See the playground examples for the differences.
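As a rough illustration of the audio input requirement, the sketch below checks a recording locally before it is sent. It uses Python's standard `wave` module; the helper names are hypothetical, and the 3-15 second bounds are taken from the `file` parameter description in the request section, not from any client library.

```python
import wave

MIN_SECONDS = 3.0   # lower bound stated for the `file` request field
MAX_SECONDS = 15.0  # upper bound stated for the `file` request field


def wav_duration_seconds(source):
    """Return the duration of a WAV recording.

    `source` is a file path or a binary file object. wave.open raises
    wave.Error when the data is not a valid WAV file.
    """
    with wave.open(source, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_acceptable_length(source):
    """True when the recording falls inside the documented 3-15 second range."""
    return MIN_SECONDS <= wavduration(source) <= MAX_SECONDS if False else (
        MIN_SECONDS <= wav_duration_seconds(source) <= MAX_SECONDS
    )
```

Running a check like this on the client avoids uploading files the service is likely to reject.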
Outputs for Phoneme assessment
Output Description | Details |
---|---|
Is Correct? | Boolean indicating whether the phonemes were correctly pronounced |
Analysis details | Timestamps and the predicted phonemes. If there is uncertainty in the phoneme prediction, multiple phonemes are given |
Feedback | Feedback on any errors that were identified |
The analysis runs after the audio file is submitted and typically completes within 30 seconds.
Pronunciation API Request (POST)
Step 1
curl https://api.kidsmart.ai/v1/audio/pronunciation \
  -H "x-api-key:$KIDSMART_API_KEY" \
  -F "file=@$AUDIO_FILE" \
  -F "user_token=$USER_ID" \
  -F "reference_text=@$REFERENCE_TEXT" \
  -F "model_id=$MODEL_ID" \
  -H "Content-Type: multipart/form-data"
Name | Type | Description |
---|---|---|
x-api-key | header | For test purposes. The app key to use to authenticate with the Web Service. |
file | field | The audio file to be analyzed. Audio files should be in WAV format and 3-15 seconds long. |
reference_text | field | The reference phonemes against which the speech contained in the audio file should be analyzed. This should be in the phonetic alphabet of the model specified by model_id. The default phonetic alphabet is Arpabet, but we can customize to your specific phonetic alphabet on request. |
user_token | field | A unique ID that represents the speaker in the audio file. This should be a non-human readable alphanumeric identifier (such as a UUID) that has meaning to you but not Kid Smart AI. This token can be used to request deletion of a specific user's data in line with our data privacy commitments. |
model_id | field | Model ID (id is given to you by Kid Smart AI). |
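The same request can be sketched in Python with only the standard library. This is an illustrative outline, not an official client: `build_multipart` and `submit` are hypothetical helper names, the audio part is labelled `audio/wav` by assumption, and `reference_text` is sent here as a plain form value, whereas the curl example uploads it from a file with `@`.

```python
import json
import urllib.request
import uuid

# Endpoint taken from the curl example above.
API_URL = "https://api.kidsmart.ai/v1/audio/pronunciation"


def build_multipart(fields, file_name, file_bytes, boundary=None):
    """Encode text fields plus one `file` part as multipart/form-data.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = boundary or uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"'
            f"\r\n\r\n{value}\r\n".encode()
        )
    parts.append(
        f'--{boundary}\r\nContent-Disposition: form-data; name="file"; '
        f'filename="{file_name}"\r\nContent-Type: audio/wav\r\n\r\n'.encode()
    )
    parts.append(file_bytes + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def submit(api_key, wav_path, user_token, reference_text, model_id):
    """POST the recording and return the parsed JSON response."""
    with open(wav_path, "rb") as handle:
        audio = handle.read()
    fields = {
        "user_token": user_token,
        "reference_text": reference_text,  # assumption: plain text value
        "model_id": model_id,
    }
    body, content_type = build_multipart(fields, "audio.wav", audio)
    request = urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={"x-api-key": api_key, "Content-Type": content_type},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

A call might look like `submit(key, "dem.wav", user_uuid, "D EH M", "segmentation_medium")`, where every argument value is a placeholder.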
If successful, this will return a JSON response similar to the following:
{
  "id": "abc123",
  "url": "https://api.kidsmart.ai/v1/audio/pronunciation/result/{id}/"
}
Extract the value of the url field and use it to retrieve the result. The processing time depends on the length of the file, its complexity (e.g., audio quality), and connection speed. If the result is not yet available, you will receive an HTTP 404 status code. If you receive an HTTP 404, wait before retrying.
curl https://api.kidsmart.ai/v1/audio/pronunciation/result/{result_id}/
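The retry-on-404 behaviour described above can be sketched as a small polling loop. The function names, the attempt count, and the delay are illustrative assumptions (10 attempts at 3 seconds roughly covers the typical 30 second window). The request mirrors the curl line and sends no authentication header; add one if your account requires it.

```python
import json
import time
import urllib.error
import urllib.request


def fetch_once(url):
    """Return the parsed result, or None while the service answers 404."""
    try:
        with urllib.request.urlopen(url) as response:
            return json.load(response)
    except urllib.error.HTTPError as error:
        if error.code == 404:  # result not ready yet
            return None
        raise  # any other status is treated as a real failure


def wait_for_result(url, fetch=fetch_once, attempts=10,
                    delay_seconds=3.0, sleep=time.sleep):
    """Poll the result URL until a result arrives or attempts run out."""
    for attempt in range(attempts):
        result = fetch(url)
        if result is not None:
            return result
        if attempt < attempts - 1:
            sleep(delay_seconds)  # wait before the next try
    raise TimeoutError(f"no result from {url} after {attempts} attempts")
```

The `fetch` and `sleep` parameters exist so the loop can be exercised without network access or real waiting.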
Pronunciation Response Structure
If the request is successful, the Pronunciation Web Service will return a JSON response containing the pronunciation analysis. At the root of the results object are the following fields:
Field | Description |
---|---|
user_id | The user_id specified in the request. |
assessment_id | The unique identifier for the request. |
analysis_details | The pronunciation results object returned (see the example response below). |
audio_duration | The duration of the audio file in seconds. |
is_correct | Boolean indicating whether the audio contained the reference phoneme(s). |
model_id | The id of the model |
reference_text | The phonemes the audio was tested against |
feedback | If the audio was marked incorrect, feedback on why it was marked incorrect. |
The following is an example of the JSON structure you can expect from Pronunciation. In the example, the reference text is "D EH M" and the child says "D EH M".
Reference Text (Nonsense word "dem") | Child Says |
---|---|
D EH M | D EH M |
{
  "assessment_id": "fd66cf8a-7945-47dc-a146-1938155dc858",
  "reference_text": "D EH M",
  "is_correct": true,
  "feedback": "feedback feature coming soon",
  "audio_duration": 6,
  "analysis_details": [
    {
      "phoneme": "D",
      "timestamp": 1.57
    },
    {
      "phoneme": "EH",
      "timestamp": 3.03
    },
    {
      "phoneme": "M",
      "timestamp": 3.63
    }
  ],
  "model_id": "segmentation_medium",
  "user_id": "930fed57-fd29-4d76-86e9-5004ffcb1369"
}
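To show how a response of this shape might be consumed, the sketch below turns `analysis_details` into a list of (timestamp, phoneme) pairs. `phoneme_timeline` is a hypothetical helper, and it assumes each entry carries a single `phoneme` string as in the example; responses that list several candidate phonemes for an uncertain prediction may use a different layout that this code does not handle.

```python
import json

# Trimmed copy of the example response above.
EXAMPLE = """
{
  "reference_text": "D EH M",
  "is_correct": true,
  "analysis_details": [
    {"phoneme": "D", "timestamp": 1.57},
    {"phoneme": "EH", "timestamp": 3.03},
    {"phoneme": "M", "timestamp": 3.63}
  ]
}
"""


def phoneme_timeline(result):
    """Return (timestamp, phoneme) pairs ordered by timestamp."""
    details = sorted(result["analysis_details"], key=lambda d: d["timestamp"])
    return [(d["timestamp"], d["phoneme"]) for d in details]


if __name__ == "__main__":
    result = json.loads(EXAMPLE)
    for timestamp, phoneme in phoneme_timeline(result):
        print(f"{timestamp:5.2f}s  {phoneme}")
    print("correct:", result["is_correct"])
```

The `is_correct` field remains the service's verdict; the timeline is only a convenience for display or logging.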