Pronunciation Assessment API

Pronunciation Assessment

The Pronunciation Assessment product provides fast, accurate analysis of a child's pronunciation. The student is recorded saying the target sounds, then that recording and the expected phonemes are passed to the Kid Smart AI API.

Inputs:

Audio file (WAV format only)
Expected phoneme or phonemes (in Arpabet or in your own phonetic alphabet)
- Discuss your phoneme set with Kid Smart AI; we can implement it.
Model ID
- We can build custom model outputs for your use case, or you can select one of our options based on confidence (high vs. medium) and phoneme segmentation requirements (none vs. segmented).
- Please see playground examples to see the differences.
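
Because only WAV audio is accepted, and the request parameters ask for clips of 3 to 15 seconds, it can help to check a recording locally before uploading it. The following is a minimal sketch using Python's standard `wave` module; the function names are hypothetical and are not part of the Kid Smart AI API.

```python
import wave

MIN_SECONDS = 3.0   # lower bound taken from the `file` parameter description
MAX_SECONDS = 15.0  # upper bound taken from the `file` parameter description


def wav_duration_seconds(path):
    """Return the duration of a WAV file in seconds (raises wave.Error if it is not WAV)."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_acceptable_length(path):
    """True if the file is a readable WAV between MIN_SECONDS and MAX_SECONDS long."""
    try:
        duration = wav_duration_seconds(path)
    except (wave.Error, EOFError):
        return False  # not a parseable WAV file
    return MIN_SECONDS <= duration <= MAX_SECONDS
```

This check runs on the client only; the service may apply its own validation.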

Outputs for Phoneme assessment

Output Description	Details
Is Correct?	Boolean indicating whether the phonemes were correctly pronounced.
Analysis details	Timestamps and the predicted phonemes. If there is uncertainty in the phoneme prediction, multiple phonemes are given.
Feedback	Feedback on any errors that were identified.

Analysis runs after the audio file is submitted and typically completes within 30 seconds.

Pronunciation Assessment

Pronunciation API Request (POST)

Step 1: Submit audio (POST)

curl https://api.kidsmart.ai/v1/audio/pronunciation \
    -H "x-api-key:$KIDSMART_API_KEY" \
    -F "file=@$AUDIO_FILE" \
    -F "user_token=$USER_ID" \
    -F "reference_text=@$REFERENCE_TEXT" \
    -F "model_id=$MODEL_ID" \
    -H "Content-Type: multipart/form-data"

Name	Type	Description
`x-api-key`	header	For test purposes. The API key used to authenticate with the Web Service.
`file`	field	The audio file to be analyzed. Audio files should be in WAV format and 3-15 seconds in duration.
`reference_text`	field	The reference phonemes against which the speech in the audio file should be analyzed. These should be in the phonetic alphabet of the model specified by the model ID. The default phonetic alphabet is Arpabet, but we can customize to your specific phonetic alphabet on demand.
`user_token`	field	A unique ID that represents the speaker in the audio file. This should be a non-human-readable alphanumeric identifier (such as a UUID) that has meaning to you but not to Kid Smart AI. This token can be used to request deletion of a specific user's data in line with our data privacy commitments.
`model_id`	field	The ID of the model to use. This ID is given to you by Kid Smart AI.

If successful, this will return a JSON response similar to the following:

{
  "id": "abc123",
  "url": "https://api.kidsmart.ai/v1/audio/pronunciation/result/{id}/"
}
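
For clients that do not shell out to curl, the same submission can be sketched with Python's standard library. This is an illustrative sketch, not an official client: the function names are hypothetical, the audio part is labelled `audio/wav` as an assumption, and `reference_text` is sent as a plain text field (the curl example uploads it with `@`, so confirm with Kid Smart AI which form your model expects).

```python
import json
import urllib.request
import uuid

API_URL = "https://api.kidsmart.ai/v1/audio/pronunciation"  # endpoint from the curl example


def build_multipart(fields, audio_bytes, filename="audio.wav"):
    """Encode text fields plus one WAV upload as multipart/form-data.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\n'
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f'{value}\r\n'.encode("utf-8")
        )
    # The audio goes in the `file` field, as in the curl example.
    parts.append(
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f'Content-Type: audio/wav\r\n\r\n'.encode("utf-8")
        + audio_bytes + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def submit_pronunciation(api_key, audio_bytes, user_token, reference_text, model_id):
    """POST the audio and return the parsed JSON (expected to contain `id` and `url`)."""
    body, content_type = build_multipart(
        {"user_token": user_token, "reference_text": reference_text, "model_id": model_id},
        audio_bytes,
    )
    request = urllib.request.Request(
        API_URL,
        data=body,
        headers={"x-api-key": api_key, "Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Keeping the body construction in its own function makes it possible to inspect or unit-test the request without contacting the service.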

Step 2: Retrieve result (GET)

Extract the value of the `url` field from the Step 1 response and use it to retrieve the result. The processing time depends on the length of the file, its complexity (e.g., audio quality), and connection speed. If the result is not yet available, you will receive an HTTP 404 status code. If you encounter an HTTP 404, wait a short period before retrying.

curl https://api.kidsmart.ai/v1/audio/pronunciation/result/{result_id}/
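
The retry-on-404 behaviour described above can be sketched as a small polling loop. This is a sketch under stated assumptions: `poll_result` and `fetch_json` are hypothetical helper names, the 2-second interval and 30-attempt limit are illustrative rather than documented values, and the GET is sent without extra headers, mirroring the curl example.

```python
import json
import time
import urllib.error
import urllib.request


def fetch_json(url):
    """GET a URL and return parsed JSON; raises urllib.error.HTTPError on 4xx/5xx."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def poll_result(url, fetch=fetch_json, interval=2.0, max_attempts=30, sleep=time.sleep):
    """Poll `url` until a result is returned, retrying only on HTTP 404.

    `interval` and `max_attempts` are illustrative defaults, not documented values.
    """
    for attempt in range(max_attempts):
        try:
            return fetch(url)
        except urllib.error.HTTPError as error:
            if error.code != 404:
                raise  # anything other than "not ready yet" is a real failure
        if attempt < max_attempts - 1:
            sleep(interval)  # wait before the next attempt
    raise TimeoutError(f"No result after {max_attempts} attempts: {url}")
```

The `fetch` and `sleep` parameters are injectable so the loop can be exercised in tests without network access or real delays.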

Pronunciation Response Structure

If the request is successful, the Pronunciation Web Service will return a JSON response containing the pronunciation analysis. At the root of the results object are the following fields:

Field	Description
`user_id`	The `user_id` specified in the request.
`assessment_id`	The unique identifier for the request.
`analysis_details`	The pronunciation results: a list of predicted phonemes with their timestamps (see the example response).
`audio_duration`	The duration of the audio file in seconds.
`is_correct`	Boolean (true/false) indicating whether the audio contained the reference phoneme(s).
`model_id`	The ID of the model used.
`reference_text`	The phonemes the audio was tested against.
`feedback`	If the audio was marked incorrect, feedback on why it was marked incorrect.

API Response

The following is an example of the JSON structure you can expect from Pronunciation. In the example, the reference text is "D EH M" and the child says "D EH M".

Reference Text (Nonsense word "dem")	Child Says
D EH M	D EH M

{
"assessment_id": "fd66cf8a-7945-47dc-a146-1938155dc858",
"reference_text": "D EH M",
"is_correct": true,
"feedback": "feedback feature coming soon",
"audio_duration": 6,
"analysis_details": [
  {
    "phoneme": "D",
    "timestamp": 1.57
  },
  {
    "phoneme": "EH",
    "timestamp": 3.03
  },
  {
    "phoneme": "M",
    "timestamp": 3.63
  }
],
"model_id": "segmentation_medium",
"user_id": "930fed57-fd29-4d76-86e9-5004ffcb1369"
}
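
A response like the one above can be unpacked into a convenient structure on the client. The sketch below assumes each `analysis_details` entry carries exactly one `phoneme` and one `timestamp`, as in the example, and that timestamps are in seconds; the format used when multiple candidate phonemes are returned is not shown here, so it is not handled. `PhonemeHit` and `parse_result` are hypothetical names.

```python
import json
from dataclasses import dataclass

# Trimmed copy of the example response above.
EXAMPLE = """{
  "reference_text": "D EH M",
  "is_correct": true,
  "analysis_details": [
    {"phoneme": "D", "timestamp": 1.57},
    {"phoneme": "EH", "timestamp": 3.03},
    {"phoneme": "M", "timestamp": 3.63}
  ]
}"""


@dataclass
class PhonemeHit:
    phoneme: str
    timestamp: float  # assumed to be seconds from the start of the audio


def parse_result(payload):
    """Extract the verdict and the predicted phoneme sequence from a result dict."""
    hits = [
        PhonemeHit(entry["phoneme"], float(entry["timestamp"]))
        for entry in payload["analysis_details"]
    ]
    expected = payload["reference_text"].split()  # phonemes are space-separated
    predicted = [hit.phoneme for hit in hits]
    return {
        "is_correct": payload["is_correct"],
        "expected": expected,
        "predicted": predicted,
        "sequence_matches": expected == predicted,
        "hits": hits,
    }


result = parse_result(json.loads(EXAMPLE))
```

For the example response, `result["predicted"]` equals `result["expected"]`, so `sequence_matches` is true. The service's own `is_correct` field remains the authoritative verdict; the local comparison is only a convenience.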
