The AudioAnalysis task runs all the audio preprocessing tasks that are supported by the audiopreproc module in a single task.
For more information, see [audiopreproc] Module Configuration.
| Parameter | Description | Required |
|---|---|---|
| Type | The task name. Specify AudioAnalysis. | Yes |
| AppDnnBase | The location of the appResources directory, which contains the DNN and .ian files to use. | |
| AppFrameDupl | The balance between performance and speed for audio preprocessing DNN classification. | |
| CtmFile | The speech-to-text transcript produced for the audio file. | Yes |
| EndTime | The end of an audio section to process. | |
| File | The audio file to process. | Yes, if InputType is File. |
| InputType | The type of audio to process (file, binary data, or stream). | |
| Out | The XML file to write the audio analysis results to. | Yes |
| Sfreq | The sample frequency of the audio file to process. | |
| SpeechBias | Whether to bias towards speech (rather than music, noise, or silence) in the identification of audio segments. | |
| StartTime | The beginning of an audio section to process. | |
| SugdInputChannels | The channel layout of the input media file. This parameter does not apply when InputType is Stream. | |
| SugdInputFrequency | The sampling rate of the input media file. This parameter does not apply when InputType is Stream. | |
http://localhost:13000/action=AddTask&Type=AudioAnalysis&File=C:\data\Sample.wav&Out=SampleAnalysis.xml
This action uses port 13000 to instruct IDOL Speech Server, which is located on the local machine, to perform audio analysis on the Sample.wav file and to write the results to the SampleAnalysis.xml file.
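The same action URL can be assembled programmatically. The following is a minimal Python sketch that only builds the string; the helper name `build_add_task_url` and the choice of characters left unescaped are assumptions made for illustration, not part of the product.

```python
from urllib.parse import quote

# Characters left unescaped so that Windows paths stay readable.
# This is an assumption of the sketch; stricter encoding may be preferable.
SAFE_CHARS = ":/\\"


def build_add_task_url(host, port, params):
    """Return an AddTask action URL in the same shape as the example above."""
    pairs = [
        f"{name}={quote(str(value), safe=SAFE_CHARS)}"
        for name, value in params.items()
    ]
    return f"http://{host}:{port}/action=AddTask&" + "&".join(pairs)


url = build_add_task_url(
    "localhost",
    13000,
    {
        "Type": "AudioAnalysis",
        "File": r"C:\data\Sample.wav",
        "Out": "SampleAnalysis.xml",
    },
)
print(url)
```

Any of the optional parameters from the table, such as StartTime or SpeechBias, can be added to the dictionary in the same way.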
The AudioAnalysis log file provides information on several audio quality assessments. For example:
    <autnresponse>
      <audiopreproc>
        <snr>
          <mean>20</mean>
          <audio_level>66</audio_level>
        </snr>
        <gain>
          <size>35</size>
          <energy>69</energy>
        </gain>
        <max_gain_difference>0</max_gain_difference>
        <clipping>
          <assessment>no</assessment>
          <percent_frames>0</percent_frames>
        </clipping>
        <categories>
          <speech_percent>77.3667</speech_percent>
          <silence_percent>7.45</silence_percent>
          <noise_music_percent>15.9</noise_music_percent>
        </categories>
      </audiopreproc>
      <resultDeleted>False</resultDeleted>
    </autnresponse>
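If you want to read these values in a script, the structure shown above can be parsed with a standard XML library. The following Python sketch assumes the response has exactly the element names in this sample; the function name `summarize_audio_analysis` and the selection of fields are illustrative choices.

```python
import xml.etree.ElementTree as ET

# Sample response copied from the example above.
SAMPLE = """<autnresponse>
<audiopreproc>
<snr><mean>20</mean><audio_level>66</audio_level></snr>
<gain><size>35</size><energy>69</energy></gain>
<max_gain_difference>0</max_gain_difference>
<clipping><assessment>no</assessment><percent_frames>0</percent_frames></clipping>
<categories>
<speech_percent>77.3667</speech_percent>
<silence_percent>7.45</silence_percent>
<noise_music_percent>15.9</noise_music_percent>
</categories>
</audiopreproc>
<resultDeleted>False</resultDeleted>
</autnresponse>"""


def summarize_audio_analysis(xml_text):
    """Extract a few quality measures from an AudioAnalysis result."""
    pre = ET.fromstring(xml_text).find("audiopreproc")
    if pre is None:
        raise ValueError("no <audiopreproc> element in response")

    def number(path):
        # Leaf elements hold plain numeric text in the sample.
        return float(pre.findtext(path).strip())

    return {
        "snr_mean": number("snr/mean"),
        "max_gain_difference": number("max_gain_difference"),
        "clipping": pre.findtext("clipping/assessment").strip(),
        "clipping_percent_frames": number("clipping/percent_frames"),
        "speech_percent": number("categories/speech_percent"),
    }


summary = summarize_audio_analysis(SAMPLE)
print(summary)
```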
The log file includes information on the following:
The gain level and the actual energy level. The log file also includes a summary of the maximum difference in decibels between speaker levels across the whole file (<max_gain_difference>). For a good-quality waveform in which the two speakers speak at a similar gain level, this number is expected to be zero, or at least very low.
An assessment of the amount of clipping in the file, and the number of frames affected. The <assessment> field can hold one of the following values:
| Value | Frames affected |
|---|---|
| no | no clipping |
| insignificant | <= 0.1% of frames |
| minor | <= 1% of frames |
| moderate | <= 4% of frames |
| heavy | > 4% of frames |
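The thresholds above can be expressed as a small lookup, which may be useful when checking results in a script. This Python sketch is an illustration of the listed bands, not the server's own implementation. It assumes that the `no` label (taken from the sample output) corresponds to exactly 0% of frames.

```python
def clipping_assessment(percent_frames):
    """Map a percentage of clipped frames to the assessment bands listed above."""
    if percent_frames <= 0:
        return "no"  # assumption: "no clipping" means 0% of frames
    if percent_frames <= 0.1:
        return "insignificant"
    if percent_frames <= 1:
        return "minor"
    if percent_frames <= 4:
        return "moderate"
    return "heavy"


for value in (0, 0.05, 0.5, 3, 10):
    print(value, clipping_assessment(value))
```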
You can use the GetResults action to retrieve this information; you do not need to specify a result label.
The AudioAnalysis task also produces an additional audio classification .ctm file. By default, this has the same name as the task token. You can use the GetResults action with the label parameter set to class to retrieve this file.
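As a rough illustration, the two retrieval requests could be built as follows. This Python sketch only constructs URL strings. The parameter names `Token` and `Label` as written here, the helper name `build_get_results_url`, and the placeholder token value are assumptions; check them against the GetResults action reference before use.

```python
def build_get_results_url(host, port, token, label=None):
    """Build a GetResults action URL; omit the label for the default result."""
    # "Token" is an assumed parameter name for identifying the task.
    url = f"http://{host}:{port}/action=GetResults&Token={token}"
    if label is not None:
        # The text above says to set the label parameter to "class"
        # to retrieve the audio classification .ctm file.
        url += f"&Label={label}"
    return url


TOKEN = "MY_TASK_TOKEN"  # hypothetical placeholder for the token returned by AddTask

analysis_url = build_get_results_url("localhost", 13000, TOKEN)
classification_url = build_get_results_url("localhost", 13000, TOKEN, label="class")
print(analysis_url)
print(classification_url)
```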