Client Background:
An artificial intelligence data service provider, founded in 2014, that offers collection and annotation services for voice, image, text, and video data.
Task: Speech Analytics + Transcription
Cut a section of clear human speech from the audio and transcribe that section into text.
Operating instructions:
- Step 1. Listen to the audio clip.
- Step 2. Select the audio class.
- Step 3-1. If you choose the discard class, submit the task directly. No transcription is needed.
- Step 3-2. If you choose the speech class, decide whether the audio needs to be cut, and then transcribe it.
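The steps above can be sketched as a small decision function. This is only an illustration of the flow, not part of the annotation tool: the names `Submission` and `build_submission` are hypothetical, and the class labels are assumed to be the plain strings "speech" and "discard".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Submission:
    audio_class: str            # "speech" or "discard"
    transcript: Optional[str]   # None for discarded samples


def build_submission(audio_class: str, transcript: Optional[str] = None) -> Submission:
    """Follow Steps 2, 3-1 and 3-2 for one sample (hypothetical helper)."""
    if audio_class == "discard":
        # Step 3-1: submit directly; any transcript is ignored.
        return Submission("discard", None)
    if audio_class == "speech":
        # Step 3-2: a transcript of the cut speech is required.
        if transcript is None or not transcript.strip():
            raise ValueError("speech samples need a transcript")
        return Submission("speech", transcript.strip())
    raise ValueError(f"unknown audio class: {audio_class!r}")
```

A discarded sample never carries a transcript, while a speech sample without one is rejected before submission.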
Audio classes:
There are 2 options for audio classes: 【speech】 and 【discard】. Here are the definitions:
a. Speech:
1. The audio contains a part in Chinese that you can select, and that part is clear.
2. You need to transcribe the audio only when you choose speech.
b. Discard:
1. The entire audio is in a non-Chinese language.
2. The entire audio is unclear or inaudible speech.
3. The entire audio is not human speech, for example melodies, singing, animal sounds, or nature sounds.
4. When you choose discard, you do not need to transcribe anything. Just submit and move on to the next sample.
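The word "entire" in the discard rules matters: one clear Chinese speech part is enough for the speech class. A minimal sketch of that logic follows, assuming the annotator's judgments for each part of the audio are recorded as three flags. The `Segment` type and `choose_audio_class` function are hypothetical and only illustrate the definitions above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    # Annotator judgments for one part of the audio (assumed representation).
    is_human_speech: bool
    is_chinese: bool
    is_clear: bool


def choose_audio_class(segments: List[Segment]) -> str:
    """Return "speech" if any part is clear Chinese speech, else "discard"."""
    for seg in segments:
        if seg.is_human_speech and seg.is_chinese and seg.is_clear:
            return "speech"
    # Every part failed at least one condition, so the entire audio is discarded.
    return "discard"
```

For example, audio made of background music followed by one clear Chinese sentence is classed as speech, and only the sentence is selected.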
Cut speech:
1. An unclear part is one where you cannot tell what the words should be. Cut such parts out.
2. When cutting the audio, do not worry about whether the sentence is complete.
3. Do not select overlapping speech (2 or more speakers talking simultaneously).
4. Do not select music, melodies, singing, animal sounds, or natural sounds.
5. Do not keep any part that is so noisy or unclear that you cannot hear what the speaker said.
6. The selected speech may start with (and end with) up to 2 modal words. If there are many modal words at the beginning or the end, cut that long run of modal words down so it is short.
7. When the audio content is a conversation with pauses and noises in the middle, you do not need to cut the pause/noise part out. You can keep it in the audio and skip it when you transcribe.
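Rule 6 can be illustrated at the word level. The sketch below trims runs of modal words at both ends of a selection to at most 2. It works on a list of words rather than on audio, the set of modal words is supplied by the caller, and keeping the modal words closest to the main content is an assumption, since the guideline does not say which ones to keep. The function name `trim_modal_runs` is hypothetical.

```python
from typing import List, Set


def trim_modal_runs(tokens: List[str], modal_words: Set[str], max_run: int = 2) -> List[str]:
    """Keep at most `max_run` modal words at the start and at the end."""
    n = len(tokens)

    # Length of the run of modal words at the beginning.
    lead = 0
    while lead < n and tokens[lead] in modal_words:
        lead += 1
    start = max(0, lead - max_run)

    # Length of the run of modal words at the end (not overlapping the leading run).
    trail = 0
    while trail < n - lead and tokens[n - 1 - trail] in modal_words:
        trail += 1
    end = n - max(0, trail - max_run)

    return tokens[start:end]
```

Modal words in the middle of the selection are left untouched; only the runs at the two ends are shortened.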
Text transcription:
a. Spaces are needed between words.
b. Do not use punctuation in the text.
c. Numbers should be transcribed as words in Chinese.
d. Half-pronounced words occur mostly at the start or end of the audio, and sometimes in the middle.
e. Repeated words and sentences must be transcribed strictly according to the number of times they get repeated.
f. Modal words need to be transcribed, e.g. "ha ha", "hi".
g. The final cut audio must contain at least two words (≥2).
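Some of the rules above can be checked mechanically before submitting. The following is a minimal sketch, not an official validator: `check_transcript` is a hypothetical helper, it treats rule c as "no digit characters" (an assumption), it uses the transcript's word count as a stand-in for rule g, and it cannot check rules that depend on listening to the audio (d, e, f).

```python
import unicodedata
from typing import List


def check_transcript(text: str) -> List[str]:
    """Return a list of rule violations found in a transcript (empty if none)."""
    problems = []

    # Rule b: no punctuation. Unicode categories starting with "P" cover
    # both ASCII and Chinese punctuation marks.
    if any(unicodedata.category(ch).startswith("P") for ch in text):
        problems.append("contains punctuation")

    # Rule c (assumed reading): numbers are written as Chinese words,
    # so decimal digit characters should not appear.
    if any(unicodedata.category(ch) == "Nd" for ch in text):
        problems.append("contains digits")

    # Rules a and g: words are separated by spaces, and at least two are needed.
    if len(text.split()) < 2:
        problems.append("fewer than two words")

    return problems
```

An empty result only means these surface checks passed. The transcript still has to be compared against the audio by ear.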
Note:
1. Please double-check that the text aligns with the audio before moving on to the next section.
2. Transcribe what you hear, including ungrammaticalities.
3. Transcriptions must match the cut speech part with 100% accuracy.

