User Pickup Detection for AXE-Mode Privacy Numbers Based on Audio Stream Analysis

Published: 2023/12/31

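Background

When placing outbound calls to users through AXE-mode privacy numbers, we found that not every privacy-number provider offers a configurable call-connected callback.
We therefore need a provider-independent way to detect when the user picks up (for scenarios such as starting a recording or playing a welcome message).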

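Goal

Before bringing in a trained speech model, accurately identify ringback beeps and color ring back tones from the waveform alone, covering more than 90% of cases.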

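Research

VAD:
Voice activity detection (VAD), also called speech endpoint detection or speech boundary detection, identifies and removes long silent periods from an audio stream so that channel resources can be saved without degrading service quality. It is an important part of IP telephony applications. Silence suppression saves bandwidth and helps reduce the end-to-end delay perceived by users.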
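As a rough illustration of the idea, the sketch below marks a frame as silent when its RMS level falls under a threshold. It is not the TarsosDSP implementation; the class name `EnergyVad`, the helper `sineFrame`, and the -50 dBFS threshold are assumptions chosen for the example, and the input is assumed to be 16-bit little-endian mono PCM.

```java
/** Minimal energy-based silence check over one PCM frame (illustrative only). */
class EnergyVad {
    /** Hypothetical threshold: frames quieter than this many dBFS count as silence. */
    static final double SILENCE_THRESHOLD_DB = -50.0;

    /** RMS level of 16-bit little-endian mono PCM, in dB relative to full scale. */
    static double rmsDbfs(byte[] pcm) {
        int samples = pcm.length / 2;
        if (samples == 0) {
            return Double.NEGATIVE_INFINITY;
        }
        double sumSquares = 0.0;
        for (int i = 0; i < samples; i++) {
            // Combine the low and high byte into one signed 16-bit sample.
            int value = (short) ((pcm[2 * i] & 0xFF) | (pcm[2 * i + 1] << 8));
            double normalized = value / 32768.0;
            sumSquares += normalized * normalized;
        }
        double rms = Math.sqrt(sumSquares / samples);
        return 20.0 * Math.log10(rms); // log10(0) is -Infinity, i.e. pure silence
    }

    static boolean isSilence(byte[] pcm) {
        return rmsDbfs(pcm) < SILENCE_THRESHOLD_DB;
    }

    /** Builds a 16-bit little-endian sine frame; used only to exercise the check. */
    static byte[] sineFrame(double freqHz, double amplitude, int sampleRate, int samples) {
        byte[] out = new byte[samples * 2];
        for (int i = 0; i < samples; i++) {
            double s = amplitude * Math.sin(2.0 * Math.PI * freqHz * i / sampleRate);
            short v = (short) Math.round(s * 32767.0);
            out[2 * i] = (byte) (v & 0xFF);
            out[2 * i + 1] = (byte) ((v >> 8) & 0xFF);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] quiet = new byte[1600];                  // 100 ms of digital silence at 8 kHz
        byte[] tone = sineFrame(450.0, 0.5, 8000, 800); // 100 ms of a 450 Hz tone
        System.out.println("quiet frame silent: " + isSilence(quiet));
        System.out.println("tone frame silent:  " + isSilence(tone));
    }
}
```

A fixed threshold is the simplest choice; real line noise may call for a threshold tuned against recordings from each provider.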

TarsosDSP:
Git repository: https://github.com/JorenSix/TarsosDSP
TarsosDSP is a Java library for audio processing. Its aim is to provide an easy-to-use interface to practical music processing algorithms implemented, as simply as possible, in pure Java and without any other external dependencies. The library tries to hit the sweet spot between being capable enough to get real tasks done but compact and simple enough to serve as a demonstration on how DSP algorithms works. TarsosDSP features an implementation of a percussion onset detector and a number of pitch detection algorithms: YIN, the Mcleod Pitch method and a “Dynamic Wavelet Algorithm Pitch Tracking” algorithm. Also included is a Goertzel DTMF decoding algorithm, a time stretch algorithm (WSOLA), resampling, filters, simple synthesis, some audio effects, and a pitch shifting algorithm.

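Ringback tone:
Indicates that the called party is being rung. It uses a 450±25 Hz AC source at a send level of -10±3 dBm and is an intermittent tone with a 5 s period: 1 s on, 4 s off, matching the ringing cadence.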
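One way to test a single 100 ms frame for the 450 Hz tone is the Goertzel algorithm, which measures the energy at one frequency. The sketch below is an alternative illustration, not what the implementation later in this article does (that code uses a TarsosDSP pitch detector). The class name `RingbackToneCheck`, the helper `sine`, and the 0.8 dominance ratio are assumptions, and the ±25 Hz tolerance is not modeled.

```java
/** Goertzel-based check for a 450 Hz tone in one frame (illustrative only). */
class RingbackToneCheck {
    /** Hypothetical share of frame energy that must sit at 450 Hz. */
    static final double DOMINANCE_RATIO = 0.8;

    /** Squared magnitude of the signal component at targetHz. */
    static double goertzelPower(float[] samples, double targetHz, double sampleRate) {
        double omega = 2.0 * Math.PI * targetHz / sampleRate;
        double coeff = 2.0 * Math.cos(omega);
        double s1 = 0.0;
        double s2 = 0.0;
        for (float x : samples) {
            double s0 = x + coeff * s1 - s2;
            s2 = s1;
            s1 = s0;
        }
        return s1 * s1 + s2 * s2 - coeff * s1 * s2;
    }

    /** True when energy at 450 Hz dominates the frame. */
    static boolean looksLikeRingback(float[] samples, double sampleRate) {
        double total = 0.0;
        for (float x : samples) {
            total += x * x;
        }
        if (total == 0.0) {
            return false; // silent frame
        }
        double power = goertzelPower(samples, 450.0, sampleRate);
        // For a pure tone spanning whole cycles, power equals total * N / 2,
        // so this ratio is close to 1 for the target tone and close to 0 otherwise.
        double ratio = 2.0 * power / (samples.length * total);
        return ratio >= DOMINANCE_RATIO;
    }

    /** Generates a sine wave; used only to exercise the check. */
    static float[] sine(double freqHz, double sampleRate, int n) {
        float[] out = new float[n];
        for (int i = 0; i < n; i++) {
            out[i] = (float) (0.5 * Math.sin(2.0 * Math.PI * freqHz * i / sampleRate));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println("450 Hz frame:  " + looksLikeRingback(sine(450.0, 8000.0, 800), 8000.0));
        System.out.println("1000 Hz frame: " + looksLikeRingback(sine(1000.0, 8000.0, 800), 8000.0));
    }
}
```

Combining this per-frame result with the 1 s on / 4 s off cadence gives two independent features for the same tone.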

Color ring back tone:
A continuous, uninterrupted music waveform.

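Approach

Reading the waveform from left to right, a call splits into three segments:

"Please enter the four-digit extension number, followed by the # key"

"Ringing: beep, beep, beep"

"User speaking"

Detection therefore takes three steps:
1. Skip a fixed duration to get past the extension-number prompt.
2. Examine the first active segment after the silence and match it against the color ring back tone features or the beep features.
3. The moment the audio stops matching those features is the moment the user picked up.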
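The three steps above can be sketched over a sequence of per-frame activity flags (one flag per 100 ms). This is a simplified illustration under assumed names (`FirstBurstClassifier`, `classify`, `pickupFrameForTone`); the 20-frame music threshold and the 35-frame gap limit mirror the constants used in the implementation later in this article, and frequency checks are left out.

```java
/** Classifies the first active burst and finds where the beep cadence breaks (illustrative only). */
class FirstBurstClassifier {
    enum Kind { NONE, TONE, MUSIC }

    /**
     * @param active      true for each 100 ms frame that contains sound
     * @param skipFrames  frames to skip to get past the extension-number prompt
     * @param musicFrames burst length (in frames) from which the burst counts as music
     */
    static Kind classify(boolean[] active, int skipFrames, int musicFrames) {
        int i = Math.max(skipFrames, 0);
        // Ignore silence that precedes the first activity.
        while (i < active.length && !active[i]) {
            i++;
        }
        int run = 0;
        while (i < active.length && active[i]) {
            run++;
            i++;
        }
        if (run == 0) {
            return Kind.NONE;
        }
        return run >= musicFrames ? Kind.MUSIC : Kind.TONE;
    }

    /**
     * For a beep-style ringback: returns the first frame where sound resumes after a
     * silent gap shorter than minGapFrames, i.e. where the cadence is broken.
     * Returns -1 if the cadence is never broken.
     */
    static int pickupFrameForTone(boolean[] active, int start, int minGapFrames) {
        int silentRun = 0;
        boolean seenTone = false;
        for (int i = Math.max(start, 0); i < active.length; i++) {
            if (active[i]) {
                if (seenTone && silentRun > 0 && silentRun < minGapFrames) {
                    return i;
                }
                seenTone = true;
                silentRun = 0;
            } else if (seenTone) {
                silentRun++;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // 1 s prompt, 0.5 s silence, 1 s beep, 4 s silence, 1 s beep, 1.2 s silence, then speech.
        boolean[] frames = new boolean[100];
        java.util.Arrays.fill(frames, 0, 10, true);
        java.util.Arrays.fill(frames, 15, 25, true);
        java.util.Arrays.fill(frames, 65, 75, true);
        java.util.Arrays.fill(frames, 87, 100, true);
        System.out.println(classify(frames, 10, 20));           // TONE
        System.out.println(pickupFrameForTone(frames, 10, 35)); // 87
    }
}
```

For the music case the pickup moment is simpler: it is the first silent frame after the burst has been classified as `MUSIC`.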

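Implementation

The implementation uses the silence detection and pitch detection that TarsosDSP provides.
Note that you have to add the dependency yourself; the TarsosDSP package is available from the Git repository listed in the Research section above.

Usage: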

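public static void main(String[] args) {
    // Arguments: file path, sample rate (Hz), bit depth, ms to skip at the start, silence timeout (ms)
    PickUp pickUp = new PickUp("xxx.wav", 8000, 16, 1000, 4500);
    pickUp.start();
    // Exit with status 0: reaching this point means detection finished normally
    System.exit(0);
}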

PickUp:

package xxx;

import be.tarsos.dsp.AudioDispatcher;
import be.tarsos.dsp.AudioEvent;
import be.tarsos.dsp.AudioProcessor;
import be.tarsos.dsp.SilenceDetector;
import be.tarsos.dsp.io.TarsosDSPAudioFloatConverter;
import be.tarsos.dsp.io.TarsosDSPAudioFormat;
import be.tarsos.dsp.io.UniversalAudioInputStream;
import be.tarsos.dsp.pitch.PitchDetectionHandler;
import be.tarsos.dsp.pitch.PitchDetectionResult;
import be.tarsos.dsp.pitch.PitchProcessor;

import java.io.*;
import java.util.concurrent.ConcurrentLinkedQueue;

public class PickUp {

    public enum RingbackType {
        UNCHECK,
        DU_NORMALITY,
        DU_OTHER,
        SONG
    }

    private ConcurrentLinkedQueue<byte[]> audioQueue = new ConcurrentLinkedQueue<byte[]>();
    // Set when the file has been read or detection has finished.
    // volatile because the reader thread and the detection thread both read and write it.
    private volatile boolean isFinishReadFile = false;
    private String filePath;
    private String fileName;
    private int readLength = 1600;     // bytes in 100 ms of audio
    private int noinputTimeout = 1000; // ms to skip at the start
    private int silenceMaxTimes = 10;  // consecutive silent 100 ms chunks that end detection
    private float sampleRate = 8000;   // sample rate
    private int sampleSizeInBits = 16; // bit depth

    /**
     * User pickup detection.
     * Detection method: 1. beeps are detected by their 450 Hz frequency;
     * 2. color ring back tones are detected by continuous activity.
     *
     * @param filePath         path of the audio file
     * @param sampleRate       sample rate
     * @param sampleSizeInBits bit depth
     * @param noinputTimeout   how long to skip (ms) before detection starts
     * @param silenceTimeout   how long a silence (ms) ends detection by default (fallback)
     */
    public PickUp(String filePath, float sampleRate, int sampleSizeInBits, int noinputTimeout, int silenceTimeout) {
        this.filePath = filePath;
        this.sampleRate = sampleRate;
        this.sampleSizeInBits = sampleSizeInBits;
        // Bytes in 100 ms of audio, derived from the parameters
        this.readLength = (int) sampleRate * (sampleSizeInBits / 8) / 10;
        this.noinputTimeout = noinputTimeout;
        // Number of 100 ms chunks covered by the silence timeout
        this.silenceMaxTimes = silenceTimeout / 100;
    }

    public void start() {
        File audioFile = new File(this.filePath);
        FileInputStream fis;
        try {
            audioQueue.clear();
            fileName = audioFile.getName();
            isFinishReadFile = false;
            Thread sttThread = new Thread(vadRunnable);
            sttThread.start();
            fis = new FileInputStream(audioFile);
            byte[] byteArr = new byte[this.readLength];
            int size;
            fis.skip(44); // skip the WAV header (assumes the canonical 44-byte header)
            // Note: a short final read leaves stale bytes from the previous chunk in the tail
            while ((size = fis.read(byteArr)) != -1) {
                audioQueue.add(byteArr.clone());
            }
            while (!audioQueue.isEmpty() && !isFinishReadFile) {
                Thread.sleep(2000);
            }
            isFinishReadFile = true;
            fis.close();
            while (sttThread.isAlive()) {
                Thread.sleep(2000);
            }
            // Invoke the pickup callback here
            System.out.println("Finished normally");
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
    }

    private Runnable vadRunnable = new Runnable() {
        volatile int countHZ = 0;    // chunks that went through pitch detection
        volatile int count450HZ = 0; // of those, chunks whose pitch was about 450 Hz

        @Override
        public void run() {
            RingbackType ringbackType = RingbackType.UNCHECK;
            int currentPartTime = 0, silenceTimes = 0, firstActiveTimes = 0, differentCount = 0;
            try {
                // Use TarsosDSP for silence detection
                TarsosDSPAudioFormat tdspFormat = new TarsosDSPAudioFormat(sampleRate, sampleSizeInBits, 1, true, false);
                float[] voiceFloatArr = new float[readLength / tdspFormat.getFrameSize()];
                // Runs until detection finishes or the reader has drained the queue
                while (!isFinishReadFile) {
                    byte[] data = audioQueue.poll();
                    if (data == null) {
                        Thread.sleep(50);
                        continue;
                    }
                    TarsosDSPAudioFloatConverter.getConverter(tdspFormat).toFloatArray(data.clone(), voiceFloatArr);
                    SilenceDetector silenceDetector = new SilenceDetector();
                    boolean isSilence = silenceDetector.isSilence(voiceFloatArr);
                    // Check silence chunk by chunk, 100 ms at a time, once the skip period is over
                    if ((currentPartTime += 100) >= noinputTimeout) {
                        boolean checkHZ = false;
                        if (isSilence) {
                            if (firstActiveTimes == 0) {
                                System.out.println("Silence before first activity, ignored");
                                continue;
                            }
                            System.out.println("Silence detected " + ringbackType);
                            // Stop once consecutive silence reaches the maximum
                            if (++silenceTimes >= silenceMaxTimes) {
                                isFinishReadFile = true; // no need to wait for the rest of the file
                            }
                            switch (ringbackType) {
                                case UNCHECK:
                                    if (countHZ == count450HZ) {
                                        if (countHZ <= 11) {
                                            ringbackType = RingbackType.DU_NORMALITY;
                                            // Chinese standard: beep for 1 s, pause for 4 s
                                            silenceMaxTimes = 41;
                                        } else {
                                            ringbackType = RingbackType.DU_OTHER;
                                            checkHZ = true;
                                        }
                                    }
                                    break;
                                case DU_OTHER:
                                    // Three consecutive chunks that break the pattern end detection
                                    if (countHZ != count450HZ) {
                                        differentCount++;
                                        count450HZ = countHZ;
                                    } else {
                                        differentCount = 0;
                                    }
                                    if (differentCount >= 3) {
                                        isFinishReadFile = true;
                                    }
                                    // Beeps: run the frequency check
                                    checkHZ = true;
                                    break;
                                case SONG:
                                    // Continuous music was interrupted
                                    isFinishReadFile = true;
                                    break;
                                default:
                                    break;
                            }
                        } else {
                            System.out.println("Active " + ringbackType);
                            switch (ringbackType) {
                                case UNCHECK:
                                    firstActiveTimes++;
                                    // First activity of 2 s or more is treated as music
                                    if (firstActiveTimes >= 20) {
                                        ringbackType = RingbackType.SONG;
                                    }
                                    // Frequency checks start with the first activity
                                    checkHZ = true;
                                    break;
                                case DU_NORMALITY:
                                    // Sound resumed after a pause shorter than the expected 4 s gap
                                    if (silenceTimes != 0 && silenceTimes < 35) {
                                        isFinishReadFile = true;
                                    }
                                    // No break: fall through on purpose
                                case DU_OTHER:
                                    // Three consecutive chunks that break the pattern end detection
                                    if (countHZ != count450HZ) {
                                        differentCount++;
                                        count450HZ = countHZ;
                                    } else {
                                        differentCount = 0;
                                    }
                                    if (differentCount >= 3) {
                                        isFinishReadFile = true;
                                    }
                                    // Beeps: run the frequency check
                                    checkHZ = true;
                                    break;
                                default:
                                    break;
                            }
                            // Reset the silence counter
                            silenceTimes = 0;
                        }
                        // Frequency check
                        if (checkHZ && !isFinishReadFile) {
                            // Note: data.length is a byte count, while these buffer sizes are in samples,
                            // so the buffer is larger than one 100 ms chunk
                            AudioDispatcher dispatcher = new AudioDispatcher(
                                    new UniversalAudioInputStream(new ByteArrayInputStream(data), tdspFormat), data.length, 0);
                            // Uses the configured sample rate instead of a hard-coded 8000
                            AudioProcessor audioProcessor = new PitchProcessor(
                                    PitchProcessor.PitchEstimationAlgorithm.FFT_YIN, sampleRate, data.length,
                                    new PitchDetectionHandler() {
                                        @Override
                                        public void handlePitch(PitchDetectionResult pitchDetectionResult, AudioEvent audioEvent) {
                                            countHZ++;
                                            float pitch = pitchDetectionResult.getPitch();
                                            System.out.println(pitch + "HZ");
                                            if (pitch > 445 && pitch < 455) {
                                                count450HZ++;
                                            }
                                        }
                                    });
                            dispatcher.addAudioProcessor(audioProcessor);
                            dispatcher.run();
                        }
                    }
                }
                System.out.println(fileName + " exited at position " + currentPartTime / 10 + " " + ringbackType);
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    };
}
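The constructor derives two values from its arguments: the number of bytes in one 100 ms chunk and the number of silent chunks that end detection. The sketch below repeats that arithmetic for the values passed in the usage example; the class and method names (`FrameMath`, `bytesPer100Ms`, `silentChunks`) are made up for illustration.

```java
/** Reproduces the chunk-size arithmetic from the PickUp constructor (illustrative only). */
class FrameMath {
    /** Bytes in 100 ms of mono PCM audio. */
    static int bytesPer100Ms(float sampleRate, int sampleSizeInBits) {
        return (int) sampleRate * (sampleSizeInBits / 8) / 10;
    }

    /** Number of 100 ms chunks covered by a silence timeout given in milliseconds. */
    static int silentChunks(int silenceTimeoutMs) {
        return silenceTimeoutMs / 100;
    }

    public static void main(String[] args) {
        System.out.println(bytesPer100Ms(8000f, 16)); // 1600
        System.out.println(silentChunks(4500));       // 45
    }
}
```

With 8 kHz, 16-bit audio each chunk holds 800 samples, and a 4500 ms timeout corresponds to 45 consecutive silent chunks.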

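Test results

Ringback tone

A log line is printed every 0.1 s. The frequency features match expectations, and the 1 s on, 4 s off cadence matches expectations.

Color ring back tone

A log line is printed every 0.1 s. The audio is identified as music, and the point where the music features end matches the actual pickup time (it corresponds to the color ring back tone waveform above).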
