Intel: Software That Reads Lip Movements....

http://www.asyura.com/0304/bd25/msg/796.html
Posted by: 小耳  Date: May 1, 2003, 11:39:26

Intel: Software That Reads Lip Movements
http://headlines.yahoo.co.jp/hl?a=20030429-00000019-zdn-sci

Intel has released "Audio Visual Speech Recognition" (AVSR), speech recognition software that can read the movements of a speaker's lips.

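AVSR tracks the movements of the speaker's face and mouth and combines them with the audio, giving the computer data that lets it respond to voice commands even in noisy surroundings. The AVSR program is part of the OpenCV computer vision library, a collection of open-source applications and tools that let computers interpret visual data.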
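The article does not describe how AVSR combines the two streams; the Intel abstract further down names a coupled HMM. As a much simpler illustration of why lip data helps in noise, the sketch below uses weighted late fusion, a different and more basic technique: per-word log-likelihoods from a hypothetical audio recognizer and a hypothetical lip-reading recognizer are summed with a weight that shifts toward the visual stream as the SNR falls. The function names, the linear weight schedule, and its -5 dB / 25 dB end points are all assumptions made for illustration.

```python
from typing import Dict


def audio_weight(snr_db: float, lo: float = -5.0, hi: float = 25.0) -> float:
    """Hypothetical schedule (not Intel's): full trust in audio at `hi` dB,
    none at `lo` dB, linear in between and clamped to [0, 1]."""
    w = (snr_db - lo) / (hi - lo)
    return min(1.0, max(0.0, w))


def fuse(audio_ll: Dict[str, float], visual_ll: Dict[str, float], snr_db: float) -> str:
    """Pick the word with the highest SNR-weighted sum of audio and visual log-likelihoods."""
    lam = audio_weight(snr_db)
    scores = {
        word: lam * audio_ll[word] + (1.0 - lam) * visual_ll[word]
        for word in audio_ll
    }
    return max(scores, key=scores.get)


if __name__ == "__main__":
    # Made-up scores: noise has tipped the audio recognizer toward the wrong word.
    audio = {"open": -4.0, "close": -3.5}
    visual = {"open": -1.0, "close": -6.0}
    print(fuse(audio, visual, 25.0))  # clean audio dominates: close
    print(fuse(audio, visual, 0.0))   # lips dominate in heavy noise: open
```

The design choice here is deliberately crude: a single scalar weight per utterance. The coupled HMM in the abstract instead models the two streams jointly over time.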

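Computer makers have tried for years to popularize speech recognition software, but adoption has stalled because of insufficient computing power and the limited performance of the software itself.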

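Both of these factors, however, are changing rapidly. Processor clock speeds now average 1.5 GHz, and the most advanced chips reach 3 GHz. Research into applications driven by voice commands is also advancing.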

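One way to improve such applications is to incorporate visual signals into speech recognition, as Intel has done. Microsoft Research, for example, is developing GWindows, a prototype application that combines voice commands with hand movements to scroll through files and move windows.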

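In GWindows, a video camera mounted on the TV monitor captures objects, such as a hand or a pointer, moving within 20 inches of the screen. The application interprets the movement of the hand (or pointer) as computer commands: for example, placing a finger over a window and then moving it to the left moves the window to the left. When a voice command such as "scroll" is given, the computer combines the finger position with the voice command and scrolls down. No special gloves or similar equipment are required. (ZDNet)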
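The article describes GWindows only at the level of behavior. The sketch below shows one hypothetical way such gesture and voice events could be mapped to window actions once a camera tracker has already produced fingertip coordinates. The `Window` class, the event functions, the z-order convention, and the three-line scroll step are invented for illustration and are not Microsoft's implementation.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[int, int]  # fingertip position in screen pixels (assumed tracker output)


@dataclass
class Window:
    """Hypothetical on-screen window: position, size, and scroll offset in lines."""
    name: str
    x: int
    y: int
    w: int
    h: int
    scroll: int = 0

    def contains(self, px: int, py: int) -> bool:
        return self.x <= px < self.x + self.w and self.y <= py < self.y + self.h


def window_at(windows: List[Window], p: Point) -> Optional[Window]:
    """Return the topmost window under p (assumes the last list entry is on top)."""
    for win in reversed(windows):
        if win.contains(*p):
            return win
    return None


def on_hand_motion(windows: List[Window], start: Point, end: Point) -> Optional[Window]:
    """Finger placed over a window and then moved: drag the window by the same offset."""
    win = window_at(windows, start)
    if win is not None:
        win.x += end[0] - start[0]
        win.y += end[1] - start[1]
    return win


def on_voice(windows: List[Window], command: str, finger: Point, lines: int = 3) -> Optional[Window]:
    """Combine a spoken command with the current fingertip position."""
    win = window_at(windows, finger)
    if win is not None and command == "scroll":
        win.scroll += lines  # positive offset = scroll down
    return win
```

For example, with a window at x = 100, moving a finger from (150, 50) to (110, 50) drags the window 40 pixels to the left, and saying "scroll" while pointing at it advances its scroll offset.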
[Updated April 29, 11:51]

* If you are interested, see the page below.
  It shows the analysis indicators in a video (mpg).

Visual Interactivity: Audio-Visual Speech
--------------------------------------------
http://www.intel.com/research/mrl/research/avcsr.htm

Ara V Nefian, Lu Hong Liang, Xiao Xing Liu, Xiaobo Pi

The growing number of multimedia applications that require robust speech recognition has generated strong interest in audio-visual speech recognition (AVSR) systems. The use of visual features in AVSR is justified both by the bimodal (audio and visual) nature of speech production and by the need for features that are invariant to acoustic noise. The speaker-independent audio-visual continuous speech recognition system relies on a robust set of visual features obtained from accurate detection and tracking of the mouth region. The visual and acoustic observation sequences are then integrated using a coupled hidden Markov model (CHMM), shown in Figure 1. The statistical properties of the CHMM allow it to model the asynchrony between the audio and visual states while preserving their natural correlation over time. Experimental results on the XM2VTS database (295 speakers) show that the current system reduces the error rate of the audio-only speech recognition system by over 55% at an SNR of 0 dB (Figure 2).

Figure 1. A coupled HMM used in audio-visual integration
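Figure 1 is not reproduced here, but the general structure of a two-chain coupled HMM can be sketched. Each chain (audio and visual) has its own states, and the state of each chain at time t depends on the states of both chains at t-1; this is what lets the model capture audio/visual asynchrony while keeping the streams correlated. The forward algorithm below runs over the joint (audio, visual) state space. It assumes discrete observation symbols, independent initial states, and audio and video observation sequences of equal length; the abstract does not describe the real system's features, emission densities, or frame alignment.

```python
from itertools import product


def chmm_forward(pi_a, pi_v, trans_a, trans_v, emit_a, emit_v, obs_a, obs_v):
    """Forward algorithm for a two-chain coupled HMM with discrete observations.

    pi_a[i], pi_v[j]            initial state probabilities of each chain
    trans_a[i][j][k]            P(audio state k at t | audio i, visual j at t-1)
    trans_v[i][j][l]            P(visual state l at t | audio i, visual j at t-1)
    emit_a[k][o], emit_v[l][o]  per-chain emission probabilities
    Returns the joint likelihood P(obs_a, obs_v).
    """
    if len(obs_a) != len(obs_v):
        raise ValueError("this sketch assumes equal-length audio and visual sequences")
    states = list(product(range(len(pi_a)), range(len(pi_v))))

    # t = 0: the two chains start independently (an assumption of this sketch).
    alpha = {
        (i, j): pi_a[i] * pi_v[j] * emit_a[i][obs_a[0]] * emit_v[j][obs_v[0]]
        for i, j in states
    }
    for t in range(1, len(obs_a)):
        new_alpha = {}
        for k, l in states:
            # Each chain's next state is conditioned on BOTH chains' previous states.
            total = sum(
                alpha[i, j] * trans_a[i][j][k] * trans_v[i][j][l] for i, j in states
            )
            new_alpha[k, l] = total * emit_a[k][obs_a[t]] * emit_v[l][obs_v[t]]
        alpha = new_alpha
    return sum(alpha.values())
```

Working on the product state space keeps the sketch short but costs O(T·(N_a·N_v)^2) time; practical CHMM decoders use approximations to avoid this.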

Figure 2. Word error rate (WER) at different signal-to-noise ratio (SNR) levels for audio-only, video-only, and audio-visual speech recognition.
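Figure 2 reports word error rate, conventionally WER = (S + D + I) / N, where S, D, and I are the substitutions, deletions, and insertions in a minimum edit-distance alignment of the recognized words against a reference of N words. The "over 55%" in the abstract is most naturally read as a relative reduction, (WER_audio - WER_av) / WER_audio, though the abstract does not spell this out. A minimal sketch of both computations, with function names chosen for illustration:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # d[i][j] = minimum edits turning ref[:i] into hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / len(ref)


def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction of the baseline's errors removed by the new system."""
    return (baseline_wer - new_wer) / baseline_wer
```

For instance, recognizing "open a file" for the reference "open the file" is one substitution out of three words, a WER of about 33%.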

Figure 3. Speech recognition examples for an audio-visual sequence captured in clean (top) and noisy (bottom, SNR = 5 dB) acoustic conditions (mpeg files).
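The noisy conditions in Figures 2 and 3 are specified by SNR in decibels, SNR_dB = 10 log10(P_signal / P_noise), so 0 dB means the noise is as powerful as the speech and 5 dB means the speech power is about 3.16 times the noise power. The abstract does not say how the noisy test data were produced; a common approach, assumed here, is to scale a noise recording and add it to clean speech. The helper names below are illustrative.

```python
import math
from typing import List, Sequence, Tuple


def power(x: Sequence[float]) -> float:
    """Mean power of a sampled signal."""
    return sum(v * v for v in x) / len(x)


def snr_db(signal: Sequence[float], noise: Sequence[float]) -> float:
    """SNR in dB = 10 * log10(P_signal / P_noise)."""
    return 10.0 * math.log10(power(signal) / power(noise))


def mix_at_snr(signal: Sequence[float], noise: Sequence[float],
               target_db: float) -> Tuple[List[float], List[float]]:
    """Scale `noise` so that signal-to-noise power equals `target_db`, then add it.

    Returns (noisy_mixture, scaled_noise)."""
    if len(signal) != len(noise):
        raise ValueError("signal and noise must have the same length")
    # Required noise power is P_s / 10^(dB/10); amplitude scales with the square root.
    scale = math.sqrt(power(signal) / (power(noise) * 10 ** (target_db / 10.0)))
    scaled = [scale * n for n in noise]
    return [s + n for s, n in zip(signal, scaled)], scaled
```

Because the scaling is computed from measured powers, the same clean recording can be reused to build test sets at every SNR level on a plot such as Figure 2.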

★阿修羅♪　http://www.asyura2.com/ 　since 1995

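Quoting, reproducing, and linking to any content on this site, including the bulletin boards and mailing lists, is permitted. No confirmation email is needed.
Please display a link to the original source.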