Like image recognition (essentially voice recognition in 2+ dimensions) and action recognition (3+ dimensions), it is a very difficult problem.
While better algorithms and faster hardware will keep improving these technologies, some believe there is a ceiling that can never be...