I received my M.Sc. degree in Electrical Engineering from Enginyeria i Arquitectura La Salle, Universitat Ramon Llull (URL), Barcelona, Spain in 2004. In 2010 I completed my Ph.D. studies, also at Universitat Ramon Llull.
I was with the Department of Communications and Signal Theory, Enginyeria i Arquitectura La Salle, as an Assistant Researcher from 2003 to March 2008. I then joined Phonetic Arts Ltd in Cambridge, UK, a company building high-quality synthetic speech for videogames.
After three exciting years as a researcher at Phonetic Arts Ltd, the company was acquired by Google and we moved to Google in London. There I became technical lead manager of the research team, and five years later I moved to the Research and Machine Intelligence team in Google NY.
This is a personal website. The opinions expressed here represent my own and not those of Google.
| Year | Milestone |
|------|-----------|
| 2003 | B.Sc. in Electrical Engineering |
| 2005 | M.Sc. in Electrical Engineering |
| 2009 | Research engineer at Phonetic Arts, Cambridge, UK |
| 2011 | Research scientist at Google UK |
| 2013 | Technical Lead Manager of the TTS research team at Google UK |
| 2016 | Research and Machine Intelligence (RMI) at Google NY |
| 2017 | Delivered AdaNet as part of our research at Google NY |
| 2019 | Technical lead of the AutoLX team, AutoML |
AdaNet is a framework for ensembling deep neural networks (DNNs). Two main ideas make AdaNet special:
- We can ensemble any type of DNN (e.g. recurrent and convolutional).
- We work with complexity metrics, so our algorithm comes with learning guarantees.
Have a look at the ICML’17 publication.
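To give a feel for the two ideas above, here is a toy sketch (not the actual AdaNet library API): an ensemble of heterogeneous candidate predictors is grown greedily, and each candidate's mixture weight is penalized by a per-candidate complexity score, an assumed stand-in for the complexity-based regularizer that yields the learning guarantees. All names and parameters here are illustrative assumptions.

```python
import numpy as np

def ensemble_objective(preds, weights, y, complexities, lam=0.01):
    """Squared loss of the weighted ensemble plus a penalty that scales
    each mixture weight by its candidate's complexity (a toy stand-in
    for AdaNet's complexity-based regularization)."""
    f = preds.T @ weights                        # ensemble prediction
    loss = np.mean((f - y) ** 2)
    penalty = lam * np.sum(complexities * np.abs(weights))
    return loss + penalty

def greedy_grow(candidates, y, complexities, lam=0.01, grid=None):
    """Consider one candidate at a time over a small weight grid and keep
    a weight only if the penalized objective improves (a crude version
    of a greedy ensemble-growing search)."""
    if grid is None:
        grid = np.linspace(0.0, 1.0, 21)
    weights = np.zeros(len(candidates))
    best = ensemble_objective(candidates, weights, y, complexities, lam)
    for i in range(len(candidates)):
        for w in grid:
            trial = weights.copy()
            trial[i] = w
            val = ensemble_objective(candidates, trial, y, complexities, lam)
            if val < best:
                best, weights = val, trial
    return weights, best
```

Because the penalty charges complex candidates more, the search prefers simple subnetworks that explain the data, which is the intuition behind the guarantees.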
Array processing can deal with most of the problems found in these situations. With multiple sources (e.g. meetings with several potential speakers), directional techniques can spatially filter the sound field to select the speaking user and reject other voices. When close-distance audio capture is not available, arrays help to mitigate multipath reflections, environmental noise and reverberation effects. The benefit is an increase in signal-to-noise ratio, resulting in higher system capacity.
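A minimal sketch of the directional filtering described above is delay-and-sum beamforming: each microphone channel is delayed to align signals arriving from the look direction, then the channels are averaged so the target adds coherently while noise from other directions averages out. Array geometry, sample rate and the speed of sound below are assumptions for the example.

```python
import numpy as np

def delay_and_sum(signals, mic_positions, steering_dir, fs, c=343.0):
    """Delay-and-sum beamformer.

    signals:       (n_mics, n_samples) array of microphone recordings.
    mic_positions: (n_mics, 3) microphone coordinates in meters.
    steering_dir:  unit vector pointing toward the desired source.
    fs:            sample rate in Hz; c: speed of sound in m/s.
    """
    n_mics, n_samples = signals.shape
    # Per-mic delay (seconds): projection of mic position onto the
    # look direction, shifted so all delays are non-negative.
    delays = mic_positions @ steering_dir / c
    delays -= delays.min()
    t = np.arange(n_samples) / fs
    out = np.zeros(n_samples)
    for m in range(n_mics):
        # Fractional delay via linear interpolation of the channel.
        out += np.interp(t - delays[m], t, signals[m], left=0.0, right=0.0)
    return out / n_mics
```

With N microphones and uncorrelated noise, the coherent average improves SNR by roughly a factor of N, which is the capacity gain mentioned above.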
Have a look at the following two reports:
What is a dialogue system?
A dialogue system is a computer system intended to converse with a human while maintaining a coherent structure. Dialogue systems can communicate using text, speech, or a combination of other modes on both the input and output channels.
How do we make a machine learn to speak in a dialogue?
In recent years, important advances have been made in machine learning research. Although great results and applications have been presented, we are still far from complete artificial solutions. One of the most interesting areas is reinforcement learning for solving problems framed as Markov Decision Processes. In this context, we model the dialogue system as a sequential decision process in a Markov environment and apply different reinforcement learning algorithms to automatically learn dialogue strategies.

Documents
My PhD thesis is entitled “HMM-based speech synthesis applied to Spanish and English, its applications and a hybrid approach” and is available online.