I received my M.Sc. degree in Electrical Engineering from Enginyeria i Arquitectura La Salle, Universitat Ramon Llull (URL), Barcelona, Spain, in 2004. In 2010 I completed my Ph.D. studies, also at Ramon Llull.

I was with the Department of Communications and Signal Theory, Enginyeria i Arquitectura La Salle, as an Assistant Researcher from 2003 to March 2008. I then worked at Phonetic Arts Ltd in Cambridge, UK, a video game technology company aiming to produce high-quality synthetic speech.

After three exciting years as a researcher at Phonetic Arts Ltd, we were acquired and moved to Google in London. I became a technical lead manager of the research team, and five years later I moved to the Research and Machine Intelligence team at Google NY.

This is a personal website. The opinions expressed here represent my own and not those of Google.

I’m also a co-founder of www.charcuterieshack.com and the blog https://charcuterieideas.wordpress.com. Please visit us!

This website is made with Jekyll.


Year Position
2003 BSc in Electrical Engineering
2005 MSc in Electrical Engineering
2006 Meteosam project
2007 SALERO project
2008 Diploma of Advanced Studies
2009 Research engineer at Phonetic Arts, Cambridge, UK
2010 PhD
2011 Research scientist at Google UK
2013 Technical Lead Manager of the TTS research team at Google UK
2016 Research and machine intelligence at Google NY



Array processing

Array processing can address most of the problems that arise when speech is captured from a distance or from several talkers at once. Multiple sources (e.g. meetings with various potential speakers) can be spatially filtered using directional techniques to select the active speaker and reject other voices. When close-distance audio capture is not possible, arrays help to reduce multipath reflections, distortion from environmental noise and reverberation effects. The benefit is an increase in signal-to-noise ratio, resulting in higher system capacity.
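As a rough illustration of the spatial filtering described above, here is a minimal delay-and-sum beamformer sketch in Python. It is not taken from the reports; it assumes a uniform linear array, far-field plane waves, and a speed of sound of 343 m/s, and every name in it (`steering_delays`, `delay_signal`, `delay_and_sum`, `demo`) is hypothetical. Delays are applied as circular phase shifts in the frequency domain, which is convenient for a toy simulation but would need proper framing on real recordings.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def steering_delays(mic_positions, angle_deg, c=SPEED_OF_SOUND):
    """Arrival delay (s) at each microphone for a far-field plane wave.

    mic_positions: positions along the array axis in metres.
    angle_deg: direction of arrival measured from broadside.
    """
    return np.asarray(mic_positions) * np.sin(np.deg2rad(angle_deg)) / c


def delay_signal(x, delay, fs):
    """Delay x by `delay` seconds with a linear phase shift (circular delay)."""
    n = len(x)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    spectrum = np.fft.rfft(x) * np.exp(-2j * np.pi * freqs * delay)
    return np.fft.irfft(spectrum, n=n)


def delay_and_sum(channels, mic_positions, angle_deg, fs):
    """Undo the steering delays for angle_deg, then average the channels."""
    delays = steering_delays(mic_positions, angle_deg)
    aligned = [delay_signal(ch, -d, fs) for ch, d in zip(channels, delays)]
    return np.mean(aligned, axis=0)


def demo(fs=16000, n=1600, num_mics=8, spacing=0.05, seed=0):
    """Simulate a target at 0 degrees and an interferer at 60 degrees."""
    rng = np.random.default_rng(seed)
    mics = np.arange(num_mics) * spacing
    target = rng.standard_normal(n)
    interferer = rng.standard_normal(n)
    channels = [
        delay_signal(target, dt, fs) + delay_signal(interferer, di, fs)
        for dt, di in zip(steering_delays(mics, 0.0), steering_delays(mics, 60.0))
    ]
    output = delay_and_sum(channels, mics, 0.0, fs)
    # Mean squared error against the clean target, before and after beamforming.
    err_single = np.mean((channels[0] - target) ** 2)
    err_beam = np.mean((output - target) ** 2)
    return err_single, err_beam


if __name__ == "__main__":
    single, beam = demo()
    print(f"residual interference: single mic {single:.3f}, beamformed {beam:.3f}")
```

Steering toward the target keeps its copies in phase while the interferer's copies are averaged out of phase, which is where the signal-to-noise gain comes from. More microphones or adaptive weights would improve the rejection, at the cost of extra hardware or computation.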

Have a look at the following two reports:

Dialogue systems

What is a dialogue system?

A dialogue system is a computer system intended to converse with a human using a coherent structure. Dialogue systems can communicate using text, speech, or a combination of other modes on both the input and output channels.

How do we make a machine learn to speak in a dialogue?

Important advances have recently been made in research on artificial intelligence and machine learning algorithms. Although impressive results and applications have been presented, we are still far from complete artificial solutions. One of the most interesting fields in machine learning is reinforcement learning for solving problems framed as Markov decision processes (MDPs). In this context, we model a dialogue system as a sequential decision process in a Markov environment and apply different reinforcement learning algorithms to automatically learn different dialogue strategies.

Documents

Have a look at my Master's thesis and at this report.
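To make the reinforcement learning formulation of dialogue described above more concrete, here is a minimal tabular Q-learning sketch in Python. It is not the method from the thesis or the report; it is a toy under loudly stated assumptions: a two-slot form-filling dialogue, a simulated user whose answer is understood with probability 0.8, a cost of -1 per question, and a reward of +20 or -20 for closing the dialogue with or without all slots filled. All names (`step`, `train`, `greedy_policy`) and all numeric values are hypothetical.

```python
import random

SLOTS = 2
ACTIONS = ["ask_slot1", "ask_slot2", "close"]


def step(state, action, rng, p_understood=0.8):
    """Simulated user and recogniser. Returns (next_state, reward, done)."""
    if action == "close":
        # Closing ends the dialogue; it only pays off if every slot is filled.
        return state, (20.0 if all(state) else -20.0), True
    slot = ACTIONS.index(action)
    filled = list(state)
    if rng.random() < p_understood:
        filled[slot] = True
    return tuple(filled), -1.0, False  # each question costs one turn


def train(episodes=20000, alpha=0.05, gamma=0.95, epsilon=0.2,
          max_turns=20, seed=0):
    """Tabular Q-learning with an epsilon-greedy behaviour policy."""
    rng = random.Random(seed)
    q = {}

    def best(state):
        return max(ACTIONS, key=lambda a: q.get((state, a), 0.0))

    for _episode in range(episodes):
        state = (False,) * SLOTS
        for _turn in range(max_turns):
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = best(state)
            nxt, reward, done = step(state, action, rng)
            # Bootstrapped target: reward plus discounted best next value.
            target = reward
            if not done:
                target += gamma * q.get((nxt, best(nxt)), 0.0)
            old = q.get((state, action), 0.0)
            q[(state, action)] = old + alpha * (target - old)
            if done:
                break
            state = nxt
    return q


def greedy_policy(q):
    """Best learned action for each of the four slot configurations."""
    states = [(a, b) for a in (False, True) for b in (False, True)]
    return {s: max(ACTIONS, key=lambda a: q.get((s, a), 0.0)) for s in states}


if __name__ == "__main__":
    for state, action in greedy_policy(train()).items():
        print(state, "->", action)
```

In this toy setting the learned strategy is to keep asking for whichever slot is still missing and to close only once both are filled. A realistic system would have a much larger state space (confidence scores, confirmations, user goals), which is where the choice of learning algorithm starts to matter.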


My PhD is entitled “HMM-based speech synthesis applied to Spanish and English, its applications and a hybrid approach” and is available online.