
Welcome to Xavi's page

I have worked at Google NY, in the Research and Machine Intelligence team, since April 2016.

About me

I received my M.Sc. degree in Electrical Engineering from Enginyeria i Arquitectura La Salle, Universitat Ramon Llull (URL), Barcelona, Spain, in 2004. In 2010 I completed my Ph.D. studies, also at Ramon Llull.

I was with the Department of Communications and Signal Theory, Enginyeria i Arquitectura La Salle, as an Assistant Researcher from 2003 to March 2008. I then worked at Phonetic Arts Ltd in Cambridge, UK, a video games company that aimed to produce high-quality synthetic speech.

After three exciting years as a researcher at Phonetic Arts Ltd, we were acquired and moved to Google in London. I became a technical lead manager of the research team, and five years later I moved to the Research and Machine Intelligence team at Google NY.

This is a personal website. The opinions expressed here represent my own and not those of Google.

Brief CV

Year Position
2003 BSc in Electrical Engineering
2005 MSc in Electrical Engineering
2006 Meteosam project
2007 SALERO project
2009 Research engineer at Phonetic Arts, Cambridge, UK
2010 PhD
2011 Research scientist at Google UK
2013 Technical Lead Manager of the TTS research team at Google UK
2016 Research and Machine Intelligence (RMI) at Google NY
2017 We delivered AdaNet as part of our research at Google NY
2019 Technical lead of the AutoLX team, AutoML




AdaNet is a framework for ensembling deep neural networks (DNNs). Two main ideas make AdaNet special:

  • We can ensemble any type of DNN (e.g. recurrent and convolutional).
  • We work with complexity measures, so our algorithm comes with learning guarantees.

Have a look at the ICML’17 publication
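
To make the second idea concrete, here is a minimal sketch of a complexity-regularized ensemble objective: the ensemble output is a weighted sum of subnetwork outputs, and each mixture weight is penalized in proportion to a complexity estimate of its subnetwork. This is a simplified illustration, not the AdaNet library API and not the exact objective from the paper. The function names, the logistic surrogate loss, the `complexities` values, and the `lam` and `beta` parameters are all assumptions made for the example.

```python
import math

def ensemble_output(weights, subnetwork_outputs):
    """Weighted sum of the subnetwork outputs for a single example."""
    return sum(w * h for w, h in zip(weights, subnetwork_outputs))

def logistic_loss(margin):
    """Surrogate loss log(1 + exp(-margin)), computed in a numerically stable way."""
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))

def regularized_objective(weights, outputs, labels, complexities,
                          lam=0.1, beta=0.01):
    """Average surrogate loss plus a complexity-weighted L1 penalty.

    outputs[i][k] is the output of subnetwork k on example i,
    labels[i] is -1 or +1, and complexities[k] is a hypothetical
    complexity estimate for subnetwork k (an assumption of this sketch).
    """
    data_term = sum(
        logistic_loss(y * ensemble_output(weights, h))
        for h, y in zip(outputs, labels)
    ) / len(labels)
    # More complex subnetworks pay a larger price per unit of weight.
    penalty = sum((lam * r + beta) * abs(w)
                  for w, r in zip(weights, complexities))
    return data_term + penalty
```

With this kind of penalty, a more complex subnetwork only earns a large weight if it reduces the data term enough to pay for its complexity.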

Array processing

Array processing can deal with most of the problems found in multi-speaker and distant-capture situations. Multiple sources (e.g. meetings with several potential speakers) can be spatially filtered using directional techniques to select one speaking user and reject other voices. When audio cannot be captured at close distance, arrays help to filter out multipath reflections, distortion from environmental noise, and reverberation effects. The benefit obtained is an increase in signal-to-noise ratio, resulting in higher system capacity.
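
One of the simplest directional techniques is a delay-and-sum beamformer, sketched below as an illustration (it is not necessarily the method used in the reports). The sketch assumes a uniform linear microphone array, a far-field source, whole-sample delays, and a speed of sound of 343 m/s. The function names and parameters are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature

def steering_delays(num_mics, spacing_m, angle_deg, sample_rate):
    """Per-microphone arrival delays, in whole samples, for a far-field source.

    Assumes a uniform linear array; the angle is measured from broadside.
    """
    theta = math.radians(angle_deg)
    return [
        round(m * spacing_m * math.sin(theta) / SPEED_OF_SOUND * sample_rate)
        for m in range(num_mics)
    ]

def delay_and_sum(channels, delays):
    """Undo each channel's delay, then average the aligned samples."""
    shift = min(delays)
    offsets = [d - shift for d in delays]  # make all offsets non-negative
    length = min(len(ch) - off for ch, off in zip(channels, offsets))
    return [
        sum(ch[n + off] for ch, off in zip(channels, offsets)) / len(channels)
        for n in range(length)
    ]
```

Signals arriving from the steered direction add coherently after alignment, while sources from other directions and uncorrelated noise are averaged down, which is where the signal-to-noise gain comes from.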

Have a look at the following two reports:

Dialogue systems

What is a dialogue system?

A dialogue system is a computer system intended to converse with a human in a coherent structure. Dialogue systems can communicate using text, speech, or a combination of other modes on both the input and output channels.

How do we make a machine learn to speak in a dialogue?

Important advances have recently been made in research on artificial intelligence algorithms, or machine learning. Although great results and applications have been presented, we are still far from complete artificial solutions. One of the most interesting fields in machine learning is reinforcement learning, which solves problems framed as Markov decision processes. In this context, we define a dialogue system as a sequential process in a Markov environment and apply different reinforcement learning algorithms to learn different dialogue strategies automatically.

Documents

Have a look at my Master's thesis and at this report.
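
As a rough illustration of the idea of learning a dialogue strategy in a Markov decision process, the sketch below applies tabular Q-learning to a toy slot-filling dialogue. It is not the setup from the thesis or the report: the two-slot state, the three actions, the simulated user, the rewards, and all hyperparameters are assumptions chosen to keep the example small.

```python
import random

ACTIONS = ("ask_slot1", "ask_slot2", "close")

def step(state, action, rng, p_understood=0.8):
    """Simulated user (hypothetical): returns (next_state, reward, done)."""
    slot1, slot2 = state
    if action == "close":
        # Closing is only rewarded once both slots are filled.
        return state, (10.0 if slot1 and slot2 else -10.0), True
    understood = rng.random() < p_understood
    if action == "ask_slot1":
        slot1 = slot1 or understood
    else:
        slot2 = slot2 or understood
    return (slot1, slot2), -1.0, False  # every question costs one turn

def q_learning(episodes=5000, alpha=0.1, gamma=0.95, epsilon=0.2,
               max_turns=20, seed=0):
    """Tabular Q-learning with an epsilon-greedy behaviour policy."""
    rng = random.Random(seed)
    states = [(a, b) for a in (False, True) for b in (False, True)]
    q = {(s, a): 0.0 for s in states for a in ACTIONS}
    for _ in range(episodes):
        state = (False, False)
        for _ in range(max_turns):
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            next_state, reward, done = step(state, action, rng)
            best_next = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + gamma * best_next
                                           - q[(state, action)])
            state = next_state
            if done:
                break
    return q

def greedy_policy(q):
    """Dialogue strategy that picks the highest-valued action in each state."""
    states = {s for s, _ in q}
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in states}
```

Calling `greedy_policy(q_learning())` returns a strategy as a mapping from dialogue state to system action. In this toy setting the learned strategy keeps asking until both slots are filled and then closes the dialogue.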


My PhD is entitled “HMM-based speech synthesis applied to Spanish and English, its applications and a hybrid approach” and is available online.