I am interested in applying cross-disciplinary techniques from the fields of machine learning, signal processing, information theory, econometrics and statistical computing to real-life problems. The approach I take tends to be data-led and not focused on any particular methodology.
The problems I am interested in are generally characterised by the presence of both deterministic and stochastic components, a low signal-to-noise ratio and highly non-linear loss functions. As a result, many complex techniques and parametric approaches do not appear to work any better than some trivial ones. Financial data in particular seems to be especially noisy and lacking in structure (unlike natural language processing, for example, where there is a great deal of structure to exploit), which has led to the dominance of linear and empirical trading models.
Because Bayes' rule can be applied sequentially, with each posterior becoming the prior for the next observation, Bayesian methods lend themselves naturally to prediction problems. This area seems likely to grow in importance, especially as computing power increases. Most of my work is in MATLAB, though I also use some C#.
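As an illustration of this sequential structure, here is a minimal sketch of a conjugate Beta-Bernoulli model. It assumes a binary outcome, such as a hypothetical up/down move, with an unknown probability `p` and a uniform Beta(1, 1) prior. Each update is a single application of Bayes' rule, and the posterior mean gives the one-step-ahead prediction. The class and function names are illustrative only and do not come from any particular library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BetaPosterior:
    """Beta(alpha, beta) belief about the probability p of a binary event."""
    alpha: float = 1.0  # prior pseudo-count of successes (uniform prior by default)
    beta: float = 1.0   # prior pseudo-count of failures

    def update(self, outcome: int) -> "BetaPosterior":
        # Bayes' rule with a conjugate prior: a success (1) adds one to alpha,
        # a failure (0) adds one to beta. The result is the next step's prior.
        return BetaPosterior(self.alpha + outcome, self.beta + (1 - outcome))

    def mean(self) -> float:
        # Posterior mean of p, which is also the predictive probability
        # that the next outcome is a success.
        return self.alpha / (self.alpha + self.beta)


def sequential_predictions(outcomes):
    """Return the prediction made *before* each observation, plus the final belief."""
    belief = BetaPosterior()
    preds = []
    for y in outcomes:
        preds.append(belief.mean())  # forecast using only past data
        belief = belief.update(y)    # then absorb the new observation
    return preds, belief


if __name__ == "__main__":
    data = [1, 0, 1, 1, 0, 1, 1, 1]  # hypothetical sequence of outcomes
    preds, final = sequential_predictions(data)
    print([round(p, 3) for p in preds])
    print(final, final.mean())
```

With six successes and two failures, the final belief is Beta(7, 3). This is the same posterior that a single batch update on all eight outcomes would give. That equivalence is why the sequential form costs nothing in accuracy while letting predictions be refreshed as each new data point arrives.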
Some of the academic research I have carried out with colleagues at Cambridge has been commercialized into BMLL Technologies Ltd, a company that specializes in the processing and analysis of limit order book data.
Time Series Models
Time series analysis can be applied to many real-world problems for which recorded data already exists, for example in finance, meteorology and consumer behaviour. Commonly used techniques include latent state-space models, such as hidden Markov models (with discrete hidden states) or the Kalman filter (with continuous, linear Gaussian hidden states).
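To make the Kalman filter concrete, here is a minimal sketch for the simplest linear Gaussian state-space model, the scalar "local level" model. It assumes that a hidden level follows a random walk and is observed with additive noise, and that the process variance `q` and observation variance `r` are known. In practice these would be estimated. The function name and parameters are illustrative, not from any particular toolbox.

```python
import math
import random


def kalman_local_level(observations, q, r, m0=0.0, p0=1e6):
    """Filter a scalar local-level model:
        x_t = x_{t-1} + w_t,  w_t ~ N(0, q)   (hidden random-walk level)
        y_t = x_t + v_t,      v_t ~ N(0, r)   (noisy observation)
    Returns the filtered means and variances of x_t given y_1..y_t.
    """
    m, p = m0, p0  # vague prior on the initial level
    means, variances = [], []
    for y in observations:
        # Predict: a random walk keeps the mean and adds process variance.
        p_pred = p + q
        # Update: weight the new observation by the Kalman gain k in [0, 1].
        k = p_pred / (p_pred + r)
        m = m + k * (y - m)
        p = (1.0 - k) * p_pred
        means.append(m)
        variances.append(p)
    return means, variances


if __name__ == "__main__":
    random.seed(0)
    q, r = 0.01, 1.0  # hypothetical noise variances: low signal-to-noise ratio
    level, truth, obs = 0.0, [], []
    for _ in range(200):
        level += random.gauss(0.0, math.sqrt(q))
        truth.append(level)
        obs.append(level + random.gauss(0.0, math.sqrt(r)))
    means, _ = kalman_local_level(obs, q, r)
    rmse = lambda est: math.sqrt(sum((a - b) ** 2 for a, b in zip(est, truth)) / len(truth))
    print("raw observations RMSE:", rmse(obs))
    print("filtered estimate RMSE:", rmse(means))
```

The gain `k` sets how far each noisy observation moves the estimate. When `r` is large relative to `q`, as in low signal-to-noise settings, the filter trusts new data less and smooths more heavily.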
Many of the most interesting problems in biology have a clear mathematical basis, for example decoding the 3D proteome, understanding the network structure of the brain and reverse engineering the processes of cellular energy production. These wonderfully sophisticated systems present both significant challenges and opportunities for the data scientist.
By communicating with each other in person and via the internet, phone, social media and other networks, humans create a dataset that is perhaps unparalleled in richness and complexity. This dataset can be captured, stored and analyzed, allowing inferences to be drawn about both individual and group behaviour.
Many of the world's most popular games involve chance. They require the player to separate genuine information from random patterns and to estimate expectations and higher moments accurately. The human brain is remarkably good at such tasks, and computers have historically struggled to match strong human players in many of these games. I am interested in automating play and predicting outcomes using statistical methods.