Thiago Oliveira

Thiago de Paula Oliveira

Statistician

About Me

I am Thiago de Paula Oliveira, a statistician at AbacusBio with 14+ years of experience turning experimental, genomic, and performance data into decisions. I specialise in advanced mixed-model and Bayesian analytics, the development of economic and sustainability selection indices, and the delivery of reproducible analytics products, from R/C++ codebases to Dockerised dashboards. Whether the brief is accelerating genetic gain, improving farm-system resilience, or supporting athlete health, I focus on rigour, transparency, and decision-ready outputs.


Interests

Statistics and biostatistics
Concordance analysis
Multilevel and forecast models
Generalized mixed-effects models
Longitudinal data
Quantitative genetics & breeding analytics
Agricultural decision-support dashboards
Reproducible analytics pipelines for agri-genomics
Economic and sustainability selection indices

Education

PhD in Statistics
University of São Paulo
MSc in Statistics
University of São Paulo
BSc in Agricultural Engineering
University of São Paulo

📈 Expertise and Research

I am a statistician with 14+ years of experience turning noisy experimental, genomic, and performance data into decisions. After completing my PhD in Statistics at the University of São Paulo, I specialised in advanced mixed-model/Bayesian analytics and in the development of economic and sustainability selection indices that keep breeding programmes accountable.

As a Consultant Statistician at AbacusBio, I lead cross-functional teams that deliver genetic-evaluation pipelines, automated QC/ETL workflows, and decision dashboards for livestock, crop, and agri-tech partners. That work depends on production-grade code in R/C++/Bash, Docker-based reproducible environments, and early collaboration between domain scientists and data engineers.

Earlier, I held a Marie Skłodowska-Curie COFUND fellowship at the Roslin Institute (University of Edinburgh), built predictive health and sports-analytics products at the Insight Centre (NUI Galway) and Orreco, and lectured in statistics at USP. Along the way I have published across Nature-branded journals, advised national breeding programmes, and mentored teams on delivering transparent, auditable analyses.

Whether the brief is accelerating genetic gain, improving farm-system resilience, or supporting athlete health, my bias is toward rigour, reproducibility, and decision-ready outputs. Browse my recent publications and projects, and get in touch if you would like to collaborate or have a specific challenge in mind.

Featured Publications

Greenhouse Gas Mitigation

Breeding for sustainability: Development of an index to reduce greenhouse gas in dairy cattle

Documents the Canadian GHG selection index for dairy cattle, showing how methane, feed efficiency, and maintenance traits are weighted to cut emissions without sacrificing profit.

Mar 14, 2025

R Package

A Hierarchical Approach for Evaluating Athlete Performance with an Application in Elite Basketball

The ON score is a comprehensive athlete rating built from mixed regression models and PCA on four seasons of NBA data.

Jan 24, 2024

R Package

Developing best practices for genotyping-by-sequencing analysis in the construction of linkage maps

Benchmarked the Reads2Map workflows and set best-practice defaults so breeders can build reliable linkage maps from noisy GBS data.

Oct 27, 2023

R Package

Comparison of Markerless and Marker-based Motion Capture Systems using 95% Functional Limits of Agreement in a Linear Mixed-Effects Modelling Framework

Markerless motion capture challenges traditional systems. A mixed-effects model assesses their agreement for reliable human movement analysis.

Jul 15, 2023

Quantitative Genetics

Pedigree-based Animal Models Using Directed Acyclic Graphs

Graphical DAG-based formulations help quantitative geneticists prototype, extend, and explain animal models with transparent assumptions.

Jun 10, 2023

See all publications

Recent Publications

C. Richardson, P. Amer, M. Post, Thiago de Paula Oliveira, K. Grant, J. Crowley, C. Quinton, F. Miglior, A. Fleming, C. F. Baes, F. Malchiodi (2025). Breeding for sustainability: Development of an index to reduce greenhouse gas in dairy cattle. Animal.


Thiago de Paula Oliveira, John Newell (2024). A Hierarchical Approach for Evaluating Athlete Performance with an Application in Elite Basketball. Scientific Reports.


Cristiane Hayumi Taniguti, Lucas Mitsuo Taniguti, Rodrigo Rampazo Amadeu, Jeekin Lau, Gabriel De Siqueira Gesteira, Thiago de Paula Oliveira, Getulio Caixeta Ferreira, Guilherme Da Silva Pereira, David Byrne, Marcelo Mollinari, Oscar Riera-Lizarazu, Antonio Augusto Franco Garcia (2023). Developing best practices for genotyping-by-sequencing analysis in the construction of linkage maps. GigaScience.


Kishor Das, Thiago de Paula Oliveira, John Newell (2023). Comparison of Markerless and Marker-based Motion Capture Systems using 95% Functional Limits of Agreement in a Linear Mixed-Effects Modelling Framework. Scientific Reports.


Thiago de Paula Oliveira, Ivan Pocrnic, Gregor Gorjanc (2023). Pedigree-based Animal Models Using Directed Acyclic Graphs. Under consideration in Livestock Science.



Recent & Upcoming Talks

Breeding Programme

Quantifying the Drivers of Genetic Change in Plant Breeding

Evaluates how germplasm origin, within each heterotic pool of a maize breeding programme, contributes to the additive genetic mean and variance, summarised over the years.

Sep 21, 2022

Genetic Trend

A method for partitioning trends in genetic mean and variance to understand breeding practices

A partitioning method that quantifies the contribution of different groups to genetic variance and its impact on a breeding programme.

Jul 27, 2022

ggplot2

Visualization and Data Structure

A discussion of the principles of the grammar of graphics and tidy data, with applications using the tidyverse.

Sep 20, 2021

COVID-19

Global Short-Term Forecasting of Covid-19 Cases

Accurate short-term forecasting is vital to support country-level policy making during the COVID-19 outbreak.

Nov 12, 2020

COVID-19

Global Short-Term Forecasting of Covid-19 Cases

Accurate short-term forecasting is vital to support country-level policy making during the COVID-19 outbreak.

Jun 1, 2020

See all events

Benchmarking Kendall's Tau in R and Rcpp

Aug 9, 2025

Implement and benchmark a fast Kendall’s tau-a in C++ via Rcpp against base R, discuss tie handling (tau-b), and when to move from R to C++.


From prototype to production, choosing the right R and C++ tool in Rcpp

Jul 6, 2025

What each tool solves, when to reach for it, and ready-to-paste code.


Effects of compression techniques on data read/write performance

Mar 15, 2025

This post explores the performance of various compression techniques in R for reading and writing operations, highlighting file size, speed, and memory usage.


Comparing data read and write performance in R

Jul 6, 2024

This post explores the performance of various data formats in R for reading and writing operations, highlighting file size, speed, and memory usage.


Exploring polynomial, fractional polynomial, and spline models

Jan 19, 2024

The ability to accurately model and interpret complex data sets is paramount. This technical exploration delves into three sophisticated modelling techniques: polynomial models, fractional polynomials, and spline models. Each of these models serves as a fundamental tool in the statistical toolkit, enabling us to capture and understand the intricacies of linear and non-linear relationships inherent in real-world data.


See all

🎓 Connect with an expert statistician

I focus on advanced statistical modelling, economic and sustainability selection indices, interactive dashboards, and reproducible (Dockerised) pipelines that deliver decision-ready insights for agriculture, genetics, and sports performance.

Areas of impact

Agriculture. Design and analyse agronomic and farm-systems experiments, including multi-environment trials and spatial models, to optimise yield, resource use, and sustainability.

Genetics. Build genetic-evaluation pipelines and economic and sustainability selection indices that maximise genetic gain and inform breeding objectives.

Sports analytics. Develop tools and applications that enhance athlete performance through data-driven insights.

Explore my publications, projects, and recent work. If you are interested in collaborating or would like to learn more, please get in touch.

Stay connected and follow my work in statistical modelling and data analysis:

Google Scholar · GitHub
