Benchmarking Kendall's Tau in R and Rcpp
Implement and benchmark a fast Kendall’s tau-a in C++ via Rcpp against base R, discuss tie handling (tau-b), and when to move from R to C++.
PhD in Statistics
University of São Paulo
MSc in Statistics
University of São Paulo
BSc in Agricultural Engineering
University of São Paulo
As a Consultant Statistician at AbacusBio, I deliver statistical and analytical solutions for plant and animal breeding programmes, lead cross-functional delivery of genetic-evaluation pipelines, and build traceable workflows that link raw data, cleaned datasets, model inputs, and final outputs. That work depends on production-grade code in R, C++, Bash, and SQL, Docker-based environments, and clear technical documentation.
Earlier, I held a Marie Skłodowska-Curie COFUND fellowship at the Roslin Institute, worked on public-health and sports analytics projects at the National University of Ireland Galway, and taught statistics and quantitative methods at the University of São Paulo. Across those roles I developed software, dashboards, and analytical pipelines designed for transparency, reuse, and auditability.
I combine advanced statistical modelling with reproducible analytical tooling to make complex data more usable for researchers, analysts, and stakeholders. Browse my recent publications and projects, and get in touch if you would like to collaborate.
As a statistician, I teach SQL from zero using a realistic agriculture dataset, covering tables, keys, joins, summaries, quality checks, and a simple treatment-effect analysis.
What each tool solves, when to reach for it, and ready-to-paste code.
This post explores the performance of various compression techniques in R for reading and writing operations, highlighting file size, speed, and memory usage.
This post explores the performance of various data formats in R for reading and writing operations, highlighting file size, speed, and memory usage.
I focus on data quality, statistical modelling, reusable analytical tooling, and reproducible workflows that turn complex datasets into reliable, decision-ready outputs.
Data stewardship. Design reproducible analytical workflows, automated QC/ETL pipelines, and structured computational environments that support reliability and traceability.
Data quality and usability. Build dashboards, software packages, and decision-support tools that improve accessibility, consistency, and interpretability of complex data.
Technical leadership. Lead cross-functional teams, mentor colleagues, and translate quantitative work into practical outputs for researchers, analysts, and stakeholders.
Explore my publications, projects, and recent work. If you are interested in collaborating or would like to learn more, please get in touch.
Stay connected and follow my work in statistical modelling and data analysis: