# Logistic regression with categorical data in Ruby

I had some fun analysing the shelter animal data from Kaggle using the Ruby gems daru for data wrangling and statsample-glm for model fitting. In this blog post, I want to demonstrate that data wrangling and statistical modeling are not the exclusive domain of Python and R, but are possible in Ruby too (though currently to a much lesser extent).

# My first R package on CRAN

A couple of weeks ago I released my first R package on CRAN. For me it turned out to be a far less painful process than many people on the internet portray it to be (even though the package uses quite a lot of C++ code via Rcpp and RcppEigen, and even though R CMD check returns two NOTEs). Some of the most helpful resources for publishing the package were:

# "Testing Statistical Hypotheses" and "Theory of Point Estimation" impressions

I spent much of the last two months reading Lehmann & Romano “Testing Statistical Hypotheses” (3rd ed.) and Lehmann & Casella “Theory of Point Estimation” (2nd ed.), abbreviated TSH and TPE. The following is a collection of random facts and observations I made while reading TSH and TPE. The choice of topics is biased towards applications in regression models.

# Statistical linear mixed models in Ruby with mixed_models (GSoC2015)

Google Summer of Code 2015 is coming to an end. During this summer, I have learned too many things to list here about statistical modeling, Ruby and software development in general, and I had a lot of fun in the process!

# A (naive) application of linear mixed models to genetics

The following shows an application of class LMM from the Ruby gem mixed_models to SNP data (single-nucleotide polymorphism) with known pedigree structures. The family information is prior knowledge that we can model in the random effects of a linear mixed effects model.

# P-values and confidence intervals

A few days ago I started working on hypothesis tests and confidence intervals for my project mixed_models, and I was pretty surprised by certain things.

# MixedModels Formula Interface and Categorical Variables

I made some more progress on my Google Summer of Code project MixedModels. The linear mixed models fitting method is now capable of handling non-numeric (i.e., categorical) predictor variables, as well as interaction effects. Moreover, I gave the method a user-friendly R-formula-like interface. I will present these new capabilities of the Ruby gem with an example. Then I will briefly describe their implementation.

During the last two weeks I made some progress on my Google Summer of Code project. The Ruby gem is now capable of fitting linear mixed models. In this short blog post I want to give an example, and compare the results I get in Ruby to those obtained by lme4 in R.