This directory contains a C program used in the manuscript "Clustering high-dimensional mixed data to uncover sub-phenotypes: joint analysis of phenotypic and genotypic data" by D. McParland, C.M. Phillips, L. Brennan, H.M. Roche and I.C. Gormley, published in Statistics in Medicine in 2018.

The files are set up to fit the two-component model with an 8-dimensional latent trait. This model was fitted to the LIPGENE data, and the results are presented in Section 4 of the manuscript. The data used in the paper cannot be provided.

INSTRUCTIONS TO RUN THE PROGRAM
-------------------------------
-------------------------------

The program is compiled using the Makefile in this directory: type 'make' in the terminal. Once it has compiled, run it by typing './MFAMD_main'.

Starting values and other initialisations must be altered in the C files described below.


FILE DESCRIPTIONS
-----------------
-----------------


1. Input (Folder)
-----------------
This folder contains the observed data and the initial values of the model parameters required by the program:

* Gamma.txt -- cut-off parameters.
* ind.txt -- clustering labels.
* InvPsi.txt -- diagonal matrix of inverse psi parameters for continuous variables.
* lambda.txt -- loadings matrices. 
* ng.txt -- number of observations in each cluster.
* pi.txt -- mixing weights.
* theta.txt -- latent traits.
* Y.txt -- observed data. 
* Z.txt -- latent continuous data.
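As an illustration of the input format, the sketch below reads a matrix of numbers from a text file into a row-major C array. It assumes, without having checked the files, that each .txt file holds plain whitespace-separated numbers and that the dimensions (for example N x J for Y.txt) are known in advance. The names read_matrix_stream and read_matrix_file are hypothetical; they are not the routines in FileOps.c.

```c
#include <stdio.h>

/* Hypothetical helper: read nrow*ncol whitespace-separated numbers
   from an open stream into a row-major array x (x[i*ncol + j]).
   Returns 0 on success, -1 if the stream ends early or contains a
   token that is not a number. */
int read_matrix_stream(FILE *fp, double *x, int nrow, int ncol)
{
    for (int i = 0; i < nrow; i++)
        for (int j = 0; j < ncol; j++)
            if (fscanf(fp, "%lf", &x[i * ncol + j]) != 1)
                return -1;
    return 0;
}

/* Hypothetical wrapper taking a path such as "Input/pi.txt".
   Returns -1 if the file cannot be opened or read. */
int read_matrix_file(const char *path, double *x, int nrow, int ncol)
{
    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return -1;
    int status = read_matrix_stream(fp, x, nrow, ncol);
    fclose(fp);
    return status;
}
```

For instance, with G = 2 and an assumed layout of one line of weights, read_matrix_file("Input/pi.txt", pi, 1, 2) would fill a two-element array pi. Taking a FILE pointer in the core routine keeps it easy to test with tmpfile().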


2. Output (Folder)
------------------
The program writes its output to this folder. These files can then be imported into another statistical software package, such as R, for further analysis.
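A minimal sketch of one convenient output layout, assuming one line per retained iteration with values separated by spaces, which R's read.table() can import directly. The function append_row is hypothetical, and the layout actually used by the program may differ.

```c
#include <stdio.h>

/* Hypothetical helper: append one saved draw (a vector of length n)
   to fp as a single line of space-separated values.
   Returns 0 on success, -1 on a write error. */
int append_row(FILE *fp, const double *v, int n)
{
    for (int k = 0; k < n; k++) {
        /* separator before every value except the first */
        if (fprintf(fp, "%s%.10g", k > 0 ? " " : "", v[k]) < 0)
            return -1;
    }
    return fputc('\n', fp) == EOF ? -1 : 0;
}
```

Calling append_row once per retained iteration produces a file with one row per draw and one column per parameter element.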


3. FileOps.c
------------
This file contains functions to perform various file operations (such as reading from and writing to files). Each function is preceded by a comment detailing its purpose. No alterations should be made to these functions.


4. LinAlg.c
-----------
This file contains functions to perform various linear algebra procedures. Each function is preceded by a comment detailing its purpose. No alterations should be made to these functions.


5. Mem.c
--------
This file contains functions that allocate memory for various types of objects. Each function is preceded by a comment detailing its purpose. No alterations should be made to these functions.
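A common pattern for allocating a matrix in C is shown below as a hypothetical sketch (alloc_matrix and free_matrix are not necessarily the names used in Mem.c). The data live in one contiguous block, and an array of row pointers allows m[i][j] indexing.

```c
#include <stdlib.h>

/* Allocate a zero-initialised nrow x ncol matrix of doubles as one
   contiguous block plus an array of row pointers, so elements can be
   accessed as m[i][j]. Returns NULL on failure or if either dimension
   is not positive. Release with free_matrix(). */
double **alloc_matrix(int nrow, int ncol)
{
    if (nrow <= 0 || ncol <= 0)
        return NULL;
    double **m = malloc((size_t)nrow * sizeof *m);
    if (m == NULL)
        return NULL;
    m[0] = calloc((size_t)nrow * (size_t)ncol, sizeof **m);
    if (m[0] == NULL) {
        free(m);
        return NULL;
    }
    for (int i = 1; i < nrow; i++)
        m[i] = m[0] + (size_t)i * (size_t)ncol;
    return m;
}

/* Free a matrix created by alloc_matrix(); NULL is ignored. */
void free_matrix(double **m)
{
    if (m != NULL) {
        free(m[0]);
        free(m);
    }
}
```

Keeping the data contiguous means only two allocations per matrix and lets the block be passed to routines that expect a flat row-major array via m[0].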


6. MFAMD_Funcs.c
----------------
This file contains functions which generate samples from the posterior full conditional distributions of the model parameters. A comment preceding each function indicates the posterior full conditional distribution from which samples are drawn.


7. MFAMD_main.c
---------------
This file contains the main function, which executes the Metropolis-within-Gibbs algorithm described in the manuscript. It calls the functions needed to fit the mixture of factor analysers for mixed data model.

The following settings are initialised at the start of the file:
* N -- the number of units/observations in the data set.
* J -- the dimension of the observation vector for each observation.
* D -- the required dimension of the latent continuous data (Z).
* Q -- the dimension of the latent trait (theta) for each unit.
* G -- the number of clusters being fitted to the data.
* CnsIndx -- the number of continuous items in the data set.
* OrdIndx -- the number of continuous items plus the number of ordinal items.
* Burnin -- the number of samples to be disregarded as burn in.
* Thin -- samples from every Thin^th iteration will be saved.
* MaxIter -- the number of iterations to be performed, not including the burn-in period.
* Verb -- the iteration number will be printed to screen every Verb^th iteration.
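The way Burnin, MaxIter, Thin and Verb interact can be sketched as the loop below. It is a hypothetical skeleton rather than the code in MFAMD_main.c: sweep stands in for one pass through all full conditional updates and save for writing a retained draw to the Output folder, and both may be NULL in the sketch. With Burnin = 100, MaxIter = 1000 and Thin = 10, it performs 1100 iterations and keeps 100 draws.

```c
#include <stdio.h>

typedef void (*sweep_fn)(void *state);          /* one full sweep of updates */
typedef void (*save_fn)(void *state, int iter); /* store a retained draw */

/* Hypothetical sampler skeleton. Runs Burnin + MaxIter iterations,
   keeps every Thin-th iteration after the burn-in period, and prints
   progress every Verb iterations (Verb <= 0 disables printing).
   Returns the number of retained draws, or -1 if Thin < 1. */
int run_sampler(int Burnin, int MaxIter, int Thin, int Verb,
                sweep_fn sweep, save_fn save, void *state)
{
    if (Thin < 1)
        return -1;
    int nsaved = 0;
    for (int iter = 1; iter <= Burnin + MaxIter; iter++) {
        if (sweep != NULL)
            sweep(state);
        if (Verb > 0 && iter % Verb == 0)
            printf("Iteration %d\n", iter);
        /* draws from the burn-in period are discarded */
        if (iter > Burnin && (iter - Burnin) % Thin == 0) {
            if (save != NULL)
                save(state, iter);
            nsaved++;
        }
    }
    return nsaved;
}
```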


8. MFAMD_structs.c
------------------
This file contains functions to allocate and free memory as required for the structures used in the main function. The structures themselves are declared in MFAMD_structs.h. No alterations should be made to these functions.


9. MiscCFuncs.c
---------------
This file contains functions to perform miscellaneous tasks in C. Each function is preceded by a comment detailing its purpose. No alterations should be made to these functions.


10. RandVarSim.c
----------------
This file contains functions to generate deviates from various parametric probability distributions. Each function is preceded by a comment detailing its purpose. No alterations should be made to these functions.


11. rangen.c
------------
This file is a freely available set of functions written in C by Simon Wood. A paragraph at the beginning of the file explains its terms of use and gives some other information. No alterations should be made to these functions.


12. StatsBase.c
---------------
This file contains functions to perform basic statistical calculations in C. Each function is preceded by a comment detailing its purpose. No alterations should be made to these functions.
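As an example of a basic statistical calculation, the sketch below keeps a running mean and variance with Welford's algorithm, a convenient way to monitor a chain without storing every draw. The struct and function names are hypothetical and not taken from StatsBase.c.

```c
/* Running (online) mean and variance by Welford's algorithm. */
typedef struct {
    long n;      /* number of values seen so far */
    double mean; /* current mean */
    double m2;   /* sum of squared deviations from the current mean */
} running_stats;

void rs_init(running_stats *s)
{
    s->n = 0;
    s->mean = 0.0;
    s->m2 = 0.0;
}

/* Incorporate one new value x. */
void rs_push(running_stats *s, double x)
{
    s->n++;
    double delta = x - s->mean;
    s->mean += delta / (double)s->n;
    s->m2 += delta * (x - s->mean);
}

/* Sample variance (divisor n - 1); returns 0 when fewer than two
   values have been pushed. */
double rs_variance(const running_stats *s)
{
    return s->n > 1 ? s->m2 / (double)(s->n - 1) : 0.0;
}
```

Compared with accumulating the sum and sum of squares, this update avoids the loss of precision that occurs when the mean is large relative to the spread.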


13. MFAMD_C_load.R
------------------
This file contains R code that imports and processes the output from the C program. The output is arranged into arrays of appropriate size, posterior means are calculated, and the rotations needed to identify the model are performed.

