
 
IMPROVED PERSONALIZED SURVIVAL PREDICTION OF PATIENTS WITH
DIFFUSE LARGE B-CELL LYMPHOMA USING GENE EXPRESSION PROFILING

Adrián Mosquera Orgueira, Miguel Cid López, José Ángel Díaz Arias, Andrés Peleteiro Raindo, Beatriz
Antelo Rodríguez, Carlos Aliste Santos, Natalia Alonso Vence, Ángeles Bendaña López, Aitor Abuín
Blanco, Laura Bao Pérez, Marta Sonia González Pérez, Manuel Mateo Pérez Encinas, Máximo
Francisco Fraga Rodríguez and Jose Luis Bello Lopez

University Hospital of Santiago de Compostela, Santiago de Compostela, Spain

Background
30-40% of patients with Diffuse Large B-cell Lymphoma (DLBCL) have an adverse clinical
evolution. An improved understanding of DLBCL biology has shed light on the clinical
evolution of this disease, leading to the discovery of prognostic factors based on gene
expression data, genomic rearrangements and mutational subgroups. Nevertheless,
additional efforts are needed to enable survival predictions at the patient level. This
study investigated new machine learning models of survival based on transcriptomic and
clinical data.
Methods
Gene expression profiling (GEP) data from 2 publicly available retrospective cohorts were
analyzed. Cox regression and unsupervised clustering were performed to identify
probes associated with overall survival in the larger cohort. Random forests were built
to model survival using combinations of GEP data, cell-of-origin (COO) classification and clinical
information. Cross-validation was used to compare model results in the training set, and
Harrell's concordance index (c-index) was used to assess the models' predictive performance. Results were
validated in an independent test set.
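
The abstract does not say which software computed the c-index, so the following is only a minimal Python sketch of how Harrell's concordance index can be calculated for right-censored survival data. The function name `harrell_c_index`, its arguments and the toy values are illustrative assumptions, not the authors' implementation or data. Pairs with tied follow-up times are simply skipped here, which is a simplification.

```python
from itertools import combinations


def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored data (simplified sketch).

    times:       observed follow-up time for each patient
    events:      1 if death was observed, 0 if the patient was censored
    risk_scores: model output, where a higher score means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that patient `a` has the shorter follow-up time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if times[a] == times[b]:
            continue  # tied times are skipped in this simplified version
        if not events[a]:
            continue  # shorter time is censored, so the pair is not comparable
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # earlier death received the higher risk score
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data, for illustration only (not from the study cohorts).
toy_times = [5, 10, 12, 20, 25]
toy_events = [1, 1, 0, 1, 0]
toy_risk = [0.9, 0.3, 0.5, 0.4, 0.1]
print(harrell_c_index(toy_times, toy_events, toy_risk))  # 0.75
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking of patients by risk, which is the scale on which the training and test c-index values in the Results are reported.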
Results
233 and 64 patients were included in the training and test sets, respectively. We first
derived and validated a 4-gene expression clustering that was independently associated
with lower survival in 20% of patients. These genes
were TNFRSF9, BIRC3, BCL2L1 and G3BP2. We then applied machine-learning models
to predict survival. A set of 102 genes was highly predictive of disease outcome,
outperforming available clinical information and COO classification. The best final model
integrated clinical information, COO classification, the 4-gene-based clustering and expression
data from 50 genes (training set c-index, 0.8404; test set c-index, 0.7942).
Conclusion
This study indicates that modelling DLBCL survival with transcriptomics-based machine
learning algorithms can largely outperform other important prognostic variables such as
disease stage and COO.