Table 1. Characteristics of the publicly available gene expression data sets

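| | NKI | NKI2 | STNO2 | NCI | MGH | UPP | STK | VDX | VDX2 | UNT | UNC | TBAGD | TBVDX | TAM |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| No. patients (%) | | | | | | | | | | | | | | |
| Total | 76 | 295 | 118 | 99 | 60 | 110 | 159 | 286 | 180 | 78 | 144 | 109 | 198 | 268 |
| Age (y) | | | | | | | | | | | | | | |
| ≤50 | 53 (70) | 264 (90) | 37 (31) | 29 (30) | 2 (4) | 25 (22) | 53 (33) | 129 (45) | 50 (28) | 29 (38) | 57 (40) | 62 (56) | 142 (71) | 21 (8) |
| >50 | 23 (30) | 31 (10) | 81 (69) | 70 (70) | 58 (96) | 85 (78) | 106 (67) | 157 (55) | 129 (72) | 49 (62) | 79 (55) | 47 (44) | 56 (29) | 247 (92) |
| Size (cm) | | | | | | | | | | | | | | |
| ≤2 | 46 (60) | 155 (52) | 19 (17) | 36 (37) | 29 (49) | 72 (66) | 0 | 278 (97) | 95 (53) | 49 (63) | 104 (73) | 66 (60) | 102 (51) | 110 (42) |
| >2 | 30 (40) | 140 (48) | 94 (80) | 63 (63) | 31 (51) | 38 (35) | 0 | 8 (3) | 85 (47) | 29 (37) | 30 (21) | 43 (40) | 96 (49) | 158 (58) |
| Nodal status | | | | | | | | | | | | | | |
| Negative | 57 (75) | 151 (51) | 34 (29) | 46 (47) | 28 (47) | 75 (69) | 0 | 286 (100) | 180 (100) | 78 (100) | 62 (44) | 109 (100) | 198 (100) | 116 (44) |
| Positive | 19 (25) | 144 (49) | 79 (67) | 53 (53) | 25 (42) | 29 (27) | 0 | 0 | 0 | 0 | 75 (53) | 0 | 0 | 143 (54) |
| Tumor grade | | | | | | | | | | | | | | |
| 1 | 9 (11) | 75 (25) | 11 (10) | 16 (17) | 3 (5) | 31 (29) | 28 (18) | 7 (3) | 17 (10) | 20 (26) | 12 (9) | 17 (16) | 30 (16) | 50 (19) |
| 2 | 18 (24) | 101 (35) | 49 (42) | 38 (38) | 39 (65) | 53 (49) | 58 (37) | 42 (15) | 81 (45) | 30 (39) | 46 (32) | 44 (41) | 83 (42) | 131 (49) |
| 3 | 49 (65) | 119 (40) | 53 (45) | 45 (45) | 18 (30) | 25 (22) | 61 (39) | 148 (52) | 56 (32) | 15 (20) | 74 (52) | 42 (39) | 83 (42) | 47 (18) |
| Estrogen receptors | | | | | | | | | | | | | | |
| Negative | 28 (37) | 69 (24) | 31 (27) | 34 (35) | 1 (1) | 16 (15) | 29 (19) | 77 (27) | 47 (27) | 21 (27) | 54 (38) | 26 (24) | 64 (33) | 5 (2) |
| Positive | 48 (63) | 226 (76) | 82 (70) | 65 (65) | 59 (99) | 92 (84) | 130 (81) | 209 (73) | 132 (73) | 53 (68) | 82 (57) | 78 (72) | 134 (67) | 263 (98) |
| Treatment | | | | | | | | | | | | | | |
| Untreated | 76 (100) | 165 (55) | 22 (19) | 11 (11) | 0 | 110 (100) | 0 | 286 (100) | 180 (100) | 78 (100) | 0 | 109 (100) | 198 (100) | 0 |
| Treated | 0 | 130 (45) | 96 (81) | 88 (89) | 60 (100) | 0 | 48 (31) | 0 | 0 | 0 | 144 (100) | 0 | 0 | 268 (100) |
| Platform | Agilent | Agilent | Stanford Microarray | cDNA NCI | Arcturus | Affymetrix | Affymetrix | Affymetrix | Affymetrix | Affymetrix | Agilent | Agilent | Affymetrix | Affymetrix |
| Reference | (1) | (2) | (17) | (18) | (19) | (9) | (20) | (3) | (4) | (7) | (21) | (22) | (23) | (24) |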
NOTE: Some samples are used in several studies. The following study IDs have samples in common: NKI/NKI2 and UPP/STK/UNT/TBAGD/TBVDX/TAM. For all analyses, we removed duplicated patients from small data sets (e.g., NKI) to avoid decreasing the sample size of large data sets (e.g., NKI2).
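
The duplicate-removal rule in the note can be illustrated with a minimal Python sketch. It is not the authors' code: the function name `drop_duplicates_from_smaller`, the patient identifiers, and the assumption that overlapping patients carry the same identifier in every study (and appear at most once per study) are all hypothetical. Data sets are visited from largest to smallest, so a shared patient is kept in the larger data set and dropped from the smaller one.

```python
def drop_duplicates_from_smaller(datasets):
    """Remove patients shared between studies from the smaller data sets.

    `datasets` maps a study ID to a list of patient identifiers
    (hypothetical IDs, assumed consistent across studies).
    """
    # Visit the largest data sets first so that they keep all their patients.
    order = sorted(datasets, key=lambda name: len(datasets[name]), reverse=True)

    seen = set()     # patients already assigned to a larger data set
    cleaned = {}
    for name in order:
        kept = [patient for patient in datasets[name] if patient not in seen]
        seen.update(kept)
        cleaned[name] = kept
    return cleaned


# Toy example with made-up identifiers: p3 and p4 appear in both studies.
example = {
    "NKI": ["p3", "p4", "p5"],
    "NKI2": ["p1", "p2", "p3", "p4"],
}
cleaned = drop_duplicates_from_smaller(example)
print(cleaned["NKI2"])
print(cleaned["NKI"])
```

In this toy example the larger data set keeps all four of its patients, while the smaller one retains only the patient it does not share.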