Table 1 Baseline characteristics

From: The impact of different imputation methods on estimates and model performance: an example using a risk prediction model for premature mortality

Characteristic

Original dataset

Complete case

Mode imputation

Single imputation

Multiple imputation [min–max]¹

FEMALE

(n = 267,000)

(n = 221,000)

(n = 267,000)

(n = 267,000)

(n = 267,000)

Premature deaths (%)

1.4

1.3

1.4

1.4

1.4

Follow-up time, mean (SD), years

4.9 (4.5)

4.9 (4.2)

4.9 (4.5)

4.9 (4.5)

4.9 (4.5)

Age Group (%)

     

Missing

0.0

-

-

-

-

18–24

12.7

11.4

12.7

12.7

[12.7–12.7]

25–34

17.4

18.4

17.4

17.4

[17.4–17.4]

35–44

20.6

22.3

20.6

20.6

[20.6–20.6]

45–54

21.3

21.5

21.3

21.3

[21.3–21.3]

55–64

17.0

16.1

17.0

17.0

[17.0–17.0]

65–74

11.1

10.3

11.1

11.1

[11.1–11.1]

Household income quintile

     

Missing

10.9

-

-

-

-

Q1 (lowest)

13.8

15.1

21.1

15.5

[15.5–15.6]

Q2

14.5

16.2

14.5

16.4

[16.3–16.5]

Q3

18.5

20.8

18.5

20.8

[20.8–20.8]

Q4

22.0

24.9

25.6

24.7

[24.6–24.8]

Q5 (highest)

20.3

23.0

20.3

22.5

[22.5–22.6]

Individual education

     

Missing

4.5

-

-

-

-

Less than secondary school graduation

7.1

7.2

7.1

7.6

[7.6–7.6]

Secondary school graduation

10.5

10.8

10.5

11.1

[11.1–11.2]

Post-secondary education (complete and partial)

77.9

82.0

82.4

81.4

[81.3–81.4]

Marital status

     

Missing

0.1

-

-

-

-

Single never married

22.1

20.6

22.1

22.1

[22.1–22.1]

Domestic partner (married/common law)

63.6

64.8

63.7

63.7

[63.6–63.7]

Widowed/separated/divorced

14.2

14.6

14.2

14.2

[14.2–14.2]

BMI categories

     

Missing

3.4

-

-

-

-

< 18.5

3.9

3.8

3.9

4.0

[4.0–4.0]

18.5 to < 25.0

51.1

52.6

54.5

52.7

[52.7–52.7]

25.0 to < 30.0

25.8

26.9

25.8

26.8

[26.8–26.8]

30.0 to < 35.0

10.5

11.1

10.5

10.9

[10.9–11.0]

35.0 to < 40.0

3.5

3.7

3.5

3.6

[3.6–3.6]

≥ 40.0

1.8

1.9

1.8

1.9

[1.9–1.9]

Self-perceived general health

     

Missing

0.1

-

-

-

-

Poor

2.7

2.5

2.7

2.7

[2.7–2.7]

Fair

8.6

8.2

8.6

8.6

[8.6–8.6]

Good

28.3

27.5

28.3

28.3

[28.3–28.3]

Very good

38.0

38.8

38.1

38.0

[38.0–38.0]

Excellent

22.4

23.0

22.4

22.4

[22.4–22.4]

Physical activity²

     

Missing

1.6

-

-

-

-

Active

22.4

22.7

22.4

22.8

[22.7–22.8]

Moderately active

25.4

26.2

25.4

25.7

[25.7–25.8]

Inactive

50.6

51.2

52.2

51.5

[51.5–51.5]

Smoking status

     

Missing

3.1

-

-

-

-

Never smoker

54.0

54.7

57.1

55.7

[55.7–55.7]

Former light smoker

16.6

17.7

16.6

17.1

[17.1–17.2]

Former heavy smoker

5.0

5.3

5.0

5.2

[5.2–5.2]

Current light smoker

18.4

19.3

18.4

19.0

[19.0–19.0]

Current heavy smoker

2.9

3.0

2.9

3.0

[3.0–3.0]

Self-reported chronic conditions

     

Emphysema/COPD

     

Missing

0.1

-

-

-

-

Yes

1.7

1.7

1.7

1.7

[1.7–1.7]

Heart disease

     

Missing

0.1

-

-

-

-

Yes

3.3

3.2

3.3

3.3

[3.3–3.3]

Diabetes

     

Missing

0.1

-

-

-

-

Yes

4.7

4.5

4.7

4.7

[4.7–4.7]

Cancer

     

Missing

0.1

-

-

-

-

Yes

1.8

1.8

1.8

1.8

[1.8–1.8]

Stroke

     

Missing

0.0

-

-

-

-

Yes

0.8

0.7

0.8

0.8

[0.8–0.8]

MALE

(n = 233,000)

(n = 195,000)

(n = 233,000)

(n = 233,000)

(n = 233,000)

Premature deaths (%)

2.1

1.9

2.1

2.1

[2.1–2.1]

Follow-up time, mean (SD), years

     

Age Group

     

Missing

0.0

-

-

-

-

18–24

13.6

11.9

13.6

13.6

[13.6–13.6]

25–34

18.5

19.0

18.5

18.5

[18.5–18.5]

35–44

20.9

22.5

20.9

20.9

[20.9–20.9]

45–54

20.5

21.0

20.5

20.5

[20.5–20.5]

55–64

16.8

16.4

16.8

16.8

[16.8–16.8]

65–74

9.9

9.3

9.9

9.9

[9.9–9.9]

Household income quintile

     

Missing

8.2

-

-

-

-

Q1 (lowest)

10.6

11.2

10.6

11.6

[11.6–11.7]

Q2

12.4

13.3

12.4

13.6

[13.6–13.7]

Q3

17.8

19.3

17.8

19.5

[19.4–19.5]

Q4

24.5

26.9

27.2

26.7

[26.7–26.8]

Q5 (highest)

26.4

29.2

31.9

28.6

[28.5–28.6]

Individual education

     

Missing

6.0

-

-

-

-

Less than secondary school graduation

6.2

6.4

6.2

6.8

[6.8–6.8]

Secondary school graduation

10.3

10.8

10.3

11.1

[11.1–11.2]

Post-secondary education (complete and partial)

77.4

82.8

83.5

82.1

[82.1–82.1]

Marital status

     

Missing

0.1

-

-

-

-

Single never married

26.8

24.3

26.8

26.8

[26.8–26.8]

Domestic partner (married/common law)

65.7

68.0

65.8

65.8

[65.7–65.8]

Widowed/separated/divorced

7.4

7.7

7.4

7.4

[7.4–7.5]

BMI categories

     

Missing

1.5

-

-

-

-

< 18.5

1.1

1.0

1.1

1.1

[1.1–1.1]

18.5 to < 25.0

39.6

39.5

41.0

40.1

[40.1–40.2]

25.0 to < 30.0

40.1

41.2

40.1

40.7

[40.6–40.7]

30.0 to < 35.0

13.6

14.0

13.6

13.8

[13.8–13.9]

35.0 to < 40.0

3.1

3.2

3.1

3.1

[3.1–3.1]

≥ 40.0

1.1

1.1

1.1

1.1

[1.1–1.1]

Self-perceived general health

     

Missing

0.1

-

-

-

-

Poor

2.4

2.2

2.4

2.4

[2.4–2.4]

Fair

7.8

7.4

7.8

7.8

[7.8–7.8]

Good

28.4

28.2

28.4

28.5

[28.5–28.5]

Very good

37.9

38.8

37.9

37.9

[37.9–37.9]

Excellent

23.4

23.4

23.4

23.4

[23.4–23.4]

Smoking status

     

Missing

3.5

-

-

-

-

Never smoker

44.4

45.6

47.9

46.0

[46.0–46.1]

Former light smoker

16.7

17.8

16.7

17.3

[17.3–17.3]

Former heavy smoker

9.4

9.9

9.4

9.8

[9.8–9.9]

Current light smoker

20.3

20.8

20.3

21.0

[20.9–21.0]

Current heavy smoker

5.7

5.8

5.7

5.9

[5.9–5.9]

Self-reported chronic conditions

     

Emphysema/COPD

     

Missing

0.1

-

-

-

-

Yes

1.5

1.5

1.5

1.5

[1.5–1.5]

Heart disease

     

Missing

0.1

-

-

-

-

Yes

4.8

4.8

4.8

4.8

[4.8–4.8]

Diabetes

     

Missing

0.1

-

-

-

-

Yes

5.8

5.7

5.8

5.8

[5.8–5.8]

Cancer

     

Missing

0.1

-

-

-

-

Yes

1.5

1.5

1.5

1.5

[1.5–1.5]

Stroke

     

Missing

0.0

-

-

-

-

Yes

0.9

0.8

0.9

0.9

[0.9–0.9]

Arthritis

     

Missing

0.1

-

-

-

-

Yes

11.7

11.5

11.7

11.7

[11.7–11.7]

Alzheimer’s

     

Missing

0.1

-

-

-

-

Yes

0.2

0.1

0.2

0.2

[0.2–0.2]

  1. Multiple imputation created 5 datasets; the values shown are the range from the minimum to the maximum percentage across those datasets.
  2. Physical activity was measured as the average metabolic equivalent of task (MET) per day, derived from a list of leisure-time physical activities (frequency and duration of each activity).
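
The [min–max] range described in the first footnote could be computed along the following lines. This is a minimal sketch and not the authors' code: the function name `category_percent_range` and the toy data are hypothetical, and it assumes each imputed dataset is available as a plain list of category labels.

```python
from collections import Counter


def category_percent_range(imputed_datasets, category):
    """Return (min, max) percent of `category` across imputed datasets.

    Each element of `imputed_datasets` is one completed dataset, given as a
    list of category labels. Percentages are rounded to one decimal place.
    """
    percents = []
    for values in imputed_datasets:
        counts = Counter(values)
        percents.append(100.0 * counts[category] / len(values))
    return round(min(percents), 1), round(max(percents), 1)


# Toy data for illustration only (hypothetical, not taken from the study):
# five imputed copies of a four-person income-quintile variable.
toy = [
    ["Q1", "Q2", "Q2", "Q3"],
    ["Q1", "Q1", "Q2", "Q3"],
    ["Q1", "Q2", "Q3", "Q3"],
    ["Q1", "Q2", "Q2", "Q3"],
    ["Q1", "Q1", "Q2", "Q3"],
]

low, high = category_percent_range(toy, "Q1")
print(f"[{low}–{high}]")
```

When every imputed dataset gives the same rounded percentage, the minimum and maximum coincide, which is why many rows in the table show ranges such as [12.7–12.7].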