Introduction

The high incidence of pre-organ-failure conditions and metabolic disorders, such as obesity, type 2 diabetes, hypertension, and dyslipidemia, has attracted attention both worldwide and in Japan. Over time, these conditions and disorders are linked to fatal failure of organs including the heart, lungs, pancreas, liver, kidneys, muscles, and brain. In this study, we analyzed big data from the National Database (NDB), which is provided by the country’s Ministry of Health, Labour and Welfare (MHLW) and contains information on over 10 million general residents of Japan each year. Most of these individuals were apparently healthy, but some already had diseases or life-threatening conditions.

The health-care NDB can provide accurate reference values for specific clinical parameters, as well as disease proportions stratified by categories such as sex, age group, smoking, alcohol consumption, and morbidities, and under limited conditions for some parameters. Big data also make it possible to investigate conflicting observations from studies with small samples, as well as rare conditions that are often overlooked. Furthermore, big data are well suited to artificial intelligence: in our analysis, this allowed us to make accurate predictions of diseases from baseline parameters and conditions, and assessing the contribution of each variable can help interpret those predictions. The NDB contains self-reported medical history, details of health checkups (clinical parameters), disease names (ICD-10 classification), prescribed drugs, and medical remuneration points.
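As an illustration of how stratified reference values might be derived, the Python sketch below computes a nonparametric reference interval (the 2.5th to 97.5th percentiles) for one clinical parameter within each sex and age-group stratum. The percentile convention, the record layout, and the field names (`sex`, `age_group`, `sbp`) are assumptions for illustration; the protocol does not state how its reference values are calculated.

```python
from collections import defaultdict
from statistics import quantiles
from typing import Dict, Iterable, Mapping, Tuple


def reference_intervals(records: Iterable[Mapping], value_key: str,
                        strata_keys: Tuple[str, ...] = ("sex", "age_group")
                        ) -> Dict[tuple, Tuple[float, float]]:
    """Nonparametric 2.5th-97.5th percentile interval for each stratum.

    Assumed convention for illustration; field names are hypothetical.
    """
    groups = defaultdict(list)
    for rec in records:
        value = rec.get(value_key)
        if value is None:  # skip examinees without this measurement
            continue
        groups[tuple(rec[k] for k in strata_keys)].append(value)

    intervals = {}
    for stratum, values in groups.items():
        if len(values) < 2:  # too few values to estimate percentiles at all
            continue
        # n=40 yields 39 cut points at 2.5%, 5.0%, ..., 97.5%
        cuts = quantiles(values, n=40, method="inclusive")
        intervals[stratum] = (cuts[0], cuts[-1])
    return intervals
```

In practice a much larger minimum stratum size would be required for stable percentile estimates (nonparametric reference intervals are commonly based on at least 120 individuals); the NDB's sample sizes make that easy to satisfy for most strata.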

In light of the above, we examined the current state of cardiometabolic diseases and other conditions* and their underlying mechanisms. In this NDB-K7Ps Study, we investigated NDB data (specific health checkups and related health-care data) from the seven prefectures (Tokyo, Kanagawa, Saitama, Chiba, Ibaraki, Gunma, and Tochigi) of Japan’s Kanto region (Figure 1). Figure 2 shows the percentage of medical checkups in Japan and the Kanto region.

Figure 1. Location of the seven prefectures of the Kanto region

*Type 2 diabetes, hypertension, dyslipidemia, kidney disease, malnutrition (underweight and obesity), liver diseases estimated from hepatic enzyme levels, and other health conditions (e.g., hearing loss, lack of restorative sleep, physical inactivity).

Figure 2. Percentage of medical checkups in Japan and the Kanto region

Created by processing "Data of specific health checkups and health guidance" (Ministry of Health, Labour and Welfare) (https://www.mhlw.go.jp/stf/newpage_03092.html)

Methods and analysis

Cross-sectional studies: We undertook a series of population-based cross-sectional studies covering 2008–2018, with sample sizes ranging from 7 million to 10 million people annually. In these studies, apparently healthy individuals aged 40–74 years underwent voluntary checkups at assigned health facilities. Because of the cross-sectional design, no causal conclusions can be drawn about the relationships among diseases, parameters, and lifestyles.
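A crude disease proportion for the cross-sectional analyses could be computed along the lines of this minimal sketch, which keeps only examinees aged 40–74 years and reports the share with a given condition in each year-by-sex group. The field names (`year`, `age`, `sex`, `has_condition`) are hypothetical, and the sketch applies no weighting or age standardization, neither of which the protocol describes.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping, Tuple


def prevalence_by_group(records: Iterable[Mapping],
                        group_keys: Tuple[str, ...] = ("year", "sex"),
                        min_age: int = 40, max_age: int = 74) -> Dict[tuple, float]:
    """Crude proportion of examinees with the condition in each group.

    Hypothetical record layout; no weighting or standardization applied.
    """
    totals: Counter = Counter()
    cases: Counter = Counter()
    for rec in records:
        if not (min_age <= rec["age"] <= max_age):  # outside the checkup age range
            continue
        key = tuple(rec[k] for k in group_keys)
        totals[key] += 1
        if rec["has_condition"]:
            cases[key] += 1
    # Every key in totals has at least one examinee, so division is safe
    return {key: cases[key] / n for key, n in totals.items()}
```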

Cohort studies: In observational cohort studies of individuals aged 40–64 years at baseline, we investigated potential associations and causal relationships for various etiologies over the same 10-year period. We excluded subjects aged 65 years or over at baseline because the NDB does not include data for individuals aged 75 years or over, so these subjects could not be followed for the full 10 years. Abnormal values above or below the reference value can be influenced by regression toward the mean [1]. To minimize this effect on the outcomes, we required, wherever possible, that abnormal values be confirmed by duplicate or repeated measurements in 2 consecutive years. Using multidisciplinary analysis (including machine learning, a branch of artificial intelligence), we expected to obtain a wide range of novel findings that could confirm previously inconclusive findings (especially for cardiometabolic diseases and other conditions*) and provide new perspectives for health promotion and disease prevention.
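The two cohort rules described above can be sketched as follows, assuming a hypothetical record of one value per checkup year. `eligible_for_cohort` checks that a person aged 40 or over at baseline would still be 74 or younger at the end of a 10-year follow-up (which reproduces the 40–64 year baseline range), and `first_confirmed_abnormal_year` treats a value as abnormal only when it is at or above a threshold in 2 consecutive years. Both function names and the simple threshold rule are illustrative assumptions, not the study’s actual SAS or Stata code.

```python
from typing import Mapping, Optional


def eligible_for_cohort(baseline_age: int, follow_up_years: int = 10,
                        min_age: int = 40, max_recorded_age: int = 74) -> bool:
    """True if the person can be observed for the whole follow-up.

    The NDB holds no data from age 75 onward, so the age in the final
    follow-up year must not exceed max_recorded_age. (Hypothetical helper.)
    """
    return min_age <= baseline_age and baseline_age + follow_up_years <= max_recorded_age


def first_confirmed_abnormal_year(values_by_year: Mapping[int, float],
                                  threshold: float) -> Optional[int]:
    """Return the first year whose value and the next year's value are both
    at or above threshold; None if no such consecutive pair exists.

    Requiring two consecutive abnormal measurements reduces the chance that a
    single extreme reading, later regressing toward the mean, counts as disease.
    """
    for year in sorted(values_by_year):
        next_year = year + 1
        if (next_year in values_by_year
                and values_by_year[year] >= threshold
                and values_by_year[next_year] >= threshold):
            return year
    return None
```

For example, systolic readings of 130, 142, and 145 mmHg in 2010, 2011, and 2012 with a hypothetical 140 mmHg threshold would be confirmed from 2011, whereas two abnormal readings separated by a missing year would not be confirmed.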

Ethics and dissemination

Ethical approval was obtained from the ethics committee for experimental research involving human subjects of Japan Women’s University (No. 513). The protocol was approved in September 2020 by the MHLW (No. 1320). The study results will be disseminated through open platforms, including journal articles, relevant conferences, and seminar presentations.

Data available for this research

Clinical data from specific health checkups (Tables 1 and 2)

Names of physician-diagnosed diseases, classified according to ICD-10 (Table 3)

ICD-10 codes and disease names: https://icd.who.int/browse10/2019/en

Administered medicinal substances (pharmaceutical names) (Table 4)
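Disease names coded in ICD-10 (Table 3) have to be grouped into study categories before analysis. A minimal sketch of prefix-based grouping is shown below; the category names and the choice of three-character codes (E11 for type 2 diabetes mellitus, I10–I13 and I15 for hypertensive diseases, E78 for lipoprotein disorders, N18 for chronic kidney disease, E66 for obesity, and K70–K77 for diseases of the liver) are illustrative assumptions rather than the study’s own case definitions.

```python
from typing import Dict, List, Tuple

# Hypothetical mapping of study categories to ICD-10 code prefixes
CATEGORY_PREFIXES: Dict[str, Tuple[str, ...]] = {
    "type2_diabetes": ("E11",),
    "hypertension": ("I10", "I11", "I12", "I13", "I15"),
    "dyslipidemia": ("E78",),
    "chronic_kidney_disease": ("N18",),
    "obesity": ("E66",),
    "liver_disease": tuple(f"K{n}" for n in range(70, 78)),  # K70 to K77
}


def categorize(code: str) -> List[str]:
    """Return the (sorted) study categories whose prefixes match an ICD-10 code.

    Accepts codes with or without the dot, e.g. "E11.9" or "E119".
    """
    normalized = code.replace(".", "").strip().upper()
    return sorted(category for category, prefixes in CATEGORY_PREFIXES.items()
                  if normalized.startswith(prefixes))
```

Prefix matching keeps the sketch short; a real case definition might also require, for instance, a matching prescription or confirmed diagnosis flag.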

Statistical analysis

SAS Enterprise Guide (SAS-EG 7.1) in the SAS system, version 9.4 (SAS Institute, Cary, North Carolina, USA)

STATA/MP, version 17.0 (StataCorp LLC, College Station, Texas, USA)

Artificial intelligence

Prediction One (Sony Network Communications Inc., Tokyo, Japan) [2]: artificial intelligence with the gradient-boosting algorithm XGBoost, and explainable artificial intelligence using permutation feature importance

SAS Enterprise Miner
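Prediction One’s internal implementation is not described here, but the general idea of permutation feature importance can be shown with a model-agnostic sketch: shuffle one feature column, re-score the fitted model, and record the mean drop in accuracy. The toy threshold model and the synthetic features in the usage example (an HbA1c-like value and an unrelated noise variable) are hypothetical and merely stand in for a trained gradient-boosting model.

```python
import random
from typing import Callable, List, Sequence

Row = List[float]


def accuracy(model: Callable[[Row], bool], X: Sequence[Row], y: Sequence[bool]) -> float:
    """Fraction of rows for which the model's prediction matches the label."""
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(model: Callable[[Row], bool], X: Sequence[Row],
                           y: Sequence[bool], n_repeats: int = 5,
                           seed: int = 0) -> List[float]:
    """Mean drop in accuracy after shuffling each feature column in turn."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the labels
            X_perm = [row[:j] + [value] + row[j + 1:] for row, value in zip(X, column)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


if __name__ == "__main__":
    data_rng = random.Random(42)
    # Synthetic rows: [HbA1c-like value (%), unrelated noise feature]
    X = [[data_rng.uniform(5.0, 8.0), data_rng.uniform(0.0, 1.0)] for _ in range(200)]
    model = lambda row: row[0] >= 6.5  # toy "fitted" model that uses feature 0 only
    y = [model(row) for row in X]
    print(permutation_importance(model, X, y))
```

Because the toy model ignores the noise column, shuffling it leaves every prediction unchanged and its importance is exactly zero, whereas shuffling the HbA1c-like column lowers accuracy. The same procedure applies unchanged to any fitted model that exposes a prediction function.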

References

1. Bland JM, Altman DG. Some examples of regression towards the mean. BMJ. 1994;309(6957):780. doi: 10.1136/bmj.309.6957.780.

2. Sony Network Communications Inc. Prediction One. 2020. Available online: https://www.predictionone.sony.biz (accessed on 3 October 2022).

Financial disclosure: This research received no external funding.

Conflict of interest: The authors declare no conflict of interest.

Informed consent: Informed consent was not required because the MHLW provides the data in anonymized form as part of its nationwide program for providing medical data to third parties. We have published the study protocol online (https://www.jwu.ac.jp/unv/education-research/NationalDatabase.html).

Relationship to the KITCHEN protocol: Most of the protocol for the NDB-K7Ps study overlaps with our previous study protocol for the Kanagawa Investigation of the Total Check-up Data from the National Database (KITCHEN), except for disease names, prescribed drugs, and medical remuneration points. For further details, please refer to the KITCHEN protocol (published in BMJ Open in February 2019).