Strategic Plan Workgroup Reports and Past Plans

Strategic Plan (2016 - 2020)

NIDA has released its Strategic Plan to provide a framework for the research it will support over the next five years. Thank you to everyone who sent comments in response to the Request for Information (RFI) and who participated in one of the three strategic planning workgroups.

Please direct all inquiries to:
Office of Science Policy and Communications
National Institute on Drug Abuse (NIDA)
Telephone: 301-443-6071
Email: NIDAOSPCPlanning@mail.nih.gov

Strategic Planning Workgroup - Big Data

Final Workgroup Proposal - Big Data

Workgroup Proposal

Introduction

Big Data is a technical term used to describe the varying growth and availability of complex data (both structured and unstructured) whose management exceeds the power of traditional data processing resources. It is generally characterized by the “Three Vs”: high volume, high velocity, and high variety (occasionally with additional Vs: high veracity and high value). Collection of Big Data is scaling at an unprecedented rate. Data now stream from daily life: from credit cards and televisions, computers and social media, bio- and motion-sensors, GPS and other data-capturing devices, including those embedded in smartphones and ‘wearable technology’ such as smart watches. Ecological Momentary Assessments are regularly combined with these and other sensor and GPS data, while Geolocation Momentary Assessment (GMA) is successfully implemented as a rapidly expanding tool in substance abuse research. Data collected from rapidly developing technologies in various biomedical and behavioral fields, including next-generation sequencing, epigenetics, genomics, epidemiology, neuroimaging, and organizational and services research, are also growing fast, giving researchers an unprecedented ability to capitalize on vast amounts of data. These technological advances, however, all face the common challenge of turning complex information into usable and manageable data.

New tools and technologies to capture, process, share and store Big Data are both goals and challenges, as data are also being produced at a rate outpacing the development of storage technologies. Accessing and using Big Data despite the plethora of format types presents a clear challenge, and the sheer volume of Big Data makes data transport a resource-intensive feat even with the fastest communications networks. The complexity of Big Data imposes enormous computational and resource challenges. Keeping Big Data secure and private, likewise, is challenging in an era of system intrusions and soft-espionage. Combining data from various sources and formats requires implementation of existing data standards to the degree possible, which can be achieved through the use of common data elements, shared ontologies and data dictionaries. Curation and analysis of data, analysis tools (including machine learning and artificial intelligence techniques), and data visualization all pose challenges that need to be addressed. NIDA should not only be cognizant of these challenges but also be prepared for them, and should take advantage of the resulting opportunities over the course of the next 5-10 years. In this proposal, we consider the opportunities and challenges of Big Data as they pertain to NIDA. The appendix contains a partial list of relevant resources.

Priority 1: Big Data Sharing

Data sharing is an essential and complex component in leveraging Big Data. Harnessing large quantities of data generated world-wide has numerous methodological, ethical and economic advantages, but it requires the neuroscience community to adopt a culture of data sharing and open access to realize this benefit.

Challenges and Opportunities

Key issues for sharing Big Data include providing data according to the “FAIR” principles (Findable, Accessible, Interoperable, and Reusable; https://force11.org/groups/force11-rda-fairsharing-working-group/), including distribution and aggregation mechanisms, storage, and ensuring the security of data and privacy of research subjects. To maximize research investment and value, it is critical to capitalize on the potential of big data using FAIR principles:

Findable: To be useful, Big Data must be easily and efficiently searchable. There are novel models for enhancing the discoverability of data in public and private sectors, of which NIDA should take advantage. The NIH Big Data to Knowledge (BD2K) project, which is developing a pilot solution (Data Discovery Index, DDI), may present NIDA with a great opportunity in this regard. Once data sets are found, researchers should be able to interrogate them for various scientific applications (e.g., specific sets of genes, chromatin modifications, brain regions, etc.).
Accessible: Management of multiple credentials may be a deterrent for users, but this can be resolved through consolidation of credentials across data sources. NIDA should take advantage of activities in the extramural community that aim to improve the consent process, including the possible development of a “universal consent”. Security and privacy are extremely important issues; the extant NIH security regulations appear to suffice to secure the data, although constant vigilance is required. Regarding privacy, NIDA should consider solutions proven successful for various data sources (e.g., defacing imaging data for open access). Accessibility also includes issues such as persistent storage, authenticated access and usability, which point to metadata and other relevant issues.
Interoperable: A key challenge to interoperability is the identification of common data elements (CDEs, see glossary), which are required at the level of the data element itself as well as for the associated clinical data. Effective development/identification of CDEs requires expertise in the relevant research domains. Unfortunately, the current lack of consensus on optimal data elements for various domains of cognitive, behavioral and psychiatric function presents a significant challenge in this regard. In particular, it fosters continued duplication of effort in the creation of assessment instruments/nomenclatures and limits the ability to aggregate data from different studies without tedious data-mapping efforts. To this end, securing the support of the user community and defining their role in the process is paramount to success. NIDA needs to be at the forefront of efforts to homogenize CDEs. A great resource for NIDA is extant NIH CDE repositories and other similar resources (see Appendix Section 1).
Reusable: The need for usability is generally met by having followed the first three requirements of the FAIR model, plus assuring that the data are sufficiently well described, enjoy the necessary levels of richness of metadata and provenance, and are standardized enough such that they can be utilized in future research, preferably with minimal human effort.
- As a first step in this effort, NIDA is developing the NIDA “Addictome” Portal, which will provide data coordination, visualization and analysis tools that can be used by the scientific community to mine and visualize multi-scale data sets in a user-friendly, 4D-framework. This portal moves NIDA towards creating a big data resource generated through investigator-initiated studies to enable data mining and identify emergent opportunities across seemingly disparate data sets. The Addictome Portal is a very promising platform that can increase efficiency and provide a process by which NIDA can adopt Big Data Science as an integrated component of the research portfolio.
- Another challenge in sharing Big Data is the transfer time and cost. Single-site storage and analysis platforms, such as the NIH Commons, which utilizes compliant cloud services, may be one approach to mitigate this issue. Cloud computing minimizes data transfer cost and time by colocating data and analysis software and by providing pre-packaged computing environments that facilitate use by researchers. Complementing this solution are mechanisms that limit the need for data transfer (e.g., federated solutions, the sharing of intermediate/processed forms of the data). Each of these models has its relative advantages and merits exploration.
- Additionally, there is a need to create a culture that values data sharing through incentives and impactful attribution, as has occurred in the genetics community. NIDA should support various practices and approaches that encourage equal value being placed on citations of an investigator’s shared data and citations of research articles. For example, (1) pilot studies such as the “Commons Credits” provided through the BD2K should be evaluated for their efficacy, (2) NIDA should support extant plans for data citation and data sharing at NIH and elsewhere, and (3) NIDA should encourage the education of NIH study sections and university promotion committees as to the considerable value of data depositions and citations.
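The idea of limiting data transfer by sharing intermediate or processed forms of the data can be illustrated with a minimal sketch. It assumes each site is willing to release only three aggregate numbers (count, sum, and sum of squares) for a measure; the class and function names are hypothetical, and sharing aggregates is not by itself a privacy guarantee, since very small counts can still be disclosive.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class SiteSummary:
    """Aggregate statistics a site shares instead of row-level data."""
    n: int            # number of observations
    total: float      # sum of the values
    total_sq: float   # sum of the squared values


def summarize(values: Iterable[float]) -> SiteSummary:
    """Run locally at each site; only the returned summary leaves the site."""
    vals = list(values)
    return SiteSummary(len(vals), sum(vals), sum(v * v for v in vals))


def pooled_mean_and_variance(summaries: List[SiteSummary]) -> Tuple[float, float]:
    """Combine site summaries into the pooled mean and sample variance."""
    n = sum(s.n for s in summaries)
    if n < 2:
        raise ValueError("need at least two observations in total")
    total = sum(s.total for s in summaries)
    total_sq = sum(s.total_sq for s in summaries)
    mean = total / n
    # Sample variance from sufficient statistics: (sum(x^2) - n * mean^2) / (n - 1)
    variance = (total_sq - n * mean * mean) / (n - 1)
    return mean, variance


# Two hypothetical sites; the raw lists never leave their site.
site_a = summarize([1.0, 2.0, 3.0])
site_b = summarize([4.0, 5.0])
mean, variance = pooled_mean_and_variance([site_a, site_b])
```

The pooled result matches what a central analysis of all five values would give, while only three numbers per site are transferred. For data with a large mean relative to its spread, a numerically stabler merge formula would be preferable to the sum-of-squares form used here.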

Priority 2: Big Data Capture & Formats

Big Data capture requires research workflows and complementary technologies that allow data types and formats to be recorded by investigators in a form that enables subsequent re-analysis and integration with other data. The following considerations regarding data capture and formatting are key: capture tools (hardware and software); data type, format, and descriptive metadata; intended data use; and the ability to adapt to new analysis opportunities unforeseen at the time of data capture.

Challenges and Opportunities

The use of common data formats and elements, ontologies, data dictionaries, and public application programming interfaces (APIs) is extremely important, especially as available datasets expand and increase in complexity.

NIDA should leverage the existing interoperability resources at NIH and in extramural communities and develop its own formats, CDEs, or ontologies only where they do not already exist. As the goal of Big Data is collaboration, NIDA is encouraged to participate in and contribute to trans-NIH efforts, including the Trans-NIH Biomedical Informatics Coordinating Committee (BMIC) CDE Working Group and the BD2K Standards Coordinating Center (SCC).
Data format and capture need thoughtful consideration because they evolve rapidly. Ideally, all data should be formatted and annotated so that they can be used effectively and efficiently with minimal effort by researchers. To facilitate the application of natural language processing, machine learning, and other techniques for achieving a streamlined and automated solution, the data producer must employ standardized data and metadata annotation, which must be provided with the dataset.
NIDA should facilitate and promote the use of open formats (vs. proprietary or closed formats) and the sharing and exchange of data (see Appendix Section 2: The NeuroData Without Borders Initiative). NIDA should also provide researchers with resources and technologies to discover and use these formats and application programming interfaces (APIs). It is also imperative to provide resources for training investigators to incorporate effective data capture as part of their research workflow.
NIDA must also consider big data that are emerging from technological developments, such as electronic health record systems (EHRs) and social media. Interoperable EHRs will become an important means of capturing and standardizing clinical research data which can be merged with, e.g., genome sequence data. EHR data from the nation’s healthcare systems provide opportunities to ascertain demographic, co-morbid, and complex phenotypes including substance use disorders. These systems enable big data capture involving large numbers of patients that can be aggregated through interoperable EHRs and through big data science methods. This synthesis would enable NIDA researchers to correlate genotypic data with clinical phenotypic data and accelerate big-data analyses to inform Precision Medicine. It would also better integrate drug abuse efforts within primary care settings where most screening and initial intervention is needed. NIDA is also encouraged to support advanced artificial intelligence solutions for analyzing clinical and non-clinical semi-structured or unstructured data relevant to NIDA’s mission. Social media is emerging as a promising data source for epidemiological research and the methods and ethics of capturing and utilizing this big data are quickly evolving.
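As an illustration of providing standardized metadata annotation with a dataset in an open format, the following minimal sketch builds a JSON description to store alongside a data file and checks it for required fields. The field names and the required set are assumptions made for the example, not a standard endorsed in this proposal; in practice they would be taken from an existing community standard.

```python
import json

# Hypothetical minimal set of descriptive fields; a real project would adopt
# these from an existing community standard rather than define its own.
REQUIRED_FIELDS = ("title", "creator", "license", "format", "variables")


def missing_metadata(record: dict) -> list:
    """Return the required fields that are absent or empty in a record."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


# Illustrative record; every value here is made up for the example.
record = {
    "title": "Example behavioral assessment dataset",
    "creator": "Example Lab",
    "license": "CC0-1.0",
    "format": "text/csv",
    "variables": [
        {"name": "age_years", "type": "integer", "unit": "years"},
        {"name": "site_id", "type": "string"},
    ],
}

problems = missing_metadata(record)

# JSON is an open, text-based format; this string would be saved next to the data file.
sidecar = json.dumps(record, indent=2, sort_keys=True)
```

Running the same check at deposit time would let a repository reject or flag datasets whose annotation is incomplete before they reach other researchers.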

Priority 3: Data Curation, Storage, Analytics and Visualization

Data curation, storage, analytics and visualization resources are critical for quality control and to maximize data use and reuse. These topics relate to the user experience, and often involve graphical user interfaces (GUIs). The quality of the user’s experience in using the data from data repositories predicts the success of any data sharing initiative. Consultation with library scientists who have expertise in data curation is recommended.

Challenges and Opportunities

Curation: Data curation, including monitoring data quality and the introduction of error, is essential to ensuring the quality of the data, and thus the quality of the user experience in sharing and using data. Curation failure is one of the largest threats to the sustainability of current data sharing initiatives. To counteract the threat of “garbage in, garbage out”, it is essential to use mechanisms that support the sustainability of databases in general; these include subscription fees, pay per use, “freemium” (where users pay to upgrade to an extended service) and government support. Curation requires additional personnel; however, the aforementioned methods may help to underwrite curation cost.
Storage: Large storage capacity is needed to accommodate the growth of primary, pre- and post-processed data as well as the analyzed results and related software. Different models, such as a federated network, centralized archives, and cloud-based or yet-to-emerge dynamical storage systems, should also be considered. Storage solutions from other domains may be instructive for NIDA. Understanding usage patterns, raw capacity, bandwidth needs and cyclical demands (and the extent to which different types of data are utilized) will inform how to store data and for how long. The format and accessibility of data may be modified to reduce storage costs as data progress from highly utilized to obsolete. The private sector has been dealing with these issues longer and at a greater scale, and its solutions should be considered.
Understanding data usage can inform the types and formats of data to store and maintain, and for how long. To this end tracking data usage in big systems can help to anticipate users’ current and future data analytical needs. However, such needs will always be changing. As technologies change, data collection methods should remain fluid to permit the capture of any relevant data type. Long term active management and curation of any dataset will be essential to keeping it relevant and useful.
It is important to note that the generation of consensus metrics and guidelines for data quality is a prerequisite for data curation. Several fields (e.g., neuroimaging) are yet to converge on such a consensus, limiting hopes for effective data curation at the present time. Future investment in the determination of consensus metrics and guidelines is essential.
Analytics: To maximize value, analytics that cross scientific disciplines, data types, and levels of analysis are paramount. For example,
- Future researchers may find value in the combined analysis of imaging, genetic, and behavioral data from the same individual. Overlaying sequencing data across species is another valuable analysis.
- There is a need for more than the creation of a data library: tools are needed that allow access to diverse but related datasets, from different researchers, for the purposes of alternate analyses and new kinds of analysis.
- Methods need to be developed to understand and model complex, high-dimensional data, such as those that will emerge from large, complex studies such as the PATH study or the ABCD study. These datasets of the future will require constant curation, attention and annotation.
- To mitigate some of the difficulties in Big Data transfer, the data storage resource can also provide a computational functionality via standardized virtual machines with common analysis tools and pipelines. Computational tools that can operate in a distributed fashion, including on users’ devices, while maintaining data privacy, could be especially useful.
Visualization: More advanced techniques/tools would improve our ability to visualize data rapidly. Currently, data analysis and visualization techniques are being developed in disparate fields, but there are many opportunities for analytic advances in one field to be applied to others. To encourage development in this area, NIDA should:
- Encourage development and use of sophisticated machine learning tools/techniques such as hierarchical aggregation, for viewing large and complex data.
- Engage in collaboration with experts who may not be part of the NIH community (e.g., video game developers, data visualization experts, and behavioral and social science researchers).
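Hierarchical aggregation for viewing large data, mentioned above, can be sketched as follows. This is one simple interpretation under stated assumptions: a one-dimensional series is summarized into progressively coarser levels of bins, each keeping the minimum, maximum, sum and count, so that a viewer can draw the coarsest level that fits the display and refine on demand. The function names and the bin layout are hypothetical.

```python
from typing import List, Tuple

Bin = Tuple[float, float, float, int]  # (minimum, maximum, sum, count)


def build_levels(values: List[float], fanout: int = 2) -> List[List[Bin]]:
    """Return levels of bins, from finest (one value per bin) to a single root bin."""
    if not values:
        raise ValueError("values must not be empty")
    if fanout < 2:
        raise ValueError("fanout must be at least 2")
    level: List[Bin] = [(v, v, v, 1) for v in values]
    levels = [level]
    while len(level) > 1:
        merged: List[Bin] = []
        for i in range(0, len(level), fanout):
            group = level[i:i + fanout]
            merged.append((
                min(b[0] for b in group),
                max(b[1] for b in group),
                sum(b[2] for b in group),
                sum(b[3] for b in group),
            ))
        level = merged
        levels.append(level)
    return levels


def pick_level(levels: List[List[Bin]], max_points: int) -> List[Bin]:
    """Choose the finest level that fits within a display budget."""
    for level in levels:
        if len(level) <= max_points:
            return level
    return levels[-1]


levels = build_levels([3.0, 1.0, 4.0, 1.0, 5.0], fanout=2)
overview = pick_level(levels, max_points=3)
```

Keeping the sum and count in each bin, rather than a mean, lets unevenly sized bins be merged without bias; the mean of any bin is its sum divided by its count.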

Summary and Recommendations

To maximize the potential of big data, scientists and users from diverse areas need to be able to find data easily and to use them in new ways. Integrating data across the continuum from basic research to health care, including big data science, is critical to advancing the Institute of Medicine's vision of a "Learning Health Care System" and that of the President’s Precision Medicine Initiative. The Big Data Working Group endorses Big Data science as an area of high priority for NIDA to pursue, and believes that the NIDA Addictome provides a model process for Big Data implementation. The Big Data Working Group reiterates the following recommendations:

NIDA should adhere to widely used practices and use existing technologies and resources such as common data elements, formats, data dictionaries and ontologies, creating new ones only when none exist or when existing ones insufficiently address the requirements of substance abuse research.
NIDA should actively look for the options and solutions that the NIH BD2K project is planning to make available such as the NIH Commons and Data Discovery Index (DDI).
NIDA should take advantage of activities in the extramural and scientific research communities that highly impact conduct of research, such as various practices and approaches toward a “universal consent” or efforts that encourage placing high value on citations of an investigator’s shared data and citations of research articles.
NIDA needs to be at the forefront of efforts for the development of platforms that allow data to be easily integrated from diverse sources that are replicable, validated, standardized and that can be repurposed for future research. The Addictome is a critical model in addressing these needs.

Appendix: Resources

Section 1: Data Sharing

Some of the available resources are:

NIH Commons and commercial cloud services
NIH Big Data to Knowledge Initiative (BD2K)
Neuroscience Information Framework (NIF), BioCaddie/Data Discovery Index
National Addiction and HIV Data Archive Program (NAHDAP)
1000 Functional Connectomes Project/International Neuroimaging Data-sharing Initiative (INDI)
OpenFMRI.org
Connectome Coordinating Facility
Preprocessed Connectomes Project (PCP)
The Collaborative Informatics and Neuroimaging Suite (COINS)
National Database for Autism Research (NDAR)
Neuroimaging Informatics Tools and Resources Clearinghouse (NITRC)
Observational Health Data Sciences and Informatics (OHDSI)
White House Report to President on Big Data and Privacy, 2014
Culture: Research Data Alliance, Force11
Nature Scientific Data
GigaScience

Section 2: Data Capture and File Formats

Some of the available resources are:

CDEs, Standards: NCBO, Biosharing repository of standards and formats, MIAME, Minimum Information about an Electrophysiology Experiment, INCF Task Force: Neuroimaging Data Sharing Task Force, INCF Task Force: Requirements for storing electrophysiology data, Neuroscience Information Framework Ontologies - Standardized (NIFSTD), NeuroLex, caDSR, NIAID IMMPORT, NIH Toolbox, Clinical Data Interchange Standards Consortium, Submission Data Standards Team, Clinical Data Acquisitions Standards Harmonization, Minimum Information for Biological and Biomedical Investigations, The Ontology for Biomedical Investigations, NEMO, BFO, http://www.obofoundry.org/, Bioontology.org, MINI – Minimum Information for a Neuroscience Investigation, NINDS epilepsy, spinal cord injury and TBI CDEs, NeuroNames, Terminology services: CTS2, LexEVS, OntoQuest, ISO/IEC 11179 metadata standard.
NIH Common Data Element (CDE) Resource Portal (https://cde.nlm.nih.gov/home)
NIDA CTN Common Data Elements (https://cde.nida.nih.gov/)
PhenX Toolkit (https://www.phenxtoolkit.org/index.php)
W3C HCLS Dataset Description (https://www.w3.org/2001/sw/hcls/notes/hcls-dataset/)
W3C Datacube vocabulary (https://www.w3.org/TR/vocab-data-cube/)
Data Standardization Efforts
NeuroData without Borders (https://www.nwb.org/)
Nifti

Section 3: Data Curation, Storage, Analytics, and Visualization

Gigadb, https://www.re3data.org/, CHORUS and SHARE, NIH BRICS, Scidrive, Intermine, NLM Listing of Databases
Domain-specific Data Storage: Mouse Genome Informatics, NITRC, NDAR, LONI, FITBIR, ModelDB, Channelpedia, Allen Brain Atlas, Biositemaps, Gene Wiki, https://www.wikipathways.org/index.php/WikiPathways, Open Source Brain, https://databrary.org/, NIAID IMMPORT, https://www.brainmap.org/index.html
Standardized cloud virtual machines to facilitate analysis: NITRC-CE neuroimaging analysis tools (https://aws.amazon.com/marketplace/pp/B00DLI6VAQ) and CloudBioLinux

Glossary

Common Data Elements (CDEs) are standardized terms for collecting data across clinical research projects and resources. A CDE consists of both a precisely defined question and an enumerated set of possible values for responses (answers) and is intended for use in multiple clinical studies or resources, such as data repositories and patient registries. CDEs consisting of individual question/answer pairs can be combined into more complex questionnaires, survey instruments, or case report forms. CDEs offer substantive benefits to the biomedical research enterprise in terms of interoperability and data integration, repurposing, and sharing. More widespread use of CDEs can accelerate the start-up of new research projects by providing a set of established data elements from which investigators can select. CDEs can improve the quality of data collection by fostering the use of data collection instruments which have been validated or vetted by expert groups. They can also improve big-data science by facilitating the comparison of results across research studies and enabling the aggregation and analysis of data from multiple studies to provide new insight and/or greater statistical power.

CDEs are only as useful as the intended user community perceives them to be and the extent to which they are flexible enough to accommodate the diversity of research and rate of change. Therefore, securing the support of the user community and defining their role in the process of development is paramount to the successful development and application of CDEs. Furthermore, effective development/identification of CDEs requires expertise in many different domains. Expertise in the relevant research domains is necessary to identify measures of primary interest and assess their validity and viability in both research and practical settings. Such expertise can come from researchers, clinicians, and other health professionals, all of whom bring unique perspectives. Common Data Elements can also be determined through comparison of existing databases and data sets to identify fields that are invariably common to multiple platforms. Expertise in bioinformatics is necessary to develop or select data elements that are consistent with existing data standards, including those used in clinical care settings and electronic health records (EHRs), to define data elements in specific measurable terms, and to express data elements in ways that are both syntactically and semantically interoperable. Representatives of the patient community (e.g., patient advocates) can also bring valuable expertise and perspective, e.g., in identifying the measures of greatest interest to patients and in considering practical issues of data collection and administration.

When identifying data elements for inclusion in CDEs, it is preferable to select data elements that have been tested to establish their validity, reliability, sensitivity, and specificity to the condition of interest. Efforts should be made to validate data elements across the populations of interest, taking into consideration characteristics such as genetic information, race/ethnicity, socioeconomic status, or geographic areas that may be involved in a study.
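The glossary's definition of a CDE, a precisely defined question plus an enumerated set of permissible values that can be combined into forms, maps naturally onto a small data structure. The sketch below is illustrative only: the identifiers, question wording and value codes are invented for the example and are not drawn from any NIH CDE repository.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class CommonDataElement:
    """One question together with its enumerated permissible values."""
    identifier: str
    question: str
    permissible_values: Dict[str, str]  # response code -> human-readable label

    def accepts(self, response: str) -> bool:
        return response in self.permissible_values


def validate_form(elements: List[CommonDataElement],
                  answers: Dict[str, str]) -> Dict[str, str]:
    """Return {identifier: problem} for missing or non-permissible answers."""
    problems: Dict[str, str] = {}
    for element in elements:
        answer = answers.get(element.identifier)
        if answer is None:
            problems[element.identifier] = "missing"
        elif not element.accepts(answer):
            problems[element.identifier] = f"'{answer}' is not a permissible value"
    return problems


# Hypothetical elements combined into a two-question form.
tobacco = CommonDataElement(
    identifier="EX_TOBACCO_30D",
    question="In the past 30 days, have you used any tobacco product?",
    permissible_values={"0": "No", "1": "Yes", "9": "Declined to answer"},
)
setting = CommonDataElement(
    identifier="EX_CARE_SETTING",
    question="In which setting was this assessment completed?",
    permissible_values={"PC": "Primary care", "ED": "Emergency department", "OT": "Other"},
)
form = [tobacco, setting]

clean = validate_form(form, {"EX_TOBACCO_30D": "1", "EX_CARE_SETTING": "ED"})
flagged = validate_form(form, {"EX_TOBACCO_30D": "yes"})
```

Because every study using the same element records the same codes, responses collected at different sites can be pooled without a separate data-mapping step.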

Staff

NIDA co-chairs: Roger Little, PhD and Massoud Vahabzadeh, PhD
External Scientific Matter Experts: Christopher Chute, MD, DrPH; Maryanne Martone, PhD; Michael Milham, MD, PhD; Michael Neale, PhD; Eric Nestler, MD, PhD; Arthur Toga, PhD
NIDA staff: Ericka Boone, PhD; Philip Bourne, PhD; Maureen Boyle, PhD; Udi Ghitza, PhD; Steve Gust, PhD; Vani Pariyadath, PhD; Tom Radman, PhD; Joni Rutter, PhD; Tisha Wiley, PhD

Strategic Planning Workgroup - Complex Patients

Final Workgroup Proposal - Complex Patients

Workgroup Proposal

Introduction

The central concepts of the Workgroup’s recommendations are:

Substance Use Disorders (SUDs) commonly co-occur with psychiatric and medical conditions, disruptions in neurological, cognitive, self-regulatory, and other impaired behavioral and biological processes, and impoverishments and obstacles associated with economic, educational, employment, and other social-environmental conditions. Substance use problems which may not meet criteria for a SUD diagnosis are a major concern themselves, as well as a complicating condition of other psychiatric, behavioral, and medical conditions.

The most accurate description of at least some patterns of the SUD phenotype is a characterization involving a complex of disorders, dysfunctions and deficient conditions. Precise description of SUDs and substance use problems must be understood from a developmental and developmental psychopathology approach, considering the trajectory of substance use initiation, addiction, recovery, and relapse as they relate to developmental stages, transitions, and influences. Further, the high prevalence of psychiatric comorbidity involving SUDs — or at least substance use problems — in addition to the high prevalence of impaired behavioral development and developmental transitions of individuals with SUDs or other substance use problems strongly suggest that common underlying substrates may typically be involved.

There is general agreement that the dynamic interplay between problematic substance use and SUDs, on the one hand, and the complex of associated detrimental factors, on the other, has important implications for the status of the affected individuals and the interventions which may improve their well-being. Despite this, much SUD-focused research does not seek to address these interactions, and often deliberately attempts to screen out or control for the complexities, treating them as noise masking the signal.

Further, insufficient consideration of the complex relationships between coexisting SUDs and other impairing conditions has limited the development of maximally effective interventions. This has been magnified by inadequacies in the health care systems that do not provide services that are responsive to the multiple interconnected problems experienced by most patients with SUDs and substance use problems.

The Precision Medicine Initiative, implemented primarily by the NIH, envisions a near future in which treatment and prevention protocols take into account individual differences and needs as well as individual variability in genes, environment, and lifestyle. The workgroup endorses the view that this is a worthwhile ultimate clinical aim of what we hope will be a concerted effort to investigate the heterogeneity of people with SUDs and substance use problems. That critical clinical goal can only be reached by the careful accrual of systematic research to characterize the nature of the complexity; to evaluate the impact of various complex factors on substance use problems and disorders and vice versa; to develop and assess clinical methods for addressing complex issues in clinical practice; and to seek and evaluate approaches to optimize health care organizational and system considerations for effective integration of these methods.

The workgroup identified three research priority areas that can significantly increase our understanding of the complexities of people with substance use disorders and the continuum of substance use problems. Within each area, more specific research priorities are described in bullets expounding on the three areas; they are listed in order of importance and feasibility for each area. The bullets listed in bold type were ranked by the Workgroup as having the highest overall priority for Complexities research. The Workgroup ranked the underlined bullets as being in the second tier of overall priority with those in plain type being in third tier. However, all bullets represent priorities that the Workgroup endorses.

Priority 1

Conduct research to characterize substance-using individuals with complex conditions that relate to multiple dysfunctions and problems and study their different trajectories of substance use initiation, addiction, recovery, and relapse. This includes those with diagnosable substance use disorders, as well as individuals with emerging, concurrent, and manifest substance use problems that do not meet criteria for diagnosis of a clinical disorder.

Approaches

Investigate the influence of genetic and environmental factors, rearing environment and family function, maternal and parental stress, and social determinants of addiction, and identify the implications of these influences on optimal interventions and outcomes. Seek opportunities to take advantage of existing research platforms and infrastructure, such as the Adolescent Brain Cognitive Development (ABCD) study.
Investigate developmental trajectories of SUDs, different patterns of influences on different trajectories, and the optimally effective interventions and outcomes.
Use epidemiological, population based, clinical, and case method research to identify and investigate the common patterns of complex disorders and problems experienced by individuals with SUDs and the continuum of substance use problems.
Further develop and evaluate methods to screen for and assess SUDs and developing or subclinical substance use in general and pediatric-adolescent medical and dental settings; emergency department, urgent care, and hospital settings; specialty care settings including child and adolescent psychiatry, psychology, and educational-developmental assessment; schools and colleges; and criminal justice settings. Determine the barriers to adoption of screening and approaches to further implementation. To facilitate the adoption of validated screening/assessment methods, conduct research to evaluate the effectiveness of the methods in improving SUD outcomes and other health and quality of life outcomes, and conduct implementation research to assess provider, organization, and system factors that can impact adoption.
Establish models for phenotyping that recognize and capture the common underlying causal substrates that can contribute to the development of both SUDs and other impairments and determine the implications of these influences on optimal interventions and outcomes. Incorporate findings and models from developmental psychopathology research to further this goal.
Use epidemiological, population-based, clinical, and case method research to identify and investigate the role of the use of multiple and/or different combinations of substances and their impact on the continuum of substance use problems, optimally effective interventions, and outcomes.
Translate the multiplicity of conditions and mechanisms involved into methods to screen for and clinically assess the substance-using population across the continuum of substance use problems.

Priority 2

Explore, evaluate, and implement novel approaches for characterizing the complex phenotypes of people with SUDs and substance use problems.

Approaches

Consider the development of a phenotypic system based on a framework of functional domains, similar to the NIMH RDoC framework and the NIAAA AARDoC framework. Integrate results of basic research and neurocircuitry findings into the framework that would encompass molecular factors, and link those to neurobiological diagnosis.
Determine the extent and ways in which the most common patterns of SUDs and substance use problems indicate that their fundamental character and phenotype involve a complex of multiple problems, including disorders, dysfunctions, and social-environmental obstacles and deficits.
Conduct secondary analyses of existing datasets, such as the National Epidemiologic Survey on Alcohol and Related Conditions (NESARC) dataset, data from the Clinical Trials Network (CTN), and data collected as part of NIDA-funded investigator grants, in order to characterize the constellations of multiple factors that are present in complex phenotypes of people with SUDs and substance use problems.
Conduct research to develop methods for characterizing non-treatment-seeking populations of individuals with problematic substance use behaviors, in order to better understand these populations’ patterns of function and substance use trajectories.
Collaborate with Collaborative Research on Addiction at NIH (CRAN) partners and other agencies and non-federal partners to develop and implement optimal methods for harmonizing and aggregating biomedical big data across large studies and clinical datasets, to enable capturing domain-level factors that fit into more systematic characterization of patients’ complex phenotypes. Continue to investigate and encourage the adoption of common data elements into electronic health record systems to facilitate the harmonized documentation of substance use information and other related data. Continue to work with other federal partners to support research that cuts across the domains of other Institutes and agencies.
Extend existing animal models of substance use and addiction and/or animal models of other conditions or environmental factors for use in the study of interactions among drug exposure and addiction, other co-occurring conditions, and environmental influences, mapping the overlap and defining boundaries where possible.

Priority 3

Recognizing the complexities of SUD patients requires reconsideration of the standards, qualifications, and practices necessary for effective service systems, appropriate interventions, successful implementations, and competent treatment professionals. Building upon the research conducted to address Priority 1 and Priority 2, but proceeding without delay where possible, engage in a multi-pronged research and implementation effort aimed at improving the treatment and prevention services available and accessible to patients with SUDs and complex complicating conditions and other factors.

Approaches

Conduct research to determine how effectively interventions in current use address the complex needs of patients with SUDs; continue to study existing approaches for simultaneously addressing SUDs and co-occurring conditions. Investigate the relative effectiveness of interventions at various levels of substance use (mild, regular, daily).
Develop approaches and algorithms for solo and combined pharmacologic and behavioral treatment methods in patients with co-occurring conditions and comorbidities that would be the basis for personalized treatment of addiction. Utilize the progress in neurobiology/neurocircuitry and pharmacogenetics of addiction.
Investigate the extent to which SUD treatment providers (and general health care providers) are well trained and prepared to assess and address their patients’ co-occurring conditions and other problems; conversely, investigate how well trained and prepared other health care and social services providers are to assess and address their patients’ substance use disorders and problems, and investigate the impact of the level of training and preparation on patient outcomes.
Assess the effectiveness of linkage and referral protocols to develop efficient models for connecting patients with the specific medical and psychiatric care, and social services, required.
Investigate relevant implementation factors that can affect the adoption of novel treatment and prevention approaches, including issues of increased clinician effort, diversion and overstretching of other health care resources, use of technological aids, and family and community support.
Investigate the longer-term outcomes of different treatment paradigms and approaches (pharmacological, psychotherapeutic, structural changes, combined, 12-step, etc.) and, in addition, the outcomes of different approaches in relation to provider professional characteristics and qualifications.
Conduct research on implementation of new interventions in multiple types of systems.
In order to ameliorate inadequacies identified in investigations of the impact of provider qualifications and treatment approaches on treatment outcomes, conduct research to develop, evaluate, and improve training courses and programs aimed at various professional groups on treatment of SUDs and the continuum of substance use problems including their intersection with other conditions and complexities.
Conduct research to develop and evaluate novel holistic treatment approaches, including integrated interventions, coordinated care, linkage and referral protocols.
Transition to developing pharmacologic and other treatment approaches to addiction in pathophysiologically homogeneous groups of patients rather than in groups currently defined by subjective, behaviorally based descriptive approaches.
Set up frameworks to implement personalized combined interventions (e.g., combined pharmacological and behavioral therapy, polypharmacy) addressing the multiplicity of co-occurrences and comorbidities of addiction phenotypes.
Investigate the impact of changes in service provision as a result of changes in health care insurance and financing (e.g., ACA).

Staff

NIDA co-chairs: Meyer Glantz, PhD and David Liu, MD
External Scientific Matter Experts: Kathleen Brady, MD, PhD; Joseph Guydish, PhD, MPH; Lisa Metsch, PhD; Jenae Neiderhiser, PhD; Edward Nunes, Jr, MD; John Rotrosen, MD; Constance Weisner, DrPH, MSW
NIDA staff: Will Aklin, PhD; Maureen Boyle, PhD; Emily Einstein, PhD; Jacques Normand, PhD; Karran Phillips, MD, MSc; Tonya Ramey, MD, PhD; Geetha Subramaniam, MD; Dave Thomas, PhD; Susan Volman, PhD

Strategic Planning Workgroup - GEDI

Final Workgroup Proposal - GEDI

Workgroup Proposal on Genes x Environment x Development Interplay (GEDI)

Introduction

Substance use disorders (SUDs) are complex conditions that develop over time and are characterized by escalating use across stages of initiation, abuse, and addiction, which are often experienced repeatedly due to cycles of withdrawal and relapse. Not all individuals who initiate drug use progress to addiction. Genetic epidemiology suggests that these different stages are influenced by the environment, the age of a person when drug exposure takes place, and their genetic vulnerability. Initiation and dependence share some common genetic factors, but unique genetic factors also underlie the different stages of substance use, as well as individual vulnerability to particular substances. Elucidation of genetic factors and epigenetic regulation of gene expression influenced by the environment are essential to understanding the biological basis of substance use disorders. Ultimately, understanding the interplay of genetic and environmental factors that influence the SUD trajectory over the course of human development should contribute greatly to developing effective and personalized prevention and treatment interventions for SUDs.

Priority 1: Improved Methods for Gene Identification

The primary challenge of GxExD research is to determine how to optimally detect the relatively small genetic effects that each variant contributes to the overall heritability, and then to examine their interplay with changing environments and across human development. Examining all three variables at once in a single study is extraordinarily difficult because of the problem of multiple comparisons and the need for extraordinarily large sample sizes; moreover, the ability to use established samples for such studies is hampered by the challenges of data harmonization. Genome-wide association studies (GWAS) have been one of the most productive methods for identifying genetic variants associated with disease, which is essential for understanding their underlying biology and interplay with environmental factors. When individual markers do not achieve statistical significance in GWAS, polygenic risk scores based on the weighted sum of variants derived from a training set may be useful in predicting risk and interactions with environmental exposures in smaller cohorts and in epidemiological studies where environmental exposure is well measured and can be developmentally stratified. The same strategy with smaller cohorts can be followed once confirmed genetic variants are identified.
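As a rough illustration of the "weighted sum of variants" idea behind polygenic risk scores, the sketch below multiplies each person's risk-allele count by a per-variant weight and adds the results. The variant names, weights, and genotypes are hypothetical placeholders invented for this example; in practice the weights would come from effect sizes estimated in a training GWAS, and real pipelines involve additional steps (quality control, variant selection, ancestry adjustment) that are not shown here.

```python
def polygenic_risk_score(allele_counts, weights):
    """Return the weighted sum of risk-allele counts for one person.

    allele_counts: dict mapping variant id -> number of risk alleles (0, 1, or 2)
    weights: dict mapping variant id -> effect-size weight from a training set
    """
    missing = set(weights) - set(allele_counts)
    if missing:
        raise KeyError(f"genotype missing for variants: {sorted(missing)}")
    return sum(weights[v] * allele_counts[v] for v in weights)


# Hypothetical weights (not real effect sizes) for three made-up variants.
weights = {"variant_a": 0.10, "variant_b": -0.05, "variant_c": 0.20}

# Hypothetical genotypes for three people, coded as risk-allele counts.
genotypes = {
    "person_1": {"variant_a": 0, "variant_b": 1, "variant_c": 2},
    "person_2": {"variant_a": 2, "variant_b": 2, "variant_c": 0},
    "person_3": {"variant_a": 1, "variant_b": 0, "variant_c": 1},
}

scores = {
    person: polygenic_risk_score(counts, weights)
    for person, counts in genotypes.items()
}
```

The resulting score is a single number per person, which is what makes it usable as a predictor in smaller cohorts where individual variants would be underpowered.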

Approaches

Increase investment in human molecular genetics for large-sample genome-wide association studies (GWAS) to identify genetic variants that contribute to SUDs.
Support increased sample sizes for adequately powered GWAS. The current standard for genome-wide significance is 5 x 10^-8 under the assumption of 1 million independent statistical tests. (The alternative of using a smaller number of candidate SNPs has not been successful for most complex traits.)
Use novel quantitative and polygenic genetic methods and statistical models to integrate the GWAS efforts with phenotype identification, with examination of environmental effects, and with examination of more complex genetic effects such as developmental changes.
In parallel to the GWAS efforts, support longitudinal studies in population-based samples, and perhaps twin families, that collect biospecimens for a variety of genomic analyses to examine the impact of development over the life span.
Integrate genotype data with post-mortem brain epigenomic and gene expression data from addiction cases and controls. This approach may leverage existing archives of normal brains and resources such as the Genotype-Tissue Expression (GTEx) program but will need to develop data from brains of those with addictions.
Support studies that utilize new technologies that provide unprecedented detail of SUD genetic variants within a 3D spatial context. For example, use of next generation sequencing in family-based designs may identify rare variants in chromosomal regions shown to be associated with SUDs. Coupling Hi-C technology with next generation sequencing not only reveals the origin of the gene variants, but also the 3D organization of the chromosomal region within the context of the genome.
Functionally validate and characterize newly identified gene variants using genome editing technologies such as CRISPR/Cas9. This technology enables new opportunities for characterizing the function of newly identified human gene variants by creating animal knock-ins, which can be further explored using in vivo imaging.
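The genome-wide significance standard cited in the approaches above can be reproduced with a simple Bonferroni correction. The sketch below assumes a conventional family-wise error rate of 0.05, which the text does not state explicitly; dividing it by 1 million independent tests gives the 5 x 10^-8 threshold. The variant names and p-values are hypothetical.

```python
def bonferroni_threshold(family_wise_alpha, n_independent_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    if n_independent_tests < 1:
        raise ValueError("n_independent_tests must be at least 1")
    return family_wise_alpha / n_independent_tests


def genome_wide_significant(p_values, threshold):
    """Keep only the variants whose p-value falls below the threshold."""
    return {variant: p for variant, p in p_values.items() if p < threshold}


# Assumed family-wise alpha of 0.05 across 1 million independent tests.
threshold = bonferroni_threshold(0.05, 1_000_000)

# Hypothetical association p-values for three made-up variants.
hypothetical_p_values = {"variant_a": 3e-9, "variant_b": 4e-7, "variant_c": 0.02}
hits = genome_wide_significant(hypothetical_p_values, threshold)
```

A variant with p = 4 x 10^-7 would look compelling in a single-hypothesis study but does not pass here, which is one reason larger samples are needed.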

Resources needed

Complete genotyping of samples in NIDA Genetics repository.
Promote access to large sample sizes containing tens of thousands of subjects with serious forms of addiction, ethnic diversity, and different patterns of comorbidity.
Support for methods research to design strategies to analyze complex traits which will increase the pay-off of the data being collected.

Priority 2: Epigenetic Approaches

Behavioral models of addiction focus primarily on the behavioral responses elicited by drugs as a way to measure drug reward; however, a more precise knowledge of the biochemical and molecular pathways that produce these behavioral changes is required for understanding the development of addiction. Recent epigenetic studies link drug reward to gene transcription regulation by chemical modification of both DNA and DNA-associated proteins. This has opened a new and exciting avenue for addiction research. Recent advances in epigenome-editing using techniques such as CRISPR/Cas9 provide the necessary tools to influence reward phenotypes by precise in vivo manipulation of epigenetic states at specific loci within distinct neuronal populations.

Approaches

Encourage case/control epigenome-wide association studies (EWAS) using postmortem brain tissue. The cell-type specificity of epigenomes necessitates that etiologic studies investigate brain tissues rather than peripheral tissues.
Validate animal epigenetic findings in human post-mortem brain tissue and determine their similarities and differences. These studies are expected to reveal the role of epigenetic mechanisms in substance abuse and to provide a firm foundation for new epigenetic therapeutics that can target epigenetic enzymes and pathways.
Genome-wide association studies (GWAS) have largely identified variants in non-coding regions. Integration of ‘omics information (such as DNA modifications, histone modifications, transcriptome, regulatory RNAs) with GWAS will improve discovery power and greatly assist in interpretation of the functional implications of the GWAS findings, leading to valid and potentially more powerful methods for identifying causative alleles.
Encourage studies examining additional regulatory mechanisms of the genome, such as noncoding RNA.
Further develop non-invasive imaging technology to measure epigenetic changes in the brain.
Collect peripheral samples from human longitudinal studies (e.g., the ABCD study) to investigate dynamic changes over time (epigenomic, non-coding RNA, metabolomic, proteomic, or transcriptomic) associated with the development of SUDs for biomarker development.
Use pluripotent stem cells to link molecular information during neuronal development, as a complement to postmortem approaches that utilize tissue collected across the lifespan from both human and animal specimens. Identification of epigenetic changes during development is particularly challenging in human studies. Studies in rodents and non-human primates can provide information on how the epigenome changes with time and environmental exposure.

Resources Needed

Significant investment in brain banks of addicted individuals.
Investment in bioinformatics resources to analyze sequencing and other data arising from epigenetic studies in animals and humans.
Better technologies to obtain epigenome-wide information from one or very few cells of the same type.

Priority 3: Improved Phenotyping

The heterogeneity of substance use phenotypes and the yet-unidentified human genetic variation contributing to SUDs both offer challenges to GxExD research. Optimizing phenotyping strategies enhances the possibility of identifying genes, relevant environmental risk and protective factors, and the developmental stages associated with SUD behavior.

Approaches

Standardize phenotyping across genetic studies as much as possible. This may involve developing a core set of variables to be applied to all NIDA genetic studies. Use of PhenX toolkit measures may help address this standardization.
Support efforts that utilize relatively homogeneous or severely affected SUD phenotypes, which may provide greater statistical power for discovery.
Rather than initially conducting costly in-depth environmental assessments, encourage brief phenotyping strategies to facilitate gathering the large samples necessary to detect small effects.
Follow up confirmed discoveries from GWAS and animal studies with smaller, longitudinally followed samples that include deep phenotyping of SUD behavior and relevant environmental variables.
Utilize biomarkers associated with drug use or metabolism.
Leverage existing resources such as those coming from the ABCD study, uniform electronic patient records, or existing large population cohort studies.

Resources Needed

Access to large samples of genotyped individuals with SUD-relevant phenotypes as well as longitudinally studied, comprehensively assessed samples of children and adolescents that make possible evaluation of gender, ethnicity, and different forms of addiction.
Improve behavioral measures so that findings can be translated between human and animal models.

Priority 4: Improved Characterization of Environmental Influences

Numerous environmental factors have been found to correlate with elevated risk for initiating substance use and the development of SUDs; however, it is both challenging and important to distinguish which of these factors play a causal role in the trajectories toward SUDs and which are markers of risk, so that malleable environmental factors can be correctly identified and appropriate interventions implemented during sensitive developmental periods. One complicating factor is the role of gene-environment correlation, wherein genetic factors contribute to the likelihood that an individual will be exposed to particular types of environments. Environmental factors with the strongest evidence for a role in SUD trajectories and gene-environment interplay include: prenatal substance exposure, early stress, child maltreatment, peer influences, parental monitoring, and early initiation of substance use.

Studying GxExD currently requires large sample sizes, due to the small effect sizes of available gene variants. To address this need using existing samples, data need to be combined across studies, which is much more challenging and resource-intensive than is often appreciated.
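To make the link between small effect sizes and large samples concrete, the sketch below uses a standard normal approximation for a single-variant test: the required sample size is roughly (z for the significance level + z for the desired power) squared, divided by the fraction of trait variance the variant explains. This is an illustrative simplification for main effects only, with assumed inputs (80% power, the 5 x 10^-8 genome-wide threshold, and hypothetical effect sizes); it is not a calculation from the workgroup, and detecting interactions with environment and development would generally require more.

```python
from math import ceil
from statistics import NormalDist


def approx_sample_size(variance_explained, alpha=5e-8, power=0.80):
    """Approximate N needed to detect a variant explaining a given
    fraction of trait variance, using a two-sided normal approximation."""
    if not 0 < variance_explained < 1:
        raise ValueError("variance_explained must be between 0 and 1")
    normal = NormalDist()
    z_alpha = -normal.inv_cdf(alpha / 2)  # critical value for the two-sided test
    z_power = normal.inv_cdf(power)       # quantile matching the desired power
    return ceil((z_alpha + z_power) ** 2 / variance_explained)


# Hypothetical effect sizes: a variant explaining 0.1% or 0.05% of variance.
n_small = approx_sample_size(0.001)
n_smaller = approx_sample_size(0.0005)
```

Under these assumptions the required sample runs to tens of thousands of participants, and halving the effect size roughly doubles it, which is why combining data across existing studies is attractive despite the harmonization cost.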

Approaches

Enhance power by combining data from multiple studies; approaches include data federation, data harmonization, and integrated data analysis. A less stringent approach would be to combine data based on the construct rather than specific questions.
For future studies, it may be helpful to use common measures, such as those in the Consensus Measures for Phenotypes and eXposures (PhenX) toolkit; initially brief or proxy measures such as zip code or educational attainment; and new mobile technologies for self-report and for physiological measurements.
Measuring the level of exposure to environmental variables and the timing of that exposure is also important.
Population-based approaches, including twin and national registries, offer advantages including size, representativeness, ability to control for familial processes, and sometimes detailed environmental measures.
Animal models may be useful to pinpoint salient environmental exposures and critical developmental timing of exposure.
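One way to picture combining data "based on the construct rather than specific questions" is sketched below: each study maps a shared construct to its own instrument, scores are standardized within each study, and the standardized scores are pooled. The study names, item names, and values are all hypothetical, and within-study z-scoring is only one simple option chosen for illustration; it assumes the study populations are comparable and does not replace formal harmonization or integrated data analysis.

```python
from statistics import mean, pstdev


def standardize(values):
    """Convert raw scores to z-scores within a single study."""
    center = mean(values)
    spread = pstdev(values)
    if spread == 0:
        raise ValueError("cannot standardize scores with zero variance")
    return [(v - center) / spread for v in values]


def pool_by_construct(studies, construct):
    """Pool within-study z-scores for a construct measured by different items."""
    pooled = []
    for study_name, study in studies.items():
        item = study["construct_map"][construct]  # study-specific instrument
        z_scores = standardize(study["data"][item])
        pooled.extend((study_name, z) for z in z_scores)
    return pooled


# Two hypothetical studies measuring the same construct on different scales.
studies = {
    "study_a": {
        "construct_map": {"parental_monitoring": "pm_scale_0_10"},
        "data": {"pm_scale_0_10": [2, 5, 8, 9]},
    },
    "study_b": {
        "construct_map": {"parental_monitoring": "monitor_items_sum"},
        "data": {"monitor_items_sum": [10, 20, 30]},
    },
}

pooled = pool_by_construct(studies, "parental_monitoring")
```

Keeping the study label alongside each pooled score lets later analyses adjust for between-study differences instead of treating the combined sample as homogeneous.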

Resources needed

Investment in new mobile technologies that quantitatively measure the environment and biomarkers for substance abuse.
Invest in efforts to integrate data across multiple samples and address the challenges inherent in data integration in order to promote large-scale collaborative GxExD studies.

Priority 5: Integration of Animal and Human Studies

Genetic and epigenetic studies in both animals and humans have begun to provide insight for understanding the neurobiological mechanisms associated with risk for and development of substance use disorders (SUDs); however, linking individual findings from animal models with particular phenotype(s) exhibited in humans remains challenging. Unraveling the complex etiology of SUDs requires mechanistic insights provided by animal models that are not possible in human subjects. New approaches, new populations, and new genomics tools are allowing the identification of genetic and epigenetic factors more quickly than ever before. Currently, the field is in the very early stages of identifying genetic and epigenetic factors that are relevant to addiction. Much less is known about the common circuits that exist across model organisms that contribute to risk for and development of SUDs. Moving forward, greater integration of findings generated in both human and animal studies is required to elucidate the common genetic and biochemical pathways involved in SUDs to enable the development of therapies for this largely undertreated population.

Approaches

Increase the use of quantitative measures in human studies to enable better integration of human genetic variation with deep insights from animal genetic and genomic studies. The use of the following quantitative continuous traits will be helpful in translating animal and human genetics: amount of drug ingested, frequency of use, length of abstinence; somatic and affective symptoms of drug withdrawal, preference or sensitivity for non-drug rewards, novelty preference or novelty seeking, increased incentive motivation for reward-related stimuli; sensitivity to develop escalation of drug taking; impulsivity, poor cognitive flexibility (e.g., reversal learning, set shifting, etc.); resistance to punishment during drug-seeking; persistent responding in the absence of drug; heightened relapse and reinstatement; enhanced stress reactivity; disrupted circadian rhythms. Moreover, traits shown to be heritable in animals (e.g., impulsivity) that more closely underlie the biological mechanism provide increased power and reduce the need for ever-increasing sample sizes.
Map quantitative genetic traits in inbred and outbred rodents to examine the degree to which various traits or phenotypes share a common genetic network.
Support genetic analysis of complex traits in inbred strains of rodents where environmental variables and timing of exposure can be controlled, providing invaluable insight into GxE.
Genetically mapping substance abuse-related phenotypes in model organisms may suggest candidate genes to test in human populations and provide powerful insights into the mechanisms of substance abuse.
Support translation of epigenetic findings gathered from addiction studies in animal models to humans.

Cross-cutting Priorities

Data Sharing

There is a critical need to share the data collected by various research teams to synchronize research efforts, reduce duplicative investment, and increase statistical power to generate GxExD findings. Specifically:

There is a critical need for identifying and sharing phenotypic data for GxE studies. The dbGaP database includes some phenotypes, but they represent only a small set of the data collected. NIDA needs to think about which phenotypes embody the core characteristics necessary for allowing more investigators to use them across other studies and to conduct replication studies involving GxE interactions.
Provide incentives to encourage investigators to share data by providing the resources required for preparing data for sharing and by offering researchers credit for sharing.
Data integration (which requires sharing and data harmonization) is necessary to build large datasets to increase our chances of new discoveries and replication of these studies.
Data must be curated in a way that meets both confidentiality and full access requirements.

Training

GxExD studies require great breadth and depth of knowledge in genetics, development, bioinformatics, epidemiology, digital technology, and research design. These challenges may be met by:

Encouraging collaborations among individuals from different fields (inter-individual, interdisciplinary collaborations rather than intra-individual interdisciplinary training). Designing a program for single investigators to attain the appropriate breadth and depth of knowledge in all areas would require lengthy training that could be more efficiently achieved through collaborative networks of scientists possessing depth of knowledge in a given discipline.
Supporting both training and R grants in methods development for the analysis of rich and informative developmental samples with GxE interaction and covariation.
Supporting meetings/workshops that train scientists to use resources to analyze GxExD samples.
Supporting meetings that encourage interaction between investigators who use animal and human models to analyze GxExD.

Statistical Methods

GxExD studies require a combination of methods development, bioinformatics, software development, and training in this methodology and the corresponding software. Methods development funding mechanisms are critically needed to create these vital tools needed to propel the field forward.
SBIR/STTR funding for small businesses may provide an avenue for bringing software and corresponding training generated by other teams to GxExD studies, making these tools broadly accessible to the field.
Increased support for secondary analyses of existing data sets.

Leveraging Technologies and Innovations from other Fields

Develop nanotechnologies that can target specific cell types to correct aberrant function through genetic or epigenetic modifications.
Adopt wearable sensors; mobile technology that tracks data collected from sensors and includes geographic location tracking; and social media and web site tracking.
Model the successes achieved in identifying genes involved in schizophrenia for gene identification for other outcomes.
Harness methods from other fields to protect data confidentiality, handle missing data, model complex data (including methods for causal inferences), secure data dissemination, federation, and integration.
Adopt advances in sequencing and other genetic technologies: 1) various polygenic methods, such as GCTA, LD-regression, and polygenic risk scores; 2) advancements in classes of gene annotations (ENCODE, expression arrays, etc.) that can move beyond single-variant markers, which may have very low power in the absence of very large samples; and 3) developments in inexpensive data collection.
Utilize expression, methylation, and connectome databases and iPSCs.

Benchmarks to Measure Success

The effectiveness of implementing the research priorities and corresponding approaches outlined in this proposal can be measured as follows:

Investigator metrics: tracking the number of new projects and peer-reviewed publications addressing each research priority
Data sharing: tracking number of newly shared data sets and citation counts for each
Generation of new data: tracking the number of new samples genotyped under NIDA’s existing samples (Smokescreen project); the number of new, replicated genetic discoveries; the increase in consistency of research findings among different labs and across animal models and related human phenotypes; and monitoring the quality of data collection (cooperation rates)

Impact on Public Health

The priorities outlined above represent approaches that are expected to bolster primary addiction research findings from the GxExD field over the next five years. An important and concurrent priority of NIDA should be to focus on building translational bridges between basic GxExD researchers and individuals who can transform that knowledge into more effective prevention and treatment programs. Effective implementation of these programs is key to reducing the burden of SUDs and directly improving public health.

Staff

NIDA co-chairs: Jonathan Pollock, PhD and Naimah Weinberg, MD
External Scientific Matter Experts: E. Jane Costello, PhD; Danielle Dick, PhD; William Iacono, PhD; Eric Johnson, PhD; Kenneth Kendler, MD; John Rice, PhD; and Gustavo Turecki, MD, PhD
NIDA staff: Maureen Boyle, PhD; Harold Gordon, PhD; Raul Mandler, MD; Michele Rankin, PhD; Joni Rutter, PhD; and John Satterlee, PhD

Past Strategic Plan (2010-2015)

View the past Strategic Plan (2010-2015)

December 2015
