Open for Research: COVID-19 Literature Dataset

It’s more important than ever to come together, as companies, non-profits, governments, scientists, and clinicians, to bring our best information and technologies to bear on challenges with COVID-19.

Today, we announced a collaboration with colleagues to create the COVID-19 Open Research Dataset (CORD-19), a collection of scientific articles about the coronavirus family of viruses, for use by the worldwide research community. CORD-19 contains over 29,000 scholarly articles on COVID-19 and the coronavirus family more broadly, with full text available for over 13,000 of them.

The motivation behind the CORD-19 effort is to make research and discovery more efficient and to accelerate progress toward solutions to the pandemic. The machine-readable dataset was constructed with colleagues at the National Library of Medicine (NLM), the Allen Institute for AI, Georgetown University, the Chan Zuckerberg Initiative, Kaggle, and the White House Office of Science and Technology Policy (OSTP). Microsoft contributed the indexing and mapping of thousands of articles worldwide. We’ll continue to update the index so that the global research community has a unified, current resource that brings together all that we know about COVID-19.

A key aspect of aggregating scientific literature into a valuable, unified data resource is gaining access to the full content of articles, including permission to analyze that content with computational tools. Many medical articles are tucked behind paywalls. Even when text is made available, publishers may not grant researchers the rights to perform machine analysis and data mining. Much work has gone on behind the scenes to open up the literature on the coronavirus family and on COVID-19 so that this kind of machine-readable resource could be created.

It’s my hope that the machine-readable content will stimulate advances in computing methods that can help investigators develop deeper understandings of, and approaches to addressing, the COVID-19 pandemic. Developing tools that help scientists do research and synthesize new understandings has been a long-term aspiration in AI. Work has been underway for years on methods that can answer questions, analyze and summarize the content of numerous scientific papers, assess the credibility of clinical trials, generate and test hypotheses, and guide experimentation. As an example of prior work on machine reading in biomedicine, research scientists at Microsoft have explored the use of natural language analysis and machine learning to analyze thousands of biomedical papers, construct a representation of cellular regulatory networks, and then leverage that representation to generate recommendations for cancer therapies.

It has been gratifying to see the fabulous cross-organizational teamwork that led to the creation of CORD-19. I’m excited to see what multiple communities of passionate and creative investigators will do with the resource.  

I would be glad to help if I can. Please let me know.

Fabio Castro

ECG CALC App at ECG CALC

I am developing a medical diagnostic application that can also serve to detect and issue alerts about pandemics such as COVID-19. Test video: https://youtu.be/jD3ZK6XZfms

Thank you for taking a positive step forward, I believe you will be making a difference. I always liked the idea of 10,000 points of light. This will be one of them.

Rhaissa V.

Sr. Product Manager | Mentor | Former Netflix

Alisson Ramos, this is also a great reference; you may already know it.

Ankam Lokesh Kumar

Student at RAJIV GANDHI UNIVERSITY OF KNOWLEDGE TECHNOLOGIES, RK VALLEY

I have an innovative idea for developing a vaccine against the coronavirus. Please help me get into research so that I can help save millions of people. (X-protein) starts a mechanism in white blood cells that, in the presence of the virus, synthesizes NO (nitric oxide). This stimulates IFN-alpha mRNA, which increases immunity and prevents replication of the virus. *Warning: it stops the growth of cells adjacent to the virus. Later, use of (Y-protein) stimulates macrophages and T cells and throws the virus out of the body.
