datasets

S2AND Data

Introduction: The purpose of this post is to examine the structure of the training data used to run the S2AND algorithm; understanding the file formats will help us insert our own data for author disambiguation. Folder Structure: Here is what the full data directory looks like: As you can see, the S2AND repo provides five folders for five different test runs of the algorithm, namely: Aminer, ArnetMiner, Inspire, Kisti, and Medline. We also have access to the production-level, pretrained S2AND model that’s ready for us to plug in and use:...

Unified Medical Language System (UMLS)

The UMLS Explained: What is the UMLS used for? Let’s talk about the Unified Medical Language System, or UMLS for short. The UMLS is a tool that helps people in the medical field understand and communicate with each other better. Imagine you and your friend speak different languages. It would be pretty hard to have a conversation, right? Well, in the medical field, there are lots of different words and terms that mean the same thing, depending on who you’re talking to....

MedQA-USMLE

Introduction: This post will discuss the MedQA-USMLE dataset. For more details, check out the original published paper. What is USMLE? The United States Medical Licensing Examination (USMLE) is a professional exam that all aspiring physicians must pass in order to practice medicine in the United States. The exam involves 3 steps: (1) an 8-hour block consisting of 280 multiple-choice questions (Day 1); (2) a 9-hour block consisting of 316 multiple-choice questions (AKA, CK)....

Diseases Database (DDB)

Introduction: One of the hardest parts of any data science project is gathering good data. Fortunately, many online data repositories exist for us to use! The Diseases Database (DDB) is one such source. The Diseases Database, developed and maintained by Medical Object Oriented Software Enterprises LTD, is a cross-referenced index of human diseases, medications, symptoms, signs, and abnormal investigation findings, intended for medical practitioners and students. Rather than organizing concepts into hierarchies of anatomical, physiological, or pathological systems, it classifies medical concepts along clinical axes, such as cause/effect, risk factors, interactions, etc....