S2AND Inference on Custom Data

Introduction In this post, we run the saved production model on our own custom dataset. The Code S2AND comes with a production model that has already been pretrained on the S2AND developers' own collection of data. To run the production model on your own dataset, run the code below: import pickle # reload model with open("data/production_model.pkl", "rb") as _pkl_file: clusterer = pickle.load(_pkl_file) dataset_name = 'fake' #this points to folder with generated data DATA_DIR = os....
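Before running the model on a custom dataset, it helps to confirm that the input files are where the loading code expects them. The following is a minimal sketch, not the post's actual code: the helper names (`dataset_paths`, `missing_files`, `load_pickled_model`) are hypothetical, and the `<name>_signatures.json`, `<name>_papers.json`, and `<name>_specter.pickle` file names are an assumption based on the naming seen in the S2AND datasets.

```python
import os
import pickle
from typing import Any, Dict, List


def dataset_paths(data_dir: str, dataset_name: str) -> Dict[str, str]:
    """Build the file paths assumed for one dataset folder.

    Assumption: files are named <dataset_name>_<kind>.<ext> inside
    <data_dir>/<dataset_name>/. Adjust if your layout differs.
    """
    folder = os.path.join(data_dir, dataset_name)
    return {
        "signatures": os.path.join(folder, f"{dataset_name}_signatures.json"),
        "papers": os.path.join(folder, f"{dataset_name}_papers.json"),
        "specter": os.path.join(folder, f"{dataset_name}_specter.pickle"),
    }


def missing_files(paths: Dict[str, str]) -> List[str]:
    """Return the (sorted) keys whose files do not exist on disk."""
    return sorted(key for key, path in paths.items() if not os.path.isfile(path))


def load_pickled_model(model_path: str) -> Any:
    """Unpickle a saved model. Only unpickle files from a source you trust."""
    with open(model_path, "rb") as pkl_file:
        return pickle.load(pkl_file)
```

A typical use would be to call `missing_files(dataset_paths("data", "fake"))` first and only load the model once the returned list is empty.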

May 2, 2023

Specter Embeddings for Author Disambiguation

Introduction The purpose of this post is to show you how to generate Specter embeddings specifically for the S2AND author-disambiguation algorithm. Specter embeddings are one of the most important features used in S2AND. The developers of the Specter embedding model have shared it here, freely available for anyone to use. My implementation of the code can be found in this repo. The code Without wasting any more time, here’s the full code that will generate Specter embeddings for a given papers....
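As a rough illustration of the idea (not the code from the post or the linked repo), the sketch below follows the usage pattern published with the `allenai/specter` model on Hugging Face: join each paper's title and abstract with the tokenizer's separator token, then take the first-token vector of the last hidden state as the embedding. The function names, the `"[SEP]"` default, and the `title`/`abstract` field names are assumptions made for this example.

```python
from typing import Dict, List, Tuple


def build_specter_inputs(
    papers: Dict[str, dict], sep_token: str = "[SEP]"
) -> Tuple[List[str], List[str]]:
    """Turn {paper_id: {"title": ..., "abstract": ...}} into model inputs.

    Assumption: "[SEP]" matches the tokenizer's separator token, and a
    missing or null title/abstract is treated as an empty string.
    """
    paper_ids = sorted(papers)
    texts = []
    for paper_id in paper_ids:
        paper = papers[paper_id]
        title = paper.get("title") or ""
        abstract = paper.get("abstract") or ""
        texts.append(title + sep_token + abstract)
    return paper_ids, texts


def embed_with_specter(texts: List[str], batch_size: int = 8):
    """Embed texts with allenai/specter (needs torch and transformers)."""
    # Imported lazily so the input-building helper works without these packages.
    import torch
    from transformers import AutoModel, AutoTokenizer

    if not texts:
        raise ValueError("texts must not be empty")
    tokenizer = AutoTokenizer.from_pretrained("allenai/specter")
    model = AutoModel.from_pretrained("allenai/specter")
    model.eval()
    batches = []
    with torch.no_grad():
        for start in range(0, len(texts), batch_size):
            inputs = tokenizer(
                texts[start:start + batch_size],
                padding=True,
                truncation=True,
                return_tensors="pt",
                max_length=512,
            )
            output = model(**inputs)
            # The first token's hidden state serves as the paper embedding.
            batches.append(output.last_hidden_state[:, 0, :])
    return torch.cat(batches).numpy()
```

Keeping the returned `paper_ids` list next to the embedding matrix matters, because row `i` of the matrix belongs to `paper_ids[i]`.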

May 2, 2023

S2AND Data

Introduction The purpose of this post is to examine the structure of the training data used to run the S2AND algorithm; understanding the file formats will help us insert our own data for author disambiguation. Folder Structure Here is what the full data directory looks like: As you can see, the S2AND repo provides five folders for five different test runs of the algorithm, namely Aminer, ArnetMiner, Inspire, Kisti, and Medline. We also have access to the production-level, pretrained S2AND model that’s ready for us to plug in and use:...
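To inspect a local copy of the data directory yourself, a small listing helper is enough. This is a generic sketch using only the standard library; the function names `summarize_data_dir` and `print_summary` are hypothetical, and it makes no assumption about which files each dataset folder contains.

```python
import os
from typing import Dict, List


def summarize_data_dir(data_dir: str) -> Dict[str, List[str]]:
    """Map each subfolder of data_dir to the sorted names of its entries.

    Top-level files (for example a pickled model) are skipped here.
    """
    summary: Dict[str, List[str]] = {}
    for entry in sorted(os.listdir(data_dir)):
        path = os.path.join(data_dir, entry)
        if os.path.isdir(path):
            summary[entry] = sorted(os.listdir(path))
    return summary


def print_summary(summary: Dict[str, List[str]]) -> None:
    """Print one indented block per dataset folder."""
    for folder, files in summary.items():
        print(f"{folder}/")
        for name in files:
            print(f"    {name}")
```

Calling `print_summary(summarize_data_dir("data"))` gives a quick view of which dataset folders exist and what each one holds, assuming the data lives in a folder named `data`.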

April 18, 2023

S2AND: Allen AI Paper Summary

Introduction The purpose of this post is to give readers a quick recap of the S2AND paper by Allen AI. Readers are encouraged to consult the original paper for any technical details they wish to investigate further. Additionally, Daniel King and Sergey Feldman have done a great job documenting S2AND's development in a separate Medium post published by Sergey Feldman. For a post demonstrating the implementation of S2AND, feel free to read the second part of this post....

March 24, 2023