Bayesian Statistics the Fun Way

Part 1: Introduction to Probability Chapter 1: Bayesian thinking and everyday reasoning The central theme of this chapter: Data informs beliefs; beliefs should not inform data. Let’s consider two world views with regard to data: (Ideal framework): “the probability of data given my hypothesis and experience in the world” or “how well do my beliefs explain the world?” (Not ideal): “the probability of my beliefs given the data and my experience in the world” or “how well does what I observe support what I believe?” The first case is where we change our beliefs according to the data...

July 2, 2023

Parallel GPU

Context I am training GPT2 on document classification, following the two example scripts below: Example of GPT2 fine-tuning WITH a classification head: https://gmihaila.github.io/tutorial_notebooks/gpt2_finetune_classification/ Example of GPT2 fine-tuning WITHOUT a classification head (generative fine-tuning): https://towardsdatascience.com/guide-to-fine-tuning-text-generation-models-gpt-2-gpt-neo-and-t5-dc5de6b3bc5e Command Line If you are running a training script that takes no arguments, you can run the following: CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python -m torch.distributed.launch --nproc_per_node=8 script.py Otherwise, you can pass arguments after the script name, provided that your script uses argparse (or similar) to accept them....
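As a rough illustration of "updating your script to take in arguments", here is a minimal argparse sketch. The `--epochs` flag and the function names are hypothetical and not taken from the post. The one real-world detail it assumes is that `torch.distributed.launch` hands each process its rank as `--local_rank` (or `--local-rank` in PyTorch 2.0 and later), so the parser accepts both spellings.

```python
import argparse


def build_parser():
    """Build a parser for a hypothetical training script."""
    parser = argparse.ArgumentParser(description="hypothetical training script")
    # Assumption: the launcher passes --local_rank=N (older PyTorch)
    # or --local-rank=N (PyTorch >= 2.0); accept both spellings.
    parser.add_argument("--local_rank", "--local-rank", dest="local_rank",
                        type=int, default=0)
    # Hypothetical user-defined argument, for illustration only.
    parser.add_argument("--epochs", type=int, default=3)
    return parser


def parse_args(argv=None):
    """Parse known arguments and ignore anything the script does not define."""
    args, _unknown = build_parser().parse_known_args(argv)
    return args


if __name__ == "__main__":
    args = parse_args()
    print(f"local_rank={args.local_rank} epochs={args.epochs}")
```

With a parser like this in place, extra flags would simply go after the script name on the launch command, for example `script.py --epochs 5`.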

June 21, 2023

Chart.js Line Charts

In this post, we will discuss using Chart.js for line plotting. Examples from this post can be found in this repo. Reading data from a CSV file Here is the full function we use to read in the data: function parseCSV(csv) { const lines = csv.split("\n"); const headers = lines[0].split(","); const data = { y_value: [], ym_value: [] }; for (let i = 1; i < lines.length; i++) { const currentLine = lines[i]....

June 15, 2023

Nodes in High Performance Computing

What is a node? A node in HPC refers to a single computational unit within a larger cluster or supercomputer. It typically consists of one or more processors (CPUs) and may include other components such as memory, storage, and accelerators like Graphics Processing Units (GPUs) or Field Programmable Gate Arrays (FPGAs). HPC nodes are designed for high-performance parallel computing and are often part of a larger cluster or supercomputer infrastructure....

May 30, 2023

Command L: Searching with my keyboard

I am now one step closer to not needing my mouse! I’m working directly on my laptop at Barnes and Noble, and it gets difficult to constantly adjust my Bluetooth keyboard to reach the mouse whenever I want to search for something on the web. So far, on my keyboard, I can: switch from VSCode to Safari/Chrome (cmd + tab); open a new browser tab (cmd + t); open a new browser window (cmd + n); switch between different windows/instances of an app or browser (cmd + ~); close a window or tab (cmd + w); copy text (cmd + c); paste text (cmd + v). But none of these tricks really matter if I end up moving my mouse a few pixels to access the search bar of a browser....

May 27, 2023

DevOps for the Desperate

PART I: INFRASTRUCTURE AS CODE, CONFIGURATION MANAGEMENT, SECURITY, AND ADMINISTRATION Chapter 1: Setting Up a Virtual Machine Infrastructure as Code (IaC) and Configuration Management (CM). IaC is the process of using code to describe and manage infrastructure (VMs, network switches, and cloud resources like AWS). The benefit of IaC is ease of deployment, because applications are built and tested the same way in every delivery pipeline. The drawback of IaC is that provisioning takes time and organization, which may not be worth it if you’re doing simple code tests on your local machine; it’s a tradeoff between repeatable deployment and creating scrappy MVPs...

May 27, 2023

Linux Distro Check

TL;DR There are 2 practical methods to check which Linux distro your machine or remote server is using: Method 1 uname -a This will produce something that looks like: Darwin abc123 22.4.0 Darwin Kernel Version 22.4.0: Mon Mar 6 21:00:17 PST 2023; root:xnu-8796.101.5~3/RELEASE_X86_64 x86_64 This output tells us that we are using the Darwin operating system with a kernel version of 22.4.0. Darwin is the core operating system that serves as the foundation for macOS and iOS....
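For checking the same information from a script, here is a minimal Python sketch. `describe_system` mirrors what `uname -a` reports by using the standard `platform` module. `parse_os_release` is my own addition and an assumption, not necessarily the post's second method: it parses text in the `KEY=value` format of `/etc/os-release`, a file most Linux distros ship and macOS does not.

```python
import platform


def describe_system():
    """Return the main fields that `uname -a` reports."""
    info = platform.uname()
    return {"system": info.system, "release": info.release, "machine": info.machine}


def parse_os_release(text):
    """Parse KEY=value lines in the style of /etc/os-release (assumed format)."""
    result = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not KEY=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        result[key] = value.strip().strip('"')
    return result


if __name__ == "__main__":
    print(describe_system())
    try:
        with open("/etc/os-release") as f:
            print(parse_os_release(f.read()).get("PRETTY_NAME"))
    except FileNotFoundError:
        # Expected on macOS (Darwin), which has no /etc/os-release.
        print("no /etc/os-release on this system")
```

A `system` value of `Darwin` means the machine is running macOS rather than a Linux distro.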

May 26, 2023

Book Review: The Linux Command Line

Introduction This is my personal review of the book “The Linux Command Line” by William Shotts. I include the table of contents and summarize the chapters that I personally found interesting. Table of Contents Part 1: Learning the Shell Most of these chapters show basic commands and syntax. You may find some useful ones that are new to you, but for the most part, this is a pretty basic run-through....

May 25, 2023

Git Remote Add: Managing Remote Repositories

Introduction Did you know: You can manage your project by pushing it to different repos. Pushing to GitHub What do I mean by this? Say that on your desktop, you have a project called “Hello.py”. Let’s say it’s a basic Python script that prints hello world. print("hello world") Traditionally, your option for uploading and saving this code is GitHub. So, when you’re ready to upload your code, you may type:...

May 23, 2023

Linux Commands: df

Checking disk space To check disk space on your machine or remote server, simply run: df -h . If you include the period, you may get an output that looks similar to this: Filesystem Size Used Avail Capacity iused ifree %iused Mounted on /dev/disk3s5 228Gi 165Gi 44Gi 80% 2088008 457632960 0% /System/Volumes/Data If you do not include the period, your output may look something like this: Filesystem Size Used Avail Capacity iused ifree %iused Mounted on /dev/disk3s1s1 228Gi 14Gi 44Gi 25% 501730 457632600 0% / devfs 208Ki 208Ki 0Bi 100% 718 0 100% /dev /dev/disk3s6 228Gi 4....
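The same check can be done from Python. This is a minimal sketch using the standard library's `shutil.disk_usage`, which reports total, used, and free bytes for the filesystem containing a path, much like `df -h .` does for the current directory. The helper names `human` and `disk_report` are hypothetical, and the rounding is simpler than what `df` applies.

```python
import shutil


def human(n):
    """Format a byte count with binary units, similar in spirit to `df -h`."""
    for unit in ("B", "Ki", "Mi", "Gi", "Ti"):
        if n < 1024 or unit == "Ti":
            return f"{n:.0f}{unit}"
        n /= 1024


def disk_report(path="."):
    """Return total/used/free space for the filesystem containing `path`."""
    usage = shutil.disk_usage(path)
    return {"total": usage.total, "used": usage.used, "free": usage.free}


if __name__ == "__main__":
    report = disk_report(".")
    print({name: human(value) for name, value in report.items()})
```

Passing a different path, such as a mount point, reports on that filesystem instead.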

May 10, 2023