My Specific Use Cases and Zettel Types

Many discussions about knowledge management systems talk about things in the abstract. I’ve found this especially true of discussions around Zettelkasten. One consequence of all this abstraction is that newcomers have difficulty seeing how they could implement something like ZK themselves. Furthermore, without specific use cases and examples, it’s difficult for people to provide feedback or critiques.

So, I want to start this thread to show some real notes of mine and also to explain their purposes. I have different notes for different purposes and uses. I welcome any and all feedback, and I encourage others to start threads of their own showing specific examples of their notes and how they use them and why.

I should also start with the caveat that these are my notes and don’t try to adhere to any Zettelkasten orthodoxy. Not all notes are my own rendering of ideas; many are collections pointing to other people’s ideas or words. The key is that I try to group and cluster things together and densely link them. I try to keep my notes as atomic as possible and build up hierarchies from more basic notes to more overarching ones. All this is to say that my uses will likely not satisfy ZK purists, but that’s fine. My goal is not purity but effectiveness for my ends, which is probably true for most users of any kind of personal knowledge system.


Collecting Definitions

One of my most basic uses of my notes is to collect multiple instances of a thing in one place. I do this (1) to have references for later use when I am producing some kind of scholarly output, (2) to enable synthesis of a consensus from the instances, and (3) to generalize (induce) some insight from the collected instances.

Here are some specific examples:


In the note below, titled [Aphasia] after the neurological sign, I’ve collected definitions from two different sources. When I need to define this term, I will have multiple definitions to choose from and can compare their similarities and differences.

Aphasia

#neurology #neurological-examination

Definition

Aphasia is an inability to comprehend or formulate language because of damage to specific brain regions. Wikipedia

Aphasia, or dysphasia, is a defect in language processing caused by dysfunction of the dominant cerebral hemisphere. Because aphasia is a disorder of language and not a simple sensory or motor deficit, both spoken language and written language are affected. [Blumenfeld]


Here is another example. I started a page for [[Definitions of Small Fiber Neuropathy]]. Many different sources define small fiber neuropathy, and their definitions differ. This page serves as a collecting bin for these definitions, giving me an overview of the ways people define the disease.

Definitions of Small Fiber Neuropathy

“Small fibre neuropathies are a heterogeneous group of disorders affecting thinly myelinated Aδ-fibres and unmyelinated C-fibres”

Terkelsen, A., Karlsson, P., Lauria, G., Freeman, R., Finnerup, N., Jensen, T. (2017). The diagnostic challenge of small fibre neuropathy: clinical presentations, evaluations, and causes. The Lancet Neurology 16(11), 934-944. https://dx.doi.org/10.1016/s1474-4422(17)30329-0

“Small-fiber polyneuropathy refers to widespread preferential damage to the small-diameter somatic and autonomic unmyelinated C-fibers and/or thinly myelinated A-delta fibers.”

Oaklander, A., Nolano, M. (2019). Scientific Advances in and Clinical Approaches to Small-Fiber Polyneuropathy: A Review. JAMA Neurology 76(10), 1240-1251. https://dx.doi.org/10.1001/jamaneurol.2019.2917

“‘Small-fiber polyneuropathy’ (SFPN), also known as small-fiber neuropathy, refers to those polyneuropathies that preferentially affect peripheral neurons with the thinnest axons, including the unmyelinated C-fibers, thinly myelinated A-δ somatosensory axons and the sympathetic and parasympathetic neurons.”

Liu, X., Treister, R., Lang, M., Oaklander, A. (2018). IVIg for apparently autoimmune small-fiber polyneuropathy: first analysis of efficacy and safety. Therapeutic Advances in Neurological Disorders 11, 175628561774448. https://dx.doi.org/10.1177/1756285617744484

Collecting Examples

I have several notes where I group exemplars of a thing. I use them for inspiration and reference when I am trying to achieve a specific goal or task.


In this note, I’ve collected examples of what I consider to be high-quality bioinformatics code. When it is time for me to write some code, I have a list of examples to study for ideas on how to carry out my own analyses. Learning to do something is about imitating good examples; that’s why I keep these.

uid: [[20200313170342]]
title: Exemplary Bioinformatics Code (List)
tags: #coding #computing #learning #exemplary
origin_id: [[20200312180103]]
origin_title: Data Management (Main)

20200313170342 Exemplary Bioinformatics Code (List)

Description

These are examples of good code to learn from.

Single Cell Analyses

Making Workflows

  • Scripts to install as a Bioconda package for making workflows -EBI
    • In order to wrap Seurat’s internal workflow in any given workflow language, it’s important to have scripts to call each of those steps, which is what this package provides.
    • This version of seurat-scripts uses native conversions to Loom (thoroughly tested), SCE and AnnData.
    • Comments:
      • Great example of how to modularize code for pipelines using common workflow languages (e.g. Nextflow, Snakemake)
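The notes in this thread share a small front-matter header (uid, title, tags, origin_id, origin_title) and a timestamp ID in YYYYMMDDHHMMSS form. Purely as an illustration, here is a minimal shell sketch of creating a note in that shape. The function name `new_zettel`, the `ZK_DIR` variable and the `.md` extension are hypothetical choices, not a description of any particular Zettelkasten tool.

```bash
#!/usr/bin/env bash
# Minimal sketch: create a note whose uid is a YYYYMMDDHHMMSS timestamp and
# whose header mirrors the front matter shown in this thread.
# new_zettel, ZK_DIR and the ".md" extension are hypothetical choices.
set -euo pipefail

new_zettel() {
  local title="$1"
  local origin_id="${2:-}"
  local origin_title="${3:-}"
  local uid
  uid="$(date +%Y%m%d%H%M%S)"            # 14-digit timestamp ID
  local file="${ZK_DIR:-.}/${uid} ${title}.md"

  # Front matter first, then the "uid title" heading line.
  cat > "$file" <<EOF
uid: [[${uid}]]
title: ${title}
tags:
origin_id: [[${origin_id}]]
origin_title: ${origin_title}

${uid} ${title}
EOF
  printf '%s\n' "$file"                   # report the path that was written
}

# Example: a child note of the Data Management hub.
new_zettel "Exemplary Bioinformatics Code (List)" "20200312180103" "Data Management (Main)"
```

Two notes created within the same second would get the same uid, and a title containing `/` would break the file path, so a real tool would need to guard against both.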

Overview Notes/Hub/Structure Notes

This kind of note pulls together multiple more basic notes, acting as a table of contents or index.

Here are some examples


This is my overview note of Zettelkasten itself

20181102131702 Zettelkasten (Overview)

Key Principles of a Zettelkasten

Principle of Atomicity or “One Note, One Card”

  • [[20181102132830]] Principle of Atomicity - Zettelkasten Blog - Christian Tietze
  • [[20181104091519]] One Fact, One Card - Taking Note Blog - Mann
  • [[20181104084226]] One thought per note - Dan Sheffler - Zettelkasten

Zettelkasten - Method

  • Engage with a source
    • Most often a book, but it can be any text. It could also be a video, audio, or an in-person discussion.
    • The source could also be your own mind. Original thoughts are totally appropriate for a Zettelkasten.
    • While engaging, you might make marginal notes, highlight text, or jot separate notes on a piece of paper.
  • Input to the zettelkasten
    • Ideas from the source need to make their way into the Zettelkasten.
    • This is when you make zettels.
  • Connect zettels to other zettels in the Zettelkasten
    • This is probably the most important part. You want to see how knowledge is connected. Relational processing is how we develop understanding.

Purposes of a Zettelkasten

  • [[20181102131520]] The Three Purposes of a Zettelkasten - Sascha Fast

Software for implementing a Zettelkasten

  • [[20181102133027]] Software for Zettelkasten (Overview)

Discussions

Summaries of Zettelkasten


This note links out to many different tools for performing [Single Cell Analysis], which is something I do in my work as a biomedical researcher.

20190908094601 Single Cell Analysis Tools (List)

Library Structures and Technologies

  • [[20190908094501]] Single Cell RNA-seq (scRNA-seq) Library Structure (Teichman Lab)

Multi-purpose tool suites

  • [[20190914103324]] Seurat (Tool)
  • [[20191128191050]] scanpy

Normalization

TODO

Cell Annotation

  • [[20190909200426]] Annotation of Cell Types from Single Cell RNA-seq Data (Tools)

Cell Marker Selection

  • [[20191128185353]] Tools for Cell Type Marker Selection from Single Cell RNA-seq (Main)

Doublet Finding and Removal

  • [[20190909214128]] Doublet Tools for Single Cell Genomics

Ambient RNA Cleanup

  • [[20190909204433]] Ambient mRNA cleanup for scRNA-seq (Tools)

Demultiplexing

  • [[20190909203644]] Demultiplexing tools for scRNA-seq

Pipelines

  • [[20181101115627.1]] scRNA_seq pipeline with FACS, Smart-seq and Nextera

Intercellular Communication

  • [[20200226132715]] CellPhoneDB
  • [[20200331082756]] CytoTalk
  • [[20200605065614]] SingleCellSignalR
  • [[20200718144649]] NicheNet

TCR/BCR Analysis

  • [[20200305113516]] TRUST4 (Single Cell Tool)

Cluster Comparison

  • [[20200305114423]] Dune (Single Cell Tool)

Clustering

  • [[20200317101801]] Single Cell Clustering Tools (List)

Pathway Analysis

  • [[20200318112337]] Pathway Analysis (Tools)
  • [[20200328221709]] MTGO-SC (Module detection via Topological information and Gene Ontology (GO) knowledge) for Single Cells (Tool)

Gene Regulatory Networks

  • [[20200324132728]] SCENIC (tool)

Spatial Organization

  • [[20200401151451]] novoSpaRc

Dataset Integration

Cross-Species

  • [[20200318114136]] Cross-Species Comparison of Single Cell Data (Main)

Multi-modal Integration

  • [[20190914103324]] Seurat (Tool)
  • [[20200318125651]] Harmony (Tool)

Across Datasets

  • [[20190308114139]] CONOS - Method

Trajectory Analysis

  • [[20200318125514]] Single Cell Trajectory Analysis (Main)

Compositional Analysis

  • [[20200321120041]] scdney (Tool)

Comparison of multiple single cell datasets

  • [[20200328091420]] ClusterMap (Tool)
  • [[20200328093809]] DA-seq for Differential Abundance (Tool)

Collecting Different Instances of a Thing (Topical)

Here, too, I collect multiple instances of a thing, this time by topic. The purpose again is to support inducing a generalization or new insight about a topic. By collecting multiple examples, you can begin to see patterns or core shared features and appraise similarities and differences.


Here I have a note about different kinds of scientific questions. I link out to other zettels that discuss each different author/source’s view on types of scientific questions.

uid: [[20181101130453]]
title: Types of Scientific Questions (Overview)
tags: #overview #scientific_method #problem_selection #becoming_a_life_scientist
origin_id: [[20180329192442]]
origin_title: The Art of Asking Good Scientific Questions (Planning your Scientific Journey - iBiology)

20181101130453 Types of Scientific Questions

Kinds of Science Question

  • [[20180329192442]] The Art of Asking Good Scientific Questions (Planning your Scientific Journey - iBiology)
  • [[20181102165915]] Types of Experiments in Molecular and Cellular Cognition - Research Maps - Silva
  • [[20190305223142]] Types of Scientific Questions - Experimental Design for Biologists (Glass)
  • [[20190321195003]] Types of Experiments - David Sweatt

Kinds of Data Analysis Question

  • [[20181101133451.4]] Six types of specific data analysis question Leek and Peng
  • [[20190220202704.2]] The three classes of tasks in data science are description, prediction, and causal inference

And here is what one of those zettels actually looks like:

20181101133451.4 Six types of specific data analysis question - Leek and Peng

Summary:

Six types of specific data analysis question - Leek and Peng


  • Descriptive
  • Exploratory
  • Inferential
  • Predictive
  • Causal
  • Mechanistic

Quote:

Any specific data analysis can be broadly classified into one of six types (see the figure).

Citekey: [#Leek:2015bp]
Reference: Leek, J.T., and Peng, R.D. (2015). What is the question? Science 347, 1314–1315.


Here is another collection-type note that centralizes all my notes about project directory structures. Different people have different takes on this topic, and I want to see them all grouped together. Over time, I might add my own thoughts to this note on the most effective practice, based on looking at all these instances. My own opinion can develop over time and doesn’t need to be set in stone when I create the note. That’s an advantage of this kind of incremental note system.

20190919154922 Computational Projects Directory Structures - Best Practices (Main)

  • [[20190228100634]] Directory Structure for RNA-seq projects - Harvard - Introduction to RNA-Seq using high-performance computing
  • [[20190305111846]] Project Directories and Directory Structures - Bioinformatic Data Skills
  • [[20190919155011]] Basic folder structure for bioinformatics - Shanguanyu.com
  • [[20190919161602]] Computational Project Structure - Data Carpentry
  • [[20190213180321]] Summary - Wilson:2017iy - Good enough practices in scientific computing - PLoS Comput Biol
  • [[20190928171542]] A Quick Guide to Organizing [Data Science] Projects - Jake Feala
  • [[20191103082853]] ProjectTemplate (R)
  • [[20191116165918]] Cookiecutter Data Science
  • [[20191116171036]] Manage your Data Science project structure in early stage - TowardsDataScience
  • [[20190507100848]] RMarkdown Driven Development (RmdDD) - Emily Riederer
    • Section on project structure
  • [[20200309193338]] How to organise a bioinformatics analysis project @jessenleon
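To make the topic concrete, here is a minimal shell sketch that scaffolds a project directory. The layout (data/raw, data/processed, scripts, results, doc) and the function name `make_project` are hypothetical, loosely in the spirit of the guides listed above rather than a copy of any one of them.

```bash
#!/usr/bin/env bash
# Minimal sketch: scaffold a computational project directory.
# The layout below is a hypothetical example, not taken from any one guide.
set -euo pipefail

make_project() {
  local root="$1"
  # Brace expansion creates all subdirectories in one mkdir call.
  mkdir -p "$root"/{data/raw,data/processed,scripts,results,doc}
  # A top-level README records the project name and creation date.
  printf '# %s\n\nCreated: %s\n' "$(basename "$root")" "$(date +%Y-%m-%d)" > "$root/README.md"
}

make_project "rnaseq_project"
find rnaseq_project -type d | sort
```

Keeping the layout in a function means every new project starts from the same skeleton, which is the consistency most of these guides are after.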

Scientific Literature Snippet Note

When I read scientific articles, I extract findings one by one. These findings can then be routed out to different notes for different purposes. I actually created a script to do this automatically from my PDF highlights.


title: ‘20180119101200.4 SP GRP Neurotensin and NK are mostly nonoverlapping in lamina I-III’
tags: #interneuron #spinal_cord #reference #dorsal_horn #marker_characterization
date: 2018-01-19

Summary:

SP GRP Neurotensin and NK are mostly nonoverlapping in lamina I-III

Quote:

Excitatory interneurons arising from the dILB population are neurochemically heterogeneous [6,25,88]. Our previous studies [26,27], together with the present results, demonstrate a complex pattern of intersection of neurochemical markers. Four neuropeptides (SP, GRP, neurotensin, and NKB) are expressed in largely nonoverlapping populations, with the neurotensin- and NKB-expressing cells being mainly included among the PKCγ neurons in lamina IIi.

Citekey: [#GutierrezMecinas:2017di]
Reference: Gutierrez-Mecinas, M., Bell, A.M., Marin, A., Taylor, R., Boyle, K.A., Furuta, T., Watanabe, M., Polgár, E., and Todd, A.J. (2017). Preprotachykinin A is expressed by a distinct population of excitatory neurons in the mouse superficial spinal dorsal horn including cells that respond to noxious and pruritic stimuli. Pain 158, 440–456.

Here is what it looked like in the paper itself: [image of the passage in the original paper, not included]

Idea Notes

When I have new ideas, I make a note with the prefix “Ideax” (I forget where I learned this; it might have been from the MPU podcast). I then link out to other zettels in my system that support the idea.


uid: [[20200402182445]]
title: Ideax - Scientific Documentation (Lab Notebook) Workflow in R - labbookr
tags:
origin_id: [[20200402180637]]
origin_title: Biocoder (Language)

20200402182445 Ideax - Scientific Documentation (Lab Notebook) Workflow in R - labbookr

Idea

Create an open-source, future-proof lab notebook system that is local rather than web-based.
Inspired by [[20200321112824]] workflowr (tool), which provides a consistent filesystem and interface with R Markdown and Git. It creates a website that can be hosted from a Git repository and is easy to navigate. I could easily see this extended to a full lab notebook system.

Ultimately, it would be an integration of multiple tools.

Features

  • R markdown pages
  • Consistent file system templates for experiments and projects
  • Automatic Version Control
  • Automatic generation of a web page
  • Integration with a materials management system
    • Database
    • [[20200402183114]] Airtable
  • Multiple users?
  • Data management

Tools to use together

Documentation

  • [[20200321112824]] workflowr (tool)
  • RStudio
  • Jupyter notebook (if desired)

Making Protocols

  • [[20200403095632]] diagrammeR (R package)
    • Use spreadsheets to programmatically create and visualize protocols
    • Use [[20200402194958]] Best Practices for Creating Biomedical Protocols as guide

Data and Materials Management

  • [[20200402183114]] Airtable
  • Spreadsheets
  • [[20200402183347]] baRcodeR (R Package)

Version Control

  • Git and GitHub

Reproducible Computational Environment

  • Docker

File Management

  • Dropbox
  • Filesystem

Writing workflow

  • R Markdown and [[20200402192319]] Bookdown (R package)

What I’ve done

I’ve been using a system of [[20200323083626]] R markdown notebooks in files and folders to make an [[20200402182558]] Electronic Lab Notebook.

Fact Notes

Depending on one’s line of work, there is often a need to have access to pure factual information. This is usually the case when you want to make an argument and need support for it, or when you need to perform a task that requires a specific piece of information. There is debate about whether and how to include pure facts in a Zettelkasten system. For me, they are necessary, and this is how I include them.


This is a quote from a source. Why did I make this a note? Well, the next time I go to write a grant or paper about neuropathy and I want to make a statement about the prevalence of the problem, I need factual support. This is it. Anyone involved in academic research will appreciate the recurring need to have access to some fact that you use to support an argument. Making it a zettel note helps me to quickly re-access the fact, rather than having to fish inside PDFs. How would I even search for something like this?

uid: [[20200428203313]]
title: Estimates of Idiopathic Neuropathy Despite Workup
tags:
origin_id: [[20200428203051]]
origin_title: Idiopathic Neuropathy

20200428203313 Estimates of Idiopathic Neuropathy Despite Workup

The major causes of undiagnosed neuropathies were impaired glucose metabolism, CIDP, and monoclonal gammopathies. Despite thorough evaluation 32.7% remained idiopathic. [^Farhad_2016]

[^Farhad_2016]: Farhad, K., Traub, R., Ruzhansky, K., Brannagan, T. (2015). Causes of neuropathy in patients referred as “idiopathic neuropathy”. Muscle & Nerve 53(6), 856-861. https://dx.doi.org/10.1002/mus.24969

Code Recipe Notes

In the course of doing complex work, there are many constituent subtasks, and chances are you will need to do them many times in the future. So it is worth capturing what you did and how you did it, and then logging any changes or variations in future use. I do this increasingly with code analyses. These code notes have been one of the most fruitful re-uses of notes in my whole system; they often pay tangible dividends.


Here is an example of a code snippet for work I do in R with a popular package called Seurat.

20200320165550 Create multiple UMAP plots with different resolutions programmatically - Seurat

Seurat objects can contain clusterings computed with different resolution parameters.

The standard resolution metadata column looks like “RNA_snn_res.2”.

Goal/Purpose

  • Make UMAP plots for each of the resolutions contained in a SeuratObject
  • Plot them together

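library(Seurat)
library(dplyr)

# Names of the clustering columns, e.g. "RNA_snn_res.0.8" (one per resolution)
resolutions <- seurat_object@meta.data %>% select(starts_with("RNA_snn_res")) %>% colnames()

UMAP_list <- list()
for (res in resolutions) {
  print(res)
  # Short label such as "_res.0.8"; also handles values like "_res.1.25"
  res_stem <- stringr::str_extract(res, pattern = "_res\\.[0-9]+(\\.[0-9]+)?")
  # Use this resolution's cluster assignments as the active identities
  Idents(seurat_object) <- res
  # Plot size for Jupyter (IRkernel) output; has no effect elsewhere
  options(repr.plot.height = 4, repr.plot.width = 6)
  p <- DimPlot(seurat_object, reduction = "umap", label = TRUE, pt.size = 0.1) +
    NoLegend() +
    ggplot2::ggtitle(res_stem)
  UMAP_list[[res]] <- p
}

# Arrange all UMAP plots in a grid
patchwork::wrap_plots(UMAP_list)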

And, as you can imagine, these code notes can be collected into a structure note that shows how and where they fit into the overall process. See below for that note.


20190914103324 Seurat (Tool)

Description

Seurat is an R package designed for QC, analysis, and exploration of single-cell RNA-seq data. Seurat aims to enable users to identify and interpret sources of heterogeneity from single-cell transcriptomic measurements, and to integrate diverse types of single-cell data.

Source

Seurat Scripts from Others

Usage

Object Interaction

  • [[20200327201306]] Adding Metadata to Seurat Objects

Visualization

FeaturePlot

DotPlot

Violin Plot

Working with other tools

  • [[20191128185407]] COMETSC (see Seurat data export)
  • [[20200526154751]] Data conversion between Scanpy and Seurat

Code Snippets

  • [[20200320165550]] Create multiple UMAP plots with different resolutions programmatically - Seurat
  • [[20200326222426]] Calculate the number of clusters from a Seurat object containing multiple resolutions
  • [[20200327203740]] Plot multiple PCs on FeaturePlot from Seurat using map()
  • [[20200603065613]] Create multiple Seurat objects from multiple datasets
  • [[20200603065716]] Load multiple count files using directory list into Seurat

Relatedly, I’ve also begun to keep ‘Cookbooks’ that index all the code snippets. Here is my Unix Cookbook as an example:


20200321125848 My Unix Cookbook

String Manipulation

  • [[20200321125744]] how to tab separate variables in unix (SO)
  • [[20200321130031]] how to concatenate string variables into a third? (SO)

System Processes

  • [[20200403170739]] Print exit status of last executed command ($?) - Unix

Running Jobs at Specific Times

  • cron is used to set jobs (scripts) to run at specific times on Unix
    • See: [[20200321123018]] cron (Unix)

Miscellaneous

  • [[20200328145248]] Add a tab to the beginning of the first line of a text file from command line
  • [[20200401085947]] File modification as trigger to run shell commands
  • [[20200409144948]] Find size of specific directory on Unix
  • [[20200419181550]] Find multiple files and move to new folder (Unix)

Working with rows and columns

  • [[20200328145521]] Subsetting columns and rows in files from the command line
  • [[20200328145705]] Printing out selected columns of text from the command line

Networking

  • [[20200330090153]] How to check if port is in use on Linux or Unix - nixCraft

File Transfer

  • [[20200330172628]] Downloading remote files from Dropbox using wget onto Unix system
  • [[20200402093046]] rsync to copy files locally and also checksum

Control Flow

  • [[20200401160010]] Bash Loop Through a List of Strings
  • [[20200403183109]] If statements in Unix

I/O

  • [[20200403202845]] Redirect output to stdout and to a file using tee
  • [[20200404183051]] What does <<< mean in unix? Here string
  • [[20200404183832]] Dev null usage in Unix

SSH

  • [[20200409075943]] OpenSSH-Server (Unix)
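For a quick sense of what several of these index entries cover, here are generic one-liners for a few of the techniques. They are illustrations only, not the contents of the linked notes; the cron script path is hypothetical.

```bash
#!/usr/bin/env bash
# Generic one-liners for a few of the techniques indexed above.
# These are illustrations only, not the contents of the linked notes.

# Bash loop through a list of strings
for sample in liver kidney spleen; do
  echo "processing ${sample}"
done

# If statement: act only when a file exists
touch example.txt
if [ -f example.txt ]; then
  echo "example.txt exists"
fi

# Here string (<<<) feeds a string to stdin; > /dev/null discards output;
# $? holds the exit status of the last command (grep: 1 = no match).
grep "needle" <<< "haystack" > /dev/null
echo "grep exit status: $?"

# tee writes to stdout and to a file at the same time
echo "logged line" | tee run.log

# cron (edit with `crontab -e`): the fields are minute, hour, day of month,
# month, day of week. This hypothetical entry runs a script daily at 02:30:
# 30 2 * * * /path/to/backup.sh
```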

Here is an example note from this cookbook

uid: [[20200328145521]]
title: Subsetting columns and rows in files from the command line
tags:
origin_id: [[20200321125848]]
origin_title: My Unix Cookbook

20200328145521 Subsetting columns and rows in files from the command line

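Filtering rows is easy, for example with awk (NR is the current line number):

awk 'NR >= 10000 && NR <= 100000' largefile

Filtering columns is even easier with cut, which splits on tabs by default (passing -d '\t' fails, because -d expects a single literal character):

cut -f 10000-100000 largefile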

Usage

2020-06-28

Used together

cat NSForest_input2.txt | cut -f 1-10 | awk 'NR >= 1 && NR <= 10 { print }' | cat -T
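To check the row and column commands before pointing them at a large file, a throwaway test file helps. A minimal sketch; `demo.tsv` and its contents are made up for illustration.

```bash
#!/usr/bin/env bash
# Self-contained check of the row/column commands above on a throwaway file.
# demo.tsv and its contents are made up for illustration.
set -euo pipefail

# Build a 20-row, 5-column tab-separated file whose cells read "r<row>c<col>".
for r in $(seq 1 20); do
  printf 'r%dc1\tr%dc2\tr%dc3\tr%dc4\tr%dc5\n' "$r" "$r" "$r" "$r" "$r"
done > demo.tsv

# Rows 5 to 7: awk's NR is the current line number.
awk 'NR >= 5 && NR <= 7' demo.tsv

# Columns 2 to 3 of the first two rows; cut splits on tabs by default.
head -n 2 demo.tsv | cut -f 2-3
```

Because each cell names its own row and column, it is easy to see at a glance whether a range is off by one.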

Book Notes and Reverse Outlines of Books

Here is a note from a statistics book I was reading

uid: [[20200309231632]]
title: Poisson distribution - Analysis Biological Data 3e
tags:
origin_id: [[20200216155234]]
origin_title: Analysis of Biological Data - Whitlock and Schluter - 3rd Edition (Main)

20200309231632 Poisson distribution - Analysis Biological Data 3e

Definition

The Poisson distribution describes the number of successes in blocks of time or space, when successes happen independently of each other and occur with equal probability at every instant in time or point in space.

Formula

$\Pr[X \text{ successes}] = \dfrac{e^{-\mu}\mu^{X}}{X!}$

where $\mu$ is the mean number of independent successes in time or space
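As a worked example of the formula, take an illustrative mean of $\mu = 2$ successes per block (a value chosen only to keep the arithmetic easy, not one from the book):

$$
\Pr[X = 0] = e^{-2} \approx 0.135, \qquad
\Pr[X = 1] = \frac{e^{-2}\,2^{1}}{1!} \approx 0.271, \qquad
\Pr[X = 2] = \frac{e^{-2}\,2^{2}}{2!} \approx 0.271, \qquad
\Pr[X = 3] = \frac{e^{-2}\,2^{3}}{3!} \approx 0.180
$$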

Properties

  • Variance in the number of successes per block of time (the square of the standard deviation) is equal to the mean ($\mu$)
    • In an observed frequency distribution, if the variance is greater than the mean, then the distribution is clumped.
      • Ratio of sample variance:sample mean > 1 indicates clumping
      • If ratio is < 1, then successes are dispersed.
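The variance-to-mean check above is easy to compute directly. A minimal sketch in awk wrapped in a shell function; the function name `variance_mean_ratio` and the counts are made up for illustration.

```bash
#!/usr/bin/env bash
# Minimal sketch: variance-to-mean ratio of counts per block, a quick check
# for clumping (ratio > 1) or dispersion (ratio < 1). Counts are made up.
set -euo pipefail

variance_mean_ratio() {
  # One count per line on stdin; sample variance uses n - 1 in the denominator.
  # Needs at least two counts and a nonzero mean.
  awk '{ n++; sum += $1; sumsq += $1 * $1 }
       END {
         mean = sum / n
         var = (sumsq - n * mean * mean) / (n - 1)
         printf "mean=%.3f variance=%.3f ratio=%.3f\n", mean, var, var / mean
       }'
}

# Successes bunched into a few blocks
printf '%s\n' 0 0 0 5 0 0 6 0 0 4 | variance_mean_ratio   # ratio=4.037 (clumped)
# Successes spread evenly across blocks
printf '%s\n' 2 1 2 1 2 1 2 1 | variance_mean_ratio       # ratio=0.190 (dispersed)
```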

Usage

  • Useful tool for asking whether events or objects occur randomly in continuous time and space
    • Poisson distribution is a reasonable expectation for certain biological counts
    • For biologists, the Poisson distribution is a model for how successes may be distributed in time and space in nature
    • When data do not fit the model, this suggests that interesting biological processes are at play
  • The main use of the Poisson distribution in biology is to provide a null hypothesis to test whether successes occur “randomly” in time or space.
    • However, we do not usually know $\mu$. Therefore, the mean rate is estimated from the sample data and used in its place to calculate the expected frequencies under the null hypothesis (i.e. that a process follows a random Poisson model)

Contrasts

  • Non-random distribution of successes
    • Clumped: closer together than expected by chance
      • May arise when the presence of one success increases the probability of other successes occurring nearby
    • Dispersed: spread out more evenly than expected by chance
      • Happens when occurrence of one success decreases probability of the next success.

Testing randomness with the Poisson distribution and $\chi^2$

  • To formally test whether data follow a Poisson distribution, one can use the [[20200309214504]] Chi-square goodness-of-fit test, which involves calculating the $\chi^2$ statistic:
    • Calculate the expected values under the Poisson model
    • Calculate the $\chi^2$ statistic
    • Determine the degrees of freedom
      • Note: Because the mean ($\mu$) is estimated from the data, an additional degree of freedom is lost.
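The three steps above can be sketched in awk, wrapped in a shell function. This is a minimal illustration, assuming a frequency table with one row per count value X = 0, 1, 2, … (consecutive, starting at 0). The last row is treated as “that many or more” for the expected frequency, but its X is used as-is when estimating μ. The function name `poisson_gof` and the frequencies are made up.

```bash
#!/usr/bin/env bash
# Minimal sketch of a chi-square goodness-of-fit test against a Poisson model.
# Input rows: "X observed_frequency", X consecutive from 0; the last row is
# treated as "X or more". The frequency table below is made up.
set -euo pipefail

poisson_gof() {
  awk '
    { x[NR] = $1; obs[NR] = $2; n += $2; total += $1 * $2 }
    END {
      k = NR
      mu = total / n                    # mean estimated from the data
      p = exp(-mu); cum = 0; chisq = 0
      for (i = 1; i <= k; i++) {
        # P(X = x) = P(X = x - 1) * mu / x; the last category gets the
        # remaining upper tail, P(X >= x) = 1 - sum of the earlier terms.
        if (i > 1) p = p * mu / x[i]
        prob = (i < k) ? p : 1 - cum
        cum += p
        expct = n * prob
        chisq += (obs[i] - expct) ^ 2 / expct
        label = (i < k) ? x[i] : (x[i] "+")
        printf "X=%s observed=%d expected=%.2f\n", label, obs[i], expct
      }
      # df = categories - 1, minus 1 more because mu was estimated.
      printf "chisq=%.3f df=%d\n", chisq, k - 2
    }'
}

printf '0 30\n1 40\n2 20\n3 10\n' | poisson_gof
```

With these made-up frequencies the statistic comes out at about 0.64 on 2 degrees of freedom, well below 5.99 (the 5% critical value for 2 df), so the data would be consistent with a Poisson model.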

Here is the outline of that book that I built up with these more granular notes

uid: [[20200216155234]]
title: Analysis of Biological Data - Whitlock and Schluter - 3rd Edition (Main)
tags: #book #statistics #Whitlock_Schluter_ABD_3e #overview #book_summary
origin_id: [[20181022184518]]
origin_title: Scientific Mentorship

20200216155234 Analysis of Biological Data - Whitlock and Schluter - 3rd Edition (Main)

Chapter 1 - Statistics and Samples

1.3 Types of Data and Variables

  • [[20200216155525]] Data - Analysis Biological Data 3e
  • [[20200216155338]] Variables - Analysis Biological Data 3e
  • [[20200216155610]] Categorical Variables - Analysis Biological Data 3e
  • [[20200216160537]] Categorical Data - Analysis Biological Data 3e
  • [[20200216160735]] Numerical data - Analysis Biological Data 3e
  • [[20200216161113]] Major use of statistics is to find associations - Analysis Biological Data 3e
  • [[20200216161230]] Explanatory and Response Variables - Analysis Biological Data 3e

1.4 - Frequency Distributions and Probability Distributions

  • [[20200216160243]] Frequency - Analysis Biological Data 3e
  • [[20200216155746]] Frequency distribution - Analysis Biological Data 3e
  • [[20200216155857]] Probability Distribution - Analysis Biological Data 3e

1.5 Types of studies

  • [[20200216161515]] Experimental vs. Observational Studies - Analysis Biological Data 3e
  • [[20200216162758]] Confounding variables - Analysis Biological Data 3e
  • [[20200216163842]] Experimental Artifact - Analysis Biological Data 3e

Chapter 2 - Displaying Data

  • [[20200216164939]] Contingency Tables - Analysis Biological Data 3e
  • [[20200216164539]] Recommended graphical methods for displaying associations between variables and differences between groups - Analysis Biological Data 3e
  • [[20200216165044]] How to make data files - Analysis Biological Data 3e

Chapter 3 - Describing Data

3.1 Arithmetic Mean and Standard Deviation

  • [[20200216165625]] Descriptive statistics - Analysis Biological Data 3e
  • [[20200216170641]] Sample Mean - Analysis Biological Data 3e
  • [[20200216171049]] Variance and standard deviation - Analysis Biological Data 3e
  • [[20200216172027]] Rounding descriptive statistics - Analysis Biological Data 3e
  • [[20200216172226]] Coefficient of variation - Analysis Biological Data 3e
  • [[20200216173523]] Effects of changing measurement scale - Analysis Biological Data 3e

3.2 Median and Interquartile Range

  • [[20200305163938]] Median - Analysis Biological Data 3e
  • [[20200305164853]] Interquartile Range - Analysis Biological Data 3e
  • [[20200305170015]] Box plot - Analysis Biological Data 3e

3.3 How measures of location and spread compare

  • [[20200305170455]] Mean vs Median - Analysis Biological Data 3e

3.4 Cumulative frequency distribution

  • [[20200305171059]] Percentiles and quantiles - Analysis Biological Data 3e
  • [[20200305171304]] Cumulative relative frequency - Analysis Biological Data 3e

3.5 Proportions

  • [[20200305172252]] Proportions - Analysis Biological Data 3e

Chapter 4 - Estimating with uncertainty

  • The goal of biologists is usually more than describing the data: data are collected so that something may be discovered about the larger population from which the sample came.
  • The descriptive statistics measured are used to estimate parameters of the population.
    • Such estimation is possible when the sample is a random sample
      • A sample is random when every individual in the population has an equal chance of being selected and individuals are sampled independently.

4.1 The sampling distribution of an estimate

  • [[20200305173545]] Estimation - Analysis Biological Data 3e
  • [[20200305173805]] Sampling distribution - Analysis Biological Data 3e

4.2 Measuring the uncertainty of an estimate

  • [[20200305175413]] Standard error - Analysis Biological Data 3e

4.3 Confidence Intervals

  • [[20200305180309]] Confidence intervals - Analysis Biological Data 3e

4.4 Error bars

  • [[20200305181536]] Error bars - Analysis Biological Data 3e

Interleaf 2 - Pseudoreplication

  • [[20200305182154]] Pseudoreplication - Analysis Biological Data 3e

Chapter 5 - Probability

5.1 The probability of an event

  • [[20200307155153]] Random Trial - Analysis Biological Data 3e
  • [[20200307155327]] Events and outcomes - Analysis Biological Data 3e
  • [[20200307151721]] Probability - Analysis Biological Data 3e

5.2 Venn Diagrams

  • [[20200307154925]] Venn Diagram - Analysis Biological Data 3e

5.3 Mutually exclusive events

  • [[20200307155548]] Mutually exclusive events - Analysis Biological Data 3e

5.4 Probability Distributions

  • [[20200307155750]] Probability distribution - Analysis Biological Data 3e

5.5 Either this or that: adding probabilities

  • [[20200307160517]] Addition rule (probability) - Analysis Biological Data 3e

5.6 Independence and the Multiplication Rule

  • [[20200306160601]] Independence (probability) - Analysis Biological Data 3e
  • [[20200306160746]] Multiplication Rule (Probability) - Analysis Biological Data 3e

5.7 Probability trees

  • [[20200307161135]] Probability tree - Analysis Biological Data 3e

5.8 Dependent Events

  • [[20200307161548]] Dependent events - Analysis Biological Data 3e

5.9 Conditional probability and Bayes’ theorem

  • [[20200307161836]] Conditional probability - Analysis Biological Data 3e
  • [[20200307162105]] Law of Total Probability - Analysis Biological Data 3e
  • [[20200307163249]] Sampling without replacement - Analysis Biological Data 3e

Chapter 7 - Analyzing Proportions

  • The key to estimation and hypothesis testing is an understanding of the sampling distribution for a proportion.

7.1 The binomial distribution

  • [[20200308091719]] Binomial distribution - Analysis Biological Data 3e

7.2 Testing a proportion: the binomial test

  • [[20200308133534]] Binomial test - Analysis Biological Data 3e

Chapter 8 - Fitting Probability Models to Frequency Data

8.1 - $\chi^2$ Goodness-of-fit test: the proportional model

  • [[20200309214340]] Proportional model - Analysis Biological Data 3e
  • [[20200309214504]] Chi-square goodness-of-fit test - Analysis Biological Data 3e
  • [[20200309221437]] Degrees of freedom - Analysis Biological Data 3e
  • [[20200309225613]] Probability models in biology - Analysis Biological Data 3e

8.3 Goodness-of-fit tests when there are only two categories

  • [[20200309231524]] Chi square with two categories - Analysis Biological Data 3e

8.4 Random in space or time: the Poisson distribution

  • [[20200309231632]] Poisson distribution - Analysis Biological Data 3e

Chapter 9 - Contingency Analysis - Associations between categorical variables

  • [[20200310115615]] Contingency table - Analysis Biological Data 3e
  • [[20200310115905]] Contingency analysis - Analysis Biological Data 3e

9.1 Associating two categorical variables

  • An association between two categorical variables implies that the two variables are not independent.

9.2 Estimating association in 2x2 tables: relative risk

  • [[20200310120148]] Relative risk - Analysis Biological Data 3e

9.3 Estimating association in 2 x 2 tables: the odds ratio

  • [[20200310214016]] Odds - Analysis Biological Data 3e
  • [[20200310220804]] Odds ratio - Analysis Biological Data 3e

9.4 The Chi-square contingency test

  • [[20200310225151]] Chi-square contingency test - Analysis Biological Data 3e

9.5 Fisher’s exact test

  • [[20200310231733]] Fisher’s exact test - Analysis Biological Data 3e

Chapter 14 - Experimental Design

  • [[20200216161846]] Randomization - Analysis Biological Data 3e

Chapter 15 - ANOVA

15.1 The analysis of variance

  • [[20200322090654]] ANOVA in a nutshell - Analysis Biological Data 3e

Hi Alex,

Thank you for sharing your structure, this is really useful and insightful!

I was wondering what is your process to enter your notes in your ZK system? Do you take fleeting notes and dedicate some time to transform them into permanent notes in your system? Do you write them as you go?

When I see your outline note for the Whitlock and Schluter book, I cannot help but be afraid by the amount of time this must have taken you…

Thank you so much if you can comment on that!

@victor Thanks for your question. Glad that sharing my notes is helpful.

I was wondering what is your process to enter your notes in your ZK system? Do you take fleeting notes and dedicate some time to transform them into permanent notes in your system? Do you write them as you go?

I don’t really make the fleeting vs. permanent distinction. If I’m working on something that is worth knowing again in the future, it goes into the ZK. To me, there isn’t much of a downside to putting more stuff in. I do revisit notes and add more or link more, but I don’t do it as a dedicated practice. So, to answer your question, most of the time I add notes as I go along (for example, as I’m reading a book or paper). I’ve thought about putting it off until later (making marks in a book and coming back later to process them), but the truth is, I probably won’t do that most of the time. It’s more certain for me to get something in right at the moment than to build up a huge backlog that I don’t touch.

When I see your outline note for the Whitlock and Schluter book, I cannot help but be afraid by the amount of time this must have taken you…

Stuff like that takes time, but that’s just the cost of doing knowledge work. You could also write fewer notes. But if it’s something worth knowing and retrieving again in the future, I capture it.

What applications are you thinking about for your notes system?

Interesting, thanks!

That’s a great point.

I’m using Vim. I implemented my own plugin for this, but I never found the motivation to improve it to the point that it became frictionless. So I’ll switch to vimwiki and vim-zettel.

What about you?