For the 6-month dataset, we sequenced 19 organoids from 4 bioreactors: 4 organoids from bioreactor 1 (Org1A, 3,547 cells; Org1B, 3,463 cells; Org1C, 3,698 cells; Org1D, 2,811 cells), 3 organoids from bioreactor 2 (Org2A, 2,238 cells; Org2B, 3,159 cells; Org2C, 2,708 cells), 4 organoids from bioreactor 3 (Org3A, 4,225 cells; Org3B, 2,557 cells; Org3C, 3,614 cells; Org3D, 11,061 cells) and 8 organoids from bioreactor 4 (Org4A, 1,656 cells; Org4B, 1,663 cells; Org4C, 1,795 cells; Org4D, 2,151 cells; Org4E, 3,407 cells; Org4F, 7,443 cells; Org4G, 2,905 cells; Org4H, 2,788 cells). For the 3-month dataset, we sequenced 12 organoids from bioreactor 6 (8,478 cells) and bioreactor 7 (6,924 cells).
Droplets containing single cells and barcoded micro-particles were generated and processed as described in ref. 11. Briefly, droplets were collected, and beads were recovered and processed for immediate reverse transcription. The resulting cDNA was amplified, fragmented and further amplified using the Nextera XT DNA library preparation kit. Sequencing was performed on the Illumina NextSeq 500.
Clustering of cells derived from 6-month organoids was performed using the Seurat R package (ref. 12), with some modifications from the procedure described previously (ref. 11). Clustering was done in two iterative rounds of principal components analysis (PCA). First, digital gene expression matrices were column-normalized and log-transformed. Cells with fewer than 400 expressed genes were removed from analysis. A set of variable genes was then identified by binning the average expression of all genes into 300 evenly sized groups and computing the median dispersion (variance divided by the mean) in each bin. Genes with a dispersion higher than twice the median dispersion, minus the minimum value, were selected for inclusion in PCA (final set: 1,568 genes). The edges of a nearest neighbor graph were generated by computing the fraction of shared nearest neighbors amongst cells in the first 20 PC dimensions using the approximate nearest neighbors package (ANN) in R (CRAN), setting the k parameter to 25 (“BuildSNN” function in Seurat). A first round of clustering with the Louvain modularity-based community detection algorithm (ref. 39), set at a resolution of 0.01, was used to generate a total of 10 first-round clusters (“FindClusters” function in Seurat). The largest 50% of the cells from each of these clusters were again subjected to gene selection and PCA. These PCs were evaluated for statistically significant gene expression signals using the Jackstraw method (refs. 11, 40; “JackStraw” function in Seurat). At most 15 PCs were used in this second round of clustering by Louvain, with the resolution parameter set at 3. The resulting clusters were compared pairwise for differential expression, as in ref. 11, and clusters with fewer than 10 genes differentially expressed by more than 2-fold were merged, producing 202 clusters. For analysis of organoid-to-organoid variability, organoids were excluded from a given cluster if they contributed less than 1% of the cells in that cluster.
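The variable-gene selection step described above can be illustrated with a minimal Python sketch. It is not the Seurat implementation used for the analysis: the function names and the input layout are hypothetical, "evenly sized" is read here as equal gene counts per bin, population variance is used for the dispersion, and the "minus the minimum value" adjustment is omitted because the text does not specify it precisely.

```python
from statistics import mean, median, pvariance


def dispersion(values):
    """Variance divided by the mean; defined as 0.0 when the mean is zero."""
    m = mean(values)
    return pvariance(values) / m if m > 0 else 0.0


def select_variable_genes(expr, n_bins=300, fold=2.0):
    """Return genes whose dispersion exceeds `fold` times their bin's median.

    expr: dict mapping gene name -> list of normalized expression values,
          one value per cell (hypothetical input layout).
    """
    # Sort genes by average expression so that each bin groups genes
    # with similar expression levels.
    stats = sorted((mean(v), dispersion(v), g) for g, v in expr.items())
    n_bins = max(1, min(n_bins, len(stats)))
    selected = []
    for b in range(n_bins):
        # Assumption: bins contain (nearly) equal numbers of genes.
        lo = b * len(stats) // n_bins
        hi = (b + 1) * len(stats) // n_bins
        bin_stats = stats[lo:hi]
        if not bin_stats:
            continue
        cutoff = fold * median(d for _, d, _ in bin_stats)
        selected.extend(g for _, d, g in bin_stats if d > cutoff)
    return sorted(selected)


# Toy example with made-up values: six genes measured in four cells.
toy = {
    "low_flat": [1, 1, 1, 1],
    "low_mid": [0, 2, 0, 2],
    "low_var": [0, 0, 0, 4],
    "high_flat": [10, 10, 10, 10],
    "high_mid": [9, 11, 9, 11],
    "high_var": [0, 20, 0, 20],
}
variable = select_variable_genes(toy, n_bins=2)
```

Comparing each gene with the median of its own expression bin, rather than with a global threshold, keeps highly expressed genes from dominating the selection simply because their variance scales with their mean.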
Gene expression in a dataset of human fetal cortex (ref. 22) was correlated with that of the astrocyte cluster (c2) and the identified subclusters of the forebrain cluster (c4), using the log average expression of a set of 104 genes. These genes were identified by taking the top 10 most differentially expressed genes for each cluster pair in the published fetal cortex dataset (some of which overlapped), as the most informative for distinguishing the reported endogenous cell classes of the cortex. We then constructed expression profiles for the six organoid cell groups and measured the correlation of gene expression levels for the 104 endogenous genes, comparing each of the endogenous cortical cell classes to each of the organoid cell groups. For the retinal subclusters (subclusters of c5), we repeated this correlation analysis against a dataset of P14 mouse retina (ref. 11), using 110 genes identified by taking the top 10 most enriched genes from each of the 11 major cell classes (horizontal cells, retinal ganglion cells, amacrine cells, photoreceptors, bipolar cells, and six glial retinal types) in the mouse dataset, and correlating against the expression profiles of the orthologous human genes in the six organoid retinal cell groups.
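The profile comparison described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the original analysis code: the text does not name the correlation coefficient, so Pearson correlation is assumed; "log average expression" is read as log(1 + mean); and all function names, gene names and values are hypothetical.

```python
import math


def log_average_profile(cells, genes):
    """log(1 + mean expression) per gene for one group of cells.

    cells: list of dicts mapping gene name -> expression value.
    Genes missing from a cell are counted as 0 (assumption).
    """
    n = len(cells)
    return [math.log1p(sum(c.get(g, 0.0) for c in cells) / n) for g in genes]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlation_table(organoid_groups, reference_classes, genes):
    """Correlate every organoid cell group with every reference cell class."""
    ref_profiles = {
        name: log_average_profile(cells, genes)
        for name, cells in reference_classes.items()
    }
    table = {}
    for group, cells in organoid_groups.items():
        profile = log_average_profile(cells, genes)
        for name, ref_profile in ref_profiles.items():
            table[(group, name)] = pearson(profile, ref_profile)
    return table


# Toy example with made-up values for three hypothetical marker genes.
genes = ["geneA", "geneB", "geneC"]
reference = {
    "class_1": [{"geneA": 8, "geneB": 6, "geneC": 0}],
    "class_2": [{"geneA": 0, "geneB": 1, "geneC": 9}],
}
organoid = {
    "group_x": [
        {"geneA": 7, "geneB": 5, "geneC": 1},
        {"geneA": 9, "geneB": 7, "geneC": 0},
    ],
}
table = correlation_table(organoid, reference, genes)
```

In this toy case, `group_x` correlates strongly with `class_1` and negatively with `class_2`, which is the kind of pattern used to match organoid cell groups to endogenous cell classes.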
Quadrato G., Nguyen T., Macosko E.Z., Sherwood J.L., Yang S.M., Berger D., Maria N., Scholvin J., Goldman M., Kinney J., Boyden E.S., Lichtman J., Williams Z.M., McCarroll S.A., & Arlotta P. (2017). Cell diversity and network dynamics in photosensitive human brain organoids. Nature, 545(7652), 48-53.