A game-changing achievement in genome science has identified over 140,000 distinct DNA loops in human cells, producing some of the highest-resolution maps of chromosome folding to date. The ongoing research, part of the 4D Nucleome Consortium, not only reveals how the genome is arranged in three-dimensional space but also pairs those maps with deep learning algorithms that can predict how the genome folds from the underlying DNA sequence.

Rather than being simple X-shaped structures, chromosomes fold into complicated three-dimensional structures inside the cell nucleus. This folding is mediated by proteins such as histones and cohesin, which help form chromatin loops that bring distant regions of DNA together. The study set out to map DNA contacts in two human cell lines: H1 embryonic stem cells and immortalized foreskin fibroblasts.
Data integration was carried out with an Integrative Genome Modeling platform, which generated 1,000 single-cell 3D genome structures per cell type. The structures were derived from 141,365 loops found in stem cells and 146,140 loops found in fibroblasts. These loops span a broad spectrum of regulatory contexts, such as enhancer-promoter pairs and boundaries anchored by insulators. Alongside the loops, the maps capture topologically associating domains and compartments, which together with the loops determine three-dimensional genome structure.
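To illustrate how a population of single-cell structures relates to a contact map, here is a minimal sketch. It assumes each structure is simply a list of 3D bead coordinates and that two beads are "in contact" when they lie within a chosen distance cutoff. The random-walk polymer, the bead indices, and the cutoff value are all hypothetical stand-ins, and this is not the Integrative Genome Modeling platform's actual procedure.

```python
import math
import random

def contact_frequency(structures, i, j, cutoff):
    """Fraction of structures in which beads i and j lie within `cutoff`."""
    hits = sum(1 for s in structures if math.dist(s[i], s[j]) <= cutoff)
    return hits / len(structures)

def random_walk_structure(n_beads, rng, step=1.0):
    """Toy polymer (assumption): each bead is one random step from the last."""
    coords = [(0.0, 0.0, 0.0)]
    for _ in range(n_beads - 1):
        x, y, z = coords[-1]
        coords.append((x + rng.uniform(-step, step),
                       y + rng.uniform(-step, step),
                       z + rng.uniform(-step, step)))
    return coords

rng = random.Random(0)  # fixed seed so the toy example is reproducible
# 1,000 structures mirrors the population size quoted in the article;
# the structures themselves are synthetic.
population = [random_walk_structure(50, rng) for _ in range(1000)]

near = contact_frequency(population, 10, 11, cutoff=2.0)  # neighbouring beads
far = contact_frequency(population, 10, 40, cutoff=2.0)   # distant beads
print(f"neighbouring beads: {near:.3f}, distant beads: {far:.3f}")
```

In this toy model neighbouring beads are always in contact, because a single step can be at most sqrt(3) long, while beads far apart along the chain meet in only a fraction of the structures. A loop would appear as a distant pair whose contact frequency is unusually high.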
The map was made possible by the latest chromosome conformation capture techniques, such as Hi-C and Micro-C, together with multi-way interaction methods such as SPRITE and GAM. Together, these techniques provide detail across scales, from megabase-sized compartments to kilobase-sized loops. Cohesin rings, CTCF insulators, and condensin rings have been shown to play crucial roles in forming chromatin loops and in the higher-order compaction of chromatin.
Beyond structure mapping, the researchers also designed and trained transformer-based deep learning models, such as UNADON, to predict chromatin structure from raw DNA sequence and epigenetic data alone. The models use multi-head attention to analyze patterns over a megabase range, and they showed advantages over convolutional models and gradient-boosted decision trees in both single-cell-type and cross-cell-type predictions. UNADON reported median Pearson correlation coefficients of 0.97 for nuclear speckle proximity (SON TSA-seq) and 0.92 for lamina association (LMNB TSA-seq), even in unobserved cell types such as IMR-90 fibroblasts.
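For readers unfamiliar with the metric, the sketch below shows how a median Pearson correlation between observed and predicted signal tracks could be computed. The region names and all signal values are invented for illustration, and the grouping into regions is an assumption rather than the evaluation protocol the researchers used.

```python
import math
import statistics

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(vx * vy)

def median_pearson(tracks):
    """Median of per-region correlations; `tracks` maps name -> (observed, predicted)."""
    return statistics.median(pearson_r(obs, pred) for obs, pred in tracks.values())

# Hypothetical observed vs. predicted signal in three made-up regions.
tracks = {
    "region_a": ([1, 2, 3, 4], [2, 4, 6, 8]),  # perfectly correlated
    "region_b": ([1, 2, 3, 4], [4, 3, 2, 1]),  # perfectly anti-correlated
    "region_c": ([1, 2, 3, 4], [1, 3, 2, 4]),  # partially correlated
}
median_r = median_pearson(tracks)
print(f"median Pearson r = {median_r:.2f}")
```

Reporting the median rather than the mean keeps a single poorly predicted region from dominating the summary figure.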
The predictive power of these models rests on learned patterns in sequence motifs and in the chromatin marks H3K4me3, H3K27ac, and H3K9me3, which are predictive of nuclear localization. Integrated-gradients analysis confirmed that both DNA k-mer features and histone modification features contribute to predicting nuclear localization. Analyses at the protein level offered further insights.
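The idea behind integrated gradients can be sketched on a toy model. The example below assumes a tiny logistic model with four input features and made-up weights. It stands in for the far larger transformer purely for illustration, and the feature names, weights, and input values are hypothetical. Attributions are obtained by averaging the model's gradient along a straight path from an all-zero baseline to the actual input.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def model(x, weights, bias):
    """Toy logistic model (assumption): a stand-in for a trained predictor."""
    return sigmoid(sum(w * v for w, v in zip(weights, x)) + bias)

def model_grad(x, weights, bias):
    """Analytic gradient of the toy model with respect to each input."""
    p = model(x, weights, bias)
    return [p * (1.0 - p) * w for w in weights]

def integrated_gradients(x, baseline, weights, bias, steps=200):
    """Midpoint Riemann-sum approximation of integrated gradients."""
    total = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint of the k-th sub-interval
        point = [b + alpha * (v - b) for v, b in zip(x, baseline)]
        grad = model_grad(point, weights, bias)
        total = [t + g for t, g in zip(total, grad)]
    # Scale the average gradient by the input's distance from the baseline.
    return [(v - b) * t / steps for v, b, t in zip(x, baseline, total)]

# Hypothetical features and weights, chosen only for illustration.
features = ["kmer_score", "H3K4me3", "H3K27ac", "H3K9me3"]
weights = [0.8, 1.2, 0.9, -1.5]
bias = -0.3
x = [0.5, 1.0, 0.7, 0.2]
baseline = [0.0, 0.0, 0.0, 0.0]

attributions = integrated_gradients(x, baseline, weights, bias)
for name, value in zip(features, attributions):
    print(f"{name}: {value:+.4f}")
```

A useful sanity check is the completeness property: the attributions should sum, up to numerical error, to the difference between the model's output at the input and at the baseline.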
Embryonic stem cells contained both mitotic cohesin (RAD21) and meiotic cohesin (REC8) complexes, both of which associate with condensin (SMC4) complexes and share binding sites with the CTCF protein. Knockdown experiments further indicated that reducing cohesin protein levels can shift the balance between loop-forming factors and chromosome-compaction factors, and can affect the stemness genes NANOG and OCT4. RAD21 knockdown also increased the level of complexes between REC8 and SMC3, suggesting a dynamic adaptation that maintains a balanced cohesin composition.
The work advances the field on two fronts. The maps provide an empirical inventory of the many regulatory circuits that encode cell-type-specific architecture, while the models add the capacity to forecast that architecture from the genome itself. This is a major opportunity for genomics researchers who want to investigate how particular variants in non-coding regions may lead to improper genome architecture.

