The project
The 4D Nucleome Data Portal is the public data system for the 4D Nucleome Network. It supports search, browsing, metadata-driven data management, bulk downloads, integrated visualization, and access to genomics and microscopy datasets generated by the consortium and relevant external sources.
I was the Lead Senior Software Developer for the 4DN Data Coordination and Integration Center during my time at Harvard Medical School, working in Peter J. Park's lab in the Department of Biomedical Informatics. The practical software problem was to make complex nuclear-architecture data usable: model the metadata, manage submitted data and files, support QA workflows, expose search and API surfaces, and help researchers find and visualize the right datasets.
The system shape
The public Nature Communications paper describes the portal infrastructure as a data system built around metadata stored as JSON in PostgreSQL, the Python/Pyramid SnoVault database layer, the FourFront front-end, Elasticsearch search, AWS S3 file storage, and a RESTful API. That stack reflects the product and platform work I led: data modeling, backend services, search, file infrastructure, submission workflows, and usable interfaces for scientific users.
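The storage pattern above can be illustrated with a minimal sketch: each metadata item is a JSON document keyed by type and accession, and a flat text index is derived from the documents for search. The `MetadataStore` class, the accessions, and the field names here are all hypothetical illustrations, not the portal's actual schema or code.

```python
import json

class MetadataStore:
    """Toy store: JSON metadata documents plus a derived search index."""

    def __init__(self):
        self.items = {}   # (item_type, accession) -> serialized JSON document
        self.index = {}   # lowercase token -> set of accessions

    def put(self, item_type, accession, document):
        # Store the item as JSON and index every field value for search.
        self.items[(item_type, accession)] = json.dumps(document)
        for value in document.values():
            for token in str(value).lower().split():
                self.index.setdefault(token, set()).add(accession)

    def search(self, token):
        return sorted(self.index.get(token.lower(), set()))

store = MetadataStore()
store.put("experiment", "4DNEXAMPLE1",
          {"assay": "Hi-C", "organism": "human", "lab": "example-lab"})
store.put("experiment", "4DNEXAMPLE2",
          {"assay": "imaging", "organism": "human", "lab": "example-lab"})

print(store.search("hi-c"))   # → ['4DNEXAMPLE1']
print(store.search("human"))  # → ['4DNEXAMPLE1', '4DNEXAMPLE2']
```

In the real system the roles of the dict and the index are played by PostgreSQL and Elasticsearch respectively; the point of the sketch is only the shape of the design, with documents as the source of truth and a search index derived from them.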
The portal also had to handle microscopy data alongside sequencing data. That matters because microscopy metadata and file workflows are different from standard genomics pipelines, and the product had to make those differences manageable for curators and researchers.
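One way to picture that divergence is a shared base item with modality-specific fields, so curation and QA logic can branch on the modality. Every class, field, and accession below is a hypothetical illustration, not the 4DN metadata model.

```python
from dataclasses import dataclass

@dataclass
class FileItem:
    # Fields shared by every submitted file, regardless of modality.
    accession: str
    lab: str

@dataclass
class SequencingFile(FileItem):
    # Fields that only make sense for sequencing data.
    read_length: int
    paired_end: bool

@dataclass
class MicroscopyFile(FileItem):
    # Fields that only make sense for imaging data.
    channels: int
    pixel_size_nm: float

def qa_ok(item):
    """A curator-facing sanity check that branches on modality."""
    if isinstance(item, SequencingFile):
        return item.read_length > 0
    if isinstance(item, MicroscopyFile):
        return item.channels > 0 and item.pixel_size_nm > 0
    return False

seq = SequencingFile("4DNEX_SEQ1", "example-lab", read_length=100, paired_end=True)
img = MicroscopyFile("4DNEX_IMG1", "example-lab", channels=3, pixel_size_nm=108.3)
print(qa_ok(seq), qa_ok(img))  # → True True
```

The design choice this gestures at is keeping one submission and QA pipeline while letting each modality carry the metadata it actually needs, rather than forcing microscopy files through genomics-shaped fields.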
Public proof
The 4DN DCIC About page lists me among its alumni as a Lead Senior Software Developer. The Nature Communications 4DN Data Portal paper lists me as a named author. The Tibanna paper in Bioinformatics also lists me as an author and identifies Soohyun Lee and Jeremy Johnson as joint first authors.
The code availability trail is public. The paper points to the 4DN DCIC GitHub organization, including FourFront for the data portal, SnoVault for the object-relational database layer, Tibanna for scalable cloud pipeline execution, and Common Workflow Language pipelines used by the DCIC.
What carried forward
This work sharpened the same instincts that matter in AI platforms: metadata quality, workflow clarity, searchability, provenance, access boundaries, QA, visualization, and the discipline required to make advanced data usable by people outside the engineering team.