4KLSDB

DataCV @ CVPR 2026

A Large-Scale Dataset for 4K Image Restoration and Generation

Zihao Zhu¹, Kuan-Ru Huang¹, Zhaoming Xu¹, Renjie Li¹, Bo Wu¹, Ruizheng Bai¹, Mingyang Wu¹, Sayak Paul², Zhengzhong Tu†,¹

¹Texas A&M University ²Hugging Face
† Corresponding author

Example images from 4KLSDB spanning nature, urban scenes, people, food, CGI, artwork, and more. 4KLSDB contains 129,484 carefully curated native-4K training images, 2,000 validation images, and 1,984 test images. The dataset is designed to support both native-4K image restoration (super-resolution) and 4K text-to-image generation research.

Abstract

High-resolution datasets are essential for advancing super-resolution (SR) and text-to-image (T2I) diffusion research. However, current publicly available datasets lack both the native 4K resolution and the extensive scale necessary for training state-of-the-art models. To address this gap, we introduce a 4K Large Scale Dataset and Benchmark (4KLSDB), a large-scale, diverse dataset consisting of 129,484 carefully curated 4K resolution images spanning multiple categories such as nature, urban scenes, people, food, artwork, and CGI, alongside distinct validation and test sets containing 2,000 and 1,984 images respectively. Images were sourced from established open datasets including Photo Concept Bucket, LAION-2B, and PD12M. 4KLSDB underwent rigorous multi-stage automated filtering and annotation pipelines involving both human annotators and Large Multimodal Models (LMMs) to ensure high aesthetic quality and dataset consistency. We demonstrate 4KLSDB's effectiveness by training representative super-resolution and diffusion models, observing significant improvements in performance on native 4K benchmarks. Comprehensive experiments illustrate a positive correlation between training on true 4K resolution data and improved fidelity in image restoration, especially at 4K resolution.

129,484 train images · 2,000 validation images · 1,984 test images · 3840+ px (native 4K)

Dataset

4KLSDB is the first publicly released native-4K dataset that scales to over 100k images and supports both image restoration and generation.

Dataset	#Train	#Val	#Test	Max Res.	Native 4K
DIV2K	800	100	100	2K	✗
LSDIR	84,991	1,000	1,000	2K	✗
DIV8K	1,500	100	100	8K	✓†
DiffusionDB	14,000,000	–	–	1024×1024	✗
HQ-Edit	~200,000	–	–	900×900	✗
4KLSDB (Ours)	129,484	2,000	1,984	4K	✓

† DIV8K contains some 8K-resolution images, but its total scale remains limited for large-scale training.

Curation Pipeline

A robust multi-stage filtering and quality-control pipeline combining rule-based checks, LMM-based aesthetic scoring, and human vetting.

Overview of the 4KLSDB filtering pipeline. An initial raw image pool is progressively refined through automated filters and a final manual inspection stage to obtain a high-quality, aesthetically aligned 4K dataset. Resolution-based pre-filtering enforces a minimum dimension of 3840 px and a 3840×2160 pixel budget. Q-Align then assigns quality and aesthetic scores, and the top 80% of images are retained. A Laplacian-variance check and a Sobel-patch flatness ratio further remove overly flat, blurry, or low-texture samples. Finally, two human annotators review every remaining image, yielding the final native-4K dataset.
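The automated stages described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the released pipeline: it assumes the 3840 px minimum applies to the longer side and that the "pixel budget" means a total pixel count of at least 3840×2160, it assumes Q-Align scores have already been computed and are passed in as an array, and the function names, patch size, and gradient threshold are all hypothetical.

```python
import numpy as np

MIN_LONG_SIDE = 3840        # assumption: minimum applies to the longer side
MIN_PIXELS = 3840 * 2160    # assumption: "pixel budget" read as a minimum area


def passes_resolution(width: int, height: int) -> bool:
    """Resolution pre-filter under the two assumptions above."""
    return max(width, height) >= MIN_LONG_SIDE and width * height >= MIN_PIXELS


def keep_top_fraction(scores: np.ndarray, fraction: float = 0.8) -> np.ndarray:
    """Boolean mask that keeps the highest-scoring `fraction` of images."""
    cutoff = np.quantile(scores, 1.0 - fraction)
    return scores >= cutoff


def laplacian_variance(gray: np.ndarray) -> float:
    """Variance of a 4-neighbour Laplacian; low values suggest blur."""
    g = gray.astype(np.float64)
    lap = (g[:-2, 1:-1] + g[2:, 1:-1] + g[1:-1, :-2] + g[1:-1, 2:]
           - 4.0 * g[1:-1, 1:-1])
    return float(lap.var())


def flat_patch_ratio(gray: np.ndarray, patch: int = 64,
                     grad_thresh: float = 2.0) -> float:
    """Fraction of patches whose mean Sobel magnitude is below a threshold.

    `patch` and `grad_thresh` are illustrative values, not the paper's.
    The image must be at least (patch + 2) pixels on each side.
    """
    g = gray.astype(np.float64)
    # 3x3 Sobel responses on the valid interior, built from shifted slices.
    gx = ((g[:-2, 2:] + 2.0 * g[1:-1, 2:] + g[2:, 2:])
          - (g[:-2, :-2] + 2.0 * g[1:-1, :-2] + g[2:, :-2]))
    gy = ((g[2:, :-2] + 2.0 * g[2:, 1:-1] + g[2:, 2:])
          - (g[:-2, :-2] + 2.0 * g[:-2, 1:-1] + g[:-2, 2:]))
    mag = np.hypot(gx, gy)
    rows, cols = mag.shape[0] // patch, mag.shape[1] // patch
    tiles = mag[:rows * patch, :cols * patch].reshape(rows, patch, cols, patch)
    patch_means = tiles.mean(axis=(1, 3))
    return float((patch_means < grad_thresh).mean())
```

An image would be dropped when its Laplacian variance falls below some threshold or its flat-patch ratio rises above another; the page does not give those cutoffs, so they are left to the caller here.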

Benchmark Results

Native-4K supervision improves classical SR models on every reported metric and real-world SR models on all but two (OSEDiff ×4 FID and OSEDiff ×16 SSIM).

Classical Super-Resolution on 4KLSDB Test Set

Model	×4 PSNR↑	×4 SSIM↑	×8 PSNR↑	×8 SSIM↑	×16 PSNR↑	×16 SSIM↑
HiT-SR (pretrained)	24.50	0.6839	22.25	0.6394	19.47	0.5741
HiT-SR (4KLSDB)	29.27	0.7896	24.75	0.6928	23.69	0.6414
SwinIR (DF2K)	24.11	0.6738	20.96	0.5915	19.20	0.5684
SwinIR (4KLSDB)	28.79	0.7774	25.89	0.6877	23.69	0.6376
MambaIR (pretrained)	25.92	0.7259	21.51	0.6382	19.47	0.5741
MambaIR (4KLSDB)	30.92	0.8216	23.84	0.7195	23.69	0.6414
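For readers unfamiliar with the PSNR column, the standard definition is 10·log10(MAX² / MSE). The sketch below computes it with NumPy for 8-bit images. The page does not say whether the reported numbers are measured on RGB or on the luminance channel, nor whether borders are cropped, so this function (its name and its `max_value` default are illustrative) should not be expected to reproduce the table exactly.

```python
import numpy as np


def psnr(reference: np.ndarray, restored: np.ndarray,
         max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two same-shaped images."""
    ref = reference.astype(np.float64)
    out = restored.astype(np.float64)
    if ref.shape != out.shape:
        raise ValueError("images must have the same shape")
    mse = np.mean((ref - out) ** 2)
    if mse == 0.0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_value ** 2 / mse))
```

Because PSNR is logarithmic in MSE, the HiT-SR ×4 gain in the table (24.50 dB to 29.27 dB, a difference of 4.77 dB) corresponds to roughly a threefold reduction in mean squared error, since 10^(4.77/10) ≈ 3.0.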

Real-World Super-Resolution (4KLSDB Test Set)

Method	Scale	PSNR↑	SSIM↑	LPIPS↓	DISTS↓	FID↓
OSEDiff	×4	27.36 / 27.50	0.7511 / 0.7568	0.2863 / 0.2546	0.1604 / 0.1431	28.07 / 28.35
OSEDiff	×8	23.86 / 24.10	0.6021 / 0.6188	0.5463 / 0.4252	0.1833 / 0.1448	19.56 / 17.74
OSEDiff	×16	22.65 / 22.69	0.6213 / 0.5966	0.6571 / 0.4866	0.2861 / 0.2170	51.76 / 33.97
SeeSR	×4	27.01 / 28.25	0.6996 / 0.7340	0.5231 / 0.4511	0.1407 / 0.1272	38.95 / 33.88
SeeSR	×8	24.10 / 24.50	0.6510 / 0.6713	0.5117 / 0.4628	0.1607 / 0.1551	77.46 / 74.46
SeeSR	×16	24.02 / 24.43	0.6810 / 0.7001	0.5594 / 0.5197	0.1699 / 0.1640	77.41 / 74.40

Each cell shows baseline / 4KLSDB fine-tuned. ↑ marks metrics where higher is better; ↓ marks metrics where lower is better.
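The comparison inside each "baseline / fine-tuned" cell can be made explicit with a small helper. This is only a reading aid for the table above; the function name and return labels are hypothetical.

```python
def better_of(cell: str, higher_is_better: bool) -> str:
    """Return which side of a 'baseline / fine-tuned' cell is better."""
    baseline_text, finetuned_text = (part.strip() for part in cell.split("/"))
    baseline, finetuned = float(baseline_text), float(finetuned_text)
    if baseline == finetuned:
        return "tie"
    if higher_is_better:
        finetuned_wins = finetuned > baseline
    else:
        finetuned_wins = finetuned < baseline
    return "4KLSDB" if finetuned_wins else "baseline"
```

Applied to the table, `better_of("27.36 / 27.50", higher_is_better=True)` favors the fine-tuned model, while the OSEDiff ×4 FID cell `"28.07 / 28.35"` (lower is better) and the OSEDiff ×16 SSIM cell `"0.6213 / 0.5966"` (higher is better) favor the baseline.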

4K Text-to-Image Generation (Sana fine-tuned on 4KLSDB)

Model	pCLIPScore ↑	pNIQE ↓
Sana (baseline)	28.62	5.21
Sana + 4KLSDB	29.27	4.63

User Study (Sana + 4KLSDB vs. Sana baseline)

Overall ↑	Detail ↑	Realism ↑	Artifact ↑	Alignment ↑
57.34%	60.89%	74.27%	64.40%	52.29%

Double-blind pairwise win rate of 4KLSDB-fine-tuned Sana over Sana baseline.

Qualitative Comparisons

Real-SR: SeeSR vs. SeeSR + 4KLSDB

From top to bottom: LR input, baseline SeeSR, and SeeSR fine-tuned on 4KLSDB (ours). Insets show sharper structures and more realistic local details.

T2I: Sana vs. Sana + 4KLSDB

Identical prompts and inference settings. Fine-tuning on 4KLSDB produces sharper boundaries and more coherent high-frequency textures in zoomed regions.

Downloads

Dataset, code, and pretrained checkpoints are all openly released.

Dataset

129,484 train · 2,000 val · 1,984 test native-4K images with captions.

HuggingFace

Classical SR Checkpoints

HiT-SR / SwinIR / MambaIR fine-tuned on 4KLSDB for ×4/×8/×16.

HiT-SR SwinIR MambaIR

Real-World SR Checkpoints

OSEDiff / SeeSR fine-tuned on 4KLSDB with a blind-degradation pipeline.

OSEDiff SeeSR

4K T2I Checkpoint

Sana fine-tuned on 4KLSDB for 4096×4096 text-to-image generation.

Sana 4K

Code

Training, inference, and one-click evaluation scripts for every model.

GitHub Quick Start

Paper

4KLSDB: A Large-Scale Dataset for 4K Image Restoration and Generation.

BibTeX

@misc{zhu20264klsdblargescaledataset4k,
      title={4KLSDB: A Large-Scale Dataset for 4K Image Restoration and Generation},
      author={Zihao Zhu and Kuan-Ru Huang and Zhaoming Xu and Renjie Li and Bo Wu and Ruizheng Bai and Mingyang Wu and Sayak Paul and Zhengzhong Tu},
      year={2026},
      eprint={2605.24762},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2605.24762},
}