Jiuxiang Gu

Latest

ADoPD: A Large-Scale Document Page Decomposition Dataset (ICLR 2024)
AIMS: All-inclusive multi-level segmentation for anything (NeurIPS 2024)
Conv-basis: A new paradigm for efficient attention inference and gradient computation in transformers (arXiv 2024)
Customization Assistant for Text-to-image Generation (CVPR 2024)
DocScript: Document-level Script Event Prediction (COLING 2024)
Exploring the Frontiers of Softmax: Provable Optimization, Applications in Diffusion Model, and Beyond (arXiv 2024)
Fourier Circuits in Neural Networks: Unlocking the Potential of Large Language Models in Mathematical Reasoning and Modular Arithmetic (arXiv 2024)
SOHES: Self-supervised Open-world Hierarchical Entity Segmentation (ICLR 2024)
Tensor Attention Training: Provably Efficient Learning of Higher-order Transformers (arXiv 2024)
Unraveling the Smoothness Properties of Diffusion Models: A Gaussian Mixture Perspective (arXiv 2024)
A Critical Analysis of Document Out-of-Distribution Detection (EMNLP 2023)
DocEdit: language-guided document editing (AAAI 2023)
Improving a Named Entity Recognizer Trained on Noisy Data with a Few Clean Instances (NAACL 2023)
LayerDoc: layer-wise extraction of spatial hierarchical structure in visually-rich documents (WACV 2023)
Learning the visualness of text using large vision-language models (EMNLP 2023)
LLaVAR: Enhanced visual instruction tuning for text-rich image understanding (arXiv 2023)
LRM: Large reconstruction model for single image to 3D (ICLR 2023)
MGDoc: Pre-training with multi-granular hierarchy for document image understanding (EMNLP 2023)
Reflection-tuning: Data recycling improves LLM instruction-tuning (ACL (Findings) 2023)
CA-SSL: Class-agnostic semi-supervised learning for detection and segmentation (ECCV 2022)
Delving into out-of-distribution detection with vision-language representations (NeurIPS 2022)
DocLayoutTTS: Dataset and Baselines for Layout-informed Document-level Neural Speech Synthesis (INTERSPEECH 2022)
DocTime: A document-level temporal dependency graph parser (NAACL 2022)
EI-CLIP: Entity-aware interventional contrastive learning for e-commerce cross-modal retrieval (CVPR 2022)
FedKC: Federated knowledge composition for multilingual natural language understanding (ACM Web 2022)
Improving the reliability for confidence estimation (ECCV 2022)
Learning adaptive axis attentions in fine-tuning: Beyond fixed sparse attention patterns (ACL 2022)
Meta spatio-temporal debiasing for video scene graph generation (ECCV 2022)
Open world entity segmentation (TPAMI 2022)
Open-vocabulary instance segmentation via robust cross-modal pseudo-labeling (CVPR 2022)
TiGAN: Text-based interactive image generation and manipulation (AAAI 2022)
Towards language-free training for text-to-image generation (CVPR 2022)
UNISON: Unpaired cross-lingual image captioning (AAAI 2022)
User-Entity Differential Privacy in Learning Natural Language Models (Big Data 2022)
Exploiting semantic embedding and visual feature for facial action unit detection (CVPR 2021)
Multi-scale aligned distillation for low-resolution detection (CVPR 2021)
SelfDoc: Self-supervised document representation learning (CVPR 2021)
Towards interpreting and mitigating shortcut learning behavior of NLU models (NAACL 2021)
Unsupervised cross-lingual image captioning (AAAI 2021)
Finding it at another side: A viewpoint-adapted matching encoder for change captioning (ECCV 2020)
Resilient load restoration in microgrids considering mobile energy storage fleets: A deep reinforcement learning approach (PESGM 2020)
Self-supervised relationship probing (NeurIPS 2020)
Scene graph generation with external knowledge and image reconstruction (CVPR 2019)
Unpaired image captioning via scene graph alignments (ICCV 2019)
Watch It Twice: Video Captioning with a Refocused Video Encoder (ACM Multimedia 2019)
Look, Imagine and Match: Improving Textual-Visual Cross-Modal Retrieval with Generative Models (CVPR 2018)
NTU ROSE Lab at TRECVID 2018: Ad-hoc Video Search and Video to Text (TRECVID 2018)
Recent advances in convolutional neural networks (Pattern Recognition 2018)
Stack-Captioning: Coarse-to-Fine Learning for Image Captioning (AAAI 2018)
Unpaired image captioning by language pivoting (ECCV 2018)
Video Captioning with Boundary-aware Hierarchical Language Decoding and Joint Video Prediction (Neurocomputing 2018)
An empirical study of language CNN for image captioning (ICCV 2017)