Zejiang Shen

Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research

Jan 31, 2024
Luca Soldaini, Rodney Kinney, Akshita Bhagia, Dustin Schwenk, David Atkinson, Russell Authur, Ben Bogin, Khyathi Chandu, Jennifer Dumas, Yanai Elazar, Valentin Hofmann, Ananya Harsh Jha, Sachin Kumar, Li Lucy, Xinxi Lyu, Nathan Lambert, Ian Magnusson, Jacob Morrison, Niklas Muennighoff, Aakanksha Naik, Crystal Nam, Matthew E. Peters, Abhilasha Ravichander, Kyle Richardson, Zejiang Shen, Emma Strubell, Nishant Subramani, Oyvind Tafjord, Pete Walsh, Luke Zettlemoyer, Noah A. Smith, Hannaneh Hajishirzi, Iz Beltagy, Dirk Groeneveld, Jesse Dodge, Kyle Lo

Via arXiv

American Stories: A Large-Scale Structured Text Dataset of Historical U.S. Newspapers

Aug 24, 2023
Melissa Dell, Jacob Carlson, Tom Bryan, Emily Silcock, Abhishek Arora, Zejiang Shen, Luca D'Amico-Wong, Quan Le, Pablo Querubin, Leander Heldring

Via arXiv

Are Layout-Infused Language Models Robust to Layout Distribution Shifts? A Case Study with Scientific Documents

Jun 01, 2023
Catherine Chen, Zejiang Shen, Dan Klein, Gabriel Stanovsky, Doug Downey, Kyle Lo

Via arXiv

Beyond Summarization: Designing AI Support for Real-World Expository Writing Tasks

Apr 05, 2023
Zejiang Shen, Tal August, Pao Siangliulue, Kyle Lo, Jonathan Bragg, Jeff Hammerbacher, Doug Downey, Joseph Chee Chang, David Sontag

Via arXiv

The Semantic Reader Project: Augmenting Scholarly Documents through AI-Powered Interactive Reading Interfaces

Mar 25, 2023
Kyle Lo, Joseph Chee Chang, Andrew Head, Jonathan Bragg, Amy X. Zhang, Cassidy Trier, Chloe Anastasiades, Tal August, Russell Authur, Danielle Bragg, Erin Bransom, Isabel Cachola, Stefan Candra, Yoganand Chandrasekhar, Yen-Sung Chen, Evie Yu-Yen Cheng, Yvonne Chou, Doug Downey, Rob Evans, Raymond Fok, Fangzhou Hu, Regan Huff, Dongyeop Kang, Tae Soo Kim, Rodney Kinney, Aniket Kittur, Hyeonsu Kang, Egor Klevak, Bailey Kuehl, Michael Langan, Matt Latzke, Jaron Lochner, Kelsey MacMillan, Eric Marsh, Tyler Murray, Aakanksha Naik, Ngoc-Uyen Nguyen, Srishti Palani, Soya Park, Caroline Paulic, Napol Rachatasumrit, Smita Rao, Paul Sayre, Zejiang Shen, Pao Siangliulue, Luca Soldaini, Huy Tran, Madeleine van Zuylen, Lucy Lu Wang, Christopher Wilhelm, Caroline Wu, Jiangjiang Yang, Angele Zamarron, Marti A. Hearst, Daniel S. Weld

Via arXiv

The Semantic Scholar Open Data Platform

Jan 24, 2023
Rodney Kinney, Chloe Anastasiades, Russell Authur, Iz Beltagy, Jonathan Bragg, Alexandra Buraczynski, Isabel Cachola, Stefan Candra, Yoganand Chandrasekhar, Arman Cohan, Miles Crawford, Doug Downey, Jason Dunkelberger, Oren Etzioni, Rob Evans, Sergey Feldman, Joseph Gorney, David Graham, Fangzhou Hu, Regan Huff, Daniel King, Sebastian Kohlmeier, Bailey Kuehl, Michael Langan, Daniel Lin, Haokun Liu, Kyle Lo, Jaron Lochner, Kelsey MacMillan, Tyler Murray, Chris Newell, Smita Rao, Shaurya Rohatgi, Paul Sayre, Zejiang Shen, Amanpreet Singh, Luca Soldaini, Shivashankar Subramanian, Amber Tanaka, Alex D. Wade, Linda Wagner, Lucy Lu Wang, Chris Wilhelm, Caroline Wu, Jiangjiang Yang, Angele Zamarron, Madeleine Van Zuylen, Daniel S. Weld

Via arXiv

Multi-LexSum: Real-World Summaries of Civil Rights Lawsuits at Multiple Granularities

Jun 23, 2022
Zejiang Shen, Kyle Lo, Lauren Yu, Nathan Dahlberg, Margo Schlanger, Doug Downey

Via arXiv

Don't Say What You Don't Know: Improving the Consistency of Abstractive Summarization by Constraining Beam Search

Mar 16, 2022
Daniel King, Zejiang Shen, Nishant Subramani, Daniel S. Weld, Iz Beltagy, Doug Downey

Via arXiv

Incorporating Visual Layout Structures for Scientific Text Classification

Jun 21, 2021
Zejiang Shen, Kyle Lo, Lucy Lu Wang, Bailey Kuehl, Daniel S. Weld, Doug Downey

Via arXiv