David Kauchak ("Dr. Dave")

Professor
Computer Science Department
Pomona College
224 Edmunds
Claremont, CA 91711

david.kauchak pomona edu

Spring 2024 office hours:
 Mon 2:30-4pm
 Wed 9-10:30am
 Fri 9-10am
  and by appointment

Spring 2024 courses:
  CS140 - Introduction to Algorithms (sections 1 and 2)


Older teaching:


My main research interests lie in natural language processing (NLP), particularly applied NLP. My current research focuses on text simplification, which aims to reduce the complexity of text while maintaining the content.

Text simplification data sets


Python Sea and Space Images

Resources for the paper "A Course-long Information Retrieval Project"

We built a search engine called "Bursti" in Fall 09 in the information retrieval course (cs160). Check it out. To find out more about how it works, see the white papers.

Some fun in the Fall09 intro course studying strings. Less enthused 10 min. later :)

Older projects

Winter 2005: Statistical Machine Translation Tutorial Resources

Useful software, data, etc

Publications

Stefanos Stoikos, David Kauchak, Douglas Turnbull, and Alexandra Papoutsaki (2023). Cross-Language Music Recommendation Exploration. In Proceedings of the International Conference on Multimedia Retrieval (ICMR).

Arif Ahmed, Gondy Leroy, Han Ya Lu, David Kauchak, Jeff Stone, Philip Harber, Stephen A. Rains, Prashant Mishra, Bhumi Chitroda (2023). Audio delivery of health information: An NLP study of information difficulty and bias in listeners. In Procedia Computer Science.

Gondy Leroy, David Kauchak, Diane Haeger and Douglas Spegman (2022). Evaluation of an Online Text Simplification Editor Using Manual and Automated Metrics for Perceived and Actual Text Difficulty. Journal of the American Medical Informatics Association (JAMIA) Open.

Phuong Nguyen and David Kauchak (2022). Complex Word Identification in Vietnamese: Towards Vietnamese Text Simplification. In Proceedings of the Workshop on Multilingual Information Access (MIA).

David Kauchak, Jorge Aparicio and Gondy Leroy (2022). Improving the Quality of Suggestions For a Text Simplification Tool. In Proceedings of American Medical Informatics Association Informatics Summit.

Gondy Leroy, David Kauchak, and Nicholas Kloehn (2021). Incidence and impact of missing functional elements on information comprehension using audio and text. In American Medical Informatics Association (AMIA) Annual Fall Symposium.

Max Schwarzer, Teerapaun Tanprasert and David Kauchak (2021). Improving Human Text Simplification with Sentence Fusion. In Proceedings of the Workshop on Graph-Based Methods for Natural Language Processing (TextGraphs).

Teerapaun Tanprasert and David Kauchak (2021). Flesch-Kincaid is Not a Text Simplification Evaluation Metric. In Proceedings of the Workshop on Generation Evaluation and Metrics (GEM).

Teerapaun Tanprasert and David Kauchak (2021). Problems with Flesch-Kincaid as a Text Simplification Evaluation Metric. In Proceedings of Southern California Natural Language Processing Symposium (SocalNLP).

Hoan Van, David Kauchak and Gondy Leroy (2020). AutoMeTS: Autocomplete for Medical Text Simplification. In Proceedings of International Conference on Computational Linguistics (COLING).

Connor Ford and David Kauchak (2020). The Impact of Collaborative Interfaces on Text Simplification. In Proceedings of Conversational User Interfaces Workshop (CUI@CSCW -- Workshop at ACM Conference on Computer-Supported Cooperative Work and Social Computing).

David Kauchak and Gondy Leroy (2020). A Web-Based Medical Text Simplification Tool. In Hawaii International Conference on System Sciences (HICSS).

David Kauchak, Gondy Leroy, Menglu Pei, and Sonia Colina (2019). Predicting Transition Words Between Sentences for English and Spanish Medical Text. In Proceedings of American Medical Informatics Association (AMIA).

Gondy Leroy and David Kauchak (2019). A comparison of text versus audio for information comprehension with future uses for smart speakers. In Journal of the American Medical Informatics Association (JAMIA) Open.

Andras Szep, Martin Szep, Gondy Leroy, David Kauchak, Nick Kloehn, Debra Revere, and Melissa Just (2019). Algorithmic Generation of Grammar Simplification Rules Using Large Corpora. In Proceedings of American Medical Informatics Association Summit.

Menglu Pei, Gondy Leroy, David Kauchak, Martin Szep, and Andras Szep (2019). Splitting Sentences for Text Simplification: A Machine Learning Approach. In Proceedings of American Medical Informatics Association Summit (poster paper).

Partha Mukherjee, Gondy Leroy, and David Kauchak (2019). Using Lexical Chains to Identify Text Difficulty: A Corpus Statistics and Classification Study. In IEEE Journal of Biomedical Health Informatics.

David Kauchak, Gondy Leroy, and Melissa Grueter (2018). Demo: An Online Evidence-based Text Simplification Editor for Medical Text. In Workshop on Information Technology and Systems (WITS).

Nicholas Kloehn, Gondy Leroy, David Kauchak, Yang Gu, Sonia Colina, Nicole P. Yuan and Debra Revere (2018). Improving Consumer Understanding of Medical Text: Development and Validation of a New SubSimplify Algorithm to Automatically Generate Term Explanations in English and Spanish. In Journal of Medical Internet Research (JMIR).

Max Schwarzer and David Kauchak (2018). Human Evaluation for Text Simplification: The Simplicity-Adequacy Tradeoff. In SocalNLP Symposium. Best undergraduate paper award.

David Kauchak, Gondy Leroy and Alan Hogue (2017). Measuring Text Difficulty Using Parse-Tree Frequency. In Journal of the Association for Information Science and Technology.

Partha Mukherjee, Gondy Leroy, David Kauchak, Srinidhi Rajanarayanan, Damian Y. Romero Diaz, Nicole P. Yuan, Gail Pritchard, Sonia Colina (2017). NegAIT: A New Parser for Medical Text Simplification Using Morphological, Sentential and Double Negation. In Journal of Biomedical Informatics.

Yang Gu, Gondy Leroy and David Kauchak (2017). When Synonyms Are Not Enough: Optimal Parenthetical Insertion for Text Simplification. In American Medical Informatics Association (AMIA) Fall Symposium.

Partha Mukherjee, Gondy Leroy, David Kauchak, Brianda A. Navarrete, Damien Y. Diaz, and Sonia Colina (2017). The Role of Surface, Semantic and Grammatical Features on Simplification of Spanish Medical Texts: A User Study. In American Medical Informatics Association (AMIA) Fall Symposium.

Gondy Leroy, Brianda A. Navarrete, Sonia Colina and David Kauchak (2017). Spanish Text Simplification Using Term Familiarity: Applying Principles from English Text Simplification. In American Medical Informatics Association (AMIA) Fall Symposium (poster paper).

Debra Revere, Partha Mukherjee, David Kauchak and Gondy Leroy (2017). Creating a Corpus Resource for Text Simplification Research and Development. In American Medical Informatics Association (AMIA) Fall Symposium (poster paper).

David Kauchak, Gondy Leroy and Melissa Just (2016). Grammar Frequency and Simplification: When Intuition Fails. In American Medical Informatics Association (AMIA) Fall Symposium (poster paper). Distinguished Poster Award.

David Kauchak and Gondy Leroy (2016). Moving Beyond Readability Metrics for Health-Related Text Simplification. IEEE IT Professional.

David Kauchak (2016). Pomona at SemEval-2016 Task 11: Predicting Word Complexity Based on Corpus Frequency. In Proceedings of International Workshop on Semantic Evaluation (SemEval).

Gondy Leroy, David Kauchak and Alan Hogue (2016). Effects on Text Simplification: Evaluation of Splitting Up Noun Phrases. Journal of Health Communication.

Colby Horn, Katie Manduca and David Kauchak (2014). Learning a Lexical Simplifier Using Wikipedia. In Proceedings of ACL (short paper).

David Kauchak, Obay Mouradi, Christopher Pentoney and Gondy Leroy (2014). Text Simplification Tools: Using Machine Learning to Discover Features that Identify Difficult Text. In Hawaii International Conference on System Sciences (HICSS).

Gondy Leroy and David Kauchak (2013). The Effect of Word Familiarity on Actual and Perceived Text Difficulty. In Journal of American Medical Informatics Association.

Daniel Feblowitz and David Kauchak (2013). Sentence Simplification as Tree Transduction. In Proceedings of PITR (ACL Workshop).

David Kauchak (2013). Improving Text Simplification Language Modeling Using Unsimplified Text Data. In Proceedings of ACL. Associated data.

Gondy Leroy, James E. Endicott, David Kauchak, Obay Mouradi and Melissa Just (2013). User Evaluation of the Effects of a Text Simplification Algorithm Using Term Familiarity on Perception, Understanding, Learning, and Information Retention. In Journal of Medical Internet Research (JMIR).

Gondy Leroy, David Kauchak and Obay Mouradi (2013). A User-study Measuring the Effects of Lexical Simplification and Coherence Enhancement on Perceived and Actual Text Difficulty. In International Journal of Medical Informatics (IJMI).

Obay Mouradi, Gondy Leroy, David Kauchak and James E. Endicott (2013). Influence of Text and Participant Characteristics on Perceived and Actual Text Difficulty. In Hawaii International Conference on System Sciences (HICSS).

Gondy Leroy, James Endicott, Obay Mouradi, David Kauchak and Melissa Just (2012). Improving Perceived and Actual Text Difficulty for Health Information Consumers using Semi-Automated Methods. In American Medical Informatics Association (AMIA) Fall Symposium.

David Kauchak, Gondy Leroy and William Coster (2012). A Systematic Grammatical Analysis of Easy and Difficult Medical Text. In American Medical Informatics Association (AMIA) Fall Symposium (poster paper).

William Coster and David Kauchak (2011). Learning to Simplify Sentences Using Wikipedia. In Proceedings of Text-To-Text Generation, ACL Workshop.

William Coster and David Kauchak (2011). Simple English Wikipedia: A New Text Simplification Task. In Proceedings of ACL (short paper). Associated data.

Guillermo Gomez-Hicks and David Kauchak (2011). Dynamic game difficulty balancing for backgammon. In Proceedings of ACM SouthEast.

David Kauchak (2010). A Course-long Information Retrieval Project. In Proceedings of Symposium on Educational Advances in Artificial Intelligence (EAAI). Associated resources available.

David Kauchak (2006). Contribution to Research on Machine Translation. Doctoral dissertation, University of California, San Diego.

David Kauchak and Regina Barzilay (2006). Paraphrasing for Automatic Evaluation. In Proceedings of HLT-NAACL.

Rasmus E. Madsen, David Kauchak and Charles Elkan (2005). Modeling Word Burstiness Using the Dirichlet Distribution. In Proceedings of the International Conference on Machine Learning (ICML'05).

David Kauchak and Francine Chen (2005). Feature-Based Segmentation of Narrative Documents. In Proc. of the ACL Workshop on Feature Engineering for Machine Learning in Natural Language Processing, 32-39.

David Kauchak, Joseph Smarr and Charles Elkan (2004). Sources of Success for Boosted Wrapper Induction. In Journal of Machine Learning Research, 5, 499 - 527.

David Kauchak and Sanjoy Dasgupta (2003). An Iterative Improvement Procedure for Hierarchical Clustering. In Advances in Neural Information Processing Systems (NIPS).

David Kauchak and Charles Elkan (2003). Learning Rules to Improve a Machine Translation System. In Proceedings of the European Conference on Machine Learning (ECML).