Text Pre-Processing for NLP

Series: Natural Language Processing for All

When

3 – 4 p.m., Oct. 10, 2024

Where

Weaver Science-Engineering Library, Room 212 & Zoom

Prepare your text data for advanced analysis with our primer on text pre-processing for Natural Language Processing. Text pre-processing is a crucial step in any NLP pipeline, ensuring that your data is clean, normalized, and ready for modeling. This workshop will introduce pre-processing techniques for text data from sources such as web scraping and online datasets. We will also look at tools available for categorizing, organizing, and tagging text.

With a practical demonstration, we will explore handling various text formats, dealing with noise, and transforming text into a format suitable for machine learning algorithms. Whether you are working on a specific NLP task or just making sense of a data dump, this session will give you the tools and knowledge to prepare your text data effectively!
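
To give a flavor of the kind of steps described above, here is a minimal sketch in Python using only the standard library. It is not the workshop notebook; the function names (clean_text, tokenize, bag_of_words) and the specific cleaning choices are assumptions made for illustration. It strips HTML tags left over from scraping, normalizes the text, and turns it into word counts, one simple format that machine learning algorithms can consume.

```python
import html
import re
import unicodedata
from collections import Counter

# Hypothetical helpers for illustration; not taken from the workshop materials.
TAG_RE = re.compile(r"<[^>]+>")                   # anything that looks like an HTML tag
TOKEN_RE = re.compile(r"[a-z0-9]+(?:'[a-z]+)?")   # lowercase words, optional apostrophe suffix


def clean_text(raw: str) -> str:
    """Remove common web-scraping noise and normalize the text."""
    text = TAG_RE.sub(" ", raw)                   # replace HTML tags with a space
    text = html.unescape(text)                    # turn entities such as &amp; into &
    text = unicodedata.normalize("NFKC", text)    # unify equivalent Unicode forms
    text = text.lower()                           # case-fold so "Data" and "data" match
    return re.sub(r"\s+", " ", text).strip()      # collapse runs of whitespace


def tokenize(text: str) -> list[str]:
    """Split cleaned text into simple word tokens, dropping punctuation."""
    return TOKEN_RE.findall(text)


def bag_of_words(raw: str) -> Counter:
    """Clean, tokenize, and count words: a basic bag-of-words representation."""
    return Counter(tokenize(clean_text(raw)))


if __name__ == "__main__":
    sample = "<p>Hello &amp;  World!</p>"
    print(clean_text(sample))
    print(bag_of_words("<b>Data</b> data DATA"))
```

In practice, libraries such as spaCy or scikit-learn offer more robust tokenizers and vectorizers; the sketch only shows the order of operations (remove noise, normalize, tokenize, count).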

Join us for an engaging and accessible introduction to Natural Language Processing (NLP) and its practical applications for everyday tasks! In "NLP for All," we will explore the fundamental concepts behind NLP, from understanding how computers interpret human language to improving search queries, using regular expressions, finding datasets, and building pipelines for working with language. Whether you're curious about chatbots, voice assistants, or automated text transcription and analysis, this series will demystify popular technologies and show you how they work.

What We Will Cover:

Foundations of NLP: Gain a solid grasp of NLP concepts and terminology without needing a technical background.
Real-World Applications: Explore practical uses of NLP in various contexts, such as improving search and information retrieval, generating and evaluating automatic transcriptions, and working with popular libraries such as spaCy, PyTorch and scikit-learn.
Hands-On Experience: We will illustrate NLP concepts in action with a well-documented code notebook, aimed at solving practical examples. We will also explore online sources for NLP tools and datasets, such as HuggingFace.

Pre-requisites:

A Google account to run Google Colab (where we will do most of our programming exercises)
Basic knowledge of Python. You can brush up on Python fundamentals with Software Carpentry's Introduction to Python (section 1).

SERIES: Natural Language Processing for All

Add the series to your calendar.

When: Thursdays, 3-4 pm, Sept. 5 - Oct. 24, 2024
Where: Weaver Science-Engineering Library, Rm 212 and on Zoom
Instructor: Megh Krishnaswamy
YouTube: Video links will be posted here.

Workshop Materials

09/05 Introduction to NLP with spaCy - YouTube
09/12 Regular Expressions for NLP - YouTube
09/19 NLP with Transformers - YouTube
09/26 Introduction to Semantic Search - YouTube
10/03 Introduction to Information Extraction
10/10 Text pre-processing for NLP - YouTube
10/17 Introduction to Speech Technology
10/24 Speech-to-Text with Whisper AI - YouTube

Contacts

Megh Krishnaswamy

mkrishnaswamy@email.arizona.edu
