Text Pre-Processing for NLP

Series: Natural Language Processing for All


When

3 – 4 p.m., Oct. 10, 2024

Prepare your text data for advanced analysis with our primer on text pre-processing for Natural Language Processing. Text pre-processing is a crucial step in any NLP pipeline, ensuring that your data is clean, normalized, and ready for modeling. This workshop will introduce pre-processing techniques for text data from sources such as web scraping and online datasets. We will also look at tools for categorizing, organizing and tagging text.

Through a practical demonstration, we will explore handling various text formats, dealing with noise, and transforming text into a format suitable for machine learning algorithms. Whether you are interested in an NLP task or just making sense of a data dump, join us to learn the tools and techniques for preparing your text data effectively!
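The cleaning, normalizing and transforming steps described above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the workshop notebook: the stop-word list, the sample sentence and the function names `clean`, `tokenize` and `bag_of_words` are hypothetical, and real projects usually rely on library tokenizers and stop-word lists (for example from spaCy or scikit-learn).

```python
import re
import unicodedata
from collections import Counter

# Hypothetical, tiny stop-word list for illustration only.
STOP_WORDS = {"a", "an", "and", "the", "is", "of", "to", "in", "for", "on"}

def clean(text: str) -> str:
    """Remove common web-scraping noise and normalize the text."""
    text = unicodedata.normalize("NFKC", text)  # unify Unicode variants (e.g. non-breaking spaces)
    text = re.sub(r"<[^>]+>", " ", text)         # drop leftover HTML tags
    text = re.sub(r"https?://\S+", " ", text)    # drop URLs
    text = text.lower()                          # case-fold
    # Drop punctuation and symbols (this simple sketch also drops non-ASCII letters).
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()     # collapse repeated whitespace

def tokenize(text: str) -> list[str]:
    """Split cleaned text on whitespace and drop stop words."""
    return [tok for tok in text.split() if tok not in STOP_WORDS]

def bag_of_words(tokens: list[str]) -> Counter:
    """Count token frequencies: a simple numeric representation for ML models."""
    return Counter(tokens)

raw = "<p>The  Library is open!\u00a0See https://example.org for the hours.</p>"
tokens = tokenize(clean(raw))
print(tokens)                # ['library', 'open', 'see', 'hours']
print(bag_of_words(tokens))
```

Each step is kept as a separate function so that it can be inspected, swapped out or reordered, which is the basic idea behind an NLP pre-processing pipeline.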

Join us for an engaging and accessible introduction to Natural Language Processing (NLP) and its practical applications for everyday tasks! In "NLP for All," we will explore the fundamental concepts behind NLP, from understanding how computers interpret human language to discovering how to improve search queries, use regular expressions, find datasets, and work with language-processing pipelines. Whether you're curious about chatbots, voice assistants, or automated text transcription and analysis, this series will demystify popular technologies and show you how they work.
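Regular expressions, one of the topics listed above, describe a whole family of strings with a single pattern. As a minimal sketch using Python's standard `re` module (the sample sentence is made up for illustration), one pattern can match both American and British spellings of "organize", the kind of variation that can make a plain keyword search miss results:

```python
import re

text = "We organize data. They organise data too. Organizing takes time; organisation helps."

# \b      : start at a word boundary
# organi  : the shared stem
# [sz]    : either spelling variant
# \w*     : any remaining word characters (e.g. "ing", "ation")
pattern = re.compile(r"\borgani[sz]\w*", re.IGNORECASE)

matches = pattern.findall(text)
print(matches)  # ['organize', 'organise', 'Organizing', 'organisation']
```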

What We Will Cover:

  • Foundations of NLP: Gain a solid grasp of NLP concepts and terminology without needing a technical background.
  • Real-World Applications: Explore practical uses of NLP in various contexts, such as improving search and information retrieval, generating and evaluating automatic transcriptions, and working with popular libraries such as spaCy, PyTorch and scikit-learn.
  • Hands-On Experience: We will illustrate NLP concepts in action with a well-documented code notebook that works through practical examples. We will also explore online sources for NLP tools and datasets, such as Hugging Face.
Pre-requisites:

SERIES: Natural Language Processing for All


When: Thursdays, 3–4 p.m., Sept. 5 – Oct. 24, 2024
Where: Weaver Science-Engineering Library, Rm. 212, and on Zoom
Instructors: Megh Krishnaswamy
YouTube: Links will be added after the workshops have been presented

Workshop Materials

Contacts