Absolutely! Here's a Python function that extracts keywords from text using the NLTK library:

import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from collections import Counter


def extract_keywords(text, num_keywords=5):
    """
    Extract keywords from a text string.

    Args:
        text (str): The text to extract keywords from
        num_keywords (int): Number of keywords to return

    Returns:
        list: The most frequent keywords, most common first
    """
    # Download necessary NLTK data (a quick check once the data is installed)
    nltk.download('punkt', quiet=True)
    # Newer NLTK releases load the tokenizer data from 'punkt_tab' instead
    nltk.download('punkt_tab', quiet=True)
    nltk.download('stopwords', quiet=True)

    # Tokenize and convert to lowercase
    words = word_tokenize(text.lower())

    # Remove stopwords and non-alphabetic tokens
    stop_words = set(stopwords.words('english'))
    words = [word for word in words if word.isalpha() and word not in stop_words]

    # Count word frequencies
    word_freq = Counter(words)

    # Return the most common words, dropping the counts
    return [word for word, freq in word_freq.most_common(num_keywords)]

Pass any text string to this function and it returns the most frequent words, lowercased, after stopwords and non-alphabetic tokens (numbers, punctuation) have been removed. The num_keywords parameter controls how many keywords are returned; it defaults to 5.
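
If you want to see the frequency-counting idea in isolation, here is a minimal sketch that uses only the standard library. It is not the NLTK function above: the name extract_keywords_simple, the tiny SMALL_STOPWORDS set, and the regex tokenizer are all assumptions made for illustration, standing in for NLTK's English stopword list and word_tokenize.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword set for illustration only;
# NLTK's English stopword list is much longer.
SMALL_STOPWORDS = {"a", "an", "and", "the", "is", "in", "of", "on", "to", "it"}


def extract_keywords_simple(text, num_keywords=5, stop_words=SMALL_STOPWORDS):
    """Return the most frequent non-stopword words in text, most common first."""
    # Assumption: runs of ASCII letters are a crude stand-in for
    # word_tokenize() followed by the isalpha() filter.
    words = re.findall(r"[a-z]+", text.lower())
    # Drop stopwords, then count what remains.
    words = [word for word in words if word not in stop_words]
    return [word for word, _ in Counter(words).most_common(num_keywords)]


sample = "The cat sat on the mat. The cat is happy, and the cat is warm on the mat."
print(extract_keywords_simple(sample, num_keywords=2))  # prints ['cat', 'mat']
```

Here "cat" appears three times and "mat" twice, so they come out ahead of the words that appear once. The regex only matches a-z, so accented or non-English words would be split or dropped; the NLTK version handles those better.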
Would you like me to explain how it works in more detail or modify the function in any way?