Stemming
• Stemming is the process of reducing a word to its stem.
• The stem (or root form) is not necessarily a word in its own right,
but words can be generated from it by appending the right suffix.
• Examples:
• fish, fishes and fishing all stem to fish, which is a valid English word.
• study, studies and studying all stem to studi, which is not an English word.
• Most stemming algorithms (a.k.a. stemmers) are based on rules for
suffix stripping.
• The best-known algorithm is the Porter stemmer, written in 1979 and published in 1980.
• A more aggressive stemming algorithm is the Lancaster stemmer, introduced in 1990.
• Several Python libraries provide stemmers, for example NLTK and PyStemmer.
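To make the idea of rule-based suffix stripping concrete, here is a minimal sketch in plain Python. The rule table and the name `toy_stem` are hypothetical and chosen only to reproduce the fish/studi examples above; this is not the Porter or Lancaster algorithm, which use far more rules and conditions.

```python
# Hypothetical toy rules: (suffix, replacement), tried in order.
# Longer suffixes come first so that "ies" wins over "es" and "s".
STRIP_RULES = [
    ("ies", "i"),
    ("ing", ""),
    ("es", ""),
    ("s", ""),
]


def toy_stem(word):
    """Strip at most one suffix, then rewrite a final 'y' as 'i'."""
    word = word.lower()
    for suffix, replacement in STRIP_RULES:
        remainder = word[: len(word) - len(suffix)]
        # Only strip if at least three characters remain, so very
        # short words such as "is" or "as" are left alone.
        if word.endswith(suffix) and len(remainder) >= 3:
            word = remainder + replacement
            break
    # Map a trailing 'y' to 'i' so "study" and "studies" share a stem.
    if word.endswith("y") and len(word) > 3:
        word = word[:-1] + "i"
    return word


for w in ["fish", "fishes", "fishing", "study", "studies", "studying"]:
    print(w, "->", toy_stem(w))
```

With these toy rules the first three words map to fish and the last three to studi. Real stemmers add conditions on the remaining stem (for example, that it contains a vowel) to avoid over-stripping.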
Stemming in Python
• Stemming with NLTK
from nltk.stem.porter import PorterStemmer

def stem(tokens):
    # Create the stemmer once and reuse it for every token.
    stemmer = PorterStemmer()
    stems = []
    for item in tokens:
        stems.append(stemmer.stem(item))
    return stems
• Stemming with PyStemmer
import Stemmer

def stem(tokens):
    # Build a Snowball stemmer for English and stem all tokens at once.
    stemmer = Stemmer.Stemmer('english')
    stems = stemmer.stemWords(tokens)
    return stems
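As a usage sketch, the following runs the example words from this card through NLTK's Porter and Lancaster stemmers side by side. It assumes NLTK is installed. The variable names are illustrative, and the Lancaster output is printed rather than stated here, since it may differ from the Porter stems.

```python
from nltk.stem import LancasterStemmer, PorterStemmer

words = ["fish", "fishes", "fishing", "study", "studies", "studying"]

porter = PorterStemmer()
lancaster = LancasterStemmer()

# Stem every word with both algorithms.
porter_stems = [porter.stem(w) for w in words]
lancaster_stems = [lancaster.stem(w) for w in words]

# Print the results in aligned columns for comparison.
for word, p_stem, l_stem in zip(words, porter_stems, lancaster_stems):
    print(f"{word:10} porter={p_stem:10} lancaster={l_stem}")
```

The Porter column should show fish for the first three words and studi for the last three, matching the examples on this card.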
Card info:
Author: CoboCards-User
Main topic: PTT
Topic: PTT
School / University: Uni Koblenz
Location: Koblenz
Published: 08.07.2016