A very simple LDA implementation using Gensim. You can find more information here: https://radimrehurek.com/gensim/tutorial.html
```python
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from nltk.stem import RSLPStemmer
from gensim import corpora, models

# One-time download of the NLTK data used below:
# import nltk; nltk.download('stopwords'); nltk.download('punkt'); nltk.download('rslp')
# (newer NLTK releases use 'punkt_tab' instead of 'punkt' for word_tokenize)

# Note: RSLPStemmer is a Portuguese stemmer; for English text
# nltk.stem.PorterStemmer is usually the better fit.
st = RSLPStemmer()
stop_words = set(stopwords.words('english'))  # build the stopword set once

doc1 = "Veganism is both the practice of abstaining from the use of animal products, particularly in diet, and an associated philosophy that rejects the commodity status of animals"
doc2 = "A follower of either the diet or the philosophy is known as a vegan."
doc3 = "Distinctions are sometimes made between several categories of veganism."
doc4 = "Dietary vegans refrain from ingesting animal products. This means avoiding not only meat but also egg and dairy products and other animal-derived foodstuffs."
doc5 = "Some dietary vegans choose to wear clothing that includes animal products (for example, leather or wool)."
docs = [doc1, doc2, doc3, doc4, doc5]

texts = []
for doc in docs:
    # lowercase and split into word and punctuation tokens
    tokens = word_tokenize(doc.lower())
    # drop English stopwords
    stopped_tokens = [w for w in tokens if w not in stop_words]
    # reduce each remaining token to its stem
    stemmed_tokens = [st.stem(w) for w in stopped_tokens]
    texts.append(stemmed_tokens)

# map each token to an integer id, then turn each document into a bag of words
dictionary = corpora.Dictionary(texts)
corpus = [dictionary.doc2bow(text) for text in texts]

# generate LDA model using gensim
ldamodel = models.LdaModel(corpus, num_topics=2, id2word=dictionary, passes=20)
print(ldamodel.print_topics(num_topics=2, num_words=4))
```
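Output from one run (LDA starts from a random initialization, so the exact weights differ between runs; the `,` and `.` entries are punctuation tokens that `word_tokenize` keeps):

```
[(0, u'0.066*animal + 0.065*, + 0.047*product + 0.028*philosophy'), (1, u'0.085*. + 0.047*product + 0.028*diet + 0.028*veg')]
```

Hope it helps.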
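To make the `corpora.Dictionary` / `doc2bow` step less of a black box, here is a minimal pure-Python sketch of the same bag-of-words idea. The helper names `build_dictionary` and `doc2bow` below are hypothetical stand-ins, not gensim's API; the numbering scheme (new tokens in each document get the next ids in sorted order) is meant to resemble recent gensim versions, but treat the exact ids as illustrative only.

```python
from collections import Counter


def build_dictionary(texts):
    """Hypothetical helper: assign an integer id to every distinct token.

    Tokens not seen before in a document receive the next free ids,
    numbered in sorted order within that document.
    """
    token2id = {}
    for tokens in texts:
        for token in sorted(set(tokens).difference(token2id)):
            token2id[token] = len(token2id)
    return token2id


def doc2bow(tokens, token2id):
    """Hypothetical helper: return sorted (token_id, count) pairs, skipping unknown tokens."""
    counts = Counter(token2id[t] for t in tokens if t in token2id)
    return sorted(counts.items())


texts = [["animal", "product", "diet", "animal"], ["vegan", "diet"]]
token2id = build_dictionary(texts)
corpus = [doc2bow(t, token2id) for t in texts]
print(token2id)  # {'animal': 0, 'diet': 1, 'product': 2, 'vegan': 3}
print(corpus)    # [[(0, 2), (1, 1), (2, 1)], [(1, 1), (3, 1)]]
```

Each document becomes a sparse list of (id, count) pairs; word order is discarded, which is exactly the input an LDA model expects.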
There are several Python packages for topic modeling listed at https://www.cs.princeton.edu/~blei/topicmodeling.html. –
In C++, there is [ctr](https://github.com/Blei-Lab/ctr). – kamalbanga
The repository at kamalbanga's link above implements the first paper you mentioned. Although it is written in C++, you can [call it from Python](http://stackoverflow.com/questions/145270/calling-c-c-from-python). – jtitusj
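As a minimal sketch of one approach covered by that link, the standard-library `ctypes` module can call a function exported from a compiled shared library. Here the C standard library's `strlen` stands in for a real target, and the sketch assumes a POSIX system where `ctypes.util.find_library("c")` can locate libc. Calling C++ code such as ctr this way would additionally require exporting the functions with `extern "C"` (or using a binding tool), since `ctypes` does not understand C++ name mangling.

```python
import ctypes
import ctypes.util

# Load the C standard library; on POSIX, CDLL(None) also exposes libc
# symbols if find_library happens to return None.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signature so ctypes converts arguments and results correctly:
# size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

length = libc.strlen(b"topic model")
print(length)  # 11
```

Declaring `argtypes` and `restype` explicitly is the important design choice: without them, `ctypes` assumes `int` return values, which silently truncates results on 64-bit platforms.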