qhChina Package Documentation
Welcome to the documentation for the qhChina Python package, a toolkit designed for computational analysis of Chinese texts in humanities research.
Package Components
Our package includes several modules, each focusing on specific computational approaches to Chinese text analysis:
- Word Embeddings - Tools for creating and analyzing word embeddings for Chinese texts
- Topic Modeling - Methods for topic modeling with Chinese-specific preprocessing
- BERT Classifier - BERT-based text classification for Chinese documents
- Collocations - Analysis of word collocations in Chinese texts
- Corpora - Tools for managing and processing Chinese text corpora
- Preprocessing - Text segmentation and tokenization for Chinese texts
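To illustrate what collocation analysis computes (independent of qhChina's own API, which is documented on the Collocations page), here is a minimal stdlib sketch that counts how often word pairs co-occur within a fixed window over pre-tokenized sentences. The function name and window scheme are illustrative, not part of the package:

```python
from collections import Counter

def cooccurrence_counts(sentences, window=2):
    """Count how often each ordered word pair appears within `window` tokens."""
    counts = Counter()
    for tokens in sentences:
        for i, w in enumerate(tokens):
            # Pair the current token with the next `window` tokens to its right
            for v in tokens[i + 1 : i + 1 + window]:
                counts[(w, v)] += 1
    return counts

sentences = [
    ["我", "喜欢", "中国", "菜"],
    ["他们", "喜欢", "中国", "电影"],
]
counts = cooccurrence_counts(sentences, window=2)
print(counts[("喜欢", "中国")])  # 2
```

Association measures such as PMI are then computed from counts like these, normalized by each word's individual frequency.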
Package Structure
The qhChina package is organized into several key modules:
- qhchina.analytics - Core analytical tools including word embeddings, topic modeling, and collocation analysis
- qhchina.preprocessing - Text segmentation and preprocessing utilities
- qhchina.helpers - Utility functions for file loading, font handling, and more
- qhchina.educational - Educational visualization tools
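Chinese text is written without spaces between words, which is why segmentation comes first in the pipeline. Purely to show the idea behind dictionary-based segmentation (this is a toy sketch, not the qhchina.preprocessing implementation), a forward maximum-matching segmenter against a small word list looks like:

```python
def forward_max_match(text, dictionary, max_len=4):
    """Greedy left-to-right segmentation: take the longest dictionary match."""
    tokens = []
    i = 0
    while i < len(text):
        match = text[i]  # fall back to a single character
        for length in range(min(max_len, len(text) - i), 1, -1):
            candidate = text[i : i + length]
            if candidate in dictionary:
                match = candidate
                break
        tokens.append(match)
        i += len(match)
    return tokens

dictionary = {"图书馆", "学习", "汉语", "她", "在"}
print(forward_max_match("她在图书馆学习汉语", dictionary))
# ['她', '在', '图书馆', '学习', '汉语']
```

Real segmenters use statistical or neural models rather than a fixed dictionary, but the output format — a list of word tokens, as in the sample sentences below — is the same.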
Getting Started
Installation
pip install qhchina
Basic Usage
import qhchina
# Load fonts for visualization
qhchina.load_fonts()
# Example with sample Chinese sentences
texts = [
["我", "今天", "去", "公园", "散步"], # Walking in the park
["她", "在", "图书馆", "学习", "汉语"], # Studying Chinese at the library
["他们", "周末", "喜欢", "做", "中国", "菜"], # Cooking Chinese food on weekends
["这个", "城市", "的", "交通", "很", "方便"], # City transportation
["我的", "家人", "明天", "要", "去", "北京", "旅游"] # Family travel to Beijing
]
# Example: LDA topic modeling via Gibbs sampling
from qhchina.analytics.topicmodels import LDAGibbsSampler
lda = LDAGibbsSampler(n_topics=10)
lda.fit(texts)  # fit expects pre-tokenized documents (lists of tokens)
API Reference
For detailed information about each module, refer to the documentation pages for the components listed above.