Marigoldwu/PyDGC


PyDGC is a flexible and extensible Python library for deep graph clustering (DGC) that is compatible with frameworks such as PyG and OGB. It makes it easy to integrate new models and datasets, facilitating the rapid development, reproduction, and fair comparison of DGC methods.

  • 2025.05: Released the source code of PyDGC.

Deep graph clustering, which aims to reveal the underlying structure of a graph and partition its nodes into groups, has attracted considerable attention in recent years.

More details can be found in the survey paper, and a comprehensive archive of related papers is also available.

Timeline of representative models.

DGCBench covers 12 datasets with diverse characteristics and 12 state-of-the-art methods spanning all major paradigms. By integrating them into a standardized pipeline, we ensure fair, reproducible, and comprehensive evaluations across multiple dimensions.
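A standardized pipeline of this kind is essentially a template: every method passes through the same fixed stages, is repeated over the same seeds, and has its results aggregated the same way. The following is a minimal, hypothetical sketch of that idea in plain Python. `SketchPipeline`, `load_data`, `train_and_evaluate`, and `RandomLabels` are illustrative names, not PyDGC's API (its real base class, `BasePipeline`, appears further below).

```python
import random
import statistics
from abc import ABC, abstractmethod


class SketchPipeline(ABC):
    """Hypothetical template: every method goes through the same fixed stages."""

    def __init__(self, seeds):
        self.seeds = list(seeds)

    @abstractmethod
    def load_data(self):
        """Hook: return the dataset shared by all runs."""

    @abstractmethod
    def train_and_evaluate(self, data, rng):
        """Hook: train one model using rng for all randomness; return {metric: value}."""

    def run(self):
        data = self.load_data()
        per_run = []
        for seed in self.seeds:
            rng = random.Random(seed)  # one seeded generator per run -> repeatable results
            per_run.append(self.train_and_evaluate(data, rng))
        # aggregate each metric as (mean, population std) over all seeds
        summary = {}
        for name in per_run[0]:
            values = [result[name] for result in per_run]
            summary[name] = (statistics.mean(values), statistics.pstdev(values))
        return summary


class RandomLabels(SketchPipeline):
    """Toy 'method' that guesses labels at random, only to exercise run()."""

    def load_data(self):
        return [0, 0, 1, 1]  # ground-truth labels of four nodes

    def train_and_evaluate(self, data, rng):
        guess = [rng.randint(0, 1) for _ in data]
        # raw label agreement, NOT the permutation-matched ACC used for clustering
        agreement = sum(g == t for g, t in zip(guess, data)) / len(data)
        return {"agreement": agreement}
```

`RandomLabels(seeds=[0, 1, 2]).run()` returns a dict of the form `{"agreement": (mean, std)}`. Running it twice gives identical numbers because each run draws from its own seeded generator, which is the property that makes comparisons across methods repeatable.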

  • Integration of multiple deep graph clustering models (see Supported Models).
  • Support for various graph datasets from PyG and OGB (see Supported Datasets).
  • Model evaluation and visualization capabilities.
  • A standardized pipeline.
  • Install with pip

    Coming soon...

  • Installation for local development

    ```bash
    git clone https://.com/Marigoldwu/PyDGC.git
    cd PyDGC
    pip install -e .
    ```

Take GAE as an example:

```bash
cd PyDGC/example/pipelines/gae
python run.py
```

You can also specify arguments on the command line:

```bash
python run.py --dataset_name CORA -eval_each
```

Other optional arguments:

```
--cfg_file_path YourPath  # path of the corresponding configuration file
--flag FlagContent        # description
--drop_edge float         # probability of dropping edges
--drop_feature float      # probability of dropping features
--add_edge float          # probability of adding edges
--add_noise float         # standard deviation of Gaussian noise
-pretrain                 # only run the pretraining stage of the model
```
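How `run.py` parses these flags is not shown here. As a hypothetical sketch (not PyDGC's actual code), the flags listed above could be declared with the standard `argparse` module as follows. The defaults, and the reading of `-eval_each` as "evaluate after each epoch", are assumptions.

```python
import argparse


def build_parser():
    # Hypothetical parser mirroring the documented flags; all defaults are assumptions.
    parser = argparse.ArgumentParser(description="Run a DGC pipeline (sketch)")
    parser.add_argument("--dataset_name", default=None, help="dataset to load, e.g. CORA")
    parser.add_argument("--cfg_file_path", default=None, help="path of the configuration file")
    parser.add_argument("--flag", default="", help="free-text description of the run (assumed)")
    parser.add_argument("--drop_edge", type=float, default=0.0, help="probability of dropping edges")
    parser.add_argument("--drop_feature", type=float, default=0.0, help="probability of dropping features")
    parser.add_argument("--add_edge", type=float, default=0.0, help="probability of adding edges")
    parser.add_argument("--add_noise", type=float, default=0.0, help="std of Gaussian noise")
    # single-dash boolean switches, matching the style of -eval_each and -pretrain
    parser.add_argument("-eval_each", action="store_true", help="evaluate after every epoch (assumed meaning)")
    parser.add_argument("-pretrain", action="store_true", help="only run the pretraining stage")
    return parser
```

For example, `build_parser().parse_args(["--dataset_name", "CORA", "-eval_each"])` yields `dataset_name == "CORA"` and `eval_each == True`. `argparse` accepts single-dash long options such as `-eval_each` and stores them under the attribute `eval_each`.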
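To add your own model, subclass `DGCModel` and implement the methods below:

```python
from pydgc.models import DGCModel
from pydgc.metrics import DGCMetric  # assumed import path; adjust to where DGCMetric lives


class MyModel(DGCModel):
    def __init__(self, logger, cfg):
        super().__init__(logger, cfg)
        self.model = ...  # your model; keep it on self so the other methods can use it

        # bookkeeping used during training and evaluation
        self.loss_curve = []
        self.nmi_curve = []
        self.best_embedding = None
        self.best_predicted_labels = None
        self.best_results = {'ACC': -1}

    def forward(self, data):
        ...  # forward process
        return ...  # e.g., the node embeddings

    def loss(self, *args, **kwargs):
        # If needed: the training objective
        ...

    def pretrain(self, data, cfg, flag):
        # If needed: a pretraining stage (run on its own with -pretrain)
        ...

    def train_model(self, data, cfg, flag):
        ...  # training loop

    def get_embedding(self, data):
        ...  # return the node embeddings

    def clustering(self, data):
        embedding = self.get_embedding(data)
        labels_ = ...  # predicted cluster label of each node (e.g., from k-means)
        clustering_centers = ...  # corresponding cluster centers
        return embedding, labels_, clustering_centers

    def evaluate(self, data):
        embedding, predicted_labels, clustering_centers = self.clustering(data)
        ground_truth = data.y.numpy()
        metric = DGCMetric(ground_truth, predicted_labels.numpy(), embedding, data.edge_index)
        results = metric.evaluate_one_epoch(self.logger, self.cfg.evaluate)
        return embedding, predicted_labels, results
```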
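The `evaluate` method delegates metric computation to `DGCMetric`, whose internals are not shown here. For reference, the following self-contained sketch computes two metrics that the template tracks (ACC and NMI); it is not PyDGC's implementation. ACC is taken as the best accuracy over one-to-one mappings between cluster ids and class ids, found here by brute force over permutations (practical only for a handful of clusters; real implementations typically use the Hungarian algorithm). NMI uses the arithmetic-mean normalization.

```python
import math
from collections import Counter
from itertools import permutations


def cluster_acc(y_true, y_pred):
    """Best accuracy over all one-to-one mappings from cluster ids to class ids."""
    pairs = Counter(zip(y_pred, y_true))  # (cluster, class) -> count
    clusters = sorted(set(y_pred))
    classes = sorted(set(y_true))
    # pad with dummy targets so every cluster can be mapped even when there are
    # more clusters than classes; a dummy target never counts as a hit
    targets = classes + [object() for _ in range(len(clusters) - len(classes))]
    best = 0
    for assignment in permutations(targets, len(clusters)):
        hits = sum(pairs[(c, t)] for c, t in zip(clusters, assignment))
        best = max(best, hits)
    return best / len(y_true)


def nmi(y_true, y_pred):
    """Normalized mutual information, normalized by the mean of the two entropies."""
    n = len(y_true)
    count_t, count_p = Counter(y_true), Counter(y_pred)
    joint = Counter(zip(y_true, y_pred))
    # mutual information I(T; P) in nats
    mi = sum(c / n * math.log(c * n / (count_t[t] * count_p[p])) for (t, p), c in joint.items())
    # entropies H(T) and H(P)
    h_t = -sum(c / n * math.log(c / n) for c in count_t.values())
    h_p = -sum(c / n * math.log(c / n) for c in count_p.values())
    if h_t == 0.0 and h_p == 0.0:
        return 1.0  # both labelings put every node in one cluster
    return mi / ((h_t + h_p) / 2)
```

Both metrics ignore how clusters are numbered: relabeling `[0, 0, 1, 1, 2, 2]` as `[1, 1, 2, 2, 0, 0]` still gives ACC = 1.0 and NMI = 1.0.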
Then wrap the model in a pipeline:

```python
from pydgc.pipelines import BasePipeline
from pydgc.utils import perturb_data

from my_model import MyModel  # import your own model (module name is illustrative)


class MyPipeline(BasePipeline):
    def __init__(self, args):
        super().__init__(args)

    def augmentation(self):
        # apply the perturbations configured for the dataset
        self.data = perturb_data(self.data, self.cfg.dataset.augmentation)
        # other augmentations if needed

    def build_model(self):
        model = MyModel(self.logger, self.cfg)
        self.logger.model_info(model)
        return model
```
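`augmentation()` hands the actual perturbation to `perturb_data`, whose implementation is not shown here. The following is a hypothetical stand-alone sketch, on plain Python lists, of what edge dropping, feature dropping, and Gaussian feature noise commonly mean. `perturb_data` may differ; for example, it may mask individual feature entries rather than whole columns. Edge addition is left out because its exact semantics are not documented here.

```python
import random


def drop_edges(edges, p, rng):
    """Keep each (u, v) edge independently with probability 1 - p."""
    return [e for e in edges if rng.random() >= p]


def drop_features(x, p, rng):
    """Zero out whole feature columns, each independently with probability p."""
    num_features = len(x[0])
    keep = [rng.random() >= p for _ in range(num_features)]
    return [[v if keep[j] else 0.0 for j, v in enumerate(row)] for row in x]


def add_gaussian_noise(x, std, rng):
    """Add independent N(0, std^2) noise to every feature value."""
    return [[v + rng.gauss(0.0, std) for v in row] for row in x]
```

Passing an explicit `random.Random(seed)` as `rng` keeps each perturbation reproducible, in line with the goal of fair, repeatable comparisons.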
Supported Models

| No. | Model | Paper | Source Code |
|---|---|---|---|
| 1 | GAE | Variational Graph Auto-Encoders | code |
| 2 | GAE_SSC | - | - |
| 3 | DAEGC | Attributed graph clustering: A deep attentional embedding approach | code |
| 4 | SDCN | Structural Deep Clustering Network | code |
| 5 | DFCN | Deep Fusion Clustering Network | code |
| 6 | DCRN | Deep Graph Clustering via Dual Correlation Reduction | code |
| 7 | AGC-DRR | Attributed Graph Clustering with Dual Redundancy Reduction | code |
| 8 | DGCluster | DGCLUSTER: A Neural Framework for Attributed Graph Clustering via Modularity Maximization | code |
| 9 | HSAN | Hard Sample Aware Network for Contrastive Deep Graph Clustering | code |
| 10 | CCGC | Cluster-guided Contrastive Graph Clustering Network | code |
| 11 | MAGI | Revisiting Modularity Maximization for Graph Clustering: A Contrastive Learning Perspective | code |
| 12 | NS4GC | Reliable Node Similarity Matrix Guided Contrastive Graph Clustering | code |
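Two of the listed methods, DGCluster and MAGI, are framed around modularity maximization. As background only (this is not how either method optimizes it), here is a sketch of Newman's modularity Q of a given partition. In general Q = (1/2m) Σ_ij [A_ij − k_i k_j / (2m)] δ(c_i, c_j); for an unweighted, undirected graph this reduces to Q = Σ_c [L_c / m − (d_c / 2m)²], where m is the number of edges, L_c the number of edges inside community c, and d_c the total degree of its nodes.

```python
from collections import defaultdict


def modularity(edges, labels):
    """Newman modularity of a partition of an undirected, unweighted graph.

    edges: list of (u, v) pairs, each undirected edge listed once, no self-loops.
    labels: labels[i] is the community of node i.
    """
    m = len(edges)
    intra = defaultdict(int)       # L_c: edges with both endpoints in community c
    degree_sum = defaultdict(int)  # d_c: sum of node degrees in community c
    for u, v in edges:
        degree_sum[labels[u]] += 1
        degree_sum[labels[v]] += 1
        if labels[u] == labels[v]:
            intra[labels[u]] += 1
    return sum(intra[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)
```

For two triangles joined by a single edge (m = 7), splitting the graph into the two triangles gives Q = 2 × (3/7 − 1/4) = 5/14 ≈ 0.357, while putting every node in one community gives Q = 0.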
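Supported Datasets

| No. | Dataset | #Samples | #Features | #Edges | #Classes | Homophily Ratio |
|---|---|---|---|---|---|---|
| 1 | Wiki | 2,405 | 4,973 | 17,981 | 17 | 0.71 |
| 2 | Cora | 2,708 | 1,433 | 5,429 | 7 | 0.81 |
| 3 | ACM | 3,025 | 1,870 | 13,128 | 3 | 0.82 |
| 4 | Citeseer | 3,327 | 3,703 | 9,104 | 6 | 0.74 |
| 5 | DBLP | 4,057 | 334 | 3,528 | 4 | 0.80 |
| 6 | PubMed | 19,717 | 500 | 88,648 | 3 | 0.80 |
| 7 | Ogbn-arXiv | 169,343 | 128 | 2,315,598 | 40 | 0.65 |
| 8 | USPS(3NN) | 9,298 | 256 | 27,894 | 10 | 0.98 |
| 9 | HHAR(3NN) | 10,299 | 561 | 30,897 | 6 | 0.95 |
| 10 | BlogCatalog | 5,196 | 8,189 | 343,486 | 6 | 0.40 |
| 11 | Flickr | 7,575 | 12,047 | 479,476 | 9 | 0.24 |
| 12 | Roman-empire | 22,662 | 300 | 65,854 | 18 | 0.05 |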
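The table does not state which homophily definition is used; one common choice is edge homophily, the fraction of edges that connect two nodes of the same class. The sketch below computes it, together with a k-nearest-neighbour graph builder of the kind suggested by the (3NN) suffix on USPS and HHAR. Their #Edges values are exactly 3 × #Samples (9,298 × 3 = 27,894; 10,299 × 3 = 30,897), which is consistent with three directed neighbour edges per node. The Euclidean distance, the lack of symmetrization, and the function names are assumptions, not PyDGC's documented construction.

```python
import math


def knn_edges(points, k=3):
    """Directed k-nearest-neighbour edges (i, j) under Euclidean distance."""
    edges = []
    for i, p in enumerate(points):
        # (distance, index) to every other point, nearest first; ties broken by index
        others = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        edges.extend((i, j) for _, j in others[:k])
    return edges


def edge_homophily(edges, labels):
    """Fraction of edges whose two endpoints share the same class label."""
    if not edges:
        raise ValueError("graph has no edges")
    same = sum(1 for u, v in edges if labels[u] == labels[v])
    return same / len(edges)
```

On a 4-cycle labeled `[0, 0, 1, 1]` with edges `(0, 1), (1, 2), (2, 3), (3, 0)`, two of the four edges join same-class nodes, so `edge_homophily` returns 0.5. A kNN graph over well-separated classes tends toward a ratio near 1, which fits the high values reported for USPS(3NN) and HHAR(3NN).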

More datasets will be added.

ADGC: Awesome-Deep-Graph-Clustering

Older version of this repository: A-Unified-Framework-for-Attribute-Graph-Clustering