    A Production-Style NetworKit 11.2.1 Coding Tutorial for Large-Scale Graph Analytics, Communities, Cores, and Sparsification

    By Naveed Ahmad · March 7, 2026


    In this tutorial, we implement a production-grade, large-scale graph analytics pipeline in NetworKit, focusing on speed, memory efficiency, and version-safe APIs in NetworKit 11.2.1. We generate a large scale-free network, extract its largest connected component, and compute structural backbone signals via k-core decomposition and centrality ranking. We detect communities with PLM and quantify quality using modularity, estimate distance structure using effective and estimated diameters, and finally sparsify the graph to reduce cost while preserving key properties. We export the sparsified graph as an edge list so we can reuse it in downstream workflows, benchmarking, or graph ML preprocessing.

    !pip -q install networkit pandas numpy psutil
    
    
    import gc, time, os
    import numpy as np
    import pandas as pd
    import psutil
    import networkit as nk
    
    
    print("NetworKit:", nk.__version__)
    nk.setNumberOfThreads(min(2, nk.getMaxNumberOfThreads()))
    nk.setSeed(7, False)
    
    
    def ram_gb():
       p = psutil.Process(os.getpid())
       return p.memory_info().rss / (1024**3)
    
    
    def tic():
       return time.perf_counter()
    
    
    def toc(t0, msg):
       print(f"{msg}: {time.perf_counter()-t0:.3f}s | RAM~{ram_gb():.2f} GB")
    
    
    def report(G, name):
       print(f"\n[{name}] nodes={G.numberOfNodes():,} edges={G.numberOfEdges():,} directed={G.isDirected()} weighted={G.isWeighted()}")
    
    
    def force_cleanup():
       gc.collect()
    
    
    PRESET = "LARGE"
    
    
    if PRESET == "LARGE":
       N = 120_000
       M_ATTACH = 6
       AB_EPS = 0.12
       ED_RATIO = 0.9
    elif PRESET == "XL":
       N = 250_000
       M_ATTACH = 6
       AB_EPS = 0.15
       ED_RATIO = 0.9
    else:
       N = 80_000
       M_ATTACH = 6
       AB_EPS = 0.10
       ED_RATIO = 0.9
    
    
    print(f"\nPreset={PRESET} | N={N:,} | m={M_ATTACH} | approx-betweenness epsilon={AB_EPS}")

    We set up the Colab environment with NetworKit and monitoring utilities, and we lock in a stable random seed. We configure thread usage to match the runtime and define timing and RAM-tracking helpers for each major stage. We choose a scale preset that controls graph size and approximation knobs so the pipeline stays large but manageable.

    t0 = tic()
    G = nk.generators.BarabasiAlbertGenerator(M_ATTACH, N).generate()
    toc(t0, "Generated BA graph")
    report(G, "G")
    
    
    t0 = tic()
    cc = nk.components.ConnectedComponents(G)
    cc.run()
    toc(t0, "ConnectedComponents")
    print("components:", cc.numberOfComponents())
    
    
    if cc.numberOfComponents() > 1:
       t0 = tic()
       G = nk.components.ConnectedComponents.extractLargestConnectedComponent(G, compactGraph=True)
       toc(t0, "Extracted LCC (compactGraph=True)")
       report(G, "LCC")
    
    
    force_cleanup()

    We generate a large Barabási–Albert graph and immediately log its size and runtime footprint. We compute connected components to understand fragmentation and quickly diagnose topology. We extract the largest connected component and compact it to improve the rest of the pipeline’s performance and reliability.
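    Compacting matters because the downstream arrays (core numbers, centrality scores) are indexed by node ID, so IDs must stay contiguous after extraction. A minimal pure-Python sketch of the relabeling idea (the helper `compact_relabel` is ours for illustration, not NetworKit's implementation):

```python
def compact_relabel(kept_nodes, edges):
    """Relabel a kept node set to contiguous IDs 0..n-1 and remap edges.

    kept_nodes: iterable of original node IDs to keep
    edges: iterable of (u, v) pairs over original IDs
    Returns (old->new mapping, remapped edges restricted to kept nodes).
    """
    old_to_new = {old: new for new, old in enumerate(sorted(kept_nodes))}
    remapped = [(old_to_new[u], old_to_new[v])
                for u, v in edges if u in old_to_new and v in old_to_new]
    return old_to_new, remapped
```

    With contiguous IDs, `core_vals[u]`-style array indexing stays valid for every node of the extracted component.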

    t0 = tic()
    core = nk.centrality.CoreDecomposition(G)
    core.run()
    toc(t0, "CoreDecomposition")
    core_vals = np.array(core.scores(), dtype=np.int32)
    print("degeneracy (max core):", int(core_vals.max()))
    print("core stats:", pd.Series(core_vals).describe(percentiles=[0.5, 0.9, 0.99]).to_dict())
    
    
    k_thr = int(np.percentile(core_vals, 97))
    
    
    t0 = tic()
    nodes_backbone = [u for u in range(G.numberOfNodes()) if core_vals[u] >= k_thr]
    G_backbone = nk.graphtools.subgraphFromNodes(G, nodes_backbone)
    toc(t0, f"Backbone subgraph (k>={k_thr})")
    report(G_backbone, "Backbone")
    
    
    force_cleanup()
    
    
    t0 = tic()
    pr = nk.centrality.PageRank(G, damp=0.85, tol=1e-8)
    pr.run()
    toc(t0, "PageRank")
    
    
    pr_scores = np.array(pr.scores(), dtype=np.float64)
    top_pr = np.argsort(-pr_scores)[:15]
    print("Top PageRank nodes:", top_pr.tolist())
    print("Top PageRank scores:", pr_scores[top_pr].tolist())
    
    
    t0 = tic()
    abw = nk.centrality.ApproxBetweenness(G, epsilon=AB_EPS)
    abw.run()
    toc(t0, "ApproxBetweenness")
    
    
    abw_scores = np.array(abw.scores(), dtype=np.float64)
    top_abw = np.argsort(-abw_scores)[:15]
    print("Top ApproxBetweenness nodes:", top_abw.tolist())
    print("Top ApproxBetweenness scores:", abw_scores[top_abw].tolist())
    
    
    force_cleanup()

    We compute the core decomposition to measure degeneracy and identify the network’s high-density backbone. We extract a backbone subgraph using a high core-percentile threshold to focus on structurally important nodes. We run PageRank and approximate betweenness to rank nodes by influence and bridge-like behavior at scale.
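    A simple diagnostic for how much the two rankings agree is the Jaccard overlap of their top-k node sets. A minimal NumPy sketch (the helper name `topk_jaccard` is ours, not a NetworKit API; it assumes two dense score arrays like `pr_scores` and `abw_scores` above):

```python
import numpy as np

def topk_jaccard(scores_a, scores_b, k=15):
    """Jaccard overlap between the top-k node sets of two score arrays."""
    top_a = set(np.argsort(-np.asarray(scores_a))[:k].tolist())
    top_b = set(np.argsort(-np.asarray(scores_b))[:k].tolist())
    return len(top_a & top_b) / len(top_a | top_b)
```

    Values near 1.0 mean PageRank and approximate betweenness largely agree on the most important nodes; values near 0.0 indicate the two centralities capture different structural roles.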

    t0 = tic()
    plm = nk.community.PLM(G, refine=True, gamma=1.0, par="balanced")
    plm.run()
    toc(t0, "PLM community detection")
    
    
    part = plm.getPartition()
    num_comms = part.numberOfSubsets()
    print("communities:", num_comms)
    
    
    t0 = tic()
    Q = nk.community.Modularity().getQuality(part, G)
    toc(t0, "Modularity")
    print("modularity Q:", Q)
    
    
    sizes = np.array(list(part.subsetSizeMap().values()), dtype=np.int64)
    print("community size stats:", pd.Series(sizes).describe(percentiles=[0.5, 0.9, 0.99]).to_dict())
    
    
    t0 = tic()
    eff = nk.distance.EffectiveDiameter(G, ED_RATIO)
    eff.run()
    toc(t0, f"EffectiveDiameter (ratio={ED_RATIO})")
    print("effective diameter:", eff.getEffectiveDiameter())
    
    
    t0 = tic()
    diam = nk.distance.Diameter(G, algo=nk.distance.DiameterAlgo.EstimatedRange, error=0.1)
    diam.run()
    toc(t0, "Diameter (EstimatedRange)")
    print("estimated diameter range (lower, upper):", diam.getDiameter())
    
    
    force_cleanup()

    We detect communities using PLM and record the number of communities found on the large graph. We compute modularity and summarize community-size statistics to validate the structure rather than simply trusting the partition. We estimate global distance behavior using the effective diameter and an estimated diameter range, staying within NetworKit 11.2.1's documented distance APIs.
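    To make the modularity score Q concrete: it sums, over communities, the intra-community edge fraction minus the squared degree fraction expected under a degree-preserving null model. A self-contained toy check, independent of NetworKit (the `modularity` helper is ours for illustration):

```python
def modularity(edges, community):
    """Modularity Q = sum over communities of
    (intra-edge fraction) - (degree fraction / 2)^2, for an unweighted graph."""
    m = len(edges)
    intra = {}      # community -> number of intra-community edges
    deg_sum = {}    # community -> total degree of its nodes
    for u, v in edges:
        for node in (u, v):
            c = community[node]
            deg_sum[c] = deg_sum.get(c, 0) + 1
        if community[u] == community[v]:
            c = community[u]
            intra[c] = intra.get(c, 0) + 1
    return sum(intra.get(c, 0) / m - (deg_sum.get(c, 0) / (2 * m)) ** 2
               for c in set(community.values()))
```

    For two triangles bridged by a single edge, splitting along the triangles gives Q = 2 * (3/7 - (7/14)^2) = 5/14 ≈ 0.357, which is why the dense communities PLM finds on the large graph push Q well above zero.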

    t0 = tic()
    G.indexEdges()
    sp = nk.sparsification.LocalSimilaritySparsifier()
    G_sparse = sp.getSparsifiedGraphOfSize(G, 0.7)
    toc(t0, "LocalSimilarity sparsification (target edge ratio=0.7)")
    report(G_sparse, "Sparse")
    
    
    t0 = tic()
    pr2 = nk.centrality.PageRank(G_sparse, damp=0.85, tol=1e-8)
    pr2.run()
    toc(t0, "PageRank on sparse")
    pr2_scores = np.array(pr2.scores(), dtype=np.float64)
    print("Top PR nodes (sparse):", np.argsort(-pr2_scores)[:15].tolist())
    
    
    t0 = tic()
    plm2 = nk.community.PLM(G_sparse, refine=True, gamma=1.0, par="balanced")
    plm2.run()
    toc(t0, "PLM on sparse")
    part2 = plm2.getPartition()
    Q2 = nk.community.Modularity().getQuality(part2, G_sparse)
    print("communities (sparse):", part2.numberOfSubsets(), "| modularity (sparse):", Q2)
    
    
    t0 = tic()
    eff2 = nk.distance.EffectiveDiameter(G_sparse, ED_RATIO)
    eff2.run()
    toc(t0, "EffectiveDiameter on sparse")
    print("effective diameter (orig):", eff.getEffectiveDiameter(), "| (sparse):", eff2.getEffectiveDiameter())
    
    
    force_cleanup()
    
    
    out_path = "/content/networkit_large_sparse.edgelist"
    t0 = tic()
    nk.graphio.EdgeListWriter("\t", 0).write(G_sparse, out_path)
    toc(t0, "Wrote edge list")
    print("Saved:", out_path)
    
    
    print("\nAdvanced large-graph pipeline complete.")

    We sparsify the graph using local similarity to reduce the number of edges while retaining useful structure for downstream analytics. We rerun PageRank, PLM, and effective diameter on the sparsified graph to check whether key signals remain consistent. We export the sparsified graph as an edgelist so we can reuse it across sessions, tools, or additional experiments.
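    A stronger consistency check than eyeballing top-k lists is a rank correlation between the original and sparsified PageRank score vectors. A minimal NumPy Spearman sketch (ignores ties; `spearman` is our helper, not a library call):

```python
import numpy as np

def spearman(x, y):
    """Spearman rank correlation without tie handling: Pearson on ranks."""
    rx = np.argsort(np.argsort(x)).astype(float)
    ry = np.argsort(np.argsort(y)).astype(float)
    rx -= rx.mean()
    ry -= ry.mean()
    return float((rx @ ry) / np.sqrt((rx @ rx) * (ry @ ry)))
```

    Applied to `pr_scores` and `pr2_scores`, a value close to 1.0 suggests the sparsifier preserved the influence ordering, not just the handful of top nodes.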

    In conclusion, we developed an end-to-end, scalable NetworKit workflow that mirrors real large-network analysis: we started from generation, stabilized the topology with LCC extraction, characterized the structure through cores and centralities, discovered communities and validated them with modularity, and captured global distance behavior through diameter estimates. We then applied sparsification to shrink the graph while keeping it analytically meaningful and saving it for repeatable pipelines. The tutorial provides a practical template we can reuse for real datasets by replacing the generator with an edgelist reader, while keeping the same analysis stages, performance tracking, and export steps.
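    Swapping the generator for real data amounts to replacing one line: NetworKit's `nk.readGraph(path, nk.Format.EdgeListTabZero)` reads the tab-separated, zero-indexed format this pipeline writes. A pure-Python sketch of that file format, with no NetworKit dependency (helper names are ours):

```python
def write_edgelist(edges, path):
    """Write (u, v) pairs as tab-separated, zero-indexed lines
    (the EdgeListTabZero-style layout used for the export above)."""
    with open(path, "w") as f:
        for u, v in edges:
            f.write(f"{u}\t{v}\n")

def read_edgelist(path):
    """Read the same format back into a list of (u, v) int pairs."""
    with open(path) as f:
        return [tuple(int(t) for t in line.split("\t"))
                for line in f if line.strip()]
```

    Keeping the interchange format this simple is what lets the exported graph move between NetworKit sessions, benchmarking scripts, and graph ML preprocessing without a custom parser.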


