A Coding Implementation to Compress and Benchmark Instruction-Tuned LLMs with FP8, GPTQ, and SmoothQuant Quantization using llmcompressor

import subprocess, sys
def pip(*pkgs):
    subprocess.check_call([sys.executable, "-m", "pip", "install", "-q", *pkgs])
pip("llmcompressor", "compressed-tensors", "transformers>=4.45", "accelerate", "datasets")
import os, gc, time, json, math
from pathlib import Path
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from datasets import load_dataset
assert torch.cuda.is_available(), \
    "Enable a GPU: Runtime > Change runtime type > T4 GPU"
print("GPU:", torch.cuda.get_device_name(0), "| CUDA:",…

Read More

For Eclipse, the $2.5B Cerebras win is simply the beginning of realizing its physical-world thesis

When Lior Susan founded Eclipse Ventures in 2015, the firm’s thesis of digitizing the physical world wasn’t particularly popular in Silicon Valley. “It was the era of enterprise software and SaaS, and it felt pretty lonely the first couple of years,” Susan said on stage at a recent StrictlyVC event in San Francisco. Greater…

Read More

Vercel Labs Introduces Zero, a Systems Programming Language Designed So AI Agents Can Read, Repair, and Ship Native Programs

Zero: The Programming Language for Agents. An experimental systems language that gives AI agents structured diagnostics, typed repair metadata, and machine-readable docs, alongside sub-10 KiB native binaries. Systems Language · Agent-Native · v0.1.1 · Apache-2.0 · Experimental. Why Zero Exists: The Agent Repair Loop Problem. Most programming languages produce…

Read More

A Coding Guide Implementing SHAP Explainability Workflows with Explainer Comparisons, Maskers, Interactions, Drift, and Black-Box Models

print("\n" + "="*72)
print("PART 3: Interaction decomposition")
print("="*72)
inter = tree_expl.shap_interaction_values(X_te.iloc[:500])
inter_abs = np.abs(inter).mean(0)
diag = np.diagonal(inter_abs).copy()
off = inter_abs.copy(); np.fill_diagonal(off, 0)
main_share = diag.sum() / (diag.sum() + off.sum())
print(f"Total attribution mass: {main_share*100:.1f}% main effects, "
      f"{(1-main_share)*100:.1f}% interactions")
pairs = [(X.columns[i], X.columns[j], off[i, j])
         for i in range(X.shape[1]) for j in range(i+1, X.shape[1])]
pairs.sort(key=lambda t:…

Read More

OpenAI co-founder Greg Brockman takes charge of product strategy

OpenAI co-founder and president Greg Brockman is officially taking the reins of the company’s product strategy, according to Wired. This appears to solidify an already-existing change, with Brockman overseeing OpenAI’s products on an interim basis while the company’s CEO of AGI deployment Fidji Simo is out on medical leave. Wired also reports that in a…

Read More

Nous Research Proposes Lighthouse Attention: A Training-Only Selection-Based Hierarchical Attention That Delivers 1.4–1.7× Pretraining Speedup at Long Context

Training large language models on long sequences has a well-known problem: attention is expensive. The scaled dot-product attention (SDPA) at the core of every transformer scales quadratically Θ(N²) in both compute and memory with sequence length N. FlashAttention addressed this through IO-aware tiling that avoids materializing the full N×N attention matrix in high-bandwidth memory, reducing…
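To make the quadratic cost concrete, here is a minimal sketch of naive scaled dot-product attention in plain Python. It is not Lighthouse Attention and not FlashAttention; it only shows the baseline in which the full N×N score matrix is materialized. The function names `sdpa` and `score_entries` are hypothetical, chosen for illustration.

```python
import math

def sdpa(q, k, v):
    """Naive scaled dot-product attention over lists of row vectors.

    q, k: N rows of dimension d; v: N rows of any dimension.
    Returns (output rows, score matrix) so the N x N intermediate is visible.
    """
    d = len(q[0])
    scale = 1.0 / math.sqrt(d)
    # The full N x N score matrix is built here: this is the quadratic term.
    scores = [
        [scale * sum(a * b for a, b in zip(q_row, k_row)) for k_row in k]
        for q_row in q
    ]
    out = []
    for row in scores:
        m = max(row)  # subtract the row max for a numerically stable softmax
        exps = [math.exp(s - m) for s in row]
        z = sum(exps)
        weights = [e / z for e in exps]
        out.append([
            sum(w * v_row[c] for w, v_row in zip(weights, v))
            for c in range(len(v[0]))
        ])
    return out, scores

def score_entries(n):
    """Number of entries in the N x N score matrix for sequence length n."""
    return n * n
```

Doubling the sequence length quadruples the number of score entries under this baseline, which is the cost that tiling and selection-based approaches aim to avoid paying in full.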

Read More

The haves and have nots of the AI gold rush

The vibes around the current AI boom aren’t great, even in the tech industry, according to a lengthy social media post from Menlo Ventures partner Deedy Das.  Das described San Francisco as “pretty frenetic right now,” as “the divide in outcomes is the worst I’ve ever seen.” Using a “back of the envelope AI calculation,”…

Read More