Why Gradient Descent Zigzags and How Momentum Fixes It

PLOT_STEPS = 55
x_ = np.linspace(-5, 5, 500)
y_ = np.linspace(-2.2, 2.2, 500)
X, Y = np.meshgrid(x_, y_)
Z = loss(X, Y)
fig = plt.figure(figsize=(16, 10), facecolor="#FAFAF8")
gs = GridSpec(2, 3, figure=fig, hspace=0.45, wspace=0.38,
              left=0.07, right=0.97, top=0.88, bottom=0.08)
COLORS = {
    "gd": "#E05C4B",
    "mom_good": "#3A7CA5",
    "mom_large": "#F4A536",
    "contour": "#D4C9B8",
    "minima": "#2A9D5C",
    "start": "#444444",
}
PANEL_TITLES…
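
To make the headline claim concrete, here is a minimal sketch of plain gradient descent next to heavy-ball momentum. The quadratic loss, the start point, the learning rate, and the momentum coefficient are all assumptions chosen for illustration; the post's own `loss` function is not shown in the excerpt and may differ.

```python
import numpy as np

def grad(p):
    # Gradient of an assumed ill-conditioned loss f(x, y) = 0.5 * (x**2 + 25 * y**2).
    # This stands in for the post's loss(), which is not visible in the excerpt.
    return np.array([p[0], 25.0 * p[1]])

def run(lr, beta, steps=55, start=(-4.0, 2.0)):
    """Heavy-ball update: v <- beta * v - lr * grad, p <- p + v.
    Setting beta = 0 recovers plain gradient descent."""
    p = np.array(start, dtype=float)
    v = np.zeros_like(p)
    path = [p.copy()]
    for _ in range(steps):
        v = beta * v - lr * grad(p)
        p = p + v
        path.append(p.copy())
    return np.array(path)

gd_path = run(lr=0.07, beta=0.0)   # plain gradient descent
mom_path = run(lr=0.07, beta=0.8)  # gradient descent with momentum

# Distance from the minimum at the origin after the same number of steps.
print("GD final distance:      ", np.linalg.norm(gd_path[-1]))
print("Momentum final distance:", np.linalg.norm(mom_path[-1]))
```

With these assumed settings, each plain gradient descent step multiplies y by 1 - 0.07 * 25 = -0.75, so the iterate flips sides of the valley on every step while x shrinks by only 7% per step. The momentum run accumulates velocity along the shallow x direction and ends closer to the minimum after the same 55 steps.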

Katie Haun raises $1B for new venture funds

Former Andreessen Horowitz investor Katie Haun announced on Monday in a press release that her firm has raised $1 billion across new funds to continue its thesis of backing crypto and blockchain. The capital will be spread across startups at early and later stages, Bloomberg reported, and, within the crypto and blockchain space, it will be…

A Coding Guide to Survey Bias Correction Using Facebook Research Balance with IPW, CBPS, Raking, and Post-Stratification Methods

fig, axes = plt.subplots(2, 2, figsize=(14, 10))
colors_a = ["gray", "#1f77b4", "#ff7f0e", "#2ca02c", "#d62728"][: len(asmd_means)]
axes[0, 0].bar(list(asmd_means.keys()), list(asmd_means.values()), color=colors_a)
axes[0, 0].axhline(0.1, ls="--", color="red", label="0.10 imbalance threshold")
axes[0, 0].set_title("Mean ASMD across covariates")
axes[0, 0].set_ylabel("Mean ASMD")
axes[0, 0].legend()
axes[0, 0].tick_params(axis="x", rotation=20)
truth = target_df["happiness"].mean()
colors_b = ["#888"] + ["#1f77b4", "#ff7f0e", "#2ca02c", "#d62728"][: len(methods)] + ["black"]
axes[0, 1].bar(list(outcome_means.keys()),…
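
The plotted quantity, ASMD (absolute standardized mean difference), can be sketched in a few lines of NumPy. This assumes the definition |weighted sample mean - target mean| / target standard deviation; the `asmd` helper and the toy numbers below are hypothetical, and the balance library's own computation may differ in detail.

```python
import numpy as np

def asmd(sample, target, weights=None):
    """Absolute standardized mean difference for one covariate.
    Assumed definition: |weighted sample mean - target mean| / target std.
    Hypothetical helper for illustration, not the balance library's API."""
    sample = np.asarray(sample, dtype=float)
    target = np.asarray(target, dtype=float)
    w = np.ones_like(sample) if weights is None else np.asarray(weights, dtype=float)
    sample_mean = np.average(sample, weights=w)
    return abs(sample_mean - target.mean()) / target.std(ddof=1)

# Toy covariate: the sample under-represents high values relative to the target.
target_vals = [1, 2, 3, 4, 5]
sample_vals = [1, 2, 3]

unweighted = asmd(sample_vals, target_vals)
weighted = asmd(sample_vals, target_vals, weights=[1, 1, 4])  # up-weight the largest value

print("unweighted ASMD:", round(unweighted, 4))
print("weighted ASMD:  ", round(weighted, 4))
```

In this toy case the weights pull the sample mean toward the target mean, so the ASMD drops, though it stays above the 0.10 threshold drawn in the excerpt's plot.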

US government warns of severe CopyFail bug affecting major versions of Linux

A severe security vulnerability affecting virtually every version of the Linux operating system has caught defenders off guard and scrambling to patch after security researchers publicly released exploit code that allows attackers to take full control of vulnerable systems. The U.S. government says the bug, dubbed “CopyFail,” is now being exploited in the wild, which means it’s…

Zyphra Introduces Tensor and Sequence Parallelism (TSP): A Hardware-Aware Training and Inference Strategy That Delivers 2.6x Throughput Over Matched TP+SP Baselines

Training and serving large transformer models at scale is fundamentally a memory management problem. Every GPU in a cluster has a fixed amount of VRAM, and as model sizes and context lengths grow, engineers constantly have to make trade-offs about how to distribute work across hardware. A new technique from Zyphra, called Tensor and Sequence…

How to Build an End-to-End Production Grade Machine Learning Pipeline with ZenML, Including Custom Materializers, Metadata Tracking, and Hyperparameter Optimization

@step(enable_cache=True)
def load_data() -> Annotated[DatasetBundle, "raw_dataset"]:
    data = load_breast_cancer()
    return DatasetBundle(
        data.data,
        data.target,
        data.feature_names,
        stats={"source": "sklearn.datasets.load_breast_cancer"},
    )


@step
def split_and_scale(
    bundle: DatasetBundle,
    test_size: float = 0.2,
    random_state: int = 42,
) -> Tuple[
    Annotated[np.ndarray, "X_train"],
    Annotated[np.ndarray, "X_test"],
    Annotated[np.ndarray, "y_train"],
    Annotated[np.ndarray, "y_test"],
]:
    X_tr, X_te, y_tr, y_te = train_test_split(
        bundle.X,
        bundle.y,
        test_size=test_size,
        random_state=random_state,
        stratify=bundle.y,
    )
    scaler…
