Release notes for 2.39.27 – Graphistry

Overview:

This release has several major themes: Organizations, low-code, no-code, PyGraphistry[AI], Kubernetes, dashboarding, and SSO.

Highlights

Features
- API: PyGraphistry[ai]: featurize(), umap(), build_gnn()
- API: PyGraphistry: compute_igraph(), layout_igraph(), compute_cugraph()
- API: PyGraphistry: Subgraph traversal pattern matching hop(), chain()
- API: Horizontal & radial axis
- API: NodeJS & browser uploading
- Organizations: Create, invite, manage, & share
- Neighborhood highlights
- Streamlit dashboarding
- PowerBI beta
Infrastructure
- RAPIDS 2022.04
- Azure ACR mirroring
- Kubernetes/Helm (experimental)
- CUDA 11.4 and graph AI support
- Version upgrades, including Python 3.8
Admin
- Control caching
Security
- SSO
- Scanning
- Pen testing
Fixes & tweaks
Docs
Migration

Graphistry Versions

Server	2.39.27
JS React+Vanilla	4.1.8 (was 4.0.0)
Python PyGraphistry client	0.27.3 (was 0.20.5)

Third-Party Versions

Arrow	5.0.0
Caddy	2.5.2 (was 2.4.6)
CUDA (In-Docker)	11.0, 11.4 (new)
Dask	2022.3.0 (was 2021.9.1)
Django	3.1
Docker (CE)	20.10.9 (was 19.03.2)
Docker Compose	1.29.3 (was 1.29.2)
fsspec	2022.1.0 (was 2021.10.0)
Gremlinpython	3.4.8
Jupyter	1.0.0
Neo4j-python-driver	4.4.5 (was 4.1.0)
NodeJS	14.19.3 (was 14.6.1)
Pandas	1.3.5 (was 1.3.3)
Postgres	14.4 (was 12.5)
Python	3.8.13 (was 3.7.10)
Neo4j node driver	4.1.1
networkX	2.6.3
Notebook	6.4.12 (was 6.4.4)
Nvidia driver (cloud)	470 (was 450)
RAPIDS	22.04 (was 21.10.01)
Redis	6.2.6 (was 6.0.5)
Splunk node SDK	1.9.0
Tornado	6.1
torch	1.12.0
torch-geometric	2.0.4
PyGOD	0.2.0

Features

API: PyGraphistry[ai] mode:
- g.featurize(): Automatically encode the nodes/edges tables, including date, text, etc. columns, for ML/AI algorithms
  - Unsupervised and supervised modes via dirty_cat
- g.umap(): Generalization of umap_learn() methods to work with node/edge tables
- g.build_dgl_graph()
- Compatible versions of graph AI dependencies, including DGL, PyTorch, PyTorch Geometric, PyGOD, & UMAP
API: PyGraphistry compute & layout: igraph & cugraph bindings via g.compute_igraph(), g.layout_igraph(), and g.compute_cugraph()
API: PyGraphistry traversal: Multi-hop subgraph matching method `g.hop(nodes: pd.DataFrame, hops: Optional[int], to_fixed_point: bool, direction: Union['forward', 'reverse', 'undirected']) -> Plotter` and sequences via `g.chain()`, similar to Neo4j's Cypher pattern matching
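To make the traversal parameters concrete, here is a minimal plain-Python sketch of multi-hop expansion over an edge list. It is not PyGraphistry's implementation: the name `hop_sketch` is hypothetical, it returns plain sets rather than a Plotter, and including the seed nodes in the result is an assumption made for illustration.

```python
from typing import Iterable, Optional, Set, Tuple

Edge = Tuple[str, str]  # (source, destination)


def hop_sketch(
    edges: Iterable[Edge],
    seeds: Iterable[str],
    hops: Optional[int] = 1,
    to_fixed_point: bool = False,
    direction: str = "forward",
) -> Tuple[Set[str], Set[Edge]]:
    """Illustrative only: expand from `seeds` and return (nodes, edges) reached."""
    if direction not in ("forward", "reverse", "undirected"):
        raise ValueError(f"unknown direction: {direction!r}")
    if not to_fixed_point and hops is None:
        raise ValueError("pass hops, or set to_fixed_point=True")

    edge_list = list(edges)
    reached_nodes: Set[str] = set(seeds)  # assumption: seeds count as reached
    reached_edges: Set[Edge] = set()
    frontier: Set[str] = set(reached_nodes)
    steps = 0

    # Expand one hop at a time until the hop budget is spent, or, in
    # fixed-point mode, until an expansion discovers no new nodes.
    while frontier and (to_fixed_point or steps < hops):
        next_frontier: Set[str] = set()
        for src, dst in edge_list:
            if direction in ("forward", "undirected") and src in frontier:
                reached_edges.add((src, dst))
                if dst not in reached_nodes:
                    next_frontier.add(dst)
            if direction in ("reverse", "undirected") and dst in frontier:
                reached_edges.add((src, dst))
                if src not in reached_nodes:
                    next_frontier.add(src)
        reached_nodes |= next_frontier
        frontier = next_frontier
        steps += 1

    return reached_nodes, reached_edges


example_edges = [("a", "b"), ("b", "c"), ("c", "d"), ("x", "a")]
two_hop_nodes, two_hop_edges = hop_sketch(example_edges, {"a"}, hops=2)
```

Because only newly discovered nodes enter the next frontier, fixed-point mode terminates even on graphs with cycles.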
API: Horizontal & radial axis, both via PyGraphistry (.encode_axis(...)) and REST complex encoding
API: NodeJS & JS uploads: A JWT-compatible client has been added that supports uploads with convenient async/await (or promise-based) interfaces
Organizations: Creation, management, invitation flows, & sharing
Neighborhood highlights: Use scene settings or APIs to change the default node/edge hover effect to instantly highlight nodes up to 10 hops away via incoming edges, outgoing edges, or both
PowerBI beta: Contact for access
Streamlit Dashboarding: Public and private views; edit views on the web via notebooks, or from the CLI via the data/ folder (graph-app-kit). Control menu visibility via the admin panel

Infrastructure

RAPIDS 2022.04
- Removes BlazingSQL in favor of dask-sql (w/ experimental cudf GPU support)
Azure ACR mirroring: Automatically mirror Graphistry containers in your Azure via a script or Azure Pipeline. See https://github.com/graphistry/graphistry-helm
Experimental helm charts for Kubernetes:
- See https://github.com/graphistry/graphistry-helm
- Initial support targets EKS, AKS, & minikube
- Contact for support around intended patterns in secure & highly-available operations
CUDA 11.4: An optional build of CUDA 11.4 containers. It can only be used on systems with CUDA 11.4+ installed. The AWS and Azure Marketplace offerings are both switching to CUDA 11.4.
Graph AI support (optional): AI libraries for CUDA 11.4+ hosts are installed (larger image), including for neural networks (PyTorch), graph neural networks (DGL, PyG, pygod), text (Spacy, transformers), and time series (PyCaret). Use with new PyGraphistry methods like featurize(), umap(), and build_gnn().
Version upgrades: Python 3.8 (from 3.7), Redis 6.2.6, NodeJS 14.19.1, docker 20.10.9, docker-compose 1.29.3, Postgres 14.2, and various other upgrades
- Primarily regular maintenance
- Nvidia upgrades enable new AI features
- Docker upgrades enable resource control in compose files

Admin

Control caching (forge-etl-python) - static: Limit most recently used items per worker according to cascade:
- N_CACHE_<ITEM_NAME>
- N_CACHE_{CPU,GPU}_{FULL,SMALL}_OVERRIDE
- N_CACHE_{CPU,GPU}_OVERRIDE
- Item default
See .env files and DEBUG logs of get_cache usages to identify specific items
Most values should be 10+ to accelerate concurrent sessions. Some do not benefit from being > 1.
Generally, CPU counts should be 2-10X higher than GPU counts
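One way to read the cascade above is as a first-match lookup over environment variables. The sketch below is a hedged illustration, not the forge-etl-python code: the function name, the item name `EXAMPLE_ITEM`, and the way an item is classified as CPU/GPU and FULL/SMALL are all assumptions, as is the rule that the first variable set in the listed order wins.

```python
import os
from typing import Mapping, Optional


def resolve_cache_size(
    item_name: str,
    device: str,        # "CPU" or "GPU" (assumed classification of the item)
    size_class: str,    # "FULL" or "SMALL" (assumed classification of the item)
    item_default: int,
    env: Optional[Mapping[str, str]] = None,
) -> int:
    """Return the cache limit for one item, following the documented cascade."""
    env = os.environ if env is None else env
    device = device.upper()
    size_class = size_class.upper()

    # Candidates in the order listed in the notes, most specific first.
    candidates = [
        f"N_CACHE_{item_name.upper()}",
        f"N_CACHE_{device}_{size_class}_OVERRIDE",
        f"N_CACHE_{device}_OVERRIDE",
    ]
    for key in candidates:
        raw = env.get(key)
        if raw is not None and raw.strip() != "":
            return int(raw)

    # Nothing set: fall back to the item's built-in default.
    return item_default


# Hypothetical item name and values, for illustration only.
limit = resolve_cache_size(
    "EXAMPLE_ITEM", "GPU", "SMALL", item_default=10,
    env={"N_CACHE_GPU_OVERRIDE": "4"},
)
```

Passing `env` explicitly keeps the lookup easy to test; leaving it as `None` reads the process environment, which is where values from the .env files would normally end up.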
Control caching (forge-etl-python) - API: To dynamically clear most CPU and GPU caches, POST to:
graphistry.acme.com/api/v1/etl/shaper/clear/?cpu=true&gpu=true (requires admin JWT)
forge-etl-python:8080/shaper/clear/?cpu=true&gpu=true
graphistry.acme.com/api/v1/etl/datasets/clear/?cpu=true&gpu=true (requires admin JWT)
forge-etl-python:8080/datasets/clear/?cpu=true&gpu=true
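As a rough sketch of calling the public endpoints above from Python's standard library, the helper below builds (but does not send) the POST request. The `https://` scheme, the `Authorization: Bearer <token>` header format, and the empty request body are assumptions; `build_clear_cache_request` is a hypothetical helper name, and `graphistry.acme.com` is the placeholder host from these notes.

```python
import urllib.request
from urllib.parse import urlencode


def build_clear_cache_request(
    base_url: str,
    jwt_token: str,
    resource: str = "shaper",   # "shaper" or "datasets", per the URLs above
    cpu: bool = True,
    gpu: bool = True,
) -> urllib.request.Request:
    """Build a POST request for one of the cache-clearing endpoints."""
    if resource not in ("shaper", "datasets"):
        raise ValueError(f"unknown resource: {resource!r}")

    # Query flags are sent as lowercase true/false, matching the URLs above.
    query = urlencode({"cpu": str(cpu).lower(), "gpu": str(gpu).lower()})
    url = f"{base_url.rstrip('/')}/api/v1/etl/{resource}/clear/?{query}"

    return urllib.request.Request(
        url,
        data=b"",  # assumption: no request body is needed
        method="POST",
        headers={"Authorization": f"Bearer {jwt_token}"},  # assumed header format
    )


request = build_clear_cache_request("https://graphistry.acme.com", "ADMIN_JWT_HERE")
# To actually send it: urllib.request.urlopen(request)
```

Building the request separately from sending it makes it easy to inspect the URL and headers before pointing it at a live server.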
Media: Note new uploaded media folder `data/nexus_media` in case you have custom migration scripts
Streamlit: Public & private dashboards in data/; configure visibility via admin settings panel

Security

SSO: Initial SSO OIDC support; contact for guidance
Scanning: Graphistry has advanced to daily whole-container scans and related patching (Grype)
Pen testing: Graphistry has begun a periodic pen testing program

Fixes & tweaks

Smarter file visualization: Better entity column selection
Filters on columns with missing values: Custom filter expressions such as ones with keyword "CONTAINS(..." were crashing sessions when run on columns with missing values. They now work.
Tolerate invalid weight columns: Non-numeric edge weight columns no longer cause crashes
File uploader: Crashes less, gives better errors in logs, and clarifies that at most one node file and one edge file can be used at a time
Point size encoding: Changing point sizes when using default edge colors would break the gradient coloring; it now works as expected
Support partial node positions: When node positions are passed in but not all nodes are specified, the unspecified positions now default to 0 instead of causing a crash
Better tolerate missing nodes: When uploading node and edge tables, and the edge table references nodes missing from the nodes table, Graphistry now guesses default values based on the types of other nodes, such as whether to use 0 vs the empty string. This in turn fixes surprising behavior in histograms and coloring.
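The missing-node behavior described above can be pictured with a small plain-Python sketch. This is an illustration of the idea only, not Graphistry's server code: the function name, the row-as-dict representation, the column names, and the exact type-to-default mapping (0 for numbers, empty string for text, False for booleans) are assumptions.

```python
from typing import Any, Dict, Iterable, List


def fill_missing_nodes(
    nodes: List[Dict[str, Any]],
    edges: Iterable[Dict[str, Any]],
    node_id: str = "id",
    src: str = "src",
    dst: str = "dst",
) -> List[Dict[str, Any]]:
    """Add rows for nodes referenced by edges but absent from the nodes table."""
    known = {row[node_id] for row in nodes}

    # Infer one default per attribute column from the first non-null value seen.
    columns: List[str] = []
    defaults: Dict[str, Any] = {}
    for row in nodes:
        for col, val in row.items():
            if col == node_id:
                continue
            if col not in columns:
                columns.append(col)
            if col in defaults or val is None:
                continue
            if isinstance(val, bool):  # check bool first: bool is a subclass of int
                defaults[col] = False
            elif isinstance(val, (int, float)):
                defaults[col] = 0
            elif isinstance(val, str):
                defaults[col] = ""

    filled = list(nodes)
    for edge in edges:
        for endpoint in (edge[src], edge[dst]):
            if endpoint not in known:
                known.add(endpoint)
                new_row = {node_id: endpoint}
                # Columns with no inferable type stay null.
                new_row.update({col: defaults.get(col) for col in columns})
                filled.append(new_row)
    return filled


example_nodes = [{"id": "a", "score": 1.5, "label": "x"}]
example_edges = [{"src": "a", "dst": "b"}]
completed = fill_missing_nodes(example_nodes, example_edges)
```

Giving the synthesized row a default of the same type as its column is what keeps a numeric column numeric, which is the property that matters for histograms and color encodings.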
Multi-GPU support: Service `forge-etl-python` will now spread across multiple GPUs if available. Placement is by PID and GPUs exposed to it. Service `streamgl-gpu` will still only pick 1 GPU. Service `dask-cuda-worker` should already use multiple GPUs.
Generally slightly better performance and stability: Removing the use of Nginx on most internal hot paths has both increased speed and eliminated spurious request drops

Docs

Migration

Create empty folder `data/nexus_media`
- It is used for user media uploads like logos
- If you use our backup/migration scripts, they will include it going forward, but not the initial creation
See previous notes
