Overview:
This release has several major themes: Organizations, low-code, no-code, PyGraphistry[AI], Kubernetes, dashboarding, and SSO
Highlights
- Features
- API: PyGraphistry[ai]: featurize(), umap(), build_gnn()
- API: PyGraphistry: compute_igraph(), layout_igraph(), compute_cugraph()
- API: PyGraphistry: Subgraph traversal pattern matching hop(), chain()
- API: Horizontal & radial axis
- API: NodeJS & browser uploading
- Organizations: Create, invite, manage, & share
- Neighborhood highlights
- Streamlit dashboarding
- PowerBI beta
- Infrastructure
- RAPIDS 2022.04
- Azure ACR mirroring
- Kubernetes/Helm (experimental)
- CUDA 11.4 and graph AI support
- Version upgrades, including Python 3.8
- Admin
- Control caching
- Security
- SSO
- Scanning
- Pen testing
- Fixes & tweaks
- Docs
- Migration
Graphistry Versions
Server | 2.39.27 |
JS React+Vanilla | 4.1.8 (was 4.0.0) |
Python PyGraphistry client | 0.27.3 (was 0.20.5) |
Third-Party Versions
Arrow | 5.0.0 |
Caddy | 2.5.2 (was 2.4.6) |
CUDA (In-Docker) | 11.0, 11.4 (new) |
Dask | 2022.3.0 (was 2021.9.1) |
Django | 3.1 |
Docker (CE) | 20.10.9 (was 19.03.2) |
Docker Compose | 1.29.3 (was 1.29.2) |
fsspec | 2022.1.0 (was 2021.10.0) |
Gremlinpython | 3.4.8 |
Jupyter | 1.0.0 |
Neo4j-python-driver | 4.4.5 (was 4.1.0) |
NodeJS | 14.19.3 (was 14.6.1) |
Pandas | 1.3.5 (was 1.3.3) |
Postgres | 14.4 (was 12.5) |
Python | 3.8.13 (was 3.7.10) |
Neo4j node driver | 4.1.1 |
networkX | 2.6.3 |
Notebook | 6.4.12 (was 6.4.4) |
Nvidia driver (cloud) | 470 (was 450) |
RAPIDS | 22.04 (was 21.10.01) |
Redis | 6.2.6 (was 6.0.5) |
Splunk node SDK | 1.9.0 |
Tornado | 6.1 |
torch | 1.12.0 |
torch-geometric | 2.0.4 |
pygod | 0.2.0 |
Features
- API: PyGraphistry[ai] mode:
- g.featurize(): Automatically encode the nodes/edges tables, including date, text, etc. columns, for ML/AI algorithms
- Unsupervised and supervised modes via dirty_cat
- g.umap(): Generalization of umap_learn() methods to work with node/edge tables
- g.build_dgl_graph()
- Compatible versions of graph AI dependencies, including DGL, PyTorch, PyTorch Geometric, PyGOD, & UMAP
- API: PyGraphistry compute & layout: igraph & cugraph bindings via g.compute_igraph(), g.layout_igraph(), and g.compute_cugraph()
- API: PyGraphistry traversal: Multi-hop subgraph matching method `g.hop(nodes: pd.DataFrame, hops: Optional[int], to_fixed_point: bool, direction: Union['forward', 'reverse', 'undirected']) -> Plotter` and sequences via `g.chain()`, similar to Neo4j's Cypher pattern matching
- API: Horizontal & radial axis, both via PyGraphistry (.encode_axis(...)) and REST complex encoding
- API: NodeJS & JS uploads: A JWT-compatible client has been added that supports uploads with convenient async/await (or promise-based) interfaces
- Organizations: Creation, management, invitation flows, & sharing
- Neighborhood highlights: Use scene settings or APIs to change the default node/edge hover effect to instantly highlight nodes up to 10 hops away via incoming edges, outgoing edges, or both
- PowerBI beta: Contact for access
- Streamlit Dashboarding: Public and private views; edit views on web via notebooks or CLI via data/ folder (graph-app-kit). Control menu visibility via admin panel
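To make the `hops`, `to_fixed_point`, and `direction` parameters of the traversal bullet above more concrete, here is a minimal plain-Python sketch of multi-hop edge matching. It is not PyGraphistry's implementation and does not call PyGraphistry: the function name `hop_edges` and the tuple-based edge representation are hypothetical, and the exact result semantics of `g.hop()` (for example, which nodes are returned) are assumed rather than confirmed here.

```python
from typing import Iterable, Optional, Set, Tuple

Edge = Tuple[str, str]  # (source, destination); illustrative representation only


def hop_edges(
    edges: Iterable[Edge],
    start: Iterable[str],
    hops: Optional[int] = 1,
    to_fixed_point: bool = False,
    direction: str = "forward",
) -> Set[Edge]:
    """Collect edges reachable from `start` (hypothetical helper, not a library API)."""
    if direction not in ("forward", "reverse", "undirected"):
        raise ValueError(f"unknown direction: {direction}")
    if not to_fixed_point and hops is None:
        raise ValueError("hops is required unless to_fixed_point=True")

    edge_list = list(edges)
    frontier = set(start)   # nodes expanded in the current step
    visited = set(start)    # nodes already expanded or queued
    matched: Set[Edge] = set()
    steps = 0

    # Stop after `hops` steps, or when no new nodes appear (the fixed point).
    while frontier and (to_fixed_point or steps < hops):
        next_frontier: Set[str] = set()
        for src, dst in edge_list:
            if direction in ("forward", "undirected") and src in frontier:
                matched.add((src, dst))
                next_frontier.add(dst)
            if direction in ("reverse", "undirected") and dst in frontier:
                matched.add((src, dst))
                next_frontier.add(src)
        frontier = next_frontier - visited
        visited |= frontier
        steps += 1
    return matched


if __name__ == "__main__":
    chain_edges = [("a", "b"), ("b", "c"), ("c", "d")]
    print(sorted(hop_edges(chain_edges, ["a"], hops=2)))
    print(sorted(hop_edges(chain_edges, ["c"], hops=1, direction="reverse")))
    print(sorted(hop_edges(chain_edges, ["a"], hops=None, to_fixed_point=True)))
```

In PyGraphistry itself, the corresponding call follows the signature quoted above, with the start nodes passed as a `pd.DataFrame` and a `Plotter` returned.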
Infrastructure
- RAPIDS 2022.04
- Remove BlazingSQL for dask-sql (w/ experimental cudf GPU support)
- Azure ACR mirroring: Automatically mirror Graphistry containers in your Azure via a script or Azure Pipeline. See https://github.com/graphistry/graphistry-helm
- Experimental helm charts for Kubernetes:
- See https://github.com/graphistry/graphistry-helm
- Initial support targets EKS, AKS, & minikube
- Contact for support around intended patterns in secure & highly-available operations
- CUDA 11.4: An optional build of CUDA 11.4 containers. Can only be used in systems with CUDA 11.4+ installed. AWS/Azure Marketplace are both switching to CUDA 11.4.
- Graph AI support (optional): AI libraries for CUDA 11.4+ hosts are installed (larger image), including for neural networks (PyTorch), graph neural networks (DGL, PyG, pygod), text (Spacy, transformers), and time series (PyCaret). Use with new PyGraphistry methods like featurize(), umap(), and build_gnn().
- Version upgrades: Python 3.8 (from 3.7), Redis 6.2.6, NodeJS 14.19.1, docker 20.10.9, docker-compose 1.29.3, Postgres 14.2, and various other upgrades
- Primarily regular maintenance
- Nvidia upgrades enable new AI features
- Docker upgrades enable resource control in compose files
Admin
- Control caching (forge-etl-python) - static: Limit most recently used items per worker according to cascade:
- N_CACHE_<ITEM_NAME>
- N_CACHE_{CPU,GPU}_{FULL,SMALL}_OVERRIDE
- N_CACHE_{CPU,GPU}_OVERRIDE
- Item default
See .env files and DEBUG logs of get_cache usages to identify specific items
Most values should be 10+ to accelerate concurrent sessions. Some do not benefit from being > 1.
Generally, CPU counts should be 2-10X higher than GPU counts
- Control caching (forge-etl-python) - API: To dynamically clear most CPU and GPU caches, POST to:
graphistry.acme.com/api/v1/etl/shaper/clear/?cpu=true&gpu=true (requires admin JWT)
forge-etl-python:8080/shaper/clear/?cpu=true&gpu=true
graphistry.acme.com/api/v1/etl/datasets/clear/?cpu=true&gpu=true (requires admin JWT)
forge-etl-python:8080/datasets/clear/?cpu=true&gpu=true
- Media: Note new uploaded media folder `data/nexus_media` in case you have custom migration scripts
- Streamlit: Public & private dashboards in data/; configure visibility via admin settings panel
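The caching cascade and the clear endpoints listed above can be sketched as follows. This is an illustration under stated assumptions, not forge-etl-python code: it assumes the cascade is resolved top to bottom with the first set variable winning, the item name `EXAMPLE_ITEM` and both function names are hypothetical, and the `https://` and `http://` schemes are added for illustration. The sketch only builds the POST target URLs; how the admin JWT is attached to the request is not specified in these notes, so it is left out.

```python
from typing import Mapping
from urllib.parse import urlencode


def resolve_cache_size(
    env: Mapping[str, str],
    item_name: str,
    device: str,       # "CPU" or "GPU"
    size_class: str,   # "FULL" or "SMALL"
    item_default: int,
) -> int:
    """Assumed cascade order: most specific variable first, item default last."""
    candidates = [
        f"N_CACHE_{item_name}",
        f"N_CACHE_{device}_{size_class}_OVERRIDE",
        f"N_CACHE_{device}_OVERRIDE",
    ]
    for key in candidates:
        raw = env.get(key)
        if raw not in (None, ""):
            return int(raw)
    return item_default


def clear_cache_url(base: str, kind: str, cpu: bool = True, gpu: bool = True) -> str:
    """Build the URL to POST to; `kind` is "shaper" or "datasets"."""
    if kind not in ("shaper", "datasets"):
        raise ValueError(f"unknown cache kind: {kind}")
    query = urlencode({"cpu": str(cpu).lower(), "gpu": str(gpu).lower()})
    return f"{base.rstrip('/')}/{kind}/clear/?{query}"


if __name__ == "__main__":
    # Hypothetical .env contents
    env = {"N_CACHE_GPU_SMALL_OVERRIDE": "16", "N_CACHE_GPU_OVERRIDE": "4"}
    print(resolve_cache_size(env, "EXAMPLE_ITEM", "GPU", "SMALL", 10))
    print(clear_cache_url("https://graphistry.acme.com/api/v1/etl", "shaper"))
    print(clear_cache_url("http://forge-etl-python:8080", "datasets", gpu=False))
```

Use the .env files and DEBUG logs mentioned above to find real item names before setting any `N_CACHE_<ITEM_NAME>` value.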
Security
- SSO: Initial SSO OIDC support; contact for guidance
- Scanning: Graphistry has advanced to daily whole-container scans and related patching (Grype)
- Pen testing: Graphistry has begun a periodic pen testing program
Fixes & tweaks
- Smarter file visualization: Better entity column selection
- Filters on columns with missing values: Custom filter expressions such as ones with keyword "CONTAINS(..." were crashing sessions when run on columns with missing values. They now work.
- Tolerate invalid weight columns: Non-numeric edge weight columns no longer cause crashes
- File uploader: Crashes less, gives better errors in logs, and clarifies that at most one node/edge file can be used at a time
- Point size encoding: Changing point sizes when using default edge colors would break the gradient coloring; it now works as expected
- Support partial node positions: When passing in node positions and not all nodes are specified, instead of crashing, default them to 0
- Better tolerate missing nodes: When uploading node and edge tables, and the edge table references nodes missing from the nodes table, Graphistry now guesses default values based on the types of other nodes, such as whether to use 0 vs the empty string. This in turn fixes surprising behavior in histograms and coloring.
- Multi-GPU support: Service `forge-etl-python` will now spread across multiple GPUs if available. Placement is by PID and GPUs exposed to it. Service `streamgl-gpu` will still only pick 1 GPU. Service `dask-cuda-worker` should already use multiple GPUs.
- Generally slightly better performance and stability: Removing the use of Nginx on most internal hot paths has both increased speed and eliminated spurious request drops
Docs
Migration
- Create empty folder `data/nexus_media`
- It is used for user media uploads like logos
- If you use our backup/migration scripts, they will include it going forward, but not the initial creation
- See previous notes
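The folder-creation step above can be scripted. A minimal sketch, assuming `install_root` is the directory that contains your `data/` folder; the function name `ensure_media_dir` is hypothetical, and any ownership or permission requirements are not covered by these notes:

```python
from pathlib import Path


def ensure_media_dir(install_root: str) -> Path:
    """Create data/nexus_media under install_root if it does not already exist."""
    media_dir = Path(install_root) / "data" / "nexus_media"
    # exist_ok=True makes the call safe to repeat on an already-migrated install
    media_dir.mkdir(parents=True, exist_ok=True)
    return media_dir


if __name__ == "__main__":
    # Run from the install directory (assumption: data/ lives in the current directory)
    print(ensure_media_dir("."))
```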