AI data flows in an infinite loop.

This virtuous cycle enables ongoing creation and iteration, refining models as they run.

The infinite AI data loop.

AI both consumes and creates data. AI models improve by learning from trustworthy data: both data the model generates itself and data ingested from new sources. This infinite loop of data production and consumption leads to smarter applications and better outputs.

This fundamentally changes the value of data and how we use it. Storing more data across this infinite loop makes for better AI.

Data is integral to AI at every step.

Along with newly captured data sources, every answer, piece of content, or artifact that AI generates becomes part of the input for the next training round, driving a continuous loop of improving output. In large-scale data centre deployments, the six phases of the AI data loop are enabled by a mix of storage and memory devices.

1. SOURCE DATA

It begins with defining, finding and preparing the data.

The dataset could be anything from a small, structured database to the internet itself. Network hard drives hold the raw data, providing long-term retention and data protection. Network SSDs act as an immediately accessible data tier.
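
As a minimal sketch of this phase, the snippet below assumes raw records sit on a network hard-drive mount and are filtered and staged onto a network SSD tier for fast access; the mount points, file format and filter rule are illustrative placeholders, not a prescribed layout.

```python
# Illustrative sketch: stage prepared source data from the capacity tier to the fast tier.
# The mount points, JSONL format and "non-empty text" rule are hypothetical placeholders.
import json
from pathlib import Path

RAW_TIER = Path("/mnt/hdd_raw")          # network hard drives: raw data, long-term retention
STAGING_TIER = Path("/mnt/ssd_staging")  # network SSDs: immediately accessible data tier

def stage_source_data() -> int:
    STAGING_TIER.mkdir(parents=True, exist_ok=True)
    kept = 0
    for record_file in RAW_TIER.glob("*.jsonl"):
        with record_file.open() as src, (STAGING_TIER / record_file.name).open("w") as dst:
            for line in src:
                record = json.loads(line)
                # Example preparation step: keep only records with usable text.
                if record.get("text", "").strip():
                    dst.write(json.dumps(record) + "\n")
                    kept += 1
    return kept

if __name__ == "__main__":
    print(f"Staged {stage_source_data()} prepared records onto the SSD tier")
```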

2. TRAIN MODELS

Next, the model learns by training on stored data.

Training is a trial-and-error process in which a model converges and is safeguarded with checkpoints, and it requires high-speed data access. This compute-intensive phase uses HBM, DRAM, and local SSDs for learning. Network hard drives and SSDs store checkpoints to protect and refine model training.
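
A minimal sketch of this phase follows, assuming PyTorch as the framework; the model, synthetic batches and checkpoint directory are placeholders. It illustrates the interplay the text describes: the hot loop runs out of memory and local SSDs, while checkpoints are written to network storage so a long run can survive a failure.

```python
# Illustrative sketch (PyTorch assumed): a training loop that periodically checkpoints
# to network storage. Model, data and paths are hypothetical placeholders.
import os
import torch
from torch import nn, optim

CHECKPOINT_DIR = "/mnt/network_storage/checkpoints"  # network HDD/SSD tier
os.makedirs(CHECKPOINT_DIR, exist_ok=True)

model = nn.Linear(128, 10)                         # stand-in for a real model
optimizer = optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

for step in range(1, 10_001):
    # Stand-in batch; in practice it streams through the DRAM/local-SSD data path.
    x = torch.randn(32, 128)
    y = torch.randint(0, 10, (32,))

    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()

    if step % 1000 == 0:
        # Checkpoint so the run can be resumed rather than restarted after a failure.
        torch.save(
            {"step": step, "model": model.state_dict(), "optimizer": optimizer.state_dict()},
            f"{CHECKPOINT_DIR}/ckpt_{step:06d}.pt",
        )
```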

3. CREATE CONTENT

The inference process uses the trained model to create outputs.

Depending on the application, the model may be used for tasks like chat, image analysis or video creation. The primary storage enablers of this iterative creation are HBM, DRAM and local SSDs.
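
A minimal inference sketch, continuing the assumptions above (PyTorch, a placeholder model and a hypothetical checkpoint path): the trained weights are loaded from storage into memory once, then new inputs held in DRAM and local SSDs are pushed through the model to create outputs.

```python
# Illustrative sketch: load a trained checkpoint into memory and serve inference.
# The checkpoint path and model definition are hypothetical placeholders.
import torch
from torch import nn

model = nn.Linear(128, 10)  # must match the architecture that was trained
state = torch.load("/mnt/network_storage/checkpoints/ckpt_010000.pt", map_location="cpu")
model.load_state_dict(state["model"])
model.eval()

def create_content(features: torch.Tensor) -> torch.Tensor:
    """Run the trained model on new inputs to produce outputs."""
    with torch.no_grad():
        return model(features).argmax(dim=-1)

print(create_content(torch.randn(4, 128)))
```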

4. STORE CONTENT

Iteration creates new, validated data that needs to be stored.

This data is saved for continued refinement, quality assurance and compliance. Hard drives store and protect the replicated versions of created content. Network SSDs provide a speed-matching data tier.
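
As a rough sketch of this phase, the snippet below writes each generated artifact to a content tier alongside a small metadata record with a checksum, so replicas can later be validated for quality assurance and compliance; the tier path and metadata fields are illustrative assumptions.

```python
# Illustrative sketch: persist generated content plus metadata for QA and compliance.
# The content-tier path and metadata fields are hypothetical placeholders.
import hashlib
import json
import time
from pathlib import Path

CONTENT_TIER = Path("/mnt/ssd_content")  # network SSDs as the speed-matching tier

def store_content(output_id: str, content: bytes, model_version: str) -> None:
    CONTENT_TIER.mkdir(parents=True, exist_ok=True)
    (CONTENT_TIER / f"{output_id}.bin").write_bytes(content)
    metadata = {
        "id": output_id,
        "model_version": model_version,
        "created_at": time.time(),
        "sha256": hashlib.sha256(content).hexdigest(),  # integrity check for replicas
    }
    (CONTENT_TIER / f"{output_id}.json").write_text(json.dumps(metadata, indent=2))

store_content("sample-0001", b"generated output", model_version="v1.3")
```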

5. PRESERVE DATA

Replicated datasets are retained across regions and environments.

Stored data is the backbone of trustworthy AI, allowing data scientists to verify that models are behaving as expected. Hard drives are the primary home for data that needs longer-term storage and protection. Network SSDs act as a performance gasket between the hard drive and local SSD layers, helping data move around the ecosystem.
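
One simple way to realise this tiering, sketched below under assumed mount points and an arbitrary 30-day threshold, is an age-based policy that migrates content from the SSD tier down to a hard-drive archive tier for long-term retention.

```python
# Illustrative sketch: age-based migration from the SSD tier to a hard-drive archive tier.
# The mount points and the 30-day threshold are hypothetical placeholders.
import shutil
import time
from pathlib import Path

SSD_TIER = Path("/mnt/ssd_content")      # performance tier
ARCHIVE_TIER = Path("/mnt/hdd_archive")  # capacity tier for long-term retention
MAX_AGE_SECONDS = 30 * 24 * 3600         # migrate anything older than ~30 days

def migrate_cold_data() -> None:
    if not SSD_TIER.exists():
        return
    ARCHIVE_TIER.mkdir(parents=True, exist_ok=True)
    now = time.time()
    for item in SSD_TIER.iterdir():
        if item.is_file() and now - item.stat().st_mtime > MAX_AGE_SECONDS:
            shutil.move(str(item), str(ARCHIVE_TIER / item.name))

migrate_cold_data()
```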

6. REUSE DATA

Source, model, and inference data fuel the next effort.

Content outputs feed back into the model, improving its accuracy and enabling new models. Network hard drives and SSDs support geo-dispersed AI data creation. Raw datasets and outcomes become sources for new workflows.
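
To close the loop, a next training run can draw on both the original sources and the preserved outputs. The sketch below builds a simple manifest from both tiers; the paths, manifest format and "origin" labels are assumptions for illustration.

```python
# Illustrative sketch: fold preserved outputs back into the next training dataset.
# Tier paths and the manifest format are hypothetical placeholders.
import json
from pathlib import Path

SOURCE_TIER = Path("/mnt/hdd_raw")       # original source datasets
CONTENT_TIER = Path("/mnt/hdd_archive")  # preserved model outputs
MANIFEST = Path("/mnt/ssd_staging/next_training_manifest.jsonl")

def build_next_manifest() -> int:
    MANIFEST.parent.mkdir(parents=True, exist_ok=True)
    count = 0
    with MANIFEST.open("w") as manifest:
        for tier, origin in ((SOURCE_TIER, "source"), (CONTENT_TIER, "generated")):
            for item in sorted(tier.glob("*")):
                manifest.write(json.dumps({"path": str(item), "origin": origin}) + "\n")
                count += 1
    return count

print(f"The next training run will draw on {build_next_manifest()} files")
```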

AI workloads require a spectrum of storage.

Memory and storage technologies like DRAM, hard drives, and SSDs play critical roles throughout the AI data workflow. Each step requires an optimised mix of these devices to support the performance and scalability requirements of each workload.