Power of Compounding on Top of a Unified Data Platform – Accelerating Innovation in Cloud Security


Prisma Cloud is the leading Cloud Security platform, providing comprehensive code-to-cloud visibility into your risks and incidents and offering key remediation capabilities to manage and monitor your code-to-cloud journey. The platform today secures over 1B+ assets and workloads across code to cloud globally. It secures some of the most demanding environments, with customers who have tens of thousands of cloud accounts that see constant mutations and configuration changes at the scale of trillions every hour.

Throughout this blog we will review Prisma Cloud’s historical approach to building data and AI into our products, the challenges we ran into with our existing data platform, and how, with the Databricks Data Intelligence Platform, Prisma Cloud has achieved a transformative, enterprise-wide impact that directly benefits both our customers and internal teams.

Prisma Cloud’s focus was to provide best-of-breed solutions within each segment/module and then provide value-added security features that help tie signals from different modules together to deliver deeper capabilities as a platform offering. Some examples include:

  • Addressing posture issues related to infrastructure configuration and management. Fixing these issues in code and fostering an automation mindset helps prevent them in production. Combining our Posture Management offering with our Code Security offering was essential to ensure traceability and resolve issues directly in code.
  • Visualizing and managing controls through a platform ‘knowledge graph’ that helps customers understand how resources and workloads are connected. This approach allows them to assess findings and identify the paths that pose the greatest concern for a SOC administrator. Aggregating all signals in one place is critical for this process.

Prisma Cloud is set up with over 10 modules, each best of breed in its security features and producing signals for the platform. Customers can choose to leverage the platform for their vertical needs (e.g. for vulnerability management) or for the whole suite. The platform approach encourages customers to explore adjacent areas, increasing overall value and driving greater stickiness.

Prisma Cloud’s technical challenge is fundamentally a data challenge. With our rapid module growth, driven by both organic innovation and M&As, developing a unified data strategy from scratch was a demanding task. However, the vision was clear: without a solution to consolidate all data in one place, we could not fully deliver the capabilities our customers need while harnessing the power of best-of-breed modules.

As one of the largest adopters of GenAI, Palo Alto Networks has built its AI strategy around three key pillars: leveraging AI to enhance our security offerings, securing AI to help customers protect their AI usage, and optimizing user experience through AI-driven copilots and automation. See Precision AI for more details.

Palo Alto Networks and Prisma Cloud had a strong history of deep AI/ML usage across multiple products and features long before the GenAI wave reshaped the industry. However, the rapid evolution of AI capabilities accelerated the need for a long-term, comprehensive data strategy.

Databricks ecosystem in Prisma Cloud architecture

We chose the Databricks Data Intelligence Platform as the best fit for our strategic direction and requirements, as it encompassed all the critical components needed to support our vision. With Databricks, we have significantly accelerated our data consolidation efforts and scaled innovative use cases, delivering measurable customer benefits within just six months of rollout.

In just the first year of integrating Databricks, Palo Alto Networks achieved a transformative, enterprise-wide impact that directly benefits both our customers and internal teams. By centralizing data workflows on the Databricks Platform, we significantly reduced complexity and accelerated innovation, enabling us to iterate on AI/ML features three times faster than before. Alongside this increased velocity, we realized a 20% reduction in cost of goods sold and a 3x decrease in engineering development time.

Enhanced collaboration, fueled by Databricks Workflows, Databricks Unity Catalog for unified governance, and Databricks Auto Loader, has allowed us to deliver security features at unprecedented speed and scale. This has dramatically accelerated Prisma Cloud’s data processing and enabled us to bring impactful features to market faster than ever before.

The challenges of homegrown solutions

Prisma Cloud runs most of its infrastructure on AWS, with a mature engineering tech stack built around AWS native services. Our team had extensive experience leveraging Apache Spark for ETL and analytical processing, running our infrastructure on AWS Glue and EMR.

Recognizing the need for a dedicated data platform, we initially developed a homegrown solution leveraging EMR, Glue, and S3 as the foundation for our initial version. While this approach worked well with a small team, scaling it to support a broader data strategy and adoption across multiple teams quickly became a challenge. We found ourselves managing thousands of Glue jobs and multiple EMR clusters, all requiring enterprise-grade capabilities such as monitoring, alerting, reliability checks, and governance/security guardrails.

As our needs grew, so did the operational overhead. A significant portion of our engineering effort was diverted to maintaining what had effectively become an “operating system” for our data platform rather than focusing on innovation and value-driven use cases.

While this effort addressed our strategic needs, we soon started running into several challenges in maintaining this version. Some of them are listed below:

  • Bespoke tooling and data transformations – Teams spent considerable time in multiple meetings just to identify data attributes, locate them, and design custom pipelines for each use case, slowing down development and collaboration.
  • Time-consuming infrastructure management – With multiple tuning parameters at the core of our Spark jobs, we struggled to develop a scalable, generic change management process. This added significant cognitive load to the infrastructure teams responsible for managing clusters.
  • Cost management and budgeting – Managing EMR and Glue directly required manually setting up multiple guardrails, including centralized observability across all stacks. As our projects grew, so did the headcount requirements for maintaining a more mature data platform.
  • Spark management – We also ran into challenges where some updates to the Spark core libraries were not supported on AWS, which caused some of our jobs to be inefficient compared to the state of the art. Internal AWS limits on executor management forced us into extensive troubleshooting and recurring meetings to determine root causes.

Despite these challenges, our homegrown solution continues to scale, processing tens of millions of data mutations per hour for critical use cases. As we look ahead, we see a clear need to migrate to a more mature platform, one that allows us to retire in-house tooling and refocus engineering efforts on securing our customers’ cloud environments rather than managing infrastructure.

Data architecture and its evolution at Prisma Cloud

At Prisma Cloud, we follow the 8-factor rule for any technical evaluation to assess its advantages and disadvantages. These factors are analyzed by our internal technical leadership committee, where we engage in discussions to reach a consensus. In cases where a factor cannot be adequately rated, we gather additional data through business-relevant prototyping to ensure a well-informed decision.

The key factors are listed below:

  • Functional fit – Does it solve our business needs?
  • Architecture/design fit – Is it aligned with our long-term technical vision?
  • Developer adoption – How popular is it with developers today?
  • Stability/ecosystem – Are there large-scale enterprises using this technology?
  • Deployment complexity – How much effort is involved in its deployment and change management?
  • Cost – How does the COGS compare to the value of the features we plan to offer by leveraging this technology?
  • Comparative benchmarks – Are there existing benchmarks that demonstrate comparable scale?

One of our key long-term goals was the ability to move toward a security data mesh model. Given our platform approach, we categorize data into three general types:

  • Raw data – This includes data ingested directly from producers or modules as it enters the platform. In Databricks lakehouse terminology, this corresponds to Bronze data.
  • Processed data – The Prisma Cloud Platform is an opinionated platform that transforms raw data into normalized platform objects. This is referred to as Processed data, which aligns with the Silver layer in lakehouse terminology.
  • Correlated data – This category unlocks net new value by correlating different datasets, enabling advanced insights and analytics. This corresponds to the Gold layer in lakehouse terminology. A minimal sketch of this layering follows the list.
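
To make the three layers concrete, here is a minimal PySpark sketch of how raw module signals could flow into processed and correlated tables. The paths, table names, and columns are illustrative assumptions rather than our production schema, and `spark` is the session provided in a Databricks notebook or job.

```python
from pyspark.sql import functions as F

# Bronze / Raw: module signals landed as-is (path and schema are hypothetical).
raw = spark.read.format("delta").load("s3://example-bucket/bronze/module_signals")

# Silver / Processed: normalize raw signals into platform objects.
processed = (
    raw
    .dropDuplicates(["asset_id", "signal_id"])          # drop replayed events
    .withColumn("processed_at", F.current_timestamp())  # stamp normalization time
)
processed.write.format("delta").mode("overwrite").saveAsTable("platform.processed_signals")

# Gold / Correlated: join signals with an asset inventory to unlock net new insight.
assets = spark.read.table("platform.assets")
correlated = (
    processed.join(assets, "asset_id")
    .groupBy("asset_type")
    .agg(F.count("*").alias("signal_count"))
)
correlated.write.format("delta").mode("overwrite").saveAsTable("platform.correlated_signal_counts")
```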

Unlike traditional data lakes, where Bronze data is often discarded, our platform’s breadth and depth necessitate a more evolutionary approach. Rather than simply transforming data into Gold datasets, we envision our data lake evolving into a data mesh, allowing for greater flexibility, accessibility, and cross-domain insights. The diagram below reflects the long-term capability that we seek to extract from our data lake investments.

All of our assessments were centered around the philosophy above.

Evaluation results

Apart from checking all the boxes in our new technology evaluation framework, the following key insights further cemented Databricks as our preferred data platform.

  1. Simplification of the existing tech stack – Our infrastructure relied on multiple Glue and EMR jobs, many of which required ad-hoc tooling and repetitive maintenance. With Databricks, we identified an opportunity to reduce 30%-40% of our jobs, allowing our engineers to focus on core business features instead of maintenance.
  2. Cost reduction – We saw at least a 20% drop in existing spend, even before factoring in amortization with accelerated adoption across various use cases.
  3. Platform features and ecosystem – Databricks provided immediate value through features such as JDBC URL exposure for data consumption, built-in ML/AI infrastructure, automated model hosting, enhanced governance and access control, and advanced data redaction and masking. These capabilities were critical as we upgraded our data handling methods for both tactical and strategic needs. A minimal data consumption sketch follows this list.
  4. Training and adoption ease – Onboarding new engineers onto Databricks proved significantly easier than having them build scalable ETL pipelines from scratch on AWS. This lowered the barrier to entry and accelerated the adoption of Spark-based technologies, which are essential at our scale.
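
As an illustration of the data consumption path called out in point 3, the sketch below uses the open source databricks-sql-connector package to query a lakehouse table through a SQL warehouse endpoint. The hostname, HTTP path, token, and table name are placeholder assumptions, not values from our environment.

```python
from databricks import sql  # pip install databricks-sql-connector

# Placeholder connection details; real values come from the workspace's SQL warehouse.
with sql.connect(
    server_hostname="example-workspace.cloud.databricks.com",
    http_path="/sql/1.0/warehouses/<warehouse-id>",
    access_token="<personal-access-token>",
) as connection:
    with connection.cursor() as cursor:
        # Hypothetical correlated-layer table from the earlier sketch.
        cursor.execute(
            "SELECT asset_type, signal_count FROM platform.correlated_signal_counts LIMIT 10"
        )
        for row in cursor.fetchall():
            print(row)
```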

Evaluation details

Criteria | EMR/Glue (or current cloud-native tech) | Databricks
Ease of deployment | Each team needs to work on its own deployment code; typically a sprint of work. | One-time integration that teams then adopt; SRE work was reduced to a few days.
Ease of administration | Maintaining versions and security patches; SREs typically spend a few days. | SRE work is no longer needed.
Integrations | SREs need to set up Airflow and ksql (typically a sprint of work for new teams). | Out of the box
MLflow | Need to buy a tool or adopt open source, and each team needs to integrate it (a few months the first time, a sprint of work per team). | Out of the box
Data catalog (requires data lineage, security, role-based access control, search, and tagging of the data) | Need to buy tools and integrate with Prisma. | Out of the box
ML libraries and AutoML | Need to buy and integrate with Prisma. | Out of the box
Single pane of glass (SPOG) for developers and SREs | Not available with EMR/Glue. | Out of the box
Databricks SQL (SQL on S3 data) | Athena, Presto; SRE support is required to integrate with Prisma. | Out of the box

Application case study

Given our early pilots, we were convinced to start planning a migration path from our existing S3-based data lake onto the Databricks Platform. A perfect opportunity arose with a key insights project that required access to data from both the Raw and Correlated layers to uncover net new security insights and optimize security issue resolution.

Before adopting Databricks, executing such a project involved several complex and time-consuming steps:

  • Identifying data needs – A chicken-and-egg problem emerged: while we needed to define our data needs upfront, most insights required exploration across multiple datasets before determining their value.
  • Integration complexity – Once data needs were defined, we had to coordinate with data owners to establish integration paths, often leading to bespoke, one-off pipelines.
  • Governance and access control – Once all the data was accessible, we then had to ensure proper security and governance. This required manual configurations, with different implementations depending on where the data resides.
  • Observability and troubleshooting – With data pipeline monitoring split across multiple teams, resolving issues required significant cross-team coordination, making debugging highly use-case-specific.

We tested the impact of the Databricks Data Intelligence Platform on this critical project through the following steps:

  • Step 1: Infrastructure and Migration Planning

    We bootstrapped Databricks in our dev environments and started planning the migration of our in-house data lake on S3 onto Databricks. We used Databricks Asset Bundles and Terraform for both the migration and our infrastructure and resource deployment.

    Prior to adopting Databricks, engineers spent most of their time managing AWS infrastructure across various tools. With Databricks, we have a centralized platform to manage user and group cluster configurations.

    Databricks offers an enhanced Spark environment through Photon, providing a fully managed platform with optimized performance, while AWS primarily delivers Spark through its EMR service, which requires more manual configuration and does not achieve the same level of performance optimization as Databricks. Additionally, the ability to build, deploy, and serve models on Databricks has enabled us to scale more rapidly.

  • Step 2: Structuring Workstreams for Scale

    We divided the project into four workstreams on the Databricks Platform: Data Catalog Management, Data Lake Hydration, Governance and Access Control, and Dev Tooling/Automation.

    Unity Catalog was essential for building our platform, providing unified governance and access controls in a single place. By using attribute-based access control (ABAC) and data masking, we were able to obfuscate data as needed without slowing down development. A minimal governance sketch appears after these steps.

  • Step 3: Accelerating Data Onboarding & Governance

    Catalog registration and onboarding of the existing data in our data lake took only a few hours, while establishing governance and access control was a one-time effort.

    Unity Catalog provided a centralized platform for managing all permissions, simplifying the security of our entire data estate, including both structured and unstructured data. This encompassed governance for data, models, dashboards, notebooks, and more.

  • Step 4: Scaling Data Hydration & Observability

    We seamlessly integrated previously unavailable raw data into our existing data lake and prioritized its hydration onto the Databricks Platform. Capitalizing on comprehensive Kafka, database, and S3 integrations, we successfully developed production-grade hydration jobs, scaling to trillions of rows within just a few sprints.

    In production, we rely extensively on Databricks Workflows, while interactive clusters support development, testing, and performance environments dedicated to building innovative features for our Prisma Cloud product. Databricks Serverless SQL underpins our dashboards, ensuring efficient monitoring of model drift and performance metrics. Moreover, system tables let us pinpoint and analyze high-cost jobs and runs over time, track significant budget fluctuations, and drive effective cost optimization and resource management.

    This holistic approach gives executives clear visibility into platform utilization and consumption, streamlining observability and budgeting without relying on fragmented insights from multiple AWS tools such as EMR, Glue, SageMaker, and Neptune.
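
As a concrete illustration of the governance work described in Steps 2 and 3, the sketch below shows one Unity Catalog pattern for data masking and group-based access, issued from PySpark. The catalog, schema, table, column, and group names are hypothetical and shown only to illustrate the pattern.

```python
# Hypothetical catalog/schema/table/group names; illustrative only.

# A masking function that reveals the raw value only to a privileged group.
spark.sql("""
    CREATE OR REPLACE FUNCTION governance.masks.mask_account_id(account_id STRING)
    RETURNS STRING
    RETURN CASE
        WHEN is_account_group_member('prisma_admins') THEN account_id
        ELSE '****'
    END
""")

# Attach the mask to a column on a Unity Catalog table.
spark.sql("""
    ALTER TABLE platform.processed.cloud_accounts
    ALTER COLUMN account_id SET MASK governance.masks.mask_account_id
""")

# Analysts can query the table but only ever see masked account IDs.
spark.sql("GRANT SELECT ON TABLE platform.processed.cloud_accounts TO `prisma_analysts`")
```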

The outcome

This consolidation proved transformative. Within a single week of prototyping, we uncovered valuable insights by combining raw, processed, and correlated datasets, enabling a more productive evaluation of product-market fit. As a result, we gained clear direction on which customer challenges to pursue and a stronger understanding of the impact we could deliver.

Within just six months of partnering with Databricks, we launched a pivotal security innovation for our customers, an achievement that would have been virtually impossible given our former technology stack, expansive customer base, and the need to prioritize core security features.

Databricks usage stats

  • ~3 trillion records crunched per day.
  • P50 processing time: < 30 minutes.
  • Max parallelism: 24.
  • Auto Loader usage drops ingest latencies to seconds, offering near-real-time processing (a minimal sketch follows this list).
  • Out-of-the-box features, such as AI/BI dashboards with system tables, helped development teams identify and analyze high-cost jobs and runs over time, monitor significant budget changes, and support effective cost optimization and resource management.
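
For reference, a minimal Auto Loader hydration job of the kind behind these numbers might look like the sketch below. The S3 paths, file format, and target table are illustrative assumptions.

```python
# Incrementally ingest newly arriving files from S3 into a Bronze Delta table.
(
    spark.readStream
    .format("cloudFiles")                      # Auto Loader source
    .option("cloudFiles.format", "json")       # assumed raw file format
    .option("cloudFiles.schemaLocation", "s3://example-bucket/_schemas/module_signals")
    .load("s3://example-bucket/raw/module_signals/")
    .writeStream
    .option("checkpointLocation", "s3://example-bucket/_checkpoints/module_signals")
    .trigger(availableNow=True)                # process all pending files, then stop
    .toTable("platform.bronze.module_signals")
)
```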

Conclusion

As the application case study above showed, the timing of our growth aligned with Databricks emerging as the leading data platform of choice. Our shared commitment to rapid innovation and scalability made this partnership a natural fit.

By reframing the technical challenge of cloud security as a data problem, we were able to seek out technology providers who are experts in this area. This strategic shift allowed us to focus on depth, leveraging Databricks’ powerful platform while applying our domain intelligence to tailor it to our scale and business needs. Ultimately, this collaboration has empowered us to accelerate innovation, enhance security insights, and deliver greater value to our customers.

Read more about the Databricks and Palo Alto Networks collaboration here.
