What happens when the data you need is hidden in silos, or when billions of dollars ride on drug testing data you can’t access? How do you get a long-term view across 10 billion records to understand biological responses to drugs? Researchers in the pharmaceutical industry turn to Hortonworks for advanced big data analytics on integrated translational data and for a holistic view of their pharmaceutical data.


Big data integration, pharmaceutical big data analytics, internal and external collaboration, portfolio decision support, more efficient clinical trials, faster time to market, improved yields, and improved safety are just a few of the benefits pharmaceutical companies around the world achieve by tapping into the full power of their pharma big data.


Merck optimizes vaccine yields in pursuit of the “golden batch”

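Merck optimized its vaccine yields by analyzing manufacturing data to isolate the most important predictive variables for a “golden batch.” Merck’s executives had long relied on lean manufacturing to increase output and cut costs, but finding new ways to improve yields was becoming increasingly difficult. Seeking fresh insights that could further reduce costs and raise yields, the company turned to Open Enterprise Hadoop. Merck asked Hortonworks to go back 10 years for a single vaccine and uncover the data behind 255 batch records. That data was spread across 16 maintenance and building management systems and included precise sensor data such as calibration settings, air pressure, temperature, and humidity. After landing all of the data in Hortonworks Data Platform and processing 15 billion calculations, Merck got new answers to questions it had been asking for a decade. From among hundreds of variables, the Merck team was able to identify the ones that optimize yield. The company has since gone on to apply those lessons to other vaccines, while staying focused on delivering high-quality medicines at the lowest possible price. Read Doug Henschen’s InformationWeek interview with Merck’s George Llado.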
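The “golden batch” search is, at its core, a question of which recorded process variables move together with yield. As a minimal sketch, assuming the batch records have already been flattened into one dictionary per batch, the Python below ranks each variable by the absolute value of its Pearson correlation with yield. The field names and values are hypothetical, and simple correlation is only a stand-in screening step; the source does not say which statistical methods Merck or Hortonworks actually used.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a variable that never changes cannot explain yield
    return cov / (sd_x * sd_y)


def rank_variables(batches, target):
    """Rank every non-target field by |correlation| with the target field."""
    names = [name for name in batches[0] if name != target]
    ys = [batch[target] for batch in batches]
    scores = {name: pearson([batch[name] for batch in batches], ys) for name in names}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Hypothetical, made-up batch records: one dict per batch.
batches = [
    {"temperature": 36.0, "humidity": 45, "air_pressure": 101, "yield_g": 70},
    {"temperature": 36.5, "humidity": 40, "air_pressure": 101, "yield_g": 80},
    {"temperature": 37.0, "humidity": 47, "air_pressure": 101, "yield_g": 90},
    {"temperature": 37.5, "humidity": 42, "air_pressure": 101, "yield_g": 100},
]

for name, r in rank_variables(batches, target="yield_g"):
    print(f"{name:12s} r = {r:+.3f}")
```

Correlation only flags candidates; a variable that tracks yield still needs process-engineering review before it can be treated as a cause.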


One Hortonworks pharmaceutical customer uses HDP for a single view of its supply chain and its self-declared “War on Waste”. The operations team added up the ingredients going into its drugs and compared that total with the physical product it shipped. Finding a big gap between the two, the team launched its War on Waste, using HDP big data analytics to identify where those valuable resources were going. Once the root causes of waste are identified, real-time alerts in HDP notify the team when it is at risk of exceeding predetermined thresholds.
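The mass-balance comparison behind the War on Waste can be sketched in a few lines. This is an illustration only, assuming per-product totals of ingredients consumed and product shipped are already available; the product names, quantities, and the 8% threshold are hypothetical, the in-memory list stands in for the data HDP would actually hold, and `waste_alerts` is a hypothetical helper, not an HDP API.

```python
from dataclasses import dataclass


@dataclass
class MassBalance:
    """Material going in vs. product going out for one (hypothetical) product line."""
    product: str
    ingredients_kg: float  # total raw material consumed
    shipped_kg: float      # total finished product shipped

    @property
    def waste_kg(self) -> float:
        return self.ingredients_kg - self.shipped_kg

    @property
    def waste_rate(self) -> float:
        # Fraction of input that never left the plant as shipped product.
        return self.waste_kg / self.ingredients_kg if self.ingredients_kg else 0.0


def waste_alerts(balances, threshold):
    """Return product lines whose waste rate exceeds the threshold, worst first."""
    flagged = [b for b in balances if b.waste_rate > threshold]
    return sorted(flagged, key=lambda b: b.waste_rate, reverse=True)


balances = [
    MassBalance("tablet_A", ingredients_kg=1000.0, shipped_kg=950.0),
    MassBalance("tablet_B", ingredients_kg=800.0, shipped_kg=640.0),
    MassBalance("tablet_C", ingredients_kg=500.0, shipped_kg=450.0),
]

for b in waste_alerts(balances, threshold=0.08):
    print(f"ALERT {b.product}: {b.waste_rate:.0%} of input lost ({b.waste_kg:.0f} kg)")
```

In a streaming setting the same check would simply be re-evaluated as new consumption and shipment figures arrive.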


The goal of translational research is to apply the results of laboratory research to improving human health. Hadoop empowers researchers, clinicians, and analysts to unlock insights from translational data to drive evidence-based medicine programs. The data sources for translational research are complex and typically locked in data silos, making it difficult for scientists to obtain an integrated, holistic view of their data. Other challenges include data latency (the delay in getting data loaded into traditional data stores), handling unstructured and semi-structured data, and the lack of collaborative analysis between translational and clinical development groups. Researchers are turning to Open Enterprise Hadoop as a cost-effective, reliable platform for managing big data in clinical trials and performing advanced analytics on integrated translational data. HDP allows translational and clinical groups to combine key data from sources such as:
• Omics (genomics, proteomics, transcription profiling, etc.)
• Preclinical data
• Electronic lab notebooks
• Clinical data warehouses
• Tissue imaging data
• Medical devices and sensors
• File sources (such as Excel and SAS)
• Medical literature

Through Hadoop, analysts can build a holistic view that helps them understand biological responses and molecular mechanisms for compounds or drugs. They can also uncover biomarkers for use in R&D and clinical trials. Finally, they can retain all data indefinitely, in its native format, for analysis by multiple future applications.
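Combining sources only produces a holistic view if records can be linked on a shared identifier. A minimal sketch, assuming every source carries a common `compound_id` field (a hypothetical name; real data often needs identifier mapping first), groups records from each source under that identifier. At HDP scale this would be a distributed join rather than an in-memory dictionary, and `integrate` is a hypothetical helper.

```python
from collections import defaultdict


def integrate(sources, key="compound_id"):
    """Group records from several named sources under a shared identifier.

    sources maps a source name (e.g. "omics") to a list of record dicts.
    Returns {identifier: {source name: [records, ...]}}.
    """
    view = defaultdict(lambda: defaultdict(list))
    for source_name, records in sources.items():
        for record in records:
            if key not in record:
                continue  # records without the shared key cannot be linked
            view[record[key]][source_name].append(record)
    # Convert the nested defaultdicts to plain dicts for a clean result.
    return {ident: dict(by_source) for ident, by_source in view.items()}


# Hypothetical records from three of the source types listed above.
sources = {
    "omics": [{"compound_id": "CMP-1", "gene": "GENE_X", "fold_change": 2.1}],
    "preclinical": [
        {"compound_id": "CMP-1", "assay": "toxicity", "result": "pass"},
        {"compound_id": "CMP-2", "assay": "toxicity", "result": "fail"},
    ],
    "clinical": [{"compound_id": "CMP-1", "trial_phase": 2, "responders": 34}],
}

view = integrate(sources)
print(sorted(view["CMP-1"]))  # → ['clinical', 'omics', 'preclinical']
```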


Traditional IT systems cannot economically store and process next-generation sequencing (NGS) data. For example, primary sequencing results come as large image files that are too costly to store over the long term. Point solutions have lacked the flexibility to keep up with changing analytical methods, and they are often expensive to customize and maintain. Open Enterprise Hadoop overcomes those challenges by helping data scientists and researchers unlock insights from NGS data while preserving the raw results on a reliable, cost-effective platform. NGS scientists are discovering the benefits of large-scale processing and analysis delivered by HDP components such as Apache Spark. Pharmaceutical researchers are using Hadoop to easily ingest diverse data types from external sources of genetic data, such as TCGA, GenBank, and EMBL. Another clear advantage of HDP for NGS is that researchers have access to cutting-edge bioinformatics tools built specifically for Hadoop. These tools support analysis of various NGS data formats, sorting of reads, and merging of results. Together, these capabilities take NGS to the next level through:
• Batch processing of large NGS data sets
• Integration of internal sequence data with publicly available external sequence data
• Permanent storage of large image files in their native format
• Substantial cost savings on data processing and storage
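Sorting reads and merging results typically follows a partition-sort-merge pattern: each worker sorts its share of aligned reads by genomic coordinate, and the sorted partitions are then combined with a k-way merge. The sketch below shows that pattern in plain Python with a hypothetical `Read` record; it is not the API of any Hadoop bioinformatics tool. For simplicity it orders chromosome names as plain strings, so `chr10` would sort before `chr2`; real tools order by the reference sequence dictionary instead.

```python
import heapq
from typing import NamedTuple


class Read(NamedTuple):
    chrom: str  # reference sequence name
    pos: int    # 1-based leftmost mapping position
    name: str   # read identifier


def coordinate_key(read):
    return (read.chrom, read.pos)


def sort_partition(reads):
    """Sort one partition of aligned reads by genomic coordinate."""
    return sorted(reads, key=coordinate_key)


def merge_partitions(sorted_partitions):
    """Combine already-sorted partitions with a heap-based k-way merge."""
    return list(heapq.merge(*sorted_partitions, key=coordinate_key))


# Hypothetical reads, split across two partitions (e.g., two worker nodes).
partition_1 = [Read("chr1", 500, "r1"), Read("chr2", 100, "r2"), Read("chr1", 100, "r3")]
partition_2 = [Read("chr1", 300, "r4"), Read("chr2", 50, "r5")]

merged = merge_partitions([sort_partition(partition_1), sort_partition(partition_2)])
print([r.name for r in merged])  # → ['r3', 'r4', 'r1', 'r5', 'r2']
```

Because each partition is sorted independently, the expensive step parallelizes across nodes, and the final merge only has to compare the heads of the sorted streams.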

HDP uses real-world data to deliver real-world evidence

Real-world evidence (RWE) promises to quantify improvements in health outcomes and treatments, but the underlying data must be available at scale. High data storage and processing costs, the difficulty of merging structured and unstructured data, and an over-reliance on informatics resources to prepare analysis-ready data have all slowed the evolution of RWE. With Hadoop, RWE groups are combining key data sources, including claims, prescriptions, electronic medical records, health information exchange (HIE) data, and social media, to obtain a full view of RWE. With big data analytics, pharmaceutical analysts are uncovering real insights and delivering them through cost-effective, familiar tools such as SAS®, R®, TIBCO™ Spotfire®, or Tableau®. RWE through Hadoop delivers value through:
• Optimal health resource utilization across different patient cohorts
• A holistic view of cost/quality tradeoffs
• Analysis of treatment pathways
• Competitive pricing studies
• Concomitant medication analysis
• Clinical trial targeting based on the geographic and demographic prevalence of disease
• Prioritization of pipeline drug candidates
• Metrics for performance-based pricing contracts
• Drug adherence studies
• Permanent data storage for compliance audits
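Drug adherence studies often report the proportion of days covered (PDC): the share of days in an observation window on which a patient had medication on hand, derived from pharmacy claims. The source does not name a specific metric, so this is an illustrative, simplified sketch with hypothetical claims data. It counts each covered day once and, unlike some standard PDC specifications, does not shift the supply from early refills forward.

```python
from datetime import date, timedelta


def proportion_of_days_covered(fills, start, end):
    """Simplified PDC: share of days in [start, end] with medication on hand.

    fills is a list of (fill_date, days_supply) pairs. Overlapping supply is
    counted once per day; this sketch does NOT shift early refills forward.
    """
    period_days = (end - start).days + 1
    covered = set()
    for fill_date, days_supply in fills:
        for offset in range(days_supply):
            day = fill_date + timedelta(days=offset)
            if start <= day <= end:
                covered.add(day)
    return len(covered) / period_days


# Hypothetical pharmacy claims for one patient.
fills = [
    (date(2024, 1, 1), 10),   # covers Jan 1-10
    (date(2024, 1, 8), 10),   # overlaps: adds Jan 11-17
    (date(2024, 1, 25), 10),  # only Jan 25-30 fall inside the window
]
pdc = proportion_of_days_covered(fills, date(2024, 1, 1), date(2024, 1, 30))
print(f"PDC = {pdc:.2f}")  # → PDC = 0.77
```

Run per patient and then aggregated by cohort, a measure like this is one way the “drug adherence studies” item above could be operationalized.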

