![]() We implement HPCache in Proteus and show that i) estimating speedup potential improves memory space utilization, and ii) simple runtime statistics suffice to infer speedup expectations. We show that, with fast storage, the benefit of in-memory caching varies significantly across queries therefore, we quantify the efficiency of caching decisions and formulate an optimization problem. HPCache caches data based on their speedup potential instead of relying on frequency-based statistics. This paper proposes HPCache, a buffer management policy that enables fast analytics on high-bandwidth storage by efficiently using the available in-memory space. As a result, existing caching policies waste valuable memory space to cache input data that offer little-to-no acceleration for analytics. For example, caching the input of a frequent query that spends most of its time processing joins is less beneficial than caching a page for a slightly less frequent but scan-heavy query. ![]() On the other hand, fast storage offers loading times that approach or even outperform fully in-memory query execution response times, rendering purely frequency-based statistics incapable of capturing impact of a caching decision on query execution. Purely frequency- & time-based caching decisions, however, are a proxy of the expected query execution speedup only when disk accesses are significantly slower than in-memory query processing. As query processing starts to dominate the execution time, the gap between online and offline AQP methods diminishes.Īnalytical engines rely on in-memory caching to avoid disk accesses and provide timely responses by keeping the most frequently accessed data in memory. The cost of data reduction with materialization is up to 2.5x of the equivalent group-by in the case of stratified sampling and virtually free (∼1x) for reasonable sample sizes of other strategies. Second, we reduce the performance penalties of the added processing and sample materialization through system-aware operator design and compare the sample creation time to the matching relational operators of an in-memory JIT-compiled engine. First, we evaluate three common sampling methods, identify algorithmic bottlenecks, and propose hardware-conscious optimizations. We demonstrate that the sampling operators can be practical in modern scale-up analytical systems. ![]() Modern analytical engines optimized for high-bandwidth media and multi-core architectures only exacerbate existing inefficiencies, resulting in prohibitive query-time online sampling and longer preprocessing times in offline AQP systems. While approximate query processing (AQP) techniques present a principled method to trade off accuracy for faster queries in analytics, the sample creation is often considered a second-class citizen. As the data volume grows, reducing the query execution times remains an elusive goal. ![]()
0 Comments
Leave a Reply. |
Details
AuthorWrite something about yourself. No need to be fancy, just an overview. ArchivesCategories |