Big Data and Systems

  • Ray: A distributed framework for emerging AI applications. P. Moritz, R. Nishihara, S. Wang, A. Tumanov, R. Liaw, E. Liang, W. Paul, M. I. Jordan, and I. Stoica. 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI), Carlsbad, CA, 2018.

  • CoCoA: A general framework for communication-efficient distributed optimization. V. Smith, S. Forte, C. Ma, M. Takac, M. I. Jordan, and M. Jaggi. Journal of Machine Learning Research, 18, 1-49, 2018.

  • Ray RLlib: A framework for distributed reinforcement learning. E. Liang, R. Liaw, P. Moritz, R. Nishihara, R. Fox, K. Goldberg, J. Gonzalez, M. I. Jordan, and I. Stoica. arXiv:1712.09381, 2018.

  • Flexible primitives for distributed deep learning in Ray. Y. Bulatov, P. Moritz, R. Nishihara, M. I. Jordan, and I. Stoica. Systems and Machine Learning Conference (SysML), Stanford, CA, 2018.

  • Communication-efficient distributed statistical inference. M. I. Jordan, J. Lee, and Y. Yang. Journal of the American Statistical Association, to appear.

  • Perturbed iterate analysis for asynchronous stochastic optimization. H. Mania, X. Pan, D. Papailiopoulos, B. Recht, K. Ramchandran, and M. I. Jordan. SIAM Journal on Optimization, to appear.

  • Real-time machine learning: The missing pieces. R. Nishihara, P. Moritz, S. Wang, A. Tumanov, W. Paul, J. Schleier-Smith, R. Liaw, M. I. Jordan, and I. Stoica. arXiv:1703.03924, 2017.

  • CoCoA: A general framework for communication-efficient distributed optimization. V. Smith, S. Forte, C. Ma, M. Takac, M. I. Jordan, and M. Jaggi. arXiv:1611.02189, 2016.

  • CYCLADES: Conflict-free asynchronous machine learning. X. Pan, M. Lam, S. Tu, D. Papailiopoulos, C. Zhang, M. I. Jordan, K. Ramchandran, C. Re, and B. Recht. arXiv:1605.09721, 2016.

  • Spectral methods meet EM: A provably optimal algorithm for crowdsourcing. Y. Zhang, X. Chen, D. Zhou, and M. I. Jordan. Journal of Machine Learning Research, 101, 1-44, 2016.

  • Parallel correlation clustering on big graphs. X. Pan, D. Papailiopoulos, S. Oymak, B. Recht, K. Ramchandran, and M. I. Jordan. arXiv:1507.05086, 2015.

  • Automating model search for large scale machine learning. E. Sparks, A. Talwalkar, D. Haas, M. Franklin, M. I. Jordan, and T. Kraska. ACM Symposium on Cloud Computing (SOCC), Kohala Coast, Hawaii, 2015.

  • Asynchronous complex analytics in a distributed dataflow architecture. J. Gonzalez, P. Bailis, M. I. Jordan, M. Franklin, J. Hellerstein, A. Ghodsi, and I. Stoica. arXiv:1510.07092, 2015.

  • Splash: User-friendly programming interface for parallelizing stochastic algorithms. Y. Zhang and M. I. Jordan. arXiv:1506.07552, 2015.

  • Distributed estimation of generalized matrix rank: Efficient algorithms and lower bounds. Y. Zhang, M. Wainwright, and M. I. Jordan. arXiv:1502.01403, 2015.

  • Communication-efficient distributed dual coordinate ascent. M. Jaggi, V. Smith, M. Takac, J. Terhorst, T. Hofmann, and M. I. Jordan. In Z. Ghahramani, M. Welling, C. Cortes and N. Lawrence (Eds.), Advances in Neural Information Processing Systems (NIPS) 28, 2015.

  • Parallel double greedy submodular maximization. X. Pan, S. Jegelka, J. Gonzalez, J. Bradley, and M. I. Jordan. In Z. Ghahramani, M. Welling, C. Cortes and N. Lawrence (Eds.), Advances in Neural Information Processing Systems (NIPS) 28, 2015.

  • Spectral methods meet EM: A provably optimal algorithm for crowdsourcing. Y. Zhang, X. Chen, D. Zhou, and M. I. Jordan. In Z. Ghahramani, M. Welling, C. Cortes and N. Lawrence (Eds.), Advances in Neural Information Processing Systems (NIPS) 28, 2015.

  • Distributed matrix completion and robust factorization. L. Mackey, A. Talwalkar and M. I. Jordan. Journal of Machine Learning Research, 16, 913-960, 2015.

  • The missing piece in complex analytics: Low latency, scalable model management and serving with Velox. D. Crankshaw, P. Bailis, J. E. Gonzalez, H. Li, Z. Zhang, M. J. Franklin, A. Ghodsi, and M. I. Jordan. Conference on Innovative Data Systems Research (CIDR), Asilomar, CA, 2015.

  • Knowing when you're wrong: Building fast and reliable approximate query processing systems. S. Agarwal, H. Milner, A. Kleiner, B. Mozafari, M. I. Jordan, S. Madden, and I. Stoica. Proceedings of the 2014 ACM International Conference on Management of Data (SIGMOD), Snowbird, Utah, 2014.

  • Information-theoretic lower bounds for distributed statistical estimation with communication constraints. J. Duchi, M. I. Jordan, M. Wainwright, and Y. Zhang. arXiv:1405.0782, 2014.

  • Lower bounds on the performance of polynomial-time algorithms for sparse linear regression. Y. Zhang, M. Wainwright, and M. I. Jordan. arXiv:1402.1918, 2014.

  • Privacy aware learning. J. Duchi, M. I. Jordan, and M. Wainwright. Journal of the ACM, 61, doi:10.1145/2666468, 2014.

  • A scalable bootstrap for massive data. A. Kleiner, A. Talwalkar, P. Sarkar and M. I. Jordan. Journal of the Royal Statistical Society, Series B, doi:10.1111/rssb.12050, 2014.

  • On statistics, computation and scalability. M. I. Jordan. Bernoulli, 19, 1378-1390, 2013.

  • Computational and statistical tradeoffs via convex relaxation. V. Chandrasekaran and M. I. Jordan. Proceedings of the National Academy of Sciences, 110, E1181-E1190, 2013.

  • Local privacy, data processing inequalities, and statistical minimax rates. J. Duchi, M. I. Jordan, and M. Wainwright. arXiv:1302.3203, 2013.

  • MLI: An API for distributed machine learning. E. Sparks, A. Talwalkar, V. Smith, J. Kottalam, X. Pan, J. Gonzalez, M. I. Jordan, M. Franklin, and T. Kraska. IEEE International Conference on Data Mining (ICDM), Dallas, TX, 2013.

  • Optimistic concurrency control for distributed unsupervised learning. X. Pan, J. Gonzalez, S. Jegelka, T. Broderick, and M. I. Jordan. arXiv:1307.8049, 2013.

  • Local privacy and statistical minimax rates. J. Duchi, M. I. Jordan, and M. Wainwright. arXiv:1302.3203, 2013.

  • MAD-Bayes: MAP-based asymptotic derivations from Bayes. T. Broderick, B. Kulis, and M. I. Jordan. In S. Dasgupta and D. McAllester (Eds.), Proceedings of the 30th International Conference on Machine Learning (ICML), Atlanta, GA, 2013.

  • A general bootstrap performance diagnostic. A. Kleiner, A. Talwalkar, S. Agarwal, M. I. Jordan, and I. Stoica. ACM Conference on Knowledge Discovery and Data Mining (SIGKDD), Chicago, IL, 2013.

  • Small-variance asymptotics for exponential family Dirichlet process mixture models. K. Jiang, B. Kulis, and M. I. Jordan. In P. Bartlett, F. Pereira, L. Bottou and C. Burges (Eds.), Advances in Neural Information Processing Systems (NIPS) 26, 2013.

  • Privacy aware learning. J. Duchi, M. I. Jordan, and M. Wainwright. In P. Bartlett, F. Pereira, L. Bottou and C. Burges (Eds.), Advances in Neural Information Processing Systems (NIPS) 26, 2013.

  • The Big Data bootstrap. A. Kleiner, A. Talwalkar, P. Sarkar, and M. I. Jordan. In J. Langford and J. Pineau (Eds.), Proceedings of the 29th International Conference on Machine Learning (ICML), Edinburgh, UK, 2012.

  • Revisiting k-means: New algorithms via Bayesian nonparametrics. B. Kulis and M. I. Jordan. In J. Langford and J. Pineau (Eds.), Proceedings of the 29th International Conference on Machine Learning (ICML), Edinburgh, UK, 2012.

  • Bayesian bias mitigation for crowdsourcing. F. L. Wauthier and M. I. Jordan. In P. Bartlett, F. Pereira, J. Shawe-Taylor and R. Zemel (Eds.) Advances in Neural Information Processing Systems (NIPS) 25, 2012.

  • Divide-and-conquer matrix factorization. L. Mackey, A. Talwalkar and M. I. Jordan. In P. Bartlett, F. Pereira, J. Shawe-Taylor and R. Zemel (Eds.) Advances in Neural Information Processing Systems (NIPS) 25, 2012.

  • The SCADS Director: Scaling a distributed storage system under stringent performance requirements. B. Trushkowsky, P. Bodik, A. Fox, M. Franklin, M. I. Jordan, and D. Patterson. In 9th USENIX Conference on File and Storage Technologies (FAST '11), San Jose, CA, 2011.

  • Bayesian inference for queueing networks and modeling of Internet services. C. Sutton and M. I. Jordan. Annals of Applied Statistics, 5, 254-282, 2011.

  • Managing data transfers in computer clusters with Orchestra. M. Chowdhury, M. Zaharia, J. Ma, M. I. Jordan, and I. Stoica. ACM SIGCOMM, Toronto, Canada, 2011.

  • Detecting large-scale system problems by mining console logs. W. Xu, L. Huang, A. Fox, D. Patterson, and M. I. Jordan. Proceedings of the 27th International Conference on Machine Learning (ICML), Haifa, Israel, 2010.

  • Characterizing, modeling, and generating workload spikes for stateful services. P. Bodik, A. Fox, M. Franklin, M. I. Jordan, and D. Patterson. First ACM Symposium on Cloud Computing (SOCC), Indianapolis, IN, 2010.

  • Detecting large-scale system problems by mining console logs. W. Xu, L. Huang, A. Fox, D. Patterson, and M. I. Jordan. 22nd ACM Symposium on Operating Systems Principles (SOSP), Big Sky, MT, 2009.

  • Online system problem detection by mining patterns of console logs. W. Xu, L. Huang, A. Fox, D. Patterson, and M. I. Jordan. IEEE International Conference on Data Mining (ICDM), Miami, FL, 2009.

  • Automatic exploration of datacenter performance regimes. P. Bodik, R. Griffith, C. Sutton, A. Fox, M. I. Jordan, and D. Patterson. First Workshop on Automated Control for Datacenters and Clouds (ACDC), Barcelona, Spain, 2009.

  • Statistical machine learning makes automatic control practical for Internet datacenters. P. Bodik, R. Griffith, C. Sutton, A. Fox, M. I. Jordan, and D. Patterson. Workshop on Hot Topics in Cloud Computing (HotCloud), San Diego, CA, 2009.

  • Predicting multiple performance metrics for queries: Better decisions enabled by machine learning. A. Ganapathi, H. Kuno, U. Dayal, J. Wiener, A. Fox, M. I. Jordan, and D. Patterson. IEEE International Conference on Data Engineering (ICDE), Shanghai, China, 2009.

  • Probabilistic inference in queueing networks. C. A. Sutton and M. I. Jordan. Workshop on Tackling Computer Systems Problems with Machine Learning Techniques (SYSML), 2008.

  • Communication-efficient online detection of network-wide anomalies. L. Huang, X. Nguyen, M. Garofalakis, J. M. Hellerstein, M. I. Jordan, A. Joseph, and N. Taft. 26th Annual IEEE Conference on Computer Communications (INFOCOM'07), 2007.

  • In-network PCA and anomaly detection. L. Huang, X. Nguyen, M. Garofalakis, M. I. Jordan, A. Joseph, and N. Taft. In B. Schoelkopf, J. Platt and T. Hofmann (Eds.), Advances in Neural Information Processing Systems (NIPS) 20, 2007.

  • Response-time modeling for resource allocation and energy-informed SLAs. P. Bodik, C. Sutton, A. Fox, D. Patterson, and M. I. Jordan. Workshop on Statistical Learning Techniques for Solving Systems Problems, Whistler, BC, 2007.

  • Statistical debugging: Simultaneous identification of multiple bugs. A. Zheng, M. I. Jordan, B. Liblit, M. Naik, and A. Aiken. Proceedings of the 23rd International Conference on Machine Learning (ICML), 2006.

  • Scalable statistical bug isolation. B. Liblit, M. Naik, A. X. Zheng, A. Aiken, and M. I. Jordan. ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), 2005.

  • Combining visualization and statistical analysis to improve operator confidence and efficiency for failure detection and localization. P. Bodik, G. Friedman, L. Biewald, H. Levine, G. Candea, K. Patel, G. Tolle, J. Hui, A. Fox, M. I. Jordan, and D. Patterson. International Conference on Autonomic Computing (ICAC), 2005.

  • Combining statistical monitoring and predictable recovery for self-management. A. Fox, E. Kiciman, D. A. Patterson, R. H. Katz and M. I. Jordan. ACM SIGSOFT Proceedings of the Workshop on Self-Managed Systems (WOSS), 2004.

  • Bug isolation via remote program sampling. B. Liblit, A. Aiken, A. X. Zheng, and M. I. Jordan. ACM SIGPLAN 2003 Conference on Programming Language Design and Implementation (PLDI), San Diego, 2003.

  • Public deployment of cooperative bug isolation. B. Liblit, A. Aiken, A. X. Zheng, and M. I. Jordan. Workshop on Remote Analysis and Measurement of Software Systems (RAMSS), 2004.

  • Statistical debugging of sampled programs. A. X. Zheng, M. I. Jordan, B. Liblit, and A. Aiken. In S. Thrun, L. Saul, and B. Schoelkopf (Eds.), Advances in Neural Information Processing Systems (NIPS) 17, 2004.

  • Failure diagnosis using decision trees. M. Chen, A. X. Zheng, J. Lloyd, M. I. Jordan, and E. Brewer. International Conference on Autonomic Computing (ICAC), 2004.

  • Sampling user executions for bug isolation. B. Liblit, A. Aiken, A. X. Zheng, and M. I. Jordan. Workshop on Remote Analysis and Measurement of Software Systems (RAMSS), 2003.

  • Stable algorithms for link analysis. A. Y. Ng, A. X. Zheng, and M. I. Jordan. Proceedings of the 24th International Conference on Research and Development in Information Retrieval (SIGIR), New York, NY: ACM Press, 2001.

  • Link analysis, eigenvectors, and stability. A. Y. Ng, A. X. Zheng, and M. I. Jordan. International Joint Conference on Artificial Intelligence (IJCAI), 2001.