Juan Soto

Academic Director
Chair of Database Systems and Information Management
Faculty IV Electrical Engineering and Computer Science
Technische Universität Berlin
Einsteinufer 17
10587 Berlin
Germany

Office: Building EN-7, Room 724
E-Mail: juan dot soto at tu-berlin dot de
Tel: +49 30 314 23551

About

Education

Further Undergraduate and Graduate Coursework

Specialized Training

Honors and Awards

Professional Experience

Publications

  1. A Survey on Transactional Stream Processing, S. Zhang, J. Soto, and V. Markl, VLDB Journal 2022 (submitted).
  2. Handling Iteration in Distributed Dataflow Systems, G. Gevay, J. Soto, and V. Markl, ACM Computing Surveys, October 2021.
  3. HyMAC: A Hybrid Matrix Computation System (Demo Paper), Z. Chen, Z. Xu, C. Xu, J. Soto, V. Markl, W. Qian, and A. Zhou, VLDB 2021.
  4. Hybrid Evaluation for Distributed Iterative Matrix Computation, Z. Chen, C. Xu, J. Soto, V. Markl, W. Qian, and A. Zhou, SIGMOD 2021.
  5. Fault Tolerance for Distributed Iterative Dataflows in Action, C. Xu, R. P. Lemaitre, J. Soto, and V. Markl, PVLDB 11 (12), 1990-1993, VLDB Endowment, 2018.
  6. A Survey of State Management in Big Data Processing Systems, Q.C. To, J. Soto, and V. Markl, The VLDB Journal 27 (6), 847-872, Springer, 2018.
  7. Large Scale Data Stream Processing Systems, Book Chapter, Paris Carbone, et al., Handbook of Big Data Technologies, pp. 219-260, Springer, 2017.
  8. On Fault Tolerance for Distributed Iterative Dataflow Processing, C. Xu, et al., IEEE Transactions on Knowledge & Data Engineering (TKDE), Vol. 29, Issue 8, pp. 1709-1722, IEEE, 2017.
  9. Efficient Sample Generation for Scalable Meta Learning, S. Schelter, J. Soto, V. Markl, D. Burdick, B. Reinwald, and A. Evfimievski, 31st IEEE International Conference on Data Engineering (ICDE), April 2015.
  10. Composite Key Generation on a Shared-Nothing Architecture, M. Hoffmann, A. Alexandrov, P. Andritsos, J. Soto, and V. Markl, The Sixth Technology Conference on Performance Evaluation and Benchmarking (TPCTC 2014), co-located with VLDB 2014.
  11. Challenges and Opportunities in Big Data Generation, C. Brücke, M. Hoffmann, V. Markl, and J. Soto, Big (Data) is Beautiful, Informatiktage 2014, The German Informatics Society (GI), March 27-28, 2014.
  12. Organisational and Societal Obstacles to Implementations of Technical Systems Supporting PSI Reuse, Nils Barnickel, Edzard Höfig, Jens Klessmann, Juan Soto, Fraunhofer Institute FOKUS, Share PSI (Public Sector Information) Workshop, May 2011, Brussels, Belgium.
  13. An Experience Report on the Verification of Algorithms in the C++ Standard Library using Frama-C, J. Burghardt, J. Gerlach, H. Pohl, and J. Soto, Proceedings of the International Conference on Formal Verification of Object-oriented Software (FoVeOOS 2010), pages 191-204, June 2010.
  14. ACSL By Example: Towards a Verified C Standard Library (Version 5.1.0 for Frama-C Boron), J. Burghardt, J. Gerlach, K. Hartig, H. Pohl, and J. Soto, Fraunhofer FIRST Technical Report, May 2010.
  15. Open-Source Excel Tools for Statistical Metrology, XVIII IMEKO World Congress, September 2006.
  16. The Limits of Image-based Optical Overlay Metrology, Proceedings of the 31st SPIE International Symposium: Microlithography 2006.
  17. NIST Special Publication 800-22: A Statistical Test Suite for the Validation of Random Number Generators and Pseudo Random Number Generators for Cryptographic Applications, October 2000.
  18. NIST Technical Report 6483: Randomness Testing of the Advanced Encryption Standard Finalist Candidates, April 2000.
  19. Randomness Testing of IBM Hardware Based Random Number Generators, NIST Internal Report, October 1999.
  20. Statistical Testing of Random Number Generators, Proceedings of the 22nd National Information Systems Security Conference, October 1999.
  21. NIST Technical Report 6390: Randomness Testing of the Advanced Encryption Standard Candidate Algorithms, September 1999.
  22. Adaptive Finite-difference Computations of Dendritic Growth Using a Phase-field Model, Modelling and Simulation in Materials Science and Engineering, 5 (1997) 365-380.
  23. On the Optimization of Kinematics for Planar Robot Linkages, ASEL Technical Report #ROB9508, Applied Science & Engineering Laboratories, University of Delaware, Alfred I. DuPont Institute, Wilmington, DE, December 1995.
  24. Mathematica and the Newman-Penrose Formalism, Proceedings of the Workshop on Computational Physics, California State University at Fullerton, Editors: Mark Shapiro & Jim Feagin, June 1990.

Teaching Experience: Bachelor's Courses

  1. Recitation Classes in English, TRIO Program, UPR Humacao (Fall 1988)
  2. Recitation Classes in Mathematics, TRIO Program, UPR Humacao (Spring 1989)
  3. Developmental Math, Catonsville Community College (Fall 1993)
  4. Applied Algebra and Trigonometry, Catonsville Community College (Fall 1993)
  5. Business Math, Catonsville Community College (Fall 1993)
  6. Programming in BASIC, Catonsville Community College (Fall 1993)
  7. Statistical Methods, Catonsville Community College (Winter 1994)
  8. Developmental Math, Catonsville Community College (Spring 1994)
  9. Assembly Language Programming, Catonsville Community College (Spring 1994)
  10. Calculus I, Catonsville Community College (Summer 1994)
  11. CISC 120: Object Oriented Programming in C++, Univ. of Delaware (WS 1996)
  12. DBSEM Foundations of Database Systems Seminar, TU Berlin (WS 2019-2020)
  13. DBSEM Foundations of Database Systems Seminar, TU Berlin (SS 2020)
  14. DBSEM Foundations of Database Systems Seminar, TU Berlin (WS 2021-2022)
  15. DBSEM Database Systems Seminar (Data Cleaning), TU Berlin (SS 2021)
  16. DBSEM Database Systems Seminar (Data Stream Management), TU Berlin (SS 2021)
  17. DBSEM Database Systems Seminar (Data Cleaning), TU Berlin (WS 2021-2022)
  18. DBSEM Database Systems Seminar (Data Cleaning), TU Berlin (SS 2022)
  19. DBSEM Database Systems Seminar (Data Stream Management), TU Berlin (SS 2022)
  20. DBSEM Database Systems Seminar (Data Cleaning), TU Berlin (WS 2022-2023)

Teaching Experience: Master's Courses

  1. AIM-3 Scalable Data Science, TU Berlin (WS 2014-2015)
  2. AIM-3 Scalable Data Science, TU Berlin (SS 2015)
  3. AIM-3 Scalable Data Science, TU Berlin (WS 2015-2016)
  4. AIM-3 Scalable Data Science, TU Berlin (SS 2016)
  5. AIM-3 Scalable Data Science, TU Berlin (WS 2016-2017)
  6. AIM-3 Scalable Data Science, TU Berlin (SS 2017)
  7. AIM-3 Scalable Data Science, TU Berlin (WS 2017-2018)
  8. AIM-3 Scalable Data Science, TU Berlin (SS 2018)
  9. AIM-3 Scalable Data Science, TU Berlin (WS 2018-2019)
  10. AIM-3 Scalable Data Science, TU Berlin (SS 2019)
  11. BDASEM Big Data Analytics Seminar, TU Berlin (WS 2019-2020)
  12. BDASEM Big Data Analytics Seminar, TU Berlin (SS 2020)
  13. BDASEM Big Data Analytics Seminar, TU Berlin (WS 2021-2022)
  14. IMSEM Hot Topics in Information Management Seminar, TU Berlin (WS 2021-2022)
  15. IMSEM Hot Topics in Information Management Seminar, TU Berlin (WS 2022-2023)

Supervised Bachelor's Theses

  1. Design, Implementation, and Evaluation of Naive Bayes Classification in Apache Flink, Jonathan Hasenburg, 2015.
  2. Exploratory Data Analysis in Practice, Jessica Bongard, 2018.
  3. On the Problem of Data Visualization Method Selection: Challenges and a Proposed Solution, Maximilian Ksoll, 2019.
  4. An Analysis of Lightweight Cryptographic Algorithms, Nabil Douss, 2019.
  5. An Analysis of Randomness in Lightweight Cryptographic Algorithms Designed for Internet of Things Applications, Jan-Ulrich Holtgrave, 2020.
  6. On the Problem of Customer Lifetime Value Prediction, Klea Paci, 2020.
  7. On the Issue of Fairness in Machine Learning Methods, Florian Schuler, 2020.
  8. On the Evaluation of Software Quality in Machine Learning Libraries, Arthur Söhler, 2022.
  9. An Empirical Study on the Capabilities and Limitations of Inclusion Dependency Discovery Algorithms for Data Profiling, Johanna Werhahn, 2022.
  10. On Fairness and Bias in Data Science, Martin Manolov, 2023.
  11. Data Transformations in Data Cleaning, David Michel, 2023.
  12. Forecasting Challenges and Countermeasures, Enea Gurra, 2023.

Supervised Master's Theses

  1. Identifier Namespaces in Mathematical Notation, Alexey Grigorev, 2015.
  2. Techniques for Hierarchical Product Classification, Marilyn Nowacka Barros, 2017.
  3. Anomaly Detection in Online Monitoring, Benjamin Pietrowicz, 2018.
  4. Visual Analytics for Moving Objects Databases, Fabricio Ferreira da Silva, 2021.
  5. Automating Data Quality Processes in Big Data Settings, Yalei Li, 2021.
  6. Towards Continual Learning in Language Modelling, Haroon Rashid, 2021.
  7. SentiStream: Towards Online Sentiment Learning of Massive Data Streams, Huilin Wu, 2022.
  8. Towards Real-time Fault Diagnosis: A Hybrid-model Approach, Yuan Li, 2022.
  9. On the Problem of Combining Models for Improved Anomaly Detection, Liying Yang, 2022.
  10. On the Problem of Software Quality in Machine Learning Systems, Haftamu Hailu Tefera, 2022.

Imprint. The webpages of TU Berlin (TUB) consist of numerous offerings from various institutions (e.g., faculties, institutes, chairs, central facilities, administrative facilities) and the personal pages of TUB employees. The institutions and persons who created the webpages are editorially responsible for them and responsible for their content. If you have questions, please contact the authors responsible for the relevant page.

Address of TU Berlin: TU Berlin, The President, Prof. Dr. Geraldine Rauch, Strasse des 17. Juni 135, 10623 Berlin.

Editorial and Content Responsibility. TU Berlin, Juan Soto, Database Systems and Information Management Group, Electrical Engineering and Computer Science (Faculty IV), Einsteinufer 17, 10587 Berlin, juan dot soto at tu-berlin dot de.

Legal Notices on Copyright. The webpage layout, the graphics used, and all other content are protected by copyright.

Data Privacy. Thank you for your interest in the webpage of Juan Soto, Database Systems and Information Management Group (DIMA) at Technische Universität Berlin (TUB). Protecting the personal data of visitors to this webpage is very important to us, so we want to inform you about our data privacy policy. The following Privacy Policy applies to the webpage that falls under the responsibility of Juan Soto. The webpage serves to disseminate information about me and my research projects, publications, and teaching at the Database Systems and Information Management Group (DIMA) at TU Berlin. It is not concerned with commercial transactions or with the exchange of data for marketing purposes.

Subject of Data Privacy. Data privacy covers personal data. According to Art. 4(1) of the DSGVO (the EU General Data Protection Regulation, GDPR), personal data are any data relating to an identified or identifiable individual, that is, all data that could be used to identify you. This applies to data such as your name, private address, e-mail address, and telephone number, but also to usage data such as your IP address. The DIMA group observes the legal requirements of data privacy and other applicable regulations. We are committed to ensuring that you can trust us with your personal data. Therefore, transfers of sensitive data are encrypted. In addition, our webpages are protected against damage and unauthorized access by technical measures.

Data Collection and Storage. In general, you do not need to register any personal data to use our website.

Collection and Storage of Usage Data. To optimize our websites, we collect and store usage data, such as the pages visited, the date and time of access, and the website you came from, for a period of two weeks. After that, these data are deleted automatically.

Right of Access to Personal Data. You can request information about your stored personal data at any time, free of charge, and without giving reasons. Please contact the address provided below. We will be pleased to assist you if you have any further questions about our data privacy information. Please note that data privacy regulations and the handling of data privacy can change from time to time, so we recommend that you keep yourself informed about changes to data privacy laws and institutional policies. This data privacy statement applies only to content on TU Berlin and DIMA web servers that provide this statement; it does not cover linked websites on external web servers. If you have questions about this policy, please contact us at: TU Berlin, Juan Soto, Database Systems and Information Management Group, Electrical Engineering and Computer Science (Faculty IV), Einsteinufer 17, 10587 Berlin, juan dot soto at tu-berlin dot de.

Data Protection Contact. datenschutz at dima dot tu-berlin dot de