Chair of Database Systems and Information Management
Faculty IV Electrical Engineering and Computer Science
Technische Universität Berlin
Office: Building EN-7, Room 724
E-Mail: juan dot soto at tu-berlin dot de
Tel: +49 30 314 23551
- Juan Soto is a Puerto Rican Computer Scientist, Lecturer, Researcher, and Academic Director in the Database Systems and Information Management (DIMA) Group at the TU Berlin.
- His responsibilities include conducting data science research (e.g., automating exploratory data analysis, investigating numerical issues arising in big data analytics), contributing to grant writing efforts, coordinating TUB Data Analytics Lab activities, managing industrial projects, teaching Bachelor's seminars on data cleaning and data stream management and a Master's seminar on data science, supporting computer science and engineering graduate students, and technical editing of research papers and grant proposals, among other things.
- Between 2016 and 2022, Juan was a Senior Consultant at the German Research Center for Artificial Intelligence (DFKI).
- Over the past 25 years, Juan has worked in varying computer science subfields, including business intelligence, computer vision, cybersecurity, embedded systems, health IT, mathematical science, remote sensing, software engineering, and statistical science.
- His primary areas of expertise include cybersecurity, data analysis, scientific/statistical/symbolic mathematical computing, and software engineering.
- B.S. in Computational Mathematics, University of Puerto Rico, Humacao, Puerto Rico
- M.S. in Applied Mathematics and Statistics, Stony Brook University, Stony Brook, New York
- M.S. in Computer and Information Sciences, University of Delaware, Newark, Delaware
- Graduate Certificate in Federal Statistics, George Mason University, Fairfax, Virginia
Further Undergraduate and Graduate Coursework
- Bachelor's Course (Differential Equations), Stony Brook University, Stony Brook, New York
- Bachelor's Courses (Logic in Computer Science, Computer Architecture), University of Delaware, Newark, Delaware
- Master's Course (Enterprise Security and Privacy), Johns Hopkins University, Rockville, Maryland
- Master's Courses (Management of Information and Systems Security, Systems Engineering, Decision Making Under Uncertainty, Managing the Protection of Information Assets and Systems), George Washington University, Gaithersburg, Maryland
- While at NIST, he completed a Systems Security Officer course and a Certification & Accreditation Process course.
- While at Cygnacom Solutions, he achieved Certification in the Evaluation of Cryptographic Modules, according to the FIPS 140-2 standard on the Security Requirements for Cryptographic Modules.
- Additionally, he further developed his management skills via participation in the Building the Next Generation Leadership Development Program at NIST and completion of a Project Management course at the United States Office of Personnel Management's Western Management Development Center in Denver, Colorado.
Honors and Awards
- RSA Award for Excellence in Public Policy for Leadership in Developing the Advanced Encryption Standard (AES), 2001
- U.S. Department of Commerce Gold Medal for Leadership in Developing the AES (the highest honor granted for distinguished and exceptional performance), 2001
- President's Fellowship, University of Delaware, 1994 - 1996
- Teaching Fellowship (Mathematics), Catonsville Community College, 1993 - 1994
- William Burghardt Turner Fellowship, Stony Brook University, 1991 - 1993
- Graduated Magna Cum Laude, University of Puerto Rico, 1991
- Natural Sciences Honors Scholarship, Turabo University, Puerto Rico, 1987 - 1988
- In summer 2014, he was a Guest Researcher at SICS Swedish ICT (now RISE Research Institutes of Sweden), working to grow the collaboration between the TUB Data Analytics Laboratory and SICS research teams and to pursue a joint EU Horizon 2020 project proposal.
- Formerly, he was a Senior Project Manager in the TUB Security in Telecommunications Group and the Deutsche Telekom Innovation Laboratories, where he was responsible for boosting TUB's overall security research programs (e.g., supporting the Helmholtz Research School on Security Technologies), coordinating widespread activities, and driving TUB security initiatives.
- Prior to joining TUB in August 2012, he was a Project Leader & Senior Researcher in the Software Quality and Process Improvement domain at Fraunhofer FOKUS (Fraunhofer Institute for Open Communication Systems).
- When he lived in the United States, he held scientific (R&D) and engineering-systems-oriented positions at Stanford Research Institute (SRI) International (Menlo Park, California), White Oak Technologies (acquired by Novetta Solutions) (Silver Spring, Maryland), the National Institute of Standards and Technology (Gaithersburg, Maryland), Cygnacom Solutions (an Entrust company) (McLean, Virginia), and Lockheed Martin Management & Data Systems (Valley Forge, Pennsylvania).
- Early in his career, he was a Mathematics Instructor at the Community College of Baltimore County (Baltimore, Maryland), where he taught courses in mathematics, computer programming, and statistical methods.
- A Survey on Transactional Stream Processing, S. Zhang, J. Soto, and V. Markl, VLDB Journal 2022 (submitted).
- Handling Iteration in Distributed Dataflow Systems, G. Gevay, J. Soto, V. Markl, ACM Computing Surveys, October 2021.
- HyMAC: A Hybrid Matrix Computation System (Demo Paper), Z. Chen, Z. Xu, C. Xu, J. Soto, V. Markl, W. Qian, and A. Zhou, VLDB 2021.
- Hybrid Evaluation for Distributed Iterative Matrix Computation, Z. Chen, C. Xu, J. Soto, V. Markl, W. Qian, and A. Zhou, SIGMOD 2021.
- Fault Tolerance for Distributed Iterative Dataflows in Action, C. Xu, R. P. Lemaitre, J. Soto, and V. Markl, PVLDB 11 (12), 1990-1993, VLDB Endowment, 2018.
- A Survey of State Management in Big Data Processing Systems, Q.C. To, J. Soto, and V. Markl, The VLDB Journal 27 (6), 847-872, Springer, 2018.
- Large Scale Data Stream Processing Systems, Book Chapter, Paris Carbone, et al., Handbook of Big Data Technologies, pp. 219-260, Springer, 2017.
- On Fault Tolerance for Distributed Iterative Dataflow Processing, C. Xu, et al., IEEE Transactions on Knowledge & Data Engineering (TKDE), Vol. 29, Issue 8, pp. 1709-1722, IEEE, 2017.
- Efficient Sample Generation for Scalable Meta Learning, S. Schelter, J. Soto, V. Markl, D. Burdick, B. Reinwald, and A. Evfimievski, 31st IEEE International Conference on Data Engineering (ICDE), April 2015.
- Composite Key Generation on a Shared-Nothing Architecture, M. Hoffmann, A. Alexandrov, P. Andritsos, J. Soto, and V. Markl, The Sixth Technology Conference on Performance Evaluation and Benchmarking (TPCTC 2014), collocated with VLDB 2014.
- Challenges and Opportunities in Big Data Generation, C. Brücke, M. Hoffmann, V. Markl, and J. Soto, Big (Data) is Beautiful, Informatiktage 2014, The German Informatics Society (GI), March 27-28, 2014.
- Organisational and Societal Obstacles to Implementations of Technical Systems Supporting PSI Reuse, Nils Barnickel, Edzard Höfig, Jens Klessmann, Juan Soto, Fraunhofer Institute FOKUS, Share PSI (Public Sector Information) Workshop, May 2011, Brussels, Belgium.
- An Experience Report on the Verification of Algorithms in the C++ Standard Library using Frama-C, J. Burghardt, J. Gerlach, H. Pohl, and J. Soto, Proceedings of the International Conference on Formal Verification of Object-oriented Software (FoVeOOS 2010), pages 191-204, June 2010.
- ACSL By Example: Towards a Verified C Standard Library (Version 5.1.0 for Frama-C Boron), J. Burghardt, J. Gerlach, K. Hartig, H. Pohl, and J. Soto, Fraunhofer FIRST Technical Report, May 2010.
- Open-Source Excel Tools for Statistical Metrology, XVIII IMEKO World Congress, September 2006.
- The Limits of Image-based Optical Overlay Metrology, Proceedings of the 31st SPIE International Symposium: Microlithography 2006.
- NIST Special Publication 800-22: A Statistical Test Suite for the Validation of Random Number Generators and Pseudo Random Number Generators for Cryptographic Applications, October 2000.
- NIST Technical Report 6483: Randomness Testing of the Advanced Encryption Standard Finalist Candidates, April 2000.
- Randomness Testing of IBM Hardware Based Random Number Generators, NIST Internal Report, October 1999.
- Statistical Testing of Random Number Generators, Proceedings of the 22nd National Information Systems Security Conference, October 1999.
- NIST Technical Report 6390: Randomness Testing of the Advanced Encryption Standard Candidate Algorithms, September 1999.
- Adaptive Finite-difference Computations of Dendritic Growth Using a Phase-field Model, Modelling Simulation Materials Science Engineering, 5 (1997) 365-380.
- On the Optimization of Kinematics for Planar Robot Linkages, ASEL Technical Report #ROB9508, Applied Science & Engineering Laboratories, University of Delaware, Alfred I. DuPont Institute, Wilmington, DE, December 1995.
- Mathematica and the Newman-Penrose Formalism, Proceedings of the Workshop on Computational Physics, California State University at Fullerton, Editors: Mark Shapiro & Jim Feagin, June 1990.
Teaching Experience: Bachelor's Courses
- Recitation Classes in English, TRIO Program, UPR Humacao (Fall 1988)
- Recitation Classes in Mathematics, TRIO Program, UPR Humacao (Spring 1989)
- Developmental Math, Catonsville Community College (Fall 1993)
- Applied Algebra and Trigonometry, Catonsville Community College (Fall 1993)
- Business Math, Catonsville Community College (Fall 1993)
- Programming in BASIC, Catonsville Community College (Fall 1993)
- Statistical Methods, Catonsville Community College (Winter 1994)
- Developmental Math, Catonsville Community College (Spring 1994)
- Assembly Language Programming, Catonsville Community College (Spring 1994)
- Calculus I, Catonsville Community College (Summer 1994)
- CISC 120: Object Oriented Programming in C++, Univ. of Delaware (WS 1996)
- DBSEM Foundations of Database Systems Seminar, TU Berlin (WS 2019-2020)
- DBSEM Foundations of Database Systems Seminar, TU Berlin (SS 2020)
- DBSEM Foundations of Database Systems Seminar, TU Berlin (WS 2021-2022)
- DBSEM Database Systems Seminar (Data Cleaning), TU Berlin (SS 2021)
- DBSEM Database Systems Seminar (Data Stream Management), TU Berlin (SS 2021)
- DBSEM Database Systems Seminar (Data Cleaning), TU Berlin (WS 2021-2022)
- DBSEM Database Systems Seminar (Data Cleaning), TU Berlin (SS 2022)
- DBSEM Database Systems Seminar (Data Stream Management), TU Berlin (SS 2022)
- DBSEM Database Systems Seminar (Data Cleaning), TU Berlin (WS 2022-2023)
Teaching Experience: Master's Courses
- AIM-3 Scalable Data Science, TU Berlin (WS 2014-2015)
- AIM-3 Scalable Data Science, TU Berlin (SS 2015)
- AIM-3 Scalable Data Science, TU Berlin (WS 2015-2016)
- AIM-3 Scalable Data Science, TU Berlin (SS 2016)
- AIM-3 Scalable Data Science, TU Berlin (WS 2016-2017)
- AIM-3 Scalable Data Science, TU Berlin (SS 2017)
- AIM-3 Scalable Data Science, TU Berlin (WS 2017-2018)
- AIM-3 Scalable Data Science, TU Berlin (SS 2018)
- AIM-3 Scalable Data Science, TU Berlin (WS 2018-2019)
- AIM-3 Scalable Data Science, TU Berlin (SS 2019)
- BDASEM Big Data Analytics Seminar, TU Berlin (WS 2019-2020)
- BDASEM Big Data Analytics Seminar, TU Berlin (SS 2020)
- BDASEM Big Data Analytics Seminar, TU Berlin (WS 2021-2022)
- IMSEM Hot Topics in Information Management Seminar, TU Berlin (WS 2021-2022)
- IMSEM Hot Topics in Information Management Seminar, TU Berlin (WS 2022-2023)
Supervised Bachelor's Theses
- Design, Implementation, and Evaluation of Naive Bayes Classification in Apache Flink, Jonathan Hasenburg, 2015.
- Exploratory Data Analysis in Practice, Jessica Bongard, 2018.
- On the Problem of Data Visualization Method Selection: Challenges and a Proposed Solution, Maximilian Ksoll, 2019.
- An Analysis of Lightweight Cryptographic Algorithms, Nabil Douss, 2019.
- An Analysis of Randomness in Lightweight Cryptographic Algorithms Designed for Internet of Things Applications, Jan-Ulrich Holtgrave, 2020.
- On the Problem of Customer Lifetime Value Prediction, Klea Paci, 2020.
- On the Issue of Fairness in Machine Learning Methods, Florian Schuler, 2020.
- On the Evaluation of Software Quality in Machine Learning Libraries, Arthur Söhler, 2022.
- An Empirical Study on the Capabilities and Limitations of Inclusion Dependency Discovery Algorithms for Data Profiling, Johanna Werhahn, 2022.
- On Fairness and Bias in Data Science, Martin Manolov, 2023.
- Data Transformations in Data Cleaning, David Michel, 2023.
- Forecasting Challenges and Countermeasures, Enea Gurra, 2023.
Supervised Master's Theses
- Identifier Namespaces in Mathematical Notation, Alexey Grigorev, 2015.
- Techniques for Hierarchical Product Classification, Marilyn Nowacka Barros, 2017.
- Anomaly Detection in Online Monitoring, Benjamin Pietrowicz, 2018.
- Visual Analytics for Moving Objects Databases, Fabricio Ferreira da Silva, 2021.
- Automating Data Quality Processes in Big Data Settings, Yalei Li, 2021.
- Towards Continual Learning in Language Modelling, Haroon Rashid, 2021.
- SentiStream: Towards Online Sentiment Learning of Massive Data Streams, Huilin Wu, 2022.
- Towards Real-time Fault Diagnosis: A Hybrid-model Approach, Yuan Li, 2022.
- On the Problem of Combining Models for Improved Anomaly Detection, Liying Yang, 2022.
- On the Problem of Software Quality in Machine Learning Systems, Haftamu Hailu Tefera, 2022.
The webpages of TU Berlin (TUB) consist of numerous offerings from various institutions (e.g., faculties, institutes, chairs, central facilities, administrative facilities) and the personal pages of TUB employees. The institutions and persons who created the webpages are editorially responsible for their content. Please contact the responsible authors of the relevant page if you have questions.
Address of the TU Berlin:
TU Berlin, The President, Prof. Dr. Geraldine Rauch, Strasse des 17. Juni 135, 10623 Berlin.
Editorial and Content Responsibility.
TU Berlin, Juan Soto, Database Systems and Information Management Group, Electrical Engineering and Computer Science (Faculty IV), Einsteinufer 17, 10587 Berlin, juan dot soto at tu-berlin dot de.
Legal Notices on Copyright.
The webpage layout, the graphics used, and all other content are protected by copyright.
Subject of Data Privacy.
Data privacy covers personal data. According to Art. 4(1) of the GDPR (DSGVO), these are data relating to an identified or identifiable individual, that is, all data that could be used to identify you. This applies to data such as name, private address, e-mail address, and telephone number, but also to usage data such as your IP address. The DIMA group observes the legal requirements of data privacy and other applicable regulations. We are committed to ensuring that you can trust us with your personal data. Therefore, transfers of sensitive data are encrypted. In addition, our webpages are protected against damage and unauthorized access by technical measures.
Data Collection and Storage.
In general, you do not need to register your personal data to use our website.
Collection and Storage of Usage Data.
To optimize our websites, we collect and store usage data, such as the website visited, the date and time of access, and the website you came from, for a period of two weeks. After that, the data are deleted automatically.
Right of Access to Personal Data.
You can retrieve information about your stored personal data at any time, free of charge, and without giving reasons. Please contact the address provided below. We will be pleased to assist you if you have any further questions about our data privacy information. Please note that data privacy regulations and the handling of data privacy can change from time to time, so it is advisable to stay informed about changes to data privacy laws and company policies. This data privacy statement applies only to content on TU Berlin and DIMA web servers that provide this statement and does not cover linked websites on external web servers. If you have questions about this policy, please contact us at: TU Berlin, Juan Soto, Database Systems and Information Management Group, Electrical Engineering and Computer Science (Faculty IV), Einsteinufer 17, 10587 Berlin, juan dot soto at tu-berlin dot de.
Editorial and Content Responsibility.
datenschutz at dima dot tu-berlin dot de