Data architects create blueprints for data management systems. After assessing a company’s potential data sources (internal and external), architects design a plan to integrate, centralize, protect and maintain them. This allows employees to access critical information in the right place, at the right time.
Data Architect Responsibilities
A data architect may be required to:
- Collaborate with IT teams and management to devise a data strategy that addresses industry requirements
- Build an inventory of data needed to implement the architecture
- Research new opportunities for data acquisition
- Identify and evaluate current data management technologies
- Create a fluid, end-to-end vision for how data will flow through an organization
- Develop data models for database structures
- Design, document, construct and deploy database architectures and applications (e.g. large relational databases)
- Integrate technical functionality (e.g. scalability, security, performance, data recovery, reliability, etc.)
- Implement measures to ensure data accuracy and accessibility
- Constantly monitor, refine and report on the performance of data management systems
- Meld new systems with existing warehouse structures
- Produce and enforce database development standards
- Maintain a corporate repository of all data architecture artifacts and procedures
You may not be surprised to hear that this is a difficult job. Some companies need data architects who are ninjas in data modeling techniques; others may want experts in data warehousing, ETL tools, SQL databases or data administration. Data architects are likely to be senior-level employees with plenty of years in business intelligence under their belts.
How to Become a Data Architect
1. Pursue a degree in computer science, computer engineering or a related field.
To become a data architect, you should start with a bachelor’s degree in computer science, computer engineering or a related field. Coursework should include coverage of data management, programming, big data developments, systems analysis and technology architectures. For senior positions, a master’s degree is usually preferred.
The key aspect of your employment application may be experience. Top employers are likely to expect job candidates to have spent at least five years dealing with application architecture, network management and performance management.
2. Develop and grow in your technical and business skills from data mining to analytical problem solving.
Technical Skills for Data Architects
- Application server software (e.g. Oracle)
- Database management system software (e.g. Microsoft SQL Server)
- User interface and query software (e.g. IBM DB2)
- Enterprise application integration software (e.g. XML)
- Development environment software
- Backup/archival software
- Agile methodologies and ERP implementation
- Predictive modeling, NLP and text analysis
- Data modeling tools (e.g. ERWin, Enterprise Architect and Visio)
- Data mining
- ETL tools
- Python, C/C++, Java, Perl
- UNIX, Linux, Solaris and MS Windows
- Hadoop and NoSQL databases
- Machine learning
- Data visualization
As always, this list is subject to changes in technology.
Business Skills for Data Architects
- Analytical Problem-Solving: Approaching high-level data challenges with a clear eye on what is important; employing the right approach/methods to make the maximum use of time and human resources.
- Effective Communication: Carefully listening to management, data analysts and relevant staff to come up with the best data design; explaining complex concepts to non-technical colleagues.
- Expert Management: Effectively directing and advising a team of data modelers, data engineers, database administrators and junior architects.
- Industry Knowledge: Understanding the way your chosen industry functions and how data are collected, analyzed and utilized; maintaining flexibility in the face of big data developments.
3. Consider additional certifications and further learning.
Companies such as IBM, Salesforce and Hortonworks offer several opportunities to expand your expertise as a data architect. When in doubt, consult your mentors, examine recent job descriptions and check out articles such as Tom’s IT Pro’s Best Database Certifications to decide which acronyms are worth your time and money.
Certified Data Management Professional (CDMP)
Developed by the Data Management Association International (DAMA), the CDMP could be the most frequently listed certification on data architects’ résumés. Since it doesn’t focus on a particular platform or vendor, it’s a solid credential for general database professionals.
The CDMP is offered at four levels – associate, practitioner, master and fellow – and awarded to candidates who provide evidence of education, experience and passing results on the CDMP’s professional knowledge exam. Proof of continuing education and professional activity is required to re-certify.
Hortonworks Data Flow Certified NiFi Architect (HDFCNA)
Before taking the exam, which covers data flow design, cluster management, security and configuration management, candidates for the HDFCNA credential should be familiar with system properties, UI components, NiFi connections, controller configurations, process and remote groups, NiFi expression language and other skills outlined in Hortonworks’ objectives.
IBM Certified Data Architect – Big Data
This IBM professional certification program requires that candidates possess a myriad of prerequisite skills from understanding cluster management and data replication to data lineage and LDAP security. The exam for the Certified Data Architect also focuses on BigInsights, BigSQL, Hadoop, and Cloudant.
Salesforce Certified Data Architecture and Management Designer
Designed for candidates with five or more years of experience working with the Force.com platform, the data architecture and management designer certification exam tests understanding of large data volume (LDV) risks and mitigation strategies, LDV considerations, best practices in an LDV environment, design trade-offs and other skills.
Simplilearn’s Big Data Hadoop Architect Masters Program
The master’s certification program offered by Simplilearn is designed to help expand your skills and understanding of data model creation, database interfaces, SparkSQL, Scala, RDD, replication, scalability, and Hadoop clusters among others. In a recommended path of about 21 weeks, candidates will learn from instructors and self-led methods by participating in projects and lab experiences.
TOGAF® 9 Certification Program
The TOGAF Professional Certification is earned in two stages: foundation, then certification. The foundation stage ensures that candidates know the terms and basic concepts of TOGAF 9 and the core principles of enterprise architecture.
An Interview with a Real Data Architect
We spoke with Craig Statchuk, Big Data Architect at IBM, to learn more about the responsibilities of data architects. Below, Craig discusses the pros and cons of his job, how the data architect position has changed over time, and his advice for students interested in becoming data architects.
A: The good part is you start most days in the new big data world. This includes everything within the role of a big data architect – someone who fulfills the needs of the entire enterprise beyond IT. In effect, this role is about taking care of more users in more places. Therefore, the pro is that most days you’ll start with a clean slate. You may not know what the day holds for you, but by lunchtime, you’ll have a long list of things to work on, to create, and hopefully resolve in a short period of time. There’s a lot of value placed on immediate results. Instead of, “let’s come up with a great design or a great, long-term vision necessarily,” it’s more like, “I want to see the answers now.”
The con is that it doesn’t quite work that way. You have to keep the long-term vision in place because you’ll see more and more of the same kinds of questions and the same requests. So the architecture that allows you to respond quickly to a wide variety of questions will serve you very well.
There’s a tendency to do things quick and easy, but the truth is you need time to think and really plan, and time is a scarce resource in a typical workday. So finding time to think about what you’ve done right and how to move forward is really the secret to doing the job well. But right now, it’s the push and a pull that makes the job difficult.
A: That’s exactly it. Everything is short term. It’s the “I need it yesterday” mentality. The problem with this is that it leaves little room to focus on quality or other issues, such as governance or the idea that you need to serve not only your users but also the business. Those two are always in opposition because the users want things now and the business wants things done right. You have to balance the two.
A: The role is changing, and it’s growing quickly. A data scientist 10 years ago built the data warehouses and conducted the analysis in a constrained way. Nowadays, we’re seeing less demand for that, although it still exists in the office of finance, for example, since finance can easily categorize and quantify values according to accounting standards. The rest of the business wants the same kind of analysis so they can look at their data in a variety of ways, but they don’t necessarily have the rigidity or the structure necessary for that.
So what we’re seeing is the classic hype cycle with greater demand for new ways of looking at data. The truth is it’s not going to pan out as well as we hope. There’s going to be some disillusionment or dissatisfaction with the initial results. But at the end of the day, or let’s say at the end of two years, you will have an organization that is agile, able to answer more questions about the business and its customers, and more knowledgeable about what can be accomplished quicker and more accurately than they are today.
More data actually doesn’t make us smarter if we don’t have the ability to consume it. In some ways, it actually makes us less knowledgeable.
For instance, if you have more data, relatively speaking, but you lack the ability to process and understand it, then you know less than you used to. The way to combat that is to have systems that can adapt to the new data, understand and categorize it, and deliver it to more users quicker than before. That lets you turn the tide against big data. Without the proper resources in place, big data can result in significant confusion, but if it’s well-organized and well-provisioned, it can be the source to greater understanding.
So you have to balance that every day, which involves figuring out how to make the data more reusable.
A: Traditionally, we came from the world of Java and structured, traditional languages such as that. This was the language of the server, and we could use it with minimal modifications on the browser; so heritage played a big part in moving us in that direction. But nowadays, we’re seeing movement towards more flexible languages, and towards more data-oriented and statistically oriented languages. My personal favorite is Python because it lets me be a computer scientist and have access to the statistics and other analytics that I need. Other people look at using both SPSS and languages such as R on a regular basis because they provide strong statistical packages, and the programming is often much easier and more accessible.
A: I graduated with a mathematics degree in 1983 from the University of Waterloo. At that time, we were very much into data structures, programming, and databases. We were taught things like first normal form and third normal form. This was standard for a good 20 years. We did structured programming and dealt with structured data. We worked with databases in a way that made users happy, and we answered certain types of questions very well. As we move forward, however, that data structure hasn’t been serving us as well.
For instance, we weren’t always able to understand just how quickly data was changing, and it took us a while before realizing that our way of processing could no longer keep up. So what we’ve arrived at is something that I can only call the new normal form. It represents data that’s just good enough and clean enough but not perfect. This is different from the past way of doing it, in which one piece of data points to other pieces of data, which then would lead us back to our understanding or something related to different pieces of data. Today, we have plenty of look-up tables and other things that help serve the business. The problem with those traditional data structures was that they didn’t allow us to answer enough questions.
The domain of questions that they could answer was really limited. But now we’ve come full circle with these giant spreadsheets. They represent rows and columns of data with lots of gaps, many inconsistencies, and lots and lots of columns. The new tools enable us to build good queries and build data that’s even more reliable than the stuff we used to ETL and put into those structured forms that I already mentioned.
The new solution is to take data and make it as reusable and as accurate as possible without sacrificing flexibility. That becomes a new role with a new focus. Now, I have to ask myself, “How do I produce data with maximum reusability within the company while also making it as accurate and as important as possible for the part of the business responsible for the systems of record that run the business?” You still have to service those, but now we have to service what we call “systems of engagement,” which is how we understand our customers, our employees, and even the products we build.
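To make the contrast between the older normal forms and the newer wide, gap-filled tables concrete, here is a minimal Python sketch. The column names and sample rows are invented for illustration and are not from the interview; the split into two tables is only in the spirit of normalization, not a formal third-normal-form design.

```python
# Hypothetical wide, spreadsheet-style rows: customer facts repeat and some cells are empty.
WIDE_ROWS = [
    {"order_id": 1, "customer": "Acme", "city": "Toronto", "item": "widget", "qty": 2},
    {"order_id": 2, "customer": "Acme", "city": "Toronto", "item": "gear", "qty": None},
    {"order_id": 3, "customer": "Birch", "city": None, "item": "widget", "qty": 5},
]


def normalize(rows):
    """Split wide rows into a customers table and an orders table linked by customer_id."""
    customers = {}  # customer name -> {"customer_id": ..., "city": ...}
    orders = []
    for row in rows:
        name = row["customer"]
        if name not in customers:
            # Store each customer's facts once instead of repeating them on every order.
            customers[name] = {"customer_id": len(customers) + 1, "city": row["city"]}
        orders.append({
            "order_id": row["order_id"],
            "customer_id": customers[name]["customer_id"],
            "item": row["item"],
            "qty": row["qty"],
        })
    return customers, orders


def gap_count(rows):
    """Count empty cells, the 'gaps' a wide table tends to accumulate."""
    return sum(1 for row in rows for value in row.values() if value is None)


customers, orders = normalize(WIDE_ROWS)
print(len(customers), "customers,", len(orders), "orders,", gap_count(WIDE_ROWS), "gaps")
```

The wide form is easy to query for ad hoc questions but repeats facts and tolerates gaps; the split form removes the repetition at the cost of an extra join.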
A: Nowadays, data architects come with many different skills and backgrounds. For instance, unlike 20 years ago, a pure data or computer scientist background may not be as helpful. The new skills are to understand the needs of the user so that you can build data and systems that will answer their problems now and in the future. We have to become proactive, and I liken the job to that of an inventor. In other words, we have to invent the solutions that users are going to ask for, not necessarily tomorrow but six months from now or even two years from now. That requires someone who is both innovative and able to focus on the task today but also someone who can look into the future and say, “Hey, here’s what they’re likely going to ask about in two years. How do I create my data? How do I create my business to serve me better down the road?”
We don’t want to be chasing wild geese, but we want to be able to predict and do the kind of processing that users are going to need down the road. That’s a difficult job to do. This opens up opportunities for skilled candidates with a variety of backgrounds. For instance, you could have an art or a business degree, and then you can come in to the technical side of the business. In fact, that may be the best possible way to get the broad understanding of the business and then the ability to actually execute it.
A: So I think the best piece of advice I can give is to become an expert. Become the absolute best you can be at a particular field of interest. I don’t care if that’s accounting, psychology, or data management. In five years, your job will be totally different than it is today. And you’re going to have to learn new skills all over again. The only way to survive is to anticipate that you’ll have to become an expert in a new set of skills in order to meet future demands. I need to do that every few years in my career. You have to get used to it, and you have to get good at it.
The ability to turn your career towards the next big thing, whether it’s Hadoop, Spark or maybe data preparation for a line of business, is essential. You’re going to have to understand that, and you’re going to have to help the people who need to do these functions by being the expert they can trust.
A: I think the data field is exploding, but our ability to process it is falling behind. The solution for the industry is scalability. However, I don’t think it’s in the way that we traditionally think about throwing more hardware at the problem. We need the ability to share and collaborate more so that everyone in a business participates. This allows us to divide the efforts and move in a common direction.
But I look at data science today, and I see we have individual people doing the entire analysis from start to finish – gathering the data, cleaning it, doing the analytics, creating the visualization and presenting the results. The problem standing in the way of scalability is that, for each step along the way in that data science activity, all that knowledge is lost as soon as I complete my task. We need to move the industry towards sharing the work and sharing the role so that one person produces a reusable data set, the next person produces reusable analytics on top of it, and then we can all present the results more quickly and accurately.
I think this funnel is going to eventually constrict to the point that we will have to do something better. However, hiring more data scientists isn’t necessarily the answer. Creating more processing power or even Spark 2, which gives us more processing power than we ever imagined, isn’t part of the solution, either. Rather, it involves getting teams of people to move each processing step forward, and then reusing that work so we don’t keep reinventing the wheel and doing everything from the ground up.
We used to say, in business analytics, you had a single version of the truth. Now we have to move towards the best-supported, most acceptable version of the truth, which allows us to get to the truth quicker and easier. As a team, we are providing the data and then sharing the results so that we can all reuse them and turn them into answers for the benefit of the business.
2018 Data Architect Average Salaries
Home to Silicon Valley, San Francisco tops the list of best-paying cities for data architects. According to PayScale data retrieved on January 30, 2018, the median pay for data architects is $112,597. San Francisco offers an average salary of $171,914, 45% above the national average as of January 2018. Washington and New York are runners-up, with median pay about 30% or more above the national average.
Data Architect Salary for 2018: How much does a data architect make?
Average Salary: $112,764 per year
Median Salary: $112,597 per year
Total Pay Range: $73,801 – $152,539
Senior Data Architect
Average Salary: $121,693 per year
Note: Salary information from Glassdoor and PayScale was retrieved as of January 2018.
Jobs Similar to Data Architect
You can take a variety of paths to become a data architect. Folks may get their start working as Database Administrators (DBAs) or entry-level programmers. By concentrating on the day-to-day tasks involved with data management (e.g. installation, upgrades, back-up and recovery, etc.), DBAs gain an understanding of how data are stored and used.
Perhaps the closest job to an architect is a Data Engineer. When reviewing the two careers, we can see that architects and engineers approach their work with data differently. Architects develop the architecture that captures, integrates, organizes, centralizes and maintains data. Engineers engage in development, testing and maintenance to keep that data accessible and primed for analysis.
Data architects do not analyze data. Instead, they make it available to others. If you’re interested in playing in the analyst sandbox, you could consider becoming a:
Data Architect Jobs
Database administration, a field close to data architecture, is expected to see an 11 percent increase in jobs from 2016 to 2026, according to the Bureau of Labor Statistics. With the introduction of data structure designs, business communities began to recognize the value in how data was structured over programs. According to A Brief History of Data Architecture: Shifting Paradigms, after the development of SQL in the 1980s, companies began to provide tools and software like Oracle Development and PowerBuilder to accompany and support data architecture.
In the past, building back-end data management systems may have been relatively straightforward. Architects might set up a warehouse, structure and consolidate information into an SQL database and make data available to individual departments. Job done.
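As a rough illustration of that traditional workflow, the sketch below uses Python's built-in sqlite3 module to consolidate two sources into one SQL database and expose a view for a department. The table names, columns and sample rows are hypothetical, and an in-memory SQLite database stands in for a real warehouse.

```python
import sqlite3

# Hypothetical extracts from two departmental systems (invented sample data).
SALES_ROWS = [("2018-01-05", "west", 1200.0), ("2018-01-06", "east", 800.0)]
SUPPORT_ROWS = [("2018-01-05", "west", 3), ("2018-01-06", "west", 1)]


def build_warehouse(conn):
    """Create the tables, load both sources, and define a consolidated view."""
    conn.execute("CREATE TABLE sales (day TEXT, region TEXT, revenue REAL)")
    conn.execute("CREATE TABLE support_tickets (day TEXT, region TEXT, tickets INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", SALES_ROWS)
    conn.executemany("INSERT INTO support_tickets VALUES (?, ?, ?)", SUPPORT_ROWS)
    # The view gives a department one consolidated slice without touching the base tables.
    conn.execute(
        """
        CREATE VIEW regional_summary AS
        SELECT s.region,
               SUM(s.revenue) AS revenue,
               (SELECT COALESCE(SUM(t.tickets), 0)
                  FROM support_tickets t
                 WHERE t.region = s.region) AS tickets
          FROM sales s
         GROUP BY s.region
        """
    )


def regional_summary(conn):
    """Return (region, revenue, tickets) rows from the consolidated view."""
    query = "SELECT region, revenue, tickets FROM regional_summary ORDER BY region"
    return conn.execute(query).fetchall()


conn = sqlite3.connect(":memory:")
build_warehouse(conn)
for row in regional_summary(conn):
    print(row)
```

A view is only one of several ways to hand data to a department; it is used here because it keeps the consolidated definition in a single place.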
As information floods the market, analysts are likely to demand access to all kinds of unstructured data (e.g. audio, video, text) that could help in making business decisions. This leaves architects with the task of mixing new technologies (e.g. Hadoop) with existing relational databases to create flexible infrastructures that are cost-effective and secure.
As Dip Kharod notes, big data architects should ask themselves:
“How do I build a platform that provides just enough information in the hands of the business to make timely decisions while processing a massive amount of data that allows advanced analytics to answer never-before-asked questions in a secure environment?”
Professional Organizations for Data Architects
- Data Management Association International (DAMA)
- Institute for the Certification of Computing Professionals (ICCP)
- The Data Warehousing Institute (TDWI)