Chapter 12. The Future of Data Systems
If a thing be ordained to another as to its end, its last end cannot consist in the preservation of its being. Hence a captain does not intend as a last end, the preservation of the ship entrusted to him, since a ship is ordained to something else as its end, viz. to navigation.
(Often quoted as: If the highest aim of a captain were to preserve his ship, he would keep it in port forever.)
St. Thomas Aquinas, Summa Theologica (1265–1274)
So far, this book has been mostly about describing things as they are at present. In this final chapter, we will shift our perspective toward the future and discuss how things should be: I will propose some ideas and approaches that, I believe, may fundamentally improve the ways we design and build applications.
Opinions and speculation about the future are of course subjective, and so I will use the first person in this chapter when writing about my personal opinions. You are welcome to disagree with them and form your own opinions, but I hope that the ideas in this chapter will at least be a starting point for a productive discussion and bring some clarity to concepts that are often confused.
The goal of this book was outlined in Chapter 1: to explore how to create applications and systems that are reliable, scalable, and maintainable. These themes have run through all of the chapters: for example, we discussed many fault-tolerance algorithms that help improve reliability, partitioning to improve scalability, and mechanisms for evolution and abstraction that improve maintainability. In this chapter we will bring all of these ideas together, and build on them to envisage the future. Our goal is to discover how to design applications that are better than the ones of today—robust, correct, evolvable, and ultimately beneficial to humanity.
Data Integration
A recurring theme in this book has been that for any given problem, there are several solutions, all of which have different pros, cons, and trade-offs. For example, when discussing storage engines in Chapter 3, we saw log-structured storage, B-trees, and column-oriented storage. When discussing replication in Chapter 5, we saw single-leader, multi-leader, and leaderless approaches.
If you have a problem such as “I want to store some data and look it up again later,” there is no one right solution, but many different approaches that are each appropriate in different circumstances. A software implementation typically has to pick one particular approach. It’s hard enough to get one code path robust and performing well—trying to do everything in one piece of software almost guarantees that the implementation will be poor.
Thus, the most appropriate choice of software tool also depends on the circumstances. Every piece of software, even a so-called “general-purpose” database, is designed for a particular usage pattern.
Faced with this profusion of alternatives, the first challenge is then to figure out the mapping between the software products and the circumstances in which they are a good fit. Vendors are understandably reluctant to tell you about the kinds of workloads for which their software is poorly suited, but hopefully the previous chapters have equipped you with some questions to ask in order to read between the lines and better understand the trade-offs.
However, even if you perfectly understand the mapping between tools and circumstances for their use, there is another challenge: in complex applications, data is often used in several different ways. There is unlikely to be one piece of software that is suitable for all the different circumstances in which the data is used, so you inevitably end up having to cobble together several different pieces of software in order to provide your application’s functionality.
Combining Specialized Tools by Deriving Data
For example, it is common to need to integrate an OLTP database with a full-text search index in order to handle queries for arbitrary keywords. Although some databases (such as PostgreSQL) include a full-text indexing feature, which can be sufficient for simple applications [1], more sophisticated search facilities require specialist information retrieval tools. Conversely, search indexes are generally not very suitable as a durable system of record, and so many applications need to combine two different tools in order to satisfy all of the requirements.
We touched on the issue of integrating data systems in “Keeping Systems in Sync”. As the number of different representations of the data increases, the integration problem becomes harder. Besides the database and the search index, perhaps you need to keep copies of the data in analytics systems (data warehouses, or batch and stream processing systems); maintain caches or denormalized versions of objects that were derived from the original data; pass the data through machine learning, classification, ranking, or recommendation systems; or send notifications based on changes to the data.
Surprisingly often I see software engineers make statements like, “In my experience, 99% of people only need X” or “…don’t need X” (for various values of X). I think that such statements say more about the experience of the speaker than about the actual usefulness of a technology. The range of different things you might want to do with data is dizzyingly wide. What one person considers to be an obscure and pointless feature may well be a central requirement for someone else. The need for data integration often only becomes apparent if you zoom out and consider the dataflows across an entire organization.
Reasoning about dataflows
When copies of the same data need to be maintained in several storage systems in order to satisfy different access patterns, you need to be very clear about the inputs and outputs: where is data written first, and which representations are derived from which sources? How do you get data into all the right places, in the right formats?
For example, you might arrange for data to first be written to a system of record database, capturing the changes made to that database (see “Change Data Capture”) and then applying the changes to the search index in the same order. If change data capture (CDC) is the only way of updating the index, you can be confident that the index is entirely derived from the system of record, and therefore consistent with it (barring bugs in the software). Writing to the database is the only way of supplying new input into this system.
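The idea of deriving a search index from an ordered change stream can be sketched in a few lines. This is a minimal illustration, not a real CDC client: the `ChangeEvent` fields and the `SearchIndex` API are hypothetical stand-ins for a database's change feed and a search engine's indexing interface.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ChangeEvent:
    offset: int               # position in the change log (the total order)
    key: str                  # primary key in the system of record
    document: Optional[dict]  # new value, or None for a deletion

class SearchIndex:
    def __init__(self):
        self.docs = {}

    def apply(self, event):
        # Applying events strictly in log order keeps the index
        # consistent with the system of record it is derived from.
        if event.document is None:
            self.docs.pop(event.key, None)
        else:
            self.docs[event.key] = event.document

index = SearchIndex()
change_log = [
    ChangeEvent(0, "user:1", {"name": "Alice"}),
    ChangeEvent(1, "user:1", {"name": "Alice Smith"}),
    ChangeEvent(2, "user:2", {"name": "Bob"}),
    ChangeEvent(3, "user:2", None),  # user:2 was deleted
]
for event in change_log:             # processed in the same order as the log
    index.apply(event)
print(index.docs)                    # {'user:1': {'name': 'Alice Smith'}}
```

Because every consumer sees the events in the same order, a second derived view (a cache, say) built from the same log would end up consistent with this one.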
Allowing the application to directly write to both the search index and the database introduces the problem shown in Figure 11-4, in which two clients concurrently send conflicting writes, and the two storage systems process them in a different order. In this case, neither the database nor the search index is “in charge” of determining the order of writes, and so they may make contradictory decisions and become permanently inconsistent with each other.
If it is possible for you to funnel all user input through a single system that decides on an ordering for all writes, it becomes much easier to derive other representations of the data by processing the writes in the same order. This is an application of the state machine replication approach that we saw in “Total Order Broadcast”. Whether you use change data capture or an event sourcing log is less important than simply the principle of deciding on a total order.
Updating a derived data system based on an event log can often be made deterministic and idempotent (see “Idempotence”), making it quite easy to recover from faults.
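One common way to get that idempotence is to track the offset of the last event applied, so that replaying the log after a crash cannot apply any event twice. A minimal sketch, with illustrative names:

```python
class DerivedView:
    def __init__(self):
        self.state = {}
        self.last_applied = -1   # offset of the last event applied

    def apply(self, offset, key, value):
        if offset <= self.last_applied:
            return               # already applied: replay is a no-op (idempotence)
        self.state[key] = value  # deterministic: output depends only on the event
        self.last_applied = offset

view = DerivedView()
events = [(0, "a", 1), (1, "b", 2), (2, "a", 3)]
for e in events:
    view.apply(*e)
# Simulate crash recovery: replaying the whole log leaves the state unchanged.
for e in events:
    view.apply(*e)
print(view.state)  # {'a': 3, 'b': 2}
```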
Derived data versus distributed transactions
The classic approach for keeping different data systems consistent with each other involves distributed transactions, as discussed in “Atomic Commit and Two-Phase Commit (2PC)”. How does the approach of using derived data systems fare in comparison to distributed transactions?
At an abstract level, they achieve a similar goal by different means. Distributed transactions decide on an ordering of writes by using locks for mutual exclusion (see “Two-Phase Locking (2PL)”), while CDC and event sourcing use a log for ordering. Distributed transactions use atomic commit to ensure that changes take effect exactly once, while log-based systems are often based on deterministic retry and idempotence.
The biggest difference is that transaction systems usually provide linearizability (see “Linearizability”), which implies useful guarantees such as reading your own writes (see “Reading Your Own Writes”). On the other hand, derived data systems are often updated asynchronously, and so they do not by default offer the same timing guarantees.
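One way to recover a read-your-writes guarantee on top of an asynchronously updated view is for the client to remember the log offset of its own write and have reads wait until the view has caught up to at least that offset. This is a sketch of the idea only; the `View` class and its fields are illustrative, not any particular system's API:

```python
import time

class View:
    """A derived view that is updated asynchronously from a log."""
    def __init__(self):
        self.applied_offset = -1   # offset of the last log entry applied
        self.data = {}

def read_your_writes(view, key, min_offset, timeout=1.0, poll=0.01):
    # Block until the view has applied at least the offset of the
    # client's own write, then serve the read from the view.
    deadline = time.monotonic() + timeout
    while view.applied_offset < min_offset:
        if time.monotonic() > deadline:
            raise TimeoutError("derived view has not caught up")
        time.sleep(poll)
    return view.data.get(key)

view = View()
view.data["x"] = 1
view.applied_offset = 5            # the view has already caught up
assert read_your_writes(view, "x", min_offset=5) == 1
```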
Within limited environments that are willing to pay the cost of distributed transactions, they have been used successfully. However, I think that XA has poor fault tolerance and performance characteristics (see “Distributed Transactions in Practice”), which severely limit its usefulness. I believe that it might be possible to create a better protocol for distributed transactions, but getting such a protocol widely adopted and integrated with existing tools would be challenging, and unlikely to happen soon.
In the absence of widespread support for a good distributed transaction protocol, I believe that log-based derived data is the most promising approach for integrating different data systems. However, guarantees such as reading your own writes are useful, and I don’t think that it is productive to tell everyone “eventual consistency is inevitable—suck it up and learn to deal with it” (at least not without good guidance on how to deal with it).
In “Aiming for Correctness” we will discuss some approaches for implementing stronger guarantees on top of asynchronously derived systems, and work toward a middle ground between distributed transactions and asynchronous log-based systems.
The limits of total ordering
With systems that are small enough, constructing a totally ordered event log is entirely feasible (as demonstrated by the popularity of databases with single-leader replication, which construct precisely such a log). However, as systems are scaled toward bigger and more complex workloads, limitations begin to emerge:
-
In most cases, constructing a totally ordered log requires all events to pass through a single leader node that decides on the ordering. If the throughput of events is greater than a single machine can handle, you need to partition it across multiple machines (see “Partitioned Logs”). The order of events in two different partitions is then ambiguous.
-
If the servers are spread across multiple geographically distributed datacenters, for example in order to tolerate an entire datacenter going offline, you typically have a separate leader in each datacenter, because network delays make synchronous cross-datacenter coordination inefficient (see “Multi-Leader Replication”). This implies an undefined ordering of events that originate in two different datacenters.
-
When applications are deployed as microservices (see “Dataflow Through Services: REST and RPC”), a common design choice is to deploy each service and its durable state as an independent unit, with no durable state shared between services. When two events originate in different services, there is no defined order for those events.
-
Some applications maintain client-side state that is updated immediately on user input (without waiting for confirmation from a server), and even continue to work offline (see “Clients with offline operation”). With such applications, clients and servers are very likely to see events in different orders.
In formal terms, deciding on a total order of events is known as total order broadcast, which is equivalent to consensus (see “Consensus algorithms and total order broadcast”). Most consensus algorithms are designed for situations in which the throughput of a single node is sufficient to process the entire stream of events, and these algorithms do not provide a mechanism for multiple nodes to share the work of ordering the events. It is still an open research problem to design consensus algorithms that can scale beyond the throughput of a single node and that work well in a geographically distributed setting.
Ordering events to capture causality
In cases where there is no causal link between events, the lack of a total order is not a big problem, since concurrent events can be ordered arbitrarily. Some other cases are easy to handle: for example, when there are multiple updates of the same object, they can be totally ordered by routing all updates for a particular object ID to the same log partition. However, causal dependencies sometimes arise in more subtle ways (see also “Ordering and Causality”).
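The routing trick mentioned above is just consistent hashing of the object ID. A minimal sketch (the partition count and ID format are illustrative):

```python
import hashlib

NUM_PARTITIONS = 4

def partition_for(object_id):
    # Hash the object ID so that every update to the same object is
    # routed to the same partition, and is therefore totally ordered
    # relative to all other updates of that object.
    digest = hashlib.md5(object_id.encode()).digest()
    return int.from_bytes(digest[:4], "big") % NUM_PARTITIONS

# Two updates to the same object always land in the same partition:
assert partition_for("user:42") == partition_for("user:42")
assert 0 <= partition_for("user:42") < NUM_PARTITIONS
```

This gives per-object ordering cheaply, but it says nothing about the relative order of updates to two different objects, which is exactly where the subtler causal dependencies below come in.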
For example, consider a social networking service, and two users who were in a relationship but have just broken up. One of the users removes the other as a friend, and then sends a message to their remaining friends complaining about their ex-partner. The user’s intention is that their ex-partner should not see the rude message, since the message was sent after the friend status was revoked.
However, in a system that stores friendship status in one place and messages in another place, that ordering dependency between the unfriend event and the message-send event may be lost. If the causal dependency is not captured, a service that sends notifications about new messages may process the message-send event before the unfriend event, and thus incorrectly send a notification to the ex-partner.
In this example, the notifications are effectively a join between the messages and the friend list, making it related to the timing issues of joins that we discussed previously (see “Time-dependence of joins”). Unfortunately, there does not seem to be a simple answer to this problem [2, 3]. Starting points include:
-
Logical timestamps can provide total ordering without coordination (see “Sequence Number Ordering”), so they may help in cases where total order broadcast is not feasible. However, they still require recipients to handle events that are delivered out of order, and they require additional metadata to be passed around.
-
If you can log an event to record the state of the system that the user saw before making a decision, and give that event a unique identifier, then any later events can reference that event identifier in order to record the causal dependency [4]. We will return to this idea in “Reads are events too”.
-
Conflict resolution algorithms (see “Automatic Conflict Resolution”) help with processing events that are delivered in an unexpected order. They are useful for maintaining state, but they do not help if actions have external side effects (such as sending a notification to a user).
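The logical timestamps in the first option above can be as simple as a Lamport clock: each node keeps a counter that is incremented on every local event and fast-forwarded whenever a larger timestamp arrives from elsewhere. A minimal sketch, using the unfriend/message example:

```python
class LamportClock:
    def __init__(self):
        self.time = 0

    def local_event(self):
        self.time += 1
        return self.time

    def receive(self, remote_time):
        # Merge: take the max of local and remote time, then increment,
        # so anything that causally follows gets a larger timestamp.
        self.time = max(self.time, remote_time) + 1
        return self.time

friends_svc, messages_svc = LamportClock(), LamportClock()
t_unfriend = friends_svc.local_event()        # unfriend event
messages_svc.receive(t_unfriend)              # messages service hears of it
t_message = messages_svc.local_event()        # message-send event
assert t_unfriend < t_message                 # causal order is preserved
```

The catch, as noted above, is that a recipient comparing timestamps still has to cope with events arriving out of timestamp order, and the timestamps must travel with every event.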
Perhaps, over time, patterns for application development will emerge that allow causal dependencies to be captured efficiently, and derived state to be maintained correctly, without forcing all events to go through the bottleneck of total order broadcast.
Batch and Stream Processing
I would say that the goal of data integration is to make sure that data ends up in the right form in all the right places. Doing so requires consuming inputs, transforming, joining, filtering, aggregating, training models, evaluating, and eventually writing to the appropriate outputs. Batch and stream processors are the tools for achieving this goal.
The outputs of batch and stream processes are derived datasets such as search indexes, materialized views, recommendations to show to users, aggregate metrics, and so on (see “The Output of Batch Workflows” and “Uses of Stream Processing”).
As we saw in Chapter 10 and Chapter 11, batch and stream processing have a lot of principles in common, and the main fundamental difference is that stream processors operate on unbounded datasets whereas batch process inputs are of a known, finite size. There are also many detailed differences in the ways the processing engines are implemented, but these distinctions are beginning to blur.
Spark performs stream processing on top of a batch processing engine by breaking the stream into microbatches, whereas Apache Flink performs batch processing on top of a stream processing engine [5]. In principle, one type of processing can be emulated on top of the other, although the performance characteristics vary: for example, microbatching may perform poorly on hopping or sliding windows [6].
Maintaining derived state
Batch processing has a quite strong functional flavor (even if the code is not written in a functional programming language): it encourages deterministic, pure functions whose output depends only on the input and which have no side effects other than the explicit outputs, treating inputs as immutable and outputs as append-only. Stream processing is similar, but it extends operators to allow managed, fault-tolerant state (see “Rebuilding state after a failure”).
The principle of deterministic functions with well-defined inputs and outputs is not only good for fault tolerance (see “Idempotence”), but also simplifies reasoning about the dataflows in an organization [7]. No matter whether the derived data is a search index, a statistical model, or a cache, it is helpful to think in terms of data pipelines that derive one thing from another, pushing state changes in one system through functional application code and applying the effects to derived systems.
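In code, the principle amounts to this: the derivation is a pure function of the event log, so rerunning it over the same input always rebuilds the same derived state. A toy sketch:

```python
def derive_cache(events):
    """Pure, deterministic derivation: no side effects, and the output
    depends only on the input events (treated as immutable)."""
    cache = {}
    for key, value in events:   # later events override earlier ones
        cache[key] = value
    return cache

events = [("a", 1), ("b", 2), ("a", 3)]
cache = derive_cache(events)
# Determinism: rerunning over the same log yields exactly the same state,
# which is what makes fault recovery and reprocessing safe.
assert derive_cache(events) == cache
print(cache)  # {'a': 3, 'b': 2}
```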
In principle, derived data systems could be maintained synchronously, just like a relational database updates secondary indexes synchronously within the same transaction as writes to the table being indexed. However, asynchrony is what makes systems based on event logs robust: it allows a fault in one part of the system to be contained locally, whereas distributed transactions abort if any one participant fails, so they tend to amplify failures by spreading them to the rest of the system (see “Limitations of distributed transactions”).
We saw in “Partitioning and Secondary Indexes” that secondary indexes often cross partition boundaries. A partitioned system with secondary indexes either needs to send writes to multiple partitions (if the index is term-partitioned) or send reads to all partitions (if the index is document-partitioned). Such cross-partition communication is also most reliable and scalable if the index is maintained asynchronously [8] (see also “Multi-partition data processing”).
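The document-partitioned case can be pictured as scatter/gather: each partition maintains a local index of its own documents, and a read fans out to every partition and merges the results. A minimal sketch with made-up data:

```python
# Each partition's local secondary index: term -> document IDs held locally.
partitions = [
    {"apple": ["doc1"], "pear": ["doc2"]},   # partition 0
    {"apple": ["doc3"]},                     # partition 1
]

def search_document_partitioned(term):
    # The read must be scattered to all partitions and the partial
    # results gathered, since any partition may hold matching documents.
    results = []
    for local_index in partitions:
        results.extend(local_index.get(term, []))
    return sorted(results)

print(search_document_partitioned("apple"))  # ['doc1', 'doc3']
```

A term-partitioned index inverts the trade-off: reads go to the one partition owning the term, but each write must update every partition that owns one of the document's terms.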
Reprocessing data for application evolution
When maintaining derived data, batch and stream processing are both useful. Stream processing allows changes in the input to be reflected in derived views with low delay, whereas batch processing allows large amounts of accumulated historical data to be reprocessed in order to derive new views onto an existing dataset.
In particular, reprocessing existing data provides a good mechanism for maintaining a system, evolving it to support new features and changed requirements (see Chapter 4). Without reprocessing, schema evolution is limited to simple changes like adding a new optional field to a record, or adding a new type of record. This is the case both in a schema-on-write and in a schema-on-read context (see “Schema flexibility in the document model”). On the other hand, with reprocessing it is possible to restructure a dataset into a completely different model in order to better serve new requirements.
Derived views allow gradual evolution. If you want to restructure a dataset, you do not need to perform the migration as a sudden switch. Instead, you can maintain the old schema and the new schema side by side as two independently derived views onto the same underlying data. You can then start shifting a small number of users to the new view in order to test its performance and find any bugs, while most users continue to be routed to the old view. Gradually, you can increase the proportion of users accessing the new view, and eventually you can drop the old view [10].
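Such a rollout can be driven by a deterministic hash of the user ID, so that each user is consistently routed to the same view and the percentage can be widened (or rolled back) by changing a single number. A sketch; the 10% threshold and view names are illustrative:

```python
import hashlib

ROLLOUT_PERCENT = 10  # fraction of users routed to the new derived view

def view_for(user_id):
    # Deterministic bucketing: the same user always hashes to the same
    # bucket, so they see a consistent view across requests.
    h = int.from_bytes(hashlib.md5(user_id.encode()).digest()[:4], "big")
    return "new_view" if h % 100 < ROLLOUT_PERCENT else "old_view"

# Consistency: repeated requests from one user hit the same view.
assert view_for("user:7") == view_for("user:7")
```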
The beauty of such a gradual migration is that every stage of the process is easily reversible if something goes wrong: you always have a working system to go back to. By reducing the risk of irreversible damage, you can be more confident about going ahead, and thus move faster to improve your system [11].
The lambda architecture
If batch processing is used to reprocess historical data, and stream processing is used to process recent updates, then how do you combine the two? The lambda architecture [12] is a proposal in this area that has gained a lot of attention.
The core idea of the lambda architecture is that incoming data should be recorded by appending immutable events to an always-growing dataset, similarly to event sourcing (see “Event Sourcing”). From these events, read-optimized views are derived. The lambda architecture proposes running two different systems in parallel: a batch processing system such as Hadoop MapReduce, and a separate stream-processing system such as Storm.
In the lambda approach, the stream processor consumes the events and quickly produces an approximate update to the view; the batch processor later consumes the same set of events and produces a corrected version of the derived view. The reasoning behind this design is that batch processing is simpler and thus less prone to bugs, while stream processors are thought to be less reliable and harder to make fault-tolerant (see “Fault Tolerance”). Moreover, the stream process can use fast approximate algorithms while the batch process uses slower exact algorithms.
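The two-layer idea can be sketched with per-key counts: a speed layer updates a running count incrementally as each event arrives, while a batch layer periodically recomputes exact counts from the full immutable log and supersedes the approximation. This is a toy illustration of the pattern, not any particular framework's API:

```python
from collections import Counter

event_log = ["a", "b", "a", "a", "b"]    # immutable, append-only events

stream_counts = Counter()
def on_event(key):
    # Speed layer: cheap incremental update as each event arrives.
    stream_counts[key] += 1

def batch_recompute(log):
    # Batch layer: slower exact recount over the whole event log.
    return Counter(log)

for e in event_log:
    on_event(e)

# Serving layer: the batch output supersedes the streaming approximation
# for the keys it covers. In this toy run nothing was lost, so they agree.
batch_counts = batch_recompute(event_log)
assert batch_counts == stream_counts
```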
The lambda architecture was an influential idea that shaped the design of data systems for the better, particularly by popularizing the principle of deriving views onto streams of immutable events and reprocessing events when needed. However, I also think that it has a number of practical problems:
-
Having to maintain the same logic to run both in a batch and in a stream processing framework is significant additional effort. Although libraries such as Summingbird [13] provide an abstraction for computations that can be run in either a batch or a streaming context, the operational complexity of debugging, tuning, and maintaining two different systems remains [14].
-
Since the stream pipeline and the batch pipeline produce separate outputs, they need to be merged in order to respond to user requests. This merge is fairly easy if the computation is a simple aggregation over a tumbling window, but it becomes significantly harder if the view is derived using more complex operations such as joins and sessionization, or if the output is not a time series.
-
Although it is great to have the ability to reprocess the entire historical dataset, doing so frequently is expensive on large datasets. Thus, the batch pipeline often needs to be set up to process incremental batches (e.g., an hour’s worth of data at the end of every hour) rather than reprocessing everything. This raises the problems discussed in “Reasoning About Time”, such as handling stragglers and handling windows that cross boundaries between batches. Incrementalizing a batch computation adds complexity, making it more akin to the streaming layer, which runs counter to the goal of keeping the batch layer as simple as possible.
Unifying batch and stream processing
More recent work has enabled the benefits of the lambda architecture to be enjoyed without its downsides, by allowing both batch computations (reprocessing historical data) and stream computations (processing events as they arrive) to be implemented in the same system [15].
Unifying batch and stream processing in one system requires the following features, which are becoming increasingly widely available:
-
The ability to replay historical events through the same processing engine that handles the stream of recent events. For example, log-based message brokers have the ability to replay messages (see “Replaying old messages”), and some stream processors can read input from a distributed filesystem like HDFS.
-
Exactly-once semantics for stream processors—that is, ensuring that the output is the same as if no faults had occurred, even if faults did in fact occur (see “Fault Tolerance”). Like with batch processing, this requires discarding the partial output of any failed tasks.
-
Tools for windowing by event time, not by processing time, since processing time is meaningless when reprocessing historical events (see “Reasoning About Time”). For example, Apache Beam provides an API for expressing such computations, which can then be run using Apache Flink or Google Cloud Dataflow.
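The third requirement in the list above is simple to state in code: window assignment uses the timestamp carried by the event itself, never the wall-clock time at which the event happens to be processed, so the result is identical whether events are live or replayed from history. A minimal sketch with illustrative data:

```python
from collections import defaultdict

WINDOW_SECONDS = 60

def window_start(event_time):
    # Assign the event to a tumbling window based on its own timestamp.
    return int(event_time // WINDOW_SECONDS) * WINDOW_SECONDS

counts = defaultdict(int)
events = [(125.0, "click"), (130.0, "click"), (185.0, "click")]
for event_time, _ in events:
    counts[window_start(event_time)] += 1

print(dict(counts))  # {120: 2, 180: 1}
```

Replaying the same events a year later produces exactly the same window counts, which is what makes the batch (reprocessing) and stream (live) paths interchangeable.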
Unbundling Databases
At a most abstract level, databases, Hadoop, and operating systems all perform the same functions: they store some data, and they allow you to process and query that data [ 16 ]. A database stores data in records of some data model (rows in tables, documents, vertices in a graph, etc.) while an operating system’s filesystem stores data in files—but at their core, both are “information management” systems [ 17 ]. As we saw in Chapter 10 , the Hadoop ecosystem is somewhat like a distributed version of Unix.
在最抽象的层面上,数据库、Hadoop和操作系统都执行相同的功能:它们存储一些数据,并允许您处理和查询这些数据[16]。数据库以某种数据模型的记录形式存储数据(表中的行、文档、图中的顶点等),而操作系统的文件系统以文件形式存储数据——但就其核心而言,两者都是“信息管理”系统[17]。正如我们在第10章中所看到的,Hadoop生态系统有点像Unix的分布式版本。
Of course, there are many practical differences. For example, many filesystems do not cope very well with a directory containing 10 million small files, whereas a database containing 10 million small records is completely normal and unremarkable. Nevertheless, the similarities and differences between operating systems and databases are worth exploring.
当然,操作系统和数据库之间存在许多实用的不同之处。例如,许多文件系统无法很好地处理包含1千万个小文件的目录,而包含1千万个小记录的数据库则是很正常和平凡的情况。尽管如此,操作系统和数据库之间的相似性和不同之处还是值得探讨的。
Unix and relational databases have approached the information management problem with very different philosophies. Unix viewed its purpose as presenting programmers with a logical but fairly low-level hardware abstraction, whereas relational databases wanted to give application programmers a high-level abstraction that would hide the complexities of data structures on disk, concurrency, crash recovery, and so on. Unix developed pipes and files that are just sequences of bytes, whereas databases developed SQL and transactions.
Unix和关系型数据库以非常不同的哲学方法解决信息管理问题。 Unix认为其目的是向程序员提供一个逻辑但相当低级别的硬件抽象,而关系型数据库希望为应用程序员提供一个高级抽象,隐藏磁盘上的数据结构、并发性、崩溃恢复等复杂性。 Unix开发了管道和文件,它们只是字节序列,而数据库则开发了SQL和事务。
Which approach is better? Of course, it depends what you want. Unix is “simpler” in the sense that it is a fairly thin wrapper around hardware resources; relational databases are “simpler” in the sense that a short declarative query can draw on a lot of powerful infrastructure (query optimization, indexes, join methods, concurrency control, replication, etc.) without the author of the query needing to understand the implementation details.
哪种方法更好?当然,这取决于你想要什么。Unix的“简单”在于它是一个相当薄的硬件资源包装器;关系型数据库的“简单”在于,一个简短的声明性查询可以利用大量强大的基础设施(查询优化、索引、连接方法、并发控制、复制等等),而查询的作者无需理解实现细节。
The tension between these philosophies has lasted for decades (both Unix and the relational model emerged in the early 1970s) and still isn’t resolved. For example, I would interpret the NoSQL movement as wanting to apply a Unix-esque approach of low-level abstractions to the domain of distributed OLTP data storage.
这些哲学之间的紧张关系已经持续了数十年(Unix 和关系模型都出现在 20 世纪 70 年代初),但仍未得到解决。例如,我认为 NoSQL 运动想要将低级抽象的 Unix 式方法应用到分布式 OLTP 数据存储领域。
In this section I will attempt to reconcile the two philosophies, in the hope that we can combine the best of both worlds.
在这个部分,我将尝试调和这两种哲学,希望我们能将两种最好的东西结合起来。
Composing Data Storage Technologies
Over the course of this book we have discussed various features provided by databases and how they work, including:
在本书中,我们讨论了数据库提供的各种功能和它们的运作方式,包括:
-
Secondary indexes, which allow you to efficiently search for records based on the value of a field (see “Other Indexing Structures” )
二级索引可以根据字段值高效搜索记录(参见“其他索引结构”)。
-
Materialized views, which are a kind of precomputed cache of query results (see “Aggregation: Data Cubes and Materialized Views” )
物化视图,它是查询结果的一种预计算缓存(参见“聚合:数据立方体和物化视图”)。
-
Replication logs, which keep copies of the data on other nodes up to date (see “Implementation of Replication Logs” )
复制日志,它可以使其他节点的数据副本保持最新(参见“复制日志的实现”)。
-
Full-text search indexes, which allow keyword search in text (see “Full-text search and fuzzy indexes” ) and which are built into some relational databases [ 1 ]
全文搜索索引,允许关键词在文本中搜索(见“全文搜索和模糊索引”),并且已经内置在一些关系型数据库中[1]。
In Chapters 10 and 11 , similar themes emerged. We talked about building full-text search indexes (see “The Output of Batch Workflows” ), about materialized view maintenance (see “Maintaining materialized views” ), and about replicating changes from a database to derived data systems (see “Change Data Capture” ).
在第10章和第11章,出现了类似的主题。我们谈论了构建全文搜索索引(请参见“批处理工作流的输出”),关于物化视图的维护(请参见“维护物化视图”),以及从数据库到派生数据系统的复制更改(请参见“更改数据捕捉”)。
It seems that there are parallels between the features that are built into databases and the derived data systems that people are building with batch and stream processors.
似乎数据库中内置的功能和人们使用批处理和流处理器构建的派生数据系统之间存在相似之处。
Creating an index
Think about what happens when you run CREATE INDEX to create a new index in a relational database. The database has to scan over a consistent snapshot of a table, pick out all of the field values being indexed, sort them, and write out the index. Then it must process the backlog of writes that have been made since the consistent snapshot was taken (assuming the table was not locked while creating the index, so writes could continue). Once that is done, the database must continue to keep the index up to date whenever a transaction writes to the table.
当在关系型数据库中运行CREATE INDEX以创建新索引时,请考虑会发生什么。数据库必须扫描表的一致快照,挑选出所有正在索引的字段值,对它们进行排序,并编写索引。然后,它必须处理自一致快照以来产生的写入积压(假设在创建索引时没有锁定表格,因此可以继续写入)。完成此操作后,数据库必须继续在事务写入表时保持索引最新。
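The three phases just described (snapshot scan, backlog processing, ongoing maintenance) can be sketched roughly as follows. This is a toy in-memory model to make the idea concrete, not how any real database implements CREATE INDEX:

```python
def build_index(snapshot, field):
    """Phase 1: scan a consistent snapshot of the table and extract the
    indexed field's value for every record."""
    index = {}  # indexed value -> set of record ids
    for rid, record in snapshot.items():
        index.setdefault(record[field], set()).add(rid)
    return index

def apply_write(index, field, rid, old_record, new_record):
    """Phases 2 and 3: the same routine drains the backlog of writes made
    during the scan, and then keeps the index up to date on every
    subsequent write."""
    if old_record is not None:
        index.get(old_record[field], set()).discard(rid)
    if new_record is not None:
        index.setdefault(new_record[field], set()).add(rid)

snapshot = {1: {"name": "alice"}, 2: {"name": "bob"}}
idx = build_index(snapshot, "name")
# Drain the backlog: record 2 was renamed while the scan was running.
apply_write(idx, "name", 2, {"name": "bob"}, {"name": "carol"})
```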
This process is remarkably similar to setting up a new follower replica (see “Setting Up New Followers” ), and also very similar to bootstrapping change data capture in a streaming system (see “Initial snapshot” ).
这个过程与设置新的从属复制(请参见“设置新的从属”)非常相似,也与在流系统中引导变更数据捕获(请参见“初始快照”)非常相似。
Whenever you run CREATE INDEX, the database essentially reprocesses the existing dataset (as discussed in “Reprocessing data for application evolution”) and derives the index as a new view onto the existing data. The existing data may be a snapshot of the state rather than a log of all changes that ever happened, but the two are closely related (see “State, Streams, and Immutability”).
每当运行CREATE INDEX时,数据库本质上重新处理现有数据集(如“重新处理应用程序演变中的数据”中所讨论的),并将索引视为现有数据的新视图。现有数据可能是状态的快照,而不是所有发生变化的日志,但两者密切相关(请参见“状态、流和不可变性”)。
The meta-database of everything
In this light, I think that the dataflow across an entire organization starts looking like one huge database [ 7 ]. Whenever a batch, stream, or ETL process transports data from one place and form to another place and form, it is acting like the database subsystem that keeps indexes or materialized views up to date.
从这个角度来看,我认为整个组织的数据流开始看起来像一个巨大的数据库[7]。每当批处理、流处理或ETL过程将数据从一个地方、一种形式传输到另一个地方、另一种形式时,它就像保持索引或物化视图最新的数据库子系统一样。
Viewed like this, batch and stream processors are like elaborate implementations of triggers, stored procedures, and materialized view maintenance routines. The derived data systems they maintain are like different index types. For example, a relational database may support B-tree indexes, hash indexes, spatial indexes (see “Multi-column indexes” ), and other types of indexes. In the emerging architecture of derived data systems, instead of implementing those facilities as features of a single integrated database product, they are provided by various different pieces of software, running on different machines, administered by different teams.
从这个角度来看,批处理和流处理器就像是复杂的触发器、存储过程和物化视图维护程序的实现。它们维护的派生数据系统就像是不同的索引类型。例如,关系数据库可以支持B-tree索引、哈希索引、空间索引(参见“多列索引”)和其他类型的索引。在派生数据系统的新兴架构中,它们不再是单一集成数据库产品特性的实现,而是由各种不同的软件提供,运行在不同的机器上,由不同的团队管理。
Where will these developments take us in the future? If we start from the premise that there is no single data model or storage format that is suitable for all access patterns, I speculate that there are two avenues by which different storage and processing tools can nevertheless be composed into a cohesive system:
这些发展将带领我们走向何方?如果我们的前提是没有一个适用于所有访问模式的单一数据模型或存储格式,那么我猜测有两种途径不同的存储和处理工具可以被组合成一个有凝聚力的系统。
- Federated databases: unifying reads
-
It is possible to provide a unified query interface to a wide variety of underlying storage engines and processing methods—an approach known as a federated database or polystore [ 18 , 19 ]. For example, PostgreSQL’s foreign data wrapper feature fits this pattern [ 20 ]. Applications that need a specialized data model or query interface can still access the underlying storage engines directly, while users who want to combine data from disparate places can do so easily through the federated interface.
可以为各种底层存储引擎和处理方法提供统一的查询接口,这种方法被称为联合数据库(federated database)或多存储(polystore)[18,19]。例如,PostgreSQL的外部数据包装器(foreign data wrapper)功能符合此模式[20]。需要专门的数据模型或查询接口的应用程序仍然可以直接访问底层存储引擎,而想要组合来自不同地方的数据的用户可以通过联合接口轻松做到。
A federated query interface follows the relational tradition of a single integrated system with a high-level query language and elegant semantics, but a complicated implementation.
联合查询接口遵循关系型数据库的传统:单一集成系统、高级查询语言和优雅的语义,但实现复杂。
- Unbundled databases: unifying writes
-
While federation addresses read-only querying across several different systems, it does not have a good answer to synchronizing writes across those systems. We said that within a single database, creating a consistent index is a built-in feature. When we compose several storage systems, we similarly need to ensure that all data changes end up in all the right places, even in the face of faults. Making it easier to reliably plug together storage systems (e.g., through change data capture and event logs) is like unbundling a database’s index-maintenance features in a way that can synchronize writes across disparate technologies [ 7 , 21 ].
虽然联合解决了跨多个不同系统的只读查询问题,但对于跨这些系统同步写入,它并没有好的答案。我们说过,在单个数据库内,创建一致的索引是内置功能。当我们组合多个存储系统时,同样需要确保所有数据变更最终到达所有正确的位置,即使面对故障也是如此。让存储系统更容易可靠地组合在一起(例如,通过变更数据捕获和事件日志),就像是将数据库的索引维护功能拆分出来,使其能够跨不同技术同步写入[7,21]。
The unbundled approach follows the Unix tradition of small tools that do one thing well [ 22 ], that communicate through a uniform low-level API (pipes), and that can be composed using a higher-level language (the shell) [ 16 ].
这种分解的方法遵循了Unix传统:使用小而精的工具 [22],它们通过一个统一的低级API(管道)进行通信,并且可以使用高级语言(shell)进行组合 [16]。
Making unbundling work
Federation and unbundling are two sides of the same coin: composing a reliable, scalable, and maintainable system out of diverse components. Federated read-only querying requires mapping one data model into another, which takes some thought but is ultimately quite a manageable problem. I think that keeping the writes to several storage systems in sync is the harder engineering problem, and so I will focus on it.
联邦化和分解是同一个硬币的两面:通过组合不同的组件来构建可靠、可扩展和易维护的系统。联邦只读查询需要将一个数据模型映射到另一个模型,这需要一些思考,但最终是一个可以管理的问题。我认为让多个存储系统的写入同步是更困难的工程问题,因此我将专注于此。
The traditional approach to synchronizing writes requires distributed transactions across heterogeneous storage systems [ 18 ], which I think is the wrong solution (see “Derived data versus distributed transactions” ). Transactions within a single storage or stream processing system are feasible, but when data crosses the boundary between different technologies, I believe that an asynchronous event log with idempotent writes is a much more robust and practical approach.
传统的同步写入方法需要在异构存储系统中进行分布式事务[18],我认为这是错误的解决方案(请参阅“派生数据与分布式事务”)。在单个存储或流处理系统中的事务是可行的,但当数据越过不同技术之间的界限时,我认为使用具有幂等写入的异步事件日志是更强大和实用的方法。
For example, distributed transactions are used within some stream processors to achieve exactly-once semantics (see “Atomic commit revisited” ), and this can work quite well. However, when a transaction would need to involve systems written by different groups of people (e.g., when data is written from a stream processor to a distributed key-value store or search index), the lack of a standardized transaction protocol makes integration much harder. An ordered log of events with idempotent consumers (see “Idempotence” ) is a much simpler abstraction, and thus much more feasible to implement across heterogeneous systems [ 7 ].
例如,一些流处理器在实现正好一次语义时使用分布式事务(请参见“重新审视原子提交”),这可以很好地工作。但是,当事务需要涉及由不同团队编写的系统时(例如,从流处理器将数据写入分布式键值存储或搜索索引时),缺乏标准化的事务协议会使集成变得更加困难。具有幂等消费者的事件有序日志(请参见“幂等性”)是一个更简单的抽象,因此更容易在异构系统中实现[7]。
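What an idempotent consumer of an ordered event log looks like can be sketched in a few lines. This is a minimal illustration of the idea; in a real system the offset would be stored durably and atomically together with the state:

```python
class IdempotentConsumer:
    """Applies an ordered log of events exactly once in effect: each event
    carries a monotonically increasing offset, so redelivered events are
    skipped rather than applied twice."""

    def __init__(self):
        self.state = {}
        self.last_offset = -1  # stored durably with the state in practice

    def apply(self, offset, key, value):
        if offset <= self.last_offset:
            return  # already applied; redelivery after a crash is harmless
        self.state[key] = value
        self.last_offset = offset

consumer = IdempotentConsumer()
consumer.apply(0, "x", 1)
consumer.apply(1, "x", 2)
consumer.apply(1, "x", 2)  # redelivered by the broker: safely ignored
# consumer.state is {"x": 2}, the same as if no redelivery had occurred
```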
The big advantage of log-based integration is loose coupling between the various components, which manifests itself in two ways:
基于日志的集成的一大优势是各个组件之间的松耦合,这体现在两个方面:
-
At a system level, asynchronous event streams make the system as a whole more robust to outages or performance degradation of individual components. If a consumer runs slow or fails, the event log can buffer messages (see “Disk space usage” ), allowing the producer and any other consumers to continue running unaffected. The faulty consumer can catch up when it is fixed, so it doesn’t miss any data, and the fault is contained. By contrast, the synchronous interaction of distributed transactions tends to escalate local faults into large-scale failures (see “Limitations of distributed transactions” ).
在系统层面上,异步事件流使整个系统更具鲁棒性,能够应对单个组件的故障或性能降级。如果消费者运行缓慢或失败,事件日志可以缓冲消息(参见“磁盘空间使用”),使生产者和任何其他消费者可以继续运行而不受影响。当有故障的消费者修复后,它可以赶上进度,因此不会错过任何数据,同时也可以将故障局限在一个地方。相比之下,分布式事务的同步交互往往会将本地故障升级为大规模故障(参见“分布式事务的限制”)。
-
At a human level, unbundling data systems allows different software components and services to be developed, improved, and maintained independently from each other by different teams. Specialization allows each team to focus on doing one thing well, with well-defined interfaces to other teams’ systems. Event logs provide an interface that is powerful enough to capture fairly strong consistency properties (due to durability and ordering of events), but also general enough to be applicable to almost any kind of data.
从人类的角度来看,数据系统的拆解允许不同的软件组件和服务由不同的团队独立地开发、改进和维护。专业化使得每个团队专注于做好一件事,并与其他团队的系统有良好定义的接口。事件日志提供了一个接口,它足够强大以捕捉相当强的一致性属性(由于事件的持久性和排序),同时又足够通用以适用于几乎任何类型的数据。
Unbundled versus integrated systems
If unbundling does indeed become the way of the future, it will not replace databases in their current form—they will still be needed as much as ever. Databases are still required for maintaining state in stream processors, and in order to serve queries for the output of batch and stream processors (see “The Output of Batch Workflows” and “Processing Streams” ). Specialized query engines will continue to be important for particular workloads: for example, query engines in MPP data warehouses are optimized for exploratory analytic queries and handle this kind of workload very well (see “Comparing Hadoop to Distributed Databases” ).
如果解耦确实成为未来的趋势,它不会取代当前形式的数据库 - 它们仍然像以往一样必不可少。 数据库仍然需要用于在流处理器中维护状态,并为批处理和流处理器的输出提供查询服务(参见“批处理工作流程的输出”和“处理流”)。 专业化的查询引擎将继续对特定工作负载很重要:例如,MPP数据仓库中的查询引擎经过优化,非常适合探索式分析查询负载(请参见“比较Hadoop和分布式数据库”)。
The complexity of running several different pieces of infrastructure can be a problem: each piece of software has a learning curve, configuration issues, and operational quirks, and so it is worth deploying as few moving parts as possible. A single integrated software product may also be able to achieve better and more predictable performance on the kinds of workloads for which it is designed, compared to a system consisting of several tools that you have composed with application code [ 23 ]. As I said in the Preface , building for scale that you don’t need is wasted effort and may lock you into an inflexible design. In effect, it is a form of premature optimization.
运行多个不同基础设施的复杂性可能是一个问题:每个软件都有学习曲线、配置问题和操作怪癖,因此值得部署尽可能少的活动部件。与由多个工具加应用代码组合而成的系统相比,单一集成的软件产品在其设计针对的工作负载类型上也可能实现更好、更可预测的性能[23]。正如我在前言中所说,为不需要的规模而构建是浪费精力,并可能将您锁定在不灵活的设计中。实际上,这是一种过早优化的形式。
The goal of unbundling is not to compete with individual databases on performance for particular workloads; the goal is to allow you to combine several different databases in order to achieve good performance for a much wider range of workloads than is possible with a single piece of software. It’s about breadth, not depth—in the same vein as the diversity of storage and processing models that we discussed in “Comparing Hadoop to Distributed Databases” .
解绑的目标不是为了在特定工作负载的性能方面与单个数据库竞争;目标是允许您组合几个不同的数据库,以实现比单个软件更宽的工作负载范围的良好性能。它是关于广度,而不是深度 - 就像我们在“比较Hadoop和分布式数据库”中讨论的存储和处理模型的多样性一样。
Thus, if there is a single technology that does everything you need, you’re most likely best off simply using that product rather than trying to reimplement it yourself from lower-level components. The advantages of unbundling and composition only come into the picture when there is no single piece of software that satisfies all your requirements.
因此,如果有一种技术可以满足你所有需求,最好的选择是直接使用该产品,而不是尝试从更低级别的组件重新实现它。只有在没有单一软件满足您所有要求时,拆分和组合的优点才会出现。
What’s missing?
The tools for composing data systems are getting better, but I think one major part is missing: we don’t yet have the unbundled-database equivalent of the Unix shell (i.e., a high-level language for composing storage and processing systems in a simple and declarative way).
组合数据系统的工具正在变得越来越好,但我认为还缺少一个重要部分:我们还没有与Unix shell对应的非捆绑数据库等价物(即一种能以简单的声明式方式组合存储和处理系统的高级语言)。
For example, I would love it if we could simply declare mysql | elasticsearch, by analogy to Unix pipes [22], which would be the unbundled equivalent of CREATE INDEX: it would take all the documents in a MySQL database and index them in an Elasticsearch cluster. It would then continually capture all the changes made to the database and automatically apply them to the search index, without us having to write custom application code. This kind of integration should be possible with almost any kind of storage or indexing system.
例如,如果我们能够像Unix管道[22]的类比那样简单地声明mysql | elasticsearch,那将是CREATE INDEX的未捆绑等效项:它将获取MySQL数据库中的所有文档并将它们索引到Elasticsearch群集中。然后,它将不断捕获对数据库所做的所有更改并自动将它们应用于搜索索引,而无需我们编写自定义应用程序代码。几乎任何类型的存储或索引系统都应该能够实现这种集成。
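To make the hypothetical mysql | elasticsearch pipe concrete, the glue code it would render unnecessary might look roughly like the sketch below. Both `change_stream` and the dict-based `search_index` are stand-ins invented for this illustration, not real MySQL or Elasticsearch client APIs:

```python
def sync_to_search_index(change_stream, search_index):
    """Consume a database's ordered change stream (as produced by change
    data capture) and mirror every change into a search index. Writes are
    idempotent upserts keyed by document id, so the stream can be safely
    replayed after a failure."""
    for change in change_stream:
        if change["op"] == "delete":
            search_index.pop(change["id"], None)
        else:  # insert or update
            search_index[change["id"]] = change["row"]

# An initial snapshot followed by live changes, as in change data capture:
changes = [
    {"op": "insert", "id": 1, "row": {"title": "first post"}},
    {"op": "insert", "id": 2, "row": {"title": "second post"}},
    {"op": "update", "id": 1, "row": {"title": "first post, edited"}},
    {"op": "delete", "id": 2},
]
index = {}
sync_to_search_index(changes, index)
# index now reflects the database: only document 1, in its edited form
```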
Similarly, it would be great to be able to precompute and update caches more easily. Recall that a materialized view is essentially a precomputed cache, so you could imagine creating a cache by declaratively specifying materialized views for complex queries, including recursive queries on graphs (see “Graph-Like Data Models” ) and application logic. There is interesting early-stage research in this area, such as differential dataflow [ 24 , 25 ], and I hope that these ideas will find their way into production systems.
同样地,能够更轻松地预计算和更新缓存将是非常好的。回顾一下,物化视图本质上就是预计算的缓存,所以你可以通过声明性地指定复杂查询的物化视图来创建缓存,包括对图形递归查询(请参见“类似图形的数据模型”)和应用逻辑。在这个领域已经有了有趣的早期研究,例如差分数据流 [24, 25],我希望这些想法能够应用于生产系统中。
Designing Applications Around Dataflow
The approach of unbundling databases by composing specialized storage and processing systems with application code is also becoming known as the “database inside-out” approach [ 26 ], after the title of a conference talk I gave in 2014 [ 27 ]. However, calling it a “new architecture” is too grandiose. I see it more as a design pattern, a starting point for discussion, and we give it a name simply so that we can better talk about it.
通过将专门的存储和处理系统与应用程序代码组合来分解数据库的方法,也被称为“数据库由内而外(database inside-out)”方法[26],得名于我在2014年一次会议演讲的标题[27]。然而,称其为“新架构”太过夸张。我更多地将其视为一种设计模式、一个讨论的起点,我们给它一个名字只是为了能更好地谈论它。
These ideas are not mine; they are simply an amalgamation of other people’s ideas from which I think we should learn. In particular, there is a lot of overlap with dataflow languages such as Oz [ 28 ] and Juttle [ 29 ], functional reactive programming (FRP) languages such as Elm [ 30 , 31 ], and logic programming languages such as Bloom [ 32 ]. The term unbundling in this context was proposed by Jay Kreps [ 7 ].
这些想法不是我的,只是其他人的想法的融合,我认为我们应该从中学习。特别地,与数据流语言(如Oz [28]和Juttle [29])、函数响应式编程(FRP)语言(如Elm [30,31])和逻辑编程语言(如Bloom [32])有很多重叠之处。在这种情况下,“解捆绑”这个术语是由Jay Kreps [7]提出的。
Even spreadsheets have dataflow programming capabilities that are miles ahead of most mainstream programming languages [ 33 ]. In a spreadsheet, you can put a formula in one cell (for example, the sum of cells in another column), and whenever any input to the formula changes, the result of the formula is automatically recalculated. This is exactly what we want at a data system level: when a record in a database changes, we want any index for that record to be automatically updated, and any cached views or aggregations that depend on the record to be automatically refreshed. You should not have to worry about the technical details of how this refresh happens, but be able to simply trust that it works correctly.
即使是电子表格也具有数据流编程能力,这方面甚至比大多数主流编程语言还要强大[33]。在电子表格中,你可以在一个单元格中输入公式(例如,另一列单元格的总和),并且每当公式的任何输入发生改变,公式的结果就会自动重新计算。这正是我们在数据系统级别想要的:当数据库中的记录发生更改时,我们希望自动更新该记录的任何索引,并自动刷新任何依赖于该记录的缓存视图或聚合。你不应该担心这种刷新是如何发生的技术细节,而是应该简单地相信它能正常工作。
Thus, I think that most data systems still have something to learn from the features that VisiCalc already had in 1979 [ 34 ]. The difference from spreadsheets is that today’s data systems need to be fault-tolerant, scalable, and store data durably. They also need to be able to integrate disparate technologies written by different groups of people over time, and reuse existing libraries and services: it is unrealistic to expect all software to be developed using one particular language, framework, or tool.
因此,我认为大多数数据系统仍需从VisiCalc在1979年已有的功能中学习[34]。与电子表格的不同之处在于,现代数据系统需要具备容错性、可扩展性和数据持久性存储。它们还需要能够整合不同时间段内由不同人编写的不同技术,并重用现有的库和服务:期望所有的软件都使用特定的语言、框架或工具进行开发是不现实的。
In this section I will expand on these ideas and explore some ways of building applications around the ideas of unbundled databases and dataflow.
在这个部分,我将进一步阐述这些想法并探讨围绕未捆绑的数据库和数据流构建应用程序的一些方法。
Application code as a derivation function
When one dataset is derived from another, it goes through some kind of transformation function. For example:
当一个数据集来自另一个数据集时,它会经过某种转换函数进行转换。例如:
-
A secondary index is a kind of derived dataset with a straightforward transformation function: for each row or document in the base table, it picks out the values in the columns or fields being indexed, and sorts by those values (assuming a B-tree or SSTable index, which are sorted by key, as discussed in Chapter 3 ).
二级索引是一种派生数据集,它有一个直接的转换函数:对基础表中的每一行或文档,挑选出被索引的列或字段中的值,并按这些值排序(假设是按键排序的B树或SSTable索引,如第3章所讨论的)。
-
A full-text search index is created by applying various natural language processing functions such as language detection, word segmentation, stemming or lemmatization, spelling correction, and synonym identification, followed by building a data structure for efficient lookups (such as an inverted index).
通过应用各种自然语言处理功能,如语言检测、词语切分、词干提取或词形归并、拼写矫正和同义词识别,可以创建一个全文搜索索引,然后构建有效查找的数据结构(如倒排索引)。
-
In a machine learning system, we can consider the model as being derived from the training data by applying various feature extraction and statistical analysis functions. When the model is applied to new input data, the output of the model is derived from the input and the model (and hence, indirectly, from the training data).
在机器学习系统中,我们可以将模型视为通过应用各种特征提取和统计分析函数从训练数据中派生出来的。当该模型应用于新的输入数据时,模型的输出是从输入和模型(间接地从训练数据)派生出来的。
-
A cache often contains an aggregation of data in the form in which it is going to be displayed in a user interface (UI). Populating the cache thus requires knowledge of what fields are referenced in the UI; changes in the UI may require updating the definition of how the cache is populated and rebuilding the cache.
缓存通常包含聚合的数据,以它将要在用户界面(UI)中显示的形式为基础。因此,填充缓存需要知道在UI中引用了哪些字段;在UI中进行更改可能需要更新填充缓存的定义并重新构建缓存。
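As a minimal illustration of the first item in the list above, the derivation function for a secondary index can be written as a pure transformation over the base table (assuming a key-sorted index, as in a B-tree or SSTable):

```python
def derive_secondary_index(base_table, field):
    """Derivation function for a secondary index: for each row, pick out
    the indexed field's value, then sort entries by that value."""
    entries = [(row[field], rid) for rid, row in base_table.items()]
    return sorted(entries)

table = {
    10: {"color": "red"},
    11: {"color": "blue"},
    12: {"color": "red"},
}
entries = derive_secondary_index(table, "color")
# [("blue", 11), ("red", 10), ("red", 12)]
```

Rerunning the function over the base table reprocesses the whole dataset and rederives the index, which is exactly the relationship between base and derived data described above.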
The derivation function for a secondary index is so commonly required that it is built into many databases as a core feature, and you can invoke it by merely saying CREATE INDEX. For full-text indexing, basic linguistic features for common languages may be built into a database, but the more sophisticated features often require domain-specific tuning. In machine learning, feature engineering is notoriously application-specific, and often has to incorporate detailed knowledge about the user interaction and deployment of an application [35].
二级索引的派生函数是如此常用,以至于它被作为核心功能内置在许多数据库中,您只需说CREATE INDEX即可调用它。对于全文索引,通用语言的基本语言学特征可能被内置到数据库中,但更复杂的特征通常需要领域特定的调整。在机器学习中,特征工程是出了名的因应用而异,通常必须结合关于应用的用户交互和部署方式的详细知识[35]。
When the function that creates a derived dataset is not a standard cookie-cutter function like creating a secondary index, custom code is required to handle the application-specific aspects. And this custom code is where many databases struggle. Although relational databases commonly support triggers, stored procedures, and user-defined functions, which can be used to execute application code within the database, they have been somewhat of an afterthought in database design (see “Transmitting Event Streams” ).
当用于创建派生数据集的函数不是标准的模板函数,例如创建二级索引,需要编写定制代码来处理特定于应用程序的方面。这些定制代码是许多数据库的难点。尽管关系数据库通常支持触发器、存储过程和用户定义函数,可以用于在数据库中执行应用程序代码,但它们在数据库设计中有些被忽略(见“传输事件流”)。
Separation of application code and state
In theory, databases could be deployment environments for arbitrary application code, like an operating system. However, in practice they have turned out to be poorly suited for this purpose. They do not fit well with the requirements of modern application development, such as dependency and package management, version control, rolling upgrades, evolvability, monitoring, metrics, calls to network services, and integration with external systems.
在理论上,数据库可以是任意应用程序代码的部署环境,就像操作系统一样。然而,在实际应用中,它们被证明不适合这个目的。它们与现代应用程序开发的要求不相符,如依赖和包管理、版本控制、滚动升级、可演进性、监控、指标、调用网络服务和与外部系统集成。
On the other hand, deployment and cluster management tools such as Mesos, YARN, Docker, Kubernetes, and others are designed specifically for the purpose of running application code. By focusing on doing one thing well, they are able to do it much better than a database that provides execution of user-defined functions as one of its many features.
另一方面,部署和集群管理工具如Mesos、YARN、Docker、Kubernetes等都是专门为运行应用程序代码而设计的。通过专注于做好一件事情,它们能够比提供执行用户定义功能作为其众多功能之一的数据库做得更好。
I think it makes sense to have some parts of a system that specialize in durable data storage, and other parts that specialize in running application code. The two can interact while still remaining independent.
我认为在系统中有一些部分专门负责持久数据存储,其他部分专门负责运行应用程序代码是有意义的。这两者可以互相交互,同时仍然保持独立。
Most web applications today are deployed as stateless services, in which any user request can be routed to any application server, and the server forgets everything about the request once it has sent the response. This style of deployment is convenient, as servers can be added or removed at will, but the state has to go somewhere: typically, a database. The trend has been to keep stateless application logic separate from state management (databases): not putting application logic in the database and not putting persistent state in the application [ 36 ]. As people in the functional programming community like to joke, “We believe in the separation of Church and state” [ 37 ]. i
当今大多数Web应用程序都作为无状态服务部署,其中任何用户请求都可以路由到任何应用程序服务器,并且服务器在发送响应后就忘记了有关请求的一切。这种部署风格很方便,因为可以随意添加或删除服务器,但状态必须存在某个地方:通常是数据库。趋势是将无状态应用程序逻辑与状态管理(数据库)分开:不将应用程序逻辑放入数据库,也不将持久状态放入应用程序[36]。正如函数式编程社区的人们喜欢开的玩笑:“我们信仰Church(教会,双关逻辑学家Alonzo Church)与state(国家/状态)的分离”[37]。
In this typical web application model, the database acts as a kind of mutable shared variable that can be accessed synchronously over the network. The application can read and update the variable, and the database takes care of making it durable, providing some concurrency control and fault tolerance.
在这种典型的Web应用程序模型中,数据库充当一种可在网络上同步访问的可变共享变量。应用程序可以读取并更新该变量,数据库负责使其持久,提供一些并发控制和容错能力。
However, in most programming languages you cannot subscribe to changes in a mutable variable—you can only read it periodically. Unlike in a spreadsheet, readers of the variable don’t get notified if the value of the variable changes. (You can implement such notifications in your own code—this is known as the observer pattern —but most languages do not have this pattern as a built-in feature.)
然而,在大多数编程语言中,您无法订阅可变变量的更改 - 您只能定期读取它。与电子表格不同,变量的读者如果变量的值发生更改,不会收到通知。 (您可以在自己的代码中实现此类通知 - 这称为观察者模式 - 但大多数语言没有此模式作为内置功能。)
Databases have inherited this passive approach to mutable data: if you want to find out whether the content of the database has changed, often your only option is to poll (i.e., to repeat your query periodically). Subscribing to changes is only just beginning to emerge as a feature (see “API support for change streams” ).
数据库已经继承了这种对可变数据的被动方法:如果您想了解数据库内容是否更改,通常您唯一的选择是轮询(即定期重复查询)。订阅更改仅在最近作为一项功能出现(参见“支持变更流的API”)。
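The observer pattern mentioned above can be sketched in a few lines; this push-based notification is exactly the capability that, at the data-system level, polling-based databases lack:

```python
class ObservableVariable:
    """A mutable variable whose readers can subscribe to changes instead
    of polling it: a minimal observer-pattern sketch."""

    def __init__(self, value):
        self._value = value
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def set(self, value):
        self._value = value
        for callback in self._subscribers:
            callback(value)  # push the change rather than being polled

seen = []
var = ObservableVariable(0)
var.subscribe(seen.append)  # like a spreadsheet cell watching its inputs
var.set(1)
var.set(2)
# seen == [1, 2]: every change was delivered, with no polling loop
```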
Dataflow: Interplay between state changes and application code
Thinking about applications in terms of dataflow implies renegotiating the relationship between application code and state management. Instead of treating a database as a passive variable that is manipulated by the application, we think much more about the interplay and collaboration between state, state changes, and code that processes them. Application code responds to state changes in one place by triggering state changes in another place.
考虑数据流应用意味着重新协商应用代码和状态管理之间的关系。我们不再将数据库视为应用程序操作的被动变量,而是更多地考虑状态、状态变化和处理它们的代码之间的相互作用和协作。应用代码通过在一个地方响应状态变化来触发另一个地方的状态变化。
We saw this line of thinking in “Databases and Streams” , where we discussed treating the log of changes to a database as a stream of events that we can subscribe to. Message-passing systems such as actors (see “Message-Passing Dataflow” ) also have this concept of responding to events. Already in the 1980s, the tuple spaces model explored expressing distributed computations in terms of processes that observe state changes and react to them [ 38 , 39 ].
我们在“数据库与流”中看到了这种思路:我们讨论了将数据库的变更日志视为可以订阅的事件流。诸如Actor(见“消息传递数据流”)之类的消息传递系统也有这种响应事件的概念。早在20世纪80年代,元组空间模型就已探索用观察状态变化并对其作出反应的进程来表达分布式计算[38,39]。
As discussed, similar things happen inside a database when a trigger fires due to a data change, or when a secondary index is updated to reflect a change in the table being indexed. Unbundling the database means taking this idea and applying it to the creation of derived datasets outside of the primary database: caches, full-text search indexes, machine learning, or analytics systems. We can use stream processing and messaging systems for this purpose.
讨论过后,当触发器由于数据更改被触发,或者当二级索引更新以反映被索引的表的更改时,类似的事情会发生在数据库内部。将数据库解绑意味着将这个想法应用于在主数据库之外创建派生数据集的过程中:缓存、全文搜索索引、机器学习或分析系统。我们可以使用流处理和消息系统来实现这个目的。
The important thing to keep in mind is that maintaining derived data is not the same as asynchronous job execution, for which messaging systems are traditionally designed (see “Logs compared to traditional messaging” ):
需要记住的重要事情是,维护派生数据并不等同于异步作业执行,传统的消息系统正是为此而设计的(见“与传统消息相比的日志”):
-
When maintaining derived data, the order of state changes is often important (if several views are derived from an event log, they need to process the events in the same order so that they remain consistent with each other). As discussed in “Acknowledgments and redelivery” , many message brokers do not have this property when redelivering unacknowledged messages. Dual writes are also ruled out (see “Keeping Systems in Sync” ).
在维护派生数据时,状态更改的顺序通常很重要(如果从事件日志派生了多个视图,则它们需要按相同的顺序处理事件,以保持彼此一致)。如“确认和重新交付”中所讨论的,许多消息代理在重新传递未经确认的消息时没有此属性。双重写入也被排除(请参见“保持系统同步”)。
-
Fault tolerance is key for derived data: losing just a single message causes the derived dataset to go permanently out of sync with its data source. Both message delivery and derived state updates must be reliable. For example, many actor systems by default maintain actor state and messages in memory, so they are lost if the machine running the actor crashes.
容错性对派生数据至关重要:仅丢失一条消息就会导致派生数据集与其数据源永久失去同步。消息传递和派生状态更新都必须可靠。例如,许多Actor系统默认将Actor状态和消息保存在内存中,因此如果运行Actor的机器崩溃,它们就会丢失。
Stable message ordering and fault-tolerant message processing are quite stringent demands, but they are much less expensive and more operationally robust than distributed transactions. Modern stream processors can provide these ordering and reliability guarantees at scale, and they allow application code to be run as stream operators.
稳定的消息排序和容错的消息处理是非常严格的要求,但它们比分布式事务更便宜、更具操作韧性。现代的流处理器可以提供这些排序和可靠性保证,还允许应用程序代码作为流操作符运行。
This application code can do the arbitrary processing that built-in derivation functions in databases generally don’t provide. Like Unix tools chained by pipes, stream operators can be composed to build large systems around dataflow. Each operator takes streams of state changes as input, and produces other streams of state changes as output.
该应用程序代码可以进行内置派生函数通常不提供的任意处理。就像通过管道链接的Unix工具一样,流操作器可以组合以构建围绕数据流的大型系统。每个操作器都将状态更改流作为输入,并生成其他状态更改流作为输出。
Stream processors and services
The currently trendy style of application development involves breaking down functionality into a set of services that communicate via synchronous network requests such as REST APIs (see “Dataflow Through Services: REST and RPC” ). The advantage of such a service-oriented architecture over a single monolithic application is primarily organizational scalability through loose coupling: different teams can work on different services, which reduces coordination effort between teams (as long as the services can be deployed and updated independently).
当前流行的应用程序开发风格涉及将功能分解为一组通过同步网络请求(例如REST API)进行通信的服务(请参见“服务之间的数据流:REST和RPC”)。与单体应用程序相比,这种面向服务架构的优势主要在于通过松耦合实现组织上的可扩展性:不同的团队可以开发不同的服务,这减少了团队之间的协调工作(只要服务可以独立部署和更新)。
Composing stream operators into dataflow systems has a lot of similar characteristics to the microservices approach [ 40 ]. However, the underlying communication mechanism is very different: one-directional, asynchronous message streams rather than synchronous request/response interactions.
把流运算符组成数据流系统与微服务方法有很多相似性[40]。然而,底层通信机制非常不同:单向的异步消息流而不是同步的请求/响应交互。
Besides the advantages listed in “Message-Passing Dataflow” , such as better fault tolerance, dataflow systems can also achieve better performance. For example, say a customer is purchasing an item that is priced in one currency but paid for in another currency. In order to perform the currency conversion, you need to know the current exchange rate. This operation could be implemented in two ways [ 40 , 41 ]:
除了“消息传递数据流”中列出的优点之外,例如更好的容错能力,数据流系统还可以实现更好的性能。例如,假设客户购买的商品定价为一种货币,但是用另一种货币支付。为了进行货币转换,您需要知道当前的汇率。此操作可以通过两种方式实现[40,41]:
-
In the microservices approach, the code that processes the purchase would probably query an exchange-rate service or database in order to obtain the current rate for a particular currency.
在微服务的方法中,处理购买的代码可能会查询一个汇率服务或数据库,以获取特定货币的当前汇率。
-
In the dataflow approach, the code that processes purchases would subscribe to a stream of exchange rate updates ahead of time, and record the current rate in a local database whenever it changes. When it comes to processing the purchase, it only needs to query the local database.
在数据流方法中,处理购买的代码将提前订阅汇率更新的流,并在汇率发生变化时将当前汇率记录在本地数据库中。在处理购买时,它只需要查询本地数据库。
The second approach has replaced a synchronous network request to another service with a query to a local database (which may be on the same machine, even in the same process). Not only is the dataflow approach faster, but it is also more robust to the failure of another service. The fastest and most reliable network request is no network request at all! Instead of RPC, we now have a stream join between purchase events and exchange rate update events (see “Stream-table join (stream enrichment)” ).
第二种方法将对另一个服务的同步网络请求替换为对本地数据库的查询(本地数据库可能与处理代码在同一台机器上,甚至在同一进程中)。数据流方法不仅更快,而且在另一个服务发生故障时也更加稳健。最快、最可靠的网络请求就是根本不发送网络请求!现在我们不再使用RPC,而是在购买事件和汇率更新事件之间进行流连接(参见“流表连接(流扩充)”)。
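As a minimal sketch of this stream-table join (class names such as PurchaseProcessor and RateUpdate are invented for illustration, not from any real framework), the following keeps a local replica of the exchange-rate table up to date from a stream of updates, so that processing a purchase needs no network request:

作为这种流表连接的一个最简示意(其中 PurchaseProcessor、RateUpdate 等名称均为假设,并非来自任何真实框架),下面的代码通过订阅更新流来维护汇率表的本地副本,使得处理购买时无需任何网络请求:

```python
from dataclasses import dataclass

@dataclass
class RateUpdate:
    currency: str       # e.g. "EUR"
    rate_to_usd: float  # USD per one unit of the currency

@dataclass
class Purchase:
    item: str
    price: float
    currency: str

class PurchaseProcessor:
    """Stream operator joining purchases against the latest exchange rates."""

    def __init__(self):
        # Local replica of the exchange-rate table, kept fresh by the
        # subscribed stream of RateUpdate events -- no RPC at purchase time.
        self.rates = {"USD": 1.0}

    def on_rate_update(self, update: RateUpdate) -> None:
        self.rates[update.currency] = update.rate_to_usd

    def on_purchase(self, purchase: Purchase) -> float:
        # Stream-table join: look up the current rate in local state.
        return purchase.price * self.rates[purchase.currency]
```

Note that the join is against whatever rate the processor has seen most recently, which is exactly the time-dependence discussed below.

注意,这个连接使用的是处理器最近看到的汇率,这正是下文讨论的时间依赖性。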
The join is time-dependent: if the purchase events are reprocessed at a later point in time, the exchange rate will have changed. If you want to reconstruct the original output, you will need to obtain the historical exchange rate at the original time of purchase. No matter whether you query a service or subscribe to a stream of exchange rate updates, you will need to handle this time dependence (see “Time-dependence of joins” ).
连接是时间依赖性的:如果购买事件在以后的时间被重新处理,汇率将会发生变化。如果您想重建原始输出,您需要获得购买时刻的历史汇率。无论您查询服务还是订阅汇率更新流,您都需要处理这种时间依赖性(请参见“连接的时间依赖性”)。
Subscribing to a stream of changes, rather than querying the current state when needed, brings us closer to a spreadsheet-like model of computation: when some piece of data changes, any derived data that depends on it can swiftly be updated. There are still many open questions, for example around issues like time-dependent joins, but I believe that building applications around dataflow ideas is a very promising direction to go in.
订阅变更流,而不是在需要时查询当前状态,使我们更接近类似电子表格的计算模型:当某个数据发生变更时,依赖于它的任何派生数据都可以迅速更新。仍然有许多未解决的问题,例如围绕时间依赖连接的问题,但我相信,围绕数据流思想构建应用程序是一个非常有前景的方向。
Observing Derived State
At an abstract level, the dataflow systems discussed in the last section give you a process for creating derived datasets (such as search indexes, materialized views, and predictive models) and keeping them up to date. Let’s call that process the write path : whenever some piece of information is written to the system, it may go through multiple stages of batch and stream processing, and eventually every derived dataset is updated to incorporate the data that was written. Figure 12-1 shows an example of updating a search index.
在抽象层面上,上一节中讨论的数据流系统为您提供了一种创建派生数据集(例如搜索索引、物化视图和预测模型)并使它们保持最新的过程。让我们把这个过程称为写入路径:每当某个信息被写入系统时,它可能经过多个批处理和流处理阶段,最终每个派生数据集都会更新以纳入被写入的数据。图12-1显示了更新搜索索引的示例。
But why do you create the derived dataset in the first place? Most likely because you want to query it again at a later time. This is the read path : when serving a user request you read from the derived dataset, perhaps perform some more processing on the results, and construct the response to the user.
但是你为什么首先要创建派生数据集呢?很可能是因为你想在以后的某个时间再次查询它。这是读取路径:在对用户请求进行服务时,您从派生数据集中读取,可能对结果进行更多处理,并构建向用户的响应。
Taken together, the write path and the read path encompass the whole journey of the data, from the point where it is collected to the point where it is consumed (probably by another human). The write path is the portion of the journey that is precomputed—i.e., that is done eagerly as soon as the data comes in, regardless of whether anyone has asked to see it. The read path is the portion of the journey that only happens when someone asks for it. If you are familiar with functional programming languages, you might notice that the write path is similar to eager evaluation, and the read path is similar to lazy evaluation.
总体而言,写路径和读路径涵盖了数据的整个旅程,从数据被收集到数据被消费(可能是被另一个人)。写路径是旅程中被预先计算的部分,即只要数据一进来就会急切地完成,无论是否有人要求查看它。读路径则是旅程中只有当有人请求时才会发生的部分。如果您熟悉函数式编程语言,可能会注意到写路径类似于急切求值(eager evaluation),读路径类似于惰性求值(lazy evaluation)。
The derived dataset is the place where the write path and the read path meet, as illustrated in Figure 12-1 . It represents a trade-off between the amount of work that needs to be done at write time and the amount that needs to be done at read time.
派生数据集是写路径和读路径相遇的地方,如图12-1所示。它代表了需要在写入时完成的工作量和需要在读取时完成的工作量之间的权衡。
Materialized views and caching
A full-text search index is a good example: the write path updates the index, and the read path searches the index for keywords. Both reads and writes need to do some work. Writes need to update the index entries for all terms that appear in the document. Reads need to search for each of the words in the query, and apply Boolean logic to find documents that contain all of the words in the query (an AND operator), or any synonym of each of the words (an OR operator).
全文搜索索引是一个很好的例子:写入路径更新索引,读取路径搜索关键字的索引。 读取和写入都需要做一些工作。 写入需要更新文档中出现的所有术语的索引条目。 读取需要搜索查询中的每个单词,并应用布尔逻辑以查找包含查询中所有单词(AND运算符)或每个单词的任何同义词(OR运算符)的文档。
If you didn’t have an index, a search query would have to scan over all documents (like grep), which would get very expensive if you had a large number of documents. No index means less work on the write path (no index to update), but a lot more work on the read path.
如果没有索引,搜索查询将不得不扫描所有文档(就像grep一样),如果您有大量文档,这将非常昂贵。 没有索引意味着在写路径上少一些工作(没有索引需要更新),但在读路径上需要更多的工作。
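To make the write-path/read-path trade-off concrete, here is a minimal sketch (function names are made up for illustration) of a tiny inverted index next to a grep-like scan: the index does work at write time so that AND queries are cheap at read time, while the scan does no write-time work but must touch every document on every read:

为了让写路径与读路径的权衡更具体,下面是一个最简示意(函数名仅为说明而虚构):一个微型倒排索引与一个类似grep的扫描并列。索引在写入时做工作,使得AND查询在读取时很廉价;而扫描在写入时不做任何工作,但每次读取都必须遍历所有文档:

```python
from collections import defaultdict

def build_index(documents):
    """Write path: update index entries for every term in each document."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search_and(index, query):
    """Read path: AND query -- documents containing all query words."""
    results = None
    for term in query.lower().split():
        postings = index.get(term, set())
        results = postings if results is None else results & postings
    return results or set()

def scan(documents, query):
    """The grep-like alternative: no write-path work, expensive reads."""
    words = query.lower().split()
    return {doc_id for doc_id, text in documents.items()
            if all(w in text.lower().split() for w in words)}
```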
On the other hand, you could imagine precomputing the search results for all possible queries. In that case, you would have less work to do on the read path: no Boolean logic, just find the results for your query and return them. However, the write path would be a lot more expensive: the set of possible search queries that could be asked is infinite, and thus precomputing all possible search results would require infinite time and storage space. That wouldn’t work so well.
另一方面,你可以想象为所有可能的查询预先计算搜索结果。在这种情况下,读取路径上的工作量会减少:没有布尔逻辑,只需查找您的查询结果并返回它们。然而,写入路径会变得更加昂贵:可以被询问的可能的搜索查询集是无穷的,因此预先计算所有可能的搜索结果需要无限的时间和存储空间。那样不会很好。
Another option would be to precompute the search results for only a fixed set of the most common queries, so that they can be served quickly without having to go to the index. The uncommon queries can still be served from the index. This would generally be called a cache of common queries, although we could also call it a materialized view, as it would need to be updated when new documents appear that should be included in the results of one of the common queries.
另一个选择是预先计算仅限一组最常见查询的搜索结果,以便可以快速提供它们,而无需访问索引。不常见的查询仍可以从索引中提供。这通常被称为常见查询的缓存,尽管我们也可以称之为物化视图,因为当新文档出现时,它需要更新以包含其中一个常见查询的结果。
From this example we can see that an index is not the only possible boundary between the write path and the read path. Caching of common search results is possible, and grep-like scanning without the index is also possible on a small number of documents. Viewed like this, the role of caches, indexes, and materialized views is simple: they shift the boundary between the read path and the write path. They allow us to do more work on the write path, by precomputing results, in order to save effort on the read path.
从这个例子中可以看到,索引不是唯一可能的写入路径和读取路径之间的边界。可以缓存常见的搜索结果,并且类似于grep的扫描也可以在少量文档上进行,这样看来,缓存、索引和物化视图的作用很简单:它们可以将读取路径和写入路径之间的边界转移。它们允许我们在写入路径上更多地进行预计算结果的工作,以便在读取路径上节省工作量。
Shifting the boundary between work done on the write path and the read path was in fact the topic of the Twitter example at the beginning of this book, in “Describing Load” . In that example, we also saw how the boundary between write path and read path might be drawn differently for celebrities compared to ordinary users. After 500 pages we have come full circle!
将写路径和读路径上所完成工作之间的边界移动,实际上正是本书开头“描述负载”一节中Twitter示例的主题。在那个例子中,我们还看到,名人与普通用户的写路径和读路径之间的边界可能划在不同的位置。经过500页之后,我们已经兜了一整圈!
Stateful, offline-capable clients
I find the idea of a boundary between write and read paths interesting because we can discuss shifting that boundary and explore what that shift means in practical terms. Let’s look at the idea in a different context.
我认为在写入和阅读路径之间设立边界的想法很有趣,因为我们可以讨论移动边界并探索它在实际方面的意义。让我们在不同的背景下看这个想法。
The huge popularity of web applications in the last two decades has led us to certain assumptions about application development that are easy to take for granted. In particular, the client/server model—in which clients are largely stateless and servers have the authority over data—is so common that we almost forget that anything else exists. However, technology keeps moving on, and I think it is important to question the status quo from time to time.
在过去的二十年中,Web 应用程序的巨大流行使我们产生了一些关于应用程序开发的假设,很容易被视为理所当然。特别是客户端/服务器模型——其中客户端大多数是无状态的,服务器对数据拥有授权——是如此普遍,以至于我们几乎忘记了其他任何东西的存在。然而,技术不断进步,我认为有必要时常质疑现状。
Traditionally, web browsers have been stateless clients that can only do useful things when you have an internet connection (just about the only thing you could do offline was to scroll up and down in a page that you had previously loaded while online). However, recent “single-page” JavaScript web apps have gained a lot of stateful capabilities, including client-side user interface interaction and persistent local storage in the web browser. Mobile apps can similarly store a lot of state on the device and don’t require a round-trip to the server for most user interactions.
传统上,网页浏览器是无状态的客户端,只有在连接到互联网时才能做有用的事情(离线时你几乎唯一能做的,就是在之前在线时加载的页面里上下滚动)。然而,最近的“单页”JavaScript Web应用已经获得了许多有状态的能力,包括客户端的用户界面交互和Web浏览器中的持久本地存储。移动应用同样可以在设备上存储大量状态,而且大多数用户交互都不需要与服务器往返通信。
These changing capabilities have led to a renewed interest in offline-first applications that do as much as possible using a local database on the same device, without requiring an internet connection, and sync with remote servers in the background when a network connection is available [ 42 ]. Since mobile devices often have slow and unreliable cellular internet connections, it’s a big advantage for users if their user interface does not have to wait for synchronous network requests, and if apps mostly work offline (see “Clients with offline operation” ).
这些不断变化的能力重新激发了人们对离线优先(offline-first)应用的兴趣:这类应用尽可能在同一设备上使用本地数据库完成工作,不需要互联网连接,并在网络连接可用时于后台与远程服务器同步[42]。由于移动设备的蜂窝网络连接通常缓慢且不可靠,如果用户界面不必等待同步的网络请求、应用大体上能离线工作,那对用户来说是一大优势(参见“离线操作的客户端”)。
When we move away from the assumption of stateless clients talking to a central database and toward state that is maintained on end-user devices, a world of new opportunities opens up. In particular, we can think of the on-device state as a cache of state on the server . The pixels on the screen are a materialized view onto model objects in the client app; the model objects are a local replica of state in a remote datacenter [ 27 ].
当我们不再假设无状态客户端与中央数据库交互,而是转向在终端用户设备上维护状态时,一个充满新机会的世界就此打开。特别是,我们可以把设备上的状态看作服务器上状态的缓存。屏幕上的像素是客户端应用中模型对象的物化视图;而模型对象则是远程数据中心中状态的本地副本[27]。
Pushing state changes to clients
In a typical web page, if you load the page in a web browser and the data subsequently changes on the server, the browser does not find out about the change until you reload the page. The browser only reads the data at one point in time, assuming that it is static—it does not subscribe to updates from the server. Thus, the state on the device is a stale cache that is not updated unless you explicitly poll for changes. (HTTP-based feed subscription protocols like RSS are really just a basic form of polling.)
在典型的网页中,如果你在浏览器中加载页面后,数据随后在服务器上发生了变化,浏览器在你重新加载页面之前是不会知道这一变化的。浏览器只在某一个时间点读取数据,并假定数据是静态的,它不会订阅来自服务器的更新。因此,设备上的状态是一份陈旧的缓存,除非你显式地轮询变更,否则不会更新。(基于HTTP的Feed订阅协议,例如RSS,实际上只是一种基本的轮询形式。)
More recent protocols have moved beyond the basic request/response pattern of HTTP: server-sent events (the EventSource API) and WebSockets provide communication channels by which a web browser can keep an open TCP connection to a server, and the server can actively push messages to the browser as long as it remains connected. This provides an opportunity for the server to actively inform the end-user client about any changes to the state it has stored locally, reducing the staleness of the client-side state.
较新的协议已经超越了HTTP的基本请求/响应模式:服务器推送事件(EventSource API)和WebSockets提供了通信渠道,通过它,Web浏览器可以保持与服务器的开放TCP连接,并且只要它保持连接,服务器就可以主动将消息推送到浏览器。这为服务器提供了机会,主动向最终用户客户端通知其本地存储的任何状态更改,减少客户端状态过时的可能性。
In terms of our model of write path and read path, actively pushing state changes all the way to client devices means extending the write path all the way to the end user. When a client is first initialized, it would still need to use a read path to get its initial state, but thereafter it could rely on a stream of state changes sent by the server. The ideas we discussed around stream processing and messaging are not restricted to running only in a datacenter: we can take the ideas further, and extend them all the way to end-user devices [ 43 ].
就我们的写路径和读路径模型而言,积极地将状态变化推送到客户端设备意味着将写路径延伸到最终用户。当客户端首次初始化时,它仍需要使用读路径来获取初始状态,但之后它可以依靠服务器发送的状态更改流。我们讨论过的关于流处理和消息传递的想法并不仅限于仅在数据中心运行:我们可以进一步发挥这些想法,并将它们延伸到最终用户设备[43]。
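A minimal in-process sketch of extending the write path to clients (all class names are hypothetical; a real system would push over WebSockets or server-sent events): the client obtains an initial snapshot via a read path when it subscribes, and thereafter stays fresh via pushed change events:

下面是一个将写路径延伸到客户端的最简进程内示意(所有类名均为假设;真实系统会通过WebSocket或服务器推送事件来推送):客户端在订阅时通过读路径获取初始快照,此后依靠推送的变更事件保持最新:

```python
class StateServer:
    """Holds authoritative state and pushes changes to subscribed clients."""

    def __init__(self):
        self.state = {}
        self.subscribers = []

    def subscribe(self, client):
        # Read path: the client gets an initial snapshot on connect...
        client.state = dict(self.state)
        # ...then the write path is extended to it via pushed events.
        self.subscribers.append(client)

    def write(self, key, value):
        self.state[key] = value
        for client in self.subscribers:
            client.on_change(key, value)

class Client:
    def __init__(self):
        self.state = {}

    def on_change(self, key, value):
        # Keep the local replica (the on-device cache of server state) fresh.
        self.state[key] = value
```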
The devices will be offline some of the time, and unable to receive any notifications of state changes from the server during that time. But we already solved that problem: in “Consumer offsets” we discussed how a consumer of a log-based message broker can reconnect after failing or becoming disconnected, and ensure that it doesn’t miss any messages that arrived while it was disconnected. The same technique works for individual users, where each device is a small subscriber to a small stream of events.
设备将有一段时间处于离线状态,无法接收服务器的状态更改通知。但我们已经解决了这个问题:在“消费者偏移量”中我们讨论了如何重新连接基于日志的消息代理的消费者,在失败或断开连接后确保不会错过任何到达的消息。同样的技术也适用于个人用户,在这里每个设备都是一个小的订阅者,订阅小的事件流。
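The consumer-offset technique might be sketched like this (an in-memory stand-in for a log-based broker; the names are invented): each device remembers how far it has read, so after a disconnection it catches up from its stored offset without missing any events:

消费者偏移量技术可以这样示意(用内存结构代替基于日志的消息代理;名称均为虚构):每个设备都记住自己读到了哪里,因此断线之后可以从已存储的偏移量继续追赶,而不会错过任何事件:

```python
class EventLog:
    """A log-based broker in miniature: an append-only list of events."""

    def __init__(self):
        self.events = []

    def append(self, event):
        self.events.append(event)

    def read_from(self, offset):
        return self.events[offset:]

class DeviceConsumer:
    """A small subscriber to a small stream: remembers its own offset,
    so it can disconnect and later catch up without missing events."""

    def __init__(self, log):
        self.log = log
        self.offset = 0
        self.received = []

    def catch_up(self):
        for event in self.log.read_from(self.offset):
            self.received.append(event)
            self.offset += 1
```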
End-to-end event streams
Recent tools for developing stateful clients and user interfaces, such as the Elm language [ 30 ] and Facebook’s toolchain of React, Flux, and Redux [ 44 ], already manage internal client-side state by subscribing to a stream of events representing user input or responses from a server, structured similarly to event sourcing (see “Event Sourcing” ).
最近用于开发有状态客户端和用户界面的工具,例如Elm编程语言[30]和Facebook的React、Flux和Redux工具链[44],已经通过订阅一系列事件来管理客户端内部状态,这些事件代表用户输入或来自服务器的响应,类似于事件溯源(见“事件溯源”)。
It would be very natural to extend this programming model to also allow a server to push state-change events into this client-side event pipeline. Thus, state changes could flow through an end-to-end write path: from the interaction on one device that triggers a state change, via event logs and through several derived data systems and stream processors, all the way to the user interface of a person observing the state on another device. These state changes could be propagated with fairly low delay—say, under one second end to end.
将该编程模型扩展到允许服务器将状态更改事件推入客户端事件管道中,这将非常自然。因此,状态更改可以通过端到端的写入路径流动:从触发状态更改的设备上的交互,通过事件日志并通过多个派生数据系统和流处理器,一直到观察另一个设备上状态的人的用户界面。这些状态更改可以以相当低的延迟(例如,端到端低于1秒)传播。
Some applications, such as instant messaging and online games, already have such a “real-time” architecture (in the sense of interactions with low delay, not in the sense of “Response time guarantees” ). But why don’t we build all applications this way?
一些应用程序,例如即时通讯和在线游戏,已经拥有这样的“实时”架构(以低延迟的相互作用为特点,而不是“响应时间保证”的意义上)。但为什么我们不把所有的应用程序都建成这样呢?
The challenge is that the assumption of stateless clients and request/response interactions is very deeply ingrained in our databases, libraries, frameworks, and protocols. Many datastores support read and write operations where a request returns one response, but much fewer provide an ability to subscribe to changes—i.e., a request that returns a stream of responses over time (see “API support for change streams” ).
挑战在于,无状态客户端和请求/响应交互的假设已深深植根于我们的数据库、类库、框架和协议之中。许多数据存储支持一个请求返回一个响应的读写操作,但提供订阅变更能力(即一个请求随时间返回一系列响应)的数据存储则少得多(参见“变更流的API支持”)。
In order to extend the write path all the way to the end user, we would need to fundamentally rethink the way we build many of these systems: moving away from request/response interaction and toward publish/subscribe dataflow [ 27 ]. I think that the advantages of more responsive user interfaces and better offline support would make it worth the effort. If you are designing data systems, I hope that you will keep in mind the option of subscribing to changes, not just querying the current state.
为了将写入路径延伸到最终用户,我们需要彻底重新思考如何构建这些系统:从请求/响应交互向发布/订阅数据流转移[27]。我认为更响应的用户界面和更好的离线支持的优点会使得这个努力值得。如果您正在设计数据系统,我希望您能记住订阅更改的选项,而不仅仅是查询当前状态。
Reads are events too
We discussed that when a stream processor writes derived data to a store (database, cache, or index), and when user requests query that store, the store acts as the boundary between the write path and the read path. The store allows random-access read queries to the data that would otherwise require scanning the whole event log.
我们讨论过,当流处理器将派生数据写入存储(数据库、缓存或索引),而用户请求查询该存储时,该存储就充当了写路径和读路径之间的边界。该存储允许对数据进行随机访问的读取查询,否则这些查询将需要扫描整个事件日志。
In many cases, the data storage is separate from the streaming system. But recall that stream processors also need to maintain state to perform aggregations and joins (see “Stream Joins” ). This state is normally hidden inside the stream processor, but some frameworks allow it to also be queried by outside clients [ 45 ], turning the stream processor itself into a kind of simple database.
在许多情况下,数据存储是与流系统分开的。但要记住,流处理器也需要维护状态来执行聚合和连接(请参见“流连接”)。这种状态通常隐藏在流处理器中,但是一些框架也允许外部客户端查询它,将流处理器本身转变为一种简单的数据库。
I would like to take that idea further. As discussed so far, the writes to the store go through an event log, while reads are transient network requests that go directly to the nodes that store the data being queried. This is a reasonable design, but not the only possible one. It is also possible to represent read requests as streams of events, and send both the read events and the write events through a stream processor; the processor responds to read events by emitting the result of the read to an output stream [ 46 ].
我想进一步探讨这个想法。到目前为止,对存储的写入都经过了事件日志,而读取则是瞬态的网络请求,直接发送到存储被查询的节点。这是一个合理的设计,但不是唯一可能的设计。还可以将读请求表示为事件流,将读事件和写事件都发送到流处理器;处理器通过将读事件的结果发射到输出流来响应读事件。
When both the writes and the reads are represented as events, and routed to the same stream operator in order to be handled, we are in fact performing a stream-table join between the stream of read queries and the database. The read event needs to be sent to the database partition holding the data (see “Request Routing” ), just like batch and stream processors need to copartition inputs on the same key when joining (see “Reduce-Side Joins and Grouping” ).
当读和写都被表示为事件,并路由到同一个流运算符以进行处理时,实际上我们正在执行流表连接,将读查询流和数据库连接起来。需要发送读事件到持有数据的数据库分区(请参见“请求路由”),就像批处理和流处理器在连接时必须在相同的键上进行共同分区输入一样(请参见“Reduce-Side连接和分组”)。
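As a toy illustration of routing both writes and reads through one stream operator (a sketch, not any particular framework's API): writes update the operator's local table, and read events are joined against that table, with their results emitted to an output stream:

下面是一个把写和读都路由到同一个流运算符的玩具示意(仅为草图,并非任何具体框架的API):写入更新运算符的本地表,读事件则与该表进行连接,并把结果发射到输出流:

```python
def process_events(events):
    """One stream operator handling both write events and read events.

    A write updates the table; a read is joined against the table's
    current state and its result is emitted to the output stream.
    """
    table = {}
    output = []
    for event in events:
        if event["type"] == "write":
            table[event["key"]] = event["value"]
        elif event["type"] == "read":
            output.append({"key": event["key"],
                           "value": table.get(event["key"])})
    return output
```

Because reads and writes pass through the same ordered stream, each read sees exactly the writes that preceded it in the log.

由于读和写经过同一个有序的流,每个读事件恰好能看到日志中先于它的所有写入。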
This correspondence between serving requests and performing joins is quite fundamental [ 47 ]. A one-off read request just passes the request through the join operator and then immediately forgets it; a subscribe request is a persistent join with past and future events on the other side of the join.
服务请求与执行连接之间的这种对应关系是相当基础的[47]。一次性读取请求只是把请求传过连接运算符,然后立即忘掉它;而订阅请求则是一个持久化的连接,会持续与连接另一侧的过去和未来事件进行连接。
Recording a log of read events potentially also has benefits with regard to tracking causal dependencies and data provenance across a system: it would allow you to reconstruct what the user saw before they made a particular decision. For example, in an online shop, it is likely that the predicted shipping date and the inventory status shown to a customer affect whether they choose to buy an item [ 4 ]. To analyze this connection, you need to record the result of the user’s query of the shipping and inventory status.
记录所读事件的日志还可能具有跟踪系统中因果依赖和数据来源的好处:它将使您能够重建用户做出特定决策之前所看到的内容。例如,在在线商店中,预计的发货日期和向客户显示的库存状态可能会影响他们是否选择购买物品[4]。为了分析这种联系,您需要记录用户查询的发货和库存状态的结果。
Writing read events to durable storage thus enables better tracking of causal dependencies (see “Ordering events to capture causality” ), but it incurs additional storage and I/O cost. Optimizing such systems to reduce the overhead is still an open research problem [ 2 ]. But if you already log read requests for operational purposes, as a side effect of request processing, it is not such a great change to make the log the source of the requests instead.
将读取事件写入持久存储,从而实现更好的因果依赖跟踪(请参见“排序事件以捕获因果关系”),但会产生额外的存储和I/O成本。优化这种系统以减少开销仍然是一个开放的研究问题[2]。但如果您已经为运营目的记录了读取请求,作为请求处理的副作用,使日志成为请求的来源并不是一个很大的改变。
Multi-partition data processing
For queries that only touch a single partition, the effort of sending queries through a stream and collecting a stream of responses is perhaps overkill. However, this idea opens the possibility of distributed execution of complex queries that need to combine data from several partitions, taking advantage of the infrastructure for message routing, partitioning, and joining that is already provided by stream processors.
对于只涉及单个分区的查询,通过流发送查询并收集响应流的做法也许是大材小用。然而,这个想法开启了分布式执行复杂查询的可能性:这类查询需要合并来自多个分区的数据,并可以利用流处理器已经提供的消息路由、分区和连接的基础设施。
Storm’s distributed RPC feature supports this usage pattern (see “Message passing and RPC” ). For example, it has been used to compute the number of people who have seen a URL on Twitter—i.e., the union of the follower sets of everyone who has tweeted that URL [ 48 ]. As the set of Twitter users is partitioned, this computation requires combining results from many partitions.
Storm的分布式RPC功能支持这种使用模式(参见“消息传递和RPC”)。例如,它已被用于计算在Twitter上看到某个URL的人数,即发推该URL的所有人的关注者集合的并集[48]。由于Twitter用户集是分区的,这一计算需要合并来自许多分区的结果。
Another example of this pattern occurs in fraud prevention: in order to assess the risk of whether a particular purchase event is fraudulent, you can examine the reputation scores of the user’s IP address, email address, billing address, shipping address, and so on. Each of these reputation databases is itself partitioned, and so collecting the scores for a particular purchase event requires a sequence of joins with differently partitioned datasets [ 49 ].
另一个这种模式的例子发生在防止欺诈方面:为了评估特定购买事件是否欺诈的风险,您可以检查用户的IP地址,电子邮件地址,账单地址,发货地址等的声誉分数。每个声誉数据库本身都是分区的,因此收集特定购买事件的分数需要与不同分区数据集的序列连接[49]。
The internal query execution graphs of MPP databases have similar characteristics (see “Comparing Hadoop to Distributed Databases” ). If you need to perform this kind of multi-partition join, it is probably simpler to use a database that provides this feature than to implement it using a stream processor. However, treating queries as streams provides an option for implementing large-scale applications that run against the limits of conventional off-the-shelf solutions.
MPP数据库的内部查询执行图具有类似的特征(参见“将Hadoop与分布式数据库进行比较”)。如果需要执行这种多分区连接,使用直接提供此功能的数据库可能比用流处理器来实现更简单。然而,将查询视为流,为构建超出常规现成解决方案能力极限的大规模应用提供了一种选择。
Aiming for Correctness
With stateless services that only read data, it is not a big deal if something goes wrong: you can fix the bug and restart the service, and everything returns to normal. Stateful systems such as databases are not so simple: they are designed to remember things forever (more or less), so if something goes wrong, the effects also potentially last forever—which means they require more careful thought [ 50 ].
对于仅读取数据的无状态服务而言,如果出了问题,也不会有太大的影响:可以修复错误并重新启动服务,一切都恢复正常。而针对数据库等有状态系统,则不那么简单:它们被设计为永久地(或多或少)保存信息,因此如果出现故障,其影响也具有潜在的永久性,这意味着需要更加谨慎地考虑。
We want to build applications that are reliable and correct (i.e., programs whose semantics are well defined and understood, even in the face of various faults). For approximately four decades, the transaction properties of atomicity, isolation, and durability ( Chapter 7 ) have been the tools of choice for building correct applications. However, those foundations are weaker than they seem: witness for example the confusion of weak isolation levels (see “Weak Isolation Levels” ).
我们希望构建可靠和正确的应用程序(即,程序具有明确定义和理解的语义,即使面对各种故障)。在大约四十年的时间里,原子性、隔离性和持久性(第7章)的事务属性一直是构建正确应用程序的首选工具。然而,这些基础比它们看起来要脆弱得多:例如,弱隔离级别的混淆(参见“弱隔离级别”)。
In some areas, transactions are being abandoned entirely and replaced with models that offer better performance and scalability, but much messier semantics (see for example “Leaderless Replication” ). Consistency is often talked about, but poorly defined (see “Consistency” and Chapter 9 ). Some people assert that we should “embrace weak consistency” for the sake of better availability, while lacking a clear idea of what that actually means in practice.
在某些领域,事务被完全抛弃,取而代之的是性能和可伸缩性更好、但语义混乱得多的模型(例如参见“无主复制”)。一致性经常被人谈起,却没有被清晰地定义(参见“一致性”和第9章)。有些人断言,为了更好的可用性,我们应当“拥抱弱一致性”,却对这在实践中究竟意味着什么缺乏清晰的认识。
For a topic that is so important, our understanding and our engineering methods are surprisingly flaky. For example, it is very difficult to determine whether it is safe to run a particular application at a particular transaction isolation level or replication configuration [ 51 , 52 ]. Often simple solutions appear to work correctly when concurrency is low and there are no faults, but turn out to have many subtle bugs in more demanding circumstances.
对于一个如此重要的主题,我们的理解和工程方法却出奇地脆弱。例如,很难确定在特定的事务隔离级别或复制配置下运行特定应用是否安全[51,52]。通常,简单的解决方案在并发度低、没有故障的情况下似乎能正确工作,但在要求更苛刻的情况下却会暴露出许多微妙的错误。
For example, Kyle Kingsbury’s Jepsen experiments [ 53 ] have highlighted the stark discrepancies between some products’ claimed safety guarantees and their actual behavior in the presence of network problems and crashes. Even if infrastructure products like databases were free from problems, application code would still need to correctly use the features they provide, which is error-prone if the configuration is hard to understand (which is the case with weak isolation levels, quorum configurations, and so on).
例如,Kyle Kingsbury的Jepsen实验[53]已经突显了一些产品所声称的安全保证与它们在网络问题和崩溃情况下的实际行为之间的明显差异。即使基础架构产品如数据库没有问题,应用程序代码仍然需要正确地使用它们提供的功能,如果配置难以理解,这很容易出现错误(这在弱隔离级别、仲裁配置等情况下是普遍存在的)。
If your application can tolerate occasionally corrupting or losing data in unpredictable ways, life is a lot simpler, and you might be able to get away with simply crossing your fingers and hoping for the best. On the other hand, if you need stronger assurances of correctness, then serializability and atomic commit are established approaches, but they come at a cost: they typically only work in a single datacenter (ruling out geographically distributed architectures), and they limit the scale and fault-tolerance properties you can achieve.
如果你的应用程序能容忍偶尔发生的数据损坏或不可预测的丢失,那么生活就简单多了,你可能只需祈祷一下,希望一切顺利。另一方面,如果你需要更强的正确性保证,那么串行化和原子提交是已经确立的方法,但它们也有成本:它们通常只在单个数据中心中有效(排除了地理分布式架构),并且限制了你可以实现的规模和容错性能。
While the traditional transaction approach is not going away, I also believe it is not the last word in making applications correct and resilient to faults. In this section I will suggest some ways of thinking about correctness in the context of dataflow architectures.
尽管传统的事务方法并没有消失,但我也相信,它并不是使应用程序正确且能容错的最终定论。在本节中,我将提出在数据流架构的背景下思考正确性的一些方式。
The End-to-End Argument for Databases
Just because an application uses a data system that provides comparatively strong safety properties, such as serializable transactions, that does not mean the application is guaranteed to be free from data loss or corruption. For example, if an application has a bug that causes it to write incorrect data, or delete data from a database, serializable transactions aren’t going to save you.
仅仅因为应用程序使用了提供较强安全属性的数据系统(例如可串行化事务),并不意味着该应用程序就能保证没有数据丢失或损坏。例如,如果应用程序有一个Bug,导致它写入错误的数据或从数据库中删除数据,可串行化事务也救不了你。
This example may seem frivolous, but it is worth taking seriously: application bugs occur, and people make mistakes. I used this example in “State, Streams, and Immutability” to argue in favor of immutable and append-only data, because it is easier to recover from such mistakes if you remove the ability of faulty code to destroy good data.
这个例子可能显得无关紧要,但值得认真对待:应用程序的Bug会发生,人也会犯错。我在“状态、流与不可变性”中用过这个例子来论证不可变和仅追加数据的好处:如果你剥夺了有缺陷的代码破坏良好数据的能力,就更容易从这类错误中恢复。
Although immutability is useful, it is not a cure-all by itself. Let’s look at a more subtle example of data corruption that can occur.
尽管不可变性很有用,但它本身并不能解决所有问题。让我们来看一个更微妙的数据破坏的例子。
Exactly-once execution of an operation
In “Fault Tolerance” we encountered an idea called exactly-once (or effectively-once ) semantics. If something goes wrong while processing a message, you can either give up (drop the message—i.e., incur data loss) or try again. If you try again, there is the risk that it actually succeeded the first time, but you just didn’t find out about the success, and so the message ends up being processed twice.
在“容错”中,我们遇到了一个叫做恰好一次(或等效一次)语义的概念。如果在处理消息时出了问题,你要么放弃(丢弃该消息,即造成数据丢失),要么重试。如果重试,就有这样的风险:第一次实际上已经成功了,只是你没有得知这次成功,于是该消息最终被处理了两次。
Processing twice is a form of data corruption: it is undesirable to charge a customer twice for the same service (billing them too much) or increment a counter twice (overstating some metric). In this context, exactly-once means arranging the computation such that the final effect is the same as if no faults had occurred, even if the operation actually was retried due to some fault. We previously discussed a few approaches for achieving this goal.
处理两次是数据损坏的一种形式:两次收费给客户同样的服务是不可取的(过多收费),或是将计数器增加两次(夸大某些指标)。在这个情境下,确保仅处理一次的意思是将计算安排得妥当,以便在发生故障时,最终效果与未发生故障时相同,即使操作确实因某些故障而被重试。我们之前讨论过一些实现这个目标的方法。
One of the most effective approaches is to make the operation idempotent (see “Idempotence” ); that is, to ensure that it has the same effect, no matter whether it is executed once or multiple times. However, taking an operation that is not naturally idempotent and making it idempotent requires some effort and care: you may need to maintain some additional metadata (such as the set of operation IDs that have updated a value), and ensure fencing when failing over from one node to another (see “The leader and the lock” ).
其中最有效的方法之一是使操作幂等(参见“幂等性”),即确保无论执行一次还是多次,它都产生相同的效果。然而,把一个天然不幂等的操作变为幂等需要一些努力和细心:你可能需要维护一些额外的元数据(例如已更新某个值的操作ID集合),并且在从一个节点故障切换到另一个节点时确保防护(fencing,参见“领导者和锁”)。
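The operation-ID metadata idea might be sketched like this (a hypothetical class; fencing is omitted for brevity): the counter remembers which operation IDs it has already applied, so redelivery of the same operation has no further effect:

操作ID元数据的思路可以这样示意(类名为假设;为简洁起见省略了防护机制):计数器记住已经应用过哪些操作ID,因此同一操作的重复投递不会产生进一步的效果:

```python
class IdempotentCounter:
    """A counter made idempotent by remembering applied operation IDs."""

    def __init__(self):
        self.value = 0
        self.applied_ids = set()  # extra metadata: IDs already applied

    def increment(self, op_id):
        if op_id in self.applied_ids:
            return self.value  # duplicate delivery: no effect
        self.applied_ids.add(op_id)
        self.value += 1
        return self.value
```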
Duplicate suppression
The same pattern of needing to suppress duplicates occurs in many other places besides stream processing. For example, TCP uses sequence numbers on packets to put them in the correct order at the recipient, and to determine whether any packets were lost or duplicated on the network. Any lost packets are retransmitted and any duplicates are removed by the TCP stack before it hands the data to an application.
除了流处理之外,许多其他地方也存在同样需要抑制重复的模式。例如,TCP在数据包上使用序列号,以便在接收端将它们按正确顺序排列,并判断网络上是否有数据包丢失或重复。丢失的数据包会被重传,重复的数据包会在TCP栈把数据交给应用程序之前被去除。
However, this duplicate suppression only works within the context of a single TCP connection. Imagine the TCP connection is a client’s connection to a database, and it is currently executing the transaction in Example 12-1. In many databases, a transaction is tied to a client connection (if the client sends several queries, the database knows that they belong to the same transaction because they are sent on the same TCP connection). If the client suffers a network interruption and connection timeout after sending the COMMIT, but before hearing back from the database server, it does not know whether the transaction has been committed or aborted (Figure 8-1).
然而,这种重复抑制仅适用于单个TCP连接的上下文中。 想象一下TCP连接是客户端连接到数据库,目前正在执行示例12-1中的事务。在许多数据库中,事务与客户端连接相关联(如果客户端发送多个查询,则数据库知道它们属于同一事务,因为它们在同一TCP连接上发送)。如果客户端在提交后但在听到数据库服务器响应之前经历了网络中断和连接超时,则无法确定事务是已提交还是已中止(图8-1)。
Example 12-1. A nonidempotent transfer of money from one account to another
例12-1. 从一个账户到另一个账户的非幂等转账

BEGIN TRANSACTION;
UPDATE accounts SET balance = balance + 11.00 WHERE account_id = 1234;
UPDATE accounts SET balance = balance - 11.00 WHERE account_id = 4321;
COMMIT;
The client can reconnect to the database and retry the transaction, but now it is outside of the scope of TCP duplicate suppression. Since the transaction in Example 12-1 is not idempotent, it could happen that $22 is transferred instead of the desired $11. Thus, even though Example 12-1 is a standard example for transaction atomicity, it is actually not correct, and real banks do not work like this [ 3 ].
客户端可以重新连接到数据库并重试事务,但此时已经超出了TCP重复抑制的范围。由于示例12-1中的事务不是幂等的,可能会发生转账22美元而不是所需的11美元的情况。因此,尽管示例12-1是事务原子性的标准示例,但它实际上并不正确,真正的银行也不会这样运作[3]。
Two-phase commit (see “Atomic Commit and Two-Phase Commit (2PC)” ) protocols break the 1:1 mapping between a TCP connection and a transaction, since they must allow a transaction coordinator to reconnect to a database after a network fault, and tell it whether to commit or abort an in-doubt transaction. Is this sufficient to ensure that the transaction will only be executed once? Unfortunately not.
两阶段提交(参见“原子提交和两阶段提交(2PC)”)协议打破了TCP连接和事务之间的1:1映射,因为它们必须允许事务协调器在网络故障后重新连接到数据库,并告诉它是否提交或中止一个不确定的事务。这是否足以确保事务只执行一次?不幸的是,不是。
Even if we can suppress duplicate transactions between the database client and server, we still need to worry about the network between the end-user device and the application server. For example, if the end-user client is a web browser, it probably uses an HTTP POST request to submit an instruction to the server. Perhaps the user is on a weak cellular data connection, and they succeed in sending the POST, but the signal becomes too weak before they are able to receive the response from the server.
即使我们可以在数据库客户端和服务器之间抑制重复交易,我们仍然需要担心终端用户设备和应用服务器之间的网络。例如,如果终端用户客户端是Web浏览器,它可能使用HTTP POST请求向服务器提交指令。也许用户正在使用弱信号的蜂窝数据连接,他们在成功发送POST之前信号变得太弱,无法接收来自服务器的响应。
In this case, the user will probably be shown an error message, and they may retry manually. Web browsers warn, “Are you sure you want to submit this form again?”—and the user says yes, because they wanted the operation to happen. (The Post/Redirect/Get pattern [ 54 ] avoids this warning message in normal operation, but it doesn’t help if the POST request times out.) From the web server’s point of view the retry is a separate request, and from the database’s point of view it is a separate transaction. The usual deduplication mechanisms don’t help.
在这种情况下,用户可能会看到一个错误消息,并可以手动重试。Web浏览器会提醒:“您确定要再次提交此表单吗?”-用户会选择“是”,因为他们想要进行操作。(Post/Redirect/Get模式[54]可以在正常操作中避免此警告消息,但如果POST请求超时则无法起作用。)从Web服务器的角度来看,重试是一个单独的请求,而从数据库的角度来看,它是一个单独的事务。通常的去重机制对此无效。
Operation identifiers
To make the operation idempotent through several hops of network communication, it is not sufficient to rely just on a transaction mechanism provided by a database—you need to consider the end-to-end flow of the request.
使操作在网络通信的多次跳跃中具有幂等性,仅依赖数据库提供的事务机制是不够的 - 您需要考虑请求的端到端流程。
For example, you could generate a unique identifier for an operation (such as a UUID) and include it as a hidden form field in the client application, or calculate a hash of all the relevant form fields to derive the operation ID [ 3 ]. If the web browser submits the POST request twice, the two requests will have the same operation ID. You can then pass that operation ID all the way through to the database and check that you only ever execute one operation with a given ID, as shown in Example 12-2 .
例如,您可以为操作生成唯一标识符(例如UUID),并将其包括为客户端应用程序中的隐藏表单字段,或计算所有相关表单字段的哈希以推导操作ID [3]。如果浏览器提交POST请求两次,则这两个请求将具有相同的操作ID。然后,您可以将该操作ID传递到数据库,并检查您只执行具有给定ID的一个操作,如示例12-2所示。
Example 12-2. Suppressing duplicate requests using a unique ID
ALTER TABLE requests ADD UNIQUE (request_id);

BEGIN TRANSACTION;
INSERT INTO requests
  (request_id, from_account, to_account, amount)
  VALUES ('0286FDB8-D7E1-423F-B40B-792B3608036C', 4321, 1234, 11.00);
UPDATE accounts SET balance = balance + 11.00 WHERE account_id = 1234;
UPDATE accounts SET balance = balance - 11.00 WHERE account_id = 4321;
COMMIT;
Example 12-2 relies on a uniqueness constraint on the request_id column. If a transaction attempts to insert an ID that already exists, the INSERT fails and the transaction is aborted, preventing it from taking effect twice. Relational databases can generally maintain a uniqueness constraint correctly, even at weak isolation levels (whereas an application-level check-then-insert may fail under nonserializable isolation, as discussed in “Write Skew and Phantoms”).
示例12-2依赖于request_id列上的唯一性约束。如果一个事务试图插入已经存在的ID,那么插入操作失败,并且事务被中止,防止其重复生效。关系型数据库通常可以正确维护唯一性约束,即使在弱隔离级别下(而应用程序级别的检查-然后插入可能在不可串行隔离下失败,如“写偏斜和幻象”中讨论的那样)。
Besides suppressing duplicate requests, the requests table in Example 12-2 acts as a kind of event log, hinting in the direction of event sourcing (see “Event Sourcing”). The updates to the account balances don’t actually have to happen in the same transaction as the insertion of the event, since they are redundant and could be derived from the request event in a downstream consumer—as long as the event is processed exactly once, which can again be enforced using the request ID.
除了抑制重复请求之外,示例12-2中的requests表还充当了一种事件日志,暗示了事件溯源的方向(参见“事件溯源”)。账户余额的更新实际上不必与事件的插入发生在同一个事务中,因为它们是冗余的,可以由下游消费者从请求事件中派生出来,只要该事件恰好被处理一次即可,而这一点同样可以通过请求ID来强制实现。
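On the client side, the operation ID described above can be derived by hashing the relevant form fields, so that a retried submission produces the same ID. The following is a minimal sketch in Python; the field names and the choice of SHA-256 are illustrative assumptions, not prescribed by the text:

```python
import hashlib
import json

def operation_id(form_fields: dict) -> str:
    """Derive a deterministic operation ID from the relevant form fields.

    Serializing with sorted keys ensures that the same logical request
    always hashes to the same ID, so a browser retry of the same POST
    produces a duplicate that the database's UNIQUE constraint rejects.
    """
    canonical = json.dumps(form_fields, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# The same form submitted twice yields the same ID:
transfer = {"from_account": 4321, "to_account": 1234, "amount": "11.00"}
assert operation_id(transfer) == operation_id(dict(transfer))
```

Alternatively, a client could generate a random UUID once when the form is first rendered and resubmit it unchanged on retries; hashing the fields has the advantage of requiring no client-side state.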
The end-to-end argument
This scenario of suppressing duplicate transactions is just one example of a more general principle called the end-to-end argument , which was articulated by Saltzer, Reed, and Clark in 1984 [ 55 ]:
抑制重复事务的这一场景,只是一个被称为端到端论证的更普遍原则的例子,该原则由Saltzer、Reed和Clark于1984年提出[55]:
The function in question can completely and correctly be implemented only with the knowledge and help of the application standing at the endpoints of the communication system. Therefore, providing that questioned function as a feature of the communication system itself is not possible. (Sometimes an incomplete version of the function provided by the communication system may be useful as a performance enhancement.)
该功能只有在通信系统端点应用的知识和帮助下才能完整且正确地实现。因此,将该功能作为通信系统本身的特性提供是不可能的。(有时,通信系统提供的不完整版本功能可能有益于提高性能。)
In our example, the function in question was duplicate suppression. We saw that TCP suppresses duplicate packets at the TCP connection level, and some stream processors provide so-called exactly-once semantics at the message processing level, but that is not enough to prevent a user from submitting a duplicate request if the first one times out. By themselves, TCP, database transactions, and stream processors cannot entirely rule out these duplicates. Solving the problem requires an end-to-end solution: a transaction identifier that is passed all the way from the end-user client to the database.
在我们的示例中,问题函数是重复抑制。 我们看到TCP在TCP连接级别抑制重复数据包,一些流处理器在消息处理级别提供所谓的恰好一次语义,但这还不足以防止用户在第一个超时的情况下提交重复请求。 TCP,数据库事务和流处理器本身无法完全排除这些重复。 解决问题需要一种端到端的解决方案:从最终用户客户端到数据库传递的事务标识符。
The end-to-end argument also applies to checking the integrity of data: checksums built into Ethernet, TCP, and TLS can detect corruption of packets in the network, but they cannot detect corruption due to bugs in the software at the sending and receiving ends of the network connection, or corruption on the disks where the data is stored. If you want to catch all possible sources of data corruption, you also need end-to-end checksums.
端到端的原则同样适用于检查数据的完整性:以太网、TCP和TLS内置的校验和可以检测网络中数据包的损坏,但无法检测发送和接收端连接的软件中存在的错误或数据存储盘上的损坏。如果您想要捕捉所有可能的数据损坏源,则还需要端到端校验和。
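What an end-to-end checksum might look like can be sketched as follows: the application computes and verifies the checksum itself, rather than trusting the layers in between. The storage interface here (a plain dict) is a stand-in, not a real system:

```python
import hashlib

def store_with_checksum(storage: dict, key: str, data: bytes) -> None:
    # The checksum is computed by the application before the data passes
    # through any network or storage layer.
    storage[key] = (data, hashlib.sha256(data).hexdigest())

def load_and_verify(storage: dict, key: str) -> bytes:
    data, expected = storage[key]
    # Verified again at the reading end: corruption introduced by any
    # intermediate layer (network, disk, buggy software) is detected here,
    # even if every lower-level checksum along the way passed.
    if hashlib.sha256(data).hexdigest() != expected:
        raise IOError(f"corruption detected for key {key!r}")
    return data

db = {}
store_with_checksum(db, "doc1", b"hello")
assert load_and_verify(db, "doc1") == b"hello"
```

The point is that both ends are the application itself; the checksum travels with the data through every intermediate layer.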
A similar argument applies with encryption [ 55 ]: the password on your home WiFi network protects against people snooping your WiFi traffic, but not against attackers elsewhere on the internet; TLS/SSL between your client and the server protects against network attackers, but not against compromises of the server. Only end-to-end encryption and authentication can protect against all of these things.
类似的论点也适用于加密[55]:家庭WiFi网络的密码可以防止他人窥探您的WiFi流量,但无法防范互联网上其他地方的攻击者;客户端和服务器之间的TLS/SSL可以防范网络攻击者,但无法防范服务器本身被攻破。只有端到端的加密和身份验证才能防范所有这些威胁。
Although the low-level features (TCP duplicate suppression, Ethernet checksums, WiFi encryption) cannot provide the desired end-to-end features by themselves, they are still useful, since they reduce the probability of problems at the higher levels. For example, HTTP requests would often get mangled if we didn’t have TCP putting the packets back in the right order. We just need to remember that the low-level reliability features are not by themselves sufficient to ensure end-to-end correctness.
尽管底层特征(TCP重复抑制,以太网校验和,WiFi加密)本身不能提供所需的端到端功能,但它们仍然很有用,因为它们可以减少高层级出现问题的概率。例如,如果没有TCP将数据包按正确顺序排列,HTTP请求就很容易出错。我们只需要记住,仅靠底层可靠性特征是不足以确保端到端正确性的。
Applying end-to-end thinking in data systems
This brings me back to my original thesis: just because an application uses a data system that provides comparatively strong safety properties, such as serializable transactions, that does not mean the application is guaranteed to be free from data loss or corruption. The application itself needs to take end-to-end measures, such as duplicate suppression, as well.
这让我回到我的原始论点:即使一个应用程序使用了一个提供了相对较强的安全属性的数据系统,比如可串行化事务,也不意味着该应用程序必然没有数据丢失或损坏的风险。应用程序本身需要采取端到端的措施,例如重复抑制。
That is a shame, because fault-tolerance mechanisms are hard to get right. Low-level reliability mechanisms, such as those in TCP, work quite well, and so the remaining higher-level faults occur fairly rarely. It would be really nice to wrap up the remaining high-level fault-tolerance machinery in an abstraction so that application code needn’t worry about it—but I fear that we have not yet found the right abstraction.
很遗憾,因为容错机制很难正确实现。低层可靠性机制(例如 TCP 中的机制)工作得相当不错,因此剩余的高层故障出现得相当罕见。将剩余的高层容错机制封装在一个抽象层中,以使应用程序代码无需担心它,这将非常好,但我担心我们还没有找到正确的抽象层。
Transactions have long been seen as a good abstraction, and I do believe that they are useful. As discussed in the introduction to Chapter 7 , they take a wide range of possible issues (concurrent writes, constraint violations, crashes, network interruptions, disk failures) and collapse them down to two possible outcomes: commit or abort. That is a huge simplification of the programming model, but I fear that it is not enough.
事务长期以来被视为一个很好的抽象,我也相信它们是有用的。正如第七章引言中所讨论的,它们把一系列可能出现的问题(并发写入、约束违反、崩溃、网络中断、磁盘故障)简化为两种可能的结果:提交或中止。这是对编程模型的巨大简化,但我担心这还不够。
Transactions are expensive, especially when they involve heterogeneous storage technologies (see “Distributed Transactions in Practice” ). When we refuse to use distributed transactions because they are too expensive, we end up having to reimplement fault-tolerance mechanisms in application code. As numerous examples throughout this book have shown, reasoning about concurrency and partial failure is difficult and counterintuitive, and so I suspect that most application-level mechanisms do not work correctly. The consequence is lost or corrupted data.
事务是昂贵的,尤其是涉及异构存储技术时(参见“实践中的分布式事务”)。当我们因为分布式事务太昂贵而拒绝使用它们时,最终不得不在应用程序代码中重新实现容错机制。正如本书中大量例子所示,对并发和部分失效进行推理是困难且反直觉的,因此我怀疑大多数应用层面的机制并不能正确工作,其后果就是数据的丢失或损坏。
For these reasons, I think it is worth exploring fault-tolerance abstractions that make it easy to provide application-specific end-to-end correctness properties, but also maintain good performance and good operational characteristics in a large-scale distributed environment.
出于这些原因,我认为值得探索容错抽象,以实现易于提供特定于应用程序的端到端正确性属性,同时在大规模分布式环境中维护良好的性能和运行特性。
Enforcing Constraints
Let’s think about correctness in the context of the ideas around unbundling databases ( “Unbundling Databases” ). We saw that end-to-end duplicate suppression can be achieved with a request ID that is passed all the way from the client to the database that records the write. What about other kinds of constraints?
让我们在“数据库分离”这个概念的语境下考虑正确性。我们发现,通过将请求ID从客户端传递到记录写入操作的数据库,可以实现端到端的重复数据抑制。那么其他类型的约束条件呢?
In particular, let’s focus on uniqueness constraints—such as the one we relied on in Example 12-2 . In “Constraints and uniqueness guarantees” we saw several other examples of application features that need to enforce uniqueness: a username or email address must uniquely identify a user, a file storage service cannot have more than one file with the same name, and two people cannot book the same seat on a flight or in a theater.
特别是,让我们专注于唯一性约束——例如我们在示例12-2中所依赖的那个。在“约束和唯一性保证”中,我们看到了其他几个需要强制唯一性的应用程序特性的例子:用户名或电子邮件地址必须唯一地标识用户,文件存储服务不能有多个具有相同名称的文件,两个人不能预订同一班机票或剧院的座位。
Other kinds of constraints are very similar: for example, ensuring that an account balance never goes negative, that you don’t sell more items than you have in stock in the warehouse, or that a meeting room does not have overlapping bookings. Techniques that enforce uniqueness can often be used for these kinds of constraints as well.
其他类型的限制非常相似:例如,确保账户余额从不为负,仓库中不售出超过库存的物品,或者会议室不出现时间冲突的预订。强制唯一性的技术通常也可以用于这些类型的限制。
Uniqueness constraints require consensus
In Chapter 9 we saw that in a distributed setting, enforcing a uniqueness constraint requires consensus: if there are several concurrent requests with the same value, the system somehow needs to decide which one of the conflicting operations is accepted, and reject the others as violations of the constraint.
在第9章中,我们看到在分布式环境中,强制唯一性约束需要达成一致:如果有多个具有相同值的并发请求,系统需要决定哪个相冲突的操作被接受,并拒绝其他违反约束的操作。
The most common way of achieving this consensus is to make a single node the leader, and put it in charge of making all the decisions. That works fine as long as you don’t mind funneling all requests through a single node (even if the client is on the other side of the world), and as long as that node doesn’t fail. If you need to tolerate the leader failing, you’re back at the consensus problem again (see “Single-leader replication and consensus” ).
达成这种共识的最常见方法是让一个单点成为领导者,并让其负责做出所有决策。只要您不介意将所有请求(即使客户端在世界的另一端)通过一个节点进行传递,并且只要该节点不会失败,这种方法就可以很好地工作。如果您需要容忍领导者失败,则又回到了共识问题(请参见“单领导者复制和共识”)。
Uniqueness checking can be scaled out by partitioning based on the value that needs to be unique. For example, if you need to ensure uniqueness by request ID, as in Example 12-2 , you can ensure all requests with the same request ID are routed to the same partition (see Chapter 6 ). If you need usernames to be unique, you can partition by hash of username.
可以通过基于需要保证唯一性的值进行分区来扩展唯一性检查。例如,如果您需要像示例12-2中所示通过请求ID保证唯一性,您可以确保具有相同请求ID的所有请求路由到同一个分区(参见第6章)。如果您需要用户名唯一,则可以通过用户名的哈希进行分区。
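Routing by the value that must be unique can be sketched like this. A toy hash-partitioning function is shown below; a production system would typically use consistent hashing or the rebalancing schemes discussed in Chapter 6, and the partition count here is an arbitrary assumption:

```python
import hashlib

NUM_PARTITIONS = 8

def partition_for(username: str) -> int:
    """Route all requests for the same username to the same partition."""
    # A stable hash (not Python's process-randomized built-in hash()) so
    # that routing is consistent across processes and restarts.
    digest = hashlib.md5(username.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % NUM_PARTITIONS

# All requests for a given username land on one partition, where a
# single-threaded processor can enforce uniqueness without coordination.
assert partition_for("alice") == partition_for("alice")
```

Because every request for a given username deterministically lands on one partition, the uniqueness decision for that username never requires cross-partition coordination.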
However, asynchronous multi-master replication is ruled out, because it could happen that different masters concurrently accept conflicting writes, and thus the values are no longer unique (see “Implementing Linearizable Systems” ). If you want to be able to immediately reject any writes that would violate the constraint, synchronous coordination is unavoidable [ 56 ].
然而,异步多主复制被排除,因为不同的主机同时接受冲突写入可能会发生,导致值不再唯一(参见“实现线性化系统”)。 如果您想立即拒绝任何违反约束的写入,则无法避免同步协调 [56]。
Uniqueness in log-based messaging
The log ensures that all consumers see messages in the same order—a guarantee that is formally known as total order broadcast and is equivalent to consensus (see “Total Order Broadcast” ). In the unbundled database approach with log-based messaging, we can use a very similar approach to enforce uniqueness constraints.
日志确保所有消费者按照相同顺序查看消息,这是一种正式称为全序广播的保证,并等同于共识(参见“全序广播”)。在基于日志的消息传递的非捆绑数据库方法中,我们可以使用非常相似的方法来强制执行唯一性约束。
A stream processor consumes all the messages in a log partition sequentially on a single thread (see “Logs compared to traditional messaging” ). Thus, if the log is partitioned based on the value that needs to be unique, a stream processor can unambiguously and deterministically decide which one of several conflicting operations came first. For example, in the case of several users trying to claim the same username [ 57 ]:
流处理器在单个线程上按顺序消费日志分区中的所有消息(参见“日志与传统消息传递的对比”)。因此,如果日志是根据需要保证唯一性的值进行分区的,流处理器就可以明确且确定地判断几个相互冲突的操作中哪一个先到达。例如,在多个用户试图抢注同一用户名的情况下[57]:
-
Every request for a username is encoded as a message, and appended to a partition determined by the hash of the username.
每个用户名的请求都被编码为消息,并附加到由用户名哈希确定的分区。
-
A stream processor sequentially reads the requests in the log, using a local database to keep track of which usernames are taken. For every request for a username that is available, it records the name as taken and emits a success message to an output stream. For every request for a username that is already taken, it emits a rejection message to an output stream.
流处理器按顺序读取日志中的请求,使用本地数据库来跟踪哪些用户名已被占用。对于每一个请求可用的用户名,它记录为已占用的名称,并向输出流发出成功消息。对于已经被占用的用户名的每一个请求,它向输出流发出拒绝消息。
-
The client that requested the username watches the output stream and waits for a success or rejection message corresponding to its request.
请求用户名的客户端会监视输出流,并等待与其请求对应的成功或拒绝信息。
This algorithm is basically the same as in “Implementing linearizable storage using total order broadcast” . It scales easily to a large request throughput by increasing the number of partitions, as each partition can be processed independently.
该算法与“使用全序广播实现可线性化存储”中的算法基本相同。通过增加分区数量,它可以轻松扩展到很大的请求吞吐量,因为每个分区都可以独立处理。
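The three steps above can be sketched as a single-partition processor. In-memory structures stand in for the log, the processor's local database, and the output stream; all names are illustrative:

```python
def process_username_requests(log, taken=None):
    """Sequentially consume username-claim requests from one log partition.

    `log` is a list of (request_id, username) messages; `taken` plays the
    role of the processor's local database of claimed usernames. Because a
    single thread reads the partition in order, the first claim for a name
    deterministically wins, and all consumers agree on the outcome.
    """
    taken = taken if taken is not None else set()
    output = []  # stands in for the output stream the client watches
    for request_id, username in log:
        if username in taken:
            output.append((request_id, "rejected"))
        else:
            taken.add(username)
            output.append((request_id, "success"))
    return output

results = process_username_requests([("r1", "alice"), ("r2", "alice")])
assert results == [("r1", "success"), ("r2", "rejected")]
```

The client that appended request "r2" would observe the rejection message on the output stream and report the failure to the user.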
The approach works not only for uniqueness constraints, but also for many other kinds of constraints. Its fundamental principle is that any writes that may conflict are routed to the same partition and processed sequentially. As discussed in “What is a conflict?” and “Write Skew and Phantoms” , the definition of a conflict may depend on the application, but the stream processor can use arbitrary logic to validate a request. This idea is similar to the approach pioneered by Bayou in the 1990s [ 58 ].
这种方法不仅适用于唯一性约束,也适用于许多其他类型的约束。其基本原则是,任何可能冲突的写入都被路由到同一个分区并按顺序处理。正如“什么是冲突?”和“写偏斜与幻象”中所讨论的,冲突的定义可能取决于应用程序,但流处理器可以使用任意逻辑来验证请求。这个想法类似于Bayou在1990年代开创的方法[58]。
Multi-partition request processing
Ensuring that an operation is executed atomically, while satisfying constraints, becomes more interesting when several partitions are involved. In Example 12-2 , there are potentially three partitions: the one containing the request ID, the one containing the payee account, and the one containing the payer account. There is no reason why those three things should be in the same partition, since they are all independent from each other.
当涉及到多个分区时,确保操作在满足约束条件的情况下原子执行变得更加有趣。在示例12-2中,可能有三个分区:包含请求ID的分区,包含收款人账户的分区和包含付款人账户的分区。这三件事没有理由在同一个分区中,因为它们彼此独立。
In the traditional approach to databases, executing this transaction would require an atomic commit across all three partitions, which essentially forces it into a total order with respect to all other transactions on any of those partitions. Since there is now cross-partition coordination, different partitions can no longer be processed independently, so throughput is likely to suffer.
在传统数据库的方法中,执行此交易将需要在所有三个分区上执行原子提交,这基本上会将其强制变成与任何一个分区上的其他交易相对应的总顺序。由于现在存在跨分区协调,因此不同分区不能再独立处理,因此吞吐量很可能会受到影响。
However, it turns out that equivalent correctness can be achieved with partitioned logs, and without an atomic commit:
然而,实际上可以通过分区日志实现等效的正确性,而无需进行原子提交。
-
The request to transfer money from account A to account B is given a unique request ID by the client, and appended to a log partition based on the request ID.
客户在从账户A转账至账户B时,为该请求指定唯一请求ID并根据该ID将其附加到日志分区中。
-
A stream processor reads the log of requests. For each request message it emits two messages to output streams: a debit instruction to the payer account A (partitioned by A), and a credit instruction to the payee account B (partitioned by B). The original request ID is included in those emitted messages.
流处理器读取请求日志。对于每个请求消息,它向输出流发出两条消息:一条向付款人账户A发出的借方指令(按A分区),以及一条向收款人账户B发出的贷方指令(按B分区)。原始请求ID包含在这些发出的消息中。
-
Further processors consume the streams of credit and debit instructions, deduplicate by request ID, and apply the changes to the account balances.
进一步的处理器消耗信用和借记指令流,按请求ID进行去重,并将更改应用于账户余额。
Steps 1 and 2 are necessary because if the client directly sent the credit and debit instructions, it would require an atomic commit across those two partitions to ensure that either both or neither happen. To avoid the need for a distributed transaction, we first durably log the request as a single message, and then derive the credit and debit instructions from that first message. Single-object writes are atomic in almost all data systems (see “Single-object writes” ), and so the request either appears in the log or it doesn’t, without any need for a multi-partition atomic commit.
步骤1和2是必要的,因为如果客户端直接发送借记和贷记指令,就需要跨这两个分区进行原子提交,以确保两者要么都发生、要么都不发生。为了避免分布式事务,我们首先把请求作为单条消息持久地记录下来,然后再从这第一条消息派生出借记和贷记指令。在几乎所有数据系统中,单对象写入都是原子的(参见“单对象写入”),因此该请求要么出现在日志中,要么不出现,无需多分区的原子提交。
If the stream processor in step 2 crashes, it resumes processing from its last checkpoint. In doing so, it does not skip any request messages, but it may process requests multiple times and produce duplicate credit and debit instructions. However, since it is deterministic, it will just produce the same instructions again, and the processors in step 3 can easily deduplicate them using the end-to-end request ID.
如果步骤2中的流处理器崩溃,它会从最后一个检查点恢复处理。这样做,它不会跳过任何请求消息,但可能会处理多个请求并产生重复的信用和借记指令。但是,由于它是确定性的,它将再次生成相同的指令,步骤3中的处理器可以使用端到端的请求ID轻松地去重。
If you want to ensure that the payer account is not overdrawn by this transfer, you can additionally have a stream processor (partitioned by payer account number) that maintains account balances and validates transactions. Only valid transactions would then be placed in the request log in step 1.
如果您想确保付款人账户不会因这笔转账而透支,还可以增加一个流处理器(按付款人账户号分区)来维护账户余额并验证事务。这样,只有有效的事务才会被放入步骤1的请求日志中。
By breaking down the multi-partition transaction into two differently partitioned stages and using the end-to-end request ID, we have achieved the same correctness property (every request is applied exactly once to both the payer and payee accounts), even in the presence of faults, and without using an atomic commit protocol. The idea of using multiple differently partitioned stages is similar to what we discussed in “Multi-partition data processing” (see also “Concurrency control” ).
通过将多分区事务分解为两个采用不同分区方式的阶段,并使用端到端的请求ID,我们实现了同样的正确性属性(每个请求对付款人和收款人账户都恰好生效一次),即使出现故障也是如此,而且无需使用原子提交协议。使用多个不同分区方式的阶段的想法,类似于我们在“多分区数据处理”中讨论的内容(另见“并发控制”)。
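The staged dataflow above can be sketched end-to-end. Plain Python lists and dicts stand in for log partitions and account state; the function names and data shapes are illustrative, not from the text:

```python
def stage2_split(request_log):
    """Step 2: deterministically derive debit and credit instructions
    from the durably logged transfer requests."""
    debits, credits = [], []
    for req_id, payer, payee, amount in request_log:
        debits.append((req_id, payer, -amount))   # partitioned by payer
        credits.append((req_id, payee, +amount))  # partitioned by payee
    return debits, credits

def stage3_apply(instructions, balances, applied):
    """Step 3: apply instructions to balances, deduplicating by the
    end-to-end request ID so that reprocessing after a crash is safe."""
    for req_id, account, delta in instructions:
        if (req_id, account) in applied:  # duplicate from a retry: skip
            continue
        applied.add((req_id, account))
        balances[account] = balances.get(account, 0) + delta

balances, applied = {"A": 100, "B": 0}, set()
request_log = [("req-1", "A", "B", 11)]
debits, credits = stage2_split(request_log)
# Even if stage 2 is rerun after a crash, reapplying its output is harmless:
for _ in range(2):
    stage3_apply(debits + credits, balances, applied)
assert balances == {"A": 89, "B": 11}
```

Because stage 2 is deterministic, a crash and restart reproduces exactly the same instructions, and stage 3's dedup set makes applying them idempotent, which is the correctness property the prose argues for.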
Timeliness and Integrity
A convenient property of transactions is that they are typically linearizable (see “Linearizability” ): that is, a writer waits until a transaction is committed, and thereafter its writes are immediately visible to all readers.
事务的一个方便性质是它们通常是可线性化(见“线性化”)的:即写者等待事务提交后,它的写入立即对所有读者可见。
This is not the case when unbundling an operation across multiple stages of stream processors: consumers of a log are asynchronous by design, so a sender does not wait until its message has been processed by consumers. However, it is possible for a client to wait for a message to appear on an output stream. This is what we did in “Uniqueness in log-based messaging” when checking whether a uniqueness constraint was satisfied.
在将一个操作拆分到多个流处理器阶段时,情况则并非如此:日志的消费者在设计上就是异步的,因此发送者不会等待其消息被消费者处理完毕。不过,客户端可以等待消息出现在输出流上。这正是我们在“基于日志的消息传递中的唯一性”中检查唯一性约束是否满足时所做的。
In this example, the correctness of the uniqueness check does not depend on whether the sender of the message waits for the outcome. The waiting only has the purpose of synchronously informing the sender whether or not the uniqueness check succeeded, but this notification can be decoupled from the effects of processing the message.
在这个例子中,唯一性检查的正确性不取决于消息的发送者是否等待结果。等待只是为了同步地通知发送者唯一性检查是否成功,但这种通知可以与处理消息的效果分离。
More generally, I think the term consistency conflates two different requirements that are worth considering separately:
更一般地说,我认为一致性一词混淆了两个不同的要求,值得分别考虑:
- Timeliness
-
Timeliness means ensuring that users observe the system in an up-to-date state. We saw previously that if a user reads from a stale copy of the data, they may observe it in an inconsistent state (see “Problems with Replication Lag” ). However, that inconsistency is temporary, and will eventually be resolved simply by waiting and trying again.
及时性意味着确保用户在系统处于最新状态。我们之前看到,如果用户从过时的数据副本中读取,他们可能会以不一致的状态观察它(请参见“复制滞后的问题”)。但是,这种不一致性是暂时的,只需等待并再次尝试即可解决。
The CAP theorem (see “The Cost of Linearizability” ) uses consistency in the sense of linearizability, which is a strong way of achieving timeliness. Weaker timeliness properties like read-after-write consistency (see “Reading Your Own Writes” ) can also be useful.
CAP定理(参见“线性一致性的代价”)中的一致性是指线性一致性,这是实现及时性的一种强方式。较弱的及时性属性,例如写后读一致性(参见“读己之写”),也可能很有用。
- Integrity
-
Integrity means absence of corruption; i.e., no data loss, and no contradictory or false data. In particular, if some derived dataset is maintained as a view onto some underlying data (see “Deriving current state from the event log” ), the derivation must be correct. For example, a database index must correctly reflect the contents of the database—an index in which some records are missing is not very useful.
完整性意味着没有损坏;即没有数据丢失,也没有相互矛盾或错误的数据。特别是,如果某个派生数据集是作为底层数据的视图来维护的(参见“从事件日志中派生当前状态”),那么这个派生过程必须是正确的。例如,数据库索引必须正确反映数据库的内容;缺失了某些记录的索引并没有多大用处。
If integrity is violated, the inconsistency is permanent: waiting and trying again is not going to fix database corruption in most cases. Instead, explicit checking and repair is needed. In the context of ACID transactions (see “The Meaning of ACID” ), consistency is usually understood as some kind of application-specific notion of integrity. Atomicity and durability are important tools for preserving integrity.
如果完整性遭到破坏,这种不一致将是永久性的:在大多数情况下,等待并重试并不能修复数据库的损坏,而是需要显式的检查和修复。在ACID事务的语境中(参见“ACID的含义”),一致性通常被理解为某种特定于应用的完整性概念,而原子性和持久性则是保持完整性的重要工具。
In slogan form: violations of timeliness are “eventual consistency,” whereas violations of integrity are “perpetual inconsistency.”
用口号的形式来说:违反及时性是“最终一致性”,而违反完整性则是“永久的不一致”。
I am going to assert that in most applications, integrity is much more important than timeliness. Violations of timeliness can be annoying and confusing, but violations of integrity can be catastrophic.
我断言,在大多数应用中,完整性远比及时性重要。违反及时性可能令人恼火和困惑,但违反完整性的后果可能是灾难性的。
For example, on your credit card statement, it is not surprising if a transaction that you made within the last 24 hours does not yet appear—it is normal that these systems have a certain lag. We know that banks reconcile and settle transactions asynchronously, and timeliness is not very important here [ 3 ]. However, it would be very bad if the statement balance was not equal to the sum of the transactions plus the previous statement balance (an error in the sums), or if a transaction was charged to you but not paid to the merchant (disappearing money). Such problems would be violations of the integrity of the system.
例如,在您的信用卡账单上,如果您在过去的 24 小时内进行的交易尚未出现,这并不奇怪——这些系统通常会有一定的滞后。我们知道,银行会异步协调和结算交易,时间性并不是非常重要 [3]。然而,如果账单余额与交易金额加上上期账单余额不相等(求和错误),或者如果某笔交易已经从您的账户中扣款但没有支付给商家(款项消失),这将非常糟糕。这些问题会违反系统的完整性。
Correctness of dataflow systems
ACID transactions usually provide both timeliness (e.g., linearizability) and integrity (e.g., atomic commit) guarantees. Thus, if you approach application correctness from the point of view of ACID transactions, the distinction between timeliness and integrity is fairly inconsequential.
ACID事务通常同时提供及时性保证(例如线性一致性)和完整性保证(例如原子提交)。因此,如果您从ACID事务的角度来看待应用的正确性,那么及时性与完整性之间的区别就显得无关紧要了。
On the other hand, an interesting property of the event-based dataflow systems that we have discussed in this chapter is that they decouple timeliness and integrity. When processing event streams asynchronously, there is no guarantee of timeliness, unless you explicitly build consumers that wait for a message to arrive before returning. But integrity is in fact central to streaming systems.
另一方面,我们在本章中讨论的基于事件的数据流系统具有一个有趣的特性,即它们将时效性和完整性解耦。异步处理事件流时,除非显式构建等待消息到达的消费者,否则无法保证时效性。但事实上,完整性对于流系统而言是至关重要的。
Exactly-once or effectively-once semantics (see “Fault Tolerance” ) is a mechanism for preserving integrity. If an event is lost, or if an event takes effect twice, the integrity of a data system could be violated. Thus, fault-tolerant message delivery and duplicate suppression (e.g., idempotent operations) are important for maintaining the integrity of a data system in the face of faults.
恰好一次或等效一次语义(参见“容错”)是一种维护完整性的机制。如果一个事件丢失,或者一个事件生效了两次,数据系统的完整性就可能遭到破坏。因此,容错的消息传递和重复抑制(例如幂等操作)对于在出现故障时维护数据系统的完整性非常重要。
As we saw in the last section, reliable stream processing systems can preserve integrity without requiring distributed transactions and an atomic commit protocol, which means they can potentially achieve comparable correctness with much better performance and operational robustness. We achieved this integrity through a combination of mechanisms:
正如我们在上一节中看到的,可靠的流处理系统可以在不需要分布式事务和原子提交协议的情况下保持完整性,这意味着它们有潜力以好得多的性能和运维稳健性实现可比的正确性。我们通过以下机制的组合实现了这种完整性:
-
Representing the content of the write operation as a single message, which can easily be written atomically—an approach that fits very well with event sourcing (see “Event Sourcing” )
将写操作的内容表示为单个消息,可以轻松地原子性地写入 - 这种方法非常适合事件溯源(请参阅“事件溯源”)。
-
Deriving all other state updates from that single message using deterministic derivation functions, similarly to stored procedures (see “Actual Serial Execution” and “Application code as a derivation function” )
使用确定性派生函数从那个单一的消息中推导出所有其他的状态更新,类似于存储过程(参见“实际串行执行”和“应用代码作为派生函数”)
-
Passing a client-generated request ID through all these levels of processing, enabling end-to-end duplicate suppression and idempotence
将由客户端生成的请求ID通过所有级别的处理,实现端到端的重复抑制和幂等性。
-
Making messages immutable and allowing derived data to be reprocessed from time to time, which makes it easier to recover from bugs (see “Advantages of immutable events” )
使消息不可变,并允许从时间到时间重新处理派生数据,这使得从错误中恢复更加容易(见“不可变事件的优点”)。
This combination of mechanisms seems to me a very promising direction for building fault-tolerant applications in the future.
这种机制的组合在未来构建容错性应用方面似乎是一个非常有前途的方向。
Loosely interpreted constraints
As discussed previously, enforcing a uniqueness constraint requires consensus, typically implemented by funneling all events in a particular partition through a single node. This limitation is unavoidable if we want the traditional form of uniqueness constraint, and stream processing cannot avoid it.
如先前所讨论的,强制执行唯一性约束需要共识,通常通过将特定分区中的所有事件导向单个节点来实现。如果我们想要传统形式的唯一性约束,那么这个限制是无法避免的,流处理也无法避免它。
However, another thing to realize is that many real applications can actually get away with much weaker notions of uniqueness:
然而,需要认识到的另一件事情是许多真实应用实际上可以使用较弱的唯一性概念来满足。
-
If two people concurrently register the same username or book the same seat, you can send one of them a message to apologize, and ask them to choose a different one. This kind of change to correct a mistake is called a compensating transaction [ 59 , 60 ].
如果两个人同时注册了相同的用户名或预订了同一个座位,您可以向其中一人发送消息表示歉意,并请他们另选一个。这种为纠正错误而进行的变更被称为补偿事务[59,60]。
-
If customers order more items than you have in your warehouse, you can order in more stock, apologize to customers for the delay, and offer them a discount. This is actually the same as what you’d have to do if, say, a forklift truck ran over some of the items in your warehouse, leaving you with fewer items in stock than you thought you had [ 61 ]. Thus, the apology workflow already needs to be part of your business processes anyway, and so it might be unnecessary to require a linearizable constraint on the number of items in stock.
如果客户订购的商品数量超出了仓库的库存,您可以补充进货,为延误向客户道歉,并给他们提供折扣。这实际上与叉车压坏了仓库中的部分商品、导致实际库存比您以为的要少时所要做的事情是一样的[61]。因此,道歉流程本来就需要成为您业务流程的一部分,所以对库存商品数量施加线性一致的约束可能并无必要。
-
Similarly, many airlines overbook airplanes in the expectation that some passengers will miss their flight, and many hotels overbook rooms, expecting that some guests will cancel. In these cases, the constraint of “one person per seat” is deliberately violated for business reasons, and compensation processes (refunds, upgrades, providing a complimentary room at a neighboring hotel) are put in place to handle situations in which demand exceeds supply. Even if there was no overbooking, apology and compensation processes would be needed in order to deal with flights being cancelled due to bad weather or staff on strike—recovering from such issues is just a normal part of business [ 3 ].
同样,许多航空公司会超售机票,预期会有一些乘客错过航班;许多酒店也会超订客房,预期会有一些客人取消预订。在这些情况下,出于商业原因,“一个座位一个人”的约束被故意违反,并通过补偿流程(退款、升舱、在附近酒店提供免费房间)来处理需求超过供给的情况。即使没有超售,为了应对因恶劣天气或员工罢工而取消的航班,道歉和补偿流程也同样是必需的;从这类问题中恢复只是业务的正常组成部分[3]。
-
If someone withdraws more money than they have in their account, the bank can charge them an overdraft fee and ask them to pay back what they owe. By limiting the total withdrawals per day, the risk to the bank is bounded.
如果有人从他们的账户中提取比他们拥有的钱更多的钱,银行可以收取透支费并要求他们偿还所欠款项。通过限制每日的总提款额,银行的风险得到了限制。
In many business contexts, it is actually acceptable to temporarily violate a constraint and fix it up later by apologizing. The cost of the apology (in terms of money or reputation) varies, but it is often quite low: you can’t unsend an email, but you can send a follow-up email with a correction. If you accidentally charge a credit card twice, you can refund one of the charges, and the cost to you is just the processing fees and perhaps a customer complaint. Once money has been paid out of an ATM, you can’t directly get it back, although in principle you can send debt collectors to recover the money if the account was overdrawn and the customer won’t pay it back.
在许多商业环境中,暂时违反某个约束并在稍后通过道歉来修复,实际上是可以接受的。道歉的成本(金钱或声誉上的)各不相同,但通常相当低:你无法撤回一封已发出的电子邮件,但可以发送一封后续邮件进行更正。如果你不小心从一张信用卡上扣了两次款,可以退还其中一笔,成本只是手续费,或许再加上一起客户投诉。一旦钱从自动取款机里取走,你就无法直接把它拿回来,不过原则上,如果账户透支而客户不肯偿还,你可以派收债人去追回欠款。
Whether the cost of the apology is acceptable is a business decision. If it is acceptable, the traditional model of checking all constraints before even writing the data is unnecessarily restrictive, and a linearizable constraint is not needed. It may well be a reasonable choice to go ahead with a write optimistically, and to check the constraint after the fact. You can still ensure that the validation occurs before doing things that would be expensive to recover from, but that doesn’t imply you must do the validation before you even write the data.
道歉的成本是否可以接受是一个商业决策。如果可以接受,那么在写入数据之前就检查所有约束的传统模型就显得过于严格,也不需要可线性化的约束。乐观地先写入,事后再检查约束,很可能是一个合理的选择。你仍然可以确保在执行难以恢复的操作之前进行验证,但这并不意味着必须在写入数据之前就进行验证。
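As a sketch of this idea (not from the book; all names are illustrative), the following Python accepts writes optimistically and runs the constraint check and apology workflow after the fact:
作为这一思路的示意(并非书中代码,所有名称均为示例),下面的Python先乐观地接受写入,事后再检查约束并执行道歉工作流:

```python
class OrderService:
    """Toy model: accept orders optimistically, reconcile against stock later."""

    def __init__(self, initial_stock):
        self.stock = initial_stock   # items we believe are in the warehouse
        self.orders = []             # accepted orders (an append-only log)
        self.apologies = []          # compensations issued after the fact

    def place_order(self, order_id, quantity):
        # Optimistic path: write first, without a linearizable stock check.
        self.orders.append((order_id, quantity))

    def reconcile(self):
        # Check the constraint after the fact; run the apology workflow
        # for orders that turn out to exceed the actual stock.
        remaining = self.stock
        for order_id, quantity in self.orders:
            if quantity <= remaining:
                remaining -= quantity
            else:
                self.apologies.append(
                    (order_id, "back-ordered: apology and discount offered"))
        self.stock = remaining

svc = OrderService(initial_stock=5)
svc.place_order("a", 3)
svc.place_order("b", 4)   # oversells: only 2 items left at this point
svc.reconcile()           # order "b" triggers the apology workflow
```

The expensive step (actually shipping goods) can still be gated on a successful reconciliation, which is the point made above: validate before irreversible actions, not necessarily before the write.
这里,昂贵的步骤(实际发货)仍然可以以对账成功为前提,这正是上文的要点:在不可逆的操作之前验证,而不一定在写入之前验证。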
These applications do require integrity: you would not want to lose a reservation, or have money disappear due to mismatched credits and debits. But they don’t require timeliness on the enforcement of the constraint: if you have sold more items than you have in the warehouse, you can patch up the problem after the fact by apologizing. Doing so is similar to the conflict resolution approaches we discussed in “Handling Write Conflicts” .
这些应用确实需要完整性:你不会希望丢失一笔预订,或者由于借记与贷记不匹配而导致资金凭空消失。但它们并不要求及时地执行约束:如果你卖出的商品多于仓库里的库存,可以在事后通过道歉来弥补。这样做类似于我们在“处理写入冲突”中讨论的冲突解决方法。
Coordination-avoiding data systems
We have now made two interesting observations:
我们现在得出了两个有趣的观察:
-
Dataflow systems can maintain integrity guarantees on derived data without atomic commit, linearizability, or synchronous cross-partition coordination.
数据流系统可以在不进行原子提交、线性化或同步跨分区协调的情况下,保持派生数据的完整性保证。
-
Although strict uniqueness constraints require timeliness and coordination, many applications are actually fine with loose constraints that may be temporarily violated and fixed up later, as long as integrity is preserved throughout.
尽管严格的唯一性约束需要及时性和协调,但是许多应用程序实际上可以使用宽松的约束,这些约束可能会暂时违反并稍后修复,只要在整个过程中维护完整性即可。
Taken together, these observations mean that dataflow systems can provide the data management services for many applications without requiring coordination, while still giving strong integrity guarantees. Such coordination-avoiding data systems have a lot of appeal: they can achieve better performance and fault tolerance than systems that need to perform synchronous coordination [ 56 ].
综上所述,这些观察意味着,数据流系统可以在不需要协调的情况下为许多应用提供数据管理服务,同时仍然给出很强的完整性保证。这种避免协调的数据系统非常有吸引力:它们可以比需要执行同步协调的系统获得更好的性能和容错性[56]。
For example, such a system could operate distributed across multiple datacenters in a multi-leader configuration, asynchronously replicating between regions. Any one datacenter can continue operating independently from the others, because no synchronous cross-region coordination is required. Such a system would have weak timeliness guarantees—it could not be linearizable without introducing coordination—but it can still have strong integrity guarantees.
例如,这样的系统可以跨多个数据中心分布式运行,采用多主配置,在区域之间异步复制。任何一个数据中心都可以独立于其他数据中心继续运行,因为不需要同步的跨区域协调。这样的系统时效性保证较弱——不引入协调,它就无法做到可线性化——但它仍然可以有很强的完整性保证。
In this context, serializable transactions are still useful as part of maintaining derived state, but they can be run at a small scope where they work well [ 8 ]. Heterogeneous distributed transactions such as XA transactions (see “Distributed Transactions in Practice” ) are not required. Synchronous coordination can still be introduced in places where it is needed (for example, to enforce strict constraints before an operation from which recovery is not possible), but there is no need for everything to pay the cost of coordination if only a small part of an application needs it [ 43 ].
在这种情况下,可序列化事务作为维护派生状态的一部分仍然有用,但它们可以运行在其工作良好的较小范围内[8]。不需要XA事务那样的异构分布式事务(参见“实践中的分布式事务”)。在需要的地方仍然可以引入同步协调(例如,在执行某个无法恢复的操作之前强制执行严格的约束),但如果只有应用的一小部分需要协调,就没有必要让所有部分都付出协调的代价[43]。
Another way of looking at coordination and constraints: they reduce the number of apologies you have to make due to inconsistencies, but potentially also reduce the performance and availability of your system, and thus potentially increase the number of apologies you have to make due to outages. You cannot reduce the number of apologies to zero, but you can aim to find the best trade-off for your needs—the sweet spot where there are neither too many inconsistencies nor too many availability problems.
另一种看待协调和约束的方式:它们减少了由于不一致而必须道歉的次数,但可能会降低系统的性能和可用性,从而可能增加因故障而必须道歉的次数。你不能将道歉次数降到零,但可以努力寻找最佳平衡点,即既没有太多的不一致,也没有太多的可用性问题。
Trust, but Verify
All of our discussion of correctness, integrity, and fault-tolerance has been under the assumption
that certain things might go wrong, but other things won’t. We call these assumptions our
system
model
(see
“Mapping system models to the real world”
): for example, we should assume that processes can
crash, machines can suddenly lose power, and the network can arbitrarily delay or drop messages. But
we might also assume that data written to disk is not lost after
fsync
, that data in memory is not
corrupted, and that the multiplication instruction of our CPU always returns the correct result.
我们所讨论的正确性、完整性和容错性都是在假设某些东西可能出错,但其他东西不会出错的前提下进行的。我们称这些假设为我们的系统模型(参见“将系统模型映射到真实世界”):例如,我们应该假设进程可能会崩溃,机器可能会突然断电,网络可能会任意延迟或丢弃消息。但我们也可以假设,经过fsync写入磁盘的数据不会丢失,内存中的数据不会损坏,我们的CPU的乘法指令始终返回正确的结果。
These assumptions are quite reasonable, as they are true most of the time, and it would be difficult to get anything done if we had to constantly worry about our computers making mistakes. Traditionally, system models take a binary approach toward faults: we assume that some things can happen, and other things can never happen. In reality, it is more a question of probabilities: some things are more likely, other things less likely. The question is whether violations of our assumptions happen often enough that we may encounter them in practice.
这些假设是相当合理的,因为它们大多数时候都是正确的,如果我们不停地担心电脑出错会很难继续工作。传统的系统模型对故障采取二元方法:我们假设有些问题可能发生,有些则永远不会发生。实际上,这更多地是一个概率问题:有些事情更可能发生,而其他事情则不太可能。问题在于我们的假设是否经常被违反,以至于我们在实践中会遇到它们。
We have seen that data can become corrupted while it is sitting untouched on disks (see “Replication and Durability” ), and data corruption on the network can sometimes evade the TCP checksums (see “Weak forms of lying” ). Maybe this is something we should be paying more attention to?
我们已经看到,数据在磁盘上原封不动地存放时也可能损坏(参见“复制与持久性”),而网络上的数据损坏有时能躲过TCP校验和(参见“弱谎言形式”)。也许这是我们应该更加关注的事情?
One application that I worked on in the past collected crash reports from clients, and some of the reports we received could only be explained by random bit-flips in the memory of those devices. It seems unlikely, but if you have enough devices running your software, even very unlikely things do happen. Besides random memory corruption due to hardware faults or radiation, certain pathological memory access patterns can flip bits even in memory that has no faults [ 62 ]—an effect that can be used to break security mechanisms in operating systems [ 63 ] (this technique is known as rowhammer ). Once you look closely, hardware isn’t quite the perfect abstraction that it may seem.
过去我曾经开发过一个应用程序,收集来自客户端的崩溃报告,我们收到的一些报告只能用那些设备内存中的随机位翻转来解释。这看起来不太可能,但如果运行你软件的设备足够多,即使是极不可能的事情也会发生。除了因硬件故障或辐射导致的随机内存损坏之外,某些病态的内存访问模式甚至可以翻转没有故障的内存中的比特[62]——这种效应可以被用来破坏操作系统的安全机制[63](这种技术被称为rowhammer)。仔细观察就会发现,硬件并不像看上去那样是完美的抽象。
To be clear, random bit-flips are still very rare on modern hardware [ 64 ]. I just want to point out that they are not beyond the realm of possibility, and so they deserve some attention.
需要明确的是,在现代硬件上,随机位翻转仍然非常罕见[64]。我只是想指出,它们并非不可能发生,因此值得引起一些关注。
Maintaining integrity in the face of software bugs
Besides such hardware issues, there is always the risk of software bugs, which would not be caught by lower-level network, memory, or filesystem checksums. Even widely used database software has bugs: I have personally seen cases of MySQL failing to correctly maintain a uniqueness constraint [ 65 ] and PostgreSQL’s serializable isolation level exhibiting write skew anomalies [ 66 ], even though MySQL and PostgreSQL are robust and well-regarded databases that have been battle-tested by many people for many years. In less mature software, the situation is likely to be much worse.
除了这类硬件问题,还存在软件错误的风险,这些错误不会被低级网络、内存或文件系统校验和捕捉到。即使广泛使用的数据库软件也存在错误:我个人曾见过MySQL未能正确维护唯一性约束[65],而PostgreSQL的可串行化隔离级别会展现出写入偏差异常[66],尽管MySQL和PostgreSQL是经过许多人多年的实践检验的强大和评价极高的数据库。在不太成熟的软件中,情况很可能会更糟。
Despite considerable efforts in careful design, testing, and review, bugs still creep in. Although they are rare, and they eventually get found and fixed, there is still a period during which such bugs can corrupt data.
尽管进行了仔细的设计、测试和审查,但错误仍会悄悄地潜入。虽然它们很少见,最终会被发现和修复,但在此期间,这些错误仍可能破坏数据。
When it comes to application code, we have to assume many more bugs, since most applications don’t receive anywhere near the amount of review and testing that database code does. Many applications don’t even correctly use the features that databases offer for preserving integrity, such as foreign key or uniqueness constraints [ 36 ].
谈到应用程序代码,我们必须假设存在多得多的bug,因为大多数应用接受的审查和测试远远达不到数据库代码的水平。许多应用甚至没有正确使用数据库为维护完整性而提供的功能,例如外键或唯一性约束[36]。
Consistency in the sense of ACID (see “Consistency” ) is based on the idea that the database starts off in a consistent state, and a transaction transforms it from one consistent state to another consistent state. Thus, we expect the database to always be in a consistent state. However, this notion only makes sense if you assume that the transaction is free from bugs. If the application uses the database incorrectly in some way, for example using a weak isolation level unsafely, the integrity of the database cannot be guaranteed.
ACID意义上的一致性(参见“一致性”)基于这样的想法:数据库开始时处于一致的状态,而事务将其从一个一致状态转换到另一个一致状态。因此,我们期望数据库总是处于一致状态。然而,这个概念只有在假设事务没有bug的前提下才有意义。如果应用程序以某种方式错误地使用数据库,例如不安全地使用弱隔离级别,数据库的完整性就无法得到保证。
Don’t just blindly trust what they promise
With both hardware and software not always living up to the ideal that we would like them to be, it seems that data corruption is inevitable sooner or later. Thus, we should at least have a way of finding out if data has been corrupted so that we can fix it and try to track down the source of the error. Checking the integrity of data is known as auditing .
由于硬件和软件不总是如我们期望的那样完美,数据损坏似乎迟早是不可避免的。因此,我们至少应该有一种方法,能够检测数据是否已被破坏,以便我们可以修复它并尝试追踪错误的源头。检查数据的完整性被称为审计。
As discussed in “Advantages of immutable events” , auditing is not just for financial applications. However, auditability is highly important in finance precisely because everyone knows that mistakes happen, and we all recognize the need to be able to detect and fix problems.
正如“不可变事件的优点”中所讨论的,审计不仅仅适用于财务应用。然而,审计在财务领域中非常重要,因为每个人都知道错误会发生,我们都认识到需要能够检测和解决问题。
Mature systems similarly tend to consider the possibility of unlikely things going wrong, and manage that risk. For example, large-scale storage systems such as HDFS and Amazon S3 do not fully trust disks: they run background processes that continually read back files, compare them to other replicas, and move files from one disk to another, in order to mitigate the risk of silent corruption [ 67 ].
成熟的系统通常考虑到不太可能出错的情况,并管理风险。例如,大规模存储系统(如HDFS和Amazon S3)不完全信任磁盘:它们运行后台进程,不断读取文件,将其与其他副本进行比较,并将文件从一个磁盘移动到另一个磁盘,以减轻静默损坏的风险 [67]。
If you want to be sure that your data is still there, you have to actually read it and check. Most of the time it will still be there, but if it isn’t, you really want to find out sooner rather than later. By the same argument, it is important to try restoring from your backups from time to time—otherwise you may only find out that your backup is broken when it is too late and you have already lost data. Don’t just blindly trust that it is all working.
如果您想确保您的数据仍然存在,您必须实际阅读并检查它。大多数情况下它仍然存在,但如果它没有了,最好尽早找出来。同样的道理,定期尝试从备份恢复数据也很重要 - 否则当您已经丢失数据时才发现备份已损坏就太晚了。不要盲目相信所有东西都正常运行。
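A minimal sketch of such a background check (purely illustrative, not how HDFS or S3 actually implement it) might re-read stored blocks and compare them against checksums recorded at write time:
这种后台检查的一个最小示意(纯属举例,并非HDFS或S3的实际实现)可以重新读取已存储的块,并与写入时记录的校验和进行比较:

```python
import hashlib

def checksum(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

# store maps block_id -> (current contents, checksum recorded at write time);
# block-2 simulates a bit-flip that happened after the data was written.
store = {
    "block-1": (b"hello", checksum(b"hello")),
    "block-2": (b"wor1d", checksum(b"world")),
}

def scrub(store):
    """Re-read every block and return the IDs whose contents no longer
    match their recorded checksum (candidates for repair from a replica)."""
    corrupted = []
    for block_id, (data, recorded) in store.items():
        if checksum(data) != recorded:
            corrupted.append(block_id)
    return corrupted

print(scrub(store))   # flags block-2 for re-replication from a good copy
```

A real scrubber would run continually in the background and compare replicas against each other as well, but the principle is the same: actually read the data back and check it.
真正的清理进程会在后台持续运行,并在副本之间相互比对,但原理相同:真正把数据读回来并加以检查。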
A culture of verification
Systems like HDFS and S3 still have to assume that disks work correctly most of the time—which is a reasonable assumption, but not the same as assuming that they always work correctly. However, not many systems currently have this kind of “trust, but verify” approach of continually auditing themselves. Many assume that correctness guarantees are absolute and make no provision for the possibility of rare data corruption. I hope that in the future we will see more self-validating or self-auditing systems that continually check their own integrity, rather than relying on blind trust [ 68 ].
像HDFS和S3这样的系统仍然必须假设磁盘在大多数时候都能正常工作——这是一个合理的假设,但不同于假设它们总是正常工作。然而,目前没有多少系统采用这种持续审计自身的“信任但验证”方法。许多系统假定正确性保证是绝对的,没有为罕见的数据损坏的可能性做任何准备。我希望未来能看到更多自我验证或自我审计的系统,它们不断检查自身的完整性,而不是依赖盲目的信任[68]。
I fear that the culture of ACID databases has led us toward developing applications on the basis of blindly trusting technology (such as a transaction mechanism), and neglecting any sort of auditability in the process. Since the technology we trusted worked well enough most of the time, auditing mechanisms were not deemed worth the investment.
我担心ACID数据库文化导致我们盲目信任技术(如事务机制)来开发应用程序,忽略了任何形式的审计。由于我们信任的技术大部分时间表现良好,审计机制被认为不值得投资。
But then the database landscape changed: weaker consistency guarantees became the norm under the banner of NoSQL, and less mature storage technologies became widely used. Yet, because the audit mechanisms had not been developed, we continued building applications on the basis of blind trust, even though this approach had now become more dangerous. Let’s think for a moment about designing for auditability.
然而,数据库的格局随后发生了变化:在NoSQL的旗帜下,较弱的一致性保证成为常态,不太成熟的存储技术被广泛使用。然而,由于审计机制尚未开发出来,我们继续在盲目信任的基础上构建应用,尽管这种做法现在已经变得更加危险。让我们花一点时间思考一下面向可审计性的设计。
Designing for auditability
If a transaction mutates several objects in a database, it is difficult to tell after the fact what that transaction means. Even if you capture the transaction logs (see “Change Data Capture” ), the insertions, updates, and deletions in various tables do not necessarily give a clear picture of why those mutations were performed. The invocation of the application logic that decided on those mutations is transient and cannot be reproduced.
如果一项交易导致数据库中的多个对象发生变化,事后很难确定该交易的含义。即使您捕获了交易日志(请参阅“更改数据捕获”),各表格中的插入、更新和删除并不一定能清楚地解释为什么要进行这些变更。决定这些变更的应用程序逻辑的调用是瞬时的,无法重现。
By contrast, event-based systems can provide better auditability. In the event sourcing approach, user input to the system is represented as a single immutable event, and any resulting state updates are derived from that event. The derivation can be made deterministic and repeatable, so that running the same log of events through the same version of the derivation code will result in the same state updates.
相比之下,基于事件的系统可以提供更好的审计能力。在事件溯源的方法中,用户对系统的输入被表示为单个不可变的事件,并且任何由该事件导致的状态更新均来自该事件。可以使推导过程具有确定性和可重复性,以便通过相同版本的推导代码运行相同的事件日志将导致相同的状态更新。
Being explicit about dataflow (see “Philosophy of batch process outputs” ) makes the provenance of data much clearer, which makes integrity checking much more feasible. For the event log, we can use hashes to check that the event storage has not been corrupted. For any derived state, we can rerun the batch and stream processors that derived it from the event log in order to check whether we get the same result, or even run a redundant derivation in parallel.
明确数据流程(参见“批处理输出哲学”),使数据来源更加清晰,从而使完整性检查更加可行。对于事件日志,我们可以使用哈希来检查事件存储是否已经损坏。对于任何派生状态,我们可以重新运行批处理和流处理器,以从事件日志中派生它,以检查是否获得相同的结果,甚至可以并行运行冗余派生过程。
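Assuming a simple hash-chained event log and a deterministic derivation function (both hypothetical illustrations of the two checks described above), these ideas can be sketched as:
假设有一个简单的哈希链事件日志和一个确定性的推导函数(两者都是对上述两种检查的假设性示例),这些想法可以示意如下:

```python
import hashlib
import json

def chain_hash(prev_hash: str, event: dict) -> str:
    # Canonical serialization (sorted keys) so the hash is deterministic.
    payload = prev_hash + json.dumps(event, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

events = [{"type": "credit", "amount": 10}, {"type": "debit", "amount": 3}]

# At write time, maintain a running hash over the log and record its head.
head = "genesis"
for e in events:
    head = chain_hash(head, e)
recorded_head = head

def verify_log(events, expected_head):
    """Recompute the hash chain to check the event storage is uncorrupted."""
    h = "genesis"
    for e in events:
        h = chain_hash(h, e)
    return h == expected_head

def derive_balance(events):
    """Deterministic derivation: same events, same code, same state."""
    balance = 0
    for e in events:
        balance += e["amount"] if e["type"] == "credit" else -e["amount"]
    return balance

assert verify_log(events, recorded_head)   # storage integrity check
assert derive_balance(events) == 7         # redundant re-derivation agrees
```

If any event is tampered with or lost, `verify_log` fails; if a bug corrupts derived state, rerunning `derive_balance` over the log exposes the mismatch.
如果任何事件被篡改或丢失,`verify_log`就会失败;如果某个bug破坏了派生状态,在日志上重新运行`derive_balance`就能暴露不一致。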
A deterministic and well-defined dataflow also makes it easier to debug and trace the execution of a system in order to determine why it did something [ 4 , 69 ]. If something unexpected occurred, it is valuable to have the diagnostic capability to reproduce the exact circumstances that led to the unexpected event—a kind of time-travel debugging capability.
确定性和明确定义的数据流也使得调试和跟踪系统执行更容易,以确定为什么系统执行特定动作。如果发生了意外事件,拥有诊断能力可以重现导致意外事件发生的确切环境,这种时间旅行般的调试能力非常有价值。
The end-to-end argument again
If we cannot fully trust that every individual component of the system will be free from corruption—that every piece of hardware is fault-free and that every piece of software is bug-free—then we must at least periodically check the integrity of our data. If we don’t check, we won’t find out about corruption until it is too late and it has caused some downstream damage, at which point it will be much harder and more expensive to track down the problem.
如果我们不能完全信任系统的每个组件都没有受到损坏 - 每个硬件部件无故障,每个软件部分无漏洞 - 那么我们至少必须定期检查我们的数据的完整性。如果我们不检查,我们将无法发现损坏,直到它太晚并且已经引起了某些下游损失,此时,追踪问题将更加困难和昂贵。
Checking the integrity of data systems is best done in an end-to-end fashion (see “The End-to-End Argument for Databases” ): the more systems we can include in an integrity check, the fewer opportunities there are for corruption to go unnoticed at some stage of the process. If we can check that an entire derived data pipeline is correct end to end, then any disks, networks, services, and algorithms along the path are implicitly included in the check.
检查数据系统的完整性最好以端到端的方式进行(参见“数据库的端到端论证”):我们可以包括在完整性检查中的系统越多,就越少机会让某个阶段的数据损坏未被发现。如果我们可以检查整个导出数据管道的端到端是否正确,那么路径上的任何磁盘、网络、服务和算法都会被隐含地包括在检查中。
Having continuous end-to-end integrity checks gives you increased confidence about the correctness of your systems, which in turn allows you to move faster [ 70 ]. Like automated testing, auditing increases the chances that bugs will be found quickly, and thus reduces the risk that a change to the system or a new storage technology will cause damage. If you are not afraid of making changes, you can much better evolve an application to meet changing requirements.
持续的端到端完整性检查让您对系统的正确性更加有信心,从而让您能够更快地前进[70]。 像自动化测试一样,审计增加了发现错误的机会,从而减少了对系统更改或新存储技术造成损害的风险。 如果您不害怕进行更改,您可以更好地发展一个应用程序,以满足不断变化的需求。
Tools for auditable data systems
At present, not many data systems make auditability a top-level concern. Some applications implement their own audit mechanisms, for example by logging all changes to a separate audit table, but guaranteeing the integrity of the audit log and the database state is still difficult. A transaction log can be made tamper-proof by periodically signing it with a hardware security module, but that does not guarantee that the right transactions went into the log in the first place.
目前,没有多少数据系统将可审计性作为首要关注点。有些应用通过把所有变更记录到单独的审计表中来实现自己的审计机制,但要保证审计日志与数据库状态的完整性仍然很困难。可以通过定期用硬件安全模块对事务日志签名来使其防篡改,但这并不能保证一开始进入日志的就是正确的事务。
It would be interesting to use cryptographic tools to prove the integrity of a system in a way that is robust to a wide range of hardware and software issues, and even potentially malicious actions. Cryptocurrencies, blockchains, and distributed ledger technologies such as Bitcoin, Ethereum, Ripple, Stellar, and various others [ 71 , 72 , 73 ] have sprung up to explore this area.
使用密码学工具来证明系统的完整性,并使之对各种硬件和软件问题、甚至潜在的恶意行为都保持稳健,将是一件很有趣的事情。加密货币、区块链以及诸如比特币、以太坊、Ripple、Stellar等各种分布式账本技术[71,72,73]相继涌现,以探索这一领域。
I am not qualified to comment on the merits of these technologies as currencies or mechanisms for agreeing contracts. However, from a data systems point of view they contain some interesting ideas. Essentially, they are distributed databases, with a data model and transaction mechanism, in which different replicas can be hosted by mutually untrusting organizations. The replicas continually check each other’s integrity and use a consensus protocol to agree on the transactions that should be executed.
我不具备评论这些技术作为货币或合同达成机制的优点的资格。然而,从数据系统的角度来看,它们包含了一些有趣的想法。本质上,它们是分布式数据库,具有数据模型和事务机制,在其中不同副本可以由相互不信任的组织托管。这些副本不断检查彼此的完整性,并使用共识协议来达成应执行的交易。
I am somewhat skeptical about the Byzantine fault tolerance aspects of these technologies (see “Byzantine Faults” ), and I find the technique of proof of work (e.g., Bitcoin mining) extraordinarily wasteful. The transaction throughput of Bitcoin is rather low, albeit for political and economic reasons more than for technical ones. However, the integrity checking aspects are interesting.
我对这些技术的拜占庭容错方面持一定的怀疑态度(参见“拜占庭故障”),而且我认为工作量证明技术(例如比特币挖矿)异常浪费。比特币的交易吞吐量相当低,尽管这更多是出于政治和经济原因,而非技术原因。不过,其中的完整性检查方面还是很有趣的。
Cryptographic auditing and integrity checking often relies on Merkle trees [ 74 ], which are trees of hashes that can be used to efficiently prove that a record appears in some dataset (and a few other things). Outside of the hype of cryptocurrencies, certificate transparency is a security technology that relies on Merkle trees to check the validity of TLS/SSL certificates [ 75 , 76 ].
密码学审计和完整性检查通常依赖于Merkle树[74],这是一种哈希树,可用于高效地证明某条记录出现在某个数据集中(以及其他一些用途)。在加密货币的炒作之外,证书透明度(certificate transparency)是一种依靠Merkle树来检查TLS/SSL证书有效性的安全技术[75,76]。
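A compact sketch of the underlying primitive (an illustrative toy, not the exact hashing scheme of RFC 6962): build a Merkle tree over a set of records, then prove one record's inclusion by supplying only the sibling hashes along its path to the root:
底层原语的一个简洁示意(只是说明性的玩具实现,并非RFC 6962的确切哈希方案):在一组记录上构建Merkle树,然后仅凭该记录到根路径上的兄弟哈希来证明它包含在集合中:

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves):
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])        # duplicate last node if odd
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def inclusion_proof(leaves, index):
    """Sibling hashes needed to recompute the root from one leaf."""
    level = [h(leaf) for leaf in leaves]
    proof = []
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        sibling = index ^ 1                # sibling sits next to us
        proof.append((level[sibling], sibling < index))
        index //= 2
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return proof

def verify(leaf, proof, root):
    node = h(leaf)
    for sibling, sibling_is_left in proof:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root

leaves = [b"cert-a", b"cert-b", b"cert-c", b"cert-d"]
root = merkle_root(leaves)
proof = inclusion_proof(leaves, 2)         # prove b"cert-c" is in the set
assert verify(b"cert-c", proof, root)
```

The proof is logarithmic in the dataset size, which is what makes auditing large logs practical.
证明的大小与数据集规模成对数关系,这正是审计大型日志得以实用的原因。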
I could imagine integrity-checking and auditing algorithms, like those of certificate transparency and distributed ledgers, becoming more widely used in data systems in general. Some work will be needed to make them equally scalable as systems without cryptographic auditing, and to keep the performance penalty as low as possible. But I think this is an interesting area to watch in the future.
我可以想象诸如证书透明度和分布式账本之类的完整性检查和审计算法在数据系统中更广泛地使用。需要进行一些工作,使它们与没有加密审计的系统同样具有可扩展性,并将性能惩罚降至最低。但我认为这是未来值得关注的一个有趣领域。
Doing the Right Thing
In the final section of this book, I would like to take a step back. Throughout this book we have examined a wide range of different architectures for data systems, evaluated their pros and cons, and explored techniques for building reliable, scalable, and maintainable applications. However, we have left out an important and fundamental part of the discussion, which I would now like to fill in.
在这本书的最后一节中,我想退后一步。在整本书中,我们审查了各种不同的数据系统架构,评估了它们的优缺点,并探索了建立可靠、可扩展和易于维护的应用程序的技术。然而,我们忽略了讨论的一个重要和基本部分,现在我想填补这个空缺。
Every system is built for a purpose; every action we take has both intended and unintended consequences. The purpose may be as simple as making money, but the consequences for the world may reach far beyond that original purpose. We, the engineers building these systems, have a responsibility to carefully consider those consequences and to consciously decide what kind of world we want to live in.
每个系统都是为一个目的而建立的;我们所采取的每个行动都有意图和无意图的后果。这个目的可能只是为了赚钱,但对于世界的影响可能远远超出了最初的目的。作为构建这些系统的工程师,我们有责任认真考虑这些后果,并有意识地决定我们想要生活在什么样的世界中。
We talk about data as an abstract thing, but remember that many datasets are about people: their behavior, their interests, their identity. We must treat such data with humanity and respect. Users are humans too, and human dignity is paramount.
我们谈论数据是一件抽象的事情,但是请记住,许多数据集关乎人们:他们的行为、他们的兴趣、他们的身份。我们必须以人性和尊重对待这样的数据。用户也是人类,人类的尊严至关重要。
Software development increasingly involves making important ethical choices. There are guidelines to help software engineers navigate these issues, such as the ACM’s Software Engineering Code of Ethics and Professional Practice [ 77 ], but they are rarely discussed, applied, and enforced in practice. As a result, engineers and product managers sometimes take a very cavalier attitude to privacy and potential negative consequences of their products [ 78 , 79 , 80 ].
软件开发越来越需要做出重要的道德选择。有一些指南可以帮助软件工程师应对这些问题,比如ACM的《软件工程师道德和职业实践准则》[77],但实际上很少有讨论、应用和执行。因此,工程师和产品经理有时会对隐私和潜在的负面影响采取非常草率的态度[78, 79, 80]。
A technology is not good or bad in itself—what matters is how it is used and how it affects people. This is true for a software system like a search engine in much the same way as it is for a weapon like a gun. I think it is not sufficient for software engineers to focus exclusively on the technology and ignore its consequences: the ethical responsibility is ours to bear also. Reasoning about ethics is difficult, but it is too important to ignore.
技术本身并不是好或坏的,重要的是它如何被使用以及它对人们的影响。这一点对于像搜索引擎这样的软件系统与像枪这样的武器一样正确。我认为,仅仅把注意力集中在技术上而忽略其后果是不够的:我们也要承担道德责任。道德推理是很困难的,但是它太重要了不能忽视。
Predictive Analytics
For example, predictive analytics is a major part of the “Big Data” hype. Using data analysis to predict the weather, or the spread of diseases, is one thing [ 81 ]; it is another matter to predict whether a convict is likely to reoffend, whether an applicant for a loan is likely to default, or whether an insurance customer is likely to make expensive claims. The latter have a direct effect on individual people’s lives.
例如,预测分析是“大数据”炒作的重要组成部分。利用数据分析来预测天气或疾病的传播是一回事[81];预测一个罪犯是否可能再次犯罪、一个贷款申请人是否可能违约、或一个保险客户是否可能提出高额索赔,则是另一回事。后者会直接影响到个人的生活。
Naturally, payment networks want to prevent fraudulent transactions, banks want to avoid bad loans, airlines want to avoid hijackings, and companies want to avoid hiring ineffective or untrustworthy people. From their point of view, the cost of a missed business opportunity is low, but the cost of a bad loan or a problematic employee is much higher, so it is natural for organizations to want to be cautious. If in doubt, they are better off saying no.
自然而然,支付网络希望防止欺诈交易,银行希望避免不良贷款,航空公司希望避免劫机,公司希望避免雇用无效或不可信任的人。从他们的角度来看,错过商机的成本很低,但不良贷款或有问题的员工的成本要高得多,因此组织希望保持谨慎。如果有疑问,他们最好拒绝。
However, as algorithmic decision-making becomes more widespread, someone who has (accurately or falsely) been labeled as risky by some algorithm may suffer a large number of those “no” decisions. Systematically being excluded from jobs, air travel, insurance coverage, property rental, financial services, and other key aspects of society is such a large constraint of the individual’s freedom that it has been called “algorithmic prison” [ 82 ]. In countries that respect human rights, the criminal justice system presumes innocence until proven guilty; on the other hand, automated systems can systematically and arbitrarily exclude a person from participating in society without any proof of guilt, and with little chance of appeal.
然而,随着算法决策变得越来越普遍,被某个算法(准确地或错误地)标记为有风险的人,可能会遭受大量这样的“不”的决定。被系统性地排除在工作、航空旅行、保险、租房、金融服务以及社会其他关键领域之外,是对个人自由的巨大限制,以至于被称为“算法监狱”[82]。在尊重人权的国家,刑事司法系统在被证明有罪之前假定无罪;相反,自动化系统可以在没有任何有罪证据的情况下,系统性、任意地将一个人排除在社会参与之外,而且几乎没有上诉的机会。
Bias and discrimination
Decisions made by an algorithm are not necessarily any better or any worse than those made by a human. Every person is likely to have biases, even if they actively try to counteract them, and discriminatory practices can become culturally institutionalized. There is hope that basing decisions on data, rather than subjective and instinctive assessments by people, could be more fair and give a better chance to people who are often overlooked in the traditional system [ 83 ].
算法做出的决定并不一定比人类做出的更好或更差。即使人们积极尝试抵消偏见,每个人都可能有偏见,而歧视性做法可能会成为文化制度。有希望的是,基于数据而不是人们主观和本能的评估来做出决策,可能会更公平,并为那些在传统系统中经常被忽视的人提供更好的机会。
When we develop predictive analytics systems, we are not merely automating a human’s decision by using software to specify the rules for when to say yes or no; we are even leaving the rules themselves to be inferred from data. However, the patterns learned by these systems are opaque: even if there is some correlation in the data, we may not know why. If there is a systematic bias in the input to an algorithm, the system will most likely learn and amplify that bias in its output [ 84 ].
当我们开发预测分析系统时,我们不仅仅是通过软件来指定何时说“是”或“否”来自动化人类的决策;我们甚至将规则本身留给了从数据中推断。然而,这些系统学习到的模式是不透明的:即使数据中存在某种相关性,我们可能也不知道为什么。如果算法的输入存在系统性偏见,系统很可能会在其输出中学习并放大这种偏见。
In many countries, anti-discrimination laws prohibit treating people differently depending on protected traits such as ethnicity, age, gender, sexuality, disability, or beliefs. Other features of a person’s data may be analyzed, but what happens if they are correlated with protected traits? For example, in racially segregated neighborhoods, a person’s postal code or even their IP address is a strong predictor of race. Put like this, it seems ridiculous to believe that an algorithm could somehow take biased data as input and produce fair and impartial output from it [ 85 ]. Yet this belief often seems to be implied by proponents of data-driven decision making, an attitude that has been satirized as “machine learning is like money laundering for bias” [ 86 ].
在许多国家,反歧视法律禁止基于受保护的特征(如种族、年龄、性别、性取向、残疾或信仰)区别对待他人。一个人数据中的其他特征也可以被分析,但如果这些特征与受保护的特征相关,会发生什么?例如,在种族隔离的社区中,一个人的邮政编码甚至IP地址都是种族的强预测因子。这样一说,认为算法能以有偏见的数据作为输入,却从中产生公平公正的输出,似乎是很荒谬的[85]。然而,数据驱动决策的支持者似乎常常隐含着这种信念,这种态度被讽刺为“机器学习就像为偏见洗钱”[86]。
Predictive analytics systems merely extrapolate from the past; if the past is discriminatory, they codify that discrimination. If we want the future to be better than the past, moral imagination is required, and that’s something only humans can provide [ 87 ]. Data and models should be our tools, not our masters.
预测分析系统只是从过去进行推断;如果过去存在歧视,它们会将这种歧视编码下来。如果我们想让未来比过去更好,就需要道德想象力,而这只有人类才能提供。数据和模型应该成为我们的工具,而不是我们的主人。
Responsibility and accountability
Automated decision making opens the question of responsibility and accountability [ 87 ]. If a human makes a mistake, they can be held accountable, and the person affected by the decision can appeal. Algorithms make mistakes too, but who is accountable if they go wrong [ 88 ]? When a self-driving car causes an accident, who is responsible? If an automated credit scoring algorithm systematically discriminates against people of a particular race or religion, is there any recourse? If a decision by your machine learning system comes under judicial review, can you explain to the judge how the algorithm made its decision?
自动化决策引发了责任和问责的问题[87]。如果人类犯错,他们可以被追究责任,而受到决策影响的人可以上诉。算法也会犯错,但如果出现问题,谁来负责[88]?自动驾驶汽车造成事故,谁应负责?如果自动信用评分算法系统性地歧视某个种族或宗教的人,有没有任何救济措施?如果你的机器学习系统的决策受到司法审查,你能向法官认真解释算法是如何做出决策的吗?
Credit rating agencies are an old example of collecting data to make decisions about people. A bad credit score makes life difficult, but at least a credit score is normally based on relevant facts about a person’s actual borrowing history, and any errors in the record can be corrected (although the agencies normally do not make this easy). However, scoring algorithms based on machine learning typically use a much wider range of inputs and are much more opaque, making it harder to understand how a particular decision has come about and whether someone is being treated in an unfair or discriminatory way [ 89 ].
信用评级机构是收集个人数据以对人做出决策的老牌例子。糟糕的信用评分会让生活变得困难,但至少信用评分通常基于一个人实际借贷历史的相关事实,而且记录中的任何错误都可以得到纠正(尽管这些机构通常不会让这个过程变得容易)。然而,基于机器学习的评分算法通常使用范围广泛得多的输入,而且不透明得多,这使得人们更难理解某个特定决策是如何得出的,以及某人是否受到了不公平或歧视性的对待[89]。
A credit score summarizes “How did you behave in the past?” whereas predictive analytics usually work on the basis of “Who is similar to you, and how did people like you behave in the past?” Drawing parallels to others’ behavior implies stereotyping people, for example based on where they live (a close proxy for race and socioeconomic class). What about people who get put in the wrong bucket? Furthermore, if a decision is incorrect due to erroneous data, recourse is almost impossible [ 87 ].
信用评分概括的是“你过去的行为如何?”,而预测分析通常基于“谁和你相似,以及与你相似的人过去的行为如何?”。与他人的行为进行类比,意味着对人进行刻板印象化,例如基于他们的居住地(种族和社会经济阶层的一个紧密替代指标)。那些被放进错误类别的人怎么办?此外,如果决策因错误的数据而出错,追索几乎是不可能的[87]。
Much data is statistical in nature, which means that even if the probability distribution on the whole is correct, individual cases may well be wrong. For example, if the average life expectancy in your country is 80 years, that doesn’t mean you’re expected to drop dead on your 80th birthday. From the average and the probability distribution, you can’t say much about the age to which one particular person will live. Similarly, the output of a prediction system is probabilistic and may well be wrong in individual cases.
许多数据具有统计性质,这就意味着即使整体的概率分布是正确的,个别情况也可能是错误的。例如,如果你所在国家的平均寿命是80岁,这并不意味着你在80岁生日那一天就会离世。从平均值和概率分布来看,你并不能太多地说出一个人会活多少岁。同样地,预测系统的输出是概率性的,也可能在个别情况下出现错误。
A blind belief in the supremacy of data for making decisions is not only delusional, it is positively dangerous. As data-driven decision making becomes more widespread, we will need to figure out how to make algorithms accountable and transparent, how to avoid reinforcing existing biases, and how to fix them when they inevitably make mistakes.
对于盲目相信数据占据决策至高无上地位的观点,不仅是一厢情愿的,而且是极其危险的。随着数据驱动的决策越来越普及,我们需要想办法让算法具有责任和透明度,避免强化现有偏见,以及在它们必然犯错时如何进行修复。
We will also need to figure out how to prevent data being used to harm people, and realize its positive potential instead. For example, analytics can reveal financial and social characteristics of people’s lives. On the one hand, this power could be used to focus aid and support to help those people who most need it. On the other hand, it is sometimes used by predatory business seeking to identify vulnerable people and sell them risky products such as high-cost loans and worthless college degrees [ 87 , 90 ].
我们还需要弄清楚如何防止数据被用来伤害人们,并转而发挥其积极的潜力。例如,分析可以揭示人们生活中的财务和社会特征。一方面,这种力量可以用来将援助和支持集中到最需要帮助的人身上。另一方面,它有时会被掠夺性企业利用,来识别弱势人群,并向他们兜售高息贷款和毫无价值的大学学位等高风险产品[87,90]。
Feedback loops
Even with predictive applications that have less immediately far-reaching effects on people, such as recommendation systems, there are difficult issues that we must confront. When services become good at predicting what content users want to see, they may end up showing people only opinions they already agree with, leading to echo chambers in which stereotypes, misinformation, and polarization can breed. We are already seeing the impact of social media echo chambers on election campaigns [ 91 ].
即使是对人们生活的影响不那么直接深远的预测性应用(例如推荐系统),也存在我们必须面对的棘手问题。当服务变得擅长预测用户想看的内容时,它们最终可能只向人们展示其已经认同的观点,形成回声室,使刻板印象、错误信息和极化得以滋生。我们已经看到社交媒体回声室对选举活动的影响[91]。
When predictive analytics affect people’s lives, particularly pernicious problems arise due to self-reinforcing feedback loops. For example, consider the case of employers using credit scores to evaluate potential hires. You may be a good worker with a good credit score, but suddenly find yourself in financial difficulties due to a misfortune outside of your control. As you miss payments on your bills, your credit score suffers, and you will be less likely to find work. Joblessness pushes you toward poverty, which further worsens your scores, making it even harder to find employment [ 87 ]. It’s a downward spiral due to poisonous assumptions, hidden behind a camouflage of mathematical rigor and data.
当预测性分析影响人们的生活时,自我强化的反馈循环会带来特别有害的问题。例如,考虑雇主使用信用评分来评估潜在雇员的情形。你可能是一位信用评分良好的好员工,却因为一场你无法控制的不幸而突然陷入财务困境。由于未能按时支付账单,你的信用评分受损,从而更难找到工作。失业把你推向贫困,进一步恶化你的评分,使找工作变得更加困难[87]。这是一个由有害假设导致的恶性循环,它隐藏在数学严谨性与数据的伪装之下。
We can’t always predict when such feedback loops happen. However, many consequences can be predicted by thinking about the entire system (not just the computerized parts, but also the people interacting with it)—an approach known as systems thinking [ 92 ]. We can try to understand how a data analysis system responds to different behaviors, structures, or characteristics. Does the system reinforce and amplify existing differences between people (e.g., making the rich richer or the poor poorer), or does it try to combat injustice? And even with the best intentions, we must beware of unintended consequences.
我们无法总是预测此类反馈循环何时发生。但是,通过考虑整个系统(不仅是计算机化的部分,还包括与之交互的人),可以预测许多后果,这种方法被称为系统思维[92]。我们可以尝试了解数据分析系统对不同的行为、结构或特征作何反应:该系统是加强和放大了人们之间的现有差异(例如,让富人更富、穷人更穷),还是努力对抗不公正?即使怀着最好的意图,我们也必须警惕意外的后果。
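The credit-score spiral described above can be sketched as a toy simulation. This is a hypothetical model with invented parameters (the 850-point scale, the hiring threshold, the yearly score changes), purely to show how one external shock compounds through the feedback loop:
上文描述的信用评分螺旋可以用一个玩具模拟来勾勒。这是一个参数完全虚构的假想模型(850分制、雇用阈值、每年的分数变化),仅用于展示一次外部冲击如何在反馈循环中不断放大:

```python
# Toy model: a lower score reduces the chance of being hired,
# unemployment means missed payments, missed payments lower the score.
# All numbers are invented for illustration.

def simulate(score: float, shock: bool, years: int = 5) -> float:
    income = 1.0
    for year in range(years):
        if shock and year == 0:
            income = 0.0  # a misfortune outside the person's control
        hiring_chance = score / 850  # employers screen on credit score
        if income == 0.0 and hiring_chance < 0.85:
            score -= 50  # missed payments while unemployed
        else:
            income = 1.0
            score = min(850, score + 10)
    return score

print(simulate(700, shock=False))  # steady employment: 750
print(simulate(700, shock=True))   # one shock, same person: 450
```

Two people with identical starting scores end up far apart because of a single event that neither of them chose.
两个起始分数完全相同的人,仅仅因为一件他们都无法选择的事件,最终相差甚远。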
Privacy and Tracking
Besides the problems of predictive analytics—i.e., using data to make automated decisions about people—there are ethical problems with data collection itself. What is the relationship between the organizations collecting data and the people whose data is being collected?
除了预测分析的问题——即使用数据来做出有关人的自动化决策之外,数据收集本身也存在伦理问题。收集数据的组织与数据被收集的人之间的关系是什么?
When a system only stores data that a user has explicitly entered, because they want the system to store and process it in a certain way, the system is performing a service for the user: the user is the customer. But when a user’s activity is tracked and logged as a side effect of other things they are doing, the relationship is less clear. The service no longer just does what the user tells it to do, but it takes on interests of its own, which may conflict with the user’s interests.
当系统仅存储用户显式输入的数据,因为他们希望系统以特定的方式存储和处理它时,系统正在为用户提供服务:用户是客户。但是,当用户的活动被跟踪和记录为他们正在做其他事情的副作用时,关系就不太清楚了。该服务不再仅仅是按照用户的指示执行操作,而是具有自己的利益,这可能与用户的利益冲突。
Tracking behavioral data has become increasingly important for user-facing features of many online services: tracking which search results are clicked helps improve the ranking of search results; recommending “people who liked X also liked Y” helps users discover interesting and useful things; A/B tests and user flow analysis can help indicate how a user interface might be improved. Those features require some amount of tracking of user behavior, and users benefit from them.
追踪行为数据对于许多在线服务的用户界面功能变得越来越重要:跟踪点击哪些搜索结果有助于改善搜索结果的排名;推荐“喜欢X的人也喜欢Y”有助于用户发现有趣和有用的东西;A/B测试和用户流分析可以帮助指示如何改进用户界面。这些功能需要追踪用户行为的一定量,用户也从中受益。
However, depending on a company’s business model, tracking often doesn’t stop there. If the service is funded through advertising, the advertisers are the actual customers, and the users’ interests take second place. Tracking data becomes more detailed, analyses become further-reaching, and data is retained for a long time in order to build up detailed profiles of each person for marketing purposes.
不过,根据企业的商业模式,跟踪往往不止于此。如果服务是通过广告资助的,那么广告客户才是真正的客户,而用户的利益则排在第二位。跟踪数据变得更加详细,分析变得更加深入,数据也会长时间保留,以建立每个人的详细营销档案。
Now the relationship between the company and the user whose data is being collected starts looking quite different. The user is given a free service and is coaxed into engaging with it as much as possible. The tracking of the user serves not primarily that individual, but rather the needs of the advertisers who are funding the service. I think this relationship can be appropriately described with a word that has more sinister connotations: surveillance .
现在,公司和被收集数据的用户之间的关系开始变得截然不同。用户获得免费服务,并被诱导尽可能多地与之互动。对用户的跟踪主要服务的不是该用户个人,而是资助该服务的广告商的需求。我认为这种关系可以用一个含义更为险恶的词来恰当描述:监视。
Surveillance
As a thought experiment, try replacing the word data with surveillance , and observe if common phrases still sound so good [ 93 ]. How about this: “In our surveillance-driven organization we collect real-time surveillance streams and store them in our surveillance warehouse. Our surveillance scientists use advanced analytics and surveillance processing in order to derive new insights.”
作为一个思维实验,尝试用“监视”一词代替“数据”,看看常见短语听起来是否还那么顺耳[93]。比如这样:“在我们的监视驱动型组织中,我们收集实时监视流,并将其存储在我们的监视仓库中。我们的监视科学家使用先进的分析和监视处理技术,以获取新的见解。”
This thought experiment is unusually polemic for this book, Designing Surveillance-Intensive Applications , but I think that strong words are needed to emphasize this point. In our attempts to make software “eat the world” [ 94 ], we have built the greatest mass surveillance infrastructure the world has ever seen. Rushing toward an Internet of Things, we are rapidly approaching a world in which every inhabited space contains at least one internet-connected microphone, in the form of smartphones, smart TVs, voice-controlled assistant devices, baby monitors, and even children’s toys that use cloud-based speech recognition. Many of these devices have a terrible security record [ 95 ].
这个思想实验在本书《设计监视密集型应用》中显得异乎寻常地具有论战性,但我认为需要强烈的措辞来强调这一点。在我们努力让软件“吞噬世界”[94]的过程中,我们建造了世界上有史以来最庞大的大规模监视基础设施。在向物联网迅速迈进的过程中,我们正在快速接近这样一个世界:每个有人居住的空间都至少包含一个联网麦克风,其形式包括智能手机、智能电视、语音控制的助理设备、婴儿监视器,甚至是使用基于云的语音识别的儿童玩具。其中很多设备的安全记录非常糟糕[95]。
Even the most totalitarian and repressive regimes could only dream of putting a microphone in every room and forcing every person to constantly carry a device capable of tracking their location and movements. Yet we apparently voluntarily, even enthusiastically, throw ourselves into this world of total surveillance. The difference is just that the data is being collected by corporations rather than government agencies [ 96 ].
即使是最极权和压制性的政权也只能梦想着在每个房间里安装麦克风,并强迫每个人不断携带能够跟踪其位置和行动的设备。然而,我们显然自愿甚至热情地投身于这个完全监控的世界。不同之处仅在于数据是由企业而非政府机构收集的[96]。
Not all data collection necessarily qualifies as surveillance, but examining it as such can help us understand our relationship with the data collector. Why are we seemingly happy to accept surveillance by corporations? Perhaps you feel you have nothing to hide—in other words, you are totally in line with existing power structures, you are not a marginalized minority, and you needn’t fear persecution [ 97 ]. Not everyone is so fortunate. Or perhaps it’s because the purpose seems benign—it’s not overt coercion and conformance, but merely better recommendations and more personalized marketing. However, combined with the discussion of predictive analytics from the last section, that distinction seems less clear.
不是所有的数据收集都必然算作监视,但将其作为监视来审视,可以帮助我们理解我们与数据收集者的关系。为什么我们似乎很乐意接受公司的监视?也许你觉得自己没有什么好隐瞒的,换句话说,你完全符合现有的权力结构,你不是被边缘化的少数群体,也不必担心迫害[97]。并非每个人都那么幸运。或者可能是因为目的看起来是善意的:不是公然的强迫和顺从,而仅仅是更好的推荐和更个性化的营销。但是,结合上一节对预测性分析的讨论,这种区别似乎就不那么清晰了。
We are already seeing car insurance premiums linked to tracking devices in cars, and health insurance coverage that depends on people wearing a fitness tracking device. When surveillance is used to determine things that hold sway over important aspects of life, such as insurance coverage or employment, it starts to appear less benign. Moreover, data analysis can reveal surprisingly intrusive things: for example, the movement sensor in a smartwatch or fitness tracker can be used to work out what you are typing (for example, passwords) with fairly good accuracy [ 98 ]. And algorithms for analysis are only going to get better.
我们已经看到汽车保险费与车内追踪设备挂钩,健康保险的承保范围取决于人们是否佩戴健身追踪设备。当监视被用来决定保险承保或就业等左右生活重要方面的事情时,它就开始显得不那么良善了。此外,数据分析可以揭示出侵入性惊人的信息:例如,智能手表或健身追踪器中的运动传感器可以用来以相当高的准确率推断出你正在输入的内容(例如密码)[98]。而且,分析算法只会变得越来越好。
Consent and freedom of choice
We might assert that users voluntarily choose to use a service that tracks their activity, and they have agreed to the terms of service and privacy policy, so they consent to data collection. We might even claim that users are receiving a valuable service in return for the data they provide, and that the tracking is necessary in order to provide the service. Undoubtedly, social networks, search engines, and various other free online services are valuable to users—but there are problems with this argument.
我们可以主张,用户自愿选择使用跟踪他们活动的服务,他们已同意服务条款和隐私政策,所以他们同意数据收集。我们甚至可以声称用户通过提供数据获得了有价值的服务,而跟踪是为了提供该服务而必要的。毫无疑问,社交网络、搜索引擎和其他各种免费在线服务对用户来说是有价值的——但是这个论点存在问题。
Users have little knowledge of what data they are feeding into our databases, or how it is retained and processed—and most privacy policies do more to obscure than to illuminate. Without understanding what happens to their data, users cannot give any meaningful consent. Often, data from one user also says things about other people who are not users of the service and who have not agreed to any terms. The derived datasets that we discussed in this part of the book—in which data from the entire user base may have been combined with behavioral tracking and external data sources—are precisely the kinds of data of which users cannot have any meaningful understanding.
用户对于他们输入我们数据库的数据,以及这些数据如何被保留和处理,所知甚少,而且大多数隐私政策起到的作用是模糊而非阐明。如果用户不了解他们的数据会发生什么,就无法给出任何有意义的同意。通常,来自一个用户的数据还会涉及其他人,这些人并不是该服务的用户,也从未同意任何条款。我们在本书这一部分讨论的派生数据集(其中整个用户群的数据可能已与行为跟踪和外部数据源相结合),恰恰是用户不可能有任何有意义理解的那类数据。
Moreover, data is extracted from users through a one-way process, not a relationship with true reciprocity, and not a fair value exchange. There is no dialog, no option for users to negotiate how much data they provide and what service they receive in return: the relationship between the service and the user is very asymmetric and one-sided. The terms are set by the service, not by the user [ 99 ].
此外,数据是通过单向过程从用户提取的,而不是通过真正互惠的关系和公平价值的交换。没有对话,用户没有选择权来谈判提供多少数据以及以何种服务回报:服务与用户之间的关系非常不对称和单向的。条款由服务方设定,而不是用户[99]。
For a user who does not consent to surveillance, the only real alternative is simply not to use a service. But this choice is not free either: if a service is so popular that it is “regarded by most people as essential for basic social participation” [ 99 ], then it is not reasonable to expect people to opt out of this service—using it is de facto mandatory. For example, in most Western social communities, it has become the norm to carry a smartphone, to use Facebook for socializing, and to use Google for finding information. Especially when a service has network effects, there is a social cost to people choosing not to use it.
对于不同意被监视的用户来说,唯一真正的替代选择就是干脆不使用该服务。但这种选择也并不自由:如果一个服务流行到“被大多数人视为基本社交参与所必需”[99],那么期望人们退出该服务是不合理的,使用它已成为事实上的强制。例如,在大多数西方社会中,携带智能手机、用Facebook社交、用Google查找信息已经成为常态。特别是当一个服务具有网络效应时,选择不使用它的人要付出社交成本。
Declining to use a service due to its tracking of users is only an option for the small number of people who are privileged enough to have the time and knowledge to understand its privacy policy, and who can afford to potentially miss out on social participation or professional opportunities that may have arisen if they had participated in the service. For people in a less privileged position, there is no meaningful freedom of choice: surveillance becomes inescapable.
拒绝使用一个追踪用户的服务只是少数人的选择,这些人足够有时间和知识理解其隐私政策,并有能力承担可能错失社交或职业机会的风险。对于处于相对较弱势地位的人来说,没有实质的选择自由:监视变得不可避免。
Privacy and use of data
Sometimes people claim that “privacy is dead” on the grounds that some users are willing to post all sorts of things about their lives to social media, sometimes mundane and sometimes deeply personal. However, this claim is false and rests on a misunderstanding of the word privacy .
有时人们声称“隐私已死”,理由是一些用户愿意在社交媒体上发布各种各样关于他们生活的事情,有时是平凡无奇的,有时是非常个人的。然而,这种说法是错误的,这基于对隐私这个词的误解。
Having privacy does not mean keeping everything secret; it means having the freedom to choose which things to reveal to whom, what to make public, and what to keep secret. The right to privacy is a decision right: it enables each person to decide where they want to be on the spectrum between secrecy and transparency in each situation [ 99 ]. It is an important aspect of a person’s freedom and autonomy.
拥有隐私并不意味着保持所有事情的秘密;它意味着有选择权,可以决定对谁透露什么,公开什么,保守什么。隐私权是一项决策权:它使每个人能够决定在每种情况下他们想要在保密和透明之间处于哪个区间[99]。它是一个人自由和自治的重要方面。
When data is extracted from people through surveillance infrastructure, privacy rights are not necessarily eroded, but rather transferred to the data collector. Companies that acquire data essentially say “trust us to do the right thing with your data,” which means that the right to decide what to reveal and what to keep secret is transferred from the individual to the company.
当通过监控基础设施从人们那里提取数据时,隐私权不一定会被侵蚀,而是转移到数据收集者手中。获取数据的公司基本上会说“相信我们会妥善处理你的数据”,这意味着选择何时透露、何时保密的权利已被从个人转移到公司手中。
The companies in turn choose to keep much of the outcome of this surveillance secret, because to reveal it would be perceived as creepy, and would harm their business model (which relies on knowing more about people than other companies do). Intimate information about users is only revealed indirectly, for example in the form of tools for targeting advertisements to specific groups of people (such as those suffering from a particular illness).
公司之所以选择对监视的大部分结果保密,是因为公开这些结果会被视为令人毛骨悚然,而且会损害其商业模式(该模式依赖于比其他公司更了解人们)。用户的私密信息只会以间接的方式透露,例如以面向特定人群(例如患有某种疾病的人)投放广告的定向工具的形式。
Even if particular users cannot be personally reidentified from the bucket of people targeted by a particular ad, they have lost their agency about the disclosure of some intimate information, such as whether they suffer from some illness. It is not the user who decides what is revealed to whom on the basis of their personal preferences—it is the company that exercises the privacy right with the goal of maximizing its profit.
即使特定用户无法从特定广告针对的人群中被重新识别,他们仍会失去有关披露某些私人信息的权利,例如他们是否患有某种疾病。这不是用户根据个人喜好决定向谁透露信息,而是公司以最大化其利润为目标行使隐私权。
Many companies have a goal of not being perceived as creepy—avoiding the question of how intrusive their data collection actually is, and instead focusing on managing user perceptions. And even these perceptions are often managed poorly: for example, something may be factually correct, but if it triggers painful memories, the user may not want to be reminded about it [ 100 ]. With any kind of data we should expect the possibility that it is wrong, undesirable, or inappropriate in some way, and we need to build mechanisms for handling those failures. Whether something is “undesirable” or “inappropriate” is of course down to human judgment; algorithms are oblivious to such notions unless we explicitly program them to respect human needs. As engineers of these systems we must be humble, accepting and planning for such failings.
许多公司的目标是避免被认为“令人毛骨悚然”,从而回避其数据收集实际上有多么具有侵入性的问题,转而专注于管理用户的观感。而即使这些观感也往往管理得很差:例如,某些内容可能在事实上是正确的,但如果它触发了痛苦的记忆,用户可能并不想被提醒[100]。对于任何类型的数据,我们都应该预期它可能在某些方面是错误的、不良的或不适当的,并且需要建立处理这些失败的机制。某事物是否“不良”或“不适当”当然取决于人类的判断;除非我们明确地编程让算法尊重人类的需求,否则算法无法感知这些概念。作为这些系统的工程师,我们必须保持谦卑,接受这类缺陷并为之做好规划。
Privacy settings that allow a user of an online service to control which aspects of their data other users can see are a starting point for handing back some control to users. However, regardless of the setting, the service itself still has unfettered access to the data, and is free to use it in any way permitted by the privacy policy. Even if the service promises not to sell the data to third parties, it usually grants itself unrestricted rights to process and analyze the data internally, often going much further than what is overtly visible to users.
隐私设置允许在线服务的用户控制其他用户可以看到其数据的哪些方面,这是将一些控制权交还给用户的起点。然而,无论设置如何,服务本身仍能自由访问并使用此数据,且在隐私政策允许的情况下可以用于任何用途。即使服务承诺不向第三方出售数据,通常也会授予其自身无限制的内部处理和分析数据的权利,往往超出用户明显可见的范围。
This kind of large-scale transfer of privacy rights from individuals to corporations is historically unprecedented [ 99 ]. Surveillance has always existed, but it used to be expensive and manual, not scalable and automated. Trust relationships have always existed, for example between a patient and their doctor, or between a defendant and their attorney—but in these cases the use of data has been strictly governed by ethical, legal, and regulatory constraints. Internet services have made it much easier to amass huge amounts of sensitive information without meaningful consent, and to use it at massive scale without users understanding what is happening to their private data.
这种大规模的隐私权转移从个人到企业的现象,在历史上是前所未有的[99]。监视一直存在,但曾经很昂贵和手动化,而不是可扩展和自动化的。信任关系一直存在,例如患者和医生之间,或被告和律师之间的关系,但在这些情况下,数据使用受到了道德、法律和监管限制。互联网服务使得大规模收集敏感信息变得更加容易,而用户却没有真正的同意,并且在大规模使用其私人数据时,用户也没有理解发生了什么。
Data as assets and power
Since behavioral data is a byproduct of users interacting with a service, it is sometimes called “data exhaust”—suggesting that the data is worthless waste material. Viewed this way, behavioral and predictive analytics can be seen as a form of recycling that extracts value from data that would have otherwise been thrown away.
由于行为数据是用户与服务互动的副产品,因此有时被称为“数据废气”,暗示这些数据是无用的废弃物料。从这个角度来看,行为和预测分析可以被视为一种回收利用的形式,从本来会被丢弃的数据中提取价值。
More correct would be to view it the other way round: from an economic point of view, if targeted advertising is what pays for a service, then behavioral data about people is the service’s core asset. In this case, the application with which the user interacts is merely a means to lure users into feeding more and more personal information into the surveillance infrastructure [ 99 ]. The delightful human creativity and social relationships that often find expression in online services are cynically exploited by the data extraction machine.
更正确的做法是反过来看:从经济角度来看,如果定向广告是为一项服务买单的方式,那么关于人的行为数据就是该服务的核心资产。在这种情况下,用户与之交互的应用程序仅仅是一种手段,用来诱使用户向监视基础设施提供越来越多的个人信息[99]。在线服务中常常得以表达的美妙的人类创造力和社交关系,被数据提取机器无情地利用了。
The assertion that personal data is a valuable asset is supported by the existence of data brokers, a shady industry operating in secrecy, purchasing, aggregating, analyzing, inferring, and reselling intrusive personal data about people, mostly for marketing purposes [ 90 ]. Startups are valued by their user numbers, by “eyeballs”—i.e., by their surveillance capabilities.
个人数据是有价值资产这一论断,可以从数据经纪商的存在得到印证:这是一个秘密运作的灰色行业,购买、汇总、分析、推断并转售关于个人的侵入性数据,主要用于营销目的[90]。创业公司以其用户数、以“眼球”来估值,也就是以其监视能力来估值。
Because the data is valuable, many people want it. Of course companies want it—that’s why they collect it in the first place. But governments want to obtain it too: by means of secret deals, coercion, legal compulsion, or simply stealing it [ 101 ]. When a company goes bankrupt, the personal data it has collected is one of the assets that get sold. Moreover, the data is difficult to secure, so breaches happen disconcertingly often [ 102 ].
因为数据很有价值,许多人都想要它。当然,公司也想要它,这就是为什么他们首先收集它的原因。但政府也想获得它:通过秘密交易、强迫、法律约束或仅仅是偷窃[101]。 当公司破产时,它收集的个人数据是被出售的资产之一。此外,数据很难保护,所以泄漏发生的频率令人不安[102]。
These observations have led critics to saying that data is not just an asset, but a “toxic asset” [ 101 ], or at least “hazardous material” [ 103 ]. Even if we think that we are capable of preventing abuse of data, whenever we collect data, we need to balance the benefits with the risk of it falling into the wrong hands: computer systems may be compromised by criminals or hostile foreign intelligence services, data may be leaked by insiders, the company may fall into the hands of unscrupulous management that does not share our values, or the country may be taken over by a regime that has no qualms about compelling us to hand over the data.
这些观察结果导致评论家们称数据不仅仅是一种资产,而是“有毒资产”[101],或者至少是“有害物质”[103]。即使我们认为自己能够防止数据被滥用,每当我们收集数据时,我们都需要权衡收益与风险。计算机系统可能会被罪犯或敌对外国情报服务机构入侵,数据可能会被内部人员泄露,公司可能会落入不道德的管理层手中,他们不会分享我们的价值观,或者国家可能会被一个毫不犹豫地迫使我们交出数据的政权所接管。
When collecting data, we need to consider not just today’s political environment, but all possible future governments. There is no guarantee that every government elected in future will respect human rights and civil liberties, so “it is poor civic hygiene to install technologies that could someday facilitate a police state” [ 104 ].
在收集数据时,我们需要考虑的不仅是今天的政治环境,还有所有可能的未来政府。无法保证未来选出的每一届政府都会尊重人权和公民自由,因此“安装将来某天可能助长警察国家的技术,是糟糕的公民卫生习惯”[104]。
“Knowledge is power,” as the old adage goes. And furthermore, “to scrutinize others while avoiding scrutiny oneself is one of the most important forms of power” [ 105 ]. This is why totalitarian governments want surveillance: it gives them the power to control the population. Although today’s technology companies are not overtly seeking political power, the data and knowledge they have accumulated nevertheless gives them a lot of power over our lives, much of which is surreptitious, outside of public oversight [ 106 ].
正如古老的格言所说,“知识就是力量”。而且,“审视他人而自己免受审视,是最重要的权力形式之一”[105]。这就是为什么极权政府需要监视:它赋予了他们控制人口的权力。尽管当今的科技公司并未公开寻求政治权力,但它们积累的数据和知识仍然赋予了它们对我们生活的巨大权力,其中很大一部分是隐秘进行的,处于公众监督之外[106]。
Remembering the Industrial Revolution
Data is the defining feature of the information age. The internet, data storage, processing, and software-driven automation are having a major impact on the global economy and human society. As our daily lives and social organization have changed in the past decade, and will probably continue to radically change in the coming decades, comparisons to the Industrial Revolution come to mind [ 87 , 96 ].
数据是信息时代的决定性特征。互联网、数据存储、数据处理和软件驱动的自动化正在对全球经济和人类社会产生重大影响。我们的日常生活和社会组织在过去十年已经发生了变化,并且很可能在未来几十年继续发生根本性的变化,这不禁让人将其与工业革命相比较[87,96]。
The Industrial Revolution came about through major technological and agricultural advances, and it brought sustained economic growth and significantly improved living standards in the long run. Yet it also came with major problems: pollution of the air (due to smoke and chemical processes) and the water (from industrial and human waste) was dreadful. Factory owners lived in splendor, while urban workers often lived in very poor housing and worked long hours in harsh conditions. Child labor was common, including dangerous and poorly paid work in mines.
工业革命源于重大的技术与农业进步,从长远来看带来了持续的经济增长和显著改善的生活水平。但它也伴随着重大问题:空气污染(由于烟雾和化学过程)和水污染(来自工业和生活废物)十分可怕。工厂主过着奢华的生活,而城市工人往往住在条件很差的住房里,在恶劣的环境中长时间工作。童工十分普遍,包括在矿井中从事危险而低薪的工作。
It took a long time before safeguards were established, such as environmental protection regulations, safety protocols for workplaces, outlawing child labor, and health inspections for food. Undoubtedly the cost of doing business increased when factories could no longer dump their waste into rivers, sell tainted foods, or exploit workers. But society as a whole benefited hugely, and few of us would want to return to a time before those regulations [ 87 ].
过了很长时间,各种保障措施才得以建立,例如环境保护法规、工作场所的安全协议、对童工的取缔,以及对食品的卫生检查。当工厂不能再把废弃物排入河流、销售受污染的食品或剥削工人时,经商成本无疑增加了。但整个社会从中获益匪浅,我们之中很少有人愿意回到这些法规出台之前的时代[87]。
Just as the Industrial Revolution had a dark side that needed to be managed, our transition to the information age has major problems that we need to confront and solve. I believe that the collection and use of data is one of those problems. In the words of Bruce Schneier [ 96 ]:
正如工业革命存在需要管理的阴暗面一样,我们向信息时代的过渡也存在需要我们正视和解决的重大问题。我相信数据的收集和使用正是其中之一。用布鲁斯·施奈尔(Bruce Schneier)的话来说[96]:
Data is the pollution problem of the information age, and protecting privacy is the environmental challenge. Almost all computers produce information. It stays around, festering. How we deal with it—how we contain it and how we dispose of it—is central to the health of our information economy. Just as we look back today at the early decades of the industrial age and wonder how our ancestors could have ignored pollution in their rush to build an industrial world, our grandchildren will look back at us during these early decades of the information age and judge us on how we addressed the challenge of data collection and misuse.
数据是信息时代的污染问题,而保护隐私则是环境方面的挑战。几乎所有的计算机都会产生信息。这些信息会一直留存、不断发酵。我们如何处理它,如何控制它、如何处置它,是信息经济健康的核心。正如我们今天回顾工业时代的最初几十年,疑惑我们的祖先在急于建设工业世界时怎么能无视污染一样,我们的子孙后代在回顾信息时代的最初几十年时,也将以我们如何应对数据收集和滥用的挑战来评判我们。
We should try to make them proud.
我们应该努力让他们感到骄傲。
Legislation and self-regulation
Data protection laws might be able to help preserve individuals’ rights. For example, the 1995 European Data Protection Directive states that personal data must be “collected for specified, explicit and legitimate purposes and not further processed in a way incompatible with those purposes,” and furthermore that data must be “adequate, relevant and not excessive in relation to the purposes for which they are collected” [ 107 ].
数据保护法可以帮助保护个人权利。例如,1995年欧洲数据保护指令规定,个人数据必须“为特定、明确和合法的目的收集,不得以与这些目的不兼容的方式进一步处理”,并且数据必须“与收集目的相关,而且不应过度”。[107]。
However, it is doubtful whether this legislation is effective in today’s internet context [ 108 ]. These rules run directly counter to the philosophy of Big Data, which is to maximize data collection, to combine it with other datasets, to experiment and to explore in order to generate new insights. Exploration means using data for unforeseen purposes, which is the opposite of the “specified and explicit” purposes for which the user gave their consent (if we can meaningfully speak of consent at all [ 109 ]). Updated regulations are now being developed [ 89 ].
然而,在当前互联网背景下,这种立法是否有效仍存在疑问[108]。这些规定直接违反了大数据的哲学,即最大化数据采集,将其与其他数据集合并,进行实验和探索以产生新的见解。探索意味着将数据用于未预料的目的,这与用户给出明确和明确的同意目的(如果我们能有意义地谈论同意的话[109])相反。正在制定更新的规定[89]。
Companies that collect lots of data about people oppose regulation as being a burden and a hindrance to innovation. To some extent that opposition is justified. For example, when sharing medical data, there are clear risks to privacy, but there are also potential opportunities: how many deaths could be prevented if data analysis was able to help us achieve better diagnostics or find better treatments [ 110 ]? Over-regulation may prevent such breakthroughs. It is difficult to balance such potential opportunities with the risks [ 105 ].
大量收集个人数据的公司反对管制,认为管制是一种负担,是对创新的阻碍。在某种程度上,这种反对是有道理的。例如,在共享医疗数据时,存在明显的隐私风险,但也存在潜在的机会:如果数据分析能够帮助我们实现更好的诊断或找到更好的治疗方法,能避免多少人的死亡[110]?过度管制可能会阻碍这类突破。在这种潜在机会与风险之间取得平衡是很困难的[105]。
Fundamentally, I think we need a culture shift in the tech industry with regard to personal data. We should stop regarding users as metrics to be optimized, and remember that they are humans who deserve respect, dignity, and agency. We should self-regulate our data collection and processing practices in order to establish and maintain the trust of the people who depend on our software [ 111 ]. And we should take it upon ourselves to educate end users about how their data is used, rather than keeping them in the dark.
从根本上讲,我认为技术产业需要转变文化,关注个人数据。我们应该停止把用户视为要优化的指标,而要记住他们是有尊严、权利和自主权的人。我们应该自我约束数据收集和处理实践,以建立和维护人们对我们软件的信任[111]。我们应该自己教育终端用户,让他们了解他们的数据如何被使用,而不是让他们一无所知。
We should allow each individual to maintain their privacy—i.e., their control over own data—and not steal that control from them through surveillance. Our individual right to control our data is like the natural environment of a national park: if we don’t explicitly protect and care for it, it will be destroyed. It will be the tragedy of the commons, and we will all be worse off for it. Ubiquitous surveillance is not inevitable—we are still able to stop it.
我们应该允许每个人保护自己的隐私——也就是掌控自己的数据——不能通过监控侵犯他们的掌控权。我们每个人掌控自己数据的权利就像国家公园的自然环境:如果我们不明确地保护和呵护它,它将被摧毁。这将是“公地悲剧”,我们所有人都会因此受到损害。无处不在的监控并非不可避免——我们仍然能够阻止它发生。
How exactly we might achieve this is an open question. To begin with, we should not retain data forever, but purge it as soon as it is no longer needed [ 111 , 112 ]. Purging data runs counter to the idea of immutability (see “Limitations of immutability” ), but that issue can be solved. A promising approach I see is to enforce access control through cryptographic protocols, rather than merely by policy [ 113 , 114 ]. Overall, culture and attitude changes will be necessary.
我们如何确切地达成这个目标仍然是个未知问题。首先,我们不应该永久地保留数据,而是在不再需要时立刻清除它[111,112]。清除数据违反了不可变性的概念(请参见“不可变性的限制”),但这个问题可以解决。我看到的一个有前途的方法是通过加密协议来执行访问控制,而不仅仅是通过策略[113,114]。总的来说,需要文化和态度上的改变。
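One concrete version of enforcing access control cryptographically is sometimes called crypto-shredding: encrypt each person's data under a per-user key, and "purge" the data by destroying the key, which works even if immutable copies of the ciphertext remain in logs or backups. The sketch below uses a toy SHA-256 keystream so that it needs only the standard library; a real system would use an authenticated cipher such as AES-GCM and a proper key-management service:
以密码学方式强制执行访问控制的一种具体做法,有时被称为“加密粉碎”(crypto-shredding):用每用户一把的密钥加密个人数据,销毁密钥即可“清除”数据,即使日志或备份中仍保留着不可变的密文副本也同样有效。下面的示意只依赖标准库,用SHA-256构造了一个玩具密钥流;真实系统应使用AES-GCM等认证加密算法和专门的密钥管理服务:

```python
import hashlib
import os

def keystream(key: bytes, n: int) -> bytes:
    """Toy keystream derived from SHA-256 (illustration only, not a real cipher)."""
    out = b""
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:n]

def xor(data: bytes, ks: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(data, ks))

# One key per user; the immutable log stores only ciphertext.
keys = {"user:1": os.urandom(32)}
log = [("user:1", xor(b"Alice, Munich", keystream(keys["user:1"], 13)))]

# While the key exists, the data is readable.
user, ciphertext = log[0]
recovered = xor(ciphertext, keystream(keys[user], len(ciphertext)))
print(recovered)  # b'Alice, Munich'

# Purging the user: destroy the key. The log entry is now unreadable,
# even though the immutable log itself was never modified.
del keys["user:1"]
```

Deleting one small key is much easier to do reliably than finding and erasing every copy of the data itself.
可靠地删除一把小小的密钥,远比找到并抹除数据本身的每一个副本容易得多。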
Summary
In this chapter we discussed new approaches to designing data systems, and I included my personal opinions and speculations about the future. We started with the observation that there is no one single tool that can efficiently serve all possible use cases, and so applications necessarily need to compose several different pieces of software to accomplish their goals. We discussed how to solve this data integration problem by using batch processing and event streams to let data changes flow between different systems.
在本章中,我们讨论了设计数据系统的新方法,并包括了我个人对未来的看法和推测。我们从观察到没有一种单一的工具可以有效地为所有可能的用例服务开始,并且应用程序必须组合几个不同的软件来实现其目标。我们讨论了如何通过使用批处理和事件流解决这个数据集成问题,以让数据变化在不同的系统之间流动。
In this approach, certain systems are designated as systems of record, and other data is derived from them through transformations. In this way we can maintain indexes, materialized views, machine learning models, statistical summaries, and more. By making these derivations and transformations asynchronous and loosely coupled, a problem in one area is prevented from spreading to unrelated parts of the system, increasing the robustness and fault-tolerance of the system as a whole.
在这种方法中,某些系统被指定为记录系统,而其他数据则通过转换从它们派生。通过这种方式,我们可以维护索引、物化视图、机器学习模型、统计摘要等等。通过将这些派生和转换异步和松散耦合,防止一个区域的问题传播到系统的无关部分,从而增加整个系统的鲁棒性和容错性。
Expressing dataflows as transformations from one dataset to another also helps evolve applications: if you want to change one of the processing steps, for example to change the structure of an index or cache, you can just rerun the new transformation code on the whole input dataset in order to rederive the output. Similarly, if something goes wrong, you can fix the code and reprocess the data in order to recover.
将数据流表达为从一个数据集到另一个数据集的转换也有助于演变应用程序:如果您想更改其中一个处理步骤,例如更改索引或缓存的结构,您可以在整个输入数据集上重新运行新的转换代码,以便重新生成输出。同样,如果出现问题,您可以修复代码并重新处理数据以进行恢复。
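The rerun-the-transformation idea can be sketched with an event log as the system of record and a derived index (a minimal in-memory sketch; the names and structures are invented, and a real system would keep the log in something like Kafka and the index in a proper datastore):
“重新运行转换”的思路可以用一个作为记录系统的事件日志和一个派生索引来勾勒(这是一个最小化的内存内示意,名称和结构均为虚构;真实系统会把日志放在Kafka之类的系统中,把索引放在真正的数据存储中):

```python
# The event log is the system of record; the index is derived state.
events = [
    ("put", "user:1", {"name": "Alice", "city": "Berlin"}),
    ("put", "user:2", {"name": "Bob", "city": "Paris"}),
    ("put", "user:1", {"name": "Alice", "city": "Munich"}),  # an update
]

def build_city_index(log):
    """Derive a city -> set-of-user-keys index by replaying the whole log."""
    current, index = {}, {}
    for op, key, value in log:
        if op == "put":
            old = current.get(key)
            if old is not None:
                index[old["city"]].discard(key)  # undo the old entry
            current[key] = value
            index.setdefault(value["city"], set()).add(key)
    return index

# To change the index structure (or fix a bug in it), we simply rerun
# the new transformation over the full log and rederive the output.
index = build_city_index(events)
print(index)
```

Because the log is the authoritative input, the derived index can be thrown away and rebuilt at any time.
由于日志才是权威输入,派生索引可以随时丢弃并重建。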
These processes are quite similar to what databases already do internally, so we recast the idea of dataflow applications as unbundling the components of a database, and building an application by composing these loosely coupled components.
这些过程与数据库内部已经执行的过程非常相似,因此我们将数据流应用程序的理念重塑为解构数据库组件,并通过组合这些松散耦合的组件构建应用程序。
Derived state can be updated by observing changes in the underlying data. Moreover, the derived state itself can further be observed by downstream consumers. We can even take this dataflow all the way through to the end-user device that is displaying the data, and thus build user interfaces that dynamically update to reflect data changes and continue to work offline.
派生状态可以通过观察基础数据的变化来更新。此外,下游消费者还可以进一步观察派生状态本身。我们甚至可以将这种数据流传递到显示数据的最终用户设备上,因此构建动态更新以反映数据变化并继续在离线状态下运行的用户界面。
Next, we discussed how to ensure that all of this processing remains correct in the presence of faults. We saw that strong integrity guarantees can be implemented scalably with asynchronous event processing, by using end-to-end operation identifiers to make operations idempotent and by checking constraints asynchronously. Clients can either wait until the check has passed, or go ahead without waiting but risk having to apologize about a constraint violation. This approach is much more scalable and robust than the traditional approach of using distributed transactions, and fits with how many business processes work in practice.
接下来,我们讨论了如何确保所有这些处理在出现故障时仍然保持正确。我们看到,可扩展的强完整性保证可以通过异步事件处理来实现:使用端到端的操作标识符使操作幂等,并异步地检查约束。客户端可以等待检查通过,也可以不等待就继续执行,但要承担事后可能需要为违反约束而道歉的风险。这种方法比传统的分布式事务方法更具可扩展性与鲁棒性,而且符合实践中许多业务流程的运作方式。
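A minimal sketch of these two ideas in Python (account names, amounts, and the in-memory structures are illustrative; a real system would persist the set of processed IDs durably): the end-to-end operation identifier makes retries idempotent, and the constraint check runs after the fact rather than blocking the write.

```python
import uuid

processed = set()            # IDs of already-applied operations (durable in practice)
balances = {"alice": 100}

def apply_debit(op_id, account, amount):
    """Apply a debit at most once, keyed by an end-to-end operation identifier."""
    if op_id in processed:
        return "duplicate suppressed"   # a client retry has no further effect
    processed.add(op_id)
    balances[account] -= amount
    return "applied"

def check_constraints():
    """Asynchronous integrity check: runs later; a violation triggers an apology."""
    return [acct for acct, bal in balances.items() if bal < 0]

op_id = str(uuid.uuid4())        # generated once by the client, passed end to end
apply_debit(op_id, "alice", 30)
apply_debit(op_id, "alice", 30)  # retry after a timeout: idempotent, applied once
violations = check_constraints()
```

No coordination happens on the write path; the cost is that a violation (an overdrawn account, an overbooked flight) is discovered afterward and must be compensated for.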
By structuring applications around dataflow and checking constraints asynchronously, we can avoid most coordination and create systems that maintain integrity but still perform well, even in geographically distributed scenarios and in the presence of faults. We then talked a little about using audits to verify the integrity of data and detect corruption.
通过围绕数据流构建应用程序,并异步地检查约束,我们可以避免大多数的协调工作,构建出即使在地理上分布、即使出现故障时也能保持完整性且表现良好的系统。然后,我们简要讨论了使用审计来验证数据完整性和检测损坏的方法。
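One common form of such an audit can be sketched as follows (the storage layout and record contents are illustrative): store a checksum alongside each record at write time, then periodically re-read everything and verify it, so that silent corruption is detected rather than propagated.

```python
import hashlib

store = {}   # key -> (record bytes, checksum recorded at write time)

def checksum(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def write(key, value: bytes):
    store[key] = (value, checksum(value))

def audit():
    """Periodically re-read every record and verify it against its checksum."""
    return [key for key, (value, expected) in store.items()
            if checksum(value) != expected]

write("order-1", b"alice,3 widgets")
write("order-2", b"bob,1 gadget")
# Simulate silent corruption of the stored bytes (checksum left unchanged):
store["order-2"] = (b"bob,9 gadget", store["order-2"][1])
corrupted = audit()
```

The same principle scales up from a single store to whole dataflows: if the derivations are deterministic, recomputing them and comparing the results is itself an end-to-end audit.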
Finally, we took a step back and examined some ethical aspects of building data-intensive applications. We saw that although data can be used to do good, it can also do significant harm: making decisions that seriously affect people’s lives and are difficult to appeal against, leading to discrimination and exploitation, normalizing surveillance, and exposing intimate information. We also run the risk of data breaches, and we may find that a well-intentioned use of data has unintended consequences.
最后,我们退后一步,审视了构建数据密集型应用的一些伦理问题。我们看到,虽然数据可以用来做好事,但它也可能造成重大伤害:作出严重影响人们生活且难以申诉的决定,导致歧视与剥削,使监控常态化,并暴露私密信息。我们还面临数据泄露的风险,并且可能发现,即使是善意的数据使用也会产生意想不到的后果。
As software and data are having such a large impact on the world, we engineers must remember that we carry a responsibility to work toward the kind of world that we want to live in: a world that treats people with humanity and respect. I hope that we can work together toward that goal.
由于软件和数据正在对世界产生如此巨大的影响,我们工程师必须牢记,我们有责任努力建设我们想要生活于其中的那种世界:一个以人道与尊重待人的世界。我希望我们能够朝着这个目标共同努力。
Footnotes
i Explaining a joke rarely improves it, but I don’t want anyone to feel left out. Here, Church is a reference to the mathematician Alonzo Church, who created the lambda calculus, an early form of computation that is the basis for most functional programming languages. The lambda calculus has no mutable state (i.e., no variables that can be overwritten), so one could say that mutable state is separate from Church’s work.
解释笑话很少能让它更好笑,但我不想让任何人觉得被冷落。在这里,“Church”指的是数学家Alonzo Church,他创造了Lambda演算,这是一种早期的计算形式,也是大多数函数式编程语言的基础。Lambda演算没有可变状态(即没有可以被覆盖的变量),因此可以说可变状态与Church的工作是分离的。
ii In the microservices approach, you could avoid the synchronous network request by caching the exchange rate locally in the service that processes the purchase. However, in order to keep that cache fresh, you would need to periodically poll for updated exchange rates, or subscribe to a stream of changes—which is exactly what happens in the dataflow approach.
在微服务方法中,你可以在处理购买的服务中将汇率缓存在本地,从而避免同步网络请求。然而,为了保持缓存的新鲜,你需要定期轮询更新的汇率,或者订阅一个变更流,而这正是数据流方法中所发生的事情。
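The footnote's point can be sketched in a few lines (the currency pair, rates, and function names are illustrative): the purchase service reads only its local cache, and a subscription to the stream of rate changes keeps that cache fresh.

```python
rate_cache = {"USD/EUR": 0.90}   # local cache inside the purchase-processing service

def on_rate_change(pair, new_rate):
    """Subscriber callback: each change event on the stream refreshes the cache."""
    rate_cache[pair] = new_rate

def process_purchase(amount_usd):
    # No synchronous network request: read the locally cached exchange rate.
    return amount_usd * rate_cache["USD/EUR"]

on_rate_change("USD/EUR", 0.95)   # a rate-change event arrives on the stream
total_eur = process_purchase(100)
```

At this point the "microservice with a cache" and the dataflow approach have converged: both maintain local derived state by consuming a stream of changes.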
iii Less facetiously, the set of distinct search queries with nonempty search results is finite, assuming a finite corpus. However, it would be exponential in the number of terms in the corpus, which is still pretty bad news.
少开点玩笑说:假设语料库是有限的,那么具有非空搜索结果的不同搜索查询的集合也是有限的。然而,它会随语料库中词项的数量呈指数增长,这仍然是个坏消息。
References
[ 1 ] Rachid Belaid: “ Postgres Full-Text Search is Good Enough! ,” rachbelaid.com , July 13, 2015.
[1] Rachid Belaid: “Postgres全文搜索足够好了!”, rachbelaid.com,2015年7月13日。
[ 2 ] Philippe Ajoux, Nathan Bronson, Sanjeev Kumar, et al.: “ Challenges to Adopting Stronger Consistency at Scale ,” at 15th USENIX Workshop on Hot Topics in Operating Systems (HotOS), May 2015.
[2] Philippe Ajoux,Nathan Bronson,Sanjeev Kumar等: “在规模上采用更强的一致性所面临的挑战”,于2015年5月在第15届USENIX操作系统热点话题研讨会(HotOS)上发表。
[ 3 ] Pat Helland and Dave Campbell: “ Building on Quicksand ,” at 4th Biennial Conference on Innovative Data Systems Research (CIDR), January 2009.
[3] Pat Helland和Dave Campbell:“在流沙上建立”,发表于2009年1月的第四届创新数据系统研究双年会(CIDR)。
[ 4 ] Jessica Kerr: “ Provenance and Causality in Distributed Systems ,” blog.jessitron.com , September 25, 2016.
[4] Jessica Kerr: "分布式系统中的溯源和因果关系," blog.jessitron.com, 2016年9月25日.
[ 5 ] Kostas Tzoumas: “ Batch Is a Special Case of Streaming ,” data-artisans.com , September 15, 2015.
[5] Kostas Tzoumas:“批处理是流处理的一个特例”,data-artisans.com,2015年9月15日。
[ 6 ] Shinji Kim and Robert Blafford: “ Stream Windowing Performance Analysis: Concord and Spark Streaming ,” concord.io , July 6, 2016.
[6] Shinji Kim和Robert Blafford:“流窗口性能分析:Concord和Spark Streaming”,concord.io,2016年7月6日。
[ 7 ] Jay Kreps: “ The Log: What Every Software Engineer Should Know About Real-Time Data’s Unifying Abstraction ,” engineering.linkedin.com , December 16, 2013.
[7] Jay Kreps: “日志:关于实时数据统一抽象的每个软件工程师都应该知道的事情”, engineering.linkedin.com,2013年12月16日。
[ 8 ] Pat Helland: “ Life Beyond Distributed Transactions: An Apostate’s Opinion ,” at 3rd Biennial Conference on Innovative Data Systems Research (CIDR), January 2007.
[8] Pat Helland: “超越分布式事务: 一个异端的观点”,于2007年1月举行的第三届创新数据系统研究会议(CIDR)上发表。
[ 9 ] “ Great Western Railway (1835–1948) ,” Network Rail Virtual Archive, networkrail.co.uk .
[9] “大西部铁路(1835年至1948年),” 英国网络铁路虚拟档案, networkrail.co.uk.
[ 10 ] Jacqueline Xu: “ Online Migrations at Scale ,” stripe.com , February 2, 2017.
[10] Jacqueline Xu:“大规模在线迁移”,stripe.com,2017年2月2日。
[ 11 ] Molly Bartlett Dishman and Martin Fowler: “ Agile Architecture ,” at O’Reilly Software Architecture Conference , March 2015.
[11] Molly Bartlett Dishman 和 Martin Fowler: "敏捷架构",于 O'Reilly 软件架构会议,2015 年 3 月。
[ 12 ] Nathan Marz and James Warren: Big Data: Principles and Best Practices of Scalable Real-Time Data Systems . Manning, 2015. ISBN: 978-1-617-29034-3
[12] Nathan Marz和James Warren: 《大数据:可伸缩实时数据系统的原则和最佳实践》。曼宁出版社,2015年。ISBN: 978-1-617-29034-3。
[ 13 ] Oscar Boykin, Sam Ritchie, Ian O’Connell, and Jimmy Lin: “ Summingbird: A Framework for Integrating Batch and Online MapReduce Computations ,” at 40th International Conference on Very Large Data Bases (VLDB), September 2014.
[13] Oscar Boykin、Sam Ritchie、Ian O’Connell和Jimmy Lin:“Summingbird:一个整合批处理与在线MapReduce计算的框架”,发表于第40届超大规模数据库国际会议(VLDB),2014年9月。
[ 14 ] Jay Kreps: “ Questioning the Lambda Architecture ,” oreilly.com , July 2, 2014.
[14] Jay Kreps:“质疑Lambda架构”,oreilly.com,2014年7月2日。
[ 15 ] Raul Castro Fernandez, Peter Pietzuch, Jay Kreps, et al.: “ Liquid: Unifying Nearline and Offline Big Data Integration ,” at 7th Biennial Conference on Innovative Data Systems Research (CIDR), January 2015.
[15] Raul Castro Fernandez、Peter Pietzuch、Jay Kreps等:“Liquid:统一近线与离线的大数据集成”,发表于第七届创新数据系统研究双年会(CIDR),2015年1月。
[ 16 ] Dennis M. Ritchie and Ken Thompson: “ The UNIX Time-Sharing System ,” Communications of the ACM , volume 17, number 7, pages 365–375, July 1974. doi:10.1145/361011.361061
[16] Dennis M. Ritchie和Ken Thompson:“UNIX分时系统”,《ACM通讯》,第17卷,第7期,第365-375页,1974年7月。doi:10.1145/361011.361061
[ 17 ] Eric A. Brewer and Joseph M. Hellerstein: “ CS262a: Advanced Topics in Computer Systems ,” lecture notes, University of California, Berkeley, cs.berkeley.edu , August 2011.
[17] 艾瑞克·A·布鲁尔和约瑟夫·M·赫勒斯坦: “CS262a:计算机系统高级主题”,讲义,加州大学伯克利分校cs.berkeley.edu, 2011年8月。
[ 18 ] Michael Stonebraker: “ The Case for Polystores ,” wp.sigmod.org , July 13, 2015.
[18] 迈克尔·斯通布雷克:《Polystore的案例》,wp.sigmod.org,2015年7月13日。
[ 19 ] Jennie Duggan, Aaron J. Elmore, Michael Stonebraker, et al.: “ The BigDAWG Polystore System ,” ACM SIGMOD Record , volume 44, number 2, pages 11–16, June 2015. doi:10.1145/2814710.2814713
[19] Jennie Duggan、Aaron J. Elmore、Michael Stonebraker等:“BigDAWG Polystore系统”,《ACM SIGMOD Record》,第44卷,第2期,第11-16页,2015年6月。doi:10.1145/2814710.2814713
[ 20 ] Patrycja Dybka: “ Foreign Data Wrappers for PostgreSQL ,” vertabelo.com , March 24, 2015.
[20] Patrycja Dybka:“PostgreSQL的外部数据包装器”,vertabelo.com,2015年3月24日。
[ 21 ] David B. Lomet, Alan Fekete, Gerhard Weikum, and Mike Zwilling: “ Unbundling Transaction Services in the Cloud ,” at 4th Biennial Conference on Innovative Data Systems Research (CIDR), January 2009.
[21] David B. Lomet、Alan Fekete、Gerhard Weikum和Mike Zwilling:“在云中解绑事务服务”,发表于第四届创新数据系统研究双年会(CIDR),2009年1月。
[ 22 ] Martin Kleppmann and Jay Kreps: “ Kafka, Samza and the Unix Philosophy of Distributed Data ,” IEEE Data Engineering Bulletin , volume 38, number 4, pages 4–14, December 2015.
[22] Martin Kleppmann和Jay Kreps:“Kafka、Samza和分布式数据的Unix哲学”,IEEE数据工程通报,卷38,第4期,页4-14,2015年12月。
[ 23 ] John Hugg: “ Winning Now and in the Future: Where VoltDB Shines ,” voltdb.com , March 23, 2016.
[23] John Hugg:“赢在现在与未来:VoltDB的优势所在”,voltdb.com,2016年3月23日。
[ 24 ] Frank McSherry, Derek G. Murray, Rebecca Isaacs, and Michael Isard: “ Differential Dataflow ,” at 6th Biennial Conference on Innovative Data Systems Research (CIDR), January 2013.
[24] Frank McSherry、Derek G. Murray、Rebecca Isaacs和Michael Isard:“差分数据流”,发表于第六届创新数据系统研究双年会(CIDR),2013年1月。
[ 25 ] Derek G Murray, Frank McSherry, Rebecca Isaacs, et al.: “ Naiad: A Timely Dataflow System ,” at 24th ACM Symposium on Operating Systems Principles (SOSP), pages 439–455, November 2013. doi:10.1145/2517349.2522738
[25] Derek G. Murray、Frank McSherry、Rebecca Isaacs等:“Naiad:一个及时数据流系统”,发表于第24届ACM操作系统原理研讨会(SOSP),第439-455页,2013年11月。doi:10.1145/2517349.2522738
[ 26 ] Gwen Shapira: “ We have a bunch of customers who are implementing ‘database inside-out’ concept and they all ask ‘is anyone else doing it? are we crazy?’ ” twitter.com , July 28, 2016.
[26] Gwen Shapira:“我们有一批客户正在实现‘数据库由内向外翻转’的概念,他们都会问:‘还有别人这么做吗?我们是不是疯了?’”,twitter.com,2016年7月28日。
[ 27 ] Martin Kleppmann: “ Turning the Database Inside-out with Apache Samza, ” at Strange Loop , September 2014.
[27] Martin Kleppmann:“用Apache Samza将数据库由内向外翻转”,于2014年9月在Strange Loop大会上发表。
[ 28 ] Peter Van Roy and Seif Haridi: Concepts, Techniques, and Models of Computer Programming . MIT Press, 2004. ISBN: 978-0-262-22069-9
[28] Peter Van Roy和Seif Haridi:《计算机编程的概念、技术和模型》。麻省理工学院出版社,2004年。ISBN: 978-0-262-22069-9
[ 29 ] “ Juttle Documentation ,” juttle.github.io , 2016.
[29] "Juttle文档", juttle.github.io, 2016.
[ 30 ] Evan Czaplicki and Stephen Chong: “ Asynchronous Functional Reactive Programming for GUIs ,” at 34th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), June 2013. doi:10.1145/2491956.2462161
[30] Evan Czaplicki和Stephen Chong:“用于GUI的异步函数式响应式编程”,发表于第34届ACM SIGPLAN编程语言设计与实现会议(PLDI),2013年6月。doi:10.1145/2491956.2462161
[ 31 ] Engineer Bainomugisha, Andoni Lombide Carreton, Tom van Cutsem, Stijn Mostinckx, and Wolfgang de Meuter: “ A Survey on Reactive Programming ,” ACM Computing Surveys , volume 45, number 4, pages 1–34, August 2013. doi:10.1145/2501654.2501666
[31] Engineer Bainomugisha、Andoni Lombide Carreton、Tom van Cutsem、Stijn Mostinckx和Wolfgang de Meuter:“反应式编程综述”,《ACM Computing Surveys》,第45卷,第4期,第1-34页,2013年8月。doi:10.1145/2501654.2501666
[ 32 ] Peter Alvaro, Neil Conway, Joseph M. Hellerstein, and William R. Marczak: “ Consistency Analysis in Bloom: A CALM and Collected Approach ,” at 5th Biennial Conference on Innovative Data Systems Research (CIDR), January 2011.
[32] Peter Alvaro、Neil Conway、Joseph M. Hellerstein和William R. Marczak:“Bloom中的一致性分析:一种CALM而从容的方法”,发表于第五届创新数据系统研究双年会(CIDR),2011年1月。
[ 33 ] Felienne Hermans: “ Spreadsheets Are Code ,” at Code Mesh , November 2015.
[33] Felienne Hermans:“电子表格也是代码”,于2015年11月在Code Mesh大会上发表。
[ 34 ] Dan Bricklin and Bob Frankston: “ VisiCalc: Information from Its Creators ,” danbricklin.com .
[34] 丹·布里克林和鲍勃·弗兰克斯顿:“VisiCalc:从其创作者那里获得信息”,danbricklin.com。
[ 35 ] D. Sculley, Gary Holt, Daniel Golovin, et al.: “ Machine Learning: The High-Interest Credit Card of Technical Debt ,” at NIPS Workshop on Software Engineering for Machine Learning (SE4ML), December 2014.
[35] D. Sculley、Gary Holt、Daniel Golovin等:“机器学习:技术债务的高息信用卡”,于2014年12月在NIPS机器学习软件工程研讨会(SE4ML)上发表。
[ 36 ] Peter Bailis, Alan Fekete, Michael J Franklin, et al.: “ Feral Concurrency Control: An Empirical Investigation of Modern Application Integrity ,” at ACM International Conference on Management of Data (SIGMOD), June 2015. doi:10.1145/2723372.2737784
[36] Peter Bailis,Alan Fekete,Michael J Franklin等:“野生并发控制:现代应用完整性的实证研究”,发表于2015年6月ACM数据管理国际会议(SIGMOD),doi:10.1145/2723372.2737784。
[ 37 ] Guy Steele: “ Re: Need for Macros (Was Re: Icon) ,” email to ll1-discuss mailing list, people.csail.mit.edu , December 24, 2001.
[37] Guy Steele: “关于需要宏 (回复:Icon),” 发送至ll1-discuss邮件列表的电子邮件,people.csail.mit.edu,2001年12月24日。
[ 38 ] David Gelernter: “ Generative Communication in Linda ,” ACM Transactions on Programming Languages and Systems (TOPLAS), volume 7, number 1, pages 80–112, January 1985. doi:10.1145/2363.2433
[38] David Gelernter:“Linda中的生成式通信”,《ACM Transactions on Programming Languages and Systems》(TOPLAS),第7卷,第1期,第80-112页,1985年1月。doi:10.1145/2363.2433
[ 39 ] Patrick Th. Eugster, Pascal A. Felber, Rachid Guerraoui, and Anne-Marie Kermarrec: “ The Many Faces of Publish/Subscribe ,” ACM Computing Surveys , volume 35, number 2, pages 114–131, June 2003. doi:10.1145/857076.857078
[39] Patrick Th. Eugster,Pascal A. Felber,Rachid Guerraoui和Anne-Marie Kermarrec:“发布/订阅的多重面貌”,ACM计算机调查,第35卷,第2号,页114-131,2003年6月。doi:10.1145/857076.857078。
[ 40 ] Ben Stopford: “ Microservices in a Streaming World ,” at QCon London , March 2016.
[40] Ben Stopford: “流式世界中的微服务”,于2016年3月在QCon伦敦举行。
[ 41 ] Christian Posta: “ Why Microservices Should Be Event Driven: Autonomy vs Authority ,” blog.christianposta.com , May 27, 2016.
[41] Christian Posta:“微服务为什么应该是事件驱动的:自治与权威”,blog.christianposta.com,2016年5月27日。
[ 42 ] Alex Feyerke: “ Say Hello to Offline First ,” hood.ie , November 5, 2013.
[42] Alex Feyerke:“跟离线优先说个hello”(Say Hello to Offline First), hood.ie, 2013年11月5日。
[ 43 ] Sebastian Burckhardt, Daan Leijen, Jonathan Protzenko, and Manuel Fähndrich: “ Global Sequence Protocol: A Robust Abstraction for Replicated Shared State ,” at 29th European Conference on Object-Oriented Programming (ECOOP), July 2015. doi:10.4230/LIPIcs.ECOOP.2015.568
[43] Sebastian Burckhardt, Daan Leijen, Jonathan Protzenko, and Manuel Fähndrich:“全局序列协议: 用于复制共享状态的强大抽象”,发表于2015年7月的第29届欧洲面向对象编程会议(ECOOP)。doi:10.4230/LIPIcs.ECOOP.2015.568。
[ 44 ] Mark Soper: “ Clearing Up React Data Management Confusion with Flux, Redux, and Relay ,” medium.com , December 3, 2015.
[44] Mark Soper: “Flux,Redux和Relay带来的React数据管理混乱的澄清”,medium.com,2015年12月3日。
[ 45 ] Eno Thereska, Damian Guy, Michael Noll, and Neha Narkhede: “ Unifying Stream Processing and Interactive Queries in Apache Kafka ,” confluent.io , October 26, 2016.
[45] Eno Thereska、Damian Guy、Michael Noll和Neha Narkhede:“在Apache Kafka中统一流处理与交互式查询”,confluent.io,2016年10月26日。
[ 46 ] Frank McSherry: “ Dataflow as Database ,” github.com , July 17, 2016.
[46] Frank McSherry:“数据流作为数据库”,github.com,2016年7月17日。
[ 47 ] Peter Alvaro: “ I See What You Mean ,” at Strange Loop , September 2015.
[47] Peter Alvaro:“我明白你的意思”,于2015年9月在Strange Loop大会上发表。
[ 48 ] Nathan Marz: “ Trident: A High-Level Abstraction for Realtime Computation ,” blog.twitter.com , August 2, 2012.
[48] Nathan Marz: “Trident:一个用于实时计算的高层抽象”,blog.twitter.com,2012年8月2日。
[ 49 ] Edi Bice: “ Low Latency Web Scale Fraud Prevention with Apache Samza, Kafka and Friends ,” at Merchant Risk Council MRC Vegas Conference , March 2016.
[49] Edi Bice:“使用Apache Samza、Kafka等实现低延迟的Web规模欺诈预防”,于2016年3月在商家风险委员会MRC Vegas大会上发表。
[ 50 ] Charity Majors: “ The Accidental DBA ,” charity.wtf , October 2, 2016.
[50] Charity Majors:“意外的DBA”,charity.wtf,2016年10月2日。
[ 51 ] Arthur J. Bernstein, Philip M. Lewis, and Shiyong Lu: “ Semantic Conditions for Correctness at Different Isolation Levels ,” at 16th International Conference on Data Engineering (ICDE), February 2000. doi:10.1109/ICDE.2000.839387
[51] Arthur J. Bernstein, Philip M. Lewis, and Shiyong Lu:“不同隔离级别的正确性的语义条件”,于2000年2月第16届国际数据工程会议(ICDE)。doi:10.1109/ICDE.2000.839387。
[ 52 ] Sudhir Jorwekar, Alan Fekete, Krithi Ramamritham, and S. Sudarshan: “ Automating the Detection of Snapshot Isolation Anomalies ,” at 33rd International Conference on Very Large Data Bases (VLDB), September 2007.
[52] Sudhir Jorwekar, Alan Fekete,Krithi Ramamritham和S. Sudarshan:“自动检测快照隔离异常”,于2007年9月在第33届国际大型数据库会议(VLDB)上发表。
[ 53 ] Kyle Kingsbury: Jepsen blog post series , aphyr.com , 2013–2016.
[53] Kyle Kingsbury:Jepsen博客文章系列,aphyr.com,2013-2016年。
[ 54 ] Michael Jouravlev: “ Redirect After Post ,” theserverside.com , August 1, 2004.
[54] Michael Jouravlev:“发布后重定向”,theserverside.com,2004年8月1日。
[ 55 ] Jerome H. Saltzer, David P. Reed, and David D. Clark: “ End-to-End Arguments in System Design ,” ACM Transactions on Computer Systems , volume 2, number 4, pages 277–288, November 1984. doi:10.1145/357401.357402
[55] Jerome H. Saltzer、David P. Reed和David D. Clark:“系统设计中的端到端论证”,《ACM Transactions on Computer Systems》,第2卷,第4期,第277-288页,1984年11月。doi:10.1145/357401.357402
[ 56 ] Peter Bailis, Alan Fekete, Michael J. Franklin, et al.: “ Coordination-Avoiding Database Systems ,” Proceedings of the VLDB Endowment , volume 8, number 3, pages 185–196, November 2014.
[56] Peter Bailis、Alan Fekete、Michael J. Franklin等:“避免协调的数据库系统”,《Proceedings of the VLDB Endowment》,第8卷,第3期,第185-196页,2014年11月。
[ 57 ] Alex Yarmula: “ Strong Consistency in Manhattan ,” blog.twitter.com , March 17, 2016.
[57] Alex Yarmula:“曼哈顿的强一致性”,blog.twitter.com,2016年3月17日。
[ 58 ] Douglas B Terry, Marvin M Theimer, Karin Petersen, et al.: “ Managing Update Conflicts in Bayou, a Weakly Connected Replicated Storage System ,” at 15th ACM Symposium on Operating Systems Principles (SOSP), pages 172–182, December 1995. doi:10.1145/224056.224070
[58] Douglas B. Terry、Marvin M. Theimer、Karin Petersen等:“管理Bayou(一个弱连接的复制存储系统)中的更新冲突”,发表于第15届ACM操作系统原理研讨会(SOSP),第172-182页,1995年12月。doi:10.1145/224056.224070
[ 59 ] Jim Gray: “ The Transaction Concept: Virtues and Limitations ,” at 7th International Conference on Very Large Data Bases (VLDB), September 1981.
[59] Jim Gray:“事务概念:优点与局限”,发表于第七届超大规模数据库国际会议(VLDB),1981年9月。
[ 60 ] Hector Garcia-Molina and Kenneth Salem: “ Sagas ,” at ACM International Conference on Management of Data (SIGMOD), May 1987. doi:10.1145/38713.38742
[60] 赫克托·加西亚-莫利纳和肯尼斯·萨勒姆:「Sagas,」于ACM数据管理国际会议(SIGMOD),1987年5月。doi:10.1145/38713.38742。
[ 61 ] Pat Helland: “ Memories, Guesses, and Apologies ,” blogs.msdn.com , May 15, 2007.
[61] Pat Helland:“记忆、猜想和道歉”,blogs.msdn.com,2007年5月15日。
[ 62 ] Yoongu Kim, Ross Daly, Jeremie Kim, et al.: “ Flipping Bits in Memory Without Accessing Them: An Experimental Study of DRAM Disturbance Errors ,” at 41st Annual International Symposium on Computer Architecture (ISCA), June 2014. doi:10.1145/2678373.2665726
[62] Yoongu Kim,Ross Daly,Jeremie Kim等:“不访问内存即可翻转位:关于DRAM干扰误差的实验研究”,发表于2014年6月的第41届国际计算机体系结构研讨会(ISCA),doi:10.1145/2678373.2665726。
[ 63 ] Mark Seaborn and Thomas Dullien: “ Exploiting the DRAM Rowhammer Bug to Gain Kernel Privileges ,” googleprojectzero.blogspot.co.uk , March 9, 2015.
[63] Mark Seaborn和Thomas Dullien:「利用DRAM Rowhammer漏洞获取内核特权」,googleprojectzero.blogspot.co.uk,2015年3月9日。
[ 64 ] Jim N. Gray and Catharine van Ingen: “ Empirical Measurements of Disk Failure Rates and Error Rates ,” Microsoft Research, MSR-TR-2005-166, December 2005.
[64] Jim N. Gray 和 Catharine van Ingen:《磁盘故障率和错误率的经验测量》,微软研究,MSR-TR-2005-166,2005 年 12 月。
[ 65 ] Annamalai Gurusami and Daniel Price: “ Bug #73170: Duplicates in Unique Secondary Index Because of Fix of Bug#68021 ,” bugs.mysql.com , July 2014.
[65] Annamalai Gurusami和Daniel Price:“Bug #73170:由于Bug #68021的修复导致唯一二级索引中出现重复项”,bugs.mysql.com,2014年7月。
[ 66 ] Gary Fredericks: “ Postgres Serializability Bug ,” github.com , September 2015.
[66] Gary Fredericks: “Postgres串行化漏洞”, github.com,2015年9月。
[ 67 ] Xiao Chen: “ HDFS DataNode Scanners and Disk Checker Explained ,” blog.cloudera.com , December 20, 2016.
[67] 肖琛:“HDFS 数据节点扫描器和磁盘检查器解释”,blog.cloudera.com,2016 年 12 月 20 日。
[ 68 ] Jay Kreps: “ Getting Real About Distributed System Reliability ,” blog.empathybox.com , March 19, 2012.
[68] Jay Kreps:“认真对待分布式系统的可靠性”,blog.empathybox.com,2012年3月19日。
[ 69 ] Martin Fowler: “ The LMAX Architecture ,” martinfowler.com , July 12, 2011.
[69] Martin Fowler:“LMAX 架构”,martinfowler.com,2011年7月12日。
[ 70 ] Sam Stokes: “ Move Fast with Confidence ,” blog.samstokes.co.uk , July 11, 2016.
[70] Sam Stokes:“怀着信心快速前进”,blog.samstokes.co.uk,2016年7月11日。
[ 71 ] “ Sawtooth Lake Documentation ,” Intel Corporation, intelledger.github.io , 2016.
[71] “Sawtooth Lake文档”,英特尔公司,intelledger.github.io,2016年。
[ 72 ] Richard Gendal Brown: “ Introducing R3 Corda™: A Distributed Ledger Designed for Financial Services ,” gendal.me , April 5, 2016.
[72] Richard Gendal Brown:“介绍R3 Corda™:一个专为金融服务设计的分布式账本”,gendal.me,2016年4月5日。
[ 73 ] Trent McConaghy, Rodolphe Marques, Andreas Müller, et al.: “ BigchainDB: A Scalable Blockchain Database ,” bigchaindb.com , June 8, 2016.
[73] Trent McConaghy, Rodolphe Marques, Andreas Müller等: “BigchainDB:可扩展的区块链数据库”,bigchaindb.com,2016年6月8日。
[ 74 ] Ralph C. Merkle: “ A Digital Signature Based on a Conventional Encryption Function ,” at CRYPTO ’87 , August 1987. doi:10.1007/3-540-48184-2_32
[74] Ralph C. Merkle:“基于常规加密函数的数字签名”,发表于CRYPTO ’87,1987年8月。doi:10.1007/3-540-48184-2_32
[ 75 ] Ben Laurie: “ Certificate Transparency ,” ACM Queue , volume 12, number 8, pages 10-19, August 2014. doi:10.1145/2668152.2668154
[75] Ben Laurie:“证书透明性”,ACM Queue,第12卷第8期,10-19页,2014年8月。DOI:10.1145/2668152.2668154
[ 76 ] Mark D. Ryan: “ Enhanced Certificate Transparency and End-to-End Encrypted Mail ,” at Network and Distributed System Security Symposium (NDSS), February 2014. doi:10.14722/ndss.2014.23379
[76] Mark D. Ryan:“增强的证书透明度与端到端加密邮件”,发表于网络与分布式系统安全研讨会(NDSS),2014年2月。doi:10.14722/ndss.2014.23379
[ 77 ] “ Software Engineering Code of Ethics and Professional Practice ,” Association for Computing Machinery, acm.org , 1999.
[77] “软件工程道德与职业实践准则”,美国计算机协会,acm.org,1999年。
[ 78 ] François Chollet: “ Software development is starting to involve important ethical choices ,” twitter.com , October 30, 2016.
[78] François Chollet:“软件开发正开始涉及重要的伦理选择”,twitter.com,2016年10月30日。
[ 79 ] Igor Perisic: “ Making Hard Choices: The Quest for Ethics in Machine Learning ,” engineering.linkedin.com , November 2016.
[79] Igor Perisic:“做出艰难选择:机器学习中的伦理探求”,engineering.linkedin.com,2016年11月。
[ 80 ] John Naughton: “ Algorithm Writers Need a Code of Conduct ,” theguardian.com , December 6, 2015.
【80】约翰·诺顿: “算法作者需要行为准则”,theguardian.com,2015年12月6日。
[ 81 ] Logan Kugler: “ What Happens When Big Data Blunders? ,” Communications of the ACM , volume 59, number 6, pages 15–16, June 2016. doi:10.1145/2911975
[81] Logan Kugler:“当大数据犯错会发生什么?”,《ACM通讯》,第59卷,第6期,第15-16页,2016年6月。doi:10.1145/2911975
[ 82 ] Bill Davidow: “ Welcome to Algorithmic Prison ,” theatlantic.com , February 20, 2014.
[82] 比尔·戴维多夫: “欢迎来到算法监狱”,theatlantic.com,2014年2月20日。
[ 83 ] Don Peck: “ They’re Watching You at Work ,” theatlantic.com , December 2013.
[83] Don Peck:“他们在工作场所监视着你”,theatlantic.com,2013年12月。
[ 84 ] Leigh Alexander: “ Is an Algorithm Any Less Racist Than a Human? ” theguardian.com , August 3, 2016.
[84] Leigh Alexander:“算法会比人类少一点种族主义吗?”,theguardian.com,2016年8月3日。
[ 85 ] Jesse Emspak: “ How a Machine Learns Prejudice ,” scientificamerican.com , December 29, 2016.
[85] Jesse Emspak:“机器如何习得偏见”,scientificamerican.com,2016年12月29日。
[ 86 ] Maciej Cegłowski: “ The Moral Economy of Tech ,” idlewords.com , June 2016.
[86] Maciej Cegłowski: “技术的道德经济”, idlewords.com,2016年6月。
[ 87 ] Cathy O’Neil: Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy . Crown Publishing, 2016. ISBN: 978-0-553-41881-1
[87] 凯西·奥尼尔: 《数学武器:大数据如何加剧不平等,威胁民主》。皇冠出版社,2016年。 ISBN:978-0-553-41881-1。
[ 88 ] Julia Angwin: “ Make Algorithms Accountable ,” nytimes.com , August 1, 2016.
[88] Julia Angwin:“让算法负起责任”,nytimes.com,2016年8月1日。
[ 89 ] Bryce Goodman and Seth Flaxman: “ European Union Regulations on Algorithmic Decision-Making and a ‘Right to Explanation’ ,” arXiv:1606.08813 , August 31, 2016.
[89] Bryce Goodman和Seth Flaxman:“欧洲联盟关于算法决策和‘解释权’的规定”,arXiv:1606.08813,2016年8月31日。
[ 90 ] “ A Review of the Data Broker Industry: Collection, Use, and Sale of Consumer Data for Marketing Purposes ,” Staff Report, United States Senate Committee on Commerce, Science, and Transportation , commerce.senate.gov , December 2013.
[90] “数据经纪行业回顾:出于营销目的对消费者数据的收集、使用和出售”,美国参议院商业、科学和运输委员会工作人员报告,commerce.senate.gov,2013年12月。
[ 91 ] Olivia Solon: “ Facebook’s Failure: Did Fake News and Polarized Politics Get Trump Elected? ” theguardian.com , November 10, 2016.
[91] Olivia Solon:“Facebook的失败:虚假新闻和极化政治是否让特朗普当选?”,theguardian.com,2016年11月10日。
[ 92 ] Donella H. Meadows and Diana Wright: Thinking in Systems: A Primer . Chelsea Green Publishing, 2008. ISBN: 978-1-603-58055-7
[92] Donella H. Meadows和Diana Wright:《系统思考:入门》。Chelsea Green出版社,2008年。ISBN: 978-1-603-58055-7
[ 93 ] Daniel J. Bernstein: “ Listening to a ‘big data’/‘data science’ talk ,” twitter.com , May 12, 2015.
[93] 丹尼尔·J·伯恩斯坦:“聆听一场‘大数据/数据科学’演讲” ,twitter.com,2015年5月12日。
[ 94 ] Marc Andreessen: “ Why Software Is Eating the World ,” The Wall Street Journal , 20 August 2011.
[94] 马克·安德里森: “为什么软件正在吞噬世界”,《华尔街日报》,2011年8月20日。
[ 95 ] J. M. Porup: “ ‘Internet of Things’ Security Is Hilariously Broken and Getting Worse ,” arstechnica.com , January 23, 2016.
[95] J. M. Porup:“‘物联网’的安全烂得可笑,而且越来越糟”,arstechnica.com,2016年1月23日。
[ 96 ] Bruce Schneier: Data and Goliath: The Hidden Battles to Collect Your Data and Control Your World . W. W. Norton, 2015. ISBN: 978-0-393-35217-7
[96] Bruce Schneier:《数据与歌利亚:收集你的数据、控制你的世界的隐秘之战》。W. W. Norton出版社,2015年。ISBN: 978-0-393-35217-7
[ 97 ] The Grugq: “ Nothing to Hide ,” grugq.tumblr.com , April 15, 2016.
[97] The Grugq: “没有什么可以隐藏的”, grugq.tumblr.com,2016年4月15日。
[ 98 ] Tony Beltramelli: “ Deep-Spying: Spying Using Smartwatch and Deep Learning ,” Masters Thesis, IT University of Copenhagen, December 2015. Available at arxiv.org/abs/1512.05616
[98] Tony Beltramelli:“Deep-Spying:使用智能手表和深度学习进行间谍活动”,哥本哈根IT大学硕士论文,2015年12月。可以在arxiv.org/abs/1512.05616查阅。
[ 99 ] Shoshana Zuboff: “ Big Other: Surveillance Capitalism and the Prospects of an Information Civilization ,” Journal of Information Technology , volume 30, number 1, pages 75–89, April 2015. doi:10.1057/jit.2015.5
[99] Shoshana Zuboff:“大他者:监视资本主义和信息文明的前景”,《信息技术》杂志,第30卷,第1期,第75-89页,2015年4月。doi:10.1057/jit.2015.5
[ 100 ] Carina C. Zona: “ Consequences of an Insightful Algorithm ,” at GOTO Berlin , November 2016.
[100] Carina C. Zona:“一种精辟算法的后果”,于2016年11月在GOTO Berlin发表演讲。
[ 101 ] Bruce Schneier: “ Data Is a Toxic Asset, So Why Not Throw It Out? ,” schneier.com , March 1, 2016.
[101] Bruce Schneier:“数据是一种有毒资产,为何不把它扔掉?”,schneier.com,2016年3月1日。
[ 102 ] John E. Dunn: “ The UK’s 15 Most Infamous Data Breaches ,” techworld.com , November 18, 2016.
[102] 约翰·E·邓恩:“英国15起最臭名昭著的数据泄露事件”,techworld.com,2016年11月18日。
[ 103 ] Cory Scott: “ Data is not toxic - which implies no benefit - but rather hazardous material, where we must balance need vs. want ,” twitter.com , March 6, 2016.
[103] Cory Scott:“数据不是有毒的(那意味着毫无益处),而是危险品,我们必须在需要与想要之间权衡”,twitter.com,2016年3月6日。
[ 104 ] Bruce Schneier: “ Mission Creep: When Everything Is Terrorism ,” schneier.com , July 16, 2013.
[104] Bruce Schneier:“任务蔓延:当一切都成了恐怖主义”,schneier.com,2013年7月16日。
[ 105 ] Lena Ulbricht and Maximilian von Grafenstein: “ Big Data: Big Power Shifts? ,” Internet Policy Review , volume 5, number 1, March 2016. doi:10.14763/2016.1.406
[105] Lena Ulbricht和Maximilian von Grafenstein:“大数据:权力的大转移?”,《互联网政策评论》,第5卷,第1期,2016年3月。doi:10.14763/2016.1.406
[ 106 ] Ellen P. Goodman and Julia Powles: “ Facebook and Google: Most Powerful and Secretive Empires We’ve Ever Known ,” theguardian.com , September 28, 2016.
[106] Ellen P. Goodman和Julia Powles:“Facebook和Google:我们所知道的最强大和最神秘的帝国”,theguardian.com,2016年9月28日。
[ 107 ] Directive 95/46/EC on the protection of individuals with regard to the processing of personal data and on the free movement of such data , Official Journal of the European Communities No. L 281/31, eur-lex.europa.eu , November 1995.
[107] 《第95/46/EC号指令:关于个人数据处理中的个人保护以及此类数据自由流动的指令》,《欧洲共同体公报》No. L 281/31,eur-lex.europa.eu,1995年11月。
[ 108 ] Brendan Van Alsenoy: “ Regulating Data Protection: The Allocation of Responsibility and Risk Among Actors Involved in Personal Data Processing ,” Thesis, KU Leuven Centre for IT and IP Law, August 2016.
[108] Brendan Van Alsenoy:“调节数据保护:个人数据处理中相关参与者的责任和风险分配”,论文,卢汶大学信息技术与知识产权法律中心,2016年8月。
[ 109 ] Michiel Rhoen: “ Beyond Consent: Improving Data Protection Through Consumer Protection Law ,” Internet Policy Review , volume 5, number 1, March 2016. doi:10.14763/2016.1.404
[109] Michiel Rhoen:“超越同意:通过消费者保护法改进数据保护”,《互联网政策评论》,第5卷,第1期,2016年3月。doi:10.14763/2016.1.404
[ 110 ] Jessica Leber: “ Your Data Footprint Is Affecting Your Life in Ways You Can’t Even Imagine ,” fastcoexist.com , March 15, 2016.
[110] Jessica Leber:“你的数据足迹正在以你无法想象的方式影响你的生活”,fastcoexist.com,2016年3月15日。
[ 111 ] Maciej Cegłowski: “ Haunted by Data ,” idlewords.com , October 2015.
[111] Maciej Cegłowski:“被数据所困扰”,idlewords.com,2015年10月。
[ 112 ] Sam Thielman: “ You Are Not What You Read: Librarians Purge User Data to Protect Privacy ,” theguardian.com , January 13, 2016.
[112] 萨姆·席尔曼: "你不是你所读的书:图书馆管理员清除用户数据以保护隐私," theguardian.com,2016年1月13日。
[ 113 ] Conor Friedersdorf: “ Edward Snowden’s Other Motive for Leaking ,” theatlantic.com , May 13, 2014.
[113] 康纳·弗里德斯多夫(Conor Friedersdorf):“爱德华·斯诺登泄露的另一个动机”,theatlantic.com,2014年5月13日。
[ 114 ] Phillip Rogaway: “ The Moral Character of Cryptographic Work ,” Cryptology ePrint 2015/1162, December 2015.
[114] Phillip Rogaway:“密码学工作的道德品格”,Cryptology ePrint 2015/1162,2015年12月。