
Fixing Random, part 40 of 40 - azhenley
https://ericlippert.com/2019/07/26/fixing-random-part-40-of-40/
======
inglor
Thanks for this article!

> the connection between await as an operator and sample as an operator is
> deeper than you might think.

Can anyone explain this to me?

Await plays the part of a specialized Haskell-like `do`, with tasks being a
continuation monad (and co-monad). `sample` does the same, playing the same
role of `do` - so where is the "deeper connection"?

~~~
Faark
> Can anyone explain this to me?

Awaits / yields produce state machines. See [0] for an example of how that
works behind the scenes (or just decompile your stuff with ILSpy/dotPeek), but
basically the code between yields / awaits is split into separate blocks in a
new object, and all variables used across multiple blocks are stored in that
object alongside some state indicating where to continue on the next call.
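
To make that transformation concrete, here is a minimal hand-written sketch of such a state machine in Rust. It is not what any compiler actually emits; the generator body in the comment and the names `State` and `Machine` are made up for illustration.

```rust
/// Hand-written equivalent of a hypothetical generator body:
///     yield 1; let x = 10; yield x + 1; yield x + 2;
#[derive(Clone, Copy, Debug, PartialEq)]
enum State {
    Start,
    AfterFirstYield,
    AfterSecondYield,
    Done,
}

#[derive(Clone, Debug)]
struct Machine {
    state: State, // where to resume on the next call
    x: i32,       // local used across yields, hoisted into the object
}

impl Machine {
    fn new() -> Self {
        Machine { state: State::Start, x: 0 }
    }
}

impl Iterator for Machine {
    type Item = i32;

    fn next(&mut self) -> Option<i32> {
        // Each arm is one block of code between two yields.
        match self.state {
            State::Start => {
                self.state = State::AfterFirstYield;
                Some(1)
            }
            State::AfterFirstYield => {
                self.x = 10;
                self.state = State::AfterSecondYield;
                Some(self.x + 1)
            }
            State::AfterSecondYield => {
                self.state = State::Done;
                Some(self.x + 2)
            }
            State::Done => None,
        }
    }
}
```

Because the whole machine is a plain value, cloning it takes a snapshot, and resuming from the clone re-runs the code from that point, which is the kind of rewinding discussed next.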

Let's pretend we only use immutable data/vars for now. We could just reset the
state machine's position to some earlier state and re-run it, only with our
await-like sample operator returning a different value this time. Add another
operator and this would make it very easy to implement backtracking while
reusing a lot of existing behind-the-scenes infrastructure. But personally I'm
not sure the result is that much easier to read than a few nested for loops.
It might be, for some complicated code spread across a larger code base. But
adding a new language operator is quite a high price to pay...

[0] https://mycodingplace.wordpress.com/2017/07/19/csharp-behind-the-scenes-yield/

------
codetrotter
> The abstraction that we’re missing is to make “value drawn from a random
> distribution” a part of the type system, the same way that “function that
> returns a value”, or “sequence of values”, or “value that might be null” is
> part of the type system.

In a similar vein to this, I've been wishing that I could express the
difference between "value readable by client" and "value internal to server"
in the type system.

Currently I manage this aspect manually in my code by having structs for the
internal representation of data, intermediate structs for data returned by the
methods of the impls of the internal structs, and structs declared in the
module for the JSON API endpoint.

Here are some snippets of code in Rust to illustrate what I mean:

Internal to the server I have a representation of user accounts like such:

    
    
      pub struct Accounts
      {
        user_ids:                       Vec<UserId>,
        bcrypt_password_hashes:         Vec<BcryptPasswordHash>,
        ts_accounts_created:            Vec<TsAccountCreated>,
        ts_account_data_last_modified:  Vec<TsAccountDataLastModified>,
        // [...]
      }
    

And then I have this intermediate internal struct:

    
    
      pub struct ClientReadablePrivateAccountData
      {
        // user id is excluded
        // password hash is excluded
        pub ts_account_created: TsAccountCreated,
        pub ts_account_data_last_modified: TsAccountDataLastModified,
        // [...]
      }
    

A user-agent acting on behalf of an authenticated user can retrieve the
ClientReadablePrivateAccountData representation of their account through our
JSON API provided that they submit a bearer token with
"read_private_account_data" scope along with the request. (The value of the
bearer token itself, by the way, is a 144 bit random value encoded with
Base64. The scope and user id mapping of bearer token values is kept server-
side at all times.)

The ClientReadablePrivateAccountData representation includes all the data for
the account in question except the internal UserId and BcryptPasswordHash
data.

When a client requests this data from the /account/me.json path, the request
is handled in three steps (in terms of what my code does – of course there are
many more steps if you look at the entirety of what happens from when the
TCP/IP packets reach our server, and until the framework that I am using hands
data to my code). Common to all requests are the first two steps, which are
two pieces of middleware that I have written.

The first piece of middleware implements code that makes it possible to put
the server in read-only mode, in which new requests are allowed through only
if they are of the GET or HEAD method. For any other request method (POST,
PUT, PATCH, etc.), the client will get a 503 Service Unavailable response with
a JSON object telling it the reason for this ("Service is undergoing
maintenance. Please try again in a few minutes."). This allows us to complete
handling of the side-effects-having requests (POST, PUT, PATCH, etc.) before
restarting the server while minimizing the impact of the service interruption.
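
A minimal sketch of that read-only gate, using only the standard library. The `ReadOnlyMode` type, its `check` method, and the choice of an atomic flag are assumptions for illustration; the real middleware depends on the web framework in use, which is not named here.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Hypothetical switch for read-only mode. How the flag is actually stored
/// is not stated above; an atomic bool is just one plausible choice.
pub struct ReadOnlyMode(AtomicBool);

impl ReadOnlyMode {
    pub const fn new(enabled: bool) -> Self {
        ReadOnlyMode(AtomicBool::new(enabled))
    }

    pub fn set(&self, enabled: bool) {
        self.0.store(enabled, Ordering::SeqCst);
    }

    /// Ok(()) lets the request through; Err carries the HTTP status code
    /// and the reason that would go into the JSON body.
    pub fn check(&self, method: &str) -> Result<(), (u16, &'static str)> {
        let read_only = self.0.load(Ordering::SeqCst);
        if read_only && !matches!(method, "GET" | "HEAD") {
            Err((
                503,
                "Service is undergoing maintenance. Please try again in a few minutes.",
            ))
        } else {
            Ok(())
        }
    }
}
```

Requests already past the gate keep running after the flag is set, so side-effecting work can drain before the restart.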

The second piece of middleware looks for an Authorization header. If the
Authorization header is not provided, the identity of the user is
Identity::Anonymous. If the Authorization header is provided then it is
decoded resulting in either an error (either malformed data, or
invalid/expired bearer token) or Identity::Authenticated(authenticated_uid).

Identity enum:

    
    
      pub enum Identity
      {
        Anonymous,
        Authenticated
        {
          authenticated_uid: UserId,
        }
      }
    

The authenticated_uid UserId is retrieved by the middleware from a hash map
that maps each of the non-expired bearer tokens to the corresponding internal
UserId.
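
The decoding step could look roughly like the sketch below. The `identify` function, the `UserId(u64)` stand-in, and the two error variants are assumptions for illustration; only the `Identity` enum and the name `MiddlewareForwardedError` come from the description above.

```rust
use std::collections::HashMap;

// Stand-in for the real UserId type, whose definition is not shown.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct UserId(pub u64);

#[derive(Debug, PartialEq, Eq)]
pub enum Identity {
    Anonymous,
    Authenticated { authenticated_uid: UserId },
}

// Hypothetical variants; the real error type is not shown.
#[derive(Debug, PartialEq, Eq)]
pub enum MiddlewareForwardedError {
    Malformed,
    InvalidOrExpiredToken,
}

/// `authorization` is the raw Authorization header value, if present.
/// `live_tokens` maps each non-expired bearer token to its internal UserId.
pub fn identify(
    authorization: Option<&str>,
    live_tokens: &HashMap<String, UserId>,
) -> Result<Identity, MiddlewareForwardedError> {
    // No header at all means an anonymous request, not an error.
    let header = match authorization {
        None => return Ok(Identity::Anonymous),
        Some(header) => header,
    };
    // Anything other than "Bearer <token>" counts as malformed.
    let token = header
        .strip_prefix("Bearer ")
        .map(str::trim)
        .filter(|token| !token.is_empty())
        .ok_or(MiddlewareForwardedError::Malformed)?;
    // Well-formed but unknown tokens are invalid or expired.
    live_tokens
        .get(token)
        .map(|&authenticated_uid| Identity::Authenticated { authenticated_uid })
        .ok_or(MiddlewareForwardedError::InvalidOrExpiredToken)
}
```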

The middleware attaches a Result<Identity, MiddlewareForwardedError> to the
request and then calls the next step in the request handling chain.

The third step is the request handler for the path and method. In this case,
the GET request handler for /account/me.json. Each request handler has one of
three possible procedural macros relating to authorization attached to them,
so that at compile-time the function bodies of all request handlers are
wrapped with code that extracts the identity result set by the middleware and
then matches on them to either:

1. Immediately reject the request if the result was an error, returning the
appropriate HTTP status code (400 Bad Request for malformed data, 401
Unauthorized for well-formed but invalid or expired token) and a JSON encoded
message indicating the reason for the error. In the case of 401, a
WWW-Authenticate header with value "Bearer" followed by required scope is
included in the response.

2. If the result was an Identity, what happens next depends on which of the
three procedural macros is being used (in other words, the three of them
insert different code corresponding to how the identity is to be used):

2a. Allow Anonymous access only. Identity::Anonymous matches to execute the
wrapped function body. Identity::Authenticated(_) matches to return 400 Bad
Request with a JSON encoded message indicating the reason for the error.

2b. Allow Authenticated access only. Identity::Anonymous matches to return 400
Bad Request with a JSON encoded message indicating the reason for the error.
Identity::Authenticated(authenticated_uid) matches to execute the wrapped
function body.

2c. Allow access both to Anonymous and to Authenticated. The wrapped function
body is executed.
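
The matching logic that the three macros insert can be sketched as an ordinary function. This is not the procedural-macro code itself; `Policy`, `authorize`, the `UserId(u64)` stand-in, and the error variants are hypothetical names, and the JSON error body and WWW-Authenticate header are left out.

```rust
// Stand-ins for types whose real definitions are not shown.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct UserId(pub u64);

#[derive(Debug, PartialEq, Eq)]
pub enum Identity {
    Anonymous,
    Authenticated { authenticated_uid: UserId },
}

#[derive(Debug, PartialEq, Eq)]
pub enum MiddlewareForwardedError {
    Malformed,
    InvalidOrExpiredToken,
}

/// One value per macro: 2a, 2b and 2c respectively.
pub enum Policy {
    AnonymousOnly,
    AuthenticatedOnly,
    Any,
}

/// Ok(..) means "run the wrapped handler body", carrying the uid if there
/// is one. Err(status) means "reject with this HTTP status code".
pub fn authorize(
    policy: Policy,
    identity: Result<Identity, MiddlewareForwardedError>,
) -> Result<Option<UserId>, u16> {
    // Step 1: errors forwarded by the middleware are rejected immediately.
    let identity = match identity {
        Ok(identity) => identity,
        Err(MiddlewareForwardedError::Malformed) => return Err(400),
        Err(MiddlewareForwardedError::InvalidOrExpiredToken) => return Err(401),
    };
    // Step 2: what happens next depends on the policy.
    match (policy, identity) {
        (Policy::AnonymousOnly, Identity::Anonymous) => Ok(None),
        (Policy::AnonymousOnly, Identity::Authenticated { .. }) => Err(400),
        (Policy::AuthenticatedOnly, Identity::Anonymous) => Err(400),
        (Policy::AuthenticatedOnly, Identity::Authenticated { authenticated_uid }) => {
            Ok(Some(authenticated_uid))
        }
        (Policy::Any, Identity::Anonymous) => Ok(None),
        (Policy::Any, Identity::Authenticated { authenticated_uid }) => {
            Ok(Some(authenticated_uid))
        }
    }
}
```

Writing the match over the (policy, identity) pair has the side benefit that the compiler checks every combination is handled.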

Function bodies wrapped by 2b make use of the authenticated_uid value when
performing their work. In the case of the request handler I am talking about
here it passes the authenticated_uid as argument to a method of the impl of
the internal struct Accounts, fn get_private_account_data(&self, uid: UserId)
-> ClientReadablePrivateAccountData, and the data returned from that method is
copied over into another struct by the same name, but which is declared
separately in the module of the JSON API endpoint.

JSON API endpoint declaration and impl of ClientReadablePrivateAccountData:

    
    
      #[derive(Serialize)]
      pub struct ClientReadablePrivateAccountData
      {
        pub ts_account_created: TsAccountCreated,
        pub ts_account_data_last_modified: TsAccountDataLastModified,
        // [...]
      }
    
      impl ClientReadablePrivateAccountData
      {
        pub fn from_internal_representation (
          private_account_data_internal: internal_representation::accounts::ClientReadablePrivateAccountData) -> Self
        {
          Self
          {
            ts_account_created:             private_account_data_internal.ts_account_created,
            ts_account_data_last_modified:  private_account_data_internal.ts_account_data_last_modified,
            // [...]
          }
        }
      }
    

In this manner, data returned by the JSON API endpoint is by convention
declared in the JSON API endpoint module, and the impl of the struct
transforms data from the internal intermediate struct of the same name and
with the same fields. The internal intermediate struct is in turn returned
from a public method of the Accounts struct that is holding the internal
representation of all data relating to accounts.

The fields of the internal struct are private. Still though, some internal
data, such as the internal UserId, is accessed by the middleware and the
request handlers, so a bit of attention is needed at all times to ensure that
we keep the internal data internal to the server and never end up sending it
to the client.

In conclusion, keeping internal data internal is a matter of manual
bookkeeping and code review. This would be much less error-prone if the type
system could encode the property "value internal to server", require that any
data built from such values also be "value internal to server", and forbid
functions within a given module from returning such data. Mistakenly including
"value internal to server" data in a response would then become a compile-time
error. This would be highly desirable.
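
Rust can approximate part of this today with a marker trait. The sketch below is an assumption-laden illustration, not my actual code: `ClientReadable`, `Response`, and `respond` are hypothetical names, and the `u64` payloads are stand-ins for the real types.

```rust
/// Hypothetical marker trait: only types that implement it may be placed
/// in a response.
pub trait ClientReadable {}

// Internal type: deliberately does NOT implement ClientReadable.
pub struct UserId(pub u64);

// Client-visible type: explicitly opted in.
pub struct TsAccountCreated(pub u64);
impl ClientReadable for TsAccountCreated {}

/// A response body is constrained to client-readable types.
pub struct Response<T: ClientReadable> {
    pub body: T,
}

pub fn respond<T: ClientReadable>(body: T) -> Response<T> {
    Response { body }
}

// The following line would be rejected at compile time, because the trait
// bound `UserId: ClientReadable` is not satisfied:
//
//     let leaked = respond(UserId(7));
```

This is an allow-list rather than the taint tracking described above. Nothing stops someone from implementing `ClientReadable` for a struct that contains a `UserId`, so values built from internal data are not automatically internal, and code review is still needed at each `impl ClientReadable`.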

