This is one of my favorite features/roles in OpsMgr 2007. When I say Gateway Server, I mean the OpsMgr role that can be used to monitor un-trusted domains and DMZs, not the spotted cow company that builds servers. If you are not already using the Gateway Server role, here are my 10 reasons why you should.
1) In our performance tests we have seen that routing agent data through a gateway server reduces the data sent to the management server by almost 50 percent. So if a company has a couple of hundred agents at a remote site with low network bandwidth, it is best practice to install a gateway server at that site and have all the agents report to it.
2) The supported number of agents reporting to a gateway server has increased from 200 in RTM to 800 in Service Pack 1.
3) In MOM 2005, if you had un-trusted domains and did not want to turn off mutual authentication, you had to install a separate management group in the un-trusted domain. With the gateway server and certificates, you can monitor agents in un-trusted domains and DMZs from your existing management group.
4) If there is a firewall between the servers, you only need to open one port (5723) for communication between the gateway server and the management server. Instead of having every agent communicate with the management server across the firewall, you have a single channel of communication across the network.
5) Gateway Servers are good virtualization candidates because they have a small footprint compared to a management server. However, disk write I/O on a gateway server is usually high because queued data is persisted to disk, so plan your storage accordingly.
6) You can set up multiple gateway servers and configure them as failover servers for agents, which gives customers a good high availability/disaster recovery (HA/DR) solution.
7) Gateway Servers can use certificates issued by your own Certificate Authority or by any third-party CA, as long as the certificates follow the PKCS format.
8) Gateway servers can report to a clustered Root Management Server (RMS). The key to using a clustered RMS is to create the certificate using the cluster's virtual server name. Note, however, that if a gateway reports to an RMS, you should not have other agents reporting to that RMS. After an outage, when the systems come back online, the RMS (or MS) polls the gateway the same way it polls an agent, without recognizing that the gateway has much more data backed up in its queue than a single agent.
9) A Management Server license is required for each instance of the gateway server role. I know this is not really a reason, but I thought I would call it out.
10) The Audit Collector can be installed on a Gateway Server, but it still needs to report directly to the Audit Database.
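To confirm the firewall rule from item 4 is in place, a plain TCP connect test from the gateway server to the management server on port 5723 is usually a good first check. The sketch below is a generic socket test written for illustration, not an OpsMgr tool; the function name is hypothetical, and a successful connect only shows the port is open, not that certificate authentication will succeed.

```python
import socket

# Port used between the gateway server and the management server (item 4).
OPSMGR_PORT = 5723


def is_port_reachable(host: str, port: int = OPSMGR_PORT, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds.

    Hypothetical helper for illustration; it only tests TCP reachability.
    """
    try:
        # create_connection resolves the name and tries each returned address.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused connections, timeouts (socket.timeout) and name-resolution
        # failures (socket.gaierror) are all subclasses of OSError.
        return False
```

For example, running `is_port_reachable("ms01.contoso.com")` on the gateway (the host name is hypothetical) should return `True` once the rule is open.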
Marc, a dev lead on our team, sent an email with the technical details behind the first reason to use a Gateway Server. Below is his explanation of why we scale better in SP1 and why data compresses so much better through a gateway.
The overall logical amount of data sent across the WAN is very similar whether the agents report through a gateway or directly to the collection management servers. When a gateway is used, it handles heartbeat processing, so heartbeats don't need to cross the WAN, but they should be a small share of the overall traffic. The big difference appears to be increased compression efficiency from routing all the data through one node. Compression algorithms do not achieve good compression rates when very small blocks of data are compressed. In one test of compression efficiency (run over some static XML text, so it isn't likely to reflect actual compression rates in the channel), 128-byte blocks gave a 38% reduction in size, while 4096-byte blocks over the same data gave an 81% reduction. Without a gateway, each agent compresses its own small amount of outgoing data. With a gateway, all the data is routed through one node, so much bigger blocks of data are compressed at once. Because the bandwidth savings come from this increased compression efficiency, the savings grow with the number of agents. For a small number of agents I would expect a very minimal reduction in bandwidth utilization. Our numbers were based on a test pass with hundreds of agents behind a gateway.
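The block-size effect described above is easy to reproduce with any general-purpose compressor. The sketch below uses Python's `zlib` (DEFLATE), which is an assumption for illustration; we are not claiming it is the algorithm used in the OpsMgr channel. Each block is compressed independently, mimicking many agents each compressing their own small payloads, and the sample XML is made up, so the percentages it prints will not match the 38%/81% figures from our test.

```python
import zlib


def compressed_size(data: bytes, block_size: int) -> int:
    """Compress `data` in independent blocks of `block_size` bytes; return total compressed bytes."""
    total = 0
    for start in range(0, len(data), block_size):
        block = data[start:start + block_size]
        # A fresh compressor per block: no history is shared between blocks,
        # so repetition that spans blocks cannot be exploited.
        total += len(zlib.compress(block))
    return total


def reduction_percent(data: bytes, block_size: int) -> float:
    """Size reduction in percent relative to the uncompressed data."""
    return 100.0 * (1 - compressed_size(data, block_size) / len(data))


# Hypothetical sample payload: repetitive XML loosely resembling monitoring data.
sample = b"".join(
    b'<DataItem type="Perf"><Object>Processor</Object><Counter>Processor Time</Counter>'
    b"<Instance>_Total</Instance><Value>%d</Value></DataItem>" % (i % 100)
    for i in range(2000)
)

for size in (128, 4096):
    print(f"{size:>5}-byte blocks: {reduction_percent(sample, size):.0f}% smaller")
```

The larger blocks give the compressor more repeated content to reference, which is the same reason funnelling many agents through one gateway improves the ratio.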
We weren't expecting the difference in compression rates to be this large, which is why the original guidance in the performance and scalability document did not account for it. During one of our scale tests late in the SP1 cycle, we looked at the relative network usage and were pleasantly surprised to see the reduction in bandwidth utilization. The bandwidth reduction should be the same in RTM and SP1. However, RTM supported only 200 agents behind a gateway, which limits the savings.
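Since the savings depend on how much data is funnelled through one gateway, the per-gateway agent limit matters. The back-of-the-envelope sketch below uses the 200 (RTM) and 800 (SP1) limits from this post; the function name is hypothetical, and it ignores the spare capacity you would add for failover (item 6).

```python
# Supported agents per gateway, as stated in this post.
AGENTS_PER_GATEWAY = {"RTM": 200, "SP1": 800}


def gateways_needed(agent_count: int, release: str) -> int:
    """Minimum gateways needed to keep every gateway within the supported limit.

    Hypothetical sizing helper; does not include extra gateways for failover.
    """
    if agent_count < 0:
        raise ValueError("agent_count must be non-negative")
    limit = AGENTS_PER_GATEWAY[release]
    # Integer ceiling division, avoiding floating point.
    return -(-agent_count // limit)
```

For example, a remote site with 700 agents needs four gateways under the RTM limit but only one under SP1, so all 700 agents' data is compressed through a single node.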